Szehon Ho created SPARK-53527:
---------------------------------
Summary: Improve fallback of analyzeExistenceDefaultValue
Key: SPARK-53527
URL: https://issues.apache.org/jira/browse/SPARK-53527
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 4.0.1
Reporter: Szehon Ho
Fix For: 4.1.0
https://issues.apache.org/jira/browse/SPARK-51119 skips analysis for
EXISTS_DEFAULT. In most cases this works, because the EXISTS_DEFAULT column
metadata is supposed to be persisted already resolved.
But there are some known bugs where it is persisted un-resolved. For example,
expressions like 'current_database, current_user, current_timestamp' are
non-deterministic and will yield wrong results in EXISTS_DEFAULT, where the user
expects the value to have been resolved at the time they set the default.
There is a fallback in https://issues.apache.org/jira/browse/SPARK-51119 to
handle corrupt EXISTS_DEFAULT, but it misses some cases. In this case, one where
there are nested function calls.
Example: the EXISTS_DEFAULT is:
{code:java}
CONCAT(YEAR(CURRENT_DATE), LPAD(WEEKOFYEAR(CURRENT_DATE), 2, '0')){code}
the current code, `Literal.fromSQL(defaultSQL)`, throws an exception before
reaching the fallback:
{code:java}
Caused by: java.lang.AssertionError: assertion failed: function arguments must be resolved.
    at scala.Predef$.assert(Predef.scala:279)
    at org.apache.spark.sql.catalyst.analysis.FunctionRegistry$.$anonfun$expressionBuilder$1(FunctionRegistry.scala:1278)
    at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistryBase.lookupFunction(FunctionRegistry.scala:251)
    at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistryBase.lookupFunction$(FunctionRegistry.scala:245)
    at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry.lookupFunction(FunctionRegistry.scala:317)
    at org.apache.spark.sql.catalyst.expressions.Literal$$anonfun$fromSQL$1.applyOrElse(literals.scala:325)
    at org.apache.spark.sql.catalyst.expressions.Literal$$anonfun$fromSQL$1.applyOrElse(literals.scala:317)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$4(TreeNode.scala:586)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:121)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:586)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$1(TreeNode.scala:579)
    at scala.collection.immutable.List.map(List.scala:251)
    at scala.collection.immutable.List.map(List.scala:79)
    at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:768)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:579)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:556)
    at org.apache.spark.sql.catalyst.expressions.Literal$.fromSQL(literals.scala:317)
    at org.apache.spark.sql.catalyst.util.ResolveDefaultColumns$.analyzeExistenceDefaultValue(ResolveDefaultColumnsUtil.scala:393)
    at org.apache.spark.sql.catalyst.util.ResolveDefaultColumns$.getExistenceDefaultValue(ResolveDefaultColumnsUtil.scala:529)
    at org.apache.spark.sql.catalyst.util.ResolveDefaultColumns$.$anonfun$getExistenceDefaultValues$1(ResolveDefaultColumnsUtil.scala:524)
    at scala.collection.ArrayOps$.map$extension(ArrayOps.scala:936)
    at org.apache.spark.sql.catalyst.util.ResolveDefaultColumns$.getExistenceDefaultValues(ResolveDefaultColumnsUtil.scala:524)
    at org.apache.spark.sql.catalyst.util.ResolveDefaultColumns$.$anonfun$existenceDefaultValues$2(ResolveDefaultColumnsUtil.scala:594)
    at scala.Option.getOrElse(Option.scala:201)
    at org.apache.spark.sql.catalyst.util.ResolveDefaultColumns$.existenceDefaultValues(ResolveDefaultColumnsUtil.scala:592)
{code}
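The shape of the improvement can be sketched as a try-the-fast-path-then-fall-back pattern. This is an illustrative sketch only, not the actual Spark code: `fastPath` and `fullAnalysis` are hypothetical stand-ins for `Literal.fromSQL` and a full re-analysis of the original default-value SQL. The key point is that `AssertionError` is a `java.lang.Error`, not an `Exception`, so a fallback guarded only by `catch (Exception e)` never sees it; the guard has to be broad enough to cover the assertion failure above:

```java
// Illustrative sketch; fastPath and fullAnalysis are hypothetical stand-ins
// for Literal.fromSQL and a full analyzer pass, respectively.
public class ExistenceDefaultFallback {

    // Stand-in for Literal.fromSQL: on input containing nested function
    // calls it fails with an AssertionError rather than an Exception.
    static String fastPath(String sql) {
        if (sql.contains("(")) {
            throw new AssertionError("function arguments must be resolved");
        }
        return sql;
    }

    // Stand-in for fully re-analyzing the persisted default-value SQL.
    static String fullAnalysis(String sql) {
        return "analyzed(" + sql + ")";
    }

    // The improved fallback: catching Throwable (not just Exception) lets
    // the AssertionError from the fast path reach the fallback path.
    static String analyzeExistenceDefault(String sql) {
        try {
            return fastPath(sql);
        } catch (Throwable t) {
            return fullAnalysis(sql);
        }
    }
}
```

With this shape, a simple resolved literal still takes the fast path, while the nested-call example from this ticket falls through to full analysis instead of propagating the `AssertionError`.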