Szehon Ho created SPARK-53527:
---------------------------------

             Summary: Improve fallback of analyzeExistenceDefaultValue
                 Key: SPARK-53527
                 URL: https://issues.apache.org/jira/browse/SPARK-53527
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 4.0.1
            Reporter: Szehon Ho
             Fix For: 4.1.0


https://issues.apache.org/jira/browse/SPARK-51119 skips analysis for 
EXISTS_DEFAULT. In most cases this works, because the EXISTS_DEFAULT column 
metadata is supposed to be persisted already resolved.
 
But there are some known bugs where it is persisted unresolved. For example, 
expressions like 'current_database, current_user, current_timestamp' are 
non-deterministic, and re-evaluating them yields wrong results in 
EXISTS_DEFAULT: the user expects the value that was resolved at the time they 
set the default.
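To illustrate why persisting the unresolved expression is wrong, here is a minimal, self-contained sketch (the names below are hypothetical stand-ins, not the actual Spark code): a non-deterministic expression kept as text is re-evaluated on every read, while a default resolved at definition time is frozen.

```scala
object UnresolvedDefaultProblem {
  private var tick: Long = 0L

  // Hypothetical stand-in for evaluating a non-deterministic expression
  // such as current_timestamp: each call returns a new value.
  def evalNonDeterministic(): Long = { tick += 1; tick }

  // Bug: if EXISTS_DEFAULT is persisted as the unresolved expression text,
  // every read re-evaluates it, so old rows see a different value each time.
  def readWithUnresolvedDefault(): Long = evalNonDeterministic()

  // Expected: the expression is resolved once, when the default is set,
  // and every later read returns that frozen literal.
  val resolvedAtDefinitionTime: Long = evalNonDeterministic()
  def readWithResolvedDefault(): Long = resolvedAtDefinitionTime
}
```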
 
There is a fallback in https://issues.apache.org/jira/browse/SPARK-51119 to 
handle corrupt EXISTS_DEFAULT, but it misses some cases; in this case, one 
where there are nested function calls.
 
Example: EXISTS_DEFAULT is:
{code:java}
CONCAT(YEAR(CURRENT_DATE), LPAD(WEEKOFYEAR(CURRENT_DATE), 2, '0'))
{code}
 
 
The current code `Literal.fromSQL(defaultSQL)` throws the following exception 
before reaching the fallback:
{code:java}
Caused by: java.lang.AssertionError: assertion failed: function arguments must be resolved.
at scala.Predef$.assert(Predef.scala:279)
at org.apache.spark.sql.catalyst.analysis.FunctionRegistry$.$anonfun$expressionBuilder$1(FunctionRegistry.scala:1278)
at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistryBase.lookupFunction(FunctionRegistry.scala:251)
at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistryBase.lookupFunction$(FunctionRegistry.scala:245)
at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry.lookupFunction(FunctionRegistry.scala:317)
at org.apache.spark.sql.catalyst.expressions.Literal$$anonfun$fromSQL$1.applyOrElse(literals.scala:325)
at org.apache.spark.sql.catalyst.expressions.Literal$$anonfun$fromSQL$1.applyOrElse(literals.scala:317)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$4(TreeNode.scala:586)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:121)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:586)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$1(TreeNode.scala:579)
at scala.collection.immutable.List.map(List.scala:251)
at scala.collection.immutable.List.map(List.scala:79)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:768)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:579)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:556)
at org.apache.spark.sql.catalyst.expressions.Literal$.fromSQL(literals.scala:317)
at org.apache.spark.sql.catalyst.util.ResolveDefaultColumns$.analyzeExistenceDefaultValue(ResolveDefaultColumnsUtil.scala:393)
at org.apache.spark.sql.catalyst.util.ResolveDefaultColumns$.getExistenceDefaultValue(ResolveDefaultColumnsUtil.scala:529)
at org.apache.spark.sql.catalyst.util.ResolveDefaultColumns$.$anonfun$getExistenceDefaultValues$1(ResolveDefaultColumnsUtil.scala:524)
at scala.collection.ArrayOps$.map$extension(ArrayOps.scala:936)
at org.apache.spark.sql.catalyst.util.ResolveDefaultColumns$.getExistenceDefaultValues(ResolveDefaultColumnsUtil.scala:524)
at org.apache.spark.sql.catalyst.util.ResolveDefaultColumns$.$anonfun$existenceDefaultValues$2(ResolveDefaultColumnsUtil.scala:594)
at scala.Option.getOrElse(Option.scala:201)
at org.apache.spark.sql.catalyst.util.ResolveDefaultColumns$.existenceDefaultValues(ResolveDefaultColumnsUtil.scala:592)
{code}
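One possible shape for a more robust fallback, as a minimal sketch with hypothetical stand-in functions (not the actual Spark code): try the fast literal parse first, and on any non-fatal failure fall back to fully analyzing the stored SQL. Note that the `AssertionError` above is an `Error`, not an `Exception`, so a plain `catch { case e: Exception => ... }` would not catch it; `scala.util.control.NonFatal` does.

```scala
import scala.util.control.NonFatal

object ExistenceDefaultFallback {
  // Hypothetical stand-in for Literal.fromSQL: fails with the same
  // AssertionError as above when the SQL contains unresolved nested
  // function calls.
  def fastLiteralParse(sql: String): String = {
    assert(!sql.contains("("), "function arguments must be resolved")
    sql
  }

  // Hypothetical stand-in for the full analyzer-based resolution path.
  def fullAnalysis(sql: String): String = s"analyzed:$sql"

  // Try the fast path; fall back on any non-fatal error. NonFatal matches
  // AssertionError (it is not in NonFatal's fatal list), so the failed
  // assertion no longer escapes before the fallback runs.
  def analyzeExistenceDefault(sql: String): String =
    try fastLiteralParse(sql)
    catch { case NonFatal(_) => fullAnalysis(sql) }
}
```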
 
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
