[ 
https://issues.apache.org/jira/browse/SPARK-53527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-53527:
-----------------------------------

    Assignee: Szehon Ho

> Improve fallback of analyzeExistenceDefaultValue
> ------------------------------------------------
>
>                 Key: SPARK-53527
>                 URL: https://issues.apache.org/jira/browse/SPARK-53527
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.0.1
>            Reporter: Szehon Ho
>            Assignee: Szehon Ho
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.1.0
>
>
> https://issues.apache.org/jira/browse/SPARK-51119 skips analysis for 
> EXISTS_DEFAULT. In most cases this works, because the EXISTS_DEFAULT column 
> metadata is supposed to be persisted in resolved form.
>  
> But there are some known bugs where it is persisted unresolved. For example, 
> expressions such as current_database, current_user, or current_timestamp are 
> non-deterministic and will yield wrong results in EXISTS_DEFAULT, because the 
> user expects the value to be resolved at the time they set the default.
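>  
> A minimal SQL sketch (the table and column names are hypothetical) of how such a default is declared; the user expects the value evaluated at the time of this statement to be what EXISTS_DEFAULT captures for existing rows:
> {code:sql}
> -- current_timestamp() is non-deterministic: the user expects the
> -- timestamp at ALTER time to be frozen into EXISTS_DEFAULT, not
> -- re-evaluated later from unresolved metadata
> CREATE TABLE t (id INT);
> ALTER TABLE t ADD COLUMN created_at TIMESTAMP DEFAULT current_timestamp();
> {code}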
>  
> There is a fallback in https://issues.apache.org/jira/browse/SPARK-51119 to 
> handle corrupt EXISTS_DEFAULT by running full analysis, but it misses some 
> cases. One such case is where there are nested function calls.
>  
> Example: EXISTS_DEFAULT contains a nested function call like:
> {code:sql}
> CONCAT(YEAR(CURRENT_DATE), LPAD(WEEKOFYEAR(CURRENT_DATE), 2, '0')){code}
>  
>  
> the current code `Literal.fromSQL(defaultSQL)` throws the following exception 
> before ever reaching the fallback:
> {code:java}
> Caused by: java.lang.AssertionError: assertion failed: function arguments must be resolved.
> at scala.Predef$.assert(Predef.scala:279)
> at org.apache.spark.sql.catalyst.analysis.FunctionRegistry$.$anonfun$expressionBuilder$1(FunctionRegistry.scala:1278)
> at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistryBase.lookupFunction(FunctionRegistry.scala:251)
> at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistryBase.lookupFunction$(FunctionRegistry.scala:245)
> at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry.lookupFunction(FunctionRegistry.scala:317)
> at org.apache.spark.sql.catalyst.expressions.Literal$$anonfun$fromSQL$1.applyOrElse(literals.scala:325)
> at org.apache.spark.sql.catalyst.expressions.Literal$$anonfun$fromSQL$1.applyOrElse(literals.scala:317)
> at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$4(TreeNode.scala:586)
> at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:121)
> at org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:586)
> at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$1(TreeNode.scala:579)
> at scala.collection.immutable.List.map(List.scala:251)
> at scala.collection.immutable.List.map(List.scala:79)
> at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:768)
> at org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:579)
> at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:556)
> at org.apache.spark.sql.catalyst.expressions.Literal$.fromSQL(literals.scala:317)
> at org.apache.spark.sql.catalyst.util.ResolveDefaultColumns$.analyzeExistenceDefaultValue(ResolveDefaultColumnsUtil.scala:393)
> at org.apache.spark.sql.catalyst.util.ResolveDefaultColumns$.getExistenceDefaultValue(ResolveDefaultColumnsUtil.scala:529)
> at org.apache.spark.sql.catalyst.util.ResolveDefaultColumns$.$anonfun$getExistenceDefaultValues$1(ResolveDefaultColumnsUtil.scala:524)
> at scala.collection.ArrayOps$.map$extension(ArrayOps.scala:936)
> at org.apache.spark.sql.catalyst.util.ResolveDefaultColumns$.getExistenceDefaultValues(ResolveDefaultColumnsUtil.scala:524)
> at org.apache.spark.sql.catalyst.util.ResolveDefaultColumns$.$anonfun$existenceDefaultValues$2(ResolveDefaultColumnsUtil.scala:594)
> at scala.Option.getOrElse(Option.scala:201)
> at org.apache.spark.sql.catalyst.util.ResolveDefaultColumns$.existenceDefaultValues(ResolveDefaultColumnsUtil.scala:592)
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
