[ 
https://issues.apache.org/jira/browse/SPARK-53527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated SPARK-53527:
------------------------------
    Description: 
https://issues.apache.org/jira/browse/SPARK-51119 skips analysis for 
EXISTS_DEFAULT. In most cases this works, because the EXISTS_DEFAULT column 
metadata is supposed to be stored already resolved.
 
But there are some known bugs where it is persisted unresolved. For example, 
expressions such as 'current_database', 'current_user', and 'current_timestamp' 
are non-deterministic, so an unresolved EXISTS_DEFAULT yields wrong results: the 
user expects the value that was resolved at the moment they set the default.
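 
For illustration, a minimal reproduction sketch (assuming a local SparkSession 
and a data source that supports column DEFAULTs; the table name, provider, and 
types are arbitrary):
{code:java}
// Hypothetical repro: the default should be evaluated once, at ALTER time,
// and the resulting constant persisted as the column's EXISTS_DEFAULT, so
// that rows written before the column existed read back a fixed value.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").getOrCreate()

spark.sql("CREATE TABLE t (id INT) USING parquet")
spark.sql("INSERT INTO t VALUES (1)")
// If CURRENT_TIMESTAMP is persisted unresolved rather than folded to a
// constant, the pre-existing row sees a value re-evaluated on every read.
spark.sql("ALTER TABLE t ADD COLUMN ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP")
spark.sql("SELECT * FROM t").show()
{code}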
 
There is a fallback in https://issues.apache.org/jira/browse/SPARK-51119 that 
handles a corrupt EXISTS_DEFAULT by running full analysis, but it misses some 
cases. One such case is nested function calls.
 
Example: the EXISTS_DEFAULT is:
{code:java}
CONCAT(YEAR(CURRENT_DATE), LPAD(WEEKOFYEAR(CURRENT_DATE), 2, '0'))
{code}
 
 
The current code, `Literal.fromSQL(defaultSQL)`, throws the exception below 
before ever reaching the fallback:
{code:java}
Caused by: java.lang.AssertionError: assertion failed: function arguments must be resolved.
at scala.Predef$.assert(Predef.scala:279)
at org.apache.spark.sql.catalyst.analysis.FunctionRegistry$.$anonfun$expressionBuilder$1(FunctionRegistry.scala:1278)
at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistryBase.lookupFunction(FunctionRegistry.scala:251)
at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistryBase.lookupFunction$(FunctionRegistry.scala:245)
at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry.lookupFunction(FunctionRegistry.scala:317)
at org.apache.spark.sql.catalyst.expressions.Literal$$anonfun$fromSQL$1.applyOrElse(literals.scala:325)
at org.apache.spark.sql.catalyst.expressions.Literal$$anonfun$fromSQL$1.applyOrElse(literals.scala:317)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$4(TreeNode.scala:586)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:121)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:586)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$1(TreeNode.scala:579)
at scala.collection.immutable.List.map(List.scala:251)
at scala.collection.immutable.List.map(List.scala:79)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:768)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:579)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:556)
at org.apache.spark.sql.catalyst.expressions.Literal$.fromSQL(literals.scala:317)
at org.apache.spark.sql.catalyst.util.ResolveDefaultColumns$.analyzeExistenceDefaultValue(ResolveDefaultColumnsUtil.scala:393)
at org.apache.spark.sql.catalyst.util.ResolveDefaultColumns$.getExistenceDefaultValue(ResolveDefaultColumnsUtil.scala:529)
at org.apache.spark.sql.catalyst.util.ResolveDefaultColumns$.$anonfun$getExistenceDefaultValues$1(ResolveDefaultColumnsUtil.scala:524)
at scala.collection.ArrayOps$.map$extension(ArrayOps.scala:936)
at org.apache.spark.sql.catalyst.util.ResolveDefaultColumns$.getExistenceDefaultValues(ResolveDefaultColumnsUtil.scala:524)
at org.apache.spark.sql.catalyst.util.ResolveDefaultColumns$.$anonfun$existenceDefaultValues$2(ResolveDefaultColumnsUtil.scala:594)
at scala.Option.getOrElse(Option.scala:201)
at org.apache.spark.sql.catalyst.util.ResolveDefaultColumns$.existenceDefaultValues(ResolveDefaultColumnsUtil.scala:592)
{code}
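 
One possible shape for the improvement, as a rough sketch only (the 
`fullAnalysisFallback` helper below is a hypothetical stand-in for the existing 
full-analysis path, not actual Spark code): treat a failure thrown from the 
`Literal.fromSQL` fast path the same as an unresolved result, so a corrupt 
EXISTS_DEFAULT always reaches the fallback instead of surfacing the 
AssertionError.
{code:java}
import scala.util.control.NonFatal
import org.apache.spark.sql.catalyst.expressions.{Expression, Literal}

// Hypothetical stand-in for the SPARK-51119 fallback that re-runs full analysis.
def fullAnalysisFallback(defaultSQL: String): Expression = ???

def analyzeExistenceDefaultValueSketch(defaultSQL: String): Expression = {
  try {
    val parsed = Literal.fromSQL(defaultSQL)
    // Existing behavior: fall back when the fast path yields an unresolved tree.
    if (parsed.resolved) parsed else fullAnalysisFallback(defaultSQL)
  } catch {
    // Proposed: also fall back when fromSQL itself throws. For nested
    // unresolved calls it fails with "function arguments must be resolved"
    // (an AssertionError, which NonFatal matches) before any check runs.
    case NonFatal(_) => fullAnalysisFallback(defaultSQL)
  }
}
{code}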
 
 


> Improve fallback of analyzeExistenceDefaultValue
> ------------------------------------------------
>
>                 Key: SPARK-53527
>                 URL: https://issues.apache.org/jira/browse/SPARK-53527
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.0.1
>            Reporter: Szehon Ho
>            Priority: Major
>             Fix For: 4.1.0
>


