[ https://issues.apache.org/jira/browse/SPARK-38512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-38512:
------------------------------------

    Assignee: Apache Spark

> ResolveFunctions implemented incorrectly, requiring multiple passes to resolve nested Expressions
> -------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-38512
>                 URL: https://issues.apache.org/jira/browse/SPARK-38512
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.2.0, 3.2.1
>            Reporter: Alexey Kudinkin
>            Assignee: Apache Spark
>            Priority: Critical
>
> The ResolveFunctions rule is implemented incorrectly, requiring multiple passes to resolve nested expressions.
>
> While the plan itself is traversed correctly in post-order (bottom-up, via `plan.resolveOperatorsUpWithPruning`), the expressions within each plan node are traversed incorrectly in pre-order (top-down, using `transformExpressionsWithPruning`):
>
> {code:java}
> case q: LogicalPlan =>
>   q.transformExpressionsWithPruning(...) { ... }
> {code}
>
> Traversing in pre-order means that an attempt is made to resolve the current node before its children are resolved. This is incorrect, since a node cannot be resolved until its children are.
>
> While this does not lead to failures yet, it is taxing on performance: most expressions in Spark could be resolved in a *single pass* (if resolved bottom-up; see the reproducible sample at the bottom). Instead, it currently takes Spark at least *N* iterations to resolve such expressions, where N is proportional to the depth of the expression tree.
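> The pass-count difference can be sketched with a minimal, Spark-free Scala model (the `Node`, `tryResolve`, and pass functions below are hypothetical illustrations, not Spark APIs): if a node becomes resolvable only once all of its children are resolved, a top-down pass can resolve at most one level of the tree per iteration, while a bottom-up pass resolves the whole tree in one go.
>
> {code:java}
> // Hypothetical toy model (not Spark code): a node resolves only after
> // all of its children are resolved, mirroring expression resolution.
> sealed trait Node { def resolved: Boolean }
> case class Leaf(resolved: Boolean = true) extends Node
> case class Call(children: List[Node], resolved: Boolean = false) extends Node
>
> def tryResolve(n: Node): Node = n match {
>   case c @ Call(cs, false) if cs.forall(_.resolved) => c.copy(resolved = true)
>   case other                                        => other
> }
>
> // Pre-order (top-down): visit the node first, then its children.
> def preOrderPass(n: Node): Node = tryResolve(n) match {
>   case c: Call => c.copy(children = c.children.map(preOrderPass))
>   case other   => other
> }
>
> // Post-order (bottom-up): visit the children first, then the node.
> def postOrderPass(n: Node): Node = n match {
>   case c: Call => tryResolve(c.copy(children = c.children.map(postOrderPass)))
>   case other   => other
> }
>
> def passesUntilResolved(root: Node, pass: Node => Node): Int = {
>   var cur = root
>   var count = 0
>   while (!cur.resolved) { cur = pass(cur); count += 1 }
>   count
> }
>
> // Three nested unresolved calls, like date_format(to_timestamp(...), ...):
> val tree: Node = Call(List(Call(List(Call(List(Leaf()))))))
>
> val topDown  = passesUntilResolved(tree, preOrderPass)  // 3: one level per pass
> val bottomUp = passesUntilResolved(tree, postOrderPass) // 1: single pass
> {code}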
>
> Example to reproduce (note: `StructType` takes a sequence of fields, so the schema must be built as `StructType(Seq(...))`):
>
> {code:java}
> import org.apache.spark.sql.SparkSession
> import org.apache.spark.sql.catalyst.expressions.Expression
> import org.apache.spark.sql.catalyst.plans.logical.{Filter, LocalRelation, LogicalPlan}
> import org.apache.spark.sql.catalyst.rules.Rule
> import org.apache.spark.sql.types.{StringType, StructField, StructType}
>
> def resolveExpr(spark: SparkSession, exprStr: String, tableSchema: StructType): Expression = {
>   val expr = spark.sessionState.sqlParser.parseExpression(exprStr)
>   val analyzer = spark.sessionState.analyzer
>   val schemaFields = tableSchema.fields
>   val resolvedExpr = {
>     val plan: LogicalPlan = Filter(expr, LocalRelation(schemaFields.head, schemaFields.drop(1): _*))
>     val rules: Seq[Rule[LogicalPlan]] = {
>       analyzer.ResolveFunctions ::
>         analyzer.ResolveReferences ::
>         Nil
>     }
>     rules.foldRight(plan)((rule, plan) => rule.apply(plan))
>       .asInstanceOf[Filter]
>       .condition
>   }
>   resolvedExpr
> }
>
> // Invoke with
> resolveExpr(spark, "date_format(to_timestamp(B, 'yyyy-MM-dd'), 'MM/dd/yyyy')",
>   StructType(Seq(StructField("B", StringType))))
> {code}

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org