Alexey Kudinkin created SPARK-38512:
---------------------------------------

             Summary: ResolveFunctions implemented incorrectly, requiring multiple passes to resolve nested expressions
                 Key: SPARK-38512
                 URL: https://issues.apache.org/jira/browse/SPARK-38512
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.2.1, 3.2.0
            Reporter: Alexey Kudinkin


The ResolveFunctions rule is implemented incorrectly, requiring multiple passes to resolve nested expressions.

While the plan itself is traversed correctly in post-order (bottom-up, via `plan.resolveOperatorsUpWithPruning`), the expressions within each plan node are traversed incorrectly in pre-order (top-down, using `transformExpressionsWithPruning`):

 
{code:java}
case q: LogicalPlan =>
  q.transformExpressionsWithPruning(...) { ... } {code}
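
Assuming the intent is to mirror the post-order traversal of the plan itself, the fix would presumably be to switch to the bottom-up variant of the expression transform, `transformExpressionsUpWithPruning` (a sketch, not a verified patch):

{code:java}
case q: LogicalPlan =>
  // post-order: each node's children are transformed before the node itself
  q.transformExpressionsUpWithPruning(...) { ... } {code}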
 

Traversing in pre-order means that an attempt is made to resolve the current 
node before its children are resolved, which is incorrect: the node itself 
cannot be resolved until its children are.

While this does not lead to failures yet, it is taxing on performance: most 
expressions in Spark could be resolved in a *single pass* (if resolved 
bottom-up; see the reproducible example at the bottom). Instead, it currently 
takes Spark at least *N* iterations to resolve such expressions, where N is 
proportional to the depth of the expression tree.
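
To illustrate the cost, here is a toy model (not Spark code; the `Node`/`Call`/`Leaf` types are hypothetical) of an expression that resolves only once all of its children are resolved. Pre-order needs one full pass per nesting level, while post-order resolves the whole tree in a single pass:

{code:java}
// Toy model: a node "resolves" only when all of its children are
// already resolved, mimicking nested expression resolution.
sealed trait Node { def resolved: Boolean }
case class Leaf(resolved: Boolean = true) extends Node
case class Call(children: Seq[Node], resolved: Boolean = false) extends Node

// Pre-order (top-down): try the current node first, then recurse into children.
def resolveTopDown(n: Node): Node = n match {
  case c @ Call(children, false) =>
    val tried = if (children.forall(_.resolved)) c.copy(resolved = true) else c
    tried.copy(children = tried.children.map(resolveTopDown))
  case other => other
}

// Post-order (bottom-up): recurse into children first, then try the node.
def resolveBottomUp(n: Node): Node = n match {
  case Call(children, false) =>
    val newChildren = children.map(resolveBottomUp)
    Call(newChildren, resolved = newChildren.forall(_.resolved))
  case other => other
}

def passesUntilResolved(tree: Node, pass: Node => Node): Int = {
  var t = tree
  var passes = 0
  while (!t.resolved) { t = pass(t); passes += 1 }
  passes
}

// Nesting of depth 3, like date_format(to_timestamp(B, ...), ...) over a column
val tree: Node = Call(Seq(Call(Seq(Call(Seq(Leaf()))))))
println(passesUntilResolved(tree, resolveTopDown))  // prints 3: one pass per level
println(passesUntilResolved(tree, resolveBottomUp)) // prints 1: a single pass {code}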

 

Example to reproduce: 

 
{code:java}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.catalyst.plans.logical.{Filter, LocalRelation, LogicalPlan}
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.types.{StringType, StructField, StructType}

def resolveExpr(spark: SparkSession, exprStr: String, tableSchema: StructType): Expression = {
  val expr = spark.sessionState.sqlParser.parseExpression(exprStr)
  val analyzer = spark.sessionState.analyzer
  val schemaFields = tableSchema.fields

  val resolvedExpr = {
    val plan: LogicalPlan = Filter(expr, LocalRelation(schemaFields.head, schemaFields.drop(1): _*))
    val rules: Seq[Rule[LogicalPlan]] = {
      analyzer.ResolveFunctions ::
      analyzer.ResolveReferences ::
      Nil
    }

    // foldRight applies ResolveReferences first, then ResolveFunctions
    rules.foldRight(plan)((rule, plan) => rule.apply(plan))
      .asInstanceOf[Filter]
      .condition
  }
  resolvedExpr
}

// Invoke with
resolveExpr(spark, "date_format(to_timestamp(B, 'yyyy-MM-dd'), 'MM/dd/yyyy')",
  StructType(Seq(StructField("B", StringType)))){code}
 

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
