Alexey Kudinkin created SPARK-38512: ---------------------------------------
Summary: ResolveFunctions implemented incorrectly requiring multiple passes to Resolve Nested Expressions Key: SPARK-38512 URL: https://issues.apache.org/jira/browse/SPARK-38512 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.2.1, 3.2.0 Reporter: Alexey Kudinkin ResolveFunctions Rule is implemented incorrectly requiring multiple passes to Resolve Nested Expressions: While Plan object is traversed correctly in post-order (bottoms-up, `plan.resolveOperatorsUpWithPruning), internally, Plan children though are traversed incorrectly in pre-order (top-down, using `transformExpressionsWithPruning`): {code:java} case q: LogicalPlan => q.transformExpressionsWithPruning(...) { ... } {code} Traversing in pre-order means that attempt is taken to resolve the current node, before its children are resolved, which is incorrect, since the node itself could not be resolved before its children are. While this is not leading to failures yet, this is taxing on performance – most of the expressions in Spark should be able to be resolved in a *single pass* (if resolved bottoms-up, take reproducible sample at the bottom). Instead, it currently takes Spark at least *N* iterations to resolve such expressions, where N is proportional to the depth of the Expression tree. Example to reproduce: {code:java} def resolveExpr(spark: SparkSession, exprStr: String, tableSchema: StructType): Expression = { val expr = spark.sessionState.sqlParser.parseExpression(exprStr) val analyzer = spark.sessionState.analyzer val schemaFields = tableSchema.fields val resolvedExpr = { val plan: LogicalPlan = Filter(expr, LocalRelation(schemaFields.head, schemaFields.drop(1): _*)) val rules: Seq[Rule[LogicalPlan]] = { analyzer.ResolveFunctions :: analyzer.ResolveReferences :: Nil } rules.foldRight(plan)((rule, plan) => rule.apply(plan)) .asInstanceOf[Filter] .condition } resolvedExpr } // Invoke with resolveExpr(spark, "date_format(to_timestamp(B, 'yyyy-MM-dd'), 'MM/dd/yyyy')", StructType(StructField("B", StringType))){code} -- This message was sent by Atlassian Jira (v8.20.1#820001) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org