[ https://issues.apache.org/jira/browse/SPARK-38512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-38512:
------------------------------------

    Assignee: Apache Spark

> ResolveFunctions implemented incorrectly, requiring multiple passes to resolve nested Expressions
> -------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-38512
>                 URL: https://issues.apache.org/jira/browse/SPARK-38512
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.2.0, 3.2.1
>            Reporter: Alexey Kudinkin
>            Assignee: Apache Spark
>            Priority: Critical
>
> The ResolveFunctions rule is implemented incorrectly, requiring multiple passes to resolve nested expressions.
>
> While the plan itself is traversed correctly in post-order (bottom-up, via `plan.resolveOperatorsUpWithPruning`), the expressions within each plan node are traversed incorrectly in pre-order (top-down, using `transformExpressionsWithPruning`):
>
> {code:java}
> case q: LogicalPlan =>
>   q.transformExpressionsWithPruning(...) { ... }
> {code}
>
> Traversing in pre-order means that an attempt is made to resolve the current node before its children are resolved. This is incorrect, since a node cannot be resolved until its children are.
>
> While this does not lead to failures yet, it is taxing on performance: most expressions in Spark could be resolved in a *single pass* (if resolved bottom-up; see the reproducible sample at the bottom). Instead, it currently takes Spark at least *N* iterations to resolve such expressions, where N is proportional to the depth of the expression tree.
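> The pass-count difference can be sketched with a minimal, Spark-free Scala model (the `Node`, `tryResolve`, and pass functions below are hypothetical illustrations, not Spark APIs): if a node becomes resolvable only once all of its children are resolved, a top-down pass can resolve at most one level of the tree per iteration, while a bottom-up pass resolves the whole tree in one go.
>
> {code:java}
> // Hypothetical toy model (not Spark code): a node resolves only after
> // all of its children are resolved, mirroring expression resolution.
> sealed trait Node { def resolved: Boolean }
> case class Leaf(resolved: Boolean = true) extends Node
> case class Call(children: List[Node], resolved: Boolean = false) extends Node
>
> def tryResolve(n: Node): Node = n match {
>   case c @ Call(cs, false) if cs.forall(_.resolved) => c.copy(resolved = true)
>   case other                                        => other
> }
>
> // Pre-order (top-down): visit the node first, then its children.
> def preOrderPass(n: Node): Node = tryResolve(n) match {
>   case c: Call => c.copy(children = c.children.map(preOrderPass))
>   case other   => other
> }
>
> // Post-order (bottom-up): visit the children first, then the node.
> def postOrderPass(n: Node): Node = n match {
>   case c: Call => tryResolve(c.copy(children = c.children.map(postOrderPass)))
>   case other   => other
> }
>
> def passesUntilResolved(root: Node, pass: Node => Node): Int = {
>   var cur = root
>   var count = 0
>   while (!cur.resolved) { cur = pass(cur); count += 1 }
>   count
> }
>
> // Three nested unresolved calls, like date_format(to_timestamp(...), ...):
> val tree: Node = Call(List(Call(List(Call(List(Leaf()))))))
>
> val topDown  = passesUntilResolved(tree, preOrderPass)  // 3: one level per pass
> val bottomUp = passesUntilResolved(tree, postOrderPass) // 1: single pass
> {code}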
>
> Example to reproduce (note: `StructType` takes a sequence of fields, so the schema must be built as `StructType(Seq(...))`):
>
> {code:java}
> import org.apache.spark.sql.SparkSession
> import org.apache.spark.sql.catalyst.expressions.Expression
> import org.apache.spark.sql.catalyst.plans.logical.{Filter, LocalRelation, LogicalPlan}
> import org.apache.spark.sql.catalyst.rules.Rule
> import org.apache.spark.sql.types.{StringType, StructField, StructType}
>
> def resolveExpr(spark: SparkSession, exprStr: String, tableSchema: StructType): Expression = {
>   val expr = spark.sessionState.sqlParser.parseExpression(exprStr)
>   val analyzer = spark.sessionState.analyzer
>   val schemaFields = tableSchema.fields
>   val resolvedExpr = {
>     val plan: LogicalPlan = Filter(expr, LocalRelation(schemaFields.head, schemaFields.drop(1): _*))
>     val rules: Seq[Rule[LogicalPlan]] = {
>       analyzer.ResolveFunctions ::
>         analyzer.ResolveReferences ::
>         Nil
>     }
>     rules.foldRight(plan)((rule, plan) => rule.apply(plan))
>       .asInstanceOf[Filter]
>       .condition
>   }
>   resolvedExpr
> }
>
> // Invoke with
> resolveExpr(spark, "date_format(to_timestamp(B, 'yyyy-MM-dd'), 'MM/dd/yyyy')",
>   StructType(Seq(StructField("B", StringType))))
> {code}

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org