[ https://issues.apache.org/jira/browse/SPARK-30072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16997368#comment-16997368 ]
Wenchen Fan commented on SPARK-30072:
-------------------------------------

> The nested subquery "SELECT max(df2.k) FROM df1 JOIN df2 ON df1.k = df2.k AND
> df2.id < 2" will be run in another QueryExecution

This is true, but we create `AdaptiveSparkPlanExec` for both the main query and all subqueries in `InsertAdaptiveSparkPlan`. That said, we have the `isSubquery` info when creating `AdaptiveSparkPlanExec`.

> Create dedicated planner for subqueries
> ---------------------------------------
>
>                 Key: SPARK-30072
>                 URL: https://issues.apache.org/jira/browse/SPARK-30072
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Ali Afroozeh
>            Assignee: Ali Afroozeh
>            Priority: Minor
>             Fix For: 3.0.0
>
>
> This PR changes subquery planning by calling the planner and plan preparation
> rules on the subquery plan directly. Previously, we created a QueryExecution
> instance for each subquery to get the executedPlan. This would re-run analysis
> and optimization on the subquery's plan. Running the analysis again on an
> optimized query plan can have unwanted consequences, as some rules, for
> example DecimalPrecision, are not idempotent.
> As an example, consider the expression 1.7 * avg(a), which after applying the
> DecimalPrecision rule becomes:
> promote_precision(1.7) * promote_precision(avg(a))
> After the optimization, more specifically the constant folding rule, this
> expression becomes:
> 1.7 * promote_precision(avg(a))
> Now if we run the analyzer on this optimized query again, we will get:
> promote_precision(1.7) * promote_precision(promote_precision(avg(a)))
> Which will later be optimized to:
> 1.7 * promote_precision(promote_precision(avg(a)))
> As can be seen, re-running the analysis and optimization on this expression
> results in an expression with extra nested promote_precision nodes. Adding
> unneeded nodes to the plan is problematic because it can eliminate situations
> where we can reuse the plan.
> We opted to introduce dedicated planners for subqueries, instead of making
> the DecimalPrecision rule idempotent, because this eliminates this entire
> category of problems. Another benefit is that planning time for subqueries is
> reduced.
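
The non-idempotence described in the issue can be reproduced with a toy model. The sketch below is plain Python, not Spark code: the classes `Lit`, `Avg`, `Promote`, `Mul` and the functions `analyze`, `optimize`, `show` are hypothetical stand-ins invented for illustration. It assumes only what the description states, namely that the analysis rule wraps both operands of a multiplication and that constant folding unwraps a promoted literal.

```python
from dataclasses import dataclass


# Hypothetical toy expression nodes; these are not Spark classes.
@dataclass(frozen=True)
class Lit:
    value: str


@dataclass(frozen=True)
class Avg:
    column: str


@dataclass(frozen=True)
class Promote:
    child: object


@dataclass(frozen=True)
class Mul:
    left: object
    right: object


def analyze(e):
    """Toy stand-in for DecimalPrecision: wrap both operands of a multiply.

    It does not check whether an operand is already wrapped, which is what
    makes it non-idempotent.
    """
    if isinstance(e, Mul):
        return Mul(Promote(analyze(e.left)), Promote(analyze(e.right)))
    if isinstance(e, Promote):
        return Promote(analyze(e.child))
    return e


def optimize(e):
    """Toy stand-in for constant folding: promote_precision(literal) -> literal."""
    if isinstance(e, Promote):
        child = optimize(e.child)
        return child if isinstance(child, Lit) else Promote(child)
    if isinstance(e, Mul):
        return Mul(optimize(e.left), optimize(e.right))
    return e


def show(e):
    """Render an expression in the notation used in the issue description."""
    if isinstance(e, Lit):
        return e.value
    if isinstance(e, Avg):
        return f"avg({e.column})"
    if isinstance(e, Promote):
        return f"promote_precision({show(e.child)})"
    return f"{show(e.left)} * {show(e.right)}"


expr = Mul(Lit("1.7"), Avg("a"))
once = optimize(analyze(expr))    # analysis and optimization run once
twice = optimize(analyze(once))   # both run again on the optimized plan

print(show(once))
print(show(twice))
```

In this model `once` and `twice` are not equal: the second pass adds a nested `promote_precision` node around `avg(a)`. That is the kind of mismatch the description says can prevent plan reuse.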