[jira] [Commented] (SPARK-30072) Create dedicated planner for subqueries

Wenchen Fan (Jira) Mon, 16 Dec 2019 07:03:29 -0800


    [ 
https://issues.apache.org/jira/browse/SPARK-30072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16997368#comment-16997368
 ]


Wenchen Fan commented on SPARK-30072:
-------------------------------------

> The nested subquery "SELECT max(df2.k) FROM df1 JOIN df2 ON df1.k = df2.k AND 
> df2.id < 2" will be run in another QueryExecution

This is true, but we create `AdaptiveSparkPlanExec` for both the main query and 
all subqueries in `InsertAdaptiveSparkPlan`. That said, we have the 
`isSubquery` info when creating `AdaptiveSparkPlanExec`.

> Create dedicated planner for subqueries
> ---------------------------------------
>
>                 Key: SPARK-30072
>                 URL: https://issues.apache.org/jira/browse/SPARK-30072
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Ali Afroozeh
>            Assignee: Ali Afroozeh
>            Priority: Minor
>             Fix For: 3.0.0
>
>
> This PR changes subquery planning by calling the planner and plan preparation 
> rules on the subquery plan directly. Before we were creating a QueryExecution 
> instance for subqueries to get the executedPlan. This would re-run analysis 
> and optimization on the subqueries plan. Running the analysis again on an 
> optimized query plan can have unwanted consequences, as some rules, for 
> example DecimalPrecision, are not idempotent.
> As an example, consider the expression 1.7 * avg(a) which after applying the 
> DecimalPrecision rule becomes:
> promote_precision(1.7) * promote_precision(avg(a))
> After the optimization, more specifically the constant folding rule, this 
> expression becomes:
> 1.7 * promote_precision(avg(a))
> Now if we run the analyzer on this optimized query again, we will get:
> promote_precision(1.7) * promote_precision(promote_precision(avg(a)))
> Which will later optimized as:
> 1.7 * promote_precision(promote_precision(avg(a)))
> As can be seen, re-running the analysis and optimization on this expression 
> results in an expression with extra nested promote_preceision nodes. Adding 
> unneeded nodes to the plan is problematic because it can eliminate situations 
> where we can reuse the plan.
> We opted to introduce dedicated planners for subuqueries, instead of making 
> the DecimalPrecision rule idempotent, because this eliminates this entire 
> category of problems. Another benefit is that planning time for subqueries is 
> reduced.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-30072) Create dedicated planner for subqueries

Reply via email to