[ 
https://issues.apache.org/jira/browse/SPARK-46995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ziqi Liu updated SPARK-46995:
-----------------------------
    Component/s: SQL

> Allow AQE coalesce final stage in SQL cached plan
> -------------------------------------------------
>
>                 Key: SPARK-46995
>                 URL: https://issues.apache.org/jira/browse/SPARK-46995
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, SQL
>    Affects Versions: 4.0.0
>            Reporter: Ziqi Liu
>            Priority: Major
>
> [https://github.com/apache/spark/pull/43435] and 
> [https://github.com/apache/spark/pull/43760] fix a correctness issue 
> that is triggered when AQE is applied to a cached query plan, specifically 
> when AQE coalesces the final result stage of the cached plan.
>  
> The current semantics of 
> {{spark.sql.optimizer.canChangeCachedPlanOutputPartitioning}}
> ([source 
> code|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala#L403-L411]):
>  * when true, we enable AQE but disable coalescing of the final stage 
> ({*}default{*})
>  * when false, we disable AQE
>  
> But let’s revisit the semantics of this config: for the caller, the only 
> thing that matters is whether we change the output partitioning of the cached 
> plan, and we should apply AQE whenever possible. Thus we want to 
> modify the semantics of 
> {{spark.sql.optimizer.canChangeCachedPlanOutputPartitioning}}:
>  * when true, we enable AQE and allow coalescing of the final stage: this 
> might lead to a performance regression, because it introduces an extra shuffle
>  * when false, we enable AQE but disable coalescing of the final stage *(this 
> is actually the {{true}} semantics of the old behavior)*
> Also, to keep the default behavior unchanged, we might want to flip the 
> default value of 
> {{spark.sql.optimizer.canChangeCachedPlanOutputPartitioning}} to {{false}}.
>  
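The two sets of semantics described above can be summarized as a small truth table. The following is only an illustrative sketch in plain Python, not Spark code: the names `CachedPlanAqe`, `current_semantics`, and `proposed_semantics` are hypothetical, and the model simply restates the bullets in the description.

```python
from typing import NamedTuple

# Config key taken from the issue description.
CONF_KEY = "spark.sql.optimizer.canChangeCachedPlanOutputPartitioning"


class CachedPlanAqe(NamedTuple):
    """Hypothetical summary of how AQE treats a cached plan."""
    aqe_enabled: bool
    coalesce_final_stage: bool


def current_semantics(can_change: bool) -> CachedPlanAqe:
    # Current behavior as described: true enables AQE without final-stage
    # coalescing; false disables AQE entirely.
    if can_change:
        return CachedPlanAqe(aqe_enabled=True, coalesce_final_stage=False)
    return CachedPlanAqe(aqe_enabled=False, coalesce_final_stage=False)


def proposed_semantics(can_change: bool) -> CachedPlanAqe:
    # Proposed behavior as described: AQE is always enabled; the flag only
    # controls whether the final stage may be coalesced.
    return CachedPlanAqe(aqe_enabled=True, coalesce_final_stage=can_change)


# Defaults as stated in the description (current: true, proposed: false).
CURRENT_DEFAULT = True
PROPOSED_DEFAULT = False
```

Under this model, `current_semantics(CURRENT_DEFAULT)` and `proposed_semantics(PROPOSED_DEFAULT)` return the same value, which is why flipping the default to false would keep the out-of-the-box behavior unchanged.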



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

