[ https://issues.apache.org/jira/browse/SPARK-46995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ziqi Liu updated SPARK-46995:
-----------------------------
    Component/s: SQL

> Allow AQE coalesce final stage in SQL cached plan
> -------------------------------------------------
>
>                 Key: SPARK-46995
>                 URL: https://issues.apache.org/jira/browse/SPARK-46995
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, SQL
>    Affects Versions: 4.0.0
>            Reporter: Ziqi Liu
>            Priority: Major
>
> [https://github.com/apache/spark/pull/43435] and
> [https://github.com/apache/spark/pull/43760] fix a correctness issue
> that is triggered when AQE is applied to a cached query plan, specifically
> when AQE coalesces the final result stage of the cached plan.
>
> The current semantics of
> {{spark.sql.optimizer.canChangeCachedPlanOutputPartitioning}}
> ([source code|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala#L403-L411]):
> * when true, we enable AQE but disable coalescing of the final stage
> ({*}default{*})
> * when false, we disable AQE
>
> But let's revisit the semantics of this config: for the caller, the only
> thing that matters is whether we change the output partitioning of the cached
> plan, and we should try to apply AQE whenever possible. Thus we want to
> modify the semantics of
> {{spark.sql.optimizer.canChangeCachedPlanOutputPartitioning}}:
> * when true, we enable AQE and allow coalescing of the final stage; this might
> lead to a perf regression, because it introduces an extra shuffle
> * when false, we enable AQE but disable coalescing of the final stage
> *(this is actually the `true` semantics of the old behavior)*
>
> Also, to keep the default behavior unchanged, we might want to flip the
> default value of
> {{spark.sql.optimizer.canChangeCachedPlanOutputPartitioning}} to `false`.
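
The two readings of the flag described in the issue can be compared side by side with a small truth-table sketch. This is only an illustrative model in plain Python, not Spark code: the names `CachedPlanAqeBehavior`, `current_semantics`, `proposed_semantics`, `CURRENT_DEFAULT` and `PROPOSED_DEFAULT` are hypothetical and do not exist in Spark. Only the config key and the true/false behaviors are taken from the issue text.

```python
from dataclasses import dataclass

# Config key quoted in the issue; everything else below is a hypothetical model.
CONF_KEY = "spark.sql.optimizer.canChangeCachedPlanOutputPartitioning"


@dataclass(frozen=True)
class CachedPlanAqeBehavior:
    """What happens to a cached plan for a given value of the flag."""
    aqe_enabled: bool
    coalesce_final_stage: bool


def current_semantics(can_change: bool) -> CachedPlanAqeBehavior:
    # Current behavior per the issue:
    #   true  -> AQE enabled, final-stage coalescing disabled
    #   false -> AQE disabled (so nothing can be coalesced either)
    if can_change:
        return CachedPlanAqeBehavior(aqe_enabled=True, coalesce_final_stage=False)
    return CachedPlanAqeBehavior(aqe_enabled=False, coalesce_final_stage=False)


def proposed_semantics(can_change: bool) -> CachedPlanAqeBehavior:
    # Proposed behavior per the issue: AQE is always enabled, and the flag
    # only decides whether the final stage may be coalesced.
    return CachedPlanAqeBehavior(aqe_enabled=True, coalesce_final_stage=can_change)


# Defaults as described in the issue: currently true, proposed to flip to false.
CURRENT_DEFAULT = True
PROPOSED_DEFAULT = False

if __name__ == "__main__":
    print(CONF_KEY)
    for value in (True, False):
        print(value, current_semantics(value), proposed_semantics(value))
```

Under this model, `current_semantics(CURRENT_DEFAULT)` and `proposed_semantics(PROPOSED_DEFAULT)` describe the same behavior, which is why the issue suggests flipping the default to `false` to keep out-of-the-box behavior unchanged.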