[
https://issues.apache.org/jira/browse/SPARK-50257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
guihuawen updated SPARK-50257:
------------------------------
Description:
[SQL]
{code:sql}
SELECT
/*+ SHUFFLE_MERGE(b) */
s_date,
sum(s_quantity * i_price) AS total_sales
FROM
sales a
JOIN items b ON s_item_id = i_item_id
WHERE
i_price < 10
GROUP BY
s_date with rollup;
{code}
Set spark.sql.shuffle.partitions=1000.
Plan after AQE:
!截屏2024-11-07 13.52.45.png|width=444,height=431!
AQE coalesces the shuffle partitions read by the stage containing ExpandExec
down to 71, reducing parallelism. ExpandExec can expand the data (for ROLLUP it
emits one output row per grouping set for each input row), so the lower
parallelism can result in longer task execution times.
Turning AQE off entirely would also give up AQE optimizations in the other
stages. The proposal is instead to skip partition coalescing only for a stage
that contains ExpandExec.
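The proposed behavior can be illustrated with a small toy model. This is only a
sketch under stated assumptions, not Spark's CoalesceShufflePartitions rule:
PlanNode, contains, coalesce and plan_partitions are hypothetical names, the
partition sizes are made up, and the greedy merge is a simplification of what
AQE actually does.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    """Hypothetical stand-in for a physical plan operator."""
    name: str
    children: List["PlanNode"] = field(default_factory=list)


def contains(node: PlanNode, name: str) -> bool:
    """Return True if `node` or any descendant has the given operator name."""
    return node.name == name or any(contains(c, name) for c in node.children)


def coalesce(sizes: List[int], target: int) -> List[int]:
    """Greedily merge adjacent partitions so each group stays within `target`."""
    groups, current = [], 0
    for size in sizes:
        if current > 0 and current + size > target:
            groups.append(current)
            current = 0
        current += size
    if current > 0:
        groups.append(current)
    return groups


def plan_partitions(stage: PlanNode, sizes: List[int], target: int) -> List[int]:
    """Skip coalescing when the stage contains ExpandExec (the proposal)."""
    if contains(stage, "ExpandExec"):
        return list(sizes)
    return coalesce(sizes, target)


# Made-up input: 1000 shuffle partitions of 10 units each, target size 64.
sizes = [10] * 1000
expand_stage = PlanNode("HashAggregateExec", [
    PlanNode("ExpandExec", [PlanNode("SortMergeJoinExec")]),
])
plain_stage = PlanNode("HashAggregateExec", [PlanNode("SortMergeJoinExec")])

print(len(plan_partitions(plain_stage, sizes, 64)))   # coalesced
print(len(plan_partitions(expand_stage, sizes, 64)))  # left at 1000
```

In this model only the stage with ExpandExec keeps its original partition
count; every other stage is still coalesced, which is the difference from
disabling AQE as a whole.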
was:
[SQL]
{code:sql}
SELECT
/*+ SHUFFLE_MERGE(b) */
s_date,
sum(s_quantity * i_price) AS total_sales
FROM
sales a
JOIN items b ON s_item_id = i_item_id
WHERE
i_price < 10
GROUP BY
s_date with rollup;
{code}
Set spark.sql.shuffle.partitions=1000
After AQE:
!截屏2024-11-07 13.52.45.png!
> [Core]If ExpandExec is included, the CoalesceShufflePartitions rule will not
> be adjusted during the AQE phase
> -------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-50257
> URL: https://issues.apache.org/jira/browse/SPARK-50257
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 4.0.0
> Reporter: guihuawen
> Priority: Major
> Fix For: 4.0.0
>
> Attachments: 截屏2024-11-07 13.52.45.png
>
>
> [SQL]
> {code:sql}
> SELECT
> /*+ SHUFFLE_MERGE(b) */
> s_date,
> sum(s_quantity * i_price) AS total_sales
> FROM
> sales a
> JOIN items b ON s_item_id = i_item_id
> WHERE
> i_price < 10
> GROUP BY
> s_date with rollup;
> {code}
> Set spark.sql.shuffle.partitions=1000.
> Plan after AQE:
> !截屏2024-11-07 13.52.45.png|width=444,height=431!
> AQE coalesces the shuffle partitions read by the stage containing ExpandExec
> down to 71, reducing parallelism. ExpandExec can expand the data (for ROLLUP
> it emits one output row per grouping set for each input row), so the lower
> parallelism can result in longer task execution times.
> Turning AQE off entirely would also give up AQE optimizations in the other
> stages. The proposal is instead to skip partition coalescing only for a stage
> that contains ExpandExec.
>