[ 
https://issues.apache.org/jira/browse/SPARK-38853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520345#comment-17520345
 ] 

Yuming Wang commented on SPARK-38853:
-------------------------------------

Config:
{noformat}
spark.master                                    yarn
spark.driver.maxResultSize                      4g
spark.driver.memory                             20g
spark.executor.cores                            5
spark.executor.instances                        200
spark.executor.memory                           15g

spark.sql.adaptive.coalescePartitions.initialPartitionNum  10000
spark.sql.adaptive.coalescePartitions.minPartitionNum      200
spark.sql.adaptive.advisoryPartitionSizeInBytes            100m
{noformat}

> optimizeSkewsInRebalancePartitions has performance issue
> --------------------------------------------------------
>
>                 Key: SPARK-38853
>                 URL: https://issues.apache.org/jira/browse/SPARK-38853
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Yuming Wang
>            Priority: Major
>         Attachments: Disable.png, enable.png
>
>
> How to reproduce this issue:
> {code:sql}
> CREATE TABLE t USING PARQUET
> AS
> SELECT
>     /*+ REBALANCE */
>       A.SESSION_START_DT
>       , COALESCE(A.SITE_ID,0) AS SITE_ID
>       , A.GUID
>       , COALESCE(CAST(A.SESSION_SKEY AS BIGINT),0) AS SESSION_SKEY
>       , COALESCE(CAST(A.SEQNUM AS INT),0) AS SEQNUM
>       
>       , COALESCE(A.IMP_PAGE_ID,0) AS IMP_PAGE_ID
>       , COALESCE(A.PLACEMENT_ID,0) AS PLACEMENT_ID
>       , A.PRODUCT_LINE_CODE
>       , A.ALGORITHM_ID
>       , A.MEID
>       , A.ALGO_OUTPUT_ITEMS
>       , A.CLICKS
>       , A.GMV_7D
> FROM big_partition_table A
> WHERE
>       DT BETWEEN DATE_FORMAT(DATE_SUB(CURRENT_DATE,11), 'yyyyMMdd')
>           AND DATE_FORMAT(DATE_ADD(DATE_SUB(CURRENT_DATE,11),0), 'yyyyMMdd')
>       AND TO_DATE(from_unixtime(unix_timestamp(A.SESSION_START_DT, 'yyyy/MM/dd'))) = DATE_SUB(CURRENT_DATE,11)
>       AND ICFBOT = '00';
> {code}
> With optimizeSkewsInRebalancePartitions enabled, the query takes more than
> 2 hours and the driver hangs:
>  !enable.png! 
> With optimizeSkewsInRebalancePartitions disabled, the same query takes only 29 minutes:
>  !Disable.png! 
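>
> To compare the two runs in one session, the rule can be toggled before the
> CTAS. This is a minimal sketch, not taken from the original report: it
> assumes the setting meant here is the Spark SQL conf
> spark.sql.adaptive.optimizeSkewsInRebalancePartitions.enabled and that
> adaptive query execution is switched on.
> {code:sql}
> -- Assumption: AQE must be on for the rebalance skew optimization to apply.
> SET spark.sql.adaptive.enabled=true;
> -- Assumption: this is the flag referred to as optimizeSkewsInRebalancePartitions.
> -- Use false for the "disabled" run and true for the "enabled" run.
> SET spark.sql.adaptive.optimizeSkewsInRebalancePartitions.enabled=false;
> -- Then run the CREATE TABLE t ... /*+ REBALANCE */ statement above.
> {code}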



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
