[ https://issues.apache.org/jira/browse/SPARK-38853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520345#comment-17520345 ]
Yuming Wang commented on SPARK-38853:
-------------------------------------

Config:
{noformat}
spark.master yarn
spark.driver.maxResultSize 4g
spark.driver.memory 20g
spark.executor.cores 5
spark.executor.instances 200
spark.executor.memory 15g
spark.sql.adaptive.coalescePartitions.initialPartitionNum 10000
spark.sql.adaptive.coalescePartitions.minPartitionNum 200
spark.sql.adaptive.advisoryPartitionSizeInBytes 100m
{noformat}

> optimizeSkewsInRebalancePartitions has performance issue
> --------------------------------------------------------
>
>                 Key: SPARK-38853
>                 URL: https://issues.apache.org/jira/browse/SPARK-38853
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Yuming Wang
>            Priority: Major
>         Attachments: Disable.png, enable.png
>
>
> How to reproduce this issue:
> {code:sql}
> CREATE TABLE t USING PARQUET
> AS
> SELECT
> /*+ REBALANCE */
> A.SESSION_START_DT
> , COALESCE(A.SITE_ID,0) AS SITE_ID
> , A.GUID
> , COALESCE(CAST(A.SESSION_SKEY AS BIGINT),0) AS SESSION_SKEY
> , COALESCE(CAST(A.SEQNUM AS INT),0) AS SEQNUM
>
> , COALESCE(A.IMP_PAGE_ID,0) AS IMP_PAGE_ID
> , COALESCE(A.PLACEMENT_ID,0) AS PLACEMENT_ID
> , A.PRODUCT_LINE_CODE
> , A.ALGORITHM_ID
> , A.MEID
> , A.ALGO_OUTPUT_ITEMS
> , A.CLICKS
> , A.GMV_7D
> FROM big_partition_table A
> WHERE
> DT BETWEEN DATE_FORMAT(DATE_SUB(CURRENT_DATE,11), 'yyyyMMdd') AND
> DATE_FORMAT(DATE_ADD(DATE_SUB(CURRENT_DATE,11),0), 'yyyyMMdd')
> AND TO_DATE(from_unixtime(unix_timestamp(A.SESSION_START_DT,
> 'yyyy/MM/dd'))) = DATE_SUB(CURRENT_DATE,11)
> AND ICFBOT = '00';
> {code}
> Enabling optimizeSkewsInRebalancePartitions takes more than 2 hours and the
> driver hangs:
> !enable.png!
> Disabling optimizeSkewsInRebalancePartitions takes only 29 minutes:
> !Disable.png!
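
For anyone comparing the two runs, the rule can be toggled per session before rerunning the statement from the issue description. This is only a sketch: it assumes the flag meant by "optimizeSkewsInRebalancePartitions" is spark.sql.adaptive.optimizeSkewsInRebalancePartitions.enabled (the issue does not spell out the full key) and that adaptive query execution is turned on.
{code:sql}
-- Assumption: AQE must be on for the rebalance skew rule to apply at all.
SET spark.sql.adaptive.enabled=true;

-- Assumption: this is the full key behind "optimizeSkewsInRebalancePartitions".
-- false = the "Disable" run; set to true for the "enable" run.
SET spark.sql.adaptive.optimizeSkewsInRebalancePartitions.enabled=false;

-- Then rerun the CREATE TABLE t ... /*+ REBALANCE */ statement from the issue.
{code}
Setting it with SET keeps the rest of the config listed in this comment unchanged between the two runs, so any difference in runtime should come from the rule itself.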