[ https://issues.apache.org/jira/browse/SPARK-32184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yuan Zhou updated SPARK-32184:
------------------------------
    Description:
Hi Spark developers,

While testing the new Spark 3.0.0 release, I found a performance regression on TPCH Q18. Spark 2.4 appears to reuse the HashAgg results in the two SMJ (sort-merge join) operators, while Spark 3.0.0 computes these results twice.

Here's the SQL diagram for 2.4:
!2.4.png!

Here's the diagram for 3.0:
!3.0.png!

  was:
Hi Spark developers,

While testing the new Spark 3.0.0 release, I found a performance regression on TPCH Q18. Spark 2.4 appears to reuse the HashAgg results in the two SMJ (sort-merge join) operators, while Spark 3.0.0 computes these results twice.

> Performance regression on TPCH Q18
> ----------------------------------
>
>                 Key: SPARK-32184
>                 URL: https://issues.apache.org/jira/browse/SPARK-32184
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>         Environment: Spark 2.4 and Spark 3.0 use the same configuration:
> * spark.driver.memory 20g
> * spark.executor.memory 20g
> * spark.executor.cores 7
> * spark.executor.memoryOverhead 3g
> * spark.sql.shuffle.partitions 384
>            Reporter: Yuan Zhou
>            Priority: Major
>         Attachments: 2.4.png, 3.0.png
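
A rough way to check the reuse behaviour described above without the diagrams is to count operators in the physical-plan text (for example, the text printed by `df.explain()` or `EXPLAIN`) on both versions. The sketch below is only an illustration: `count_operators`, `OPERATORS`, and `SAMPLE_PLAN` are hypothetical names, and the sample plan is invented for this example, not taken from the attached 2.4 or 3.0 plans.

```python
import re
from collections import Counter

# Operator names to look for in plan text. This list is an assumption chosen
# for this issue (HashAgg, SMJ, and exchange reuse); extend it as needed.
OPERATORS = ("HashAggregate", "SortMergeJoin", "ReusedExchange", "Exchange")

# Invented plan fragment for illustration only; NOT output from SPARK-32184.
SAMPLE_PLAN = """\
*(6) SortMergeJoin [o_orderkey#10], [l_orderkey#20], LeftSemi
:- *(3) Sort [o_orderkey#10 ASC NULLS FIRST], false, 0
:  +- Exchange hashpartitioning(o_orderkey#10, 384)
:     +- *(2) HashAggregate(keys=[l_orderkey#20], functions=[sum(l_quantity#24)])
+- *(5) Sort [l_orderkey#20 ASC NULLS FIRST], false, 0
   +- ReusedExchange [l_orderkey#20], Exchange hashpartitioning(l_orderkey#20, 384)
"""


def count_operators(plan_text: str) -> Counter:
    """Count plan nodes by the operator name that starts each plan line."""
    counts = Counter()
    for line in plan_text.splitlines():
        # Drop whole-stage-codegen markers such as "*(3)" before matching.
        cleaned = re.sub(r"\*\(\d+\)", "", line)
        # The first alphabetic word on the line is the node's operator name,
        # so "ReusedExchange ..., Exchange ..." counts only as ReusedExchange.
        match = re.search(r"[A-Za-z]+", cleaned)
        if match and match.group(0) in OPERATORS:
            counts[match.group(0)] += 1
    return counts


counts = count_operators(SAMPLE_PLAN)
print("HashAggregate nodes: ", counts["HashAggregate"])
print("ReusedExchange nodes:", counts["ReusedExchange"])
```

If the regression is what it looks like, the 3.0 plan for Q18 would be expected to show more `HashAggregate` nodes and fewer (or no) `ReusedExchange` nodes than the 2.4 plan for the same query.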