[jira] [Created] (SPARK-32184) Performance regression on TPCH Q18

Yuan Zhou (Jira) Mon, 06 Jul 2020 00:05:07 -0700

Yuan Zhou created SPARK-32184:
---------------------------------

             Summary: Performance regression on TPCH Q18
                 Key: SPARK-32184
                 URL: https://issues.apache.org/jira/browse/SPARK-32184
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.0.0
         Environment: Spark 2.4 and Spark 3.0 are using the same configuration:
 * spark.driver.memory 20g
 * spark.executor.memory 20g
 * spark.executor.cores 7
 * spark.executor.memoryOverhead 3g
 * spark.sql.shuffle.partitions 384
            Reporter: Yuan Zhou



Hi Spark developers, 

While testing the new Spark 3.0.0, I found a performance regression on 
TPCH Q18. Spark 2.4 seems to "reuse" the HashAgg results in the two SMJs 
(sort-merge joins), while Spark 3.0.0 needs to calculate these results twice. 
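
One way to compare the two versions is to count the relevant operators in the physical plan text of each. The following is only a minimal sketch: it assumes the plan text has already been captured (for example from EXPLAIN or df.explain()) on both Spark 2.4 and Spark 3.0.0, and count_operators is a hypothetical helper written for illustration, not a Spark API. The operator names it looks for (ReusedExchange, Exchange, HashAggregate, SortMergeJoin) are the ones Spark prints in physical plans.

```python
import re

# Operator names as they appear at the start of a node in Spark's
# physical plan text. Which ones to track is an assumption of this sketch.
OPERATORS = ("ReusedExchange", "Exchange", "HashAggregate", "SortMergeJoin")

# Skip the tree-drawing prefix (":", "+-", "*(3)", spaces) and capture the
# first word on the line, which is the operator name of that plan node.
_NODE_RE = re.compile(r"^[\s:+\-|*()\d]*([A-Za-z]+)")


def count_operators(plan_text):
    """Count how many plan nodes of each tracked operator appear.

    Only the leading operator of each line is counted, so the "Exchange"
    mentioned inside a "ReusedExchange [...], Exchange ..." line is not
    counted a second time.
    """
    counts = {name: 0 for name in OPERATORS}
    for line in plan_text.splitlines():
        match = _NODE_RE.match(line)
        if match and match.group(1) in counts:
            counts[match.group(1)] += 1
    return counts
```

If the Spark 2.4 plan reports at least one ReusedExchange and the Spark 3.0.0 plan reports none while showing an extra HashAggregate and Exchange pair, that would be consistent with the aggregation being computed twice.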



