[jira] [Updated] (SPARK-32184) Performance regression on TPCH Q18

Yuan Zhou (Jira) Mon, 06 Jul 2020 00:08:06 -0700


     [ 
https://issues.apache.org/jira/browse/SPARK-32184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Yuan Zhou updated SPARK-32184:
------------------------------
    Description: 
Hi Spark developers,

While testing the new Spark 3.0.0, I found a performance regression on 
TPCH Q18. Spark 2.4 seems to be able to "reuse" the HashAgg results in the 
two SMJs, while Spark 3.0.0 needs to calculate these results twice.

Here's the SQL diagram for 2.4 !2.4.png!

Here's the diagram for 3.0

!3.0.png!

 

  was:
Hi Spark developers, 

While testing the new Spark 3.0.0, I found a performance regression on 
TPCH Q18. Spark 2.4 seems to be able to "reuse" the HashAgg results in the 
two SMJs, while Spark 3.0.0 needs to calculate these results twice. 


> Performance regression on TPCH Q18
> ----------------------------------
>
>                 Key: SPARK-32184
>                 URL: https://issues.apache.org/jira/browse/SPARK-32184
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>         Environment: Spark 2.4 and Spark 3.0 use the same configuration:
>  * spark.driver.memory 20g
>  * spark.executor.memory 20g
>  * spark.executor.cores 7
>  * spark.executor.memoryOverhead 3g
>  * spark.sql.shuffle.partitions 384
>            Reporter: Yuan Zhou
>            Priority: Major
>         Attachments: 2.4.png, 3.0.png
>
>
> Hi Spark developers,
> While testing the new Spark 3.0.0, I found a performance regression on 
> TPCH Q18. Spark 2.4 seems to be able to "reuse" the HashAgg results in 
> the two SMJs, while Spark 3.0.0 needs to calculate these results twice.
> Here's the SQL diagram for 2.4 !2.4.png!
> Here's the diagram for 3.0
> !3.0.png!
>  
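
The reuse described in the report can usually be checked from the text of the physical plan as well as from the diagrams: when an aggregate's shuffle output is shared, the plan typically contains a ReusedExchange node instead of a second copy of the HashAggregate subtree. A minimal sketch of such a check follows. The function name count_plan_nodes and both sample plans are hypothetical, heavily abbreviated illustrations and are not the actual Q18 plans from this issue.

```python
import re

# Hypothetical, abbreviated plan text in which the second join input
# reads the shuffle output of the first aggregate (reuse).
WITH_REUSE = """\
SortMergeJoin [k], [k]
:- Sort [k ASC]
:  +- HashAggregate(keys=[k], functions=[sum(v)])
:     +- Exchange hashpartitioning(k, 384)
:        +- HashAggregate(keys=[k], functions=[partial_sum(v)])
+- Sort [k ASC]
   +- HashAggregate(keys=[k], functions=[sum(v)])
      +- ReusedExchange [k, sum], Exchange hashpartitioning(k, 384)
"""

# Hypothetical, abbreviated plan text in which the aggregate subtree
# appears twice (no reuse).
WITHOUT_REUSE = """\
SortMergeJoin [k], [k]
:- Sort [k ASC]
:  +- HashAggregate(keys=[k], functions=[sum(v)])
:     +- Exchange hashpartitioning(k, 384)
:        +- HashAggregate(keys=[k], functions=[partial_sum(v)])
+- Sort [k ASC]
   +- HashAggregate(keys=[k], functions=[sum(v)])
      +- Exchange hashpartitioning(k, 384)
         +- HashAggregate(keys=[k], functions=[partial_sum(v)])
"""


def count_plan_nodes(plan_text,
                     node_names=("HashAggregate", "SortMergeJoin",
                                 "ReusedExchange")):
    """Count the plan lines that mention each operator name as a whole word."""
    counts = {name: 0 for name in node_names}
    for line in plan_text.splitlines():
        for name in node_names:
            # Whole-word match, so "Exchange" would not match "ReusedExchange".
            if re.search(r"\b" + re.escape(name) + r"\b", line):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    for label, plan in (("with reuse", WITH_REUSE),
                        ("without reuse", WITHOUT_REUSE)):
        print(label, count_plan_nodes(plan))
```

Running the same count over the plan text captured from each Spark version (for example, the output of EXPLAIN for the Q18 query) gives a quick way to compare them: a ReusedExchange count of zero together with a higher HashAggregate count would be consistent with the recomputation described above.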



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
