[ 
https://issues.apache.org/jira/browse/SPARK-3280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114873#comment-14114873
 ] 

Burak Yavuz commented on SPARK-3280:
------------------------------------

I don't have a comparison as detailed as Josh's, but for MLlib algorithms, 
sort-based shuffle didn't show the performance gains that Josh reported. These 
experiments used 16 m3.2xlarge instances. The difference in my setup is that I 
used 128 partitions, far fewer than in Josh's experiments.

!hash-sort-comp.png!

> Made sort-based shuffle the default implementation
> --------------------------------------------------
>
>                 Key: SPARK-3280
>                 URL: https://issues.apache.org/jira/browse/SPARK-3280
>             Project: Spark
>          Issue Type: Improvement
>            Reporter: Reynold Xin
>            Assignee: Reynold Xin
>         Attachments: hash-sort-comp.png
>
>
> sort-based shuffle has lower memory usage and seems to outperform hash-based 
> in almost all of our testing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

