[ https://issues.apache.org/jira/browse/SPARK-3280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114873#comment-14114873 ]
Burak Yavuz commented on SPARK-3280:
------------------------------------

I don't have as detailed a comparison as Josh has, but for MLlib algorithms, sort-based shuffle didn't show the performance boosts Josh has shown.

16 m3.2xlarge instances were used for these experiments. The difference here is that I used 128 partitions, far fewer than the number of partitions Josh used.

!hash-sort-comp.png!

> Made sort-based shuffle the default implementation
> --------------------------------------------------
>
>                 Key: SPARK-3280
>                 URL: https://issues.apache.org/jira/browse/SPARK-3280
>             Project: Spark
>          Issue Type: Improvement
>            Reporter: Reynold Xin
>            Assignee: Reynold Xin
>         Attachments: hash-sort-comp.png
>
>
> sort-based shuffle has lower memory usage and seems to outperform hash-based
> in almost all of our testing.