[ https://issues.apache.org/jira/browse/SPARK-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131158#comment-14131158 ]
Patrick Wendell commented on SPARK-2045:
----------------------------------------

Yes - that's correct.

> Sort-based shuffle implementation
> ---------------------------------
>
>                 Key: SPARK-2045
>                 URL: https://issues.apache.org/jira/browse/SPARK-2045
>             Project: Spark
>          Issue Type: New Feature
>          Components: Shuffle, Spark Core
>            Reporter: Matei Zaharia
>            Assignee: Matei Zaharia
>             Fix For: 1.1.0
>
>         Attachments: Sort-basedshuffledesign.pdf
>
>
> Building on the pluggability in SPARK-2044, a sort-based shuffle
> implementation that takes advantage of an Ordering for keys (or just sorts by
> hashcode for keys that don't have it) would likely improve performance and
> memory usage in very large shuffles. Our current hash-based shuffle needs an
> open file for each reduce task, which can fill up a lot of memory for
> compression buffers and cause inefficient IO. This would avoid both of those
> issues.
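The idea in the description (sort map output by key, falling back to the key's hash code when there is no Ordering, so each map task writes one output instead of one open file per reduce task) can be illustrated with a toy, single-process Python model. This is a sketch under stated assumptions, not Spark's implementation: the helpers `sort_shuffle_write` and `read_partition` are hypothetical, partitioning uses Python's built-in `hash` modulo the number of reducers, and an in-memory list plus an offset index stands in for the single map output file.

```python
from typing import Any, Callable, Hashable, List, Optional, Tuple

Record = Tuple[Hashable, Any]


def sort_shuffle_write(
    records: List[Record],
    num_partitions: int,
    key_ordering: Optional[Callable[[Hashable], Any]] = None,
) -> Tuple[List[Record], List[int]]:
    """Toy map-side sort-based shuffle (hypothetical helper, not Spark's API).

    Each record is assigned a reduce partition, then all records are sorted
    by (partition id, key) into ONE output sequence. An index of partition
    start offsets lets reducer r read only its slice, so the map task never
    holds one open file (and compression buffer) per reduce task.
    """
    def partition_of(key: Hashable) -> int:
        # Assumption: plain hash partitioning, as a hash-based shuffle also uses.
        return hash(key) % num_partitions

    def sort_key(record: Record) -> Tuple[int, Any]:
        key = record[0]
        # Use the key ordering when one is supplied; otherwise sort by hash code.
        # Equal keys share a hash code, so they end up adjacent unless distinct
        # keys collide on the same hash.
        secondary = key_ordering(key) if key_ordering else hash(key)
        return (partition_of(key), secondary)

    data = sorted(records, key=sort_key)

    # offsets[r] is where partition r starts; offsets[num_partitions] == len(data).
    offsets = [0] * (num_partitions + 1)
    for key, _ in data:
        offsets[partition_of(key) + 1] += 1
    for r in range(num_partitions):
        offsets[r + 1] += offsets[r]
    return data, offsets


def read_partition(data: List[Record], offsets: List[int], r: int) -> List[Record]:
    """What reduce task r fetches: one contiguous slice of the single output."""
    return data[offsets[r]:offsets[r + 1]]


if __name__ == "__main__":
    records = [(5, "e"), (2, "b"), (4, "d"), (1, "a"), (3, "c"), (6, "f")]
    data, offsets = sort_shuffle_write(records, num_partitions=2)
    print(offsets)                              # [0, 3, 6]
    print(read_partition(data, offsets, 0))     # [(2, 'b'), (4, 'd'), (6, 'f')]
    print(read_partition(data, offsets, 1))     # [(1, 'a'), (3, 'c'), (5, 'e')]
```

In this model the number of outputs a map task writes stays at one regardless of how many reducers there are; the cost moves into the sort and the small offset index.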