[ https://issues.apache.org/jira/browse/SPARK-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matei Zaharia updated SPARK-2045:
---------------------------------
    Attachment:     (was: Sort-basedshuffledesign.pdf)

> Sort-based shuffle implementation
> ---------------------------------
>
>                 Key: SPARK-2045
>                 URL: https://issues.apache.org/jira/browse/SPARK-2045
>             Project: Spark
>          Issue Type: New Feature
>            Reporter: Matei Zaharia
>
> Building on the pluggability in SPARK-2044, a sort-based shuffle
> implementation that takes advantage of an Ordering for keys (or just sorts by
> hashcode for keys that don't have it) would likely improve performance and
> memory usage in very large shuffles. Our current hash-based shuffle needs an
> open file for each reduce task, which can fill up a lot of memory for
> compression buffers and cause inefficient IO. This would avoid both of those
> issues.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
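
To make the idea in the issue description concrete, here is a minimal sketch of a sort-based shuffle write, in Python rather than Spark's own Scala. It is not Spark's implementation. The function names (sort_shuffle_write, read_partition), the key_order parameter, and the use of in-memory lists in place of a single on-disk output file plus index are all assumptions made for illustration. Each map task sorts its records by reduce partition and then by key (falling back to the key's hash code when no ordering is given), so it produces one contiguous output with an offset per partition instead of one open file per reduce task.

```python
from typing import Any, Callable, Hashable, Iterable, List, Optional, Tuple

Record = Tuple[Hashable, Any]


def sort_shuffle_write(
    records: Iterable[Record],
    num_partitions: int,
    key_order: Optional[Callable[[Any], Any]] = None,
) -> Tuple[List[Record], List[int]]:
    """Hypothetical map-side write: one sorted output plus partition offsets."""
    # With no ordering supplied, fall back to sorting by the key's hash code.
    order = key_order if key_order is not None else hash

    def partition_of(key: Hashable) -> int:
        # Simple hash partitioning (an assumption; any partitioner would do).
        return hash(key) % num_partitions

    # A single sort by (partition id, key order) groups each reduce task's
    # records into one contiguous run of the output.
    data = sorted(records, key=lambda kv: (partition_of(kv[0]), order(kv[0])))

    # offsets[p]:offsets[p + 1] is the slice belonging to partition p.
    offsets = [0] * (num_partitions + 1)
    for key, _ in data:
        offsets[partition_of(key) + 1] += 1
    for p in range(num_partitions):
        offsets[p + 1] += offsets[p]
    return data, offsets


def read_partition(data: List[Record], offsets: List[int], p: int) -> List[Record]:
    """Hypothetical reduce-side read: fetch only partition p's slice."""
    return data[offsets[p]:offsets[p + 1]]
```

A reduce task would then call read_partition(data, offsets, p) to fetch only its own slice. In this sketch the memory held open per map task is one output and one offsets list, however many reduce tasks there are.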