[jira] [Commented] (SPARK-2045) Sort-based shuffle implementation

Patrick Wendell (JIRA) Thu, 11 Sep 2014 23:21:12 -0700

    [ https://issues.apache.org/jira/browse/SPARK-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131158#comment-14131158 ]


Patrick Wendell commented on SPARK-2045:
----------------------------------------

Yes - that's correct.

> Sort-based shuffle implementation
> ---------------------------------
>
>                 Key: SPARK-2045
>                 URL: https://issues.apache.org/jira/browse/SPARK-2045
>             Project: Spark
>          Issue Type: New Feature
>          Components: Shuffle, Spark Core
>            Reporter: Matei Zaharia
>            Assignee: Matei Zaharia
>             Fix For: 1.1.0
>
>         Attachments: Sort-basedshuffledesign.pdf
>
>
> Building on the pluggability in SPARK-2044, a sort-based shuffle 
> implementation that takes advantage of an Ordering for keys (or just sorts by 
> hashcode for keys that don't have it) would likely improve performance and 
> memory usage in very large shuffles. Our current hash-based shuffle needs an 
> open file for each reduce task, which can fill up a lot of memory for 
> compression buffers and cause inefficient IO. This would avoid both of those 
> issues.
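To make the quoted idea concrete, here is a minimal sketch of a map task's output under a sort-based layout. It is not Spark's code. The function names (sort_based_shuffle_write, read_partition) are hypothetical, hash partitioning is assumed, and a Python list plus an offset index stands in for the single data file and its index. Records are sorted by reduce partition and then by the key's Ordering when one is supplied, falling back to the key's hash code otherwise. A reducer then reads a contiguous range of one output instead of the map task holding one open file per reduce task.

```python
from typing import Any, Callable, Hashable, List, Optional, Tuple

Record = Tuple[Hashable, Any]


def sort_based_shuffle_write(
    records: List[Record],
    num_partitions: int,
    key_ordering: Optional[Callable[[Hashable], Any]] = None,
) -> Tuple[List[Record], List[int]]:
    """Hypothetical sketch: lay out one map task's output as a single
    sorted buffer plus an offset index (partition p occupies
    data[offsets[p]:offsets[p + 1]])."""

    def partition_of(key: Hashable) -> int:
        # Assumed hash partitioning: reduce partition = hash(key) mod N.
        return hash(key) % num_partitions

    def sort_key(record: Record):
        key = record[0]
        # Sort by partition first. Within a partition, use the key ordering
        # if one is given, otherwise fall back to the key's hash code.
        secondary = key_ordering(key) if key_ordering is not None else hash(key)
        return (partition_of(key), secondary)

    # One sorted sequence replaces one open output per reduce partition.
    data = sorted(records, key=sort_key)

    # Build the index: count records per partition, then prefix-sum.
    counts = [0] * num_partitions
    for key, _ in data:
        counts[partition_of(key)] += 1
    offsets = [0]
    for c in counts:
        offsets.append(offsets[-1] + c)
    return data, offsets


def read_partition(data: List[Record], offsets: List[int], p: int) -> List[Record]:
    # A reducer for partition p reads only its contiguous slice.
    return data[offsets[p]:offsets[p + 1]]
```

Usage: with integer keys 5, 2, 7, 4, 1 and three partitions, partition 1 gets keys 1, 4, 7 and partition 2 gets keys 2, 5, each slice already sorted by key.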



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
