[jira] [Commented] (SPARK-3655) Secondary sort

Matei Zaharia (JIRA) Mon, 20 Oct 2014 18:56:14 -0700

    [ https://issues.apache.org/jira/browse/SPARK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177824#comment-14177824 ]


Matei Zaharia commented on SPARK-3655:
--------------------------------------

I believe you can build this on top of sortByKey with mapPartitions. All the 
values for a given key are guaranteed to end up in the same partition (though 
we should document that). Or are you looking to partition the keys by one 
function and have the values sorted by another? In that case, we added this 
weird repartitionAndSortWithinPartitions function to OrderedRDDFunctions that 
would do the trick (it was added to make it easier to port apps from MapReduce).
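
A minimal sketch of the second approach, for illustration only: it assumes a 
composite (primary, secondary) key, so that partitioning uses the primary part 
while sorting uses the whole tuple. PrimaryKeyPartitioner, SecondarySortSketch, 
the sample records and the local[2] master are hypothetical names and values 
chosen for the example; they are not part of Spark or of this issue.

```scala
import org.apache.spark.{HashPartitioner, Partitioner, SparkConf, SparkContext}
// Older 1.x releases may also need: import org.apache.spark.SparkContext._

// Hypothetical partitioner: routes a (primary, secondary) key by its primary
// part only, so every record for one primary key lands in the same partition.
class PrimaryKeyPartitioner(partitions: Int) extends Partitioner {
  private val delegate = new HashPartitioner(partitions)

  override def numPartitions: Int = partitions

  override def getPartition(key: Any): Int = key match {
    case (primary, _) => delegate.getPartition(primary)
    case other =>
      throw new IllegalArgumentException(s"Expected a composite key, got: $other")
  }
}

object SecondarySortSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("secondary-sort-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val records = sc.parallelize(Seq(("a", 3), ("b", 1), ("a", 1), ("b", 2), ("a", 2)))

      val sorted = records
        // Move the value into the key so the shuffle can sort on it.
        .map { case (k, v) => ((k, v), ()) }
        // Partition by primary key; sort each partition by the full tuple.
        .repartitionAndSortWithinPartitions(new PrimaryKeyPartitioner(2))

      // Print one line per partition; within a line, the records for each
      // primary key are contiguous and ordered by the secondary part.
      sorted.keys.glom().collect().foreach(p => println(p.mkString(", ")))
    } finally {
      sc.stop()
    }
  }
}
```

The sort here relies on the implicit tuple Ordering from the Scala standard 
library; a different value order would need a custom implicit Ordering in scope 
for the composite key.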

> Secondary sort
> --------------
>
>                 Key: SPARK-3655
>                 URL: https://issues.apache.org/jira/browse/SPARK-3655
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>    Affects Versions: 1.1.0
>            Reporter: koert kuipers
>            Priority: Minor
>
> Now that Spark has a sort-based shuffle, can we expect a secondary sort soon? 
> There are some use cases where getting a sorted iterator of values per key is 
> helpful.



