[ https://issues.apache.org/jira/browse/SPARK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977427#comment-14977427 ]
swetha k commented on SPARK-3655:
---------------------------------

[~koert] Does this use a custom partitioner to make sure that all the values for a given key are placed on a particular node? Right now my code does something like the following, and it seems to cause a lot of shuffling. I need to group by sessionId and then sort each group's (timestamp, value) tuples by timestamp. What is the appropriate method for that?

    def getGrpdAndSrtdRecs(rdd: RDD[(String, (Long, String))]): RDD[(String, List[(Long, String)])] = {
      val grpdRecs = rdd.groupByKey()
      val srtdRecs = grpdRecs.mapValues(iter => iter.toList.sortBy(_._1))
      srtdRecs
    }

> Support sorting of values in addition to keys (i.e. secondary sort)
> -------------------------------------------------------------------
>
>                 Key: SPARK-3655
>                 URL: https://issues.apache.org/jira/browse/SPARK-3655
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>    Affects Versions: 1.1.0, 1.2.0
>            Reporter: koert kuipers
>            Assignee: Koert Kuipers
>
> Now that Spark has a sort-based shuffle, can we expect a secondary sort soon?
> There are some use cases where getting a sorted iterator of values per key is
> helpful.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
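[Editor's note] One way to answer the question in the comment above, sketched in Scala. Since Spark 1.2, `repartitionAndSortWithinPartitions` (on `OrderedRDDFunctions`) can implement a secondary sort: move the timestamp into a composite key, but partition on the sessionId alone, so all records for a session land in one partition and arrive in timestamp order. Unlike `groupByKey`, this sorts during the shuffle instead of materializing each group as an in-memory list. The `SessionPartitioner` and `secondarySort` names below are illustrative, not part of the Spark API; the sketch assumes a running SparkContext.

```scala
import org.apache.spark.Partitioner
import org.apache.spark.rdd.RDD

// Partition only on the sessionId component of the composite key, so every
// record belonging to a session is routed to the same partition.
class SessionPartitioner(partitions: Int) extends Partitioner {
  require(partitions > 0)
  override def numPartitions: Int = partitions
  override def getPartition(key: Any): Int = key match {
    case (sessionId: String, _) =>
      math.abs(sessionId.hashCode % numPartitions)
  }
}

def secondarySort(rdd: RDD[(String, (Long, String))],
                  numPartitions: Int): RDD[((String, Long), String)] = {
  // Promote the timestamp into the key so the shuffle can sort on it;
  // the implicit Ordering on (String, Long) sorts by sessionId, then timestamp.
  val keyed = rdd.map { case (sessionId, (ts, value)) => ((sessionId, ts), value) }
  keyed.repartitionAndSortWithinPartitions(new SessionPartitioner(numPartitions))
}
```

Within each partition of the result, all records for a given sessionId are contiguous and already sorted by timestamp, so they can be consumed with `mapPartitions` without ever collecting a whole session into a `List`.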