[ https://issues.apache.org/jira/browse/SPARK-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119080#comment-14119080 ]
Sandy Ryza commented on SPARK-2978: ----------------------------------- What's the thinking behind adding sortWithinPartition? It shouldn't be difficult to add, but I can't think of a situation where it would be useful without a repartition before. > Provide an MR-style shuffle transformation > ------------------------------------------ > > Key: SPARK-2978 > URL: https://issues.apache.org/jira/browse/SPARK-2978 > Project: Spark > Issue Type: New Feature > Components: Spark Core > Reporter: Sandy Ryza > > For Hive on Spark joins in particular, and for running legacy MR code in > general, I think it would be useful to provide a transformation with the > semantics of the Hadoop MR shuffle, i.e. one that > * groups by key: provides (Key, Iterator[Value]) > * within each partition, provides keys in sorted order > A couple ways that could make sense to expose this: > * Add a new operator. "groupAndSortByKey", > "groupByKeyAndSortWithinPartition", "hadoopStyleShuffle", maybe? > * Allow groupByKey to take an ordering param for keys within a partition -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org