GitHub user koertkuipers reopened a pull request: https://github.com/apache/spark/pull/3632

    SPARK-3655 GroupByKeyAndSortValues

    See https://issues.apache.org/jira/browse/SPARK-3655

    This pull request is based on the approach that uses
    repartitionAndSortWithinPartitions, but only implements
    GroupByKeyAndSortValues and not foldLeft.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tresata/spark feat-group-by-key-and-sort-values

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3632.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3632

----
commit 7e3cde989ec93849d60988e6d9fae729ca0c46a4
Author: Koert Kuipers <ko...@tresata.com>
Date:   2014-12-07T20:16:53Z

    works but Iterables in signature are not right

commit 42075338a32c40e4b962b547dbc74aad89351207
Author: Koert Kuipers <ko...@tresata.com>
Date:   2014-12-07T21:57:25Z

    change groupByKeyAndSortValues to return RDD[(K, TraversableOnce[V]) instead of RDD[(K, Iterable[V]). i dont think the Iterable version can be implemented efficiently

commit 4f7defe86c514f3d153feaed804cf77f1d402f63
Author: Koert Kuipers <ko...@tresata.com>
Date:   2014-12-10T14:44:18Z

    change groupByKeyAndSortValues to return RDD[(K, Iterable[V]) where the values (the iterables) are in-memory arrays

----
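
For readers unfamiliar with the approach named in the pull request description, here is a minimal sketch of the idea: partition by key, sort within each partition, then group adjacent equal keys. It is a plain-Python simulation on local lists, not the Spark API and not the code in this pull request. The function name `group_by_key_and_sort_values` and the `num_partitions` parameter are hypothetical, chosen for illustration only. Values are materialized as in-memory lists, which loosely mirrors the third commit's choice of in-memory arrays.

```python
from itertools import groupby
from operator import itemgetter


def group_by_key_and_sort_values(pairs, num_partitions=2):
    """Group (key, value) pairs by key, with each key's values sorted.

    Hypothetical helper: simulates partition-then-sort on local lists.
    Keys must be hashable; keys and values must be orderable.
    """
    # Step 1: route every pair to a partition chosen by its key only,
    # so all values for a given key land in the same partition.
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[hash(key) % num_partitions].append((key, value))

    result = []
    for partition in partitions:
        # Step 2: sort within the partition by (key, value).
        partition.sort()
        # Step 3: equal keys are now adjacent and their values are in
        # order, so a single pass is enough to build the groups.
        for key, group in groupby(partition, key=itemgetter(0)):
            # Materialize the values as an in-memory list.
            result.append((key, [value for _, value in group]))
    return result
```

The order of keys in the returned list depends on how keys hash into partitions, so callers that need a lookup can wrap the result in `dict(...)`. Only the order of values within each key is guaranteed by this sketch.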