I think this Spark Package may be what you're looking for! http://spark-packages.org/package/tresata/spark-sorted
Best,
Burak

On Mon, May 4, 2015 at 12:56 PM, Imran Rashid <iras...@cloudera.com> wrote:

> oh wow, that is a really interesting observation, Marco & Jerry. I wonder
> if this is worth exposing in combineByKey()? I think Jerry's proposed
> workaround is all you can do for now -- use reflection to side-step the
> fact that the methods you need are private.
>
> On Mon, Apr 27, 2015 at 8:07 AM, Saisai Shao <sai.sai.s...@gmail.com> wrote:
>
>> Hi Marco,
>>
>> As far as I know, the current combineByKey() does not expose an argument
>> for setting keyOrdering on the underlying ShuffledRDD. Since ShuffledRDD
>> is package private, if you can obtain the ShuffledRDD through reflection
>> or some other way, the keyOrdering you set will be pushed down to the
>> shuffle. If you use a combination of transformations instead, the result
>> will be the same but the efficiency may differ: some transformations will
>> be separated into different stages, which introduces an additional
>> shuffle.
>>
>> Thanks
>> Jerry
>>
>> 2015-04-27 19:00 GMT+08:00 Marco <marcope...@gmail.com>:
>>
>>> Hi,
>>>
>>> After reducing by key, I'm trying to get data ordered across partitions
>>> (like RangePartitioner) and within partitions (like sortByKey or
>>> repartitionAndSortWithinPartitions), pushing the sorting down into the
>>> shuffle machinery of the reduce phase.
>>>
>>> I think, but maybe I'm wrong, that the correct way to do this would be
>>> for combineByKey to call setKeyOrdering on the ShuffledRDD it returns.
>>>
>>> Am I wrong? Can this be done with a combination of other transformations
>>> with the same efficiency?
>>>
>>> Thanks,
>>> Marco
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: user-h...@spark.apache.org
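A reflection-free variant of what the thread discusses can be sketched as follows: shuffle once with repartitionAndSortWithinPartitions (so keys arrive sorted within each partition), then combine adjacent equal keys in a single streaming pass per partition. This is a hedged sketch, not the combineByKey internals; the helper name combineSorted is illustrative and not a Spark API. The combining step itself is plain Scala, so it is shown runnable on an ordinary iterator, with the Spark wiring indicated in a comment.

```scala
object SortedCombine {
  // Combine values for runs of equal keys in an iterator that is ALREADY
  // sorted (or at least grouped) by key. Because equal keys are adjacent,
  // this needs no hash map: one accumulator at a time, constant memory per key.
  def combineSorted[K, V, C](
      iter: Iterator[(K, V)],
      createCombiner: V => C,
      mergeValue: (C, V) => C): Iterator[(K, C)] = new Iterator[(K, C)] {
    private val it = iter.buffered
    def hasNext: Boolean = it.hasNext
    def next(): (K, C) = {
      val (key, first) = it.next()
      var acc = createCombiner(first)
      // Fold in every consecutive pair that shares the same key.
      while (it.hasNext && it.head._1 == key) {
        acc = mergeValue(acc, it.next()._2)
      }
      (key, acc)
    }
  }

  // With Spark, the full pipeline Jerry alludes to would look roughly like:
  //   val part = new RangePartitioner(numPartitions, pairs)
  //   pairs.repartitionAndSortWithinPartitions(part)
  //        .mapPartitions(it => combineSorted(it, createCombiner, mergeValue),
  //                       preservesPartitioning = true)
  // This gives ordering across partitions (range partitioner) and within
  // partitions (sorted shuffle), at the cost of shuffling raw values rather
  // than map-side-combined ones.
}
```

Compared with combineByKey, this trades map-side combining for a sorted shuffle, which matches Jerry's caveat that the result is the same but the efficiency can differ.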