Shoot me an email if you need any help with spark-sorted. It does not (yet?) have a Java API, so you will have to work in Scala.
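For anyone landing on this thread later, a minimal sketch of what using spark-sorted might look like. The method name `groupSort` and its signature are assumptions based on the package's README at the time and may differ between versions:

```scala
import org.apache.spark.HashPartitioner
// Assumed import from the tresata/spark-sorted package; check the
// README for the exact path in your version.
import com.tresata.spark.sorted.PairRDDFunctions

val pairs = sc.parallelize(Seq(("a", 2), ("b", 1), ("a", 3)))

// groupSort pushes the value ordering down into the shuffle itself,
// so each key's values arrive already sorted instead of being sorted
// in a separate pass afterwards.
val grouped = pairs.groupSort(new HashPartitioner(2), Some(Ordering[Int]))
```

This is the same idea Marco asks about below (sorting inside the shuffle machinery), packaged as a library rather than requiring reflection into Spark internals.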
On Mon, May 4, 2015 at 4:05 PM, Burak Yavuz <brk...@gmail.com> wrote:
> I think this Spark Package may be what you're looking for!
> http://spark-packages.org/package/tresata/spark-sorted
>
> Best,
> Burak
>
> On Mon, May 4, 2015 at 12:56 PM, Imran Rashid <iras...@cloudera.com> wrote:
>
>> Oh wow, that is a really interesting observation, Marco & Jerry.
>> I wonder if this is worth exposing in combineByKey()? I think Jerry's
>> proposed workaround is all you can do for now -- use reflection to
>> side-step the fact that the methods you need are private.
>>
>> On Mon, Apr 27, 2015 at 8:07 AM, Saisai Shao <sai.sai.s...@gmail.com> wrote:
>>
>>> Hi Marco,
>>>
>>> As far as I know, the current combineByKey() does not expose an argument
>>> for setting keyOrdering on the underlying ShuffledRDD, since ShuffledRDD
>>> is package-private. If you can get at the ShuffledRDD through reflection
>>> or some other way, the keyOrdering you set will be pushed down to the
>>> shuffle. If you use a combination of transformations instead, the result
>>> will be the same but the efficiency may differ: some transformations
>>> split into separate stages, which introduces an additional shuffle.
>>>
>>> Thanks
>>> Jerry
>>>
>>> 2015-04-27 19:00 GMT+08:00 Marco <marcope...@gmail.com>:
>>>
>>>> Hi,
>>>>
>>>> I'm trying, after reducing by key, to get data ordered among partitions
>>>> (like RangePartitioner) and within partitions (like sortByKey or
>>>> repartitionAndSortWithinPartitions), pushing the sorting down into the
>>>> shuffle machinery of the reduce phase.
>>>>
>>>> I think, but maybe I'm wrong, that the correct way to do this is to have
>>>> combineByKey call setKeyOrdering on the ShuffledRDD it returns.
>>>>
>>>> Am I wrong? Can this be done with a combination of other transformations
>>>> with the same efficiency?
>>>>
>>>> Thanks,
>>>> Marco
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>> For additional commands, e-mail: user-h...@spark.apache.org
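For completeness, the "combination of other transformations" route Jerry mentions can be done with stock public APIs, at the cost of a second shuffle after the reduce. A sketch using only the standard RDD API (sortByKey uses a RangePartitioner internally, so the result is ordered both among and within partitions):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(
  new SparkConf().setAppName("reduce-then-sort").setMaster("local[2]"))

val pairs = sc.parallelize(Seq(("b", 1), ("a", 2), ("a", 3), ("c", 4)))

// First shuffle: reduceByKey combines values per key, but the output
// partitioning is hash-based and unsorted.
val reduced = pairs.reduceByKey(_ + _)

// Second shuffle: sortByKey range-partitions and sorts within each
// partition -- exactly the ordering Marco wants, but as an extra stage
// rather than pushed into the reduce-phase shuffle.
val sorted = reduced.sortByKey()
```

This is the efficiency trade-off described above: same result, one more shuffle than a hypothetical combineByKey that exposed setKeyOrdering.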