Re: ReduceByKey and sorting within partitions

2015-05-04 Thread Koert Kuipers
Shoot me an email if you need any help with spark-sorted. It does not (yet?) have a Java API, so you will have to work in Scala. On Mon, May 4, 2015 at 4:05 PM, Burak Yavuz brk...@gmail.com wrote: I think this Spark Package may be what you're looking for!

Re: ReduceByKey and sorting within partitions

2015-05-04 Thread Imran Rashid
Oh wow, that is a really interesting observation, Marco and Jerry. I wonder if this is worth exposing in combineByKey()? I think Jerry's proposed workaround is all you can do for now -- use reflection to side-step the fact that the methods you need are private. On Mon, Apr 27, 2015 at 8:07 AM,

Re: ReduceByKey and sorting within partitions

2015-05-04 Thread Burak Yavuz
I think this Spark Package may be what you're looking for! http://spark-packages.org/package/tresata/spark-sorted Best, Burak On Mon, May 4, 2015 at 12:56 PM, Imran Rashid iras...@cloudera.com wrote: Oh wow, that is a really interesting observation, Marco and Jerry. I wonder if this is worth

Re: ReduceByKey and sorting within partitions

2015-04-29 Thread Marco
On 04/27/2015 06:00 PM, Ganelin, Ilya wrote: Marco - why do you want data sorted both within and across partitions? If you need to take an ordered sequence across all your data you need to either aggregate your RDD on the driver and sort it, or use zipWithIndex to apply an ordered index

ReduceByKey and sorting within partitions

2015-04-27 Thread Marco
Hi, I'm trying, after reducing by key, to get data ordered among partitions (like RangePartitioner) and within partitions (like sortByKey or repartitionAndSortWithinPartitions), pushing the sorting down to the shuffle machinery of the reducing phase. I think, but maybe I'm wrong, that the correct
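For readers following the thread, here is a minimal sketch of the ordering being asked for, using only public RDD calls. It is not the push-down Marco describes: the per-partition sort runs in memory after the shuffle rather than inside it, and `RangePartitioner` launches a separate sampling job to pick its boundaries. The object name `ReduceThenSortSketch`, the helper `reduceAndSort`, and the sample data are made up for illustration, and the sketch assumes Spark 1.3 or later, where the pair-RDD implicits are picked up without an extra import.

```scala
import org.apache.spark.{RangePartitioner, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object ReduceThenSortSketch {
  // Hypothetical helper (not part of Spark): reduce by key with a single shuffle,
  // leaving keys range-partitioned across partitions and sorted inside each one.
  def reduceAndSort(pairs: RDD[(String, Int)], numPartitions: Int): RDD[(String, Int)] = {
    // RangePartitioner samples the input keys to choose partition boundaries.
    val partitioner = new RangePartitioner(numPartitions, pairs)
    pairs
      // The shuffle for the reduce uses the range partitioner directly.
      .reduceByKey(partitioner, _ + _)
      // Sort each partition in memory; assumes one partition fits in memory.
      .mapPartitions(iter => iter.toArray.sortBy(_._1).iterator, preservesPartitioning = true)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("reduce-then-sort").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val pairs = sc.parallelize(Seq(("b", 1), ("a", 2), ("c", 3), ("a", 4), ("b", 5)))
      val result = reduceAndSort(pairs, 2)
      // Print the contents of each partition in partition order.
      result.glom().collect().zipWithIndex.foreach { case (part, i) =>
        println(s"partition $i: ${part.mkString(", ")}")
      }
    } finally {
      sc.stop()
    }
  }
}
```

Because partition boundaries follow key ranges and each partition is sorted, reading the partitions in order yields the keys in global order. The trade-off against the approach discussed in this thread is that the sort here cannot spill to disk the way a shuffle-side sort can.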

Re: ReduceByKey and sorting within partitions

2015-04-27 Thread Saisai Shao
Hi Marco, As far as I know, the current combineByKey() does not expose an argument for setting keyOrdering on the ShuffledRDD, and ShuffledRDD is package private. If you can get the ShuffledRDD through reflection or some other way, the keyOrdering you set will be pushed down to the shuffle. If you

RE: ReduceByKey and sorting within partitions

2015-04-27 Thread Ganelin, Ilya
To: user@spark.apache.org Subject: ReduceByKey and sorting within partitions Hi, I'm trying, after reducing by key, to get data ordered among partitions (like RangePartitioner) and within partitions (like sortByKey or repartitionAndSortWithinPartitions), pushing the sorting down