subject:"RE\: ReduceByKey and sorting within partitions"

Re: ReduceByKey and sorting within partitions

2015-05-04 Thread Koert Kuipers

Shoot me an email if you need any help with spark-sorted. It does not (yet?) have a Java API, so you will have to work in Scala.

On Mon, May 4, 2015 at 4:05 PM, Burak Yavuz brk...@gmail.com wrote:
> I think this Spark Package may be what you're looking for!

Re: ReduceByKey and sorting within partitions

2015-05-04 Thread Imran Rashid

Oh wow, that is a really interesting observation, Marco & Jerry. I wonder if this is worth exposing in combineByKey()? I think Jerry's proposed workaround is all you can do for now -- use reflection to side-step the fact that the methods you need are private.

On Mon, Apr 27, 2015 at 8:07 AM,

Re: ReduceByKey and sorting within partitions

2015-05-04 Thread Burak Yavuz

I think this Spark Package may be what you're looking for!
http://spark-packages.org/package/tresata/spark-sorted

Best,
Burak

On Mon, May 4, 2015 at 12:56 PM, Imran Rashid iras...@cloudera.com wrote:
> oh wow, that is a really interesting observation, Marco & Jerry. I wonder if this is worth

Re: ReduceByKey and sorting within partitions

2015-04-29 Thread Marco

On 04/27/2015 06:00 PM, Ganelin, Ilya wrote:
> Marco - why do you want data sorted both within and across partitions? If you need to take an ordered sequence across all your data you need to either aggregate your RDD on the driver and sort it, or use zipWithIndex to apply an ordered index

Re: ReduceByKey and sorting within partitions

2015-04-27 Thread Saisai Shao

Hi Marco,

As far as I know, the current combineByKey() does not expose an argument for setting keyOrdering on the ShuffledRDD, since ShuffledRDD is package private. If you can get the ShuffledRDD through reflection or some other way, the keyOrdering you set will be pushed down to the shuffle. If you
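To illustrate what "reduce by key, with keys sorted within each partition" means, here is a minimal plain-Python sketch. It is not Spark code and does not touch ShuffledRDD; the function name `reduce_by_key_sorted`, the partitioning by `hash(key) % num_partitions`, and the sample records are all assumptions made for illustration.

```python
def reduce_by_key_sorted(records, func, num_partitions):
    """Simulate a reduce-by-key whose output partitions are sorted by key.

    Hypothetical helper for illustration only, not a Spark API.
    """
    # One dict per partition: key -> running reduced value.
    partitions = [dict() for _ in range(num_partitions)]
    for key, value in records:
        # Assumed hash partitioning; integer keys keep this deterministic.
        part = partitions[hash(key) % num_partitions]
        part[key] = func(part[key], value) if key in part else value
    # Sort inside each partition only; nothing is ordered across partitions.
    return [sorted(part.items()) for part in partitions]


records = [(5, 1), (2, 10), (4, 3), (2, 5), (1, 7), (5, 2), (3, 1)]
result = reduce_by_key_sorted(records, lambda a, b: a + b, 2)
print(result)
```

Each inner list is ordered by key, but concatenating the partitions does not give a globally sorted sequence, which is the distinction the thread is about.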

RE: ReduceByKey and sorting within partitions

2015-04-27 Thread Ganelin, Ilya

Marco - why do you want data sorted both within and across partitions? If you need to take an ordered sequence across all your data you need to either aggregate your RDD on the driver and sort it, or use zipWithIndex to apply an ordered index to your data that matches the order it was stored on
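The indexing behaviour that zipWithIndex provides can be sketched in plain Python: indices follow partition order first, then the order of items inside each partition. This is a simulation under stated assumptions, not Spark code; `zip_with_index` and the sample partitions are hypothetical names and data used only for illustration.

```python
def zip_with_index(partitions):
    """Pair every item with a global index, walking partitions in order.

    Hypothetical stand-in that mimics the ordering of an RDD's zipWithIndex.
    """
    indexed = []
    next_index = 0
    for part in partitions:
        indexed_part = []
        for item in part:
            # The index reflects where the item is stored, not its sort order.
            indexed_part.append((item, next_index))
            next_index += 1
        indexed.append(indexed_part)
    return indexed


partitions = [["b", "a"], ["d"], [], ["c"]]
indexed = zip_with_index(partitions)
print(indexed)
```

The index records storage order, so the data has to be sorted first if the index is meant to express a global ordering.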

6 matches
