Shoot me an email if you need any help with spark-sorted. It does not
(yet?) have a Java API, so you will have to work in Scala.
On Mon, May 4, 2015 at 4:05 PM, Burak Yavuz brk...@gmail.com wrote:
I think this Spark Package may be what you're looking for!
Oh wow, that is a really interesting observation, Marco & Jerry.
I wonder if this is worth exposing in combineByKey()? I think Jerry's
proposed workaround is all you can do for now -- use reflection to
side-step the fact that the methods you need are private.
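
Short of reflection, the same end result (though not the single-shuffle push-down discussed in this thread) can be approximated with public APIs only. The following is a rough sketch, assuming Spark 1.3 or later and a local master; the object name, method name, and sample data are made up for illustration. It reduces by key straight into a RangePartitioner, so keys are ordered across partitions, and then sorts each partition in memory, which assumes every partition fits in an executor's heap.

```scala
import org.apache.spark.{RangePartitioner, SparkConf, SparkContext}

// Hypothetical example object; not part of Spark or spark-sorted.
object ReduceThenSort {

  // Returns one array per partition, so the caller can inspect the layout.
  def reduceAndSort(sc: SparkContext): Array[Array[(Int, Int)]] = {
    val pairs = sc.parallelize(
      Seq(5 -> 1, 3 -> 1, 5 -> 1, 1 -> 1, 9 -> 1, 3 -> 1, 7 -> 1), 3)

    // Range-partition on the key so that partition i holds smaller keys
    // than partition i + 1. Building the partitioner samples the input.
    val partitioner = new RangePartitioner(2, pairs)

    // One shuffle: values are combined per key and land in key ranges.
    val reduced = pairs.reduceByKey(partitioner, _ + _)

    // Second step, no shuffle: sort each partition locally.
    // toArray materializes the whole partition in memory.
    val sorted = reduced.mapPartitions(
      iter => iter.toArray.sortBy(_._1).iterator,
      preservesPartitioning = true)

    sorted.glom().collect()
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("reduce-then-sort").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      reduceAndSort(sc).zipWithIndex.foreach { case (part, i) =>
        println(s"partition $i: ${part.mkString(", ")}")
      }
    } finally {
      sc.stop()
    }
  }
}
```

The local sort is the part that the keyOrdering push-down would make unnecessary: with the ordering handled by the shuffle itself, the reduce side could spill to disk instead of holding a full partition in memory.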
On Mon, Apr 27, 2015 at 8:07 AM,
I think this Spark Package may be what you're looking for!
http://spark-packages.org/package/tresata/spark-sorted
Best,
Burak
On Mon, May 4, 2015 at 12:56 PM, Imran Rashid iras...@cloudera.com wrote:
Oh wow, that is a really interesting observation, Marco & Jerry.
I wonder if this is worth
On 04/27/2015 06:00 PM, Ganelin, Ilya wrote:
Marco - why do you want data sorted both within and across partitions? If you
need to take an ordered sequence across all your data you need to either
aggregate your RDD on the driver and sort it, or use zipWithIndex to apply an
ordered index
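
The zipWithIndex route mentioned above might look roughly like the sketch below. It assumes Spark 1.3 or later and a local master; the object name, method name, and sample data are illustrative only. sortByKey range-partitions and sorts the reduced pairs, and zipWithIndex then numbers them in partition order, so the index follows the global key order.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object; not part of Spark.
object GlobalIndex {

  def indexed(sc: SparkContext): Array[((String, Int), Long)] = {
    val counts = sc
      .parallelize(Seq("b" -> 2, "a" -> 1, "c" -> 3, "a" -> 4), 2)
      .reduceByKey(_ + _)

    // sortByKey is a second shuffle after reduceByKey.
    // zipWithIndex runs an extra job to count elements per partition
    // when the RDD has more than one partition.
    counts.sortByKey().zipWithIndex().collect()
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("global-index").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      indexed(sc).foreach { case ((key, value), idx) =>
        println(s"$idx: $key -> $value")
      }
    } finally {
      sc.stop()
    }
  }
}
```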
Hi,
I'm trying, after reducing by key, to get data ordered among partitions
(like RangePartitioner) and within partitions (like sortByKey or
repartitionAndSortWithinPartitions), pushing the sorting down to the
shuffle machinery of the reducing phase.
I think, but maybe I'm wrong, that the correct
Hi Marco,
As far as I know, the current combineByKey() does not expose the argument
that would let you set keyOrdering on the ShuffledRDD. Since ShuffledRDD is
package private, you would have to get at it through reflection or some other
way; the keyOrdering you set will then be pushed down to the shuffle. If you
To: user@spark.apache.org
Subject: ReduceByKey and sorting within partitions
Hi,
I'm trying, after reducing by key, to get data ordered among partitions
(like RangePartitioner) and within partitions (like sortByKey or
repartitionAndSortWithinPartitions), pushing the sorting down