shoot me an email if you need any help with spark-sorted. it does not
(yet?) have a java api, so you will have to work in scala
On Mon, May 4, 2015 at 4:05 PM, Burak Yavuz brk...@gmail.com wrote:
I think this Spark Package may be what you're looking for!
http://spark-packages.org/package/tresata/spark-sorted
Best,
Burak

On Mon, May 4, 2015 at 12:56 PM, Imran Rashid iras...@cloudera.com wrote:
oh wow, that is a really interesting observation, Marco & Jerry.
I wonder if this is worth exposing in combineByKey()? I think Jerry's
proposed workaround is all you can do for now -- use reflection to
side-step the fact that the methods you need are private.

On 04/27/2015 06:00 PM, Ganelin, Ilya wrote:
Marco - why do you want data sorted both within and across partitions? If you
need to take an ordered sequence across all your data you need to either
aggregate your RDD on the driver and sort it, or use zipWithIndex to apply an
ordered index to your data that matches the order it was stored on

On Mon, Apr 27, 2015 at 8:07 AM,
Hi Marco,
As I know, the current combineByKey() does not expose an argument for
setting keyOrdering on the ShuffledRDD. Since ShuffledRDD is package
private, if you can get at the ShuffledRDD through reflection or some other
way, the keyOrdering you set will be pushed down to the shuffle. If you
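[Editor's note] A minimal local sketch of the idea Jerry and Imran discuss — sorting during the shuffle rather than on the driver. This is plain Scala on local collections, not Spark API calls; the two-partition range partitioner is made up for illustration.

```scala
// Local simulation of pushing a keyOrdering down to the shuffle. Records
// are range-partitioned by key, then each partition is sorted on its own;
// because the partitioner keeps key ranges disjoint and ordered,
// concatenating the partitions yields a total order with no driver-side sort.
object ShuffleSortSketch {
  // Toy range partitioner: keys <= "b" go to partition 0, the rest to 1.
  private def partition(key: String): Int = if (key <= "b") 0 else 1

  def shuffleSorted(records: Seq[(String, Int)]): Seq[(String, Int)] = {
    val numPartitions = 2
    // "Shuffle write": route each record to its target partition.
    val byPartition = records.groupBy { case (k, _) => partition(k) }
    // "Shuffle read" with a key ordering: sort within each partition only.
    (0 until numPartitions).flatMap { i =>
      byPartition.getOrElse(i, Seq.empty).sortBy(_._1)
    }
  }

  def main(args: Array[String]): Unit = {
    val records = Seq(("d", 4), ("a", 1), ("c", 3), ("b", 2), ("e", 5))
    println(shuffleSorted(records).map(_._1).mkString(", "))  // a, b, c, d, e
  }
}
```

The point of the simulation: once the per-partition sort happens inside the shuffle, no single machine ever has to hold or sort the whole dataset.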
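[Editor's note] Ilya's zipWithIndex suggestion can be sketched on a local Scala collection; RDD.zipWithIndex behaves analogously, assigning Long indices in partition order. The List here is only a stand-in for an RDD.

```scala
// Sketch of the zipWithIndex approach on a local collection. On an RDD,
// zipWithIndex assigns each element a Long index following partition
// order, i.e. the order the data was read in.
object ZipWithIndexSketch {
  def indexed(data: List[String]): List[(String, Long)] =
    data.zipWithIndex.map { case (v, i) => (v, i.toLong) }

  def main(args: Array[String]): Unit = {
    // The attached index records the original order, so a later
    // sort on it can recover that order after any shuffle.
    indexed(List("first", "second", "third")).foreach {
      case (v, i) => println(s"$i -> $v")
    }
  }
}
```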