shoot me an email if you need any help with spark-sorted. it does not
(yet?) have a java api, so you will have to work in scala
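
to give you an idea, usage looks roughly like this (a sketch from memory
of the README -- treat the exact signatures of groupSort and
mapStreamByKey here as assumptions and check the repo before copying):

  import org.apache.spark.HashPartitioner
  import com.tresata.spark.sorted.PairRDDFunctions._

  val pairs = sc.parallelize(Seq(("a", 2), ("a", 1), ("b", 3)))
  // one shuffle; per key the values arrive as a sorted iterator,
  // without materializing the whole group in memory
  val gs = pairs.groupSort(new HashPartitioner(2), Some(Ordering[Int]))
  val maxPerKey = gs.mapStreamByKey(values => Iterator(values.max))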

On Mon, May 4, 2015 at 4:05 PM, Burak Yavuz <brk...@gmail.com> wrote:

> I think this Spark Package may be what you're looking for!
> http://spark-packages.org/package/tresata/spark-sorted
>
> Best,
> Burak
>
> On Mon, May 4, 2015 at 12:56 PM, Imran Rashid <iras...@cloudera.com>
> wrote:
>
>> oh wow, that is a really interesting observation, Marco & Jerry.
>> I wonder if this is worth exposing in combineByKey()?  I think Jerry's
>> proposed workaround is all you can do for now -- use reflection to
>> side-step the fact that the methods you need are private.
>>
>> On Mon, Apr 27, 2015 at 8:07 AM, Saisai Shao <sai.sai.s...@gmail.com>
>> wrote:
>>
>>> Hi Marco,
>>>
>>> As far as I know, the current combineByKey() does not expose an argument
>>> for setting keyOrdering on the underlying ShuffledRDD, and ShuffledRDD
>>> itself is package private. If you can get at the ShuffledRDD through
>>> reflection or some other way, the keyOrdering you set will be pushed down
>>> to the shuffle. If you use a combination of transformations instead, the
>>> result will be the same but the efficiency may differ: some
>>> transformations will split into separate stages, which introduces an
>>> additional shuffle.
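>>>
>>> For illustration, a rough sketch of that direct route (untested; it
>>> leans on Spark internals -- ShuffledRDD, Aggregator and their setters
>>> are developer API at best and may not be reachable from your code, as
>>> noted above -- so treat it as a sketch rather than a supported path):
>>>
>>>   import org.apache.spark.{Aggregator, RangePartitioner}
>>>   import org.apache.spark.rdd.ShuffledRDD
>>>
>>>   val pairs = sc.parallelize(Seq((3, 1), (1, 2), (3, 4), (2, 5)))
>>>   // the same three functions combineByKey takes, here just summing
>>>   val agg = new Aggregator[Int, Int, Int]((v: Int) => v, _ + _, _ + _)
>>>   val combined = new ShuffledRDD[Int, Int, Int](
>>>       pairs, new RangePartitioner(2, pairs))
>>>     .setKeyOrdering(Ordering[Int]) // sorting happens inside the shuffle
>>>     .setAggregator(agg)
>>>     .setMapSideCombine(true)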
>>>
>>> Thanks
>>> Jerry
>>>
>>>
>>> 2015-04-27 19:00 GMT+08:00 Marco <marcope...@gmail.com>:
>>>
>>>> Hi,
>>>>
>>>> After reducing by key, I'm trying to get data ordered both among
>>>> partitions (as with RangePartitioner) and within partitions (as with
>>>> sortByKey or repartitionAndSortWithinPartitions), pushing the sorting
>>>> down into the shuffle machinery of the reduce phase.
>>>>
>>>> I think, but maybe I'm wrong, that the correct way to do this would be
>>>> for combineByKey to call the setKeyOrdering function on the ShuffledRDD
>>>> that it returns.
>>>>
>>>> Am I wrong? Can this be done with a combination of other transformations
>>>> at the same efficiency?
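>>>>
>>>> For concreteness, the two-step alternative I'd like to avoid (a
>>>> sketch, with pairs standing in for my input RDD of (key, value)
>>>> pairs; it gives the right ordering but pays for a second shuffle
>>>> after the reduce):
>>>>
>>>>   import org.apache.spark.RangePartitioner
>>>>
>>>>   val reduced = pairs.reduceByKey(_ + _)
>>>>   // total order across partitions, sorted within each partition,
>>>>   // but this re-shuffles the already-reduced data
>>>>   val sorted = reduced.repartitionAndSortWithinPartitions(
>>>>     new RangePartitioner(2, reduced))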
>>>>
>>>> Thanks,
>>>> Marco
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>> For additional commands, e-mail: user-h...@spark.apache.org
>>>>
>>>>
>>>
>>
>
