Oh wow, that is a really interesting observation, Marco & Jerry.
I wonder if this is worth exposing in combineByKey()?  I think Jerry's
proposed workaround is all you can do for now -- use reflection to
side-step the fact that the methods you need are private.
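
For comparison, here is a rough sketch of the "combination of transformations" route Jerry mentions, using only the public API. The input data, object name, app name and partition counts are made up for illustration, and I have not benchmarked it. It gives the same ordering among and within partitions, but it needs a second shuffle after the reduce.

```scala
import org.apache.spark.{RangePartitioner, SparkConf, SparkContext}
// Needed on Spark 1.2.x and earlier for the implicit pair/ordered RDD functions;
// harmless on later versions.
import org.apache.spark.SparkContext._

object ReduceThenRangeSort {
  def main(args: Array[String]): Unit = {
    // Local master and app name are placeholders for this sketch.
    val conf = new SparkConf().setAppName("reduce-then-range-sort").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical input: (key, count) pairs spread over 4 partitions.
      val pairs = sc.parallelize(Seq("b" -> 1, "a" -> 2, "c" -> 3, "a" -> 4, "b" -> 5), 4)

      // Shuffle 1: reduce by key (hash partitioned, no ordering guarantee).
      val reduced = pairs.reduceByKey(_ + _)

      // RangePartitioner samples `reduced` to pick the partition boundaries,
      // so constructing it already runs a job.
      val partitioner = new RangePartitioner[String, Int](2, reduced)

      // Shuffle 2: range-partition the reduced pairs and sort by key inside
      // each partition as part of that shuffle.
      val sorted = reduced.repartitionAndSortWithinPartitions(partitioner)

      // Print each partition's contents to inspect the ordering.
      sorted.glom().collect().zipWithIndex.foreach { case (part, i) =>
        println(s"partition $i: ${part.mkString(", ")}")
      }
    } finally {
      sc.stop()
    }
  }
}
```

The extra cost relative to what Marco is asking for is the sampling job plus the second shuffle; pushing keyOrdering into the reduce's own shuffle would avoid that second shuffle.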

On Mon, Apr 27, 2015 at 8:07 AM, Saisai Shao <sai.sai.s...@gmail.com> wrote:

> Hi Marco,
>
> As far as I know, the current combineByKey() does not expose an argument
> for setting keyOrdering on the ShuffledRDD. Since ShuffledRDD is package
> private, you would have to get at it through reflection or some other
> way; the keyOrdering you set would then be pushed down to the shuffle. If
> you use a combination of transformations instead, the result will be the
> same but the efficiency may differ: some transformations are split into
> different stages, which introduces an additional shuffle.
>
> Thanks
> Jerry
>
>
> 2015-04-27 19:00 GMT+08:00 Marco <marcope...@gmail.com>:
>
>> Hi,
>>
>> After reducing by key, I'm trying to get data ordered both among
>> partitions (like RangePartitioner) and within partitions (like sortByKey
>> or repartitionAndSortWithinPartitions), pushing the sorting down to the
>> shuffle machinery of the reducing phase.
>>
>> I think, but maybe I'm wrong, that the correct way to do this is for
>> combineByKey to call setKeyOrdering on the ShuffledRDD that it
>> returns.
>>
>> Am I wrong? Can it be done by a combination of other transformations with
>> the same efficiency?
>>
>> Thanks,
>> Marco
>>
>
