Hi Marco,

As far as I know, the current combineByKey() does not expose an argument
for setting keyOrdering on the ShuffledRDD it creates, and ShuffledRDD is
package private. If you can get hold of the ShuffledRDD through reflection
or some other way, the keyOrdering you set will be pushed down to the
shuffle. If you use a combination of transformations instead, the result
will be the same but the efficiency may differ: some transformations are
split into separate stages, which introduces an additional shuffle.
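
To illustrate the "combination of transformations" route, here is a minimal
sketch that uses only the public API. The object name, app name, sample data
and partition count are made up for the example. reduceByKey with a
RangePartitioner gives the ordering among partitions in one shuffle, and
mapPartitions then sorts each partition in memory. Note that the sort happens
after the shuffle rather than inside it, so each partition has to fit in
memory, and as far as I know building the RangePartitioner samples the input
keys, which costs an extra pass over the data.

```scala
import org.apache.spark.{RangePartitioner, SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits, needed on older 1.x releases

// Hypothetical example object; names and data are for illustration only.
object ReduceThenSortSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("reduce-then-sort-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val pairs = sc.parallelize(
        Seq(("d", 1), ("a", 2), ("c", 3), ("a", 4), ("b", 5), ("d", 6)))

      // Range bounds are sampled from the input keys. Reducing does not
      // change the set of keys, so the bounds stay valid afterwards.
      val partitioner = new RangePartitioner[String, Int](2, pairs)

      // One shuffle: values are combined per key and each key lands in the
      // partition that owns its range (ordering among partitions).
      val reduced = pairs.reduceByKey(partitioner, _ + _)

      // Narrow transformation, no extra shuffle: sort each partition in
      // memory (ordering within partitions).
      val sorted = reduced.mapPartitions(
        iter => iter.toArray.sortBy(_._1).iterator,
        preservesPartitioning = true)

      // Print one line per partition to inspect the layout.
      sorted.glom().collect().foreach(p => println(p.mkString(", ")))
    } finally {
      sc.stop()
    }
  }
}
```

Replacing the mapPartitions step with
repartitionAndSortWithinPartitions(partitioner) would move the sort into the
shuffle machinery, but as far as I can tell it shuffles the already reduced
data a second time, which is the extra cost mentioned above.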

Thanks
Jerry


2015-04-27 19:00 GMT+08:00 Marco <marcope...@gmail.com>:

> Hi,
>
> I'm trying, after reducing by key, to get data ordered among partitions
> (like RangePartitioner) and within partitions (like sortByKey or
> repartitionAndSortWithinPartitions), pushing the sorting down to the
> shuffle machinery of the reducing phase.
>
> I think, but maybe I'm wrong, that the correct way to do that is for
> combineByKey to call the setKeyOrdering function on the ShuffledRDD that
> it returns.
>
> Am I wrong? Can it be done by a combination of other transformations
> with the same efficiency?
>
> Thanks,
> Marco