The example violates the basic contract of a Partitioner.
It does make sense for distinct to take a Partitioner as a parameter, though
it is fairly trivial to simulate that in user code as well.
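
For reference, a minimal sketch of what simulating it in user code could look
like. The object and method names (DistinctWithPartitioner, distinctBy) are
hypothetical and not part of Spark; the body simply mirrors the proposal quoted
further down and uses the existing reduceByKey(partitioner, func) overload. It
assumes the Partitioner honors its contract, i.e. equal keys always map to the
same partition.

```scala
import scala.reflect.ClassTag

import org.apache.spark.Partitioner
import org.apache.spark.rdd.RDD

object DistinctWithPartitioner {
  // Hypothetical helper, not part of Spark's API: de-duplicates `rdd`
  // while shuffling with the caller-supplied partitioner.
  def distinctBy[T: ClassTag](rdd: RDD[T], partitioner: Partitioner): RDD[T] =
    rdd
      .map(x => (x, null))                   // pair each element with a dummy value
      .reduceByKey(partitioner, (x, _) => x) // equal keys meet in one partition and collapse
      .map(_._1)                             // drop the dummy value again
}
```

Calling it would look like
DistinctWithPartitioner.distinctBy(rdd, new HashPartitioner(8)). Note that the
final map drops the partitioner from the resulting RDD, even though the data is
still laid out the way the partitioner placed it.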


On Wednesday, June 8, 2016, 汪洋 wrote:

> Hi Alexander,
> I don't think the result is guaranteed to be correct if an arbitrary
> Partitioner is passed in.
> I have created a notebook and you can check it out. (
> )
> Best regards,
> Yang
> On June 9, 2016, at 11:42 AM, Alexander Pivovarov wrote:
> Most of the RDD methods that shuffle data take a Partitioner as a parameter,
> but rdd.distinct does not have such a signature.
> Should I open a PR for that?
> /**
>  * Return a new RDD containing the distinct elements in this RDD.
>  */
> def distinct(partitioner: Partitioner)(implicit ord: Ordering[T] = null): RDD[T] = withScope {
>   map(x => (x, null)).reduceByKey(partitioner, (x, y) => x).map(_._1)
> }
