reduceByKey(randomPartitioner, (a, b) => a + b) also gives an incorrect result.
Why does reduceByKey with a Partitioner exist, then?

On Wed, Jun 8, 2016 at 9:22 PM, 汪洋 <tiandiwo...@icloud.com> wrote:

> Hi Alexander,
>
> I think it is not guaranteed to be correct if an arbitrary Partitioner is
> passed in.
>
> I have created a notebook and you can check it out. (
> https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/7973071962862063/2110745399505739/58107563000366/latest.html
> )
>
> Best regards,
>
> Yang
>
> On June 9, 2016, at 11:42 AM, Alexander Pivovarov <apivova...@gmail.com> wrote:
>
> Most of the RDD methods that shuffle data take a Partitioner as a parameter,
> but rdd.distinct has no such signature.
>
> Should I open a PR for that?
>
> /**
>  * Return a new RDD containing the distinct elements in this RDD.
>  */
> def distinct(partitioner: Partitioner)(implicit ord: Ordering[T] = null): RDD[T] = withScope {
>   map(x => (x, null)).reduceByKey(partitioner, (x, y) => x).map(_._1)
> }
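The failure mode under discussion can be sketched without Spark at all: after the shuffle, reduceByKey combines values per key within each partition independently, so the result is only correct when the partitioner sends equal keys to the same partition. The `shuffle` and `reduceByKey` helpers below are simplified stand-ins for illustration, not Spark's implementation.

```scala
// Minimal model of a shuffle: a partitioner is just key -> partition id,
// applied exactly once per record.
def shuffle[K, V](data: Seq[(K, V)], numPartitions: Int, part: K => Int): Vector[Vector[(K, V)]] = {
  val buckets = Vector.fill(numPartitions)(Vector.newBuilder[(K, V)])
  data.foreach { kv => buckets(part(kv._1)) += kv }
  buckets.map(_.result())
}

// After the shuffle, values are reduced per key *within each partition only*.
def reduceByKey[K, V](partitions: Vector[Vector[(K, V)]], f: (V, V) => V): Seq[(K, V)] =
  partitions.flatMap(_.groupBy(_._1).map { case (k, vs) => (k, vs.map(_._2).reduce(f)) })

val data = Seq(("a", 1), ("a", 1), ("b", 1))

// Hash-style partitioner: equal keys always land in the same partition,
// so the totals are correct: a -> 2, b -> 1.
val good = reduceByKey(shuffle(data, 2, (k: String) => math.abs(k.hashCode) % 2),
                       (a: Int, b: Int) => a + b)

// A "random" partitioner that ignores the key: the two copies of "a" can end
// up in different partitions and each is reduced separately, so the key "a"
// survives twice in the "reduced" output.
var next = 0
val randomish = (_: String) => { next += 1; next % 2 }
val bad = reduceByKey(shuffle(data, 2, randomish), (a: Int, b: Int) => a + b)
```

The same reasoning applies to the proposed `distinct(partitioner)`: since it is built on `reduceByKey(partitioner, (x, y) => x)`, an arbitrary partitioner can leave duplicate elements in the output.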