Re: partitioner aware subtract

2016-05-10 Thread Raghava Mutharaju
Thank you for the response. This does not work on the test case that I mentioned in the previous email.

val data1 = Seq((1 -> 2), (1 -> 5), (2 -> 3), (3 -> 20), (3 -> 16))
val data2 = Seq((1 -> 2), (3 -> 30), (3 -> 16), (5 -> 12))
val rdd1 = sc.parallelize(data1, 8)
val rdd2 =

Re: partitioner aware subtract

2016-05-10 Thread Rishi Mishra
Since you have the same partitioner and the same number of partitions, you can probably use zipPartitions and provide a user-defined function to subtract. A very primitive example:

val data1 = Seq(1->1, 2->2, 3->3, 4->4, 5->5, 6->6, 7->7)
val data2 = Seq(1->1, 2->2, 3->3, 4->4, 5->5, 6->6)
val rdd1 =
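A sketch of what this zipPartitions-based subtract could look like, using the rdd1/rdd2 data from earlier in the thread (assumptions: `sc` is an existing SparkContext, and `numParts` is an illustrative partition count; this is not code from the thread itself):

```scala
import org.apache.spark.HashPartitioner

val numParts = 8
val rdd1 = sc.parallelize(Seq(1 -> 2, 1 -> 5, 2 -> 3, 3 -> 20, 3 -> 16))
  .partitionBy(new HashPartitioner(numParts))
val rdd2 = sc.parallelize(Seq(1 -> 2, 3 -> 30, 3 -> 16, 5 -> 12))
  .partitionBy(new HashPartitioner(numParts))

// Because both sides use the same partitioner, any (k, v) pair of rdd2
// that could cancel a pair of rdd1 lives in the matching partition, so
// a per-partition set difference is a correct subtract with no shuffle.
val diff = rdd1.zipPartitions(rdd2, preservesPartitioning = true) {
  (iter1, iter2) =>
    val toRemove = iter2.toSet
    iter1.filterNot(toRemove.contains)
}
// diff should contain (1,5), (2,3), (3,20): the pairs of rdd1
// that do not also appear in rdd2.
```

Note that `toSet` materializes one partition of rdd2 in memory at a time, which is usually acceptable when partitions are small relative to executor memory.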

Re: partitioner aware subtract

2016-05-09 Thread Raghava Mutharaju
We tried that but couldn't figure out a way to efficiently filter it. Let's take two RDDs.

rdd1: (1,2) (1,5) (2,3) (3,20) (3,16)
rdd2: (1,2) (3,30) (3,16) (5,12)

Doing rdd1.leftOuterJoin(rdd2), when what we want is rdd1.subtract(rdd2), gives: (1,(2,Some(2))) (1,(5,Some(2))) (2,(3,None)) (3,(20,Some(30)))
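One way to finish this filtering (a sketch, not from the thread): after a leftOuterJoin, (1,5) still pairs with Some(2), so a simple "is None" filter computes subtractByKey rather than subtract. A cogroup keeps all of rdd2's values for a key together, so each left value can be tested against the full set; with matching partitioners on both sides, cogroup is also a narrow dependency (no shuffle):

```scala
// Sketch: partitioner-friendly subtract via cogroup instead of
// leftOuterJoin, using the rdd1/rdd2 pairs shown above.
val diff = rdd1.cogroup(rdd2).flatMap { case (k, (leftVals, rightVals)) =>
  val toRemove = rightVals.toSet
  leftVals.filterNot(toRemove.contains).map(v => (k, v))
}
// For the data above, diff contains (1,5), (2,3), (3,20).
```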

Re: partitioner aware subtract

2016-05-09 Thread ayan guha
How about outer join?

On 9 May 2016 13:18, "Raghava Mutharaju" wrote:
> Hello All,
>
> We have two PairRDDs (rdd1, rdd2) which are hash partitioned on key
> (number of partitions are same for both the RDDs). We would like to
> subtract rdd2 from rdd1.
>
> The subtract

partitioner aware subtract

2016-05-08 Thread Raghava Mutharaju
Hello All,

We have two PairRDDs (rdd1, rdd2) which are hash partitioned on key (the number of partitions is the same for both RDDs). We would like to subtract rdd2 from rdd1.

The subtract code at https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDD.scala seems to
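For context (an observation, not part of the thread): RDD.subtract also has overloads taking a target partition count or an explicit Partitioner, so the result can at least be produced with the original partitioning. A sketch, assuming `sc` is an existing SparkContext:

```scala
import org.apache.spark.HashPartitioner

val part = new HashPartitioner(8)
val rdd1 = sc.parallelize(Seq(1 -> 2, 1 -> 5, 2 -> 3)).partitionBy(part)
val rdd2 = sc.parallelize(Seq(1 -> 2, 5 -> 12)).partitionBy(part)

// Passing the partitioner keeps the output partitioned the same way,
// though subtract itself still wraps elements internally and may
// shuffle, which is what motivates the zipPartitions/cogroup ideas
// discussed above.
val diff = rdd1.subtract(rdd2, part)
```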