Re: Join and HashPartitioner question
You may need to persist r1 after the partitionBy call; a second join that reuses the partitioned r1 will then be more efficient.

On Mon, Nov 16, 2015 at 2:48 PM, Rishi Mishra wrote:
> AFAIK, and as far as I can see in the code, both of them should behave the same.
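A minimal sketch of the persist advice, assuming Spark in local mode and small made-up datasets. The names `PersistPartitionedJoin`, `joinTwice`, `r3` and the sample values are hypothetical and not from the thread. r1 is hash-partitioned once and persisted, then joined twice. Because `join(other)` without an explicit partitioner falls back to an existing partitioner among its inputs, only r2 and r3 should need a shuffle, and the second join can read the partitioned r1 from the cache instead of recomputing it.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

object PersistPartitionedJoin {
  type Joined = Array[(String, (String, String))]

  // Partitions r1 once, persists it, and joins it against two other RDDs.
  // r2 and r3 are hypothetical sample data; r3 stands in for "a second join".
  def joinTwice(sc: SparkContext): (Joined, Joined, Boolean) = {
    val partitioner = new HashPartitioner(8)
    val r1: RDD[(String, String)] = sc.parallelize(Seq("a" -> "1", "b" -> "2", "c" -> "3"))
    val r2: RDD[(String, String)] = sc.parallelize(Seq("a" -> "x", "b" -> "y"))
    val r3: RDD[(String, String)] = sc.parallelize(Seq("b" -> "p", "c" -> "q"))

    // Shuffle r1 once and keep the partitioned copy (memory first, spill to disk).
    val r1Part = r1.partitionBy(partitioner).persist(StorageLevel.MEMORY_AND_DISK)

    // join(other) picks up r1Part's partitioner, so only r2 and r3 are shuffled;
    // the second join reads r1Part from the cache instead of recomputing it.
    val withR2 = r1Part.join(r2)
    val withR3 = r1Part.join(r3)
    val keptPartitioner =
      withR2.partitioner == Some(partitioner) && withR3.partitioner == Some(partitioner)

    val out = (withR2.collect().sorted, withR3.collect().sorted, keptPartitioner)
    r1Part.unpersist()
    out
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("persist-partitioned-join").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val (withR2, withR3, kept) = joinTwice(sc)
      println(withR2.mkString(", "))
      println(withR3.mkString(", "))
      println(s"joins kept r1Part's partitioner: $kept")
    } finally {
      sc.stop()
    }
  }
}
```

`StorageLevel.MEMORY_AND_DISK` is one reasonable choice; for small data, `cache()` (memory only) would serve the same purpose.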
Re: Join and HashPartitioner question
AFAIK, and as far as I can see in the code, both of them should behave the same.

On Sat, Nov 14, 2015 at 2:10 AM, Alexander Pivovarov wrote:
> Is there any difference in performance between the following two joins?

--
Regards,
Rishitesh Mishra,
SnappyData (http://www.snappydata.io/)
https://in.linkedin.com/in/rishiteshmishra
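The reasoning behind "both should behave the same" can be illustrated with a toy, Spark-free model. It rests on three assumptions drawn from our reading of the Spark source, not from the thread: `join(other, numPartitions)` delegates to `join(other, new HashPartitioner(numPartitions))`; two `HashPartitioner`s with the same partition count compare equal; and a join (like `partitionBy`) only reshuffles an input whose partitioner differs from the target one. `ToyHashPartitioner`, `ToyRDD`, `parallelize`, `partitionBy`, `join` and `compare` below are hypothetical stand-ins, not Spark's API.

```scala
object ToyCoPartitionedJoin {
  // Toy stand-in for HashPartitioner: non-negative hashCode modulo numPartitions,
  // and (as a case class) equality by numPartitions.
  final case class ToyHashPartitioner(numPartitions: Int) {
    def getPartition(key: Any): Int = key match {
      case null => 0
      case k =>
        val raw = k.hashCode % numPartitions
        if (raw < 0) raw + numPartitions else raw
    }
  }

  // Toy dataset: partitions plus the partitioner that produced them, if any.
  final case class ToyRDD[K, V](parts: Vector[Vector[(K, V)]], partitioner: Option[ToyHashPartitioner])

  def parallelize[K, V](data: Seq[(K, V)], slices: Int): ToyRDD[K, V] = {
    val size = math.max(1, (data.size + slices - 1) / slices)
    ToyRDD(data.grouped(size).map(_.toVector).toVector, None)
  }

  // Returns the re-partitioned data and the number of shuffles it cost (0 or 1).
  def partitionBy[K, V](rdd: ToyRDD[K, V], p: ToyHashPartitioner): (ToyRDD[K, V], Int) =
    if (rdd.partitioner == Some(p)) (rdd, 0)
    else {
      val all = rdd.parts.flatten
      val buckets = Vector.tabulate(p.numPartitions) { i =>
        all.filter { case (k, _) => p.getPartition(k) == i }
      }
      (ToyRDD(buckets, Some(p)), 1)
    }

  // Shuffles only the inputs not already partitioned by p, then joins
  // partition i of the left side with partition i of the right side.
  def join[K, V, W](left: ToyRDD[K, V], right: ToyRDD[K, W], p: ToyHashPartitioner): (ToyRDD[K, (V, W)], Int) = {
    val (l, ls) = partitionBy(left, p)
    val (r, rs) = partitionBy(right, p)
    val joined = l.parts.zip(r.parts).map { case (lp, rp) =>
      for ((k, v) <- lp; (k2, w) <- rp if k == k2) yield (k, (v, w))
    }
    (ToyRDD(joined, Some(p)), ls + rs)
  }

  // Runs both join styles from the question; returns (Join 1 shuffles, Join 2 shuffles, same output).
  def compare(): (Int, Int, Boolean) = {
    val partNum = 80
    val r1 = parallelize(Seq("a" -> "1", "b" -> "2", "c" -> "3"), 2)
    val r2 = parallelize(Seq("a" -> "x", "c" -> "z"), 2)
    val partitioner = ToyHashPartitioner(partNum)

    // Join 1: explicit partitionBy on both sides, then a join that needs no extra shuffle.
    val (p1, s1) = partitionBy(r1, partitioner)
    val (p2, s2) = partitionBy(r2, partitioner)
    val (res1, s3) = join(p1, p2, partitioner)

    // Join 2: an equal partitioner is built from partNum alone.
    val (res2, s4) = join(r1, r2, ToyHashPartitioner(partNum))
    (s1 + s2 + s3, s4, res1.parts == res2.parts)
  }

  def main(args: Array[String]): Unit = {
    val (j1, j2, same) = compare()
    println(s"Join 1 shuffles: $j1, Join 2 shuffles: $j2, identical partitions: $same")
  }
}
```

In this model both variants shuffle each input exactly once and place every key in the same partition, which is the sense in which they "behave the same". The model does not capture cost differences inside Spark's shuffle implementation.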
Join and HashPartitioner question
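Hi Everyone,

Is there any difference in performance between the following two joins?

import org.apache.spark.HashPartitioner
import org.apache.spark.rdd.RDD

val r1: RDD[(String, String)] = ???
val r2: RDD[(String, String)] = ???

val partNum = 80
val partitioner = new HashPartitioner(partNum)

// Join 1: hash-partition both sides explicitly, then join
val res1 = r1.partitionBy(partitioner).join(r2.partitionBy(partitioner))

// Join 2: pass only the number of partitions to join
val res2 = r1.join(r2, partNum)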
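One way to check the two variants empirically is to build both on small data and compare their partitioners, results, and lineage. The sketch assumes a local-mode SparkContext and made-up sample values; `CompareJoins` and `bothJoins` are hypothetical names. `toDebugString` prints each RDD's lineage, where shuffle boundaries are marked by changes in indentation, so the number of shuffles in each plan can be read off directly.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object CompareJoins {
  // Builds both joins from the question on small hypothetical sample data.
  def bothJoins(sc: SparkContext): (RDD[(String, (String, String))], RDD[(String, (String, String))]) = {
    val r1: RDD[(String, String)] = sc.parallelize(Seq("a" -> "1", "b" -> "2", "c" -> "3"))
    val r2: RDD[(String, String)] = sc.parallelize(Seq("a" -> "x", "c" -> "z", "d" -> "w"))

    val partNum = 80
    val partitioner = new HashPartitioner(partNum)

    // Join 1: explicit partitionBy on both sides, then the join itself.
    val res1 = r1.partitionBy(partitioner).join(r2.partitionBy(partitioner))
    // Join 2: only the partition count is given to join.
    val res2 = r1.join(r2, partNum)
    (res1, res2)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("compare-joins").setMaster("local[2]"))
    try {
      val (res1, res2) = bothJoins(sc)
      // Compare the partitioner each result reports.
      println(s"res1.partitioner = ${res1.partitioner}, res2.partitioner = ${res2.partitioner}")
      // Lineage of each plan, including its shuffle boundaries.
      println(res1.toDebugString)
      println(res2.toDebugString)
      // Check that both joins produce the same records.
      println(res1.collect().sorted.sameElements(res2.collect().sorted))
    } finally {
      sc.stop()
    }
  }
}
```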