Re: Join and HashPartitioner question

2015-11-16 Thread Erwan ALLAIN
You may want to persist r1 after the partitionBy call. A second join on r1
will then be more efficient.
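
A minimal sketch of what I mean, assuming a local SparkContext and made-up
sample data. The names r3 and r1Part are hypothetical, as is the second join
itself; only r1, r2 and the HashPartitioner come from the original question.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object PersistAfterPartitionBy {
  def main(args: Array[String]): Unit = {
    // Local master and sample data are assumptions for illustration only.
    val conf = new SparkConf().setAppName("persist-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val r1: RDD[(String, String)] = sc.parallelize(Seq("a" -> "1", "b" -> "2"))
      val r2: RDD[(String, String)] = sc.parallelize(Seq("a" -> "x", "b" -> "y"))
      // Hypothetical partner for a second join on r1.
      val r3: RDD[(String, String)] = sc.parallelize(Seq("a" -> "p"))

      val partitioner = new HashPartitioner(80)

      // Partition r1 once and keep the partitioned result around.
      val r1Part = r1.partitionBy(partitioner).persist()

      // Both joins start from the already partitioned, persisted r1Part.
      val res1 = r1Part.join(r2)
      val res2 = r1Part.join(r3)

      // The join result keeps the partitioner that r1Part carries.
      assert(res1.partitioner == Some(partitioner))

      // Run both jobs.
      res1.count()
      res2.count()
    } finally {
      sc.stop()
    }
  }
}
```

If r1 is only joined once, the persist call buys nothing and can be dropped.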

On Mon, Nov 16, 2015 at 2:48 PM, Rishi Mishra wrote:

> AFAIK, and from what I can see in the code, both of them should behave the
> same.
>
> On Sat, Nov 14, 2015 at 2:10 AM, Alexander Pivovarov wrote:
>
>> Hi Everyone
>>
>> Is there any difference in performance between the following two joins?
>>
>>
>> val r1: RDD[(String, String)] = ???
>> val r2: RDD[(String, String)] = ???
>>
>> val partNum = 80
>> val partitioner = new HashPartitioner(partNum)
>>
>> // Join 1
>> val res1 = r1.partitionBy(partitioner).join(r2.partitionBy(partitioner))
>>
>> // Join 2
>> val res2 = r1.join(r2, partNum)
>>
>>
>>
>
>
> --
> Regards,
> Rishitesh Mishra,
> SnappyData . (http://www.snappydata.io/)
>
> https://in.linkedin.com/in/rishiteshmishra
>


Re: Join and HashPartitioner question

2015-11-16 Thread Rishi Mishra
AFAIK, and from what I can see in the code, both of them should behave the
same.
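
A rough way to see why: as far as I can tell, join(r2, partNum) wraps partNum
in a new HashPartitioner(partNum) internally, and a hash partitioner places a
key by its hashCode modulo the partition count. The sketch below is plain
Scala with no Spark dependency. SketchHashPartitioner is a hypothetical
stand-in that mimics that rule, not Spark's own class, and the sample keys
are made up.

```scala
// Hypothetical stand-in for a hash partitioner: non-negative hashCode modulo.
final case class SketchHashPartitioner(numPartitions: Int) {
  def getPartition(key: Any): Int = key match {
    case null => 0
    case _ =>
      val raw = key.hashCode % numPartitions
      // The JVM's % can return a negative value, so shift it into range.
      if (raw < 0) raw + numPartitions else raw
  }
}

object PartitionerSketch {
  val partNum = 80

  // Join 1 passes an explicit partitioner to partitionBy.
  val explicitPartitioner = SketchHashPartitioner(partNum)

  // Join 2 passes only partNum; the join builds the partitioner from it.
  val fromPartNum = SketchHashPartitioner(partNum)

  // Made-up sample keys.
  val keys = Seq("a", "b", "spark", "join")

  // Equal partitioners send every key to the same partition index.
  val sameLayout: Boolean =
    explicitPartitioner == fromPartNum &&
      keys.forall(k => explicitPartitioner.getPartition(k) == fromPartNum.getPartition(k))
}
```

So in both cases each side is hash-partitioned into 80 partitions before the
matching keys are combined.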

On Sat, Nov 14, 2015 at 2:10 AM, Alexander Pivovarov wrote:

> Hi Everyone
>
> Is there any difference in performance between the following two joins?
>
>
> val r1: RDD[(String, String)] = ???
> val r2: RDD[(String, String)] = ???
>
> val partNum = 80
> val partitioner = new HashPartitioner(partNum)
>
> // Join 1
> val res1 = r1.partitionBy(partitioner).join(r2.partitionBy(partitioner))
>
> // Join 2
> val res2 = r1.join(r2, partNum)
>
>
>


-- 
Regards,
Rishitesh Mishra,
SnappyData . (http://www.snappydata.io/)

https://in.linkedin.com/in/rishiteshmishra


Join and HashPartitioner question

2015-11-13 Thread Alexander Pivovarov
Hi Everyone

Is there any difference in performance between the following two joins?


val r1: RDD[(String, String)] = ???
val r2: RDD[(String, String)] = ???

val partNum = 80
val partitioner = new HashPartitioner(partNum)

// Join 1
val res1 = r1.partitionBy(partitioner).join(r2.partitionBy(partitioner))

// Join 2
val res2 = r1.join(r2, partNum)