Re: Is pair rdd join more efficient than regular rdd

2015-02-02 Thread Akhil Das
Yes, it would. You can create a key and then partition the RDD by it (say, with HashPartitioner); the join is then faster because all records with the same key go to the same partition.

Thanks
Best Regards

On Sun, Feb 1, 2015 at 5:13 PM, Sunita Arvind sunitarv...@gmail.com wrote: Hi All We are joining large tables
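
To make the partitioning point in the reply concrete, here is a minimal sketch in plain Python (it does not use Spark, and every name in it is hypothetical). It only illustrates the idea: if both sides are bucketed with the same hash function and the same number of partitions, matching keys end up in the same bucket index, so each bucket pair can be joined locally. In Spark, the analogous step would be calling partitionBy with the same HashPartitioner on both pair RDDs before the join.

```python
from collections import defaultdict

def hash_partition(pairs, num_partitions):
    """Place each (key, value) pair in the bucket hash(key) % num_partitions."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions

def join_partition(left, right):
    """Join two buckets that are known to hold the same set of keys."""
    index = defaultdict(list)
    for key, value in left:
        index[key].append(value)
    return [(key, (lv, rv)) for key, rv in right for lv in index.get(key, [])]

def copartitioned_join(left_pairs, right_pairs, num_partitions=4):
    """Bucket both sides the same way, then join bucket i only with bucket i."""
    left_parts = hash_partition(left_pairs, num_partitions)
    right_parts = hash_partition(right_pairs, num_partitions)
    result = []
    for left_bucket, right_bucket in zip(left_parts, right_parts):
        result.extend(join_partition(left_bucket, right_bucket))
    return result

# Hypothetical sample data: (customer_id, payload) pairs.
orders = [(1, "order-a"), (2, "order-b"), (1, "order-c")]
customers = [(1, "alice"), (2, "bob"), (3, "carol")]
joined = sorted(copartitioned_join(orders, customers))
```

No bucket ever needs to look at a different bucket index on the other side, which is the property that lets a join on co-partitioned data avoid moving records between partitions.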

Is pair rdd join more efficient than regular rdd

2015-02-01 Thread Sunita Arvind
Hi All, We are joining large tables using Spark SQL and running into shuffle issues. We have explored multiple options: using coalesce to reduce the number of partitions, tuning various parameters such as the disk buffer, reducing data in chunks, etc., all of which seem to help, by the way. What I would like to know is,