Hi all,

I want to do a recursive leftOuterJoin between an RDD (created from a file) 
with 9 million rows (the file is 100 MB) and 30 other RDDs (each created from 
a different file, one per iteration of a loop) varying from 1 to 6 million rows.
When I run it for 5 RDDs, it runs successfully in 5 minutes. But when I 
increase it to 10 or 30 RDDs, it gradually slows down and finally gets 
stuck without showing any warning or error.
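
To make the shape of the loop concrete, here is a minimal sketch in plain 
Python. It is not my actual Spark job: lists of (key, value) pairs stand in 
for RDDs, and the names left_outer_join and join_all are hypothetical. It 
only illustrates the leftOuterJoin semantics and how every iteration joins 
against the result of the previous one, so the value nests one level deeper 
each time.

```python
from collections import defaultdict


def left_outer_join(left, right):
    """Emulate leftOuterJoin semantics on lists of (key, value) pairs."""
    # Index the right side by key so each left row is matched in one lookup.
    index = defaultdict(list)
    for key, value in right:
        index[key].append(value)

    joined = []
    for key, value in left:
        matches = index.get(key)
        if matches:
            # One output row per matching right-side value.
            joined.extend((key, (value, other)) for other in matches)
        else:
            # Keep the left row and pad the missing right side with None.
            joined.append((key, (value, None)))
    return joined


def join_all(base, others):
    """Join the base data set with each other data set in turn."""
    result = base
    for other in others:
        # Each pass joins against the previous result, not the original base.
        result = left_outer_join(result, other)
    return result


# Tiny stand-ins for the 9-million-row base file and the per-iteration files.
base = [("a", 1), ("b", 2)]
others = [[("a", 10)], [("b", 20), ("b", 21)]]
result = join_all(base, others)
for row in result:
    print(row)
```

In this sketch the output after N iterations depends on all N previous 
joins, which is the same dependency chain my real job builds up.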

I am running in standalone mode with 2 workers of 4 GB each and a total of 16 
cores.

Is anyone else facing similar problems with JOIN, or is it a problem with my 
configuration?

Thanks & Regards, 
Meethu M
