Hi all,
I want to do a recursive leftOuterJoin between an RDD with 9 million rows (created from a 100 MB file) and 30 other RDDs of 1 to 6 million rows each (created from 30 different files, one in each iteration of a loop).

When I run it with 5 RDDs, it completes successfully in 5 minutes. But when I increase the number to 10 or 30 RDDs, it gradually slows down and finally gets stuck without showing any warning or error.

I am running in standalone mode with 2 workers of 4 GB each and a total of 16 cores.

Has anyone faced similar problems with JOIN, or is it a problem with my configuration?

Thanks & Regards,
Meethu M
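
For reference, the join pattern described above can be modelled in a few lines of plain Python. This is only a sketch, not the actual Spark job: the helper name left_outer_join and the tiny sample data are made up for illustration, and it assumes that the result of each join is fed into the next iteration. It mirrors the (key, (left_value, right_value_or_None)) record shape that leftOuterJoin produces on pair RDDs in PySpark.

```python
from collections import defaultdict


def left_outer_join(left, right):
    """Plain-Python model of a pair-RDD leftOuterJoin (hypothetical helper).

    Both inputs are lists of (key, value) pairs. Every left record is kept;
    it is paired with each matching right value, or with None when the key
    has no match on the right.
    """
    right_by_key = defaultdict(list)
    for key, value in right:
        right_by_key[key].append(value)

    joined = []
    for key, value in left:
        # .get() does not create missing keys, so unmatched keys yield [None].
        for other in right_by_key.get(key, [None]):
            joined.append((key, (value, other)))
    return joined


# Made-up stand-ins for the base RDD and the RDDs loaded inside the loop.
base = [("a", 1), ("b", 2), ("c", 3)]
others = [
    [("a", 10), ("b", 20)],
    [("a", 100)],
    [("c", 3000), ("a", 1000)],
]

# Assumption: each iteration joins the previous result with the next dataset.
result = base
for other in others:
    result = left_outer_join(result, other)

for record in result:
    print(record)
```

In this model the value of each record nests one level deeper on every pass through the loop, while the number of records stays the same as long as the keys on the right-hand side are unique.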