Hello Team,
I am trying to join two RDDs, one of size 800 MB and the other 190 MB. During the join step, my job hangs and I don't see any progress in the execution. These are the messages I see on the console:

INFO spark.MapOutputTrackerMasterEndPoint: Asked to send map output locations for shuffle 0 to <hostname1>:40000
INFO spark.MapOutputTrackerMasterEndPoint: Asked to send map output locations for shuffle 1 to <hostname2>:40000

After these messages, I don't see any further progress.

I am using Spark 1.6.0 with the YARN scheduler (running in YARN client mode). My cluster configuration is a 3-node cluster (1 master and 2 slaves). Each slave has 1 TB of hard disk space, 300 GB of memory, and 32 cores. The HDFS block size is 128 MB.

Thanks,
Padma Ch