Re: advice on diagnosing Spark stall for 1.5hr out of 3.5hr job?

2015-02-04 Thread Imran Rashid
Hi Michael, judging from the logs, it seems that those tasks are just working a really long time. If you have long-running tasks, then you wouldn't expect the driver to output anything while those tasks are working. What is unusual is that there is no activity during all that time the tasks are

Re: advice on diagnosing Spark stall for 1.5hr out of 3.5hr job?

2015-02-04 Thread Sandy Ryza
Also, do you see any lines in the YARN NodeManager logs where it says that it's killing a container? -Sandy On Wed, Feb 4, 2015 at 8:56 AM, Imran Rashid iras...@cloudera.com wrote: Hi Michael, judging from the logs, it seems that those tasks are just working a really long time. If you have

Re: advice on diagnosing Spark stall for 1.5hr out of 3.5hr job?

2015-02-04 Thread Michael Albert
Greetings! Thanks to all who have taken the time to look at this. While the process is stalled, I see repeating messages in the YARN log on the head node of the form "Trying to fulfill reservation for application XXX on node YYY, but that node is reserved by XXX_01." Below is a chunk of

advice on diagnosing Spark stall for 1.5hr out of 3.5hr job?

2015-02-03 Thread Michael Albert
Greetings! First, my sincere thanks to all who have given me advice. Following previous discussion, I've rearranged my code to try to keep the partitions to more manageable sizes. Thanks to all who commented. At the moment, the input set I'm trying to work with is about 90GB (avro parquet