MapReduce jobs hanging or failing near completion

Kai Ju Liu Thu, 07 Jul 2011 17:43:49 -0700

Over the past week or two, I've run into an issue where MapReduce jobs hang
or fail near completion. The percent completion of both map and reduce tasks
is often reported as 100%, but the number of completed tasks is less than
the total. It appears that either tasks backtrack and have to be restarted,
or the last few reduce tasks hang indefinitely on the copy step.


In some cases, the jobs do eventually complete. In others, they run for so
long that I have to kill them manually.

My Hadoop cluster is hosted in EC2 on instances of type c1.xlarge with 4
attached EBS volumes. The instances run Ubuntu 10.04.1 with the
2.6.32-309-ec2 kernel, and I'm currently using Cloudera's CDH3u0
distribution. Has anyone experienced similar behavior in their clusters, and
if so, had any luck resolving it? Thanks!

Kai Ju
