Job tracker not responding during streaming job

David Kellogg Mon, 06 Apr 2009 15:18:12 -0700

I am running Hadoop streaming. After around 42 jobs on an 18-node cluster, the jobtracker stops responding. This happens on normally-working code. Here are the symptoms.


1. A job is running, but it pauses with reduce stuck at XX%
2. "hadoop job -list" hangs or takes a very long time to return
3. In the Ganglia metrics on the Jobtracker node:

     a. jvm.metrics__JobTracker__gcTimeMillis rises above 20 k (20 seconds) before failure
     b. jvm.metrics__JobTracker__memHeapUsedM rises above 600 before failure
     c. jvm.metrics__JobTracker__gcCount rises above 1 k before failure
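
For anyone watching for the same symptoms, here is a minimal Python sketch that flags a metrics sample once it crosses the levels listed above. The threshold values come from this report. The `THRESHOLDS` table, the `exceeded` function, and the idea of passing samples in as a plain dict are assumptions made for illustration; this is not a Ganglia or Hadoop API.

```python
# Levels observed before failure in this report (not official limits).
THRESHOLDS = {
    "jvm.metrics__JobTracker__gcTimeMillis": 20_000,  # 20 k ms = 20 seconds
    "jvm.metrics__JobTracker__memHeapUsedM": 600,
    "jvm.metrics__JobTracker__gcCount": 1_000,        # 1 k collections
}


def exceeded(sample):
    """Return the sorted names of metrics in `sample` above their threshold.

    `sample` is a hypothetical dict of metric name -> latest value.
    Metrics missing from the sample are treated as not exceeded.
    """
    return sorted(
        name
        for name, limit in THRESHOLDS.items()
        if sample.get(name, 0) > limit
    )
```

Calling `exceeded(sample)` on each polling interval returns an empty list while the jobtracker is healthy and the offending metric names once any of the three levels is crossed.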



The ticker looks like this.

09/04/06 03:06:28 INFO streaming.StreamJob:  map 24%  reduce 7%
09/04/06 03:13:44 INFO streaming.StreamJob:  map 25%  reduce 7%

After the 03:13:44 line, it hangs for more than 15 minutes.
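
A stall like this can be spotted from the ticker itself by measuring the time between progress lines. The following is a rough sketch, assuming the ticker lines always have the exact shape shown above (timestamp as yy/MM/dd HH:mm:ss, then `INFO streaming.StreamJob:`). The names `TICKER`, `parse_ticker`, and `gaps_seconds` are hypothetical helpers, not part of Hadoop.

```python
import re
from datetime import datetime

# Matches lines such as:
# 09/04/06 03:06:28 INFO streaming.StreamJob:  map 24%  reduce 7%
TICKER = re.compile(
    r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) INFO streaming\.StreamJob:\s+"
    r"map (\d+)%\s+reduce (\d+)%"
)


def parse_ticker(lines):
    """Return (timestamp, map_pct, reduce_pct) for each ticker line; skip others."""
    entries = []
    for line in lines:
        match = TICKER.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%y/%m/%d %H:%M:%S")
            entries.append((stamp, int(match.group(2)), int(match.group(3))))
    return entries


def gaps_seconds(entries):
    """Return the seconds elapsed between each pair of consecutive entries."""
    return [
        (later[0] - earlier[0]).total_seconds()
        for earlier, later in zip(entries, entries[1:])
    ]
```

For the two lines quoted above, `gaps_seconds` reports a single gap of 436 seconds for one percentage point of map progress. A monitoring script could compare the time since the last entry against a chosen limit to decide that the job has stalled.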

In the jobtracker log, I see this.

2009-04-04 04:19:13,563 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-8143535428142072268_95993 failed because recovery from primary datanode 10.1.0.156:50010 failed 4 times. Will retry...

After restarting both dfs and mapreduce on all nodes, the problem goes away, and the formerly non-working job proceeds without failure.


Does anyone else see this problem?

David Kellogg
