Hi!

I'm experiencing hung reducers, with the following symptoms:

> Task Logs: 'task_200807230647_0008_r_000009_1'
>
>
> stdout logs
>
>
>
> stderr logs
>
>
>
> syslog logs
>
> red.ReduceTask: task_200807230647_0008_r_000009_1 Got 0 known map output location(s); scheduling...
> 2008-07-24 07:56:11,064 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Scheduled 0 of 0 known outputs (0 slow hosts and 0 dup hosts)
> 2008-07-24 07:56:16,073 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Need 6 map output(s)
> 2008-07-24 07:56:16,074 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1: Got 0 new map-outputs & 0 obsolete map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-07-24 07:56:16,074 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Got 0 known map output location(s); scheduling...
> 2008-07-24 07:56:16,074 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Scheduled 0 of 0 known outputs (0 slow hosts and 0 dup hosts)
> 2008-07-24 07:56:21,083 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Need 6 map output(s)
> 2008-07-24 07:56:21,084 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1: Got 0 new map-outputs & 0 obsolete map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-07-24 07:56:21,084 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Got 0 known map output location(s); scheduling...
> 2008-07-24 07:56:21,084 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Scheduled 0 of 0 known outputs (0 slow hosts and 0 dup hosts)
> 2008-07-24 07:56:26,093 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Need 6 map output(s)
> 2008-07-24 07:56:26,094 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1: Got 0 new map-outputs & 0 obsolete map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-07-24 07:56:26,094 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Got 0 known map output location(s); scheduling...
> 2008-07-24 07:56:26,094 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Scheduled 0 of 0 known outputs (0 slow hosts and 0 dup hosts)
> 2008-07-24 07:56:31,103 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Need 6 map output(s)
> 2008-07-24 07:56:31,104 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1: Got 0 new map-outputs & 0 obsolete map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-07-24 07:56:31,104 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Got 0 known map output location(s); scheduling...
> 2008-07-24 07:56:31,104 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Scheduled 0 of 0 known outputs (0 slow hosts and 0 dup hosts)
> 2008-07-24 07:56:36,113 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Need 6 map output(s)
> 2008-07-24 07:56:36,114 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1: Got 0 new map-outputs & 0 obsolete map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-07-24 07:56:36,114 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Got 0 known map output location(s); scheduling...
> 2008-07-24 07:56:36,114 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Scheduled 0 of 0 known outputs (0 slow hosts and 0 dup hosts)
> 2008-07-24 07:56:41,123 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Need 6 map output(s)
> 2008-07-24 07:56:41,126 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1: Got 0 new map-outputs & 0 obsolete map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-07-24 07:56:41,126 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Got 0 known map output location(s); scheduling...
> 2008-07-24 07:56:41,126 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Scheduled 0 of 0 known outputs (0 slow hosts and 0 dup hosts)


Notice that the reducer still needs 6 map outputs even though all map tasks
have finished. It never learns of any new map output locations and just hangs
there.

The second, speculative attempt of that reduce task still needs 14 map outputs
and logs the same messages :(
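
A rough sketch of how this symptom could be spotted automatically in a reducer's syslog, assuming the message wording shown in the log above. The function names and the three-poll threshold are made up for illustration and are not part of Hadoop:

```python
import re

# Patterns taken from the ReduceTask messages quoted in the log above.
NEED_RE = re.compile(r"Need (\d+) map output\(s\)")
GOT_RE = re.compile(r"Got (\d+) new map-outputs")


def shuffle_progress(syslog_text):
    """Return (need_counts, new_output_counts) in the order they were logged."""
    # Collapse line wraps so messages split across lines still match.
    flat = " ".join(syslog_text.split())
    needs = [int(m.group(1)) for m in NEED_RE.finditer(flat)]
    gots = [int(m.group(1)) for m in GOT_RE.finditer(flat)]
    return needs, gots


def looks_stalled(syslog_text, min_polls=3):
    """Heuristic: the last `min_polls` polls need the same non-zero number
    of map outputs and none of them fetched a new map output."""
    needs, gots = shuffle_progress(syslog_text)
    if len(needs) < min_polls or len(gots) < min_polls:
        return False
    recent_needs = needs[-min_polls:]
    recent_gots = gots[-min_polls:]
    return (
        recent_needs[0] > 0
        and len(set(recent_needs)) == 1
        and not any(recent_gots)
    )
```

Feeding it the syslog text of a hung attempt should return True from looks_stalled, while a reducer whose "Need" count keeps dropping should return False. It is only a heuristic: a reducer that is legitimately waiting for maps that are still running looks the same in the log, so this is only meaningful once all maps have finished.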

Other observations:

Killing the reduce tasks via job -killtask only gets the task restarted on
the same node, and curiously the new attempt gets jammed at the same position
(6/14 maps needed).

The only remedy for this problem seems to be a complete restart of the cluster
and reprocessing. That gets really tedious with jobs that took a day to
process in the first place :(

Andreas
