Hi! I'm experiencing hung reducers, with the following symptoms:
> Task Logs: 'task_200807230647_0008_r_000009_1'
>
> stdout logs
>
> stderr logs
>
> syslog logs
>
> red.ReduceTask: task_200807230647_0008_r_000009_1 Got 0 known map output location(s); scheduling...
> 2008-07-24 07:56:11,064 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Scheduled 0 of 0 known outputs (0 slow hosts and 0 dup hosts)
> 2008-07-24 07:56:16,073 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Need 6 map output(s)
> 2008-07-24 07:56:16,074 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1: Got 0 new map-outputs & 0 obsolete map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-07-24 07:56:16,074 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Got 0 known map output location(s); scheduling...
> 2008-07-24 07:56:16,074 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Scheduled 0 of 0 known outputs (0 slow hosts and 0 dup hosts)
> 2008-07-24 07:56:21,083 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Need 6 map output(s)
> 2008-07-24 07:56:21,084 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1: Got 0 new map-outputs & 0 obsolete map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-07-24 07:56:21,084 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Got 0 known map output location(s); scheduling...
> 2008-07-24 07:56:21,084 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Scheduled 0 of 0 known outputs (0 slow hosts and 0 dup hosts)
> 2008-07-24 07:56:26,093 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Need 6 map output(s)
> 2008-07-24 07:56:26,094 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1: Got 0 new map-outputs & 0 obsolete map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-07-24 07:56:26,094 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Got 0 known map output location(s); scheduling...
> 2008-07-24 07:56:26,094 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Scheduled 0 of 0 known outputs (0 slow hosts and 0 dup hosts)
> 2008-07-24 07:56:31,103 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Need 6 map output(s)
> 2008-07-24 07:56:31,104 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1: Got 0 new map-outputs & 0 obsolete map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-07-24 07:56:31,104 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Got 0 known map output location(s); scheduling...
> 2008-07-24 07:56:31,104 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Scheduled 0 of 0 known outputs (0 slow hosts and 0 dup hosts)
> 2008-07-24 07:56:36,113 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Need 6 map output(s)
> 2008-07-24 07:56:36,114 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1: Got 0 new map-outputs & 0 obsolete map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-07-24 07:56:36,114 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Got 0 known map output location(s); scheduling...
> 2008-07-24 07:56:36,114 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Scheduled 0 of 0 known outputs (0 slow hosts and 0 dup hosts)
> 2008-07-24 07:56:41,123 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Need 6 map output(s)
> 2008-07-24 07:56:41,126 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1: Got 0 new map-outputs & 0 obsolete map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-07-24 07:56:41,126 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Got 0 known map output location(s); scheduling...
> 2008-07-24 07:56:41,126 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Scheduled 0 of 0 known outputs (0 slow hosts and 0 dup hosts)

Notice that the reducer still needs 6 map outputs even though all map tasks have finished, and it just hangs there. The second, speculative copy of that reduce task needs 14 map outputs and logs the same messages :(

Other observations: killing the reduce task via job -killtask only results in the task being restarted on the same node, and curiously the new attempt gets jammed at the same position (6/14 maps needed).

The only remedy to this problem seems to be a complete restart of the cluster and reprocessing. That gets really boring with jobs that took a day to process the first time :(

Andreas
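
P.S. For anyone who wants to spot the same pattern in their own task logs, here is a minimal sketch in Python. It assumes syslog lines in the format quoted above, one entry per line. The function name find_stalled_reducers and the min_repeats threshold are made up for illustration and are not part of Hadoop. An unchanged "Need N map output(s)" count is only a heuristic: a reducer waiting on maps that are genuinely still running looks the same.

```python
import re
from collections import defaultdict

# Matches entries such as:
# 2008-07-24 07:56:16,073 INFO org.apache.hadoop.mapred.ReduceTask: task_..._r_000009_1 Need 6 map output(s)
NEED_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} INFO "
    r"org\.apache\.hadoop\.mapred\.ReduceTask: "
    r"(task_\S+?):? Need (\d+) map output\(s\)"
)


def find_stalled_reducers(lines, min_repeats=3):
    """Return {task_id: needed} for tasks whose 'Need N' count never changed.

    Hypothetical helper: a task is reported only if it logged the same
    count at least min_repeats times.
    """
    seen = defaultdict(list)
    for line in lines:
        match = NEED_RE.search(line)
        if match:
            seen[match.group(2)].append(int(match.group(3)))
    return {
        task: counts[-1]
        for task, counts in seen.items()
        if len(counts) >= min_repeats and len(set(counts)) == 1
    }
```

Feeding it the lines of a reduce task's syslog (for example, the contents of an open file) returns the task IDs that kept asking for the same number of map outputs, which is exactly what the log above shows for 6 outputs.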