Hi,

They were indeed reported as failed and speculative execution is disabled 
because this particular job is not side-effect free. We've seen this before 
but it's rare.

Thanks

> Exit code 137 would mean a SIGKILL. Are you positive its reported as a
> failed task and not as a killed task? If not, it may just be a result of
> speculative reducer execution at work and is nothing to worry about.
> 
> On 27-Dec-2011, at 1:14 AM, Markus Jelsma wrote:
> > Hi,
> > 
> > We sometimes see reducers fail just when all mappers are finishing. All
> > mappers finish roughly at the same time. The reducers only dump the
> > following exception:
> > 
> > java.lang.Throwable: Child Error
> > 
> >     at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
> > 
> > Caused by: java.io.IOException: Task process exit with nonzero status of
> > 137.
> > 
> >     at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)
> > 
> > The reducers own log output also shows nothing that gives a clue, this is
> > the last part of the log:
> > 
> > 2011-12-26 19:35:19,116 INFO org.apache.hadoop.io.compress.CodecPool: Got
> > brand-new decompressor
> > 2011-12-26 19:35:19,117 INFO org.apache.hadoop.io.compress.CodecPool: Got
> > brand-new decompressor
> > 2011-12-26 19:35:19,117 INFO org.apache.hadoop.io.compress.CodecPool: Got
> > brand-new decompressor
> > 2011-12-26 19:35:19,120 INFO org.apache.hadoop.mapred.ReduceTask:
> > attempt_201112261420_0006_r_000009_0 Thread started: Thread for merging
> > on- disk files
> > 2011-12-26 19:35:19,120 INFO org.apache.hadoop.mapred.ReduceTask:
> > attempt_201112261420_0006_r_000009_0 Thread waiting: Thread for merging
> > on- disk files
> > 2011-12-26 19:35:19,121 INFO org.apache.hadoop.mapred.ReduceTask:
> > attempt_201112261420_0006_r_000009_0 Thread started: Thread for merging
> > in memory files
> > 2011-12-26 19:35:19,122 INFO org.apache.hadoop.mapred.ReduceTask:
> > attempt_201112261420_0006_r_000009_0 Need another 50 map output(s) where
> > 0 is already in progress
> > 2011-12-26 19:35:19,122 INFO org.apache.hadoop.mapred.ReduceTask:
> > attempt_201112261420_0006_r_000009_0 Thread started: Thread for polling
> > Map Completion Events
> > 2011-12-26 19:35:19,122 INFO org.apache.hadoop.mapred.ReduceTask:
> > attempt_201112261420_0006_r_000009_0 Scheduled 0 outputs (0 slow hosts
> > and0 dup hosts)
> > 2011-12-26 19:35:24,124 INFO org.apache.hadoop.mapred.ReduceTask:
> > attempt_201112261420_0006_r_000009_0 Scheduled 2 outputs (0 slow hosts
> > and0 dup hosts)
> > 2011-12-26 19:35:25,805 INFO org.apache.hadoop.mapred.ReduceTask:
> > attempt_201112261420_0006_r_000009_0 Scheduled 1 outputs (0 slow hosts
> > and0 dup hosts)
> > 2011-12-26 19:36:21,578 INFO org.apache.hadoop.mapred.ReduceTask:
> > attempt_201112261420_0006_r_000009_0 Need another 47 map output(s) where
> > 0 is already in progress
> > 2011-12-26 19:36:21,593 INFO org.apache.hadoop.mapred.ReduceTask:
> > attempt_201112261420_0006_r_000009_0 Scheduled 1 outputs (0 slow hosts
> > and0 dup hosts)
> > 2011-12-26 19:36:42,412 INFO org.apache.hadoop.mapred.ReduceTask:
> > attempt_201112261420_0006_r_000009_0 Scheduled 1 outputs (0 slow hosts
> > and0 dup hosts)
> > 
> > Is there any advice?
> > Thanks

Reply via email to