Hi, They were indeed reported as failed and speculative execution is disabled because this particular job is not side-effect free. We've seen this before but it's rare.
Thanks > Exit code 137 would mean a SIGKILL. Are you positive its reported as a > failed task and not as a killed task? If not, it may just be a result of > speculative reducer execution at work and is nothing to worry about. > > On 27-Dec-2011, at 1:14 AM, Markus Jelsma wrote: > > Hi, > > > > We sometimes see reducers fail just when all mappers are finishing. All > > mappers finish roughly at the same time. The reducers only dump the > > following exception: > > > > java.lang.Throwable: Child Error > > > > at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271) > > > > Caused by: java.io.IOException: Task process exit with nonzero status of > > 137. > > > > at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258) > > > > The reducers own log output also shows nothing that gives a clue, this is > > the last part of the log: > > > > 2011-12-26 19:35:19,116 INFO org.apache.hadoop.io.compress.CodecPool: Got > > brand-new decompressor > > 2011-12-26 19:35:19,117 INFO org.apache.hadoop.io.compress.CodecPool: Got > > brand-new decompressor > > 2011-12-26 19:35:19,117 INFO org.apache.hadoop.io.compress.CodecPool: Got > > brand-new decompressor > > 2011-12-26 19:35:19,120 INFO org.apache.hadoop.mapred.ReduceTask: > > attempt_201112261420_0006_r_000009_0 Thread started: Thread for merging > > on- disk files > > 2011-12-26 19:35:19,120 INFO org.apache.hadoop.mapred.ReduceTask: > > attempt_201112261420_0006_r_000009_0 Thread waiting: Thread for merging > > on- disk files > > 2011-12-26 19:35:19,121 INFO org.apache.hadoop.mapred.ReduceTask: > > attempt_201112261420_0006_r_000009_0 Thread started: Thread for merging > > in memory files > > 2011-12-26 19:35:19,122 INFO org.apache.hadoop.mapred.ReduceTask: > > attempt_201112261420_0006_r_000009_0 Need another 50 map output(s) where > > 0 is already in progress > > 2011-12-26 19:35:19,122 INFO org.apache.hadoop.mapred.ReduceTask: > > attempt_201112261420_0006_r_000009_0 Thread started: Thread for polling > > Map Completion Events > > 2011-12-26 19:35:19,122 INFO org.apache.hadoop.mapred.ReduceTask: > > attempt_201112261420_0006_r_000009_0 Scheduled 0 outputs (0 slow hosts > > and0 dup hosts) > > 2011-12-26 19:35:24,124 INFO org.apache.hadoop.mapred.ReduceTask: > > attempt_201112261420_0006_r_000009_0 Scheduled 2 outputs (0 slow hosts > > and0 dup hosts) > > 2011-12-26 19:35:25,805 INFO org.apache.hadoop.mapred.ReduceTask: > > attempt_201112261420_0006_r_000009_0 Scheduled 1 outputs (0 slow hosts > > and0 dup hosts) > > 2011-12-26 19:36:21,578 INFO org.apache.hadoop.mapred.ReduceTask: > > attempt_201112261420_0006_r_000009_0 Need another 47 map output(s) where > > 0 is already in progress > > 2011-12-26 19:36:21,593 INFO org.apache.hadoop.mapred.ReduceTask: > > attempt_201112261420_0006_r_000009_0 Scheduled 1 outputs (0 slow hosts > > and0 dup hosts) > > 2011-12-26 19:36:42,412 INFO org.apache.hadoop.mapred.ReduceTask: > > attempt_201112261420_0006_r_000009_0 Scheduled 1 outputs (0 slow hosts > > and0 dup hosts) > > > > Is there any advice? > > Thanks