[jira] [Created] (MAPREDUCE-4030) If the nodemanager on which the maptask is executed is going down before the mapoutput is consumed by the reducer,then the job is failing with shuffle error

Nishan Shetty (Created) (JIRA) Sun, 18 Mar 2012 22:44:16 -0700

If the nodemanager on which the maptask is executed is going down before the 
mapoutput is consumed by the reducer,then the job is failing with shuffle error
------------------------------------------------------------------------------------------------------------------------------------------------------------


                 Key: MAPREDUCE-4030
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4030
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: mrv2
            Reporter: Nishan Shetty


My cluster has 2 NMs.
The value of "mapreduce.job.reduce.slowstart.completedmaps" is set to 1.
While the job was running and the mappers were about 99% complete, 
one of the NMs went down.
The job then failed with the following trace:

"Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#1
    at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:123)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:371)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:148)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:143)
Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
    at org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.checkReducerHealth(ShuffleScheduler.java:253)
    at org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.copyFailed(ShuffleScheduler.java:187)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:240)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:152)"
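
For reference, the slow-start setting mentioned above could be expressed as 
the following property. This is only a sketch: placing it in mapred-site.xml 
is an assumption, since the report does not say where the value was set 
(it could equally have been passed per job). A value of 1 means reducers are 
not scheduled until all map tasks have completed.

```xml
<!-- Assumed location: mapred-site.xml (not stated in the report) -->
<property>
  <name>mapreduce.job.reduce.slowstart.completedmaps</name>
  <!-- Fraction of maps that must complete before reduces are scheduled -->
  <value>1</value>
</property>
```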

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
