[ https://issues.apache.org/jira/browse/MAPREDUCE-5617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sunil G updated MAPREDUCE-5617:
-------------------------------
    Attachment: Yarn.7.1.patch

> map task is not re-launched when the task is failed while reducers are
> running with full cluster capacity - which will lead to job hang
> ----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5617
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5617
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>         Environment: SuSe Linux
>            Reporter: Sunil G
>            Priority: Critical
>         Attachments: Yarn.7.1.patch, Yarn.7.2.patch
>
>
> In a cluster with 16 GB capacity, a job was started with 100 maps and 10
> reducers.
> After the reducers had started executing, one NM went down, which caused
> 2 maps to fail. At that point the remaining 8 GB was fully used by 6
> reducers and the AM, so there was no room to launch the failed maps. [The NM
> never came back up, so the cluster size became 8 GB.]
> Even if we kill one of the reducers, the map still cannot be launched,
> because the priority of a failed map is lower than that of a reducer. The RM
> therefore allocates the freed capacity only to one of the remaining reducers.
> This causes the job to hang on the reducer side.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
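
The allocation behaviour described in the report can be illustrated with a small toy model. This is only a sketch, not Hadoop's actual allocator code: the function `allocate`, the helper `pending_requests`, the rank values, and the 1024 MB container size are all hypothetical and chosen for illustration. It assumes a strict priority order in which a smaller rank is served first, and it takes the report's statement about the relative priority of failed maps and reducers at face value.

```python
import heapq


def allocate(free_mb, pending):
    """Grant containers in strict priority order while memory lasts.

    `pending` is a list of (rank, name, size_mb) tuples; a smaller rank is
    served first. Allocation stops as soon as the head request does not fit.
    """
    queue = list(pending)
    heapq.heapify(queue)
    granted = []
    while queue and queue[0][2] <= free_mb:
        _rank, name, size_mb = heapq.heappop(queue)
        free_mb -= size_mb
        granted.append(name)
    return granted


def pending_requests(failed_map_rank, reduce_rank, size_mb=1024):
    """Build the outstanding requests from the report: 2 failed maps and the
    4 reducers not yet running (10 total, 6 running). Sizes are assumed."""
    maps = [(failed_map_rank, f"map_failed_{i}", size_mb) for i in range(2)]
    reducers = [(reduce_rank, f"reduce_{i}", size_mb) for i in range(6, 10)]
    return maps + reducers


# Ordering as described in the report: reducers are served before failed maps.
as_reported = pending_requests(failed_map_rank=2, reduce_rank=1)

# Alternative ordering: failed maps are served before reducers.
maps_first = pending_requests(failed_map_rank=1, reduce_rank=2)

# Killing one reducer frees one container's worth of memory (assumed 1024 MB).
freed_mb = 1024
granted_as_reported = allocate(freed_mb, as_reported)
granted_maps_first = allocate(freed_mb, maps_first)
```

Under the reported ordering, the freed memory goes to another reducer, so the failed maps never run and the reducers keep waiting for their output. With the alternative ordering, the same freed memory goes to a failed map instead.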