[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated MAPREDUCE-5617:
-------------------------------

    Attachment: Yarn.7.2.patch

> map task is not re-launched when the task is failed while reducers are 
> running with full cluster capacity - which will lead to job hang
> ---------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5617
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5617
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>         Environment: SuSe Linux
>            Reporter: Sunil G
>            Priority: Critical
>         Attachments: Yarn.7.1.patch, Yarn.7.2.patch
>
>
> In a cluster with 16 GB capacity, a job was started with 100 maps and 10 
> reducers. 
> After the reducers had started executing, one NM went down, which caused 2 
> maps to fail. At that point the remaining 8 GB was in use by 6 reducers and 
> the AM, so there was no room to launch the failed maps. [The NM never came 
> back up, so the cluster size stayed at 8 GB.]
> Even if we kill one of the reducers, the failed map still cannot be 
> launched, because the priority of the failed map is lower than that of a 
> reducer. The RM therefore allocates containers only to the remaining 
> reducers.
> This causes the job to hang on the reducer side. 
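
The starvation described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the MapReduce AM or RM allocator code: the class FailedMapStarvationSketch, the Request type, the numeric ranks, and the 1 GB container size are all hypothetical (the report does not give container sizes). It assumes pending requests are served strictly in rank order, with reducers ranked ahead of failed maps as the report describes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Toy model only: none of these names are Hadoop or YARN classes.
class FailedMapStarvationSketch {

    /** A pending container request in this toy model. */
    static final class Request {
        final String kind;   // "reducer" or "failed-map"
        final int rank;      // lower rank is served first (assumption)
        final int memoryGb;  // container size, assumed 1 GB in the example

        Request(String kind, int rank, int memoryGb) {
            this.kind = kind;
            this.rank = rank;
            this.memoryGb = memoryGb;
        }
    }

    /**
     * Serves pending requests strictly by rank and returns the kinds of the
     * requests that were granted, in order. Allocation stops at the first
     * request that does not fit, so a lower-ranked request never overtakes
     * a higher-ranked one.
     */
    static List<String> allocate(List<Request> pending, int freeGb) {
        PriorityQueue<Request> queue =
            new PriorityQueue<>(Comparator.comparingInt((Request r) -> r.rank));
        queue.addAll(pending);

        List<String> granted = new ArrayList<>();
        while (!queue.isEmpty() && queue.peek().memoryGb <= freeGb) {
            Request next = queue.poll();
            freeGb -= next.memoryGb;
            granted.add(next.kind);
        }
        return granted;
    }

    public static void main(String[] args) {
        // 6 of the 10 reducers are running, so 4 reducer requests are still
        // pending alongside the 2 failed maps.
        List<Request> pending = Arrays.asList(
            new Request("failed-map", 2, 1),
            new Request("failed-map", 2, 1),
            new Request("reducer", 1, 1),
            new Request("reducer", 1, 1),
            new Request("reducer", 1, 1),
            new Request("reducer", 1, 1));

        // Killing one reducer frees 1 GB (assumed container size).
        System.out.println(allocate(pending, 1)); // prints [reducer]
    }
}
```

In this model the freed capacity always goes to another reducer while any reducer request is pending, so the failed maps are never re-launched and the running reducers wait indefinitely for map output.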



--
This message was sent by Atlassian JIRA
(v6.2#6252)
