[ https://issues.apache.org/jira/browse/MAPREDUCE-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215952#comment-13215952 ]
Vinod Kumar Vavilapalli commented on MAPREDUCE-3353:
----------------------------------------------------

+1 for the latest proposal; we can use NodeListManager itself instead of a new ClusterManager.

In addition, we need AMs to act on the node information. Filing a separate ticket.

> Need a RM->AM channel to inform AMs about faulty/unhealthy/lost nodes
> ---------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3353
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3353
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, mrv2, resourcemanager
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Bikas Saha
>            Priority: Critical
>             Fix For: 0.23.2
>
>
> When a node gets lost or turns faulty, the AM needs to know about that event so
> that it can take some action, such as re-executing map tasks whose
> intermediate output lives on that faulty node.
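The AM-side action described in the issue (re-running completed maps whose output sits on a bad node) can be sketched as below. This is a minimal illustration under stated assumptions, not the YARN/MapReduce API: it assumes the RM delivers per-node state updates to the AM over the new channel, and every name here (`FaultyNodeHandler`, `NodeUpdate`, `NodeState`, `CompletedMap`, `mapsToRerun`) is hypothetical. The comment above only settles the RM side (reusing NodeListManager); the AM behaviour is deferred to a separate ticket.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical AM-side handler. All type and method names are illustrative
 * assumptions, not part of the actual YARN/MapReduce API.
 */
public class FaultyNodeHandler {

    // Assumed node states the RM might report to an AM.
    enum NodeState { RUNNING, UNHEALTHY, LOST }

    // Assumed shape of one node update sent over the RM->AM channel.
    record NodeUpdate(String nodeId, NodeState state) {}

    // A completed map task and the node holding its intermediate output.
    record CompletedMap(String taskId, String outputNode) {}

    /**
     * Returns the IDs of completed map tasks whose intermediate output lives on
     * a node the RM currently reports as unhealthy or lost; reducers could no
     * longer fetch that output, so these maps would need to be re-executed.
     */
    static List<String> mapsToRerun(List<NodeUpdate> updates, List<CompletedMap> completed) {
        Set<String> badNodes = new HashSet<>();
        for (NodeUpdate u : updates) {
            if (u.state() == NodeState.UNHEALTHY || u.state() == NodeState.LOST) {
                badNodes.add(u.nodeId());
            } else {
                // A later healthy report for the same node clears it.
                badNodes.remove(u.nodeId());
            }
        }
        List<String> rerun = new ArrayList<>();
        for (CompletedMap m : completed) {
            if (badNodes.contains(m.outputNode())) {
                rerun.add(m.taskId());
            }
        }
        return rerun;
    }

    public static void main(String[] args) {
        List<NodeUpdate> updates = List.of(
            new NodeUpdate("node1", NodeState.RUNNING),
            new NodeUpdate("node2", NodeState.LOST),
            new NodeUpdate("node3", NodeState.UNHEALTHY));
        List<CompletedMap> completed = List.of(
            new CompletedMap("m_0", "node1"),
            new CompletedMap("m_1", "node2"),
            new CompletedMap("m_2", "node3"),
            new CompletedMap("m_3", "node2"));
        System.out.println(mapsToRerun(updates, completed)); // prints [m_1, m_2, m_3]
    }
}
```

Processing updates in arrival order lets a node that recovers (a later RUNNING report) drop out of the bad set, so its maps are not needlessly re-run; a real AM would also have to handle maps still in flight on those nodes, which this sketch ignores.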