[ https://issues.apache.org/jira/browse/MAPREDUCE-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174947#comment-13174947 ]
Bh V S Kamesh commented on MAPREDUCE-3360:
------------------------------------------

Hi Jason,

Thanks for the comments. I will incorporate them in my next patch. Before submitting it, though, I would like to clarify one thing.

When the RM does not receive a heartbeat from an NM for the *node expiry* interval, the RM removes the NM from its RM nodes map on the node *EXPIRE* event. Before removing the NM, the corresponding cluster metrics are updated (in this case, the *lost* node count is incremented).

If the same NM sends a heartbeat after this, the RM checks whether any node corresponds to this NodeId. If it finds none, the RM simply returns *reboot* as its heartbeat response. Before sending that response, the RM updates the cluster metrics again (this time incrementing the *reboot* node count).

Is it necessary to update different metrics for the same node's unavailability? IMO, it shows incorrect information. I *think* we should update either the *lost* node count or the *reboot* node count in this circumstance, but not both.

Any comments?

> Provide information about lost nodes in the UI.
> -----------------------------------------------
>
>                 Key: MAPREDUCE-3360
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3360
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv2
>    Affects Versions: 0.23.0
>         Environment: NA
>            Reporter: Bh V S Kamesh
>         Attachments: LostNodes.png, MAPREDUCE-3360-1.patch, MAPREDUCE-3360.patch, lostNodes.png
>
>
> Currently there is no information provided about *lost nodes*. Provide information in the UI.
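
To make the double-counting scenario from the comment concrete, here is a minimal, self-contained sketch. It is a toy model only, not the actual ResourceManager code: the class `LostNodeMetricsSketch`, its methods, the counters, and the node id `nm1:1234` are all hypothetical names chosen for illustration, and the expiry check is simplified to a timestamp comparison.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the expire-then-heartbeat sequence; NOT the real RM implementation. */
public class LostNodeMetricsSketch {
    enum HeartbeatResponse { NORMAL, REBOOT }

    // nodeId -> time of last heartbeat in ms (stand-in for the RM nodes map)
    private final Map<String, Long> nodes = new HashMap<>();
    private final long expiryIntervalMs;

    // Hypothetical stand-ins for the cluster metrics counters
    int lostNodes = 0;
    int rebootedNodes = 0;

    LostNodeMetricsSketch(long expiryIntervalMs) {
        this.expiryIntervalMs = expiryIntervalMs;
    }

    void register(String nodeId, long now) {
        nodes.put(nodeId, now);
    }

    /** Models the EXPIRE event: count the node as lost, then drop it from the map. */
    void expireStaleNodes(long now) {
        nodes.entrySet().removeIf(e -> {
            boolean expired = now - e.getValue() > expiryIntervalMs;
            if (expired) {
                lostNodes++;
            }
            return expired;
        });
    }

    /** Models heartbeat handling: an unknown NodeId is told to reboot and counted again. */
    HeartbeatResponse heartbeat(String nodeId, long now) {
        if (!nodes.containsKey(nodeId)) {
            rebootedNodes++;
            return HeartbeatResponse.REBOOT;
        }
        nodes.put(nodeId, now);
        return HeartbeatResponse.NORMAL;
    }

    public static void main(String[] args) {
        LostNodeMetricsSketch rm = new LostNodeMetricsSketch(1000);
        rm.register("nm1:1234", 0);
        rm.expireStaleNodes(5000);                             // node expires: lost = 1
        HeartbeatResponse r = rm.heartbeat("nm1:1234", 6000);  // late heartbeat: reboot = 1
        System.out.println(r + " lost=" + rm.lostNodes + " reboot=" + rm.rebootedNodes);
        // prints "REBOOT lost=1 reboot=1"
    }
}
```

In this model a single NM outage moves both counters, which is the behaviour the comment questions. Counting the node only once would mean either skipping the reboot increment for a NodeId that was already counted as lost, or decrementing the lost count when that node comes back.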