[ https://issues.apache.org/jira/browse/MAPREDUCE-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175120#comment-13175120 ]
Jason Lowe commented on MAPREDUCE-3360:
---------------------------------------

The reboot case is interesting since the code does not seem to use {{RMNodeState.REBOOT}}. Therefore the status page for "Rebooted Nodes" will always be empty even with this patch.

This makes me wonder whether the metrics for the inactive states were intended to be current or historical. If the latter is the intent, then the node status pages for LOST, REBOOT, etc. would list all nodes that went through those states since the RM started. For example, it might be useful to know that a particular node has been lost many times even though that node is currently active.

If we want the metrics to track the number of nodes currently in a given state, as the current patch does, then we should either remove the "Rebooted Nodes" column from the metrics, since nodes will never be in the REBOOT state, or add code to support the REBOOT state. For example, we'd have to update the {{StateMachineFactory}} in RMNodeImpl.java to transition to the REBOOT state instead of the LOST state when the REBOOTING event is delivered. And in the code you ran across, we'd probably not want to increment the reboot count if the node is already in the inactive nodes map. If it wasn't in the map, we'd increment the reboot counter and add it to the inactive node map.

I guess this brings up the question of whether we should allow the counts to ever get out of sync with what's stored in the inactive node map. Currently they're tracked separately and could become out of sync (like the code you ran across), but it would be easier to guarantee they're consistent if the counts were computed from the inactive node collection directly (e.g., by scanning the inactive map or having a map per inactive node state).

> Provide information about lost nodes in the UI.
> -----------------------------------------------
>
>                 Key: MAPREDUCE-3360
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3360
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv2
>    Affects Versions: 0.23.0
>         Environment: NA
>            Reporter: Bh V S Kamesh
>         Attachments: LostNodes.png, MAPREDUCE-3360-1.patch, MAPREDUCE-3360.patch, lostNodes.png
>
>
> Currently there is no information provided about *lost nodes*. Provide information in the UI.
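
To illustrate the last point in the comment above (computing the counts from the inactive node collection instead of tracking them separately), here is a minimal sketch. It is not the RM code: {{InactiveNodeTracker}}, its methods, and the nested {{NodeState}} enum are hypothetical stand-ins, and the state names simply mirror the ones used in the comment. It assumes a single map keyed by node id is the only record of inactive nodes.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the RM's bookkeeping; not an actual Hadoop class.
class InactiveNodeTracker {

    // Illustrative inactive states, named after those mentioned in the comment.
    enum NodeState { LOST, REBOOT, DECOMMISSIONED }

    // Single source of truth: node id -> the inactive state the node is in now.
    private final Map<String, NodeState> inactiveNodes = new ConcurrentHashMap<>();

    // Record that a node became inactive. A repeated event for a node already
    // in the map only overwrites its entry, so no count can be inflated.
    void markInactive(String nodeId, NodeState state) {
        inactiveNodes.put(nodeId, state);
    }

    // Drop a node that has rejoined the cluster.
    void markActive(String nodeId) {
        inactiveNodes.remove(nodeId);
    }

    // Derive the per-state counts by scanning the map, so the counts are
    // always consistent with the map's contents at the time of the scan.
    Map<NodeState, Integer> countsByState() {
        Map<NodeState, Integer> counts = new EnumMap<>(NodeState.class);
        for (NodeState s : NodeState.values()) {
            counts.put(s, 0);
        }
        for (NodeState s : inactiveNodes.values()) {
            counts.merge(s, 1, Integer::sum);
        }
        return counts;
    }
}
```

The trade-off is a linear scan on every metrics read. The alternative mentioned in the comment, one map per inactive state, would make each count a {{size()}} call at the cost of moving nodes between maps on state changes. Note that this sketch covers only the "current" interpretation of the metrics; a historical view would need a separate record of past transitions.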