[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175120#comment-13175120
 ] 

Jason Lowe commented on MAPREDUCE-3360:
---------------------------------------

The reboot case is interesting since the code does not seem to use 
{{RMNodeState.REBOOT}}.  Therefore the status page for "Rebooted Nodes" will 
always be empty even with this patch.

This makes me wonder whether the metrics for the inactive states were intended 
to be current or historical.  If the latter is the intent, then the node status 
pages for LOST, REBOOT, etc. would be for listing all nodes that went through 
those states since the RM started.  For example, it might be useful to know 
that a particular node has been lost many times even though that node is 
currently active.

If we want the metrics to track the number of nodes currently in a given state, 
as the current patch does, then we should either remove the "Rebooted Nodes" 
column from the metrics since nodes will never be in the REBOOT state or add 
code to support the REBOOT state.  For example, we'd have to update the 
{{StateMachineFactory}} in RMNodeImpl.java to transition to the REBOOT state 
instead of the LOST state when the REBOOTING event is delivered.  And in the 
code you ran across, we'd probably not want to increment the reboot count if 
the node is in the inactive nodes map already.  If it wasn't in the map, we'd 
increment the reboot counter and add it to the inactive node map.
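
To make that last point concrete, here is a rough sketch of the check I have in 
mind.  It is not the actual RM code: {{InactiveNodeTracker}}, {{onRebooting}}, 
and the simplified {{NodeState}} enum are hypothetical stand-ins, and the node 
is keyed by a plain string rather than a real node ID object.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the RM's inactive-node bookkeeping (not a Hadoop class). */
class InactiveNodeTracker {
    /** Simplified states for illustration; the real enum is RMNodeState. */
    enum NodeState { LOST, REBOOT }

    // Nodes that are currently inactive, keyed by an assumed string node ID.
    private final Map<String, NodeState> inactiveNodes = new ConcurrentHashMap<>();
    // Number of nodes currently counted as rebooted.
    private final AtomicInteger rebootedNodes = new AtomicInteger();

    /**
     * Handles a REBOOTING event for the given node.
     * Returns true if the node was newly recorded and the counter was bumped.
     */
    boolean onRebooting(String nodeId) {
        // putIfAbsent returns null only when the node was not already in the map,
        // so a node that is already inactive is never counted a second time.
        if (inactiveNodes.putIfAbsent(nodeId, NodeState.REBOOT) == null) {
            rebootedNodes.incrementAndGet();
            return true;
        }
        return false;
    }

    int getRebootedNodes() {
        return rebootedNodes.get();
    }
}
```

With this shape, delivering REBOOTING twice for the same node leaves the reboot 
count at one, which is the behavior described above.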

I guess this brings up the question of whether we should allow the counts to 
ever get out of sync with what's stored in the inactive node map.  Currently 
they're tracked separately and could become out of sync (like the code you ran 
across), but it would be easier to guarantee they're consistent if the counts 
were computed from the inactive node collection directly (e.g., by scanning 
the inactive map or having a map per inactive node state).
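
As a rough illustration of the scanning option, something along these lines 
would derive the counts on demand instead of storing them separately.  The 
class and method names ({{DerivedNodeCounts}}, {{markInactive}}, 
{{countByState}}) and the reduced {{NodeState}} enum are made up for this 
sketch and do not correspond to existing RM code.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: per-state counts derived from the inactive node map. */
class DerivedNodeCounts {
    /** Simplified states for illustration; the real enum is RMNodeState. */
    enum NodeState { LOST, REBOOT, DECOMMISSIONED }

    // Single source of truth: which nodes are inactive and in what state.
    private final Map<String, NodeState> inactiveNodes = new ConcurrentHashMap<>();

    void markInactive(String nodeId, NodeState state) {
        inactiveNodes.put(nodeId, state);
    }

    void markActive(String nodeId) {
        inactiveNodes.remove(nodeId);
    }

    /** Scans the inactive map, so the counts cannot drift from its contents. */
    Map<NodeState, Integer> countByState() {
        Map<NodeState, Integer> counts = new EnumMap<>(NodeState.class);
        for (NodeState state : NodeState.values()) {
            counts.put(state, 0); // report zero for states with no nodes
        }
        for (NodeState state : inactiveNodes.values()) {
            counts.merge(state, 1, Integer::sum);
        }
        return counts;
    }
}
```

The scan is linear in the number of inactive nodes, which should be small 
relative to the cluster.  Keeping one map per inactive state would avoid the 
scan at the cost of moving entries between maps on state changes.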

                
> Provide information about lost nodes in the UI.
> -----------------------------------------------
>
>                 Key: MAPREDUCE-3360
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3360
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv2
>    Affects Versions: 0.23.0
>         Environment: NA
>            Reporter: Bh V S Kamesh
>         Attachments: LostNodes.png, MAPREDUCE-3360-1.patch, 
> MAPREDUCE-3360.patch, lostNodes.png
>
>
> Currently there is no information provided about *lost nodes*. Provide 
> information in the UI. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
