[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174947#comment-13174947
 ] 

Bh V S Kamesh commented on MAPREDUCE-3360:
------------------------------------------

Hi Jason,
  
Thanks for the comments. I will incorporate them in my next patch. But 
before submitting the patch, I would like to clarify this.

When the RM does not receive a node heartbeat from an NM for the *node expiry* 
interval, the RM removes the NM from its RM Nodes Map under the node *EXPIRE* 
event. Before removing the NM, the corresponding Cluster metrics are updated 
(in this case, incrementing the *lost* node count).

If the same NM sends a heartbeat after the above operation, the RM checks 
whether there is any node corresponding to this NodeId. If the RM does not 
find any NM corresponding to the NodeId, it simply returns *reboot* as its 
heartbeat response.
Before sending its heartbeat response, the RM again updates the Cluster 
metrics (this time, incrementing the *reboot* node count).
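
To make the sequence above concrete, here is a minimal sketch of the 
bookkeeping as I understand it. It is not the actual RM code: the class, 
method, and field names (LostNodeSketch, expire, heartbeat, lostNodes, 
rebootedNodes) are hypothetical and only model the two steps described above.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of the RM bookkeeping described above.
// None of these names are taken from the Hadoop code base.
public class LostNodeSketch {
    enum HeartbeatAction { NORMAL, REBOOT }

    // nodeId -> time of last heartbeat (ms); stands in for the RM Nodes Map
    private final Map<String, Long> rmNodes = new HashMap<>();

    // Stand-ins for the Cluster metrics counters
    int lostNodes = 0;
    int rebootedNodes = 0;

    void register(String nodeId, long now) {
        rmNodes.put(nodeId, now);
    }

    // Node EXPIRE event: count the node as lost and drop it from the map.
    void expire(String nodeId) {
        if (rmNodes.remove(nodeId) != null) {
            lostNodes++;
        }
    }

    // Heartbeat from an NM: an unknown NodeId is answered with REBOOT,
    // and the reboot counter is incremented before responding.
    HeartbeatAction heartbeat(String nodeId, long now) {
        if (!rmNodes.containsKey(nodeId)) {
            rebootedNodes++;
            return HeartbeatAction.REBOOT;
        }
        rmNodes.put(nodeId, now);
        return HeartbeatAction.NORMAL;
    }

    public static void main(String[] args) {
        LostNodeSketch rm = new LostNodeSketch();
        rm.register("nm1:1234", 0L);
        rm.expire("nm1:1234");                       // lost count goes to 1
        HeartbeatAction action = rm.heartbeat("nm1:1234", 1L); // same NM comes back
        // prints "REBOOT lost=1 reboot=1": one node, two counters incremented
        System.out.println(action + " lost=" + rm.lostNodes
                + " reboot=" + rm.rebootedNodes);
    }
}
```

In this model a single NM that expires and then heartbeats again is counted 
once as *lost* and once as *reboot*.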

Is it necessary to update different metrics for the same node's unavailability?
IMO, it shows incorrect information. I *think* we need to update either the 
*lost* node count or the *reboot* node count, but not both, in such a 
circumstance.

Any comments?
                
> Provide information about lost nodes in the UI.
> -----------------------------------------------
>
>                 Key: MAPREDUCE-3360
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3360
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv2
>    Affects Versions: 0.23.0
>         Environment: NA
>            Reporter: Bh V S Kamesh
>         Attachments: LostNodes.png, MAPREDUCE-3360-1.patch, 
> MAPREDUCE-3360.patch, lostNodes.png
>
>
> Currently there is no information provided about *lost nodes*. Provide 
> information in the UI. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
