[ 
https://issues.apache.org/jira/browse/HDFS-9239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960776#comment-14960776
 ] 

Kihwal Lee commented on HDFS-9239:
----------------------------------

This may not help much on the namenode side. Even on extremely busy clusters, I
have not seen nodes miss heartbeats and be declared dead because of
contention among heartbeats, incremental block reports (IBR) and full block
reports (FBR). Well before node liveness is affected by a flood of IBRs and
FBRs, namenode performance will degrade to an unacceptable level. This is
easy to test: run a wide job that creates a lot of small files.

However, making it lighter on the datanode side is a good idea. We have seen
many cases where nodes are declared dead because the service actor thread is
delayed or blocked.
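
To illustrate the datanode-side failure mode, here is a toy simulation, not
the actual DataNode code. It assumes heartbeats can only be sent by a single
service actor, so none go out while that actor is blocked, whereas a separate
lifeline sender keeps reporting. The class name, the method name, and the
interval and expiry values are all hypothetical and chosen only to keep the
numbers small.

```java
import java.util.ArrayList;
import java.util.List;

public class LifelineSketch {
    // Hypothetical values for illustration only, not HDFS defaults.
    static final int HEARTBEAT_INTERVAL = 3; // seconds between liveness reports
    static final int EXPIRY = 30;            // seconds of silence before "dead"

    /**
     * Simulates one datanode over [1, horizon] seconds and returns every
     * second at which a toy namenode would consider the node dead.
     * The service actor is blocked during [blockedFrom, blockedUntil).
     */
    public static List<Integer> deadSeconds(int blockedFrom, int blockedUntil,
                                            boolean lifeline, int horizon) {
        List<Integer> dead = new ArrayList<>();
        int lastContact = 0;
        for (int now = 1; now <= horizon; now++) {
            boolean actorBlocked = now >= blockedFrom && now < blockedUntil;
            boolean reportDue = now % HEARTBEAT_INTERVAL == 0;
            // A blocked actor cannot send its heartbeat; an independent
            // lifeline sender (assumed here) still reports liveness.
            if (reportDue && (!actorBlocked || lifeline)) {
                lastContact = now;
            }
            if (now - lastContact > EXPIRY) {
                dead.add(now);
            }
        }
        return dead;
    }

    public static void main(String[] args) {
        List<Integer> without = deadSeconds(10, 70, false, 100);
        List<Integer> with = deadSeconds(10, 70, true, 100);
        System.out.println("dead seconds without lifeline: " + without.size());
        System.out.println("dead seconds with lifeline: " + with.size());
    }
}
```

In this model the node with a blocked actor is considered dead from second 40
until its next successful heartbeat at second 72, while the node with a
lifeline sender is never marked dead.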

> DataNode Lifeline Protocol: an alternative protocol for reporting DataNode 
> liveness
> -----------------------------------------------------------------------------------
>
>                 Key: HDFS-9239
>                 URL: https://issues.apache.org/jira/browse/HDFS-9239
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, namenode
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>         Attachments: DataNode-Lifeline-Protocol.pdf
>
>
> This issue proposes introduction of a new feature: the DataNode Lifeline 
> Protocol.  This is an RPC protocol that is responsible for reporting liveness 
> and basic health information about a DataNode to a NameNode.  Compared to the 
> existing heartbeat messages, it is lightweight and not prone to resource 
> contention problems that can harm accurate tracking of DataNode liveness 
> currently.  The attached design document contains more details.



