[ 
https://issues.apache.org/jira/browse/HADOOP-923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haryadi Gunawi updated HADOOP-923:
----------------------------------

    Description: 
The datanode sends a heartbeat to the namenode every 3 seconds. The namenode 
processes the heartbeat and sends a list of blocks-to-be-replicated and 
blocks-to-be-deleted as part of the heartbeat response.

When a couple of datanodes fail, heartbeat processing on the namenode becomes 
expensive. The namenode acquires the global FSNamesystem lock, traverses the 
neededReplication structure, generates a list of blocks to be replicated and 
responds to the heartbeat message. Determining the list of 
blocks-to-be-replicated takes plenty of CPU and, because it holds the global 
FSNamesystem lock, blocks processing of other heartbeats.

It would improve scalability a lot if heartbeat processing did not require the 
FSNamesystem lock. In fact, a separate "heartbeat" lock already exists for 
this purpose. 
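
As a rough illustration of the locking idea, the sketch below keeps heartbeat 
bookkeeping under its own lock so that it never waits on the expensive 
replication scan. It is not Hadoop code: the class NamenodeLockSketch, its 
methods, and the two lock objects are hypothetical stand-ins for the 
FSNamesystem lock and the "heartbeat" lock.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative stand-in for the namenode. All names here are hypothetical,
// not Hadoop's actual classes or methods.
class NamenodeLockSketch {
    // Stands in for the global FSNamesystem lock.
    private final Object globalLock = new Object();
    // Stands in for the dedicated "heartbeat" lock.
    private final Object heartbeatLock = new Object();

    private final Map<String, Long> lastHeartbeatMs = new HashMap<>();
    private final List<String> neededReplication = new ArrayList<>();

    // Cheap path: records liveness under the heartbeat lock only.
    void recordHeartbeat(String datanodeId, long nowMs) {
        synchronized (heartbeatLock) {
            lastHeartbeatMs.put(datanodeId, nowMs);
        }
    }

    // A datanode counts as alive if it reported within timeoutMs.
    boolean isAlive(String datanodeId, long nowMs, long timeoutMs) {
        synchronized (heartbeatLock) {
            Long last = lastHeartbeatMs.get(datanodeId);
            return last != null && nowMs - last <= timeoutMs;
        }
    }

    // Queues a block that needs another replica (global lock).
    void addNeededReplication(String blockId) {
        synchronized (globalLock) {
            neededReplication.add(blockId);
        }
    }

    // Expensive path: drains up to maxBlocks entries under the global lock.
    List<String> retrieveBlocksToReplicate(int maxBlocks) {
        synchronized (globalLock) {
            int n = Math.min(maxBlocks, neededReplication.size());
            List<String> batch = new ArrayList<>(neededReplication.subList(0, n));
            neededReplication.subList(0, n).clear();
            return batch;
        }
    }
}
```

Because recordHeartbeat never touches globalLock, a long-running call to 
retrieveBlocksToReplicate in one thread would not delay heartbeats arriving 
on other threads.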

I propose that the Heartbeat message be separate from the "retrieve 
blocks-to-replicate and blocks-to-delete" messages. The datanode can continue 
to heartbeat once every 3 seconds while it can afford to "retrieve 
blocks-to-replicate" at a much coarser interval. Heartbeat processing on the 
namenode will be fast because it will not require the global FSNamesystem lock. 
Moreover, a datanode failure will not aggravate the heartbeat processing time 
on the namenode.
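
The two-interval idea can be sketched from the datanode side as follows. This 
is only a sketch under stated assumptions: the class DatanodeScheduleSketch 
and the message names are hypothetical, the 3-second heartbeat interval comes 
from the description above, and the 30-second retrieval interval is an assumed 
value chosen purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical datanode-side scheduler; not the actual Hadoop DataNode code.
class DatanodeScheduleSketch {
    // Heartbeat interval stated in the description.
    static final long HEARTBEAT_INTERVAL_MS = 3_000L;
    // Assumed coarser interval for retrieving block work (illustrative only).
    static final long BLOCK_WORK_INTERVAL_MS = 30_000L;

    private long lastHeartbeatMs;
    private long lastBlockWorkMs;

    DatanodeScheduleSketch(long startMs) {
        this.lastHeartbeatMs = startMs;
        this.lastBlockWorkMs = startMs;
    }

    // Returns the messages that are due at nowMs and marks them as sent.
    List<String> messagesDue(long nowMs) {
        List<String> due = new ArrayList<>();
        if (nowMs - lastHeartbeatMs >= HEARTBEAT_INTERVAL_MS) {
            due.add("HEARTBEAT");
            lastHeartbeatMs = nowMs;
        }
        if (nowMs - lastBlockWorkMs >= BLOCK_WORK_INTERVAL_MS) {
            due.add("RETRIEVE_BLOCK_WORK");
            lastBlockWorkMs = nowMs;
        }
        return due;
    }
}
```

Calling messagesDue once per second for a simulated minute yields 20 
heartbeats but only 2 block-work retrievals, so under these assumed intervals 
the expensive request reaches the namenode a tenth as often as the heartbeat.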
 


    
> DFS Scalability: datanode heartbeat timeouts cause cascading timeouts of 
> other datanodes
> ----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-923
>                 URL: https://issues.apache.org/jira/browse/HADOOP-923
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.10.1
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>             Fix For: 0.12.0
>
>         Attachments: pendingTransferThread2.patch
>
>
> The datanode sends a heartbeat to the namenode every 3 seconds. The namenode 
> processes the heartbeat and sends a list of blocks-to-be-replicated and 
> blocks-to-be-deleted as part of the heartbeat response.
> When a couple of datanodes fail, heartbeat processing on the namenode 
> becomes expensive. The namenode acquires the global FSNamesystem lock, 
> traverses the neededReplication structure, generates a list of blocks to be 
> replicated and responds to the heartbeat message. Determining the list of 
> blocks-to-be-replicated takes plenty of CPU and, because it holds the global 
> FSNamesystem lock, blocks processing of other heartbeats.
> It would improve scalability a lot if heartbeat processing did not require 
> the FSNamesystem lock. In fact, a separate "heartbeat" lock already exists 
> for this purpose. 
> I propose that the Heartbeat message be separate from the "retrieve 
> blocks-to-replicate and blocks-to-delete" messages. The datanode can continue 
> to heartbeat once every 3 seconds while it can afford to "retrieve 
> blocks-to-replicate" at a much coarser interval. Heartbeat processing on the 
> namenode will be fast because it will not require the global FSNamesystem 
> lock. Moreover, a datanode failure will not aggravate the heartbeat 
> processing time on the namenode.
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
