[ https://issues.apache.org/jira/browse/HDFS-599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752801#action_12752801 ]

Eli Collins commented on HDFS-599:
----------------------------------

To take this one step further -- why does the failure detection code need to be 
implemented as part of the DN and NN daemons? 

Alternatively, each host could run a single failure detection service 
(sketched below). Potential benefits:
- You could plug in different types of detectors (or re-use an existing one 
like heartbeat from Linux-HA)
- The detector would not have to be in Java (and would not be susceptible to 
GC pauses, etc.)
- You could use host scheduling priorities to prioritize the heartbeat 
service over the daemons, rather than implementing prioritization as part of 
RPC processing (even when prioritizing RPCs, heartbeat processing can never 
run at a higher priority than the process hosting the daemon itself)
- Multiple DN daemons running on the same host could share a single detector
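
To make the shape of that concrete, here is a minimal sketch, assuming a 
hypothetical HostFailureDetector class, monitor address, port 9999, and 
3-second interval (none of which come from HDFS). A real detector would more 
likely be an existing non-Java tool like Linux-HA heartbeat; the point is 
just that one process per host emits heartbeats for all local daemons, 
outside their JVMs:

{code:java}
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class HostFailureDetector {
    // Local daemons that registered with this host's single detector.
    private final Set<String> daemons = ConcurrentHashMap.newKeySet();

    public void register(String daemonId) {
        daemons.add(daemonId);
    }

    /** Emit one UDP heartbeat covering every registered daemon on this host. */
    public void sendHeartbeat(InetAddress monitor, int port) throws Exception {
        byte[] payload = String.join(",", daemons).getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(payload, payload.length, monitor, port));
        }
    }

    public static void main(String[] args) throws Exception {
        HostFailureDetector detector = new HostFailureDetector();
        detector.register("datanode-1");
        detector.register("datanode-2"); // multiple DNs share one detector
        InetAddress monitor = InetAddress.getByName(args[0]); // e.g. the NN host
        while (true) {
            detector.sendHeartbeat(monitor, 9999); // hypothetical port
            Thread.sleep(3000);                    // hypothetical interval
        }
    }
}
{code}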

There's still the question of the priority of the message from the failure 
detection service to the daemons indicating that a failure occurred, but this 
message is only sent when a failure has actually been detected, so individual 
heartbeats would no longer be exposed to scheduling issues in the DN or NN 
daemons.
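
That notification path could be as small as a callback, sketched below with 
invented names (FailureListener, FailureNotifier): nothing periodic ever 
crosses into the daemons' JVMs, only the rare failure event itself.

{code:java}
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Callback fired only when the detector declares a remote peer failed. */
interface FailureListener {
    void onFailure(String failedHost);
}

class FailureNotifier {
    private final List<FailureListener> listeners = new CopyOnWriteArrayList<>();

    void subscribe(FailureListener l) { listeners.add(l); }

    /** Invoked by the detector on detection; the only cross-JVM message. */
    void declareFailed(String host) {
        for (FailureListener l : listeners) {
            l.onFailure(host); // e.g. the NN marks the DN dead immediately
        }
    }
}
{code}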

> Improve Namenode robustness by prioritizing datanode heartbeats over client 
> requests
> ------------------------------------------------------------------------------------
>
>                 Key: HDFS-599
>                 URL: https://issues.apache.org/jira/browse/HDFS-599
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>
> The namenode processes RPC requests from clients that are reading/writing to 
> files as well as heartbeats/block reports from datanodes.
> Sometimes, for various reasons (Java GC runs, inconsistent performance of 
> the NFS filer that stores HDFS transaction logs, etc.), the namenode 
> encounters transient slowness. For example, if the device that stores the 
> HDFS transaction logs becomes sluggish, the Namenode's ability to process 
> RPCs slows down to a certain extent. During this time, the RPCs from clients 
> as well as the RPCs from datanodes suffer in similar fashion. If the 
> underlying problem becomes worse, the NN's ability to process a heartbeat 
> from a DN is severely impacted, thus causing the NN to declare that the DN 
> is dead. Then the NN starts replicating blocks that used to reside on the 
> now-declared-dead datanode. This adds extra load to the NN. Then the 
> now-declared-dead datanode finally re-establishes contact with the NN, and 
> sends a block report. The block report processing on the NN is another 
> heavyweight activity, thus causing more load to the already overloaded 
> namenode. 
> My proposal is that the NN should try its best to continue processing RPCs 
> from datanodes and give lesser priority to serving client requests. The 
> Datanode RPCs are integral to the consistency and performance of the Hadoop 
> file system, and it is better to protect them at all costs. This will ensure 
> that the NN recovers from the hiccup much faster than it does now.
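
For reference, the prioritization proposed in the description above could 
look roughly like the following inside the NN's call handling. This is a 
sketch with invented names (PrioritizedCallQueue, Kind), not the actual 
NameNode RPC server internals:

{code:java}
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

public class PrioritizedCallQueue {
    enum Kind { DATANODE, CLIENT } // DATANODE's lower ordinal sorts first

    static final class Call implements Comparable<Call> {
        final Kind kind;
        final long seq;      // preserves FIFO order within the same kind
        final Runnable body;
        Call(Kind kind, long seq, Runnable body) {
            this.kind = kind; this.seq = seq; this.body = body;
        }
        @Override public int compareTo(Call other) {
            int byKind = kind.compareTo(other.kind);
            return byKind != 0 ? byKind : Long.compare(seq, other.seq);
        }
    }

    private final PriorityBlockingQueue<Call> queue = new PriorityBlockingQueue<>();
    private final AtomicLong seq = new AtomicLong();

    public void submit(Kind kind, Runnable body) {
        queue.put(new Call(kind, seq.getAndIncrement(), body));
    }

    /** Handler loop: a pending heartbeat is always taken before client RPCs. */
    public void runHandler() throws InterruptedException {
        while (!Thread.currentThread().isInterrupted()) {
            queue.take().body.run();
        }
    }
}
{code}

Note that strict priority like this starves client RPCs under sustained 
datanode load, which matches the "protect at all costs" stance in the 
description; a weighted or capacity-based scheme would bound that starvation 
if it mattered.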
