[ 
https://issues.apache.org/jira/browse/HDFS-15605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17216155#comment-17216155
 ] 

Jinglun commented on HDFS-15605:
--------------------------------

Hi [~ayushtkn], thanks for your nice comments! If I understand correctly, 
your suggestion is to add some if-else branches so that the DeadNodeDetector 
can behave differently, with the goal of keeping the main structure unchanged 
for better stability and compatibility.

When I first started working on this I did consider using if-else branches to 
let the DeadNodeDetector update dead nodes from the NameNode. In the end I 
chose the current approach because:

1. To preserve the basic structure of the DeadNodeDetector. Adding the 
InServiceDetector logic through many if-else conditions would make the 
DeadNodeDetector logic unclear and harder to understand. The DeadNodeDetector 
maintains many states, sets and threads. If we choose to update dead nodes 
from the NameNode, all of those states and threads become irrelevant, and I'm 
afraid bypassing them would need many if-else conditions. 

2. To make the DeadNodeDetector flexible. As you suggested, in the future we 
might consider adding a new detector that detects dead nodes by fetching the 
block locations. So I think using an abstract class might be a good choice.
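
To illustrate the abstract-class idea, here is a minimal sketch. It is not the code from the attached patches: every class and method name below is hypothetical, node identifiers are plain strings, and the NameNode report is modelled as a Supplier so the example stays self-contained.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical names throughout; this only illustrates the shape of the design.
abstract class AbstractDeadNodeDetector {
  // Identifiers of datanodes currently considered dead.
  private final Set<String> deadNodes = ConcurrentHashMap.newKeySet();

  /** Strategy hook: each detector decides how it learns which nodes are dead. */
  protected abstract Set<String> detectDeadNodes();

  /** Runs one detection round and makes the dead set match its result. */
  public final void refresh() {
    Set<String> latest = detectDeadNodes();
    deadNodes.retainAll(latest); // drop nodes that are no longer reported dead
    deadNodes.addAll(latest);    // add newly reported dead nodes
  }

  public final boolean isDeadNode(String datanodeId) {
    return deadNodes.contains(datanodeId);
  }
}

/** Variant that trusts a report from the NameNode (modelled as a Supplier). */
class NameNodeBackedDetector extends AbstractDeadNodeDetector {
  private final Supplier<Set<String>> deadNodeReport;

  NameNodeBackedDetector(Supplier<Set<String>> deadNodeReport) {
    this.deadNodeReport = deadNodeReport;
  }

  @Override
  protected Set<String> detectDeadNodes() {
    return deadNodeReport.get();
  }
}

/** Variant that probes a fixed set of candidates itself (probe is a stand-in). */
class ProbingDetector extends AbstractDeadNodeDetector {
  private final Set<String> candidates;
  private final Predicate<String> isReachable;

  ProbingDetector(Set<String> candidates, Predicate<String> isReachable) {
    this.candidates = candidates;
    this.isReachable = isReachable;
  }

  @Override
  protected Set<String> detectDeadNodes() {
    Set<String> dead = ConcurrentHashMap.newKeySet();
    for (String node : candidates) {
      if (!isReachable.test(node)) {
        dead.add(node);
      }
    }
    return dead;
  }
}
```

With this shape the callers only depend on refresh() and isDeadNode(), so a future detector based on block locations would be one more subclass rather than another if-else branch.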

 
{quote}A point to note is getDatanodeReport is a very heavy call, Refetching 
block locations again might be cheaper in some cases. 
{quote}
Thanks for the reminder! Yes, this is very important. For me the cost is 
acceptable because the dead node detector is only used for HBase, and the 
cluster always has fewer than 100 nodes. The update interval is 10 min, so I 
think the load is fine for the NameNode. 
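
For reference, a fixed-interval refresh of this kind could look roughly like the sketch below. It is only an illustration under assumptions: the class name is hypothetical, and the Supplier stands in for a call such as DFSClient.datanodeReport() filtered to dead nodes and mapped to node identifiers.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical class, not taken from the patch.
class PeriodicDeadNodeRefresher implements AutoCloseable {
  // Stand-in for the (heavy) NameNode report call.
  private final Supplier<Set<String>> deadNodeReport;
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "dead-node-refresher");
        t.setDaemon(true); // do not keep the client JVM alive
        return t;
      });
  // Immutable snapshot, swapped atomically on each successful refresh.
  private volatile Set<String> deadNodes = Collections.emptySet();

  PeriodicDeadNodeRefresher(Supplier<Set<String>> deadNodeReport) {
    this.deadNodeReport = deadNodeReport;
  }

  /** Fetches one report and replaces the snapshot; keeps the old one on failure. */
  void refreshOnce() {
    try {
      deadNodes = Collections.unmodifiableSet(new HashSet<>(deadNodeReport.get()));
    } catch (RuntimeException e) {
      // Swallow the error: an exception escaping a scheduled task would
      // cancel all later runs, and a stale snapshot is still usable.
    }
  }

  /** Starts refreshing immediately and then after every interval. */
  void start(long interval, TimeUnit unit) {
    scheduler.scheduleWithFixedDelay(this::refreshOnce, 0, interval, unit);
  }

  boolean isDead(String datanodeId) {
    return deadNodes.contains(datanodeId);
  }

  @Override
  public void close() {
    scheduler.shutdownNow();
  }
}
```

Calling start(10, TimeUnit.MINUTES) would give one report request per client every 10 minutes, which is the load the NameNode has to absorb.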

 

Shall I split this into 2 steps: first introduce the abstract class for 
DeadNodeDetector, then add the new InServiceDetector on top of it? The current 
patch is a little big.

[~ayushtkn] Please correct me if I got anything wrong. I look forward to your 
further suggestions! Hi [~leosun08], do you have time to take a look at this? 
Looking forward to your comments!

> DeadNodeDetector supports getting deadnode from NameNode.
> ---------------------------------------------------------
>
>                 Key: HDFS-15605
>                 URL: https://issues.apache.org/jira/browse/HDFS-15605
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Jinglun
>            Assignee: Jinglun
>            Priority: Major
>         Attachments: HDFS-15605.001.patch, HDFS-15605.002.patch, 
> HDFS-15605.003.patch
>
>
> When we are using DeadNodeDetector, sometimes it marks too many nodes as dead 
> and causes read failures. The DeadNodeDetector assumes that every datanode 
> whose getDatanodeInfo rpc fails to return in time is dead, but that is not 
> always true. A client-side error or a slow rpc in the DataNode can get a node 
> marked as dead too. For example, a client-side delay in the rpcThreadPool 
> might cause the getDatanodeInfo rpcs to time out and add many datanodes to 
> the dead list.
> We have a simple improvement for this: the NameNode already knows which 
> datanodes are dead. So just update the dead list from NameNode using 
> DFSClient.datanodeReport().



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
