[ https://issues.apache.org/jira/browse/HDFS-15605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17216155#comment-17216155 ]
Jinglun commented on HDFS-15605:
--------------------------------

Hi [~ayushtkn], thanks for your nice comments! If I understand correctly, your suggestion is to add some if-else branches so that the DeadNodeDetector can behave differently, with the goal of keeping the main structure unchanged for better stability and compatibility.

When I first started working on this, I did think about using if-else branches to let the DeadNodeDetector update its dead nodes from the NameNode. In the end I chose the current approach because:
 1. It preserves the basic structure of the DeadNodeDetector. Adding the InServiceDetector logic with many if-else conditions would make the DeadNodeDetector logic unclear and harder to understand. The DeadNodeDetector maintains many states, sets and threads, but if we choose to update dead nodes from the NameNode, none of these states and threads are needed, and I'm afraid bypassing them would take many if-else conditions.
 2. It makes the DeadNodeDetector flexible. As you suggested, in the future we might consider adding a new detector that detects dead nodes by refetching the block locations. So I think using an abstract class might be a good choice.

{quote}A point to note is getDatanodeReport is a very heavy call, Refetching block locations again might be cheaper in some cases.
{quote}
Thanks for the reminder! Yes, this is very important. In my case the cost is OK because the dead node detector is only used for HBase, and the cluster always has fewer than 100 nodes. The update interval is 10 minutes, so I think the load on the NameNode is fine.

Shall I split this into 2 steps: first implement the abstract class of DeadNodeDetector, then add the new InServiceDetector to it? The current patch is a little big.

[~ayushtkn] Please correct me if I got anything wrong. I look forward to your further suggestions!

Hi [~leosun08], do you have time to look at this? Looking forward to your comments!

> DeadNodeDetector supports getting deadnode from NameNode.
> ---------------------------------------------------------
>
> Key: HDFS-15605
> URL: https://issues.apache.org/jira/browse/HDFS-15605
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Jinglun
> Assignee: Jinglun
> Priority: Major
> Attachments: HDFS-15605.001.patch, HDFS-15605.002.patch, HDFS-15605.003.patch
>
> When we are using DeadNodeDetector, it sometimes marks too many nodes as dead
> and causes read failures. The DeadNodeDetector assumes that every datanode
> whose getDatanodeInfo RPC fails to return in time is dead, but that is not
> always true. A client-side error or a slow RPC in the DataNode can get a
> healthy node marked as dead too. For example, a client-side delay in the
> rpcThreadPool might cause the getDatanodeInfo RPCs to time out and add many
> datanodes to the dead list.
> We have a simple improvement for this: the NameNode already knows which
> datanodes are dead, so just update the dead list from the NameNode using
> DFSClient.datanodeReport().

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
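
To make the abstract-class idea from the comment concrete, here is a minimal sketch of how the split could look. It is not taken from the attached patches: the class names AbstractDeadNodeDetector and NameNodeReportDetector, the method names, and the use of plain strings as datanode IDs are all assumptions made for illustration. The Supplier stands in for whatever fetches the NameNode's dead-node list (the issue proposes DFSClient.datanodeReport()), so the sketch has no Hadoop dependency.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical base class: keeps only the state every detector needs.
abstract class AbstractDeadNodeDetector {
  private final Set<String> deadNodes = ConcurrentHashMap.newKeySet();

  /** Each subclass decides how the current dead nodes are discovered. */
  protected abstract Set<String> detectDeadNodes();

  /** One refresh round: make the known dead set match the latest result. */
  public final void refresh() {
    Set<String> latest = detectDeadNodes();
    deadNodes.retainAll(latest); // drop nodes that are no longer reported dead
    deadNodes.addAll(latest);    // add newly reported dead nodes
  }

  public final boolean isDeadNode(String datanodeId) {
    return deadNodes.contains(datanodeId);
  }

  public final Set<String> getDeadNodes() {
    return Collections.unmodifiableSet(new HashSet<>(deadNodes));
  }
}

// Hypothetical detector that trusts the NameNode's view of dead nodes.
// The supplier is an assumption standing in for a datanode-report call.
final class NameNodeReportDetector extends AbstractDeadNodeDetector {
  private final Supplier<Set<String>> deadNodeReport;

  NameNodeReportDetector(Supplier<Set<String>> deadNodeReport) {
    this.deadNodeReport = deadNodeReport;
  }

  @Override
  protected Set<String> detectDeadNodes() {
    // Copy so later changes on the reporting side cannot leak in.
    return new HashSet<>(deadNodeReport.get());
  }
}

final class DetectorSketchDemo {
  public static void main(String[] args) {
    Set<String> reported = new HashSet<>();
    reported.add("dn-1");
    AbstractDeadNodeDetector detector = new NameNodeReportDetector(() -> reported);

    detector.refresh();
    System.out.println(detector.isDeadNode("dn-1")); // prints "true"

    // The NameNode's view changes; the next refresh picks it up.
    reported.remove("dn-1");
    reported.add("dn-2");
    detector.refresh();
    System.out.println(detector.isDeadNode("dn-1")); // prints "false"
    System.out.println(detector.isDeadNode("dn-2")); // prints "true"
  }
}
```

With this shape, a probe-based detector or one that refetches block locations would be another subclass overriding detectDeadNodes(), and the periodic refresh (for example, every 10 minutes as in the comment) would be driven by whatever scheduler the client already has.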