[ https://issues.apache.org/jira/browse/HDFS-15605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17215809#comment-17215809 ]
Ayush Saxena commented on HDFS-15605:
-------------------------------------

Just had a very quick look. Can we not leverage the existing class, rather than having an abstract class and then implementing two different child classes? Can we not just add a configuration for this behaviour: if the configuration is turned on, the client goes to the NameNode to confirm the details, else it works as usual. A point to note is that getDatanodeReport is a very heavy call; re-fetching the block locations might be cheaper in some cases. :)

> DeadNodeDetector supports getting deadnode from NameNode.
> ---------------------------------------------------------
>
>                 Key: HDFS-15605
>                 URL: https://issues.apache.org/jira/browse/HDFS-15605
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Jinglun
>            Assignee: Jinglun
>            Priority: Major
>         Attachments: HDFS-15605.001.patch, HDFS-15605.002.patch, HDFS-15605.003.patch
>
> When we are using the DeadNodeDetector, it sometimes marks too many nodes as dead and causes read failures. The DeadNodeDetector assumes that every getDatanodeInfo RPC that fails to return in time indicates a dead node, but that is not always the case: a client-side error or a slow RPC on the DataNode might get the node marked as dead too. For example, client-side delay in the rpcThreadPool might cause the getDatanodeInfo RPCs to time out, adding many DataNodes to the dead list.
> We have a simple improvement for this: the NameNode already knows which DataNodes are dead, so we can just update the dead list from the NameNode using DFSClient.datanodeReport().
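For illustration, the config-gated behaviour suggested in the comment could be sketched roughly as below. This is a minimal standalone sketch, not the actual patch: `DeadNodeSource`, `DeadNodeDetectorSketch`, `onProbeTimeout`, and the boolean flag are hypothetical names; `DeadNodeSource.deadDatanodes()` stands in for fetching the dead-node list from the NameNode via `DFSClient.datanodeReport()`.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for asking the NameNode which DataNodes are dead
// (i.e. DFSClient.datanodeReport(DatanodeReportType.DEAD) in real HDFS).
interface DeadNodeSource {
    Set<String> deadDatanodes();
}

class DeadNodeDetectorSketch {
    // Hypothetical config switch: when true, confirm suspects with the NameNode.
    private final boolean confirmWithNameNode;
    private final DeadNodeSource nameNode;
    private final Set<String> deadNodes = new HashSet<>();

    DeadNodeDetectorSketch(boolean confirmWithNameNode, DeadNodeSource nameNode) {
        this.confirmWithNameNode = confirmWithNameNode;
        this.nameNode = nameNode;
    }

    // Called when a getDatanodeInfo probe to this DataNode times out.
    void onProbeTimeout(String datanode) {
        if (!confirmWithNameNode) {
            // Existing behaviour: a timed-out probe is assumed dead,
            // even if the timeout was a client-side artefact.
            deadNodes.add(datanode);
            return;
        }
        // Config-gated behaviour: only mark the node dead if the
        // NameNode also reports it as dead.
        if (nameNode.deadDatanodes().contains(datanode)) {
            deadNodes.add(datanode);
        }
    }

    Set<String> getDeadNodes() {
        return Collections.unmodifiableSet(deadNodes);
    }
}
```

With the flag on, a probe timeout for a node the NameNode still considers live is ignored, which avoids the false positives described in the issue; with the flag off, the detector behaves as it does today. The cost trade-off from the comment still applies: each confirmation pays for a heavy getDatanodeReport call unless the result is cached.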