[ https://issues.apache.org/jira/browse/HDFS-15605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17204415#comment-17204415 ]
Jinglun commented on HDFS-15605:
--------------------------------

The failed tests seem unrelated. Uploaded v02 to fix checkstyle.

> DeadNodeDetector supports getting deadnode from NameNode.
> ---------------------------------------------------------
>
>                 Key: HDFS-15605
>                 URL: https://issues.apache.org/jira/browse/HDFS-15605
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Jinglun
>            Assignee: Jinglun
>            Priority: Major
>         Attachments: HDFS-15605.001.patch, HDFS-15605.002.patch
>
>
> When we are using DeadNodeDetector, it sometimes marks too many nodes as
> dead and causes read failures. The DeadNodeDetector assumes that every
> datanode whose getDatanodeInfo rpc fails to return in time is dead, but
> that is not always true. A client-side error or a slow rpc in the DataNode
> can also get a node marked as dead. For example, a client-side delay in the
> rpcThreadPool might cause the getDatanodeInfo rpcs to time out and add many
> datanodes to the dead list.
> We have a simple improvement for this: the NameNode already knows which
> datanodes are dead, so just update the dead list from the NameNode using
> DFSClient.datanodeReport().
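
The idea in the description can be sketched roughly as follows. This is a minimal illustration, not the code in the attached patches: DeadNodeListSketch, markDeadLocally, syncWithNameNode and the Supplier are hypothetical names, and datanodes are represented as plain strings. In a real client the supplier would presumably be backed by DFSClient.datanodeReport(), asking for the dead-node report type. The sketch also assumes the NameNode's view simply replaces the local dead list, which may differ from what the patch actually does.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical stand-in for a client-side dead node list; not the Hadoop class. */
final class DeadNodeListSketch {
    // Datanodes the client currently treats as dead.
    private final Set<String> deadNodes = new HashSet<>();

    // Hypothetical source of the NameNode's dead-node report.
    private final Supplier<Set<String>> nameNodeDeadReport;

    DeadNodeListSketch(Supplier<Set<String>> nameNodeDeadReport) {
        this.nameNodeDeadReport = nameNodeDeadReport;
    }

    /** Local evidence only, e.g. a getDatanodeInfo rpc that timed out. */
    synchronized void markDeadLocally(String datanode) {
        deadNodes.add(datanode);
    }

    /**
     * Assumption: the NameNode's report is authoritative, so it replaces the
     * local list. Nodes that were only slow, or only looked slow because of a
     * client-side delay, drop out of the dead list here.
     */
    synchronized void syncWithNameNode() {
        Set<String> reported = nameNodeDeadReport.get();
        deadNodes.clear();
        deadNodes.addAll(reported);
    }

    /** Returns a read-only snapshot of the current dead list. */
    synchronized Set<String> getDeadNodes() {
        return Collections.unmodifiableSet(new HashSet<>(deadNodes));
    }
}
```

With this shape, a burst of client-side timeouts can still add many datanodes through markDeadLocally, but the next syncWithNameNode call removes every node the NameNode does not report as dead.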