[ https://issues.apache.org/jira/browse/HDFS-15809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287014#comment-17287014 ]
Jinglun commented on HDFS-15809:
--------------------------------

I haven't dealt with the checkstyle complaints, and it is out of date now (cry). Re-uploading v02 to trigger Jenkins.

> DeadNodeDetector doesn't remove live nodes from dead node set.
> --------------------------------------------------------------
>
>                 Key: HDFS-15809
>                 URL: https://issues.apache.org/jira/browse/HDFS-15809
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Jinglun
>            Assignee: Jinglun
>            Priority: Major
>         Attachments: HDFS-15809.001.patch, HDFS-15809.002.patch
>
>
> We found that the dead node detector might never remove live nodes from the
> dead node set in a big cluster. For example:
> # 200 nodes are added to the dead node set by DeadNodeDetector.
> # DeadNodeDetector#checkDeadNodes() adds 100 nodes to the
> deadNodesProbeQueue because the queue length is limited to 100.
> # The probe threads start working and probe 30 nodes.
> # DeadNodeDetector#checkDeadNodes() is scheduled again. It iterates the dead
> node set and adds 30 nodes to the deadNodesProbeQueue. But the order is the
> same as the last time, so the 30 nodes that have already been probed are added
> to the queue again.
> # Repeat 3 and 4. But we always add the first 30 nodes from the dead set. If
> they are all dead, then the live nodes behind them could never be recovered.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
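
For readers who want to see the starvation in the quoted steps concretely, here is a rough toy model in Java. It is only a sketch and not the real DeadNodeDetector code: the class ProbeStarvationSketch, the method simulate(), and the integer node ids are all made up for illustration. It assumes the dead set is always iterated in the same order, that the probe queue is a bounded FIFO that accepts duplicates, and that every probed node stays in the dead set (the worst case from the description).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.TreeSet;

/** Toy model of the starvation in the issue description; hypothetical, not Hadoop code. */
final class ProbeStarvationSketch {

  /**
   * Runs refill/probe rounds and returns the ids of nodes probed at least once.
   * Assumption: every probe fails, so no node ever leaves the dead set.
   */
  static TreeSet<Integer> simulate(int deadNodes, int queueLimit,
                                   int probesPerRound, int rounds) {
    // Dead set with a fixed iteration order (ids 0 .. deadNodes - 1).
    List<Integer> deadSet = new ArrayList<>();
    for (int id = 0; id < deadNodes; id++) {
      deadSet.add(id);
    }

    Queue<Integer> probeQueue = new ArrayDeque<>();
    TreeSet<Integer> probed = new TreeSet<>();

    for (int round = 0; round < rounds; round++) {
      // Steps 2 and 4: refill from the start of the dead set until the queue is full.
      for (int id : deadSet) {
        if (probeQueue.size() >= queueLimit) {
          break;
        }
        probeQueue.add(id);
      }
      // Step 3: the probe threads drain a few entries before the next refill.
      for (int i = 0; i < probesPerRound && !probeQueue.isEmpty(); i++) {
        probed.add(probeQueue.poll());
      }
    }
    return probed;
  }

  public static void main(String[] args) {
    // Numbers from the example: 200 dead nodes, queue limit 100, 30 probes per round.
    TreeSet<Integer> probed = simulate(200, 100, 30, 1000);
    System.out.println("distinct nodes probed: " + probed.size());
    System.out.println("highest node id probed: " + probed.last());
  }
}
```

With the numbers from the example, nodes 100 to 199 never enter the queue in this model, no matter how many rounds run. If any of them were actually alive, they would never be probed and so never removed from the dead set.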