Jinglun created HDFS-15809:
------------------------------
Summary: DeadNodeDetector doesn't remove live nodes from dead node set.
Key: HDFS-15809
URL: https://issues.apache.org/jira/browse/HDFS-15809
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Jinglun
We found that, in a big cluster, DeadNodeDetector might never remove live
nodes from the dead node set. For example:
# 200 nodes are added to the dead node set by DeadNodeDetector.
# DeadNodeDetector#checkDeadNodes() adds 100 nodes to the deadNodesProbeQueue
because the queue length is limited to 100.
# The probe threads start working and probe 30 nodes.
# DeadNodeDetector#checkDeadNodes() is scheduled again. It iterates the dead
node set and adds 30 nodes to the deadNodesProbeQueue, filling the 30 slots
freed by the probes. But the iteration order is the same as last time, so the
30 nodes that have already been probed are added to the queue again.
# Steps 3 and 4 repeat, and each time the same first 30 nodes of the dead node
set are added. If they are all really dead, the live nodes behind them are
never probed and so can never be removed from the dead node set.
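
The steps above can be reproduced with a small stand-alone simulation. This is only a sketch of the scenario, not the actual DeadNodeDetector code: the class name, the simulate() helper and its parameters are hypothetical, the dead node set is modelled as a list of integers with a fixed iteration order, and every probed node is assumed to stay dead (it remains in the set).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.TreeSet;

// Hypothetical model of the scenario in this issue; not Hadoop code.
class DeadNodeStarvationSketch {

  /** Returns the indices of the dead nodes that were probed at least once. */
  static TreeSet<Integer> simulate(int deadNodeCount, int queueLimit,
                                   int probesPerRound, int rounds) {
    // The dead node set, always iterated in the same order.
    List<Integer> deadNodes = new ArrayList<>();
    for (int i = 0; i < deadNodeCount; i++) {
      deadNodes.add(i);
    }

    Queue<Integer> probeQueue = new ArrayDeque<>(); // stands in for deadNodesProbeQueue
    TreeSet<Integer> probed = new TreeSet<>();

    for (int round = 0; round < rounds; round++) {
      // Scheduling pass (steps 2 and 4): walk the set from the start and
      // stop as soon as the bounded queue is full.
      for (int node : deadNodes) {
        if (probeQueue.size() >= queueLimit) {
          break;
        }
        probeQueue.add(node);
      }
      // Probe pass (step 3): only a few nodes are probed before the next
      // scheduling pass. Probed nodes are assumed dead, so the set is unchanged.
      for (int i = 0; i < probesPerRound && !probeQueue.isEmpty(); i++) {
        probed.add(probeQueue.poll());
      }
    }
    return probed;
  }

  public static void main(String[] args) {
    TreeSet<Integer> probed = simulate(200, 100, 30, 1000);
    System.out.println("distinct nodes probed: " + probed.size()); // 100
    System.out.println("highest index probed: " + probed.last());  // 99
  }
}
```

With the numbers from this issue (200 dead nodes, queue limit 100, 30 probes per round), nodes 100 to 199 are never probed no matter how many rounds run, because each scheduling pass only refills the queue from the head of the set.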
--
This message was sent by Atlassian Jira
(v8.3.4#803005)