[ https://issues.apache.org/jira/browse/HADOOP-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12490475 ]
Christian Kunz commented on HADOOP-1255:
----------------------------------------
Just for the record, our namenode servers running release 0.12.3 got into this
situation twice: once on a 1000-node cluster and once on a 500-node cluster.
In this state the server emits 300+ messages per second and becomes largely
unresponsive to DFS clients.
> Name-node falls into infinite loop trying to remove a dead node.
> ----------------------------------------------------------------
>
> Key: HADOOP-1255
> URL: https://issues.apache.org/jira/browse/HADOOP-1255
> Project: Hadoop
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.12.3
> Reporter: Konstantin Shvachko
> Assigned To: Hairong Kuang
> Fix For: 0.13.0
>
> Attachments: heartbeat.patch
>
>
> Under certain conditions the name-node falls into an infinite loop in
> heartbeatCheck().
> It's rather hard to reproduce. I'm running a one-node cluster: 1 name-node, 1
> data-node.
> The data-node dies, and 10 minutes later I get:
> 07/04/12 10:40:34 INFO net.NetworkTopology: Removing a node:
> /default-rack/0.0.0.0:50077
> 07/04/12 10:44:35 INFO dfs.StateChange: BLOCK* NameSystem.heartbeatCheck:
> lost heartbeat from 0.0.0.0:50077
> ...................................................
> 07/04/12 10:45:17 INFO net.NetworkTopology: Removing a node:
> /default-rack/0.0.0.0:50077
> 07/04/12 10:47:44 INFO dfs.StateChange: BLOCK* NameSystem.heartbeatCheck:
> lost heartbeat from 0.0.0.0:50077
> Here is what I see in the debugger:
> FSNamesystem.heartbeats contains 2 identical (same instance)
> DatanodeDescriptor entries; both have DatanodeDescriptor.isAlive = false.
> heartbeatCheck() correctly detects that there is a dead node in the list,
> but removeDatanode() does not delete the node from heartbeats because it is
> already marked dead.
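
The failure mode described above can be sketched in a few lines of Java. This is a simplified model under stated assumptions, not the Hadoop source: the class HeartbeatLoopSketch, the Node type, its heartbeatExpired flag, and every method name are hypothetical stand-ins for FSNamesystem.heartbeats, DatanodeDescriptor, heartbeatCheck() and removeDatanode(). The unconditional removal is only one possible remedy and is not taken from heartbeat.patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the reported state; not the actual Hadoop classes.
class HeartbeatLoopSketch {
    static class Node {
        final String name;
        boolean isAlive;           // mirrors DatanodeDescriptor.isAlive
        boolean heartbeatExpired;  // assumed stand-in for "no heartbeat received in time"

        Node(String name, boolean isAlive, boolean heartbeatExpired) {
            this.name = name;
            this.isAlive = isAlive;
            this.heartbeatExpired = heartbeatExpired;
        }
    }

    // The state seen in the debugger: the same dead instance listed twice.
    static List<Node> duplicatedDeadEntry() {
        Node dead = new Node("0.0.0.0:50077", false, true);
        List<Node> heartbeats = new ArrayList<>();
        heartbeats.add(dead);
        heartbeats.add(dead);
        return heartbeats;
    }

    // Mimics the reported behavior: a node already marked dead is left in the list.
    static void removeGuarded(List<Node> heartbeats, Node node) {
        if (node.isAlive) {
            heartbeats.remove(node);
            node.isAlive = false;
        }
    }

    // One possible remedy (an assumption): drop every reference, whatever isAlive says.
    static void removeUnconditionally(List<Node> heartbeats, Node node) {
        heartbeats.removeIf(n -> n == node);
        node.isAlive = false;
    }

    // Repeats the check until no expired node remains in the list.
    // Returns the number of removal passes performed, or -1 if maxPasses ran out,
    // which stands in for the infinite loop.
    static int passesUntilClean(List<Node> heartbeats, boolean guarded, int maxPasses) {
        for (int pass = 0; pass < maxPasses; pass++) {
            Node dead = null;
            for (Node n : heartbeats) {
                if (n.heartbeatExpired) {
                    dead = n;
                    break;
                }
            }
            if (dead == null) {
                return pass;
            }
            if (guarded) {
                removeGuarded(heartbeats, dead);
            } else {
                removeUnconditionally(heartbeats, dead);
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("guarded: " + passesUntilClean(duplicatedDeadEntry(), true, 1000));
        System.out.println("unconditional: " + passesUntilClean(duplicatedDeadEntry(), false, 1000));
    }
}
```

With the guarded removal the check never makes progress and the pass limit is exhausted; with the unconditional removal both references are gone after a single pass.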