[ https://issues.apache.org/jira/browse/HDFS-6688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062771#comment-14062771 ]
Andrew Wang commented on HDFS-6688:
-----------------------------------

Hi Biju,

The 10.5 minute dead node timeout has been around for a while, and it's different from the heartbeat interval. We want to wait a conservative amount of time before marking a node as dead, since that will start re-replication for all the blocks on that DN (very I/O and network intensive).

We do measure the "last heartbeat" time in places, and will mark a DN as "stale" if we haven't heard from it for a little while (e.g. 30s) but it's not yet dead. You could try looking at those metrics if you're interested in lower-latency detection methods.

If this is satisfactory, could we close this JIRA? Thanks Biju.

> Hadoop JMX stats are not refreshed
> ----------------------------------
>
>                 Key: HDFS-6688
>                 URL: https://issues.apache.org/jira/browse/HDFS-6688
>             Project: Hadoop HDFS
>          Issue Type: Bug
>         Environment: Ubuntu
>            Reporter: Biju Nair
>
> Even when the HDFS datanode process is stopped, the values of the JMX attributes
> Hadoop.NameNode.FSNamesystemState.NumLiveDataNodes/NumDeadDataNodes
> don't change. Also, Hadoop.NameNode.NameNodeInfo.Attributes.LiveNodes
> still shows the stopped datanode's details. If these attributes reflected the actual
> changes in the datanode, they could be used to monitor the health of the HDFS
> cluster, which is currently not possible.

--
This message was sent by Atlassian JIRA (v6.2#6252)
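
For reference, the 10.5 minute dead node timeout and the roughly 30s stale threshold mentioned in the comment can be sketched as below. This is only an illustration: the property names and default values (dfs.heartbeat.interval = 3s, dfs.namenode.heartbeat.recheck-interval = 5 minutes, dfs.namenode.stale.datanode.interval = 30s) are assumptions taken from stock HDFS defaults, not from the message itself, and the classify() helper is hypothetical. Check hdfs-default.xml for your version before relying on the numbers.

```python
# Assumed stock defaults; none of these values are stated in the message above.
HEARTBEAT_INTERVAL_S = 3               # dfs.heartbeat.interval (seconds)
RECHECK_INTERVAL_MS = 5 * 60 * 1000    # dfs.namenode.heartbeat.recheck-interval (ms)
STALE_INTERVAL_MS = 30 * 1000          # dfs.namenode.stale.datanode.interval (ms)


def dead_timeout_ms(recheck_ms=RECHECK_INTERVAL_MS, heartbeat_s=HEARTBEAT_INTERVAL_S):
    """Heartbeat silence (ms) after which a DataNode would be marked dead.

    Uses 2 * recheck interval + 10 * heartbeat interval, which gives
    10.5 minutes with the assumed defaults.
    """
    return 2 * recheck_ms + 10 * 1000 * heartbeat_s


def classify(ms_since_last_heartbeat):
    """Hypothetical helper: map heartbeat age to 'live', 'stale', or 'dead'."""
    if ms_since_last_heartbeat > dead_timeout_ms():
        return "dead"
    if ms_since_last_heartbeat > STALE_INTERVAL_MS:
        return "stale"
    return "live"


if __name__ == "__main__":
    # Dead timeout in minutes under the assumed defaults.
    print(dead_timeout_ms() / 60000.0)
    # A DataNode silent for 45s is past the stale threshold but not yet dead.
    print(classify(45 * 1000))
```

Under these assumptions, a stopped DataNode would keep counting toward NumLiveDataNodes until the dead timeout elapses, while a monitor that watches the last-heartbeat age could flag it as stale much sooner.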