[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15179335#comment-15179335 ]
Hua Liu commented on HDFS-9882:
-------------------------------

Hi [~arpiagariu]

When a data node needs to transfer a block, it validates the block in the heartbeat thread by invoking the checkBlock method of FsDatasetImpl, which checks whether the block exists and gets the block length. If the block is valid, the data node then spins off a thread to do the actual block transfer. During a period of heavy disk IO that happened once in our environment, we found the heartbeat thread hung on "replicaInfo.getBlockFile().exists()" for more than 10 minutes.

> Add heartbeatsTotal in Datanode metrics
> ---------------------------------------
>
>                 Key: HDFS-9882
>                 URL: https://issues.apache.org/jira/browse/HDFS-9882
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode
>    Affects Versions: 2.7.2
>            Reporter: Hua Liu
>            Assignee: Hua Liu
>            Priority: Minor
>         Attachments: 
> 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0002-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch
>
>
> Heartbeat latency only reflects the time spent on generating reports and 
> sending reports to NN. When heartbeats are delayed due to processing 
> commands, this latency does not help investigation. I would like to propose 
> to add another metric counter to show the total time.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
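
To illustrate the gap described in the comment above, here is a minimal Java sketch of timing a heartbeat cycle two ways: the send step alone, and the whole cycle including command processing. It is not the DataNode code or the attached patch. The class HeartbeatTimingSketch, its fields, and the runOnce and sleepQuietly helpers are hypothetical names chosen for illustration, and the sleeps merely stand in for a report send and a slow block check.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch only; names and structure are assumptions, not Hadoop code. */
class HeartbeatTimingSketch {
    // Time spent only on sending the heartbeat report.
    final AtomicLong sendNanos = new AtomicLong();
    // Time spent on the whole cycle, including processing returned commands.
    final AtomicLong totalNanos = new AtomicLong();
    // Number of completed heartbeat cycles.
    final AtomicLong heartbeats = new AtomicLong();

    /** Runs one cycle: send the report, then process commands on the same thread. */
    void runOnce(Runnable sendHeartbeat, Runnable processCommands) {
        long start = System.nanoTime();
        sendHeartbeat.run();
        long afterSend = System.nanoTime();
        processCommands.run(); // e.g. a block check that stalls under heavy disk IO
        long end = System.nanoTime();

        sendNanos.addAndGet(afterSend - start);
        totalNanos.addAndGet(end - start);
        heartbeats.incrementAndGet();
    }

    long sendMillis() {
        return TimeUnit.NANOSECONDS.toMillis(sendNanos.get());
    }

    long totalMillis() {
        return TimeUnit.NANOSECONDS.toMillis(totalNanos.get());
    }

    /** Sleeps for the given time, restoring the interrupt flag if interrupted. */
    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        HeartbeatTimingSketch metrics = new HeartbeatTimingSketch();
        // Simulated fast send followed by slow command processing.
        metrics.runOnce(() -> sleepQuietly(5), () -> sleepQuietly(50));
        System.out.println("send ms: " + metrics.sendMillis()
            + ", total ms: " + metrics.totalMillis());
    }
}
```

In this sketch the send-only timer stays small even when command processing is slow, so only the total timer would reveal a stall like the one reported above.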