[ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15179335#comment-15179335
 ] 

Hua Liu commented on HDFS-9882:
-------------------------------

Hi [~arpiagariu]

When a DataNode needs to transfer a block, it first validates the block on the 
heartbeat thread by invoking the checkBlock method of FsDatasetImpl, which 
checks whether the block exists and gets the block length. If the block is 
valid, the DataNode then spins off a separate thread to do the actual block 
transfer. During one period of heavy disk I/O in our environment, we found the 
heartbeat thread hung on "replicaInfo.getBlockFile().exists()" for more than 
10 minutes.
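
To illustrate why the existing latency metric misses this kind of stall, here 
is a minimal sketch, not the actual DataNode code. All names in it 
(HeartbeatTimingSketch, Timing, runCycle) are hypothetical, and the clock is 
injected so the sketch runs without real delays. It assumes a heartbeat cycle 
that sends the report and then processes commands on the same thread.

```java
import java.util.function.LongSupplier;

/** Hypothetical sketch; none of these names come from Hadoop. */
public class HeartbeatTimingSketch {

    /** Timings for one simulated heartbeat cycle, in clock units. */
    static final class Timing {
        final long reportTime; // generating and sending the report only
        final long totalTime;  // report plus command processing

        Timing(long reportTime, long totalTime) {
            this.reportTime = reportTime;
            this.totalTime = totalTime;
        }
    }

    /**
     * Runs one cycle on the calling thread and measures both intervals.
     * The clock is injected so the sketch can be exercised without real delays.
     */
    static Timing runCycle(LongSupplier clock, Runnable sendReport,
                           Runnable processCommands) {
        long start = clock.getAsLong();
        sendReport.run();
        long afterReport = clock.getAsLong();
        processCommands.run(); // e.g. a block validation that blocks on disk
        long end = clock.getAsLong();
        return new Timing(afterReport - start, end - start);
    }

    public static void main(String[] args) {
        long[] now = {0}; // fake clock, in milliseconds
        Timing t = runCycle(
            () -> now[0],
            () -> now[0] += 20,        // report takes 20 ms
            () -> now[0] += 600_000);  // command processing stalls for 10 minutes
        System.out.println("report=" + t.reportTime + "ms total=" + t.totalTime + "ms");
    }
}
```

In this sketch the report-only interval stays at 20 ms while the total grows 
to 600020 ms, which is the gap a total-time metric would expose.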

> Add heartbeatsTotal in Datanode metrics
> ---------------------------------------
>
>                 Key: HDFS-9882
>                 URL: https://issues.apache.org/jira/browse/HDFS-9882
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode
>    Affects Versions: 2.7.2
>            Reporter: Hua Liu
>            Assignee: Hua Liu
>            Priority: Minor
>         Attachments: 
> 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0002-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch
>
>
> Heartbeat latency only reflects the time spent generating reports and 
> sending them to the NN. When heartbeats are delayed by command processing, 
> this latency does not help the investigation. I propose adding another 
> metric counter to show the total time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
