[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15179292#comment-15179292 ]
Arpit Agarwal commented on HDFS-9882:
-------------------------------------

Hi [~elgoiri],

bq. heartbeats were reporting as running smoothly but the block report processing was actually getting stuck because of the disk and delaying the heartbeats which wasn't easy to monitor

Do you mean processing commands from the NN was slow because of disk operations? Did you figure out which disk operations? IIRC we schedule async disk deletions to avoid this exact problem.

Thanks.

> Add heartbeatsTotal in Datanode metrics
> ---------------------------------------
>
>                 Key: HDFS-9882
>                 URL: https://issues.apache.org/jira/browse/HDFS-9882
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode
>    Affects Versions: 2.7.2
>            Reporter: Hua Liu
>            Assignee: Hua Liu
>            Priority: Minor
>         Attachments: 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch,
>                      0002-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch
>
>
> Heartbeat latency only reflects the time spent on generating reports and
> sending reports to NN. When heartbeats are delayed due to processing
> commands, this latency does not help investigation. I would like to propose
> to add another metric counter to show the total time.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)