[jira] [Commented] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184426#comment-15184426 ]

Hudson commented on HDFS-9882:

SUCCESS: Integrated in Hadoop-trunk-Commit #9441 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9441/])
HDFS-9882. Add heartbeatsTotal in Datanode metrics. (Contributed by Hua (arp: rev c2140d05efaf18b41caae8c61d9f6d668ab0e874)
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/DataNodeMetrics.java
* hadoop-common-project/hadoop-common/src/site/markdown/Metrics.md
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java

> Add heartbeatsTotal in Datanode metrics
> ---------------------------------------
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Affects Versions: 2.7.2
> Reporter: Hua Liu
> Assignee: Hua Liu
> Priority: Minor
> Fix For: 2.8.0
>
> Attachments:
> 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch,
> 0002-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch,
> 0003-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch,
> 0004-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch
>
>
> Heartbeat latency only reflects the time spent on generating reports and
> sending reports to NN. When heartbeats are delayed due to processing
> commands, this latency does not help investigation. I would like to propose
> to add another metric counter to show the total time.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184393#comment-15184393 ]

Hadoop QA commented on HDFS-9882:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 12s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| 0 | mvndep | 0m 17s | Maven dependency ordering for branch |
| +1 | mvninstall | 7m 27s | trunk passed |
| +1 | compile | 7m 28s | trunk passed with JDK v1.8.0_74 |
| +1 | compile | 8m 1s | trunk passed with JDK v1.7.0_95 |
| +1 | checkstyle | 1m 10s | trunk passed |
| +1 | mvnsite | 2m 33s | trunk passed |
| +1 | mvneclipse | 0m 45s | trunk passed |
| +1 | findbugs | 5m 26s | trunk passed |
| +1 | javadoc | 3m 2s | trunk passed with JDK v1.8.0_74 |
| +1 | javadoc | 4m 52s | trunk passed with JDK v1.7.0_95 |
| 0 | mvndep | 0m 23s | Maven dependency ordering for patch |
| +1 | mvninstall | 2m 31s | the patch passed |
| +1 | compile | 9m 18s | the patch passed with JDK v1.8.0_74 |
| +1 | javac | 9m 18s | the patch passed |
| +1 | compile | 7m 27s | the patch passed with JDK v1.7.0_95 |
| +1 | javac | 7m 27s | the patch passed |
| -1 | checkstyle | 1m 5s | root: patch generated 1 new + 83 unchanged - 0 fixed = 84 total (was 83) |
| +1 | mvnsite | 1m 51s | the patch passed |
| +1 | mvneclipse | 0m 25s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | findbugs | 4m 1s | the patch passed |
| +1 | javadoc | 2m 15s | the patch passed with JDK v1.8.0_74 |
| +1 | javadoc | 2m 59s | the patch passed with JDK v1.7.0_95 |
| -1 | unit | 8m 40s | hadoop-common in the patch failed with JDK v1.8.0_74. |
| -1 | unit | 69m 40s | hadoop-hdfs in the patch failed with JDK v1.8.0_74. |
| +1 | unit | 8m 37s | hadoop-common in the patch passed with JDK v1.7.0_95. |
| -1 | unit | 58m 25s | hadoop-hdfs in the patch failed with JDK v1.7.0_95. |
| +1 | asflicense | 0m 24s | Patch does not generate ASF License warnings. |
[jira] [Commented] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184312#comment-15184312 ]

Hua Liu commented on HDFS-9882:

Hi [~arpiagariu],
I submitted the v4 patch a few hours ago, but it seems Jenkins hasn't built it yet. I will resubmit it if Jenkins still has not kicked in by tomorrow morning.
[jira] [Commented] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184253#comment-15184253 ]

Arpit Agarwal commented on HDFS-9882:

+1 pending jenkins for the v4 patch.
[jira] [Commented] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183887#comment-15183887 ]

Arpit Agarwal commented on HDFS-9882:

bq. We think heartbeatsTotal may be a good alternative.
Makes sense. Do you want to rename the function {{addHeartbeatTotalTime}} to {{addHeartbeatTotal}} so that it is consistent with the metric name {{heartbeatsTotal}}? The v3 patch looks fine otherwise.
[jira] [Commented] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15182559#comment-15182559 ]

Hua Liu commented on HDFS-9882:

Hi [~arpiagariu],
Since NumOps and AvgTime are appended to the metric name, HeartbeatTotalTimeAvgTime would look verbose and HeartbeatTotalTimeNumOps would be confusing. We think heartbeatsTotal may be a good alternative. We have also described the new metric in Metrics.md. Please take a look and commit it if you see fit.
Thanks,
Hua
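The naming concern above can be illustrated with a small sketch. It assumes only what the comment states, namely that a rate-style metric publishes two values whose names are the metric name with `NumOps` and `AvgTime` appended. The class `RateMetricSketch` is hypothetical and is not Hadoop's metrics implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a rate-style metric; not Hadoop's own class. */
class RateMetricSketch {
    private final String name;
    private long numOps;
    private long totalMillis;

    RateMetricSketch(String name) {
        this.name = name;
    }

    /** Records one operation that took the given number of milliseconds. */
    synchronized void add(long elapsedMillis) {
        numOps++;
        totalMillis += elapsedMillis;
    }

    /** Publishes the two derived values under suffixed names. */
    synchronized Map<String, Double> snapshot() {
        Map<String, Double> out = new LinkedHashMap<>();
        out.put(name + "NumOps", (double) numOps);
        out.put(name + "AvgTime", numOps == 0 ? 0.0 : (double) totalMillis / numOps);
        return out;
    }
}
```

With the name `HeartbeatTotalTime` the published keys would be `HeartbeatTotalTimeNumOps` and `HeartbeatTotalTimeAvgTime`, which is the repetition the comment objects to; a name without a trailing `Time` avoids it.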
[jira] [Commented] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15182534#comment-15182534 ]

Hadoop QA commented on HDFS-9882:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 22s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| 0 | mvndep | 0m 53s | Maven dependency ordering for branch |
| +1 | mvninstall | 9m 21s | trunk passed |
| +1 | compile | 11m 2s | trunk passed with JDK v1.8.0_74 |
| +1 | compile | 7m 21s | trunk passed with JDK v1.7.0_95 |
| +1 | checkstyle | 1m 6s | trunk passed |
| +1 | mvnsite | 1m 53s | trunk passed |
| +1 | mvneclipse | 0m 27s | trunk passed |
| +1 | findbugs | 3m 33s | trunk passed |
| +1 | javadoc | 2m 15s | trunk passed with JDK v1.8.0_74 |
| +1 | javadoc | 2m 59s | trunk passed with JDK v1.7.0_95 |
| 0 | mvndep | 0m 50s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 35s | the patch passed |
| +1 | compile | 8m 17s | the patch passed with JDK v1.8.0_74 |
| +1 | javac | 8m 17s | the patch passed |
| +1 | compile | 7m 19s | the patch passed with JDK v1.7.0_95 |
| +1 | javac | 7m 19s | the patch passed |
| -1 | checkstyle | 1m 7s | root: patch generated 1 new + 82 unchanged - 0 fixed = 83 total (was 82) |
| +1 | mvnsite | 1m 48s | the patch passed |
| +1 | mvneclipse | 0m 26s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | findbugs | 4m 3s | the patch passed |
| +1 | javadoc | 2m 18s | the patch passed with JDK v1.8.0_74 |
| +1 | javadoc | 3m 3s | the patch passed with JDK v1.7.0_95 |
| +1 | unit | 9m 52s | hadoop-common in the patch passed with JDK v1.8.0_74. |
| -1 | unit | 87m 37s | hadoop-hdfs in the patch failed with JDK v1.8.0_74. |
| +1 | unit | 11m 7s | hadoop-common in the patch passed with JDK v1.7.0_95. |
| -1 | unit | 82m 58s | hadoop-hdfs in the patch failed with JDK v1.7.0_95. |
| +1 | asflicense | 0m 24s | Patch does not generate ASF License warnings. |
[jira] [Commented] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15182359#comment-15182359 ]

Inigo Goiri commented on HDFS-9882:

I missed HDFS-9901, which already does this; I marked it as a duplicate.
[jira] [Commented] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15181487#comment-15181487 ]

Inigo Goiri commented on HDFS-9882:

I created HDFS-9910 to make the disk operations in the heartbeat asynchronous.
[jira] [Commented] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180384#comment-15180384 ]

Arpit Agarwal commented on HDFS-9882:

Disk operations can be slow on any platform if the disk is loaded or bad, so I think it is a good idea to move those operations out of the heartbeat processing path, which is perf-sensitive. Would you consider filing a separate Jira to fix the {{checkBlock}} issue described by [~hualiu]?
Meanwhile, we can also add this new metric. Can you rename it to something like {{HeartbeatTotalTime}} and describe it in Metrics.md?
[jira] [Commented] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180364#comment-15180364 ]

Inigo Goiri commented on HDFS-9882:

Just for the record, this happened on Windows, where the Hadoop code might not be that well optimized.
I am not sure we can remove those operations; that might be too deep a change.
For now, our internal solution has been to move these operations into a different thread and check on them from the heartbeat thread.
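One way to read the "different thread" approach is sketched below. This is only an illustration under stated assumptions, not the internal solution mentioned in the comment and not Hadoop code: the class `AsyncCheckSketch`, the method `checkWithin`, and the choice to treat a timed-out check as a failed check are all hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: runs a possibly slow disk check off the calling thread. */
class AsyncCheckSketch {
    // Daemon worker so a check stuck on disk I/O cannot keep the JVM alive.
    private final ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "disk-check-sketch");
        t.setDaemon(true);
        return t;
    });

    /**
     * Returns the check's result, or false if the check fails or does not
     * finish within the timeout. The caller waits at most timeoutMillis.
     */
    boolean checkWithin(Callable<Boolean> check, long timeoutMillis) {
        Future<Boolean> result = worker.submit(check);
        try {
            return result.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            result.cancel(true); // best effort; blocked file I/O may ignore interrupts
            return false;
        } catch (ExecutionException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A caller could pass something like `() -> blockFile.exists()` (with `blockFile` a `java.io.File`) so that the calling thread is bounded by the timeout rather than by the disk. With a single worker, a stuck check delays later ones, so a real design would need to decide how many workers to use and what a timeout should mean for the block.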
[jira] [Commented] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180348#comment-15180348 ]

Arpit Agarwal commented on HDFS-9882:

Thanks, that makes sense. This is a good find. Do you think it's a better idea to fix heartbeat handling to remove expensive operations?
[jira] [Commented] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15179335#comment-15179335 ]

Hua Liu commented on HDFS-9882:

Hi [~arpiagariu],
When a data node needs to transfer a block, it validates the block in the heartbeat thread by invoking the checkBlock method of FsDatasetImpl, which checks whether the block exists and gets the block length. If the block is valid, it then spins off a thread to do the actual block transfer. During heavy disk IO, which happened once in our environment, we found the heartbeat thread hung on "replicaInfo.getBlockFile().exists()" for more than 10 minutes.
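The gap between the existing heartbeat latency and the proposed total time can be sketched as follows. This is a simplified illustration, assuming one heartbeat cycle consists of sending the heartbeat and then processing the returned commands on the same thread; the class `HeartbeatTimingSketch` and its members are hypothetical and do not reproduce the code in BPServiceActor.

```java
import java.util.function.LongSupplier;

/** Hypothetical timing of one heartbeat cycle; not the actual DataNode code. */
class HeartbeatTimingSketch {
    long lastSendLatencyMs; // time to build and send the heartbeat only
    long lastTotalMs;       // send time plus time spent processing commands

    void heartbeatOnce(Runnable sendHeartbeat, Runnable processCommands, LongSupplier clockMs) {
        long start = clockMs.getAsLong();
        sendHeartbeat.run();
        lastSendLatencyMs = clockMs.getAsLong() - start;

        // A blocking disk call in here delays the next heartbeat,
        // but never shows up in lastSendLatencyMs.
        processCommands.run();
        lastTotalMs = clockMs.getAsLong() - start;
    }
}
```

If command processing stalls for ten minutes, the send latency still looks healthy while the total exposes the delay, which is the case the issue description wants to make visible.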
[jira] [Commented] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15179292#comment-15179292 ]

Arpit Agarwal commented on HDFS-9882:

Hi [~elgoiri],
bq. heartbeats were reporting as running smoothly but the block report processing was actually getting stuck because of the disk and delaying the heartbeats which wasn't easy to monitor
Do you mean that processing commands from the NN was slow because of disk operations? Did you figure out which disk operations? IIRC we schedule disk deletions asynchronously to avoid this exact problem.
Thanks.
[jira] [Commented] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15178841#comment-15178841 ]

Inigo Goiri commented on HDFS-9882:

I think we need to add a description to Metrics.md. Other than that, I think the patch is good (I don't see any related unit tests for this, and fixing the remaining checkstyle warning would break the style of the class).
[~andrew.wang], [~arpitagarwal], do you think this is a useful addition? We found that the heartbeats were reported as running smoothly, but the block report processing was actually getting stuck because of the disk and delaying the heartbeats, which wasn't easy to monitor.
We are also planning to open a separate JIRA to move some of the disk-related checks to a separate thread.
[jira] [Commented] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15177175#comment-15177175 ]

Hadoop QA commented on HDFS-9882:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 11s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| +1 | mvninstall | 6m 51s | trunk passed |
| +1 | compile | 0m 48s | trunk passed with JDK v1.8.0_72 |
| +1 | compile | 0m 40s | trunk passed with JDK v1.7.0_95 |
| +1 | checkstyle | 0m 22s | trunk passed |
| +1 | mvnsite | 0m 52s | trunk passed |
| +1 | mvneclipse | 0m 12s | trunk passed |
| +1 | findbugs | 1m 59s | trunk passed |
| +1 | javadoc | 1m 20s | trunk passed with JDK v1.8.0_72 |
| +1 | javadoc | 2m 11s | trunk passed with JDK v1.7.0_95 |
| +1 | mvninstall | 0m 52s | the patch passed |
| +1 | compile | 0m 58s | the patch passed with JDK v1.8.0_72 |
| +1 | javac | 0m 58s | the patch passed |
| +1 | compile | 0m 46s | the patch passed with JDK v1.7.0_95 |
| +1 | javac | 0m 46s | the patch passed |
| -1 | checkstyle | 0m 24s | hadoop-hdfs-project/hadoop-hdfs: patch generated 1 new + 84 unchanged - 0 fixed = 85 total (was 84) |
| +1 | mvnsite | 1m 23s | the patch passed |
| +1 | mvneclipse | 0m 17s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | findbugs | 3m 2s | the patch passed |
| +1 | javadoc | 1m 21s | the patch passed with JDK v1.8.0_72 |
| +1 | javadoc | 2m 36s | the patch passed with JDK v1.7.0_95 |
| -1 | unit | 71m 15s | hadoop-hdfs in the patch failed with JDK v1.8.0_72. |
| +1 | unit | 59m 35s | hadoop-hdfs in the patch passed with JDK v1.7.0_95. |
| +1 | asflicense | 0m 19s | Patch does not generate ASF License warnings. |
| | | 161m 11s | |

|| Reason || Tests ||
| JDK v1.8.0_72 Failed junit tests | hadoop.hdfs.server.blockmanagement.TestComputeInvalidateWork |
| | hadoop.hdfs.TestFileAppend |
| | hadoop.hdfs.TestRollingUpgradeRollback |
| | hadoop.hdfs.TestErasureCodeBenchmarkThroughput |
| | hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attach
[jira] [Commented] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15176786#comment-15176786 ]
Hadoop QA commented on HDFS-9882:
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 7s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 23s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 55s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 4s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 12s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 54s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 50s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 41s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 42s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 20s {color} | {color:red} hadoop-hdfs-project/hadoop-hdfs: patch generated 2 new + 83 unchanged - 0 fixed = 85 total (was 83) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 54s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 7s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 51s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 59m 47s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 58m 21s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s {color} | {color:green} Patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 144m 54s {color} | {color:black} {color} |
\\ \\
|| Reason || Tests ||
| JDK v1.8.0_72 Failed junit tests | hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl |
| | hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA |
| | hadoop.hdfs.server.namenode.web.resources.TestWebHdfsDataLocality |
| JDK v1.7.0_95 Failed junit tests | hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock |
| | hadoop.metrics2.sink.TestRollingFileSystemSinkWithSecureHdfs |
\\ \\
|| Subsystem || Report/Notes ||
| Docker | Image
[jira] [Commented] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15176335#comment-15176335 ]
Hadoop QA commented on HDFS-9882:
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 47s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 25s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 59s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 11s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 16s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 53s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 29s {color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 27s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 27s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 30s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 30s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 21s {color} | {color:red} hadoop-hdfs-project/hadoop-hdfs: patch generated 2 new + 83 unchanged - 0 fixed = 85 total (was 83) {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 33s {color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 20s {color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 6s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 49s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 24s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 28s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s {color} | {color:green} Patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 25m 34s {color} | {color:black} {color} |
\\ \\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12790819/0001-Add-heartbeatsTotal-metric.patch |
| JIRA Issue | HDFS-9882 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux df5ab6dd5b00 3.13.0-36-lowlatency #63-Ubuntu