[jira] [Commented] (HDFS-9901) Move disk IO out of the heartbeat thread
[ https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15212107#comment-15212107 ] Hua Liu commented on HDFS-9901: --- Added comments as [~elgoiri] suggested. [~arpitagarwal], would you please take a look at the patch and help with submission? > Move disk IO out of the heartbeat thread > > > Key: HDFS-9901 > URL: https://issues.apache.org/jira/browse/HDFS-9901 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Hua Liu >Assignee: Hua Liu > Attachments: > 0001-HDFS-9901-Move-block-validation-out-of-the-heartbeat.patch, > 0002-HDFS-9901-Move-block-validation-out-of-the-heartbeat.patch, > 0003-HDFS-9901-Move-disk-IO-out-of-the-heartbeat-thread.patch, > 0004-HDFS-9901-move-diskIO-out-of-the-heartbeat-thread.patch, > 0005-HDFS-9901-Move-diskIO-out-of-heartbeat-thread.patch > > > During heavy disk IO, we noticed the heartbeat thread hanging in the checkBlock method, > which checks the existence and length of a block before spinning off a thread to > do the actual transfer. In extreme cases, the heartbeat thread hung for more > than 10 minutes, so the namenode marked the datanode as dead and started > replicating its blocks, which caused more disk IO on other nodes and could > potentially bring them down. > The patch contains two changes: > 1. Makes DF asynchronous when monitoring the disk by creating a thread that > checks the disk and updates the disk status periodically. When the heartbeat > thread generates a storage report, it reads disk usage information from > memory, so the heartbeat thread is not blocked during heavy disk IO. > 2. Moves the checks (which require disk access) in DataNode's transferBlock() > into a separate thread, so the heartbeat thread does not have to wait for > them when heartbeating. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
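The first change (a background thread that refreshes disk usage so the heartbeat reads it from memory) can be illustrated with a minimal Java sketch. This is not code from the attached patches: the class name CachedDiskUsage, the LongSupplier probe, and the use of File.getUsableSpace() in place of Hadoop's DF are assumptions made for illustration only.

```java
import java.io.File;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch (not the HDFS-9901 patch): a background thread
 * periodically probes the disk and caches the result, so a heartbeat
 * thread building a storage report only reads memory.
 */
class CachedDiskUsage implements AutoCloseable {
  private final LongSupplier probe;       // slow call that may block on disk IO
  private volatile long cachedAvailable;  // last value published by the refresher
  private final ScheduledExecutorService refresher =
      Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "df-refresh");
        t.setDaemon(true); // do not keep the JVM alive just for refreshing
        return t;
      });

  CachedDiskUsage(LongSupplier probe, long periodMs) {
    this.probe = probe;
    this.cachedAvailable = probe.getAsLong(); // one synchronous probe at startup
    refresher.scheduleWithFixedDelay(this::refresh, periodMs, periodMs, TimeUnit.MILLISECONDS);
  }

  private void refresh() {
    try {
      cachedAvailable = probe.getAsLong();
    } catch (RuntimeException e) {
      // Keep the previous value. A scheduled task that throws has its
      // subsequent executions suppressed, so the exception must not escape.
    }
  }

  /** Called from the heartbeat path: never touches the disk. */
  long getAvailable() {
    return cachedAvailable;
  }

  @Override
  public void close() {
    refresher.shutdownNow();
  }

  public static void main(String[] args) {
    File dir = new File(".");
    try (CachedDiskUsage usage = new CachedDiskUsage(dir::getUsableSpace, 1000)) {
      System.out.println("available bytes: " + usage.getAvailable());
    }
  }
}
```

Using scheduleWithFixedDelay means a slow probe delays the next refresh instead of letting runs pile up; the trade-off is that the heartbeat may report usage that is up to one period plus one probe duration stale.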
[jira] [Updated] (HDFS-9901) Move disk IO out of the heartbeat thread
[ https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hua Liu updated HDFS-9901: -- Attachment: 0005-HDFS-9901-Move-diskIO-out-of-heartbeat-thread.patch
[jira] [Updated] (HDFS-9901) Move disk IO out of the heartbeat thread
[ https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hua Liu updated HDFS-9901: -- Status: Open (was: Patch Available)
[jira] [Updated] (HDFS-9901) Move disk IO out of the heartbeat thread
[ https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hua Liu updated HDFS-9901: -- Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-9901) Move disk IO out of the heartbeat thread
[ https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hua Liu updated HDFS-9901: -- Attachment: 0004-HDFS-9901-move-diskIO-out-of-the-heartbeat-thread.patch
[jira] [Commented] (HDFS-9901) Move disk IO out of the heartbeat thread
[ https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15194414#comment-15194414 ] Hua Liu commented on HDFS-9901: --- Added comments for DFRefreshThread and DataCheckAndTransfer.
[jira] [Updated] (HDFS-9901) Move disk IO out of the heartbeat thread
[ https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hua Liu updated HDFS-9901: -- Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-9901) Move disk IO out of the heartbeat thread
[ https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hua Liu updated HDFS-9901: -- Status: Open (was: Patch Available)
[jira] [Updated] (HDFS-9901) Move disk IO out of the heartbeat thread
[ https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hua Liu updated HDFS-9901: -- Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-9901) Move disk IO out of the heartbeat thread
[ https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hua Liu updated HDFS-9901: -- Attachment: 0003-HDFS-9901-Move-disk-IO-out-of-the-heartbeat-thread.patch
[jira] [Updated] (HDFS-9901) Move disk IO out of the heartbeat thread
[ https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hua Liu updated HDFS-9901: -- Status: Open (was: Patch Available)
[jira] [Updated] (HDFS-9901) Move disk IO out of the heartbeat thread
[ https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hua Liu updated HDFS-9901: -- Summary: Move disk IO out of the heartbeat thread (was: Move block validation out of the heartbeat thread)
[jira] [Updated] (HDFS-9901) Move block validation out of the heartbeat thread
[ https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hua Liu updated HDFS-9901: -- Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-9901) Move block validation out of the heartbeat thread
[ https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hua Liu updated HDFS-9901: -- Attachment: 0002-HDFS-9901-Move-block-validation-out-of-the-heartbeat.patch
[jira] [Updated] (HDFS-9901) Move block validation out of the heartbeat thread
[ https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hua Liu updated HDFS-9901: -- Status: Open (was: Patch Available)
[jira] [Commented] (HDFS-9901) Move block validation out of the heartbeat thread
[ https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188664#comment-15188664 ] Hua Liu commented on HDFS-9901: --- Hi [~elgoiri], thanks for helping explain the approach. I've added it to the "Description" section.
[jira] [Commented] (HDFS-9901) Move block validation out of the heartbeat thread
[ https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188663#comment-15188663 ] Hua Liu commented on HDFS-9901: --- Hi [~arpiagariu], I extended the "Description" section with some detailed information about the approach.
[jira] [Updated] (HDFS-9901) Move block validation out of the heartbeat thread
[ https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hua Liu updated HDFS-9901: -- Description: During heavy disk IO, we noticed the heartbeat thread hanging in the checkBlock method, which checks the existence and length of a block before spinning off a thread to do the actual transfer. In extreme cases, the heartbeat thread hung for more than 10 minutes, so the namenode marked the datanode as dead and started replicating its blocks, which caused more disk IO on other nodes and could potentially bring them down. The patch contains two changes: 1. Makes DF asynchronous when monitoring the disk by creating a thread that checks the disk and updates the disk status periodically. When the heartbeat thread generates a storage report, it reads disk usage information from memory, so the heartbeat thread is not blocked during heavy disk IO. 2. Moves the checks (which require disk access) in DataNode's transferBlock() into a separate thread, so the heartbeat thread does not have to wait for them when heartbeating. was: During heavy disk IO, we noticed the heartbeat thread hanging in the checkBlock method, which checks the existence and length of a block before spinning off a thread to do the actual transfer. In extreme cases, the heartbeat thread hung for more than 10 minutes, so the namenode marked the datanode as dead and started replicating its blocks, which caused more disk IO on other nodes and could potentially bring them down. The patch contains two changes: 1. Makes DF asynchronous when monitoring the disk by creating a thread that checks the disk and updates the disk status periodically. Then the FsVolumeImpl reads the values that are collected asynchronously. 2.
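The second change, taking the disk-touching validation in transferBlock() off the heartbeat thread, follows a hand-off pattern that can be sketched as below. Again this is a hypothetical illustration, not the patch: AsyncTransferDispatcher, the BlockStore interface and its onDiskLength() method are invented names, the pool size of 2 is arbitrary, and the actual data transfer is left as a placeholder.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch (not the HDFS-9901 patch): the heartbeat thread hands
 * each transfer command to a worker pool, so the disk-touching existence and
 * length check never runs on the heartbeat thread itself.
 */
class AsyncTransferDispatcher implements AutoCloseable {

  /** Hypothetical stand-in for the replica lookup that touches the disk. */
  interface BlockStore {
    /** Returns the on-disk length of the block, or -1 if it does not exist. */
    long onDiskLength(long blockId);
  }

  private final BlockStore store;
  private final ExecutorService workers = Executors.newFixedThreadPool(2);

  AsyncTransferDispatcher(BlockStore store) {
    this.store = store;
  }

  /**
   * Called from the heartbeat thread for each transfer command. Returns at
   * once; the check and the transfer complete later on a worker thread.
   */
  CompletableFuture<Boolean> transferBlock(long blockId, long expectedLength) {
    return CompletableFuture.supplyAsync(() -> {
      long len = store.onDiskLength(blockId); // may block under heavy disk IO
      if (len < 0 || len < expectedLength) {
        // Missing or short replica: a real datanode would report this.
        return false;
      }
      // Placeholder: start the actual data transfer here.
      return true;
    }, workers);
  }

  @Override
  public void close() {
    workers.shutdown();
    try {
      workers.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  public static void main(String[] args) {
    try (AsyncTransferDispatcher d =
        new AsyncTransferDispatcher(id -> id == 1L ? 100L : -1L)) {
      System.out.println(d.transferBlock(1L, 100L).join()); // true
      System.out.println(d.transferBlock(2L, 10L).join());  // false
    }
  }
}
```

Because transferBlock() only enqueues work, the heartbeat loop returns immediately even when onDiskLength() blocks; in exchange, reporting a missing or short replica back to the namenode also has to happen asynchronously.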
[jira] [Updated] (HDFS-9901) Move block validation out of the heartbeat thread
[ https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hua Liu updated HDFS-9901: -- Description: During heavy disk IO, we noticed that the heartbeat thread hangs in the checkBlock method, which checks the existence and length of a block before spinning off a thread to do the actual transfer. In extreme cases, the heartbeat thread hung for more than 10 minutes, so the namenode marked the datanode as dead and started replicating its blocks, which caused more disk IO on other nodes and could potentially bring them down. The patch contains two changes: 1. Makes DF asynchronous when monitoring the disk by creating a thread that checks the disk and updates the disk status periodically. FsVolumeImpl then reads the values that are collected asynchronously. 2. was: During heavy disk IO, we noticed that the heartbeat thread hangs in the checkBlock method, which checks the existence and length of a block before spinning off a thread to do the actual transfer. In extreme cases, the heartbeat thread hung for more than 10 minutes, so the namenode marked the datanode as dead and started replicating its blocks, which caused more disk IO on other nodes and could potentially bring them down. The patch contains two changes: 1. > Move block validation out of the heartbeat thread > - > > Key: HDFS-9901 > URL: https://issues.apache.org/jira/browse/HDFS-9901 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Hua Liu >Assignee: Hua Liu > Attachments: > 0001-HDFS-9901-Move-block-validation-out-of-the-heartbeat.patch > > > During heavy disk IO, we noticed that the heartbeat thread hangs in the checkBlock method, which checks the existence and length of a block before spinning off a thread to do the actual transfer.
In extreme cases, the heartbeat thread hung for more than 10 minutes, so the namenode marked the datanode as dead and started replicating its blocks, which caused more disk IO on other nodes and could potentially bring them down. > The patch contains two changes: > 1. Makes DF asynchronous when monitoring the disk by creating a thread that checks the disk and updates the disk status periodically. FsVolumeImpl then reads the values that are collected asynchronously. > 2.
[jira] [Updated] (HDFS-9901) Move block validation out of the heartbeat thread
[ https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hua Liu updated HDFS-9901: -- Description: During heavy disk IO, we noticed that the heartbeat thread hangs in the checkBlock method, which checks the existence and length of a block before spinning off a thread to do the actual transfer. In extreme cases, the heartbeat thread hung for more than 10 minutes, so the namenode marked the datanode as dead and started replicating its blocks, which caused more disk IO on other nodes and could potentially bring them down. The patch contains two changes: 1. was: During heavy disk IO, we noticed that the heartbeat thread hangs in the checkBlock method, which checks the existence and length of a block before spinning off a thread to do the actual transfer. In extreme cases, the heartbeat thread hung for more than 10 minutes, so the namenode marked the datanode as dead and started replicating its blocks, which caused more disk IO on other nodes and could potentially bring them down. > Move block validation out of the heartbeat thread > - > > Key: HDFS-9901 > URL: https://issues.apache.org/jira/browse/HDFS-9901 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Hua Liu >Assignee: Hua Liu > Attachments: > 0001-HDFS-9901-Move-block-validation-out-of-the-heartbeat.patch > > > During heavy disk IO, we noticed that the heartbeat thread hangs in the checkBlock method, which checks the existence and length of a block before spinning off a thread to do the actual transfer. In extreme cases, the heartbeat thread hung for more than 10 minutes, so the namenode marked the datanode as dead and started replicating its blocks, which caused more disk IO on other nodes and could potentially bring them down. > The patch contains two changes: > 1.
[jira] [Updated] (HDFS-9901) Move block validation out of the heartbeat thread
[ https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hua Liu updated HDFS-9901: -- Status: Patch Available (was: In Progress) > Move block validation out of the heartbeat thread > - > > Key: HDFS-9901 > URL: https://issues.apache.org/jira/browse/HDFS-9901 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Hua Liu >Assignee: Hua Liu > Attachments: > 0001-HDFS-9901-Move-block-validation-out-of-the-heartbeat.patch > > > During heavy disk IO, we noticed that the heartbeat thread hangs in the checkBlock method, which checks the existence and length of a block before spinning off a thread to do the actual transfer. In extreme cases, the heartbeat thread hung for more than 10 minutes, so the namenode marked the datanode as dead and started replicating its blocks, which caused more disk IO on other nodes and could potentially bring them down.
[jira] [Updated] (HDFS-9901) Move block validation out of the heartbeat thread
[ https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hua Liu updated HDFS-9901: -- Attachment: 0001-HDFS-9901-Move-block-validation-out-of-the-heartbeat.patch > Move block validation out of the heartbeat thread > - > > Key: HDFS-9901 > URL: https://issues.apache.org/jira/browse/HDFS-9901 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Hua Liu >Assignee: Hua Liu > Attachments: > 0001-HDFS-9901-Move-block-validation-out-of-the-heartbeat.patch > > > During heavy disk IO, we noticed that the heartbeat thread hangs in the checkBlock method, which checks the existence and length of a block before spinning off a thread to do the actual transfer. In extreme cases, the heartbeat thread hung for more than 10 minutes, so the namenode marked the datanode as dead and started replicating its blocks, which caused more disk IO on other nodes and could potentially bring them down.
[jira] [Updated] (HDFS-9901) Move block validation out of the heartbeat thread
[ https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hua Liu updated HDFS-9901: -- Priority: Major (was: Minor) > Move block validation out of the heartbeat thread > - > > Key: HDFS-9901 > URL: https://issues.apache.org/jira/browse/HDFS-9901 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Hua Liu >Assignee: Hua Liu > > During heavy disk IO, we noticed that the heartbeat thread hangs in the checkBlock method, which checks the existence and length of a block before spinning off a thread to do the actual transfer. In extreme cases, the heartbeat thread hung for more than 10 minutes, so the namenode marked the datanode as dead and started replicating its blocks, which caused more disk IO on other nodes and could potentially bring them down.
[jira] [Commented] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184312#comment-15184312 ] Hua Liu commented on HDFS-9882: --- Hi [~arpiagariu], I submitted the V4 patch a few hours ago, but it seems Jenkins hasn't built it. I will resubmit if Jenkins still hasn't kicked in by tomorrow morning. > Add heartbeatsTotal in Datanode metrics > --- > > Key: HDFS-9882 > URL: https://issues.apache.org/jira/browse/HDFS-9882 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode >Affects Versions: 2.7.2 >Reporter: Hua Liu >Assignee: Hua Liu >Priority: Minor > Attachments: > 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, > 0002-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, > 0003-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, > 0004-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch > > > Heartbeat latency only reflects the time spent generating reports and sending them to the NN. When heartbeats are delayed by command processing, this latency does not help the investigation. I propose adding another metric counter to show the total time.
[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hua Liu updated HDFS-9882: -- Status: Patch Available (was: Open) > Add heartbeatsTotal in Datanode metrics > --- > > Key: HDFS-9882 > URL: https://issues.apache.org/jira/browse/HDFS-9882 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode >Affects Versions: 2.7.2 >Reporter: Hua Liu >Assignee: Hua Liu >Priority: Minor > Attachments: > 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, > 0002-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, > 0003-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, > 0004-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch > > > Heartbeat latency only reflects the time spent generating reports and sending them to the NN. When heartbeats are delayed by command processing, this latency does not help the investigation. I propose adding another metric counter to show the total time.
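To make the gap described above concrete, here is a small Java sketch that times one heartbeat iteration two ways: the window the existing latency metric covers (report generation plus the RPC to the NN) and the proposed total including command processing. The class HeartbeatMetricsSketch and its methods are hypothetical; it does not use Hadoop's metrics2 library or BPServiceActor, and the clock is injected only so the example is deterministic.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch, not Hadoop's BPServiceActor or metrics2 code.
// Times one heartbeat iteration two ways: the existing "latency" window
// (report generation + RPC) and the proposed total (including command processing).
final class HeartbeatMetricsSketch {
  private final LongSupplier clock; // e.g. System::nanoTime in real use
  private long heartbeatsNumOps, heartbeatsTime;           // report + RPC only
  private long heartbeatsTotalNumOps, heartbeatsTotalTime; // whole iteration

  HeartbeatMetricsSketch(LongSupplier clock) {
    this.clock = clock;
  }

  /** Not thread-safe: intended to be called only from the single heartbeat loop. */
  void runOnce(Runnable sendHeartbeat, Runnable processCommands) {
    long start = clock.getAsLong();
    sendHeartbeat.run();   // generate storage reports and send them to the NN
    long afterSend = clock.getAsLong();
    processCommands.run(); // act on the commands returned by the NN
    long end = clock.getAsLong();
    heartbeatsNumOps++;
    heartbeatsTime += afterSend - start;
    heartbeatsTotalNumOps++;
    heartbeatsTotalTime += end - start;
  }

  double heartbeatsAvgTime() {
    return heartbeatsNumOps == 0 ? 0.0 : (double) heartbeatsTime / heartbeatsNumOps;
  }

  double heartbeatsTotalAvgTime() {
    return heartbeatsTotalNumOps == 0 ? 0.0 : (double) heartbeatsTotalTime / heartbeatsTotalNumOps;
  }
}
```

With a fake clock where sending takes 5 units and command processing takes 100, the latency average stays at 5 while the total average is 105: the kind of delay the description says the existing metric hides.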
[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hua Liu updated HDFS-9882: -- Status: Open (was: Patch Available) > Add heartbeatsTotal in Datanode metrics > --- > > Key: HDFS-9882 > URL: https://issues.apache.org/jira/browse/HDFS-9882 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode >Affects Versions: 2.7.2 >Reporter: Hua Liu >Assignee: Hua Liu >Priority: Minor > Attachments: > 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, > 0002-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, > 0003-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, > 0004-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch > > > Heartbeat latency only reflects the time spent generating reports and sending them to the NN. When heartbeats are delayed by command processing, this latency does not help the investigation. I propose adding another metric counter to show the total time.
[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hua Liu updated HDFS-9882: -- Status: Patch Available (was: Open) > Add heartbeatsTotal in Datanode metrics > --- > > Key: HDFS-9882 > URL: https://issues.apache.org/jira/browse/HDFS-9882 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode >Affects Versions: 2.7.2 >Reporter: Hua Liu >Assignee: Hua Liu >Priority: Minor > Attachments: > 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, > 0002-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, > 0003-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, > 0004-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch > > > Heartbeat latency only reflects the time spent generating reports and sending them to the NN. When heartbeats are delayed by command processing, this latency does not help the investigation. I propose adding another metric counter to show the total time.
[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hua Liu updated HDFS-9882: -- Attachment: 0004-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch > Add heartbeatsTotal in Datanode metrics > --- > > Key: HDFS-9882 > URL: https://issues.apache.org/jira/browse/HDFS-9882 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode >Affects Versions: 2.7.2 >Reporter: Hua Liu >Assignee: Hua Liu >Priority: Minor > Attachments: > 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, > 0002-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, > 0003-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, > 0004-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch > > > Heartbeat latency only reflects the time spent generating reports and sending them to the NN. When heartbeats are delayed by command processing, this latency does not help the investigation. I propose adding another metric counter to show the total time.
[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hua Liu updated HDFS-9882: -- Status: Open (was: Patch Available) > Add heartbeatsTotal in Datanode metrics > --- > > Key: HDFS-9882 > URL: https://issues.apache.org/jira/browse/HDFS-9882 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode >Affects Versions: 2.7.2 >Reporter: Hua Liu >Assignee: Hua Liu >Priority: Minor > Attachments: > 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, > 0002-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, > 0003-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch > > > Heartbeat latency only reflects the time spent generating reports and sending them to the NN. When heartbeats are delayed by command processing, this latency does not help the investigation. I propose adding another metric counter to show the total time.
[jira] [Commented] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15182559#comment-15182559 ] Hua Liu commented on HDFS-9882: --- Hi [~arpiagariu], since NumOps and AvgTime are appended to the metric name, HeartbeatTotalTimeAvgTime would look verbose and HeartbeatTotalTimeNumOps would be confusing. We think heartbeatsTotal is a good alternative, and we have described this new metric in metrics.md. Please take a look and submit it if you see fit. Thanks, Hua > Add heartbeatsTotal in Datanode metrics > --- > > Key: HDFS-9882 > URL: https://issues.apache.org/jira/browse/HDFS-9882 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode >Affects Versions: 2.7.2 >Reporter: Hua Liu >Assignee: Hua Liu >Priority: Minor > Attachments: > 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, > 0002-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, > 0003-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch > > > Heartbeat latency only reflects the time spent generating reports and sending them to the NN. When heartbeats are delayed by command processing, this latency does not help the investigation. I propose adding another metric counter to show the total time.
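The naming argument in the comment above can be checked with a tiny helper. RateMetricNames is hypothetical and only mirrors the pattern the comment describes (a rate metric published as its name plus NumOps and AvgTime); the capitalization of the first letter is an assumption of this sketch, not something confirmed in the thread.

```java
import java.util.List;

// Hypothetical helper mirroring the naming pattern described in the comment:
// a rate metric is published as <Name>NumOps and <Name>AvgTime.
final class RateMetricNames {
  static List<String> publishedNames(String fieldName) {
    // Assumption: the published name is the field name with its first letter capitalized.
    String base = Character.toUpperCase(fieldName.charAt(0)) + fieldName.substring(1);
    return List.of(base + "NumOps", base + "AvgTime");
  }
}
```

Under that assumption, a field named heartbeatTotalTime yields HeartbeatTotalTimeNumOps and HeartbeatTotalTimeAvgTime, while heartbeatsTotal yields HeartbeatsTotalNumOps and HeartbeatsTotalAvgTime, which read as "number of heartbeats" and "average total time per heartbeat".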
[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hua Liu updated HDFS-9882: -- Status: Patch Available (was: Open) > Add heartbeatsTotal in Datanode metrics > --- > > Key: HDFS-9882 > URL: https://issues.apache.org/jira/browse/HDFS-9882 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode >Affects Versions: 2.7.2 >Reporter: Hua Liu >Assignee: Hua Liu >Priority: Minor > Attachments: > 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, > 0002-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, > 0003-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch > > > Heartbeat latency only reflects the time spent generating reports and sending them to the NN. When heartbeats are delayed by command processing, this latency does not help the investigation. I propose adding another metric counter to show the total time.
[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hua Liu updated HDFS-9882: -- Status: Open (was: Patch Available) > Add heartbeatsTotal in Datanode metrics > --- > > Key: HDFS-9882 > URL: https://issues.apache.org/jira/browse/HDFS-9882 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode >Affects Versions: 2.7.2 >Reporter: Hua Liu >Assignee: Hua Liu >Priority: Minor > Attachments: > 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, > 0002-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, > 0003-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch > > > Heartbeat latency only reflects the time spent generating reports and sending them to the NN. When heartbeats are delayed by command processing, this latency does not help the investigation. I propose adding another metric counter to show the total time.
[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hua Liu updated HDFS-9882: -- Attachment: 0003-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch > Add heartbeatsTotal in Datanode metrics > --- > > Key: HDFS-9882 > URL: https://issues.apache.org/jira/browse/HDFS-9882 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode >Affects Versions: 2.7.2 >Reporter: Hua Liu >Assignee: Hua Liu >Priority: Minor > Attachments: > 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, > 0002-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, > 0003-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch > > > Heartbeat latency only reflects the time spent generating reports and sending them to the NN. When heartbeats are delayed by command processing, this latency does not help the investigation. I propose adding another metric counter to show the total time.
[jira] [Work started] (HDFS-9901) Move block validation out of the heartbeat thread
[ https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-9901 started by Hua Liu. - > Move block validation out of the heartbeat thread > - > > Key: HDFS-9901 > URL: https://issues.apache.org/jira/browse/HDFS-9901 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Hua Liu >Assignee: Hua Liu >Priority: Minor > > During heavy disk IO, we noticed that the heartbeat thread hangs in the checkBlock method, which checks the existence and length of a block before spinning off a thread to do the actual transfer. In extreme cases, the heartbeat thread hung for more than 10 minutes, so the namenode marked the datanode as dead and started replicating its blocks, which caused more disk IO on other nodes and could potentially bring them down.
[jira] [Created] (HDFS-9901) Move block validation out of the heartbeat thread
Hua Liu created HDFS-9901: - Summary: Move block validation out of the heartbeat thread Key: HDFS-9901 URL: https://issues.apache.org/jira/browse/HDFS-9901 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Reporter: Hua Liu Assignee: Hua Liu Priority: Minor During heavy disk IO, we noticed that the heartbeat thread hangs in the checkBlock method, which checks the existence and length of a block before spinning off a thread to do the actual transfer. In extreme cases, the heartbeat thread hung for more than 10 minutes, so the namenode marked the datanode as dead and started replicating its blocks, which caused more disk IO on other nodes and could potentially bring them down.
[jira] [Commented] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15179335#comment-15179335 ] Hua Liu commented on HDFS-9882: --- Hi [~arpiagariu], when a datanode needs to transfer a block, it validates the block in the heartbeat thread by invoking the checkBlock method of FsDatasetImpl, which checks whether the block exists and gets its length. If the block is valid, it then spins off a thread to do the actual block transfer. During one episode of heavy disk IO in our environment, we found the heartbeat thread hanging on "replicaInfo.getBlockFile().exists()" for more than 10 minutes. > Add heartbeatsTotal in Datanode metrics > --- > > Key: HDFS-9882 > URL: https://issues.apache.org/jira/browse/HDFS-9882 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode >Affects Versions: 2.7.2 >Reporter: Hua Liu >Assignee: Hua Liu >Priority: Minor > Attachments: > 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, > 0002-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch > > > Heartbeat latency only reflects the time spent generating reports and sending them to the NN. When heartbeats are delayed by command processing, this latency does not help the investigation. I propose adding another metric counter to show the total time.
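The flow described in the comment above (validate on the heartbeat thread, then spin off the transfer) can be restructured so the heartbeat thread only enqueues the work, which is the idea behind change 2 of HDFS-9901. A minimal Java sketch, assuming a worker pool: BlockTransferDispatcher is hypothetical, the Predicate and Consumer are stand-ins for FsDatasetImpl's checkBlock and the transfer thread, and block IDs are plain strings for brevity. It is not DataNode#transferBlock.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical sketch, not DataNode#transferBlock: the disk-touching
// validation runs on a worker thread instead of the heartbeat thread.
final class BlockTransferDispatcher implements AutoCloseable {
  private final ExecutorService workers = Executors.newCachedThreadPool();
  private final Predicate<String> checkBlock; // stand-in for "exists and has the expected length"
  private final Consumer<String> transfer;    // stand-in for the actual block transfer

  BlockTransferDispatcher(Predicate<String> checkBlock, Consumer<String> transfer) {
    this.checkBlock = checkBlock;
    this.transfer = transfer;
  }

  /** Called on the heartbeat thread; returns immediately without touching the disk. */
  CompletableFuture<Boolean> submitTransfer(String blockId) {
    return CompletableFuture.supplyAsync(() -> {
      if (!checkBlock.test(blockId)) {
        return false; // invalid block: skip the transfer
      }
      transfer.accept(blockId);
      return true;
    }, workers);
  }

  @Override
  public void close() {
    workers.shutdown(); // lets queued validations finish, accepts no new ones
  }
}
```

In this sketch a stalled exists() call ties up a worker thread rather than the heartbeat loop. The trade-off is that an invalid block is only detected after the heartbeat has returned, so any error reporting must also happen asynchronously.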
[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hua Liu updated HDFS-9882: -- Status: Patch Available (was: Open) > Add heartbeatsTotal in Datanode metrics > --- > > Key: HDFS-9882 > URL: https://issues.apache.org/jira/browse/HDFS-9882 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode >Affects Versions: 2.7.2 >Reporter: Hua Liu >Assignee: Hua Liu >Priority: Minor > Attachments: > 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, > 0002-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch > > > Heartbeat latency only reflects the time spent generating reports and sending them to the NN. When heartbeats are delayed by command processing, this latency does not help the investigation. I propose adding another metric counter to show the total time.
[jira] [Work stopped] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-9882 stopped by Hua Liu. - > Add heartbeatsTotal in Datanode metrics > --- > > Key: HDFS-9882 > URL: https://issues.apache.org/jira/browse/HDFS-9882 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode >Affects Versions: 2.7.2 >Reporter: Hua Liu >Assignee: Hua Liu >Priority: Minor > Attachments: > 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, > 0002-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch > > > Heartbeat latency only reflects the time spent generating reports and sending them to the NN. When heartbeats are delayed by command processing, this latency does not help the investigation. I propose adding another metric counter to show the total time.
[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hua Liu updated HDFS-9882: -- Status: In Progress (was: Patch Available) > Add heartbeatsTotal in Datanode metrics > --- > > Key: HDFS-9882 > URL: https://issues.apache.org/jira/browse/HDFS-9882 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode >Affects Versions: 2.7.2 >Reporter: Hua Liu >Assignee: Hua Liu >Priority: Minor > Attachments: > 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, > 0002-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch > > > Heartbeat latency only reflects the time spent generating reports and sending them to the NN. When heartbeats are delayed by command processing, this latency does not help the investigation. I propose adding another metric counter to show the total time.
[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hua Liu updated HDFS-9882: -- Attachment: 0002-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch > Add heartbeatsTotal in Datanode metrics > --- > > Key: HDFS-9882 > URL: https://issues.apache.org/jira/browse/HDFS-9882 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode >Affects Versions: 2.7.2 >Reporter: Hua Liu >Assignee: Hua Liu >Priority: Minor > Attachments: > 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, > 0002-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch > > > Heartbeat latency only reflects the time spent generating reports and sending them to the NN. When heartbeats are delayed by command processing, this latency does not help the investigation. I propose adding another metric counter to show the total time.
[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hua Liu updated HDFS-9882: -- Status: Patch Available (was: Open) > Add heartbeatsTotal in Datanode metrics > --- > > Key: HDFS-9882 > URL: https://issues.apache.org/jira/browse/HDFS-9882 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode >Affects Versions: 2.7.2 >Reporter: Hua Liu >Assignee: Hua Liu >Priority: Minor > Attachments: > 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, > 0002-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch > > > Heartbeat latency only reflects the time spent generating reports and sending them to the NN. When heartbeats are delayed by command processing, this latency does not help the investigation. I propose adding another metric counter to show the total time.
[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hua Liu updated HDFS-9882: -- Status: Open (was: Patch Available) > Add heartbeatsTotal in Datanode metrics > --- > > Key: HDFS-9882 > URL: https://issues.apache.org/jira/browse/HDFS-9882 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode >Affects Versions: 2.7.2 >Reporter: Hua Liu >Assignee: Hua Liu >Priority: Minor > Attachments: > 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch > > > Heartbeat latency only reflects the time spent generating reports and sending them to the NN. When heartbeats are delayed by command processing, this latency does not help the investigation. I propose adding another metric counter to show the total time.
[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hua Liu updated HDFS-9882: -- Status: Patch Available (was: Open) > Add heartbeatsTotal in Datanode metrics > --- > > Key: HDFS-9882 > URL: https://issues.apache.org/jira/browse/HDFS-9882 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode >Affects Versions: 2.7.2 >Reporter: Hua Liu >Assignee: Hua Liu >Priority: Minor > Attachments: > 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch > > > Heartbeat latency only reflects the time spent generating reports and sending them to the NN. When heartbeats are delayed by command processing, this latency does not help the investigation. I propose adding another metric counter to show the total time.
[jira] [Work stopped] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-9882 stopped by Hua Liu. - > Add heartbeatsTotal in Datanode metrics > --- > > Key: HDFS-9882 > URL: https://issues.apache.org/jira/browse/HDFS-9882 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode >Affects Versions: 2.7.2 >Reporter: Hua Liu >Assignee: Hua Liu >Priority: Minor > Attachments: > 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch > > > Heartbeat latency only reflects the time spent generating reports and sending them to the NN. When heartbeats are delayed by command processing, this latency does not help the investigation. I propose adding another metric counter to show the total time.
[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hua Liu updated HDFS-9882: -- Attachment: 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch > Add heartbeatsTotal in Datanode metrics > --- > > Key: HDFS-9882 > URL: https://issues.apache.org/jira/browse/HDFS-9882 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode >Affects Versions: 2.7.2 >Reporter: Hua Liu >Assignee: Hua Liu >Priority: Minor > Attachments: > 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch > > > Heartbeat latency only reflects the time spent generating reports and sending them to the NN. When heartbeats are delayed by command processing, this latency does not help the investigation. I propose adding another metric counter to show the total time.
[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hua Liu updated HDFS-9882: -- Attachment: (was: 0001-Add-heartbeatsTotal-metric.patch) > Add heartbeatsTotal in Datanode metrics > --- > > Key: HDFS-9882 > URL: https://issues.apache.org/jira/browse/HDFS-9882 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode >Affects Versions: 2.7.2 >Reporter: Hua Liu >Assignee: Hua Liu >Priority: Minor > Attachments: > 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch > > > Heartbeat latency only reflects the time spent generating reports and sending them to the NN. When heartbeats are delayed by command processing, this latency does not help the investigation. I propose adding another metric counter to show the total time.
[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hua Liu updated HDFS-9882: -- Status: Patch Available (was: Open) > Add heartbeatsTotal in Datanode metrics > --- > > Key: HDFS-9882 > URL: https://issues.apache.org/jira/browse/HDFS-9882 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode >Affects Versions: 2.7.2 >Reporter: Hua Liu >Assignee: Hua Liu >Priority: Minor > Attachments: > 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch > > > Heartbeat latency only reflects the time spent on generating reports and > sending reports to NN. When heartbeats are delayed due to processing > commands, this latency does not help investigation. I would like to propose > to add another metric counter to show the total time.
[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hua Liu updated HDFS-9882: -- Status: Open (was: Patch Available) > Add heartbeatsTotal in Datanode metrics > --- > > Key: HDFS-9882 > URL: https://issues.apache.org/jira/browse/HDFS-9882 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode >Affects Versions: 2.7.2 >Reporter: Hua Liu >Assignee: Hua Liu >Priority: Minor > Attachments: > 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch > > > Heartbeat latency only reflects the time spent on generating reports and > sending reports to NN. When heartbeats are delayed due to processing > commands, this latency does not help investigation. I would like to propose > to add another metric counter to show the total time.
[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hua Liu updated HDFS-9882: -- Status: Patch Available (was: In Progress) > Add heartbeatsTotal in Datanode metrics > --- > > Key: HDFS-9882 > URL: https://issues.apache.org/jira/browse/HDFS-9882 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode >Affects Versions: 2.7.2 >Reporter: Hua Liu >Assignee: Hua Liu >Priority: Minor > Attachments: 0001-Add-heartbeatsTotal-metric.patch > > > Heartbeat latency only reflects the time spent on generating reports and > sending reports to NN. When heartbeats are delayed due to processing > commands, this latency does not help investigation. I would like to propose > to add another metric counter to show the total time.
[jira] [Work started] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-9882 started by Hua Liu. - > Add heartbeatsTotal in Datanode metrics > --- > > Key: HDFS-9882 > URL: https://issues.apache.org/jira/browse/HDFS-9882 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode >Affects Versions: 2.7.2 >Reporter: Hua Liu >Assignee: Hua Liu >Priority: Minor > Attachments: 0001-Add-heartbeatsTotal-metric.patch > > > Heartbeat latency only reflects the time spent on generating reports and > sending reports to NN. When heartbeats are delayed due to processing > commands, this latency does not help investigation. I would like to propose > to add another metric counter to show the total time.
[jira] [Reopened] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hua Liu reopened HDFS-9882: --- > Add heartbeatsTotal in Datanode metrics > --- > > Key: HDFS-9882 > URL: https://issues.apache.org/jira/browse/HDFS-9882 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode >Affects Versions: 2.7.2 >Reporter: Hua Liu >Assignee: Hua Liu >Priority: Minor > Attachments: 0001-Add-heartbeatsTotal-metric.patch > > > Heartbeat latency only reflects the time spent on generating reports and > sending reports to NN. When heartbeats are delayed due to processing > commands, this latency does not help investigation. I would like to propose > to add another metric counter to show the total time.
[jira] [Resolved] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hua Liu resolved HDFS-9882. --- Resolution: Fixed > Add heartbeatsTotal in Datanode metrics > --- > > Key: HDFS-9882 > URL: https://issues.apache.org/jira/browse/HDFS-9882 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode >Affects Versions: 2.7.2 >Reporter: Hua Liu >Assignee: Hua Liu >Priority: Minor > Attachments: 0001-Add-heartbeatsTotal-metric.patch > > > Heartbeat latency only reflects the time spent on generating reports and > sending reports to NN. When heartbeats are delayed due to processing > commands, this latency does not help investigation. I would like to propose > to add another metric counter to show the total time.
[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hua Liu updated HDFS-9882: -- Attachment: 0001-Add-heartbeatsTotal-metric.patch > Add heartbeatsTotal in Datanode metrics > --- > > Key: HDFS-9882 > URL: https://issues.apache.org/jira/browse/HDFS-9882 > Project: Hadoop HDFS > Issue Type: Task > Components: datanode >Reporter: Hua Liu >Assignee: Hua Liu >Priority: Minor > Attachments: 0001-Add-heartbeatsTotal-metric.patch > > > Heartbeat latency only reflects the time spent on generating reports and > sending reports to NN. When heartbeats are delayed due to processing > commands, this latency does not help investigation. I would like to propose > to add another metric counter to show the total time.
[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hua Liu updated HDFS-9882: -- Description: Heartbeat latency only reflects the time spent on generating reports and sending reports to NN. When heartbeats are delayed due to processing commands, this latency does not help investigation. I would like to propose to add another metric counter to show the total time. (was: Heartbeat latency only reflects the time spent on generating reports and sending reports to NN. When heartbeats are delayed due to processing commands, this latency does not help investigation. I would like to propose either (1) changing the heartbeat latency to reflect the total time spent on sending reports and processing commands or (2) adding another metric counter to show the total time. ) > Add heartbeatsTotal in Datanode metrics > --- > > Key: HDFS-9882 > URL: https://issues.apache.org/jira/browse/HDFS-9882 > Project: Hadoop HDFS > Issue Type: Task > Components: datanode >Reporter: Hua Liu >Assignee: Hua Liu >Priority: Minor > > Heartbeat latency only reflects the time spent on generating reports and > sending reports to NN. When heartbeats are delayed due to processing > commands, this latency does not help investigation. I would like to propose > to add another metric counter to show the total time.
[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hua Liu updated HDFS-9882: -- Status: Open (was: Patch Available) > Add heartbeatsTotal in Datanode metrics > --- > > Key: HDFS-9882 > URL: https://issues.apache.org/jira/browse/HDFS-9882 > Project: Hadoop HDFS > Issue Type: Task > Components: datanode >Reporter: Hua Liu >Assignee: Hua Liu >Priority: Minor > > Heartbeat latency only reflects the time spent on generating reports and > sending reports to NN. When heartbeats are delayed due to processing > commands, this latency does not help investigation. I would like to propose > either (1) changing the heartbeat latency to reflect the total time spent on > sending reports and processing commands or (2) adding another metric counter > to show the total time.
[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hua Liu updated HDFS-9882: -- Status: Patch Available (was: In Progress) > Add heartbeatsTotal in Datanode metrics > --- > > Key: HDFS-9882 > URL: https://issues.apache.org/jira/browse/HDFS-9882 > Project: Hadoop HDFS > Issue Type: Task > Components: datanode >Reporter: Hua Liu >Assignee: Hua Liu >Priority: Minor > > Heartbeat latency only reflects the time spent on generating reports and > sending reports to NN. When heartbeats are delayed due to processing > commands, this latency does not help investigation. I would like to propose > either (1) changing the heartbeat latency to reflect the total time spent on > sending reports and processing commands or (2) adding another metric counter > to show the total time.
[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hua Liu updated HDFS-9882: -- Summary: Add heartbeatsTotal in Datanode metrics (was: Change the meaning of heartbeat latency in Datanode metrics) > Add heartbeatsTotal in Datanode metrics > --- > > Key: HDFS-9882 > URL: https://issues.apache.org/jira/browse/HDFS-9882 > Project: Hadoop HDFS > Issue Type: Task > Components: datanode >Reporter: Hua Liu >Assignee: Hua Liu >Priority: Minor > > Heartbeat latency only reflects the time spent on generating reports and > sending reports to NN. When heartbeats are delayed due to processing > commands, this latency does not help investigation. I would like to propose > either (1) changing the heartbeat latency to reflect the total time spent on > sending reports and processing commands or (2) adding another metric counter > to show the total time.
[jira] [Work started] (HDFS-9882) Change the meaning of heartbeat latency
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-9882 started by Hua Liu. - > Change the meaning of heartbeat latency > --- > > Key: HDFS-9882 > URL: https://issues.apache.org/jira/browse/HDFS-9882 > Project: Hadoop HDFS > Issue Type: Task > Components: datanode >Reporter: Hua Liu >Assignee: Hua Liu >Priority: Minor > > Heartbeat latency only reflects the time spent on generating reports and > sending reports to NN. When heartbeats are delayed due to processing > commands, this latency does not help investigation. I would like to propose > either (1) changing the heartbeat latency to reflect the total time spent on > sending reports and processing commands or (2) adding another metric counter > to show the total time.
[jira] [Created] (HDFS-9882) Change the meaning of heartbeat latency
Hua Liu created HDFS-9882: - Summary: Change the meaning of heartbeat latency Key: HDFS-9882 URL: https://issues.apache.org/jira/browse/HDFS-9882 Project: Hadoop HDFS Issue Type: Task Components: datanode Reporter: Hua Liu Assignee: Hua Liu Priority: Minor Heartbeat latency only reflects the time spent on generating reports and sending reports to NN. When heartbeats are delayed due to processing commands, this latency does not help investigation. I would like to propose either (1) changing the heartbeat latency to reflect the total time spent on sending reports and processing commands or (2) adding another metric counter to show the total time.