[jira] [Commented] (HDFS-9901) Move disk IO out of the heartbeat thread

2016-03-25 Thread Hua Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15212107#comment-15212107
 ] 

Hua Liu commented on HDFS-9901:
---

Added comments as [~elgoiri] suggested.
[~arpitagarwal], would you please take a look at the patch and help with 
submission?

> Move disk IO out of the heartbeat thread
> 
>
> Key: HDFS-9901
> URL: https://issues.apache.org/jira/browse/HDFS-9901
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Reporter: Hua Liu
> Assignee: Hua Liu
> Attachments: 
> 0001-HDFS-9901-Move-block-validation-out-of-the-heartbeat.patch, 
> 0002-HDFS-9901-Move-block-validation-out-of-the-heartbeat.patch, 
> 0003-HDFS-9901-Move-disk-IO-out-of-the-heartbeat-thread.patch, 
> 0004-HDFS-9901-move-diskIO-out-of-the-heartbeat-thread.patch, 
> 0005-HDFS-9901-Move-diskIO-out-of-heartbeat-thread.patch
>
>
> During heavy disk IO, we noticed the heartbeat thread hanging in the checkBlock 
> method, which checks the existence and length of a block before spinning off a 
> thread to do the actual transfer. In extreme cases, the heartbeat thread hung 
> for more than 10 minutes, so the namenode marked the datanode as dead and 
> started replicating its blocks, which caused more disk IO on other nodes and 
> could potentially bring them down as well.
> The patch contains two changes:
> 1. Makes DF asynchronous when monitoring the disk by creating a thread that 
> checks the disk and updates the disk status periodically. When the heartbeat 
> thread generates a storage report, it reads disk usage information from 
> memory, so the heartbeat thread won't be blocked during heavy disk IO.
> 2. Moves the checks in transferBlock() in DataNode (which require disk access) 
> into a separate thread, so the heartbeat thread does not have to wait for 
> them.
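
For readers who do not have the patch at hand, the first change in the description above can be pictured roughly as follows. This is only a minimal sketch of the general idea (a background thread refreshes disk usage, and readers get the cached values from memory). The class name CachedDiskUsage, its methods, and the use of java.io.File instead of Hadoop's DF class are assumptions made for illustration; none of this is code from the attached patches.

```java
import java.io.File;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch, not the class from the HDFS-9901 patch. */
final class CachedDiskUsage implements AutoCloseable {
  private final File dir;

  // Last values seen by the refresher. volatile makes updates visible to
  // readers; the two fields may briefly come from different refreshes.
  private volatile long capacity;
  private volatile long available;

  // Single daemon thread that performs all disk queries.
  private final ScheduledExecutorService refresher =
      Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "df-refresh");
        t.setDaemon(true);
        return t;
      });

  CachedDiskUsage(File dir, long periodMs) {
    this.dir = dir;
    // One synchronous refresh so the first reader sees real values.
    // This call touches the disk on the constructing thread.
    refresh();
    refresher.scheduleWithFixedDelay(
        this::refresh, periodMs, periodMs, TimeUnit.MILLISECONDS);
  }

  /** Queries the file system; runs on the refresher thread after startup. */
  private void refresh() {
    capacity = dir.getTotalSpace();
    available = dir.getUsableSpace();
  }

  /** Memory read only; safe to call from a latency-sensitive thread. */
  long getCapacity() {
    return capacity;
  }

  /** Memory read only; safe to call from a latency-sensitive thread. */
  long getAvailable() {
    return available;
  }

  @Override
  public void close() {
    refresher.shutdownNow();
  }
}
```

With something like this, a heartbeat-style caller would build its storage report from getCapacity() and getAvailable() without touching the disk. The trade-off is that the reported numbers can be stale by up to one refresh period, or longer if the refresher itself is stuck behind slow disk IO.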



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9901) Move disk IO out of the heartbeat thread

2016-03-25 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9901:
--
Attachment: 0005-HDFS-9901-Move-diskIO-out-of-heartbeat-thread.patch



[jira] [Updated] (HDFS-9901) Move disk IO out of the heartbeat thread

2016-03-25 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9901:
--
Status: Open  (was: Patch Available)



[jira] [Updated] (HDFS-9901) Move disk IO out of the heartbeat thread

2016-03-25 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9901:
--
Status: Patch Available  (was: Open)



[jira] [Updated] (HDFS-9901) Move disk IO out of the heartbeat thread

2016-03-14 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9901:
--
Attachment: 0004-HDFS-9901-move-diskIO-out-of-the-heartbeat-thread.patch



[jira] [Commented] (HDFS-9901) Move disk IO out of the heartbeat thread

2016-03-14 Thread Hua Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15194414#comment-15194414
 ] 

Hua Liu commented on HDFS-9901:
---

Added comments for DFRefreshThread and DataCheckAndTransfer.
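
As background for the second change in the issue description (moving the disk-touching checks of transferBlock() off the heartbeat thread), a rough sketch of the pattern is given here. It is not the DataCheckAndTransfer class from the patch. The class name AsyncBlockTransfer, the submit() signature, the pool size, and the empty transfer() placeholder are all assumptions made for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch; names are illustrative and not taken from the patch. */
final class AsyncBlockTransfer implements AutoCloseable {
  // Worker threads that absorb the disk latency. The size is arbitrary here.
  private final ExecutorService pool = Executors.newFixedThreadPool(2);

  /**
   * Intended to be called from the heartbeat path. It only queues work and
   * returns immediately; no disk access happens on the calling thread.
   */
  Future<Boolean> submit(File blockFile, long expectedLength) {
    return pool.submit(() -> {
      // Existence and length checks hit the disk, so they run on a worker.
      if (!blockFile.isFile()) {
        throw new IOException("Block file missing: " + blockFile);
      }
      if (blockFile.length() < expectedLength) {
        throw new IOException("Block shorter than expected: " + blockFile);
      }
      transfer(blockFile);
      return true;
    });
  }

  /** Placeholder for the actual transfer, which is outside this sketch. */
  private void transfer(File blockFile) {
    // Intentionally empty.
  }

  @Override
  public void close() {
    pool.shutdown();
  }
}
```

A failed check surfaces as an ExecutionException when the returned Future is inspected, so the caller can log or report it later instead of waiting for the disk inside the heartbeat loop.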



[jira] [Updated] (HDFS-9901) Move disk IO out of the heartbeat thread

2016-03-14 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9901:
--
Status: Patch Available  (was: Open)



[jira] [Updated] (HDFS-9901) Move disk IO out of the heartbeat thread

2016-03-14 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9901:
--
Status: Open  (was: Patch Available)



[jira] [Updated] (HDFS-9901) Move disk IO out of the heartbeat thread

2016-03-10 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9901:
--
Status: Patch Available  (was: Open)



[jira] [Updated] (HDFS-9901) Move disk IO out of the heartbeat thread

2016-03-10 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9901:
--
Attachment: 0003-HDFS-9901-Move-disk-IO-out-of-the-heartbeat-thread.patch



[jira] [Updated] (HDFS-9901) Move disk IO out of the heartbeat thread

2016-03-10 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9901:
--
Status: Open  (was: Patch Available)



[jira] [Updated] (HDFS-9901) Move disk IO out of the heartbeat thread

2016-03-10 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9901:
--
Summary: Move disk IO out of the heartbeat thread  (was: Move block 
validation out of the heartbeat thread)



[jira] [Updated] (HDFS-9901) Move block validation out of the heartbeat thread

2016-03-09 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9901:
--
Status: Patch Available  (was: Open)



[jira] [Updated] (HDFS-9901) Move block validation out of the heartbeat thread

2016-03-09 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9901:
--
Attachment: 0002-HDFS-9901-Move-block-validation-out-of-the-heartbeat.patch



[jira] [Updated] (HDFS-9901) Move block validation out of the heartbeat thread

2016-03-09 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9901:
--
Status: Open  (was: Patch Available)



[jira] [Commented] (HDFS-9901) Move block validation out of the heartbeat thread

2016-03-09 Thread Hua Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188664#comment-15188664
 ] 

Hua Liu commented on HDFS-9901:
---

Hi [~elgoiri], thanks for helping explain the approach. I've added it to the 
"Description" section.



[jira] [Commented] (HDFS-9901) Move block validation out of the heartbeat thread

2016-03-09 Thread Hua Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188663#comment-15188663
 ] 

Hua Liu commented on HDFS-9901:
---

Hi [~arpiagariu], I extended the "Description" section with some detailed 
information about the approach. 



[jira] [Updated] (HDFS-9901) Move block validation out of the heartbeat thread

2016-03-09 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9901:
--
Description: 
During heavy disk IO, we noticed the heartbeat thread hanging in the checkBlock 
method, which checks the existence and length of a block before spinning off a 
thread to do the actual transfer. In extreme cases, the heartbeat thread hung 
for more than 10 minutes, so the namenode marked the datanode as dead and 
started replicating its blocks, which caused more disk IO on other nodes and 
could potentially bring them down as well.

The patch contains two changes:
1. Makes DF asynchronous when monitoring the disk by creating a thread that 
checks the disk and updates the disk status periodically. When the heartbeat 
thread generates a storage report, it reads disk usage information from 
memory, so the heartbeat thread won't be blocked during heavy disk IO.
2. Moves the checks in transferBlock() in DataNode (which require disk access) 
into a separate thread, so the heartbeat thread does not have to wait for 
them.


  was:
During heavy disk IO, we noticed the heartbeat thread hanging on the checkBlock method,
which checks the existence and length of a block before spinning off a thread to
do the actual transfer. In extreme cases, the heartbeat thread hung for more
than 10 minutes, so the namenode marked the datanode as dead and started
replicating its blocks, which caused more disk IO on other nodes and could
potentially bring them down.

The patch contains two changes:
1. Makes DF asynchronous when monitoring the disk by creating a thread that 
checks the disk and updates the disk status periodically. Then the FsVolumeImpl 
reads the values that are collected asynchronously.
2. 



> Move block validation out of the heartbeat thread
> -
>
> Key: HDFS-9901
> URL: https://issues.apache.org/jira/browse/HDFS-9901
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Hua Liu
>Assignee: Hua Liu
> Attachments: 
> 0001-HDFS-9901-Move-block-validation-out-of-the-heartbeat.patch
>
>
> During heavy disk IO, we noticed the heartbeat thread hanging on the checkBlock method,
> which checks the existence and length of a block before spinning off a thread to
> do the actual transfer. In extreme cases, the heartbeat thread hung for more
> than 10 minutes, so the namenode marked the datanode as dead and started
> replicating its blocks, which caused more disk IO on other nodes and could
> potentially bring them down.
> The patch contains two changes:
> 1. Makes DF asynchronous when monitoring the disk by creating a thread that
> checks the disk and updates the disk status periodically. When the heartbeat
> thread generates a storage report, it reads disk usage information from
> memory, so the heartbeat thread won't be blocked during heavy disk IO.
> 2. Moves the checks (which require disk access) in transferBlock() in
> DataNode into a separate thread so the heartbeat thread does not have to wait
> for them.





[jira] [Updated] (HDFS-9901) Move block validation out of the heartbeat thread

2016-03-09 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9901:
--
Description: 
During heavy disk IO, we noticed the heartbeat thread hanging on the checkBlock method,
which checks the existence and length of a block before spinning off a thread to
do the actual transfer. In extreme cases, the heartbeat thread hung for more
than 10 minutes, so the namenode marked the datanode as dead and started
replicating its blocks, which caused more disk IO on other nodes and could
potentially bring them down.

The patch contains two changes:
1. Makes DF asynchronous when monitoring the disk by creating a thread that 
checks the disk and updates the disk status periodically. Then the FsVolumeImpl 
reads the values that are collected asynchronously.
2. 


  was:
During heavy disk IO, we noticed the heartbeat thread hanging on the checkBlock method,
which checks the existence and length of a block before spinning off a thread to
do the actual transfer. In extreme cases, the heartbeat thread hung for more
than 10 minutes, so the namenode marked the datanode as dead and started
replicating its blocks, which caused more disk IO on other nodes and could
potentially bring them down.

The patch contains two changes:
1. 


> Move block validation out of the heartbeat thread
> -
>
> Key: HDFS-9901
> URL: https://issues.apache.org/jira/browse/HDFS-9901
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Hua Liu
>Assignee: Hua Liu
> Attachments: 
> 0001-HDFS-9901-Move-block-validation-out-of-the-heartbeat.patch
>
>
> During heavy disk IO, we noticed the heartbeat thread hanging on the checkBlock method,
> which checks the existence and length of a block before spinning off a thread to
> do the actual transfer. In extreme cases, the heartbeat thread hung for more
> than 10 minutes, so the namenode marked the datanode as dead and started
> replicating its blocks, which caused more disk IO on other nodes and could
> potentially bring them down.
> The patch contains two changes:
> 1. Makes DF asynchronous when monitoring the disk by creating a thread that 
> checks the disk and updates the disk status periodically. Then the 
> FsVolumeImpl reads the values that are collected asynchronously.
> 2. 





[jira] [Updated] (HDFS-9901) Move block validation out of the heartbeat thread

2016-03-09 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9901:
--
Description: 
During heavy disk IO, we noticed the heartbeat thread hanging on the checkBlock method,
which checks the existence and length of a block before spinning off a thread to
do the actual transfer. In extreme cases, the heartbeat thread hung for more
than 10 minutes, so the namenode marked the datanode as dead and started
replicating its blocks, which caused more disk IO on other nodes and could
potentially bring them down.

The patch contains two changes:
1. 

  was:During heavy disk IO, we noticed the heartbeat thread hanging on the checkBlock
method, which checks the existence and length of a block before spinning off a
thread to do the actual transfer. In extreme cases, the heartbeat thread
hung for more than 10 minutes, so the namenode marked the datanode as dead and
started replicating its blocks, which caused more disk IO on other nodes and
could potentially bring them down.


> Move block validation out of the heartbeat thread
> -
>
> Key: HDFS-9901
> URL: https://issues.apache.org/jira/browse/HDFS-9901
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Hua Liu
>Assignee: Hua Liu
> Attachments: 
> 0001-HDFS-9901-Move-block-validation-out-of-the-heartbeat.patch
>
>
> During heavy disk IO, we noticed the heartbeat thread hanging on the checkBlock method,
> which checks the existence and length of a block before spinning off a thread to
> do the actual transfer. In extreme cases, the heartbeat thread hung for more
> than 10 minutes, so the namenode marked the datanode as dead and started
> replicating its blocks, which caused more disk IO on other nodes and could
> potentially bring them down.
> The patch contains two changes:
> 1. 





[jira] [Updated] (HDFS-9901) Move block validation out of the heartbeat thread

2016-03-09 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9901:
--
Status: Patch Available  (was: In Progress)

> Move block validation out of the heartbeat thread
> -
>
> Key: HDFS-9901
> URL: https://issues.apache.org/jira/browse/HDFS-9901
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Hua Liu
>Assignee: Hua Liu
> Attachments: 
> 0001-HDFS-9901-Move-block-validation-out-of-the-heartbeat.patch
>
>
> During heavy disk IO, we noticed the heartbeat thread hanging on the checkBlock method,
> which checks the existence and length of a block before spinning off a thread to
> do the actual transfer. In extreme cases, the heartbeat thread hung for more
> than 10 minutes, so the namenode marked the datanode as dead and started
> replicating its blocks, which caused more disk IO on other nodes and could
> potentially bring them down.





[jira] [Updated] (HDFS-9901) Move block validation out of the heartbeat thread

2016-03-09 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9901:
--
Attachment: 0001-HDFS-9901-Move-block-validation-out-of-the-heartbeat.patch

> Move block validation out of the heartbeat thread
> -
>
> Key: HDFS-9901
> URL: https://issues.apache.org/jira/browse/HDFS-9901
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Hua Liu
>Assignee: Hua Liu
> Attachments: 
> 0001-HDFS-9901-Move-block-validation-out-of-the-heartbeat.patch
>
>
> During heavy disk IO, we noticed the heartbeat thread hanging on the checkBlock method,
> which checks the existence and length of a block before spinning off a thread to
> do the actual transfer. In extreme cases, the heartbeat thread hung for more
> than 10 minutes, so the namenode marked the datanode as dead and started
> replicating its blocks, which caused more disk IO on other nodes and could
> potentially bring them down.





[jira] [Updated] (HDFS-9901) Move block validation out of the heartbeat thread

2016-03-09 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9901:
--
Priority: Major  (was: Minor)

> Move block validation out of the heartbeat thread
> -
>
> Key: HDFS-9901
> URL: https://issues.apache.org/jira/browse/HDFS-9901
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Hua Liu
>Assignee: Hua Liu
>
> During heavy disk IO, we noticed the heartbeat thread hanging on the checkBlock method,
> which checks the existence and length of a block before spinning off a thread to
> do the actual transfer. In extreme cases, the heartbeat thread hung for more
> than 10 minutes, so the namenode marked the datanode as dead and started
> replicating its blocks, which caused more disk IO on other nodes and could
> potentially bring them down.





[jira] [Commented] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-07 Thread Hua Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184312#comment-15184312
 ] 

Hua Liu commented on HDFS-9882:
---

Hi [~arpiagariu]

I submitted the V4 patch a few hours ago, but it seems Jenkins hasn't built it. I
will resubmit it if Jenkins still has not picked it up by tomorrow morning.

> Add heartbeatsTotal in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode
>Affects Versions: 2.7.2
>Reporter: Hua Liu
>Assignee: Hua Liu
>Priority: Minor
> Attachments: 
> 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0002-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0003-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0004-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch
>
>
> Heartbeat latency only reflects the time spent on generating reports and 
> sending reports to NN. When heartbeats are delayed due to processing 
> commands, this latency does not help investigation. I would like to propose 
> to add another metric counter to show the total time. 





[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-07 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9882:
--
Status: Patch Available  (was: Open)

> Add heartbeatsTotal in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode
>Affects Versions: 2.7.2
>Reporter: Hua Liu
>Assignee: Hua Liu
>Priority: Minor
> Attachments: 
> 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0002-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0003-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0004-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch
>
>
> Heartbeat latency only reflects the time spent on generating reports and 
> sending reports to NN. When heartbeats are delayed due to processing 
> commands, this latency does not help investigation. I would like to propose 
> to add another metric counter to show the total time. 





[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-07 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9882:
--
Status: Open  (was: Patch Available)

> Add heartbeatsTotal in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode
>Affects Versions: 2.7.2
>Reporter: Hua Liu
>Assignee: Hua Liu
>Priority: Minor
> Attachments: 
> 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0002-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0003-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0004-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch
>
>
> Heartbeat latency only reflects the time spent on generating reports and 
> sending reports to NN. When heartbeats are delayed due to processing 
> commands, this latency does not help investigation. I would like to propose 
> to add another metric counter to show the total time. 





[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-07 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9882:
--
Status: Patch Available  (was: Open)

> Add heartbeatsTotal in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode
>Affects Versions: 2.7.2
>Reporter: Hua Liu
>Assignee: Hua Liu
>Priority: Minor
> Attachments: 
> 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0002-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0003-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0004-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch
>
>
> Heartbeat latency only reflects the time spent on generating reports and 
> sending reports to NN. When heartbeats are delayed due to processing 
> commands, this latency does not help investigation. I would like to propose 
> to add another metric counter to show the total time. 





[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-07 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9882:
--
Attachment: 0004-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch

> Add heartbeatsTotal in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode
>Affects Versions: 2.7.2
>Reporter: Hua Liu
>Assignee: Hua Liu
>Priority: Minor
> Attachments: 
> 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0002-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0003-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0004-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch
>
>
> Heartbeat latency only reflects the time spent on generating reports and 
> sending reports to NN. When heartbeats are delayed due to processing 
> commands, this latency does not help investigation. I would like to propose 
> to add another metric counter to show the total time. 





[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-07 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9882:
--
Status: Open  (was: Patch Available)

> Add heartbeatsTotal in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode
>Affects Versions: 2.7.2
>Reporter: Hua Liu
>Assignee: Hua Liu
>Priority: Minor
> Attachments: 
> 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0002-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0003-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch
>
>
> Heartbeat latency only reflects the time spent on generating reports and 
> sending reports to NN. When heartbeats are delayed due to processing 
> commands, this latency does not help investigation. I would like to propose 
> to add another metric counter to show the total time. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-06 Thread Hua Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15182559#comment-15182559
 ] 

Hua Liu commented on HDFS-9882:
---

Hi [~arpiagariu]

Since NumOps and AvgTime are appended to the metric name,
HeartbeatTotalTimeAvgTime would look verbose and HeartbeatTotalTimeNumOps would
appear confusing. We think heartbeatsTotal may be a good alternative.

We also described this new metric in metrics.md. Please take a look at it and
submit it if you see fit.

Thanks,
Hua
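
To illustrate the naming concern, here is a simplified stand-in for a rate metric. RateMetric is a hypothetical class written for this note, not the Hadoop metrics API; it only mimics the convention mentioned above, where NumOps and AvgTime are appended to the registered name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a rate metric; it only mimics the suffix convention. */
final class RateMetric {
  private final String name;
  private long numOps;
  private long totalMillis;

  RateMetric(String name) {
    this.name = name;
  }

  /** Records one operation that took elapsedMillis. */
  synchronized void add(long elapsedMillis) {
    numOps++;
    totalMillis += elapsedMillis;
  }

  /** Publishes two values whose names are the metric name plus a fixed suffix. */
  synchronized Map<String, Double> snapshot() {
    Map<String, Double> out = new LinkedHashMap<>();
    out.put(name + "NumOps", (double) numOps);
    out.put(name + "AvgTime", numOps == 0 ? 0.0 : (double) totalMillis / numOps);
    return out;
  }
}
```

With a name such as HeartbeatTotalTime the published keys repeat "Time" (HeartbeatTotalTimeAvgTime), whereas a shorter base name avoids that.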

> Add heartbeatsTotal in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode
>Affects Versions: 2.7.2
>Reporter: Hua Liu
>Assignee: Hua Liu
>Priority: Minor
> Attachments: 
> 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0002-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0003-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch
>
>
> Heartbeat latency only reflects the time spent on generating reports and 
> sending reports to NN. When heartbeats are delayed due to processing 
> commands, this latency does not help investigation. I would like to propose 
> to add another metric counter to show the total time. 





[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-06 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9882:
--
Status: Patch Available  (was: Open)

> Add heartbeatsTotal in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode
>Affects Versions: 2.7.2
>Reporter: Hua Liu
>Assignee: Hua Liu
>Priority: Minor
> Attachments: 
> 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0002-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0003-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch
>
>
> Heartbeat latency only reflects the time spent on generating reports and 
> sending reports to NN. When heartbeats are delayed due to processing 
> commands, this latency does not help investigation. I would like to propose 
> to add another metric counter to show the total time. 





[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-06 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9882:
--
Status: Open  (was: Patch Available)

> Add heartbeatsTotal in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode
>Affects Versions: 2.7.2
>Reporter: Hua Liu
>Assignee: Hua Liu
>Priority: Minor
> Attachments: 
> 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0002-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0003-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch
>
>
> Heartbeat latency only reflects the time spent on generating reports and 
> sending reports to NN. When heartbeats are delayed due to processing 
> commands, this latency does not help investigation. I would like to propose 
> to add another metric counter to show the total time. 





[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-06 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9882:
--
Attachment: 0003-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch

> Add heartbeatsTotal in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode
>Affects Versions: 2.7.2
>Reporter: Hua Liu
>Assignee: Hua Liu
>Priority: Minor
> Attachments: 
> 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0002-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0003-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch
>
>
> Heartbeat latency only reflects the time spent on generating reports and 
> sending reports to NN. When heartbeats are delayed due to processing 
> commands, this latency does not help investigation. I would like to propose 
> to add another metric counter to show the total time. 





[jira] [Work started] (HDFS-9901) Move block validation out of the heartbeat thread

2016-03-03 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-9901 started by Hua Liu.
-
> Move block validation out of the heartbeat thread
> -
>
> Key: HDFS-9901
> URL: https://issues.apache.org/jira/browse/HDFS-9901
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Hua Liu
>Assignee: Hua Liu
>Priority: Minor
>
> During heavy disk IO, we noticed the heartbeat thread hanging on the checkBlock method,
> which checks the existence and length of a block before spinning off a thread to
> do the actual transfer. In extreme cases, the heartbeat thread hung for more
> than 10 minutes, so the namenode marked the datanode as dead and started
> replicating its blocks, which caused more disk IO on other nodes and could
> potentially bring them down.





[jira] [Created] (HDFS-9901) Move block validation out of the heartbeat thread

2016-03-03 Thread Hua Liu (JIRA)
Hua Liu created HDFS-9901:
-

 Summary: Move block validation out of the heartbeat thread
 Key: HDFS-9901
 URL: https://issues.apache.org/jira/browse/HDFS-9901
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Reporter: Hua Liu
Assignee: Hua Liu
Priority: Minor


During heavy disk IO, we noticed the heartbeat thread hanging on the checkBlock method,
which checks the existence and length of a block before spinning off a thread to
do the actual transfer. In extreme cases, the heartbeat thread hung for more
than 10 minutes, so the namenode marked the datanode as dead and started
replicating its blocks, which caused more disk IO on other nodes and could
potentially bring them down.





[jira] [Commented] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-03 Thread Hua Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15179335#comment-15179335
 ] 

Hua Liu commented on HDFS-9882:
---

Hi [~arpiagariu]

When a data node needs to transfer a block, it validates the block in the
heartbeat thread by invoking the checkBlock method of FsDatasetImpl, which
checks whether the block exists and gets the block length. If the block is
valid, it then spins off a thread to do the actual block transfer. During a
period of heavy disk IO that happened once in our environment, we found the
heartbeat thread hung on "replicaInfo.getBlockFile().exists()" for more than
10 minutes.
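
One way to keep such a check off the heartbeat thread is sketched below. This is an assumption-laden illustration, not the DataNode code: BlockTransferScheduler and its method names are hypothetical, and the existence and length checks are plain java.io.File calls standing in for checkBlock.

```java
import java.io.File;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical sketch: validation and transfer both run on a worker thread. */
final class BlockTransferScheduler {
  private final ExecutorService transferPool = Executors.newFixedThreadPool(2);

  /**
   * Called from the latency-sensitive thread. Returns immediately; the future
   * completes with false if the block file is missing or shorter than expected.
   */
  CompletableFuture<Boolean> scheduleTransfer(File blockFile, long expectedLength) {
    return CompletableFuture.supplyAsync(() -> {
      // Disk access happens here, so a slow disk only stalls this worker.
      if (!blockFile.exists() || blockFile.length() < expectedLength) {
        return false;
      }
      // The actual block transfer would be performed here.
      return true;
    }, transferPool);
  }

  void shutdown() {
    transferPool.shutdown();
  }
}
```

The caller no longer learns synchronously that a block is invalid, so any error reporting has to happen from the worker thread or from a callback on the returned future.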

> Add heartbeatsTotal in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode
>Affects Versions: 2.7.2
>Reporter: Hua Liu
>Assignee: Hua Liu
>Priority: Minor
> Attachments: 
> 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0002-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch
>
>
> Heartbeat latency only reflects the time spent on generating reports and 
> sending reports to NN. When heartbeats are delayed due to processing 
> commands, this latency does not help investigation. I would like to propose 
> to add another metric counter to show the total time. 





[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-02 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9882:
--
Status: Patch Available  (was: Open)

> Add heartbeatsTotal in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode
>Affects Versions: 2.7.2
>Reporter: Hua Liu
>Assignee: Hua Liu
>Priority: Minor
> Attachments: 
> 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0002-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch
>
>
> Heartbeat latency only reflects the time spent on generating reports and 
> sending reports to NN. When heartbeats are delayed due to processing 
> commands, this latency does not help investigation. I would like to propose 
> to add another metric counter to show the total time. 





[jira] [Work stopped] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-02 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-9882 stopped by Hua Liu.
-
> Add heartbeatsTotal in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode
>Affects Versions: 2.7.2
>Reporter: Hua Liu
>Assignee: Hua Liu
>Priority: Minor
> Attachments: 
> 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0002-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch
>
>
> Heartbeat latency only reflects the time spent on generating reports and 
> sending reports to NN. When heartbeats are delayed due to processing 
> commands, this latency does not help investigation. I would like to propose 
> to add another metric counter to show the total time. 





[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-02 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9882:
--
Status: In Progress  (was: Patch Available)

> Add heartbeatsTotal in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode
>Affects Versions: 2.7.2
>Reporter: Hua Liu
>Assignee: Hua Liu
>Priority: Minor
> Attachments: 
> 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0002-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch
>
>
> Heartbeat latency only reflects the time spent on generating reports and 
> sending reports to NN. When heartbeats are delayed due to processing 
> commands, this latency does not help investigation. I would like to propose 
> to add another metric counter to show the total time. 





[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-02 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9882:
--
Attachment: 0002-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch

> Add heartbeatsTotal in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode
>Affects Versions: 2.7.2
>Reporter: Hua Liu
>Assignee: Hua Liu
>Priority: Minor
> Attachments: 
> 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0002-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch
>
>
> Heartbeat latency only reflects the time spent on generating reports and 
> sending reports to NN. When heartbeats are delayed due to processing 
> commands, this latency does not help investigation. I would like to propose 
> to add another metric counter to show the total time. 





[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-02 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9882:
--
Status: Patch Available  (was: Open)

> Add heartbeatsTotal in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode
>Affects Versions: 2.7.2
>Reporter: Hua Liu
>Assignee: Hua Liu
>Priority: Minor
> Attachments: 
> 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0002-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch
>
>
> Heartbeat latency only reflects the time spent on generating reports and 
> sending reports to NN. When heartbeats are delayed due to processing 
> commands, this latency does not help investigation. I would like to propose 
> to add another metric counter to show the total time. 





[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-02 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9882:
--
Status: Open  (was: Patch Available)

> Add heartbeatsTotal in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode
>Affects Versions: 2.7.2
>Reporter: Hua Liu
>Assignee: Hua Liu
>Priority: Minor
> Attachments: 
> 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch
>
>
> Heartbeat latency only reflects the time spent on generating reports and 
> sending reports to NN. When heartbeats are delayed due to processing 
> commands, this latency does not help investigation. I would like to propose 
> to add another metric counter to show the total time. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-02 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9882:
--
Status: Patch Available  (was: Open)

> Add heartbeatsTotal in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode
> Affects Versions: 2.7.2
> Reporter: Hua Liu
> Assignee: Hua Liu
> Priority: Minor
> Attachments: 
> 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch
>
>
> Heartbeat latency only reflects the time spent generating reports and
> sending them to the NN. When heartbeats are delayed by command processing,
> this latency does not help the investigation. I propose adding another
> metric counter that shows the total time spent on sending reports and
> processing commands.





[jira] [Work stopped] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-02 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-9882 stopped by Hua Liu.
-
> Add heartbeatsTotal in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode
> Affects Versions: 2.7.2
> Reporter: Hua Liu
> Assignee: Hua Liu
> Priority: Minor
> Attachments: 
> 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch
>
>
> Heartbeat latency only reflects the time spent generating reports and
> sending them to the NN. When heartbeats are delayed by command processing,
> this latency does not help the investigation. I propose adding another
> metric counter that shows the total time spent on sending reports and
> processing commands.





[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-02 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9882:
--
Attachment: 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch

> Add heartbeatsTotal in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode
> Affects Versions: 2.7.2
> Reporter: Hua Liu
> Assignee: Hua Liu
> Priority: Minor
> Attachments: 
> 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch
>
>
> Heartbeat latency only reflects the time spent generating reports and
> sending them to the NN. When heartbeats are delayed by command processing,
> this latency does not help the investigation. I propose adding another
> metric counter that shows the total time spent on sending reports and
> processing commands.





[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-02 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9882:
--
Attachment: (was: 0001-Add-heartbeatsTotal-metric.patch)

> Add heartbeatsTotal in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode
> Affects Versions: 2.7.2
> Reporter: Hua Liu
> Assignee: Hua Liu
> Priority: Minor
> Attachments: 
> 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch
>
>
> Heartbeat latency only reflects the time spent generating reports and
> sending them to the NN. When heartbeats are delayed by command processing,
> this latency does not help the investigation. I propose adding another
> metric counter that shows the total time spent on sending reports and
> processing commands.





[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-02 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9882:
--
Status: Patch Available  (was: Open)

> Add heartbeatsTotal in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode
> Affects Versions: 2.7.2
> Reporter: Hua Liu
> Assignee: Hua Liu
> Priority: Minor
> Attachments: 
> 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch
>
>
> Heartbeat latency only reflects the time spent generating reports and
> sending them to the NN. When heartbeats are delayed by command processing,
> this latency does not help the investigation. I propose adding another
> metric counter that shows the total time spent on sending reports and
> processing commands.





[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-02 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9882:
--
Status: Open  (was: Patch Available)

> Add heartbeatsTotal in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode
> Affects Versions: 2.7.2
> Reporter: Hua Liu
> Assignee: Hua Liu
> Priority: Minor
> Attachments: 
> 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch
>
>
> Heartbeat latency only reflects the time spent generating reports and
> sending them to the NN. When heartbeats are delayed by command processing,
> this latency does not help the investigation. I propose adding another
> metric counter that shows the total time spent on sending reports and
> processing commands.





[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-02 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9882:
--
Status: Patch Available  (was: In Progress)

> Add heartbeatsTotal in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode
> Affects Versions: 2.7.2
> Reporter: Hua Liu
> Assignee: Hua Liu
> Priority: Minor
> Attachments: 0001-Add-heartbeatsTotal-metric.patch
>
>
> Heartbeat latency only reflects the time spent generating reports and
> sending them to the NN. When heartbeats are delayed by command processing,
> this latency does not help the investigation. I propose adding another
> metric counter that shows the total time spent on sending reports and
> processing commands.





[jira] [Work started] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-02 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-9882 started by Hua Liu.
-
> Add heartbeatsTotal in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode
> Affects Versions: 2.7.2
> Reporter: Hua Liu
> Assignee: Hua Liu
> Priority: Minor
> Attachments: 0001-Add-heartbeatsTotal-metric.patch
>
>
> Heartbeat latency only reflects the time spent generating reports and
> sending them to the NN. When heartbeats are delayed by command processing,
> this latency does not help the investigation. I propose adding another
> metric counter that shows the total time spent on sending reports and
> processing commands.





[jira] [Reopened] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-02 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu reopened HDFS-9882:
---

> Add heartbeatsTotal in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode
> Affects Versions: 2.7.2
> Reporter: Hua Liu
> Assignee: Hua Liu
> Priority: Minor
> Attachments: 0001-Add-heartbeatsTotal-metric.patch
>
>
> Heartbeat latency only reflects the time spent generating reports and
> sending them to the NN. When heartbeats are delayed by command processing,
> this latency does not help the investigation. I propose adding another
> metric counter that shows the total time spent on sending reports and
> processing commands.





[jira] [Resolved] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-02 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu resolved HDFS-9882.
---
Resolution: Fixed

> Add heartbeatsTotal in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode
> Affects Versions: 2.7.2
> Reporter: Hua Liu
> Assignee: Hua Liu
> Priority: Minor
> Attachments: 0001-Add-heartbeatsTotal-metric.patch
>
>
> Heartbeat latency only reflects the time spent generating reports and
> sending them to the NN. When heartbeats are delayed by command processing,
> this latency does not help the investigation. I propose adding another
> metric counter that shows the total time spent on sending reports and
> processing commands.





[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-01 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9882:
--
Attachment: 0001-Add-heartbeatsTotal-metric.patch

> Add heartbeatsTotal in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
> Issue Type: Task
> Components: datanode
> Reporter: Hua Liu
> Assignee: Hua Liu
> Priority: Minor
> Attachments: 0001-Add-heartbeatsTotal-metric.patch
>
>
> Heartbeat latency only reflects the time spent generating reports and
> sending them to the NN. When heartbeats are delayed by command processing,
> this latency does not help the investigation. I propose adding another
> metric counter that shows the total time spent on sending reports and
> processing commands.





[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-01 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9882:
--
Description: Heartbeat latency only reflects the time spent generating
reports and sending them to the NN. When heartbeats are delayed by command
processing, this latency does not help the investigation. I propose adding
another metric counter that shows the total time spent on sending reports and
processing commands.  (was: Heartbeat latency only reflects the time spent
generating reports and sending them to the NN. When heartbeats are delayed by
command processing, this latency does not help the investigation. I propose
either (1) changing the heartbeat latency to reflect the total time spent on
sending reports and processing commands or (2) adding another metric counter
to show that total time.)

> Add heartbeatsTotal in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
> Issue Type: Task
> Components: datanode
> Reporter: Hua Liu
> Assignee: Hua Liu
> Priority: Minor
>
> Heartbeat latency only reflects the time spent generating reports and
> sending them to the NN. When heartbeats are delayed by command processing,
> this latency does not help the investigation. I propose adding another
> metric counter that shows the total time spent on sending reports and
> processing commands.





[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-01 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9882:
--
Status: Open  (was: Patch Available)

> Add heartbeatsTotal in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
> Issue Type: Task
> Components: datanode
> Reporter: Hua Liu
> Assignee: Hua Liu
> Priority: Minor
>
> Heartbeat latency only reflects the time spent generating reports and
> sending them to the NN. When heartbeats are delayed by command processing,
> this latency does not help the investigation. I propose either (1) changing
> the heartbeat latency to reflect the total time spent on sending reports and
> processing commands or (2) adding another metric counter to show that total
> time.





[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-01 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9882:
--
Status: Patch Available  (was: In Progress)

> Add heartbeatsTotal in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
> Issue Type: Task
> Components: datanode
> Reporter: Hua Liu
> Assignee: Hua Liu
> Priority: Minor
>
> Heartbeat latency only reflects the time spent generating reports and
> sending them to the NN. When heartbeats are delayed by command processing,
> this latency does not help the investigation. I propose either (1) changing
> the heartbeat latency to reflect the total time spent on sending reports and
> processing commands or (2) adding another metric counter to show that total
> time.





[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-01 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9882:
--
Summary: Add heartbeatsTotal in Datanode metrics  (was: Change the meaning 
of heartbeat latency in Datanode metrics)

> Add heartbeatsTotal in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
> Issue Type: Task
> Components: datanode
> Reporter: Hua Liu
> Assignee: Hua Liu
> Priority: Minor
>
> Heartbeat latency only reflects the time spent generating reports and
> sending them to the NN. When heartbeats are delayed by command processing,
> this latency does not help the investigation. I propose either (1) changing
> the heartbeat latency to reflect the total time spent on sending reports and
> processing commands or (2) adding another metric counter to show that total
> time.





[jira] [Work started] (HDFS-9882) Change the meaning of heartbeat latency

2016-03-01 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-9882 started by Hua Liu.
-
> Change the meaning of heartbeat latency
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
> Issue Type: Task
> Components: datanode
> Reporter: Hua Liu
> Assignee: Hua Liu
> Priority: Minor
>
> Heartbeat latency only reflects the time spent generating reports and
> sending them to the NN. When heartbeats are delayed by command processing,
> this latency does not help the investigation. I propose either (1) changing
> the heartbeat latency to reflect the total time spent on sending reports and
> processing commands or (2) adding another metric counter to show that total
> time.





[jira] [Created] (HDFS-9882) Change the meaning of heartbeat latency

2016-03-01 Thread Hua Liu (JIRA)

Hua Liu created HDFS-9882:
-

 Summary: Change the meaning of heartbeat latency
 Key: HDFS-9882
 URL: https://issues.apache.org/jira/browse/HDFS-9882
 Project: Hadoop HDFS
 Issue Type: Task
 Components: datanode
 Reporter: Hua Liu
 Assignee: Hua Liu
 Priority: Minor


Heartbeat latency only reflects the time spent generating reports and
sending them to the NN. When heartbeats are delayed by command processing,
this latency does not help the investigation. I propose either (1) changing
the heartbeat latency to reflect the total time spent on sending reports and
processing commands or (2) adding another metric counter to show that total
time.
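
A rough sketch of option (2), for illustration only. It keeps the existing
latency measurement around the report/send step and adds a second timer that
also spans command processing. The names HeartbeatTimingSketch, Timer and
heartbeatOnce, and the injected clock, are assumptions made for this example;
this is not the attached patch and not the actual DataNode metrics code.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical sketch: two timers around one heartbeat cycle. */
public class HeartbeatTimingSketch {

    /** Running sum of elapsed milliseconds plus the number of samples. */
    public static final class Timer {
        private final AtomicLong totalMillis = new AtomicLong();
        private final AtomicLong samples = new AtomicLong();

        public void add(long elapsedMillis) {
            totalMillis.addAndGet(elapsedMillis);
            samples.incrementAndGet();
        }

        public long totalMillis() { return totalMillis.get(); }
        public long samples() { return samples.get(); }
    }

    /** Existing-style latency: generating and sending reports only. */
    public final Timer heartbeats = new Timer();
    /** Proposed extra metric: sending reports plus processing commands. */
    public final Timer heartbeatsTotal = new Timer();

    // Millisecond clock, injected so the sketch can be run deterministically.
    private final LongSupplier clock;

    public HeartbeatTimingSketch(LongSupplier clock) {
        this.clock = clock;
    }

    /** Runs one heartbeat cycle and records both measurements. */
    public void heartbeatOnce(Runnable sendReports, Runnable processCommands) {
        long start = clock.getAsLong();
        sendReports.run();
        heartbeats.add(clock.getAsLong() - start);

        processCommands.run();
        heartbeatsTotal.add(clock.getAsLong() - start);
    }

    public static void main(String[] args) {
        // Fake clock: sending takes 5 ms, command processing takes 200 ms.
        long[] now = {0};
        HeartbeatTimingSketch m = new HeartbeatTimingSketch(() -> now[0]);
        m.heartbeatOnce(() -> now[0] += 5, () -> now[0] += 200);
        System.out.println("heartbeats=" + m.heartbeats.totalMillis()
            + "ms heartbeatsTotal=" + m.heartbeatsTotal.totalMillis() + "ms");
    }
}
```

With the fake clock in main, this prints heartbeats=5ms heartbeatsTotal=205ms:
the first number looks healthy while the second exposes the delay caused by
command processing, which is the gap described above.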


