[ 
https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9901:
--------------------------
    Description: 
During heavy disk IO, we noticed that the heartbeat thread hangs in the 
checkBlock method, which checks the existence and length of a block before 
spinning off a thread to do the actual transfer. In extreme cases, the 
heartbeat thread hung for more than 10 minutes, so the namenode marked the 
datanode as dead and started replicating its blocks, which caused more disk IO 
on other nodes and could potentially bring them down.
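
As a rough illustration of the idea in the issue title, the sketch below keeps 
the existence and length check off the calling (heartbeat) thread by running it 
on the worker that would also do the transfer. It is not the attached patch: 
TransferOffloadSketch, validateBlock and scheduleTransfer are hypothetical 
names, and a plain java.util.concurrent thread pool stands in for whatever the 
datanode actually uses.

```java
import java.io.File;
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch, not code from the patch.
final class TransferOffloadSketch {
    // Small fixed pool standing in for the datanode's transfer threads (assumption).
    private final ExecutorService transferPool = Executors.newFixedThreadPool(2);

    // Hypothetical stand-in for the block check: verifies existence and length.
    static void validateBlock(File blockFile, long expectedLength) throws IOException {
        if (!blockFile.exists()) {
            throw new IOException("Block file missing: " + blockFile);
        }
        if (blockFile.length() != expectedLength) {
            throw new IOException("Unexpected length for " + blockFile);
        }
    }

    // Intended to be called from the heartbeat loop: it only enqueues work,
    // so the caller never waits on the disk.
    Future<Boolean> scheduleTransfer(File blockFile, long expectedLength) {
        return transferPool.submit(() -> {
            // Disk IO for validation happens here, on the worker thread.
            validateBlock(blockFile, expectedLength);
            // The actual transfer would follow here; omitted in this sketch.
            return true;
        });
    }

    void shutdown() {
        transferPool.shutdown();
    }
}
```

With this shape, a slow disk delays only the worker; a failed check surfaces 
through the returned Future instead of blocking the caller.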

The patch contains two changes:
1. Makes DF asynchronous when monitoring the disk by creating a thread that 
checks the disk and updates the disk status periodically. Then the FsVolumeImpl 
reads the values that are collected asynchronously.
2. 
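
A minimal sketch of the first change (asynchronous DF) might look like the 
following. It is an illustration under stated assumptions, not the attached 
patch: AsyncDiskUsageSketch is a hypothetical class, and 
java.io.File#getTotalSpace and #getUsableSpace stand in for whatever DF 
actually measures. A background thread refreshes the cached values 
periodically, and readers (FsVolumeImpl in the description) only read the 
cached fields.

```java
import java.io.File;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not code from the patch.
final class AsyncDiskUsageSketch implements AutoCloseable {
    private final File dir;
    private final ScheduledExecutorService refresher;

    // volatile: written by the refresher thread, read by any caller.
    private volatile long capacity;
    private volatile long available;

    AsyncDiskUsageSketch(File dir, long intervalMs) {
        this.dir = dir;
        // Seed the cache once so readers see real values immediately.
        // This single call is synchronous and may touch the disk.
        refresh();
        this.refresher = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "disk-usage-refresher");
            t.setDaemon(true); // do not keep the JVM alive for this thread
            return t;
        });
        // Fixed delay: a slow disk check postpones the next one instead of piling up.
        refresher.scheduleWithFixedDelay(
            this::refresh, intervalMs, intervalMs, TimeUnit.MILLISECONDS);
    }

    // The only place that queries the file system.
    private void refresh() {
        capacity = dir.getTotalSpace();
        available = dir.getUsableSpace();
    }

    // Readers return cached values and never block on disk IO.
    long getCapacity() {
        return capacity;
    }

    long getAvailable() {
        return available;
    }

    @Override
    public void close() {
        refresher.shutdownNow();
    }
}
```

The trade-off is staleness: a reader may see values up to one refresh interval 
old, in exchange for never waiting on a slow disk.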


  was:
During heavy disk IO, we noticed that the heartbeat thread hangs in the 
checkBlock method, which checks the existence and length of a block before 
spinning off a thread to do the actual transfer. In extreme cases, the 
heartbeat thread hung for more than 10 minutes, so the namenode marked the 
datanode as dead and started replicating its blocks, which caused more disk IO 
on other nodes and could potentially bring them down.

The patch contains two changes:
1. 


> Move block validation out of the heartbeat thread
> -------------------------------------------------
>
>                 Key: HDFS-9901
>                 URL: https://issues.apache.org/jira/browse/HDFS-9901
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>            Reporter: Hua Liu
>            Assignee: Hua Liu
>         Attachments: 
> 0001-HDFS-9901-Move-block-validation-out-of-the-heartbeat.patch
>
>
> During heavy disk IO, we noticed that the heartbeat thread hangs in the 
> checkBlock method, which checks the existence and length of a block before 
> spinning off a thread to do the actual transfer. In extreme cases, the 
> heartbeat thread hung for more than 10 minutes, so the namenode marked the 
> datanode as dead and started replicating its blocks, which caused more disk 
> IO on other nodes and could potentially bring them down.
> The patch contains two changes:
> 1. Makes DF asynchronous when monitoring the disk by creating a thread that 
> checks the disk and updates the disk status periodically. Then the 
> FsVolumeImpl reads the values that are collected asynchronously.
> 2. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)