[ https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hua Liu updated HDFS-9901:
--------------------------
    Summary: Move disk IO out of the heartbeat thread  (was: Move block validation out of the heartbeat thread)

> Move disk IO out of the heartbeat thread
> ----------------------------------------
>
>                 Key: HDFS-9901
>                 URL: https://issues.apache.org/jira/browse/HDFS-9901
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>            Reporter: Hua Liu
>            Assignee: Hua Liu
>         Attachments: 0001-HDFS-9901-Move-block-validation-out-of-the-heartbeat.patch,
>                      0002-HDFS-9901-Move-block-validation-out-of-the-heartbeat.patch
>
>
> During heavy disk IO, we noticed that the heartbeat thread hangs in the
> checkBlock method, which checks the existence and length of a block before
> spinning off a thread to do the actual transfer. In extreme cases, the
> heartbeat thread hung for more than 10 minutes, so the namenode marked the
> datanode as dead and started replicating its blocks, which caused more disk
> IO on other nodes and could potentially bring them down as well.
> The patch contains two changes:
> 1. Makes DF asynchronous when monitoring the disk by creating a thread that
> checks the disk and updates the disk status periodically. When the heartbeat
> thread generates the storage report, it reads disk usage information from
> memory, so it is not blocked during heavy disk IO.
> 2. Moves the checks in transferBlock() in DataNode (which require disk
> access) into a separate thread, so the heartbeat thread does not have to
> wait for them when heartbeating.
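
The two changes described above can be sketched in a few lines of plain Java. This is a minimal illustration under stated assumptions, not the code in the attached patches: the names HeartbeatOffloadSketch, CachedDiskUsage, TransferDispatcher, slowProbe and blockIsValidOnDisk are hypothetical, and the suppliers stand in for whatever disk query and block check the DataNode actually performs.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

/** Illustrative only: hypothetical names, not the classes touched by the patch. */
final class HeartbeatOffloadSketch {

    /** Builds daemon threads so the helpers never keep the JVM alive. */
    static ThreadFactory daemon(String name) {
        return r -> {
            Thread t = new Thread(r, name);
            t.setDaemon(true);
            return t;
        };
    }

    /** Change 1: a background thread refreshes disk usage; readers get the cached value. */
    static final class CachedDiskUsage implements AutoCloseable {
        private final LongSupplier slowProbe;   // stands in for a df-style disk query
        private volatile long usedBytes;        // last value stored by the refresher
        private final ScheduledExecutorService refresher =
                Executors.newSingleThreadScheduledExecutor(daemon("disk-usage-refresher"));

        CachedDiskUsage(LongSupplier slowProbe, long periodMillis) {
            this.slowProbe = slowProbe;
            this.usedBytes = slowProbe.getAsLong(); // single blocking read at startup
            refresher.scheduleWithFixedDelay(
                    this::refresh, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        }

        private void refresh() {
            try {
                usedBytes = slowProbe.getAsLong();
            } catch (RuntimeException e) {
                // Keep the previous value; an escaping exception would cancel the schedule.
            }
        }

        /** Called from the heartbeat path: a memory read, no disk access. */
        long getUsedBytes() {
            return usedBytes;
        }

        @Override
        public void close() {
            refresher.shutdownNow();
        }
    }

    /** Change 2: disk-touching validation runs on a worker, not on the caller's thread. */
    static final class TransferDispatcher implements AutoCloseable {
        private final ExecutorService workers =
                Executors.newCachedThreadPool(daemon("block-transfer"));

        /** Returns immediately; the future completes with true if the block was sent. */
        CompletableFuture<Boolean> transferBlock(BooleanSupplier blockIsValidOnDisk,
                                                 Runnable transfer) {
            return CompletableFuture.supplyAsync(() -> {
                if (!blockIsValidOnDisk.getAsBoolean()) {
                    return false; // missing or wrong-length block: skip the transfer
                }
                transfer.run();
                return true;
            }, workers);
        }

        @Override
        public void close() {
            workers.shutdownNow();
        }
    }
}
```

In this sketch the heartbeat path only calls getUsedBytes() and transferBlock(), neither of which touches the disk on the calling thread. The trade-off is that the reported usage can be stale by up to one refresh period plus the duration of the probe itself.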