[ 
https://issues.apache.org/jira/browse/HDFS-5032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee resolved HDFS-5032.
------------------------------
    Resolution: Fixed

> Write pipeline failures caused by slow or busy disk may not be handled 
> properly.
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-5032
>                 URL: https://issues.apache.org/jira/browse/HDFS-5032
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.1.0-beta, 0.23.9
>            Reporter: Kihwal Lee
>            Assignee: Daryn Sharp
>
> Here is one scenario I recently encountered in an HBase cluster.
> The disk of the 1st datanode in a write pipeline became extremely busy for 
> many minutes, which slowed down block writes on that disk. The 2nd 
> datanode's socket read from the 1st datanode timed out after 60 seconds and 
> disconnected. This caused a block recovery. The problem was that the 1st 
> datanode had not written the last packet, but the downstream nodes had, and 
> the ACK was sent back to the client. For this reason, the block recovery was 
> issued up to the ACKed size. 
> During the recovery, the first datanode was told to do copyBlock(). Since it 
> didn't have enough data on disk, it waited in waitForMinLength(), which 
> didn't help, so the command failed. The connection was already established to 
> the target node for the copy, but the target never received any data. The 
> data packet was eventually written, but it was too late for the copyBlock() 
> call.
> The destination node for the copy had block metadata in memory, but no file 
> was created on disk. When the client contacted this node for block recovery, 
> it too failed. 
> There are a few problems:
> - The faulty (slow) node was not detected correctly. Instead, the 2nd DN was 
> excluded. The 1st DN's packet responder could have done a better job. It 
> didn't have any outstanding ACKs to receive. Or the second DN could have 
> tried to tell the 1st DN what had happened. 
> - copyBlock() could probably wait longer than 3 seconds in 
> waitForMinLength(). Or it could check the on-disk size early on and fail 
> early even before trying to establish a connection to the target.
> - Failed targets in block write/copy should clean up the record or make it 
> recoverable.
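
The second point above (check the on-disk size before connecting to the target, and make the wait configurable rather than a fixed 3 seconds) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual DataNode code: the class CopyPrecheck, the method signatures, and the connectAndSend callback are all hypothetical, and the on-disk length is modelled as a plain LongSupplier.

```java
import java.io.IOException;
import java.util.function.LongSupplier;

// Hypothetical helper for illustration only; not the actual DataNode code.
final class CopyPrecheck {

    // Polls the on-disk length until it reaches minLength or timeoutMs elapses.
    // Returns true if enough bytes are on disk, false on timeout.
    static boolean waitForMinLength(LongSupplier onDiskLength, long minLength,
                                    long timeoutMs, long pollMs)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (onDiskLength.getAsLong() < minLength) {
            long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMs <= 0) {
                return false;
            }
            // Sleep at least 1 ms, but never past the deadline.
            Thread.sleep(Math.max(1L, Math.min(pollMs, remainingMs)));
        }
        return true;
    }

    // Verifies the source replica is long enough BEFORE the target is
    // contacted, so a failed copy never leaves a half-initialized record
    // on the target. connectAndSend stands in for the real transfer.
    static void copyBlock(LongSupplier onDiskLength, long minLength,
                          long timeoutMs, long pollMs, Runnable connectAndSend)
            throws IOException, InterruptedException {
        if (!waitForMinLength(onDiskLength, minLength, timeoutMs, pollMs)) {
            throw new IOException("Need " + minLength + " bytes on disk, have "
                    + onDiskLength.getAsLong() + "; not contacting the target");
        }
        connectAndSend.run();
    }
}
```

With this ordering, a source that is still behind on disk fails the copy without the target ever creating in-memory metadata for the block, which is the state that broke the later recovery attempt in the scenario described.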



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
