[ https://issues.apache.org/jira/browse/HDFS-5032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kihwal Lee resolved HDFS-5032.
------------------------------
    Resolution: Fixed

> Write pipeline failures caused by slow or busy disk may not be handled properly.
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-5032
>                 URL: https://issues.apache.org/jira/browse/HDFS-5032
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.1.0-beta, 0.23.9
>            Reporter: Kihwal Lee
>            Assignee: Daryn Sharp
>
> Here is one scenario I recently encountered in an HBase cluster.
> The disk of the 1st datanode in a write pipeline became extremely busy for
> many minutes, which slowed down block writes on that disk. The 2nd datanode's
> socket read from the 1st datanode timed out after 60 seconds and it
> disconnected. This caused a block recovery. The problem was that the 1st
> datanode had not yet written the last packet, but the downstream nodes had,
> and the ACK had been sent back to the client. For this reason, the block
> recovery was issued up to the ACKed size.
> During the recovery, the first datanode was told to do copyBlock(). Since it
> didn't have enough data on disk, it waited in waitForMinLength(), which
> didn't help, so the command failed. The connection to the target node for
> the copy was already established, but the target never received any data.
> The data packet was eventually written, but too late for the copyBlock()
> call.
> The destination node for the copy had block metadata in memory, but no file
> was created on disk. When the client contacted this node for block recovery,
> it failed too.
> There are a few problems:
> - The faulty (slow) node was not detected correctly. Instead, the 2nd DN was
> excluded. The 1st DN's packet responder could have done a better job. It
> didn't have any outstanding ACKs to receive. Or the 2nd DN could have
> tried to hint to the 1st DN what had happened.
> - copyBlock() could probably wait longer than 3 seconds in
> waitForMinLength(). Or it could check the on-disk size early on and fail
> early, even before trying to establish a connection to the target.
> - Failed targets in block write/copy should clean up the record or make it
> recoverable.
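
The second suggestion above (check the on-disk size first, and only then connect to the target) can be pictured with the sketch below. It is not the DataNode code. The class CopyBlockPrecheck, its parameters, the poll interval and the connectAndSend callback are all hypothetical names made up for illustration; only copyBlock() and waitForMinLength() are names taken from the report.

```java
import java.util.function.LongSupplier;

/** Hypothetical sketch of "fail early before connecting"; not Hadoop code. */
final class CopyBlockPrecheck {

    /**
     * Polls the on-disk length until it reaches minLength or timeoutMs expires.
     * Returns true only if enough bytes are on disk.
     */
    public static boolean waitForMinLength(LongSupplier onDiskLength, long minLength,
                                           long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (onDiskLength.getAsLong() >= minLength) {
                return true;
            }
            long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMs <= 0) {
                return false;
            }
            try {
                Thread.sleep(Math.max(1L, Math.min(pollMs, remainingMs)));
            } catch (InterruptedException e) {
                // Preserve the interrupt and treat it as "not enough data".
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    /**
     * Runs the length check BEFORE the target is contacted, so a source that
     * is short on data never leaves a half-initialized replica on the target.
     */
    public static boolean copyBlock(LongSupplier onDiskLength, long minLength,
                                    long timeoutMs, Runnable connectAndSend) {
        if (!waitForMinLength(onDiskLength, minLength, timeoutMs, 50L)) {
            return false; // fail early: no connection was opened
        }
        connectAndSend.run(); // assumed to open the connection and stream the data
        return true;
    }
}
```

With this ordering, a timeout in the length check costs only the wait itself. The target is never contacted, so it has no in-memory record to clean up afterwards.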