[ https://issues.apache.org/jira/browse/HDFS-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kihwal Lee resolved HDFS-1595.
------------------------------
       Resolution: Duplicate
    Fix Version/s: HDFS-9178

> DFSClient may incorrectly detect datanode failure
> -------------------------------------------------
>
>                 Key: HDFS-1595
>                 URL: https://issues.apache.org/jira/browse/HDFS-1595
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, hdfs-client
>            Reporter: Tsz Wo Nicholas Sze
>            Priority: Critical
>             Fix For: HDFS-9178
>
>         Attachments: hdfs-1595-idea.txt
>
>
> Suppose a source datanode S is writing to a destination datanode D in a write
> pipeline. We have an implicit assumption that _if S catches an exception
> when it is writing to D, then D is faulty and S is fine._ As a result,
> DFSClient will take D out of the pipeline, reconstruct the write pipeline
> with the remaining datanodes, and then continue writing.
> However, we found a case where the faulty machine F is in fact S, not D. In
> the case we found, F has a faulty network interface (or a faulty switch port)
> that works fine when transferring a small amount of data, say 1MB, but often
> fails when transferring a large amount of data, say 100MB.
> It is even worse if F is the first datanode in the pipeline. Consider the
> following:
> # DFSClient creates a pipeline with three datanodes. The first datanode is F.
> # F catches an IOException when writing to the second datanode. Then, F
> reports that the second datanode has an error.
> # DFSClient removes the second datanode from the pipeline and continues
> writing with the remaining datanode(s).
> # The pipeline now has two datanodes, but steps (2) and (3) repeat.
> # Now, only F remains in the pipeline. DFSClient continues writing with one
> replica in F.
> # The write succeeds and DFSClient is able to *close the file successfully*.
> # The block is under-replicated. The NameNode schedules replication from F
> to some other datanode D.
> # The replication fails for the same reason. D reports to the NameNode that
> the replica in F is corrupted.
> # The NameNode marks the replica in F as corrupted.
> # The block is corrupted since no replica is available.
> We were able to manually divide the replicas into small files and copy them
> out from F without fixing the hardware. The replicas seem uncorrupted.
> This is a *data availability problem*.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
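
The failure sequence in the quoted description can be sketched as a small simulation. This is only an illustrative model, not HDFS code: the `Datanode` class, the `faulty_nic` flag, the `LARGE_TRANSFER_MB` threshold, the node names `DN2`/`DN3`, and the functions `outbound_fails`, `write_block`, and `replicate` are all hypothetical. It also assumes, for simplicity, that a faulty interface always fails on large outbound transfers (the report says "often") and that inbound traffic from the client is unaffected.

```python
from dataclasses import dataclass

# Hypothetical threshold: the report says ~1MB works and ~100MB often fails.
LARGE_TRANSFER_MB = 10


@dataclass(frozen=True)
class Datanode:
    name: str
    faulty_nic: bool = False  # hypothetical flag, not an HDFS field


def outbound_fails(src, size_mb):
    """Simplified model: a faulty NIC fails every large outbound transfer."""
    return src.faulty_nic and size_mb >= LARGE_TRANSFER_MB


def write_block(pipeline, size_mb):
    """Apply the implicit rule from the report: when a node fails to
    forward data, blame and remove its downstream neighbour, then retry."""
    pipeline = list(pipeline)
    removed = []
    while True:
        blamed = None
        for i in range(len(pipeline) - 1):
            if outbound_fails(pipeline[i], size_mb):
                blamed = i + 1  # the downstream node is blamed, not the sender
                break
        if blamed is None:
            return pipeline, removed  # the write "succeeds" with what is left
        removed.append(pipeline.pop(blamed))


def replicate(src, size_mb):
    """Return True if copying a replica of size_mb out of src succeeds."""
    return not outbound_fails(src, size_mb)


F = Datanode("F", faulty_nic=True)
DN2 = Datanode("DN2")
DN3 = Datanode("DN3")

# Steps 1-6: the healthy nodes are removed and only F holds a replica.
survivors, removed = write_block([F, DN2, DN3], size_mb=100)

# Steps 7-10: re-replicating the whole block from F fails the same way.
rereplicated = replicate(F, size_mb=100)

# Last paragraph of the report: small pieces can still be copied out of F.
small_copy_ok = replicate(F, size_mb=1)
```

In this model the two healthy datanodes are evicted, the file closes with a single replica on F, and the later full-block replication from F fails even though the data can still be read out in small pieces.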