[jira] Commented: (HDFS-1595) DFSClient may incorrectly detect datanode failure

dhruba borthakur (JIRA) Wed, 09 Feb 2011 01:01:25 -0800

    [ 
https://issues.apache.org/jira/browse/HDFS-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992396#comment-12992396
 ]


dhruba borthakur commented on HDFS-1595:
----------------------------------------

Error recovery is a pain when a datanode in a write pipeline fails. Sometimes 
it is truly difficult for the client to accurately determine which datanode 
failed. Does it make sense to change the algorithm itself? What are the 
tradeoffs if we say that, when the number of datanodes in the write pipeline 
decreases to min.replication, the client streams data directly to all remaining 
(or new) datanodes instead of pipelining? If new datanodes fail, the client 
will then find it easy to determine accurately which datanodes are dead.
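
To make the attribution argument concrete, here is a minimal sketch in Java. It is a toy model, not HDFS code: the class and method names are hypothetical, a "bad sender" is assumed to fail whenever it forwards data downstream, and the client itself is assumed to be healthy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical toy model (not HDFS code) comparing failure attribution. */
class WriteTopologySketch {

    /**
     * Pipeline: client -> dn[0] -> dn[1] -> ...
     * A node in badSenders fails whenever it forwards data downstream, and
     * it reports its downstream neighbour as the failed node.
     * Returns the datanode that gets blamed, or null if nothing fails.
     */
    static String blamedInPipeline(List<String> pipeline, Set<String> badSenders) {
        for (int i = 0; i + 1 < pipeline.size(); i++) {
            if (badSenders.contains(pipeline.get(i))) {
                // The downstream node is blamed although the sender is faulty.
                return pipeline.get(i + 1);
            }
        }
        return null;
    }

    /**
     * Direct streaming: the client holds one connection per datanode, so a
     * failed connection implicates exactly the datanode at its other end.
     * A node in badNodes is assumed to fail its connection to the client.
     */
    static List<String> failedInFanOut(List<String> targets, Set<String> badNodes) {
        List<String> failed = new ArrayList<>();
        for (String dn : targets) {
            if (badNodes.contains(dn)) {
                failed.add(dn);
            }
        }
        return failed;
    }
}
```

With targets F, D1, D2 and F faulty, the pipeline model blames D1, while the fan-out model reports F. The cost is that the client has to push every byte once per replica instead of once in total.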


> DFSClient may incorrectly detect datanode failure
> -------------------------------------------------
>
>                 Key: HDFS-1595
>                 URL: https://issues.apache.org/jira/browse/HDFS-1595
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node, hdfs client
>    Affects Versions: 0.20.4
>            Reporter: Tsz Wo (Nicholas), SZE
>            Priority: Critical
>         Attachments: hdfs-1595-idea.txt
>
>
> Suppose a source datanode S is writing to a destination datanode D in a write 
> pipeline.  We have an implicit assumption that _if S catches an exception 
> when it is writing to D, then D is faulty and S is fine._  As a result, 
> DFSClient will take out D from the pipeline, reconstruct the write pipeline 
> with the remaining datanodes and then continue writing.
> However, we found a case in which the faulty machine F is actually S, not D.  In 
> the case we found, F has a faulty network interface (or a faulty switch port) 
> in such a way that the faulty network interface works fine when transferring 
> a small amount of data, say 1MB, but it often fails when transferring a large 
> amount of data, say 100MB.
> It is even worse if F is the first datanode in the pipeline.  Consider the 
> following:
> # DFSClient creates a pipeline with three datanodes.  The first datanode is F.
> # F catches an IOException when writing to the second datanode. Then, F 
> reports that the second datanode has an error.
> # DFSClient removes the second datanode from the pipeline and continues 
> writing with the remaining datanode(s).
> # The pipeline now has two datanodes, but steps (2) and (3) repeat.
> # Now, only F remains in the pipeline.  DFSClient continues writing with one 
> replica in F.
> # The write succeeds and DFSClient is able to *close the file successfully*.
> # The block is under replicated.  The NameNode schedules replication from F 
> to some other datanode D.
> # The replication fails for the same reason.  D reports to the NameNode that 
> the replica in F is corrupted.
> # The NameNode marks the replica in F as corrupted.
> # The block is corrupted since no replica is available.
> We were able to manually divide the replicas into small files and copy them 
> out from F without fixing the hardware.  The replicas seem uncorrupted.  
> This is a *data availability problem*.
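
The first five quoted steps can be modeled with a short Java sketch. It is an illustration under stated assumptions, not DFSClient code: the class and method names are hypothetical, and the faulty sender is assumed to fail on every attempt to write downstream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical toy model (not DFSClient code) of the shrinking pipeline. */
class PipelineShrinkSketch {

    /**
     * Applies the implicit assumption repeatedly: whenever the faulty sender
     * fails to write downstream, the client removes the downstream node.
     * Returns the pipeline that remains once no further failure can occur.
     */
    static List<String> recover(List<String> initial, String faultySender) {
        List<String> pipeline = new ArrayList<>(initial);
        while (true) {
            int i = pipeline.indexOf(faultySender);
            if (i < 0 || i + 1 >= pipeline.size()) {
                return pipeline; // no downstream node left to write to
            }
            pipeline.remove(i + 1); // remove by index: the blamed downstream node
        }
    }

    public static void main(String[] args) {
        // F is first in the pipeline, as in the quoted scenario.
        System.out.println(recover(Arrays.asList("F", "D1", "D2"), "F")); // prints [F]
    }
}
```

In this model the only surviving replica is on the machine that actually has the fault, which is the state the remaining quoted steps start from.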

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
