[ https://issues.apache.org/jira/browse/HDFS-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tsz Wo (Nicholas), SZE updated HDFS-1595:
-----------------------------------------

    Description:

Suppose a source datanode S is writing to a destination datanode D in a write pipeline. We have an implicit assumption that _if S catches an exception when it is writing to D, then D is faulty and S is fine._ As a result, DFSClient will take D out of the pipeline, reconstruct the write pipeline with the remaining datanodes, and then continue writing.

However, we found a case where the faulty machine F is in fact S, not D. In the case we found, F has a faulty network interface (or a faulty switch port) in such a way that the interface works fine when transferring a small amount of data, say 1MB, but often fails when transferring a large amount of data, say 100MB.

It is even worse if F is the first datanode in the pipeline. Consider the following:
# DFSClient creates a pipeline with three datanodes. The first datanode is F.
# F catches an IOException when writing to the second datanode. F then reports that the second datanode has an error.
# DFSClient removes the second datanode from the pipeline and continues writing with the remaining datanode(s).
# The pipeline now has two datanodes, but (2) and (3) repeat.
# Now only F remains in the pipeline. DFSClient continues writing with one replica on F.
# The write succeeds and DFSClient is able to *close the file successfully*.
# The block is under-replicated. The NameNode schedules replication from F to some other datanode D.
# The replication fails for the same reason. D reports to the NameNode that the replica on F is corrupted.
# The NameNode marks the replica on F as corrupted.
# The block is corrupted since no replica is available.

We were able to manually divide the replicas into small files and copy them out from F without fixing the hardware. The replicas seem uncorrupted. This is a *data availability problem*.

    was:

Suppose a source datanode S is writing to a destination datanode D in a write pipeline. We have an implicit assumption that _if S catches an exception when it is writing to D, then D is faulty and S is fine._ As a result, DFSClient will take D out of the pipeline, reconstruct the write pipeline with the remaining datanodes, and then continue writing.

However, we found a case where the faulty machine F is in fact S, not D. In the case we found, F has a faulty network interface (or a faulty switch port) in such a way that the interface works fine when sending out a small amount of data, say 1MB, but fails when sending out a large amount of data, say 100MB. Reading works fine for any data size.

It is even worse if F is the first datanode in the pipeline. Consider the following:
# DFSClient creates a pipeline with three datanodes. The first datanode is F.
# F catches an IOException when writing to the second datanode. F then reports that the second datanode has an error.
# DFSClient removes the second datanode from the pipeline and continues writing with the remaining datanode(s).
# The pipeline now has two datanodes, but (2) and (3) repeat.
# Now only F remains in the pipeline. DFSClient continues writing with one replica on F.
# The write succeeds and DFSClient is able to *close the file successfully*.
# The block is under-replicated. The NameNode schedules replication from F to some other datanode D.
# The replication fails for the same reason. D reports to the NameNode that the replica on F is corrupted.
# The NameNode marks the replica on F as corrupted.
# The block is corrupted since no replica is available.

This is a *data loss* scenario.

Revised the description. Thanks Koji and Dhruba for correcting me.
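The failure cascade in the numbered steps above can be illustrated with a small simulation (a sketch with illustrative names, not actual DFSClient code): each time the faulty first node F catches a write error, the client blames and removes the node immediately downstream, so every healthy node is evicted and only F survives.

```python
# Hypothetical simulation of the misattributed-failure cascade described
# above. Names (recover_pipeline, "F", "D1", "D2") are illustrative
# assumptions, not identifiers from HDFS.

def recover_pipeline(pipeline, faulty):
    """Repeatedly apply the 'blame the downstream node' rule until the
    write stops failing. A large transfer from the faulty node fails
    whenever it still has a downstream neighbour to forward data to."""
    removed = []
    while faulty in pipeline and pipeline.index(faulty) < len(pipeline) - 1:
        # F catches an IOException writing to the next node and reports
        # that node as bad; the client removes it and retries (steps 2-4).
        blamed = pipeline[pipeline.index(faulty) + 1]
        pipeline.remove(blamed)
        removed.append(blamed)
    return pipeline, removed

pipeline, removed = recover_pipeline(["F", "D1", "D2"], faulty="F")
print(pipeline)   # ['F']          -- only the faulty node is left (step 5)
print(removed)    # ['D1', 'D2']   -- both healthy nodes were blamed
```

The sketch shows why the existing error attribution converges on the worst possible outcome: the one node guaranteed to remain is the one that caused every failure.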
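The manual workaround mentioned in the description, dividing the replicas into small pieces so that no single transfer exceeds the size at which the faulty interface starts failing, can be sketched as follows. The paths and the 1MB chunk size are assumptions for illustration, not values from the issue.

```python
# Illustrative sketch of the manual recovery: copy a replica in small
# chunks so each transfer stays below the size at which the faulty
# network interface fails. The 1MB chunk size is an assumption.

CHUNK_SIZE = 1024 * 1024  # 1MB: small transfers worked on the faulty NIC

def copy_in_chunks(src_path, dst_path, chunk_size=CHUNK_SIZE):
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            # In the actual recovery, each small piece crossed the
            # network as a separate transfer before being reassembled.
            dst.write(chunk)
```

Because the replicas copied out this way verified as uncorrupted, the corruption marking in step 9 reflects a transfer failure, not bad data on disk, which is what makes this an availability problem rather than true data loss.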
> DFSClient may incorrectly detect datanode failure
> -------------------------------------------------
>
>                 Key: HDFS-1595
>                 URL: https://issues.apache.org/jira/browse/HDFS-1595
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node, hdfs client
>    Affects Versions: 0.20.4
>            Reporter: Tsz Wo (Nicholas), SZE
>            Priority: Critical
>         Attachments: hdfs-1595-idea.txt