[ https://issues.apache.org/jira/browse/HDFS-900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HDFS-900:
-------------------------------------

    Attachment: reportCorruptBlock.patch

Yes, this is indeed a bug in block report processing. After step 3 in Todd's 
description the NN has 3 good replicas and one corrupt one. The corrupt replica 
is in recentInvalidateSets, but not in the DatanodeDescriptor; that is, the 
replica is scheduled for deletion from the DN. See blockReceived(). 
But before it is actually deleted from the DN, that same DN sends a block 
report, which still contains the replica. DatanodeDescriptor.processReport() 
treats it as a new replica, because it is not in the DatanodeDescriptor, and as 
a good one, because its blockId, generationStamp, and length are all consistent.
The fix is to ignore replicas that are already scheduled for deletion from this 
DN; the sketch below illustrates the check.
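Roughly, the check amounts to something like this (a sketch only, not the 
literal patch; the helper name is illustrative, and recentInvalidateSets is 
modeled here as a plain map from storageID to the block IDs queued for deletion 
on that DN):

{code}
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Sketch only, factored out of FSNamesystem for clarity: models the
// invalidate queue (recentInvalidateSets maps storageID -> blocks
// queued for deletion) and the check the block report path should make.
class InvalidateCheckSketch {
  private final Map<String, Collection<Long>> recentInvalidateSets =
      new HashMap<String, Collection<Long>>();

  // True if the reported block is already queued for deletion on the
  // DN identified by storageID. processReport() should skip such
  // replicas instead of adding them back to the DatanodeDescriptor.
  boolean isScheduledForDeletion(String storageID, long blockId) {
    Collection<Long> queued = recentInvalidateSets.get(storageID);
    return queued != null && queued.contains(blockId);
  }
}
{code}

With a check like that in place, the replica from step 4 is simply ignored and 
the DN deletes it as originally scheduled.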
I tested this patch with the test case attached by Todd, thanks. The test 
passes with the fix and fails without it.
The test case is not exactly a unit test, as it introduces changes to the 
FSNamesystem class for testing, so I did not include it in the patch.
Todd, is it possible to convert your case into a real unit test? A rough 
skeleton is sketched below.
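
If it helps, here is an untested skeleton of what a standalone test could look 
like. MiniDFSCluster and DFSTestUtil are the standard test utilities; the 
commented steps are where the real work goes, and the exact calls for marking 
the replica corrupt and forcing the report are left open rather than guessed:

{code}
import junit.framework.TestCase;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DFSTestUtil;
import org.apache.hadoop.hdfs.MiniDFSCluster;

public class TestCorruptReplicaBlockReport extends TestCase {
  public void testCorruptReplicaNotRevivedByBlockReport() throws Exception {
    Configuration conf = new Configuration();
    // 4 DNs so re-replication has a target after one replica goes bad
    MiniDFSCluster cluster = new MiniDFSCluster(conf, 4, true, null);
    try {
      FileSystem fs = cluster.getFileSystem();
      Path file = new Path("/testCorruptReplica");
      DFSTestUtil.createFile(fs, file, 1024L, (short) 3, 0L);
      DFSTestUtil.waitReplication(fs, file, (short) 3);
      // 1. report one replica as corrupt to the NN
      //    (e.g. via ClientProtocol.reportBadBlocks)
      // 2. wait until re-replication creates a new good replica
      // 3. trigger a block report from the DN holding the corrupt replica
      // 4. assert the NN still counts exactly 3 valid replicas and the
      //    corrupt one remains scheduled for deletion
    } finally {
      cluster.shutdown();
    }
  }
}
{code}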

> Corrupt replicas are not tracked correctly through block report from DN
> -----------------------------------------------------------------------
>
>                 Key: HDFS-900
>                 URL: https://issues.apache.org/jira/browse/HDFS-900
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Priority: Blocker
>             Fix For: 0.22.0
>
>         Attachments: log-commented, reportCorruptBlock.patch, 
> to-reproduce.patch
>
>
> This one is tough to describe, but essentially the following order of events 
> is seen to occur:
> # A client marks one replica of a block as corrupt by telling the NN about 
> it
> # Replication is then scheduled to make a new replica of this block
> # The replication completes, such that there are now 3 good replicas and 1 
> corrupt replica
> # The DN holding the corrupt replica sends a block report. Rather than 
> telling this DN to delete the replica, the NN instead marks it as a new *good* 
> replica of the block, and schedules deletion of one of the good replicas.
> I don't know whether this is a data loss bug in the case of 1 corrupt replica 
> with dfs.replication=2, but it seems feasible. I will attach a debug log with 
> some commentary marked by '============>', plus a unit test patch with which I 
> can reproduce this behavior reliably. (It's not a proper unit test, just some 
> edits to an existing one to demonstrate the behavior.)

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
