[ https://issues.apache.org/jira/browse/HDFS-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807502#comment-13807502 ]
Hadoop QA commented on HDFS-5438:
---------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12610696/HDFS-5438.trunk.patch
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 1 new or modified test files.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}.  The javadoc tool did not generate any warning messages.

    {color:green}+1 eclipse:eclipse{color}.  The patch built with eclipse:eclipse.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:red}-1 core tests{color}.  The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

                  org.apache.hadoop.hdfs.TestClientProtocolForPipelineRecovery
                  org.apache.hadoop.hdfs.server.namenode.TestCorruptFilesJsp

    {color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5299//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5299//console

This message is automatically generated.

> Flaws in block report processing can cause data loss
> ----------------------------------------------------
>
>                 Key: HDFS-5438
>                 URL: https://issues.apache.org/jira/browse/HDFS-5438
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 0.23.9, 2.2.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Critical
>         Attachments: HDFS-5438-1.trunk.patch, HDFS-5438.trunk.patch
>
>
> The incremental block reports from data nodes and block commits are
> asynchronous.
> This becomes troublesome when the gen stamp for a block is changed during a
> write pipeline recovery.
> * If an incremental block report from a node is delayed, but the NN already
> had enough replicas, a report with the old gen stamp may be received after
> block completion. This replica will be correctly marked corrupt. But if the
> node participated in the pipeline recovery, a new (delayed) report with the
> correct gen stamp will arrive soon afterward. However, this report won't have
> any effect on the corrupt state of the replica.
> * If block reports are received while the block is still under construction
> (i.e. the client's call to commit the block has not yet been received by the
> NN), they are blindly accepted regardless of the gen stamp. If a failed node
> reports in with the old gen stamp while pipeline recovery is ongoing, its
> replica will be accepted and counted as valid during commit of the block.
> Because of these two problems, correct replicas can be marked corrupt and
> corrupt replicas can be accepted during commit. So far we have observed two
> cases in production:
> * The client hangs forever trying to close a file, because all replicas are
> marked corrupt.
> * After a successful close of a file, reads fail: corrupt replicas were
> accepted during commit, and the valid replicas were marked corrupt afterward.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
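The second bullet in the description identifies the missing check: while a block is under construction, the NameNode counts replica reports without comparing generation stamps, so a replica left over from before pipeline recovery can be counted as valid at commit. Below is a minimal, hypothetical Java sketch of that gen-stamp comparison. This is not the actual NameNode code; the class and method names (`GenStampCheckSketch`, `acceptReports`, `ReplicaReport`) are invented for illustration, and the real patch operates on the NameNode's internal block-state structures.

```java
import java.util.ArrayList;
import java.util.List;

public class GenStampCheckSketch {

    // Simplified stand-in for a datanode's report about one replica.
    static class ReplicaReport {
        final String datanode;
        final long genStamp;
        ReplicaReport(String datanode, long genStamp) {
            this.datanode = datanode;
            this.genStamp = genStamp;
        }
    }

    /**
     * Returns only the reports whose generation stamp matches the block's
     * current one (i.e. the stamp after any pipeline recovery). A stale
     * stamp means the node dropped out of the pipeline before recovery
     * bumped the stamp, so its replica must not count as valid at commit.
     */
    static List<ReplicaReport> acceptReports(long blockGenStamp,
                                             List<ReplicaReport> reports) {
        List<ReplicaReport> accepted = new ArrayList<>();
        for (ReplicaReport r : reports) {
            if (r.genStamp == blockGenStamp) {
                accepted.add(r);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        List<ReplicaReport> reports = new ArrayList<>();
        reports.add(new ReplicaReport("dn1", 1002)); // survived recovery
        reports.add(new ReplicaReport("dn2", 1002)); // survived recovery
        reports.add(new ReplicaReport("dn3", 1001)); // failed node, stale stamp
        // With the check, only the two up-to-date replicas count.
        System.out.println(acceptReports(1002, reports).size()); // prints 2
    }
}
```

Without this comparison (the pre-patch behavior the bullet describes), all three reports would be counted, and the stale dn3 replica could satisfy the replication check at commit even though its data predates the recovery.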