[ 
https://issues.apache.org/jira/browse/HDFS-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13823578#comment-13823578
 ] 

Hudson commented on HDFS-5438:
------------------------------

FAILURE: Integrated in Hadoop-Hdfs-0.23-Build #791 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/791/])
HDFS-5438. Flaws in block report processing can cause data loss. Contributed by 
Kihwal Lee. (kihwal: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1542058)
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClientFaultInjector.java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfoUnderConstruction.java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CorruptReplicasMap.java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestClientProtocolForPipelineRecovery.java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCorruptFilesJsp.java


> Flaws in block report processing can cause data loss
> ----------------------------------------------------
>
>                 Key: HDFS-5438
>                 URL: https://issues.apache.org/jira/browse/HDFS-5438
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 0.23.9, 2.2.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Critical
>             Fix For: 3.0.0, 2.3.0, 0.23.10
>
>         Attachments: HDFS-5438-1.trunk.patch, HDFS-5438-2.trunk.patch, 
> HDFS-5438-3.trunk.patch, HDFS-5438-4.branch-0.23.patch, 
> HDFS-5438-4.trunk.patch, HDFS-5438.trunk.patch
>
>
> The incremental block reports from data nodes and block commits are 
> asynchronous. This becomes troublesome when the gen stamp for a block is 
> changed during a write pipeline recovery.
> * If an incremental block report is delayed from a node but NN had enough 
> replicas already, a report with the old gen stamp may be received after block 
> completion. This replica will be correctly marked corrupt. But if the node 
> had participated in the pipeline recovery, a new (delayed) report with the 
> correct gen stamp will come soon. However, this report won't have any effect 
> on the corrupt state of the replica.
> * If block reports are received while the block is still under construction 
> (i.e. client's call to make block committed has not been received by NN), 
> they are blindly accepted regardless of the gen stamp. If a failed node 
> reports in with the old gen stamp while pipeline recovery is on-going, it 
> will be accepted and counted as valid during commit of the block.
> Due to the above two problems, correct replicas can be marked corrupt and 
> corrupt replicas can be accepted during commit.  So far we have observed two 
> cases in production.
> * The client hangs forever to close a file. All replicas are marked corrupt.
> * After the successful close of a file, read fails. Corrupt replicas are 
> accepted during commit and valid replicas are marked corrupt afterward.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

Reply via email to