[ https://issues.apache.org/jira/browse/HDFS-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807502#comment-13807502 ]
Hadoop QA commented on HDFS-5438:
---------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12610696/HDFS-5438.trunk.patch
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 1 new or modified test files.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}.  The javadoc tool did not generate any warning messages.

    {color:green}+1 eclipse:eclipse{color}.  The patch built with eclipse:eclipse.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:red}-1 core tests{color}.  The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

                  org.apache.hadoop.hdfs.TestClientProtocolForPipelineRecovery
                  org.apache.hadoop.hdfs.server.namenode.TestCorruptFilesJsp

    {color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5299//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5299//console

This message is automatically generated.

> Flaws in block report processing can cause data loss
> ----------------------------------------------------
>
>                 Key: HDFS-5438
>                 URL: https://issues.apache.org/jira/browse/HDFS-5438
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 0.23.9, 2.2.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Critical
>         Attachments: HDFS-5438-1.trunk.patch, HDFS-5438.trunk.patch
>
>
> The incremental block reports from data nodes and block commits are
> asynchronous.
> This becomes troublesome when the gen stamp for a block is changed during a
> write pipeline recovery.
> * If an incremental block report from a node is delayed, but the NN already
> had enough replicas, a report with the old gen stamp may be received after
> block completion. This replica will be correctly marked corrupt. But if the
> node participated in the pipeline recovery, a new (delayed) report with the
> correct gen stamp will arrive soon afterward. However, this report won't have
> any effect on the corrupt state of the replica.
> * If block reports are received while the block is still under construction
> (i.e. the client's call to commit the block has not yet been received by the
> NN), they are blindly accepted regardless of the gen stamp. If a failed node
> reports in with the old gen stamp while pipeline recovery is ongoing, its
> replica will be accepted and counted as valid during commit of the block.
> Because of these two problems, correct replicas can be marked corrupt and
> corrupt replicas can be accepted during commit. So far we have observed two
> cases in production:
> * The client hangs forever trying to close a file, because all replicas are
> marked corrupt.
> * After a successful close of a file, reads fail: corrupt replicas were
> accepted during commit, and the valid replicas were marked corrupt afterward.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
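The second bullet in the description identifies the missing check: while a block is under construction, the NameNode counts replica reports without comparing generation stamps, so a replica left over from before pipeline recovery can be counted as valid at commit. Below is a minimal, hypothetical Java sketch of that gen-stamp comparison. This is not the actual NameNode code; the class and method names (`GenStampCheckSketch`, `acceptReports`, `ReplicaReport`) are invented for illustration, and the real patch operates on the NameNode's internal block-state structures.

```java
import java.util.ArrayList;
import java.util.List;

public class GenStampCheckSketch {

    // Simplified stand-in for a datanode's report about one replica.
    static class ReplicaReport {
        final String datanode;
        final long genStamp;
        ReplicaReport(String datanode, long genStamp) {
            this.datanode = datanode;
            this.genStamp = genStamp;
        }
    }

    /**
     * Returns only the reports whose generation stamp matches the block's
     * current one (i.e. the stamp after any pipeline recovery). A stale
     * stamp means the node dropped out of the pipeline before recovery
     * bumped the stamp, so its replica must not count as valid at commit.
     */
    static List<ReplicaReport> acceptReports(long blockGenStamp,
                                             List<ReplicaReport> reports) {
        List<ReplicaReport> accepted = new ArrayList<>();
        for (ReplicaReport r : reports) {
            if (r.genStamp == blockGenStamp) {
                accepted.add(r);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        List<ReplicaReport> reports = new ArrayList<>();
        reports.add(new ReplicaReport("dn1", 1002)); // survived recovery
        reports.add(new ReplicaReport("dn2", 1002)); // survived recovery
        reports.add(new ReplicaReport("dn3", 1001)); // failed node, stale stamp
        // With the check, only the two up-to-date replicas count.
        System.out.println(acceptReports(1002, reports).size()); // prints 2
    }
}
```

Without this comparison (the pre-patch behavior the bullet describes), all three reports would be counted, and the stale dn3 replica could satisfy the replication check at commit even though its data predates the recovery.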