[ https://issues.apache.org/jira/browse/HDFS-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kihwal Lee updated HDFS-5438: ----------------------------- Status: Patch Available (was: Open) > Flaws in block report processing can cause data loss > ---------------------------------------------------- > > Key: HDFS-5438 > URL: https://issues.apache.org/jira/browse/HDFS-5438 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode > Affects Versions: 2.2.0, 0.23.9 > Reporter: Kihwal Lee > Assignee: Kihwal Lee > Priority: Critical > Attachments: HDFS-5438.trunk.patch > > > The incremental block reports from data nodes and block commits are > asynchronous. This becomes troublesome when the gen stamp for a block is > changed during a write pipeline recovery. > * If an incremental block report is delayed from a node but NN had enough > replicas already, a report with the old gen stamp may be received after block > completion. This replica will be correctly marked corrupt. But if the node > had participated in the pipeline recovery, a new (delayed) report with the > correct gen stamp will come soon. However, this report won't have any effect > on the corrupt state of the replica. > * If block reports are received while the block is still under construction > (i.e. client's call to make block committed has not been received by NN), > they are blindly accepted regardless of the gen stamp. If a failed node > reports in with the old gen stamp while pipeline recovery is on-going, it > will be accepted and counted as valid during commit of the block. > Due to the above two problems, correct replicas can be marked corrupt and > corrupt replicas can be accepted during commit. So far we have observed two > cases in production. > * The client hangs forever to close a file. All replicas are marked corrupt. > * After the successful close of a file, read fails. Corrupt replicas are > accepted during commit and valid replicas are marked corrupt afterward. -- This message was sent by Atlassian JIRA (v6.1#6144)