[ https://issues.apache.org/jira/browse/HDFS-9289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14981262#comment-14981262 ]

Zhe Zhang commented on HDFS-9289:
---------------------------------

bq. That's silent data corruption!

[~daryn] I agree it's silent data corruption in the current logic, because we overwrite the NN's copy of the GS with the GS reported by the client:
{code}
// BlockInfo#commitBlock
this.set(getBlockId(), block.getNumBytes(), block.getGenerationStamp());
{code}
Throwing an exception (and therefore denying the {{commitBlock}}) turns this into an explicit failure, which is better. But it is still data loss, because the data written by the client after {{updatePipeline}} becomes invisible.

So I think, at least for this particular bug (the missing {{volatile}}), the right thing to do is to avoid changing the NN's copy of the GS when committing a block (and we should avoid changing the block ID as well). The only thing we should commit is {{numBytes}}. Of course we should still log a {{WARN}} or {{ERROR}} when the GSes mismatch. As a safer first step, we should at least avoid decrementing the NN's copy of the block GS.

In general, if a client misreports the GS, does that indicate a likelihood of misreported {{numBytes}}, and should we therefore deny the {{commitBlock}}? It's hard to say; the {{volatile}} bug here only affects the GS. But since we have already ensured that the NN's copy of the block's {{numBytes}} never decrements, the harm of a misreported {{numBytes}} is not severe.


> check genStamp when complete file
> ---------------------------------
>
>                 Key: HDFS-9289
>                 URL: https://issues.apache.org/jira/browse/HDFS-9289
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Chang Li
>            Assignee: Chang Li
>            Priority: Critical
>         Attachments: HDFS-9289.1.patch, HDFS-9289.2.patch, HDFS-9289.3.patch, HDFS-9289.4.patch
>
>
> We have seen a case of a corrupt block caused by a file complete after a
> pipelineUpdate, where the file completed with the old block genStamp. This
> caused the replicas on two datanodes in the updated pipeline to be viewed as
> corrupt.
> Propose to check the genStamp when committing a block.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
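
The commit behaviour proposed in the comment (keep the NN's block ID and GS, commit only {{numBytes}}, and warn when the GSes mismatch) could be sketched roughly as below. This is a standalone illustration, not the HDFS code: {{CommitSketch}}, {{Block}} and this {{commitBlock}} signature are hypothetical names, and using {{Math.max}} to model "{{numBytes}} never decrements" is an assumption.

```java
import java.util.logging.Logger;

// Hypothetical sketch; CommitSketch and Block are illustrative names,
// not the real HDFS BlockInfo / BlockManager classes.
final class CommitSketch {
    private static final Logger LOG =
        Logger.getLogger(CommitSketch.class.getName());

    /** Minimal stand-in for a block record: ID, length, generation stamp (GS). */
    static final class Block {
        final long blockId;
        long numBytes;
        long genStamp;

        Block(long blockId, long numBytes, long genStamp) {
            this.blockId = blockId;
            this.numBytes = numBytes;
            this.genStamp = genStamp;
        }
    }

    /**
     * Commits the client-reported length into the NN's record while leaving
     * the NN's block ID and GS untouched.
     *
     * @return true if the reported GS matched the NN's copy
     */
    static boolean commitBlock(Block stored, Block reported) {
        boolean gsMatches = stored.genStamp == reported.genStamp;
        if (!gsMatches) {
            // Make the mismatch visible instead of silently adopting the client's GS.
            LOG.warning("GS mismatch on commit of block " + stored.blockId
                + ": NN has " + stored.genStamp
                + ", client reported " + reported.genStamp);
        }
        // Assumption: "numBytes never decrements" is modelled with max().
        stored.numBytes = Math.max(stored.numBytes, reported.numBytes);
        return gsMatches;
    }

    public static void main(String[] args) {
        Block stored = new Block(1L, 100L, 1002L);   // NN's GS was bumped by updatePipeline
        Block reported = new Block(1L, 150L, 1001L); // client reports a stale GS
        boolean matched = commitBlock(stored, reported);
        System.out.println("matched=" + matched
            + " gs=" + stored.genStamp + " numBytes=" + stored.numBytes);
    }
}
```

With a stale GS from the client, the NN's record keeps its newer GS, so replicas written after {{updatePipeline}} are not marked corrupt, while the returned flag lets the caller decide whether to log only or to deny the commit.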