[ https://issues.apache.org/jira/browse/HDFS-9289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14981262#comment-14981262 ]

Zhe Zhang commented on HDFS-9289:
---------------------------------

bq. That's silent data corruption!
[~daryn] I agree it's silent data corruption in the current logic, because we 
overwrite the NN's copy of the GS with the GS reported by the client:
{code}
// BlockInfo#commitBlock
this.set(getBlockId(), block.getNumBytes(), block.getGenerationStamp());
{code}

Throwing an exception (and therefore denying the {{commitBlock}}) turns this into 
an explicit failure, which is better. But it still loses data, because the data 
written by the client after {{updatePipeline}} becomes invisible.
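To make the "deny the commit" option concrete, here is a minimal Java sketch. {{StrictBlockRecord}} and its methods are hypothetical, a simplified stand-in for the NN's block metadata and not the real {{BlockInfo}} API. On a GS (or block ID) mismatch it throws instead of overwriting, so the NN keeps its GS, but the client's length is not committed either.

{code:java}
import java.io.IOException;

/** Hypothetical, simplified NN-side block record (not HDFS's BlockInfo). */
class StrictBlockRecord {
  private final long blockId;
  private long numBytes;
  private long genStamp;

  StrictBlockRecord(long blockId, long numBytes, long genStamp) {
    this.blockId = blockId;
    this.numBytes = numBytes;
    this.genStamp = genStamp;
  }

  /** Deny the commit outright if the client's report disagrees with the NN's copy. */
  void commit(long reportedId, long reportedBytes, long reportedGs) throws IOException {
    if (reportedId != blockId || reportedGs != genStamp) {
      throw new IOException("Commit denied: NN has blk_" + blockId + "_" + genStamp
          + " but client reported blk_" + reportedId + "_" + reportedGs);
    }
    // Only a fully matching report updates the committed length.
    numBytes = reportedBytes;
  }

  long getNumBytes() { return numBytes; }
  long getGenStamp() { return genStamp; }
}
{code}

For example, with the NN at GS 1002, a commit reporting GS 1001 throws and leaves {{numBytes}} unchanged. That keeps the NN's GS consistent with the updated pipeline, but the bytes written after {{updatePipeline}} are still not made visible.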

So I think, at least for this particular bug (the missing {{volatile}}), the right 
thing to do is to avoid changing the NN's copy of the GS when committing the block 
(and, likewise, to avoid changing the block ID). The only thing we should commit 
is {{numBytes}}. Of course, we should still log a {{WARN}} or {{ERROR}} when the 
GSes mismatch. As a safer first step, we should at least avoid decrementing the 
NN's copy of the block GS.
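A minimal Java sketch of this proposal, again using a hypothetical class ({{LenientBlockRecord}}) that only models the NN-side fields involved. The commit never copies the client's block ID or GS, only its length. A mismatched GS is logged through {{java.util.logging}} and surfaced via the return value. Keeping {{numBytes}} non-decreasing with {{Math.max}} is modeled here because the comment states the NN already guarantees it; the exact HDFS mechanism is not shown.

{code:java}
import java.util.logging.Logger;

/** Hypothetical, simplified NN-side block record; names are illustrative only. */
class LenientBlockRecord {
  private static final Logger LOG = Logger.getLogger(LenientBlockRecord.class.getName());

  private final long blockId;
  private long numBytes;
  private final long genStamp; // never overwritten by a client commit

  LenientBlockRecord(long blockId, long numBytes, long genStamp) {
    this.blockId = blockId;
    this.numBytes = numBytes;
    this.genStamp = genStamp;
  }

  /**
   * Commits only the length. Returns false (after logging a warning) when the
   * client-reported GS differs from the NN's copy, which is kept as-is.
   */
  boolean commit(long reportedBytes, long reportedGs) {
    boolean gsMatched = reportedGs == genStamp;
    if (!gsMatched) {
      LOG.warning("GS mismatch committing blk_" + blockId + ": NN has "
          + genStamp + ", client reported " + reportedGs);
    }
    // Assumed invariant: the NN's numBytes never decrements.
    numBytes = Math.max(numBytes, reportedBytes);
    return gsMatched;
  }

  long getNumBytes() { return numBytes; }
  long getGenStamp() { return genStamp; }
}
{code}

The "safer first step" variant would instead still accept the client's GS but never lower it, e.g. {{genStamp = Math.max(genStamp, reportedGs)}}, which avoids rolling the NN back to a pre-{{updatePipeline}} GS.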

In general, if a client misreports the GS, does that make a misreported 
{{numBytes}} more likely, meaning we should deny the {{commitBlock}}? 
It's hard to say; the {{volatile}} bug here affects only the GS. But since we 
already ensure that the NN's copy of the block's {{numBytes}} never decrements, 
the harm of a misreported {{numBytes}} is not severe.

> check genStamp when complete file
> ---------------------------------
>
>                 Key: HDFS-9289
>                 URL: https://issues.apache.org/jira/browse/HDFS-9289
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Chang Li
>            Assignee: Chang Li
>            Priority: Critical
>         Attachments: HDFS-9289.1.patch, HDFS-9289.2.patch, HDFS-9289.3.patch, HDFS-9289.4.patch
>
>
> We have seen a case of a corrupt block caused by a file being completed after a 
> pipelineUpdate, but with the old block genStamp. This caused the replicas on the 
> two datanodes in the updated pipeline to be viewed as corrupt. Propose to check 
> the genStamp when committing the block.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
