[ https://issues.apache.org/jira/browse/HDFS-9289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977156#comment-14977156 ]

Chang Li commented on HDFS-9289:
--------------------------------

[~zhz], yes, the above log is from the same cluster as the first log I posted.

The two replicas on the two datanodes from the updated pipeline had the new GS, but
they were marked as corrupt because the block was committed with the old genStamp.
The complete story of what happened in that cluster: there were initially 3
datanodes in the pipeline, d1, d2, d3. Then a pipeline update happened with only d2
and d3, which got the new GS. Then the file was completed with the old GS, and d2
and d3 were marked corrupt. After 1 day, the full block report from d1 came in, and
the NN found that d1 had the right block with the "correct" old GS but was
under-replicated, so the NN told d1 to replicate its replica with the old GS to two
other nodes, d4 and d5. So all 3 DNs I showed above, d1, d4, and d5, had the old GS.
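For context, a minimal standalone sketch of the kind of check proposed here; the
class and method names below are illustrative stand-ins, not the actual HDFS types
or the exact patch:
{code}
import java.io.IOException;

// Standalone model of a genStamp check at block commit time. Names are
// illustrative (assumption), not the real HDFS classes or the patch itself.
class BlockUnderConstruction {
    private final long blockId;
    private long generationStamp; // bumped on each pipeline recovery

    BlockUnderConstruction(long blockId, long generationStamp) {
        this.blockId = blockId;
        this.generationStamp = generationStamp;
    }

    // Pipeline update: the surviving datanodes (d2, d3 above) get the new GS.
    void updatePipeline(long newGS) {
        this.generationStamp = newGS;
    }

    // A commit must carry the latest GS; a commit with a stale GS (e.g. the
    // client completing the file with the pre-recovery stamp) is rejected
    // instead of silently marking the recovered replicas corrupt.
    void commitBlock(long reportedId, long reportedGS) throws IOException {
        if (reportedId != blockId) {
            throw new IOException("commit for wrong block " + reportedId);
        }
        if (reportedGS != generationStamp) {
            throw new IOException("commit with stale genStamp " + reportedGS
                + ", expected " + generationStamp);
        }
        // ... proceed with the normal commit ...
    }
}
{code}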
I think there probably exists a cache coherence issue, since
{code}protected ExtendedBlock block;{code}
lacks volatile. That could also explain why this issue doesn't happen frequently.
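A minimal standalone sketch of the suspected visibility hazard (stub types, not
HDFS code): one thread bumps the GS during pipeline recovery while another thread
reads it; declaring the shared field volatile guarantees the write becomes visible.
{code}
// Illustration of the missing-volatile hazard under the Java memory model.
// Without volatile, the reader may legally keep seeing the old reference
// (and hence the old GS) indefinitely.
class GenStampVisibility {
    static final class ExtendedBlockStub { // stand-in for ExtendedBlock
        final long genStamp;
        ExtendedBlockStub(long genStamp) { this.genStamp = genStamp; }
    }

    // The field quoted above is "protected ExtendedBlock block;" with no
    // volatile; adding volatile publishes the writer's update to all threads.
    volatile ExtendedBlockStub block = new ExtendedBlockStub(1001L);

    public static void main(String[] args) throws InterruptedException {
        GenStampVisibility v = new GenStampVisibility();
        Thread reader = new Thread(() -> {
            // Spins until the new GS becomes visible; without volatile this
            // loop could spin forever on a stale cached reference.
            while (v.block.genStamp == 1001L) { }
            System.out.println("saw new GS " + v.block.genStamp);
        });
        reader.start();
        Thread.sleep(100);
        v.block = new ExtendedBlockStub(1002L); // pipeline recovery bumps GS
        reader.join();
    }
}
{code}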

> check genStamp when complete file
> ---------------------------------
>
>                 Key: HDFS-9289
>                 URL: https://issues.apache.org/jira/browse/HDFS-9289
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Chang Li
>            Assignee: Chang Li
>            Priority: Critical
>         Attachments: HDFS-9289.1.patch, HDFS-9289.2.patch, HDFS-9289.3.patch
>
>
> we have seen a case of a corrupt block caused by a file being completed after a
> pipelineUpdate, but with the old block genStamp. This caused the replicas on the
> two datanodes in the updated pipeline to be viewed as corrupt. Propose to check
> the genStamp when committing the block.


