[ 
https://issues.apache.org/jira/browse/HDFS-9600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15071592#comment-15071592
 ] 

Phil Yang commented on HDFS-9600:
---------------------------------

I find the reason of test case breaking  and open a new issue to fix it on 
HDFS-9602

> do not check replication if the block is under construction
> -----------------------------------------------------------
>
>                 Key: HDFS-9600
>                 URL: https://issues.apache.org/jira/browse/HDFS-9600
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Phil Yang
>            Assignee: Phil Yang
>            Priority: Critical
>         Attachments: HDFS-9600-v1.patch
>
>
> When appending a file, we will update pipeline to bump a new GS and the old 
> GS will be considered as out of date. When changing GS, in 
> BlockInfo.setGenerationStampAndVerifyReplicas we will remove replicas having 
> old GS which means we will remove all replicas because no DN has new GS until 
> the block with new GS is added to blockMaps again by 
> DatanodeProtocol.blockReceivedAndDeleted.
> If we check replication of this block before it is added back, it will be 
> regarded as missing. The probability is low but if there are decommissioning 
> nodes the DecommissionManager.Monitor will scan all blocks belongs to 
> decommissioning nodes with a very fast speed so the probability of finding 
> missing block is very high but actually they are not missing. 
> Furthermore, after closing the appended file, in 
> FSNamesystem.finalizeINodeFileUnderConstruction, it will checkReplication. If 
> some of nodes are decommissioning, this block with new GS will be added to 
> UnderReplicatedBlocks map so there are two blocks with same ID in this map, 
> one is in QUEUE_WITH_CORRUPT_BLOCKS and the other is in 
> QUEUE_HIGHEST_PRIORITY or QUEUE_UNDER_REPLICATED. And there will be many 
> missing blocks warning in NameNode website but there is no corrupt files...
> Therefore, I think the solution is we should not check replication if the 
> block is under construction. We only check complete blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to