[ https://issues.apache.org/jira/browse/HDFS-9600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15072703#comment-15072703 ]
Phil Yang commented on HDFS-9600:
---------------------------------

Hi,

Thanks for your reply. I'm not sure what the difference is between getUnderConstructionFeature() == null and isComplete(). I thought they were equivalent, so I used the simpler one. Is there any case where isComplete() == false but we should still check replication, or where isComplete() == true but we should not check replication? Please correct me if I am wrong :)

> do not check replication if the block is under construction
> -----------------------------------------------------------
>
>                 Key: HDFS-9600
>                 URL: https://issues.apache.org/jira/browse/HDFS-9600
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Phil Yang
>            Assignee: Phil Yang
>            Priority: Critical
>         Attachments: HDFS-9600-v1.patch, HDFS-9600-v2.patch
>
>
> When appending to a file, we update the pipeline to bump the block to a
> new GS (generation stamp), and the old GS is considered out of date. When
> changing the GS, BlockInfo.setGenerationStampAndVerifyReplicas removes the
> replicas that have the old GS. This removes all replicas, because no DN
> has the new GS until the block with the new GS is added to blockMaps again
> by DatanodeProtocol.blockReceivedAndDeleted.
> If we check the replication of this block before it is added back, it is
> regarded as missing. The probability is normally low, but if there are
> decommissioning nodes, DecommissionManager.Monitor scans all blocks
> belonging to the decommissioning nodes very quickly, so the probability of
> finding a "missing" block becomes very high even though the blocks are not
> actually missing.
> Furthermore, after the appended file is closed,
> FSNamesystem.finalizeINodeFileUnderConstruction calls checkReplication. If
> some of the nodes are decommissioning, the block with the new GS is added
> to the UnderReplicatedBlocks map, so there are two blocks with the same ID
> in this map: one in QUEUE_WITH_CORRUPT_BLOCKS and the other in
> QUEUE_HIGHEST_PRIORITY or QUEUE_UNDER_REPLICATED. As a result, there are
> many missing-block warnings on the NameNode web UI, but there are no
> corrupt files...
> Therefore, I think the solution is to not check replication if the block
> is under construction. We should only check complete blocks.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
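
To make the race described in the issue concrete, here is a minimal, self-contained Java sketch. It is not the HDFS code and not the attached patch: `ReplicationCheckSketch`, `Block`, `Verdict`, `updatePipelineForAppend`, `naiveCheck`, and `guardedCheck` are hypothetical names invented for illustration, and the model simplifies the append path by marking the block under construction in the same step that bumps the GS. It only shows why a replication check that ignores the under-construction state reports a block as missing, and how skipping incomplete blocks avoids that.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative model only: these types are NOT the real HDFS classes.
class ReplicationCheckSketch {

    enum State { COMPLETE, UNDER_CONSTRUCTION }

    enum Verdict { SKIPPED, MISSING, UNDER_REPLICATED, OK }

    static final class Block {
        final long id;
        long generationStamp;
        State state = State.COMPLETE;
        // DataNodes known to hold a replica with the current generation stamp.
        final Set<String> replicas = new HashSet<>();

        Block(long id, long generationStamp) {
            this.id = id;
            this.generationStamp = generationStamp;
        }

        boolean isComplete() {
            return state == State.COMPLETE;
        }

        // Simplified append: bump the GS and drop every replica recorded
        // under the old GS, leaving the block with no known replicas.
        void updatePipelineForAppend(long newStamp) {
            generationStamp = newStamp;
            state = State.UNDER_CONSTRUCTION;
            replicas.clear();
        }

        // A DataNode reports that it holds a replica with the new GS.
        void replicaReported(String dataNode) {
            replicas.add(dataNode);
        }

        // The file is closed and the block becomes complete again.
        void complete() {
            state = State.COMPLETE;
        }
    }

    // Check that looks only at the replica count, whatever the block state.
    static Verdict naiveCheck(Block b, int expectedReplicas) {
        if (b.replicas.isEmpty()) {
            return Verdict.MISSING;
        }
        return b.replicas.size() < expectedReplicas
                ? Verdict.UNDER_REPLICATED
                : Verdict.OK;
    }

    // Check that skips blocks still under construction, as the issue proposes.
    static Verdict guardedCheck(Block b, int expectedReplicas) {
        if (!b.isComplete()) {
            return Verdict.SKIPPED;
        }
        return naiveCheck(b, expectedReplicas);
    }

    public static void main(String[] args) {
        Block block = new Block(1L, 1000L);
        block.replicaReported("dn1");
        block.replicaReported("dn2");
        block.replicaReported("dn3");

        block.updatePipelineForAppend(1001L);
        // Window before any DataNode has reported the new GS.
        System.out.println("naive:   " + naiveCheck(block, 3));
        System.out.println("guarded: " + guardedCheck(block, 3));
    }
}
```

In this model, the window between `updatePipelineForAppend` and the first `replicaReported` call is where a fast scanner such as the decommission monitor would see zero replicas. Whether the guard should be expressed as isComplete() or as getUnderConstructionFeature() == null is exactly the open question in the comment above; the sketch does not settle it.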