[ https://issues.apache.org/jira/browse/HDFS-17218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17774345#comment-17774345 ]
ASF GitHub Bot commented on HDFS-17218:
---------------------------------------

haiyang1987 opened a new pull request, #6176:
URL: https://github.com/apache/hadoop/pull/6176

### Description of PR

https://issues.apache.org/jira/browse/HDFS-17218

Currently, a DN loses all pending DNA_INVALIDATE blocks if it restarts. A DN with asynchronous deletion enabled may hold many pending deletion blocks in memory; when the DN restarts, these cached blocks are lost. This causes some blocks in the NameNode's excess map to be leaked, which results in many blocks having more replicas than expected.

**Root cause**
1. block1 on dn1 is chosen as excess, added to excessRedundancyMap, and added to the invalidates set.
2. dn1's heartbeat receives the invalidate command.
3. dn1 queues the deletion asynchronously on receiving the command, but the service stops before the block is actually deleted, so block1 still exists on disk.
4. At this point, the NN's excessRedundancyMap still contains dn1's block.
5. The DN restarts before the NN has declared it dead.
6. On restart the DN sends a full block report (processFirstBlockReport is not executed here; processReport is). Since block1 is not a new block on this storage, the processExtraRedundancy logic is never executed.
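The six steps above can be condensed into a minimal, self-contained model of the bookkeeping. All names here (ExcessLeakDemo, its fields and methods) are illustrative stand-ins for the NameNode/DataNode state, not actual Hadoop classes:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class ExcessLeakDemo {
    // NN side: datanode -> excess block IDs (stands in for excessRedundancyMap).
    static Map<String, Set<Long>> excessRedundancyMap = new HashMap<>();
    // DN side: blocks queued for asynchronous deletion (in memory, lost on restart).
    static Set<Long> pendingAsyncDeletes = new HashSet<>();
    // DN side: replicas actually on disk.
    static Set<Long> blocksOnDisk = new HashSet<>();

    static void chooseExcess(String dn, long blockId) {
        // Step 1: NN marks the replica excess and schedules invalidation.
        excessRedundancyMap.computeIfAbsent(dn, k -> new HashSet<>()).add(blockId);
    }

    static void heartbeatDeliversInvalidate(long blockId) {
        // Steps 2-3: DN receives DNA_INVALIDATE and only queues the delete.
        pendingAsyncDeletes.add(blockId);
    }

    static void dnRestart() {
        // Step 3 (continued) and step 5: restart before the async delete runs;
        // the in-memory queue is lost, but the block stays on disk.
        pendingAsyncDeletes.clear();
    }

    static void fullBlockReport(String dn) {
        // Step 6: the replica is already known on this storage, so the report
        // diff produces no toAdd entry and processExtraRedundancy never runs;
        // nothing is removed from the excess map here.
    }

    static boolean isLeaked(String dn, long blockId) {
        // Step 4/7: block still on disk AND still tracked as excess.
        return blocksOnDisk.contains(blockId)
            && excessRedundancyMap.getOrDefault(dn, Set.of()).contains(blockId);
    }

    static boolean simulate() {
        blocksOnDisk.add(1L);
        chooseExcess("dn1", 1L);
        heartbeatDeliversInvalidate(1L);
        dnRestart();
        fullBlockReport("dn1");
        return isLeaked("dn1", 1L);
    }

    public static void main(String[] args) {
        System.out.println("leaked=" + simulate()); // prints leaked=true
    }
}
```

Running the sequence shows the block remains in the excess map even though the DN never deleted it, which is exactly the leak described.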
In HeartbeatManager#register(final DatanodeDescriptor d):
https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java#L230-L238
```
// Here the current DN is still alive (the heartbeat expiry interval has not
// been exceeded), so register will not call d.updateHeartbeatState, and
// storageInfo.hasReceivedBlockReport() remains true.
synchronized void register(final DatanodeDescriptor d) {
  if (!d.isAlive()) {
    addDatanode(d);
    //update its timestamp
    d.updateHeartbeatState(StorageReport.EMPTY_ARRAY, 0L, 0L, 0, 0, null);
    stats.add(d);
  }
}
```
In BlockManager#processReport, when the restarted DN sends its FBR, the DN is still considered alive and storageInfo.hasReceivedBlockReport() is true, so the ordinary processReport method is called:
https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L2916-L2946
```
if (!storageInfo.hasReceivedBlockReport()) {
  // The first block report can be processed a lot more efficiently than
  // ordinary block reports. This shortens restart times.
  blockLog.info("BLOCK* processReport 0x{} with lease ID 0x{}: Processing first "
      + "storage report for {} from datanode {}", strBlockReportId,
      fullBrLeaseId, storageInfo.getStorageID(), nodeID);
  processFirstBlockReport(storageInfo, newReport);
} else {
  // Block reports for provided storage are not
  // maintained by DN heartbeats
  if (!StorageType.PROVIDED.equals(storageInfo.getStorageType())) {
    invalidatedBlocks = processReport(storageInfo, newReport);
  }
}
storageInfo.receivedBlockReport();
} finally {
  endTime = Time.monotonicNow();
  namesystem.writeUnlock("processReport");
}
if (blockLog.isDebugEnabled()) {
  for (Block b : invalidatedBlocks) {
    blockLog.debug("BLOCK* processReport 0x{} with lease ID 0x{}: {} on node {} size {} "
        + "does not belong to any file.", strBlockReportId, fullBrLeaseId, b,
        node, b.getNumBytes());
  }
}
```
When BlockManager#processReport runs for this FBR, the current DatanodeStorageInfo already exists in the triplets of the BlockInfo corresponding to each reported block, so the block is not added to the toAdd list, and neither addStoredBlock nor the processExtraRedundancy logic is executed:
https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L3044-L3085
```
Collection<Block> processReport(
    final DatanodeStorageInfo storageInfo,
    final BlockListAsLongs report) throws IOException {
  // Normal case:
  // Modify the (block-->datanode) map, according to the difference
  // between the old and new block report.
  //
  Collection<BlockInfoToAdd> toAdd = new ArrayList<>();
  Collection<BlockInfo> toRemove = new HashSet<>();
  Collection<Block> toInvalidate = new ArrayList<>();
  Collection<BlockToMarkCorrupt> toCorrupt = new ArrayList<>();
  Collection<StatefulBlockInfo> toUC = new ArrayList<>();
  reportDiff(storageInfo, report, toAdd, toRemove,
      toInvalidate, toCorrupt, toUC);

  DatanodeDescriptor node = storageInfo.getDatanodeDescriptor();
  // Process the blocks on each queue
  for (StatefulBlockInfo b : toUC) {
    addStoredBlockUnderConstruction(b, storageInfo);
  }
  for (BlockInfo b : toRemove) {
    removeStoredBlock(b, node);
  }
  int numBlocksLogged = 0;
  for (BlockInfoToAdd b : toAdd) {
    addStoredBlock(b.stored, b.reported, storageInfo, null,
        numBlocksLogged < maxNumBlocksToLog);
    numBlocksLogged++;
  }
  if (numBlocksLogged > maxNumBlocksToLog) {
    blockLog.info("BLOCK* processReport: logged info for {} of {} "
        + "reported.", maxNumBlocksToLog, numBlocksLogged);
  }
  for (Block b : toInvalidate) {
    addToInvalidates(b, node);
  }
  for (BlockToMarkCorrupt b : toCorrupt) {
    markBlockAsCorrupt(b, storageInfo, node);
  }
  return toInvalidate;
}
```
7. So dn1's block will remain in excessRedundancyMap indefinitely (until an HA switch is performed). BlockManager#processChosenExcessRedundancy is what adds the redundancy of the given block stored on the given datanode to the excess map:
https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L4267-L4285
```
private void processChosenExcessRedundancy(
    final Collection<DatanodeStorageInfo> nonExcess,
    final DatanodeStorageInfo chosen, BlockInfo storedBlock) {
  nonExcess.remove(chosen);
  excessRedundancyMap.add(chosen.getDatanodeDescriptor(), storedBlock);
  //
  // The 'excessblocks' tracks blocks until we get confirmation
  // that the datanode has deleted them; the only way we remove them
  // is when we get a "removeBlock" message.
  //
  // The 'invalidate' list is used to inform the datanode the block
  // should be deleted. Items are removed from the invalidate list
  // upon giving instructions to the datanodes.
  //
  final Block blockToInvalidate = getBlockOnStorage(storedBlock, chosen);
  addToInvalidates(blockToInvalidate, chosen.getDatanodeDescriptor());
  blockLog.debug("BLOCK* chooseExcessRedundancies: "
      + "({}, {}) is added to invalidated blocks set", chosen, storedBlock);
}
```
But because the DN side never deleted the block, it does not send the corresponding incremental block report (processIncrementalBlockReport), so the block cannot be removed from the excessRedundancyMap.

> NameNode should remove its excess blocks from the ExcessRedundancyMap when a
> DN registers
> -----------------------------------------------------------------------------------------
>
>                 Key: HDFS-17218
>                 URL: https://issues.apache.org/jira/browse/HDFS-17218
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Haiyang Hu
>            Assignee: Haiyang Hu
>            Priority: Major
>
> Currently, a DN loses all pending DNA_INVALIDATE blocks if it restarts.
> *Root cause*
> The DN has asynchronous deletion enabled and holds many pending deletion
> blocks in memory.
> When the DN restarts, these cached blocks may be lost. This causes some
> blocks in the NameNode's excess map to be leaked, which results in many
> blocks having more replicas than expected.
> *Solution*
> The NameNode should remove a DN's excess blocks from the
> ExcessRedundancyMap when that DN registers.
> This ensures that when the DN's full block report is processed,
> processExtraRedundancy can be performed according to the actual state of
> the blocks.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
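The proposed fix (clearing a DN's entries from the excess map at registration time) can be sketched with a minimal, self-contained model. RegisterFixSketch and its members are illustrative stand-ins; they are not the actual classes touched by PR #6176:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class RegisterFixSketch {
    // Stand-in for the NN's excessRedundancyMap: datanode UUID -> excess block IDs.
    static Map<String, Set<Long>> excessRedundancyMap = new HashMap<>();

    // Hypothetical hook on the DN registration path (roughly where
    // HeartbeatManager#register runs today). Dropping the DN's stale excess
    // entries lets the subsequent full block report re-evaluate redundancy
    // via processExtraRedundancy instead of leaking them until an HA switch.
    static int removeExcessOnRegister(String dnUuid) {
        Set<Long> removed = excessRedundancyMap.remove(dnUuid);
        return removed == null ? 0 : removed.size();
    }

    public static void main(String[] args) {
        // dn1 restarted with two blocks still tracked as excess.
        excessRedundancyMap.put("dn1", new HashSet<>(Set.of(1L, 2L)));
        System.out.println(removeExcessOnRegister("dn1")); // prints 2
        System.out.println(removeExcessOnRegister("dn1")); // prints 0
    }
}
```

With the stale entries gone, any replica that is still genuinely excess is re-detected when the FBR is processed, so the map converges to the real replica state.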