[ https://issues.apache.org/jira/browse/HDFS-17218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17774345#comment-17774345 ]
ASF GitHub Bot commented on HDFS-17218:
---------------------------------------

haiyang1987 opened a new pull request, #6176:
URL: https://github.com/apache/hadoop/pull/6176

### Description of PR

https://issues.apache.org/jira/browse/HDFS-17218

Currently, a DN loses all pending DNA_INVALIDATE blocks if it restarts. A DN with asynchronous deletion enabled may hold many pending deletion blocks in memory; when the DN restarts, these cached blocks are lost. This causes some blocks in the NameNode's excess map to be leaked, which results in many blocks having more replicas than expected.

**Root cause**
1. block1 on dn1 is chosen as excess, added to excessRedundancyMap, and added to the invalidates set.
2. dn1's heartbeat receives the invalidate command.
3. dn1 queues the deletion asynchronously on receiving the command, but the service stops before the block is actually deleted, so block1 still exists on disk.
4. At this point, the NN's excessRedundancyMap still contains dn1's block.
5. The DN restarts before the NN has declared it dead.
6. On restart the DN sends a full block report (processFirstBlockReport is not executed here; processReport is). Since block1 is not a new block on this storage, the processExtraRedundancy logic is never executed.
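The six steps above can be condensed into a minimal, self-contained model of the bookkeeping. All names here (ExcessLeakDemo, its fields and methods) are illustrative stand-ins for the NameNode/DataNode state, not actual Hadoop classes:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class ExcessLeakDemo {
    // NN side: datanode -> excess block IDs (stands in for excessRedundancyMap).
    static Map<String, Set<Long>> excessRedundancyMap = new HashMap<>();
    // DN side: blocks queued for asynchronous deletion (in memory, lost on restart).
    static Set<Long> pendingAsyncDeletes = new HashSet<>();
    // DN side: replicas actually on disk.
    static Set<Long> blocksOnDisk = new HashSet<>();

    static void chooseExcess(String dn, long blockId) {
        // Step 1: NN marks the replica excess and schedules invalidation.
        excessRedundancyMap.computeIfAbsent(dn, k -> new HashSet<>()).add(blockId);
    }

    static void heartbeatDeliversInvalidate(long blockId) {
        // Steps 2-3: DN receives DNA_INVALIDATE and only queues the delete.
        pendingAsyncDeletes.add(blockId);
    }

    static void dnRestart() {
        // Step 3 (continued) and step 5: restart before the async delete runs;
        // the in-memory queue is lost, but the block stays on disk.
        pendingAsyncDeletes.clear();
    }

    static void fullBlockReport(String dn) {
        // Step 6: the replica is already known on this storage, so the report
        // diff produces no toAdd entry and processExtraRedundancy never runs;
        // nothing is removed from the excess map here.
    }

    static boolean isLeaked(String dn, long blockId) {
        // Step 4/7: block still on disk AND still tracked as excess.
        return blocksOnDisk.contains(blockId)
            && excessRedundancyMap.getOrDefault(dn, Set.of()).contains(blockId);
    }

    static boolean simulate() {
        blocksOnDisk.add(1L);
        chooseExcess("dn1", 1L);
        heartbeatDeliversInvalidate(1L);
        dnRestart();
        fullBlockReport("dn1");
        return isLeaked("dn1", 1L);
    }

    public static void main(String[] args) {
        System.out.println("leaked=" + simulate()); // prints leaked=true
    }
}
```

Running the sequence shows the block remains in the excess map even though the DN never deleted it, which is exactly the leak described.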
In HeartbeatManager#register(final DatanodeDescriptor d):
https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java#L230-L238
```
// Here the current DN is still alive (the heartbeat expiry interval has not
// been exceeded), so register will not call d.updateHeartbeatState, and
// storageInfo.hasReceivedBlockReport() remains true.
synchronized void register(final DatanodeDescriptor d) {
  if (!d.isAlive()) {
    addDatanode(d);
    //update its timestamp
    d.updateHeartbeatState(StorageReport.EMPTY_ARRAY, 0L, 0L, 0, 0, null);
    stats.add(d);
  }
}
```
In BlockManager#processReport, when the restarted DN sends its FBR, the DN is still considered alive and storageInfo.hasReceivedBlockReport() is true, so the ordinary processReport method is called:
https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L2916-L2946
```
if (!storageInfo.hasReceivedBlockReport()) {
  // The first block report can be processed a lot more efficiently than
  // ordinary block reports. This shortens restart times.
  blockLog.info("BLOCK* processReport 0x{} with lease ID 0x{}: Processing first "
      + "storage report for {} from datanode {}", strBlockReportId,
      fullBrLeaseId, storageInfo.getStorageID(), nodeID);
  processFirstBlockReport(storageInfo, newReport);
} else {
  // Block reports for provided storage are not
  // maintained by DN heartbeats
  if (!StorageType.PROVIDED.equals(storageInfo.getStorageType())) {
    invalidatedBlocks = processReport(storageInfo, newReport);
  }
}
storageInfo.receivedBlockReport();
} finally {
  endTime = Time.monotonicNow();
  namesystem.writeUnlock("processReport");
}
if (blockLog.isDebugEnabled()) {
  for (Block b : invalidatedBlocks) {
    blockLog.debug("BLOCK* processReport 0x{} with lease ID 0x{}: {} on node {} size {} "
        + "does not belong to any file.", strBlockReportId, fullBrLeaseId, b,
        node, b.getNumBytes());
  }
}
```
When BlockManager#processReport runs for this FBR, the current DatanodeStorageInfo already exists in the triplets of the BlockInfo corresponding to each reported block, so the block is not added to the toAdd list, and neither addStoredBlock nor the processExtraRedundancy logic is executed:
https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L3044-L3085
```
Collection<Block> processReport(
    final DatanodeStorageInfo storageInfo,
    final BlockListAsLongs report) throws IOException {
  // Normal case:
  // Modify the (block-->datanode) map, according to the difference
  // between the old and new block report.
  //
  Collection<BlockInfoToAdd> toAdd = new ArrayList<>();
  Collection<BlockInfo> toRemove = new HashSet<>();
  Collection<Block> toInvalidate = new ArrayList<>();
  Collection<BlockToMarkCorrupt> toCorrupt = new ArrayList<>();
  Collection<StatefulBlockInfo> toUC = new ArrayList<>();
  reportDiff(storageInfo, report, toAdd, toRemove,
      toInvalidate, toCorrupt, toUC);

  DatanodeDescriptor node = storageInfo.getDatanodeDescriptor();
  // Process the blocks on each queue
  for (StatefulBlockInfo b : toUC) {
    addStoredBlockUnderConstruction(b, storageInfo);
  }
  for (BlockInfo b : toRemove) {
    removeStoredBlock(b, node);
  }
  int numBlocksLogged = 0;
  for (BlockInfoToAdd b : toAdd) {
    addStoredBlock(b.stored, b.reported, storageInfo, null,
        numBlocksLogged < maxNumBlocksToLog);
    numBlocksLogged++;
  }
  if (numBlocksLogged > maxNumBlocksToLog) {
    blockLog.info("BLOCK* processReport: logged info for {} of {} "
        + "reported.", maxNumBlocksToLog, numBlocksLogged);
  }
  for (Block b : toInvalidate) {
    addToInvalidates(b, node);
  }
  for (BlockToMarkCorrupt b : toCorrupt) {
    markBlockAsCorrupt(b, storageInfo, node);
  }
  return toInvalidate;
}
```
7. So dn1's block will remain in excessRedundancyMap indefinitely (until an HA switch is performed). BlockManager#processChosenExcessRedundancy is what adds the redundancy of the given block stored on the given datanode to the excess map:
https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L4267-L4285
```
private void processChosenExcessRedundancy(
    final Collection<DatanodeStorageInfo> nonExcess,
    final DatanodeStorageInfo chosen, BlockInfo storedBlock) {
  nonExcess.remove(chosen);
  excessRedundancyMap.add(chosen.getDatanodeDescriptor(), storedBlock);
  //
  // The 'excessblocks' tracks blocks until we get confirmation
  // that the datanode has deleted them; the only way we remove them
  // is when we get a "removeBlock" message.
  //
  // The 'invalidate' list is used to inform the datanode the block
  // should be deleted. Items are removed from the invalidate list
  // upon giving instructions to the datanodes.
  //
  final Block blockToInvalidate = getBlockOnStorage(storedBlock, chosen);
  addToInvalidates(blockToInvalidate, chosen.getDatanodeDescriptor());
  blockLog.debug("BLOCK* chooseExcessRedundancies: "
      + "({}, {}) is added to invalidated blocks set", chosen, storedBlock);
}
```
But because the DN side never deleted the block, it does not send the corresponding incremental block report (processIncrementalBlockReport), so the block cannot be removed from the excessRedundancyMap.

> NameNode should remove its excess blocks from the ExcessRedundancyMap when a
> DN registers
> -----------------------------------------------------------------------------------------
>
>                 Key: HDFS-17218
>                 URL: https://issues.apache.org/jira/browse/HDFS-17218
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Haiyang Hu
>            Assignee: Haiyang Hu
>            Priority: Major
>
> Currently, a DN loses all pending DNA_INVALIDATE blocks if it restarts.
> *Root cause*
> The DN has asynchronous deletion enabled and holds many pending deletion
> blocks in memory.
> When the DN restarts, these cached blocks may be lost. This causes some
> blocks in the NameNode's excess map to be leaked, which results in many
> blocks having more replicas than expected.
> *Solution*
> The NameNode should remove a DN's excess blocks from the
> ExcessRedundancyMap when that DN registers.
> This ensures that when the DN's full block report is processed,
> processExtraRedundancy can be performed according to the actual state of
> the blocks.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
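The proposed fix (clearing a DN's entries from the excess map at registration time) can be sketched with a minimal, self-contained model. RegisterFixSketch and its members are illustrative stand-ins; they are not the actual classes touched by PR #6176:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class RegisterFixSketch {
    // Stand-in for the NN's excessRedundancyMap: datanode UUID -> excess block IDs.
    static Map<String, Set<Long>> excessRedundancyMap = new HashMap<>();

    // Hypothetical hook on the DN registration path (roughly where
    // HeartbeatManager#register runs today). Dropping the DN's stale excess
    // entries lets the subsequent full block report re-evaluate redundancy
    // via processExtraRedundancy instead of leaking them until an HA switch.
    static int removeExcessOnRegister(String dnUuid) {
        Set<Long> removed = excessRedundancyMap.remove(dnUuid);
        return removed == null ? 0 : removed.size();
    }

    public static void main(String[] args) {
        // dn1 restarted with two blocks still tracked as excess.
        excessRedundancyMap.put("dn1", new HashSet<>(Set.of(1L, 2L)));
        System.out.println(removeExcessOnRegister("dn1")); // prints 2
        System.out.println(removeExcessOnRegister("dn1")); // prints 0
    }
}
```

With the stale entries gone, any replica that is still genuinely excess is re-detected when the FBR is processed, so the map converges to the real replica state.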