[jira] [Work logged] (HDFS-15634) Invalidate block on decommissioning DataNode after replication

ASF GitHub Bot (Jira) Thu, 15 Oct 2020 16:52:24 -0700


     [ 
https://issues.apache.org/jira/browse/HDFS-15634?focusedWorklogId=501332&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-501332
 ]


ASF GitHub Bot logged work on HDFS-15634:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 15/Oct/20 23:51
            Start Date: 15/Oct/20 23:51
    Worklog Time Spent: 10m 
      Work Description: goiri commented on a change in pull request #2388:
URL: https://github.com/apache/hadoop/pull/2388#discussion_r505926806



##########
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
##########
@@ -3512,7 +3512,11 @@ private Block addStoredBlock(final BlockInfo block,
     int numUsableReplicas = num.liveReplicas() +
         num.decommissioning() + num.liveEnteringMaintenanceReplicas();
 
-    if(storedBlock.getBlockUCState() == BlockUCState.COMMITTED &&
+
+    // if block is still under construction, then done for now
+    if (!storedBlock.isCompleteOrCommitted()) {

Review comment:
       Why do we move this block here?
   BTW we can leave it as a single if with a return.
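
One possible reading of the "single if with a return" suggestion is sketched below. It is an illustration only: `UcState`, `Outcome`, and `addStoredReplica` are hypothetical stand-ins, not the real `BlockManager` types or the code in the pull request.

```java
public class GuardClauseSketch {

  /** Simplified, hypothetical stand-in for the block construction state. */
  enum UcState { UNDER_CONSTRUCTION, COMMITTED, COMPLETE }

  /** Hypothetical outcomes, used only so the sketch has something to return. */
  enum Outcome { DEFERRED, COMPLETED, COUNTED }

  static Outcome addStoredReplica(UcState state, int usableReplicas, int minReplicas) {
    // Single guard with an early return: a block that is still under
    // construction needs no further accounting for now.
    if (state == UcState.UNDER_CONSTRUCTION) {
      return Outcome.DEFERRED;
    }
    // Past the guard, the block is committed or complete. A committed block
    // with enough usable replicas can be completed (assumed rule).
    if (state == UcState.COMMITTED && usableReplicas >= minReplicas) {
      return Outcome.COMPLETED;
    }
    return Outcome.COUNTED;
  }

  public static void main(String[] args) {
    System.out.println(addStoredReplica(UcState.UNDER_CONSTRUCTION, 3, 1));
    System.out.println(addStoredReplica(UcState.COMMITTED, 3, 1));
    System.out.println(addStoredReplica(UcState.COMPLETE, 3, 1));
  }
}
```

Keeping the guard as one early return means the code after it can assume the block is committed or complete, without nesting the remaining logic.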

##########
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
##########
@@ -3559,9 +3558,26 @@ private Block addStoredBlock(final BlockInfo block,
     if ((corruptReplicasCount > 0) && (numLiveReplicas >= fileRedundancy)) {
       invalidateCorruptReplicas(storedBlock, reportedBlock, num);
     }
+    if (shouldInvalidateDecommissionedRedundancy(num, fileRedundancy)) {
+      for (DatanodeStorageInfo storage : blocksMap.getStorages(block)) {
+        final DatanodeDescriptor datanode = storage.getDatanodeDescriptor();
+        if (datanode.isDecommissioned()
+            || datanode.isDecommissionInProgress()) {
+          addToInvalidates(storedBlock, datanode);
+        }
+      }
+    }
     return storedBlock;
   }
 
+  // If there are enough live replicas, start invalidating
+  // decommissioned + decommissioning replicas
+  private boolean shouldInvalidateDecommissionedRedundancy(NumberReplicas num,

Review comment:
       It makes sense. Maybe we should include some of the JIRA description in 
this method's comment to explain what we are doing at a high level.
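
As a rough illustration of what such a documented method could look like, here is a minimal, self-contained sketch. The body of `shouldInvalidateDecommissionedRedundancy` is not shown in the diff above, so the condition used here (live replicas alone meet the file's redundancy) is an assumption based on the code comment. `Replica`, `AdminState`, and `replicasToInvalidate` are hypothetical stand-ins, not the real HDFS classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DecommissionInvalidationSketch {

  /** Hypothetical, simplified admin states of the DataNode holding a replica. */
  enum AdminState { NORMAL, DECOMMISSION_INPROGRESS, DECOMMISSIONED }

  /** Hypothetical stand-in for one stored replica of a block. */
  static final class Replica {
    final String datanode;
    final AdminState state;

    Replica(String datanode, AdminState state) {
      this.datanode = datanode;
      this.state = state;
    }
  }

  /**
   * Replicas on decommissioning or decommissioned DataNodes are not counted
   * as live, so they normally stay in NameNode memory until the node is shut
   * down, and removing them all at once requires holding the write lock.
   * Assumed rule: once live replicas alone satisfy the file's redundancy,
   * those extra replicas can be invalidated gradually instead.
   */
  static boolean shouldInvalidateDecommissionedRedundancy(
      int liveReplicas, int fileRedundancy) {
    return liveReplicas >= fileRedundancy;
  }

  /** Returns the DataNodes whose replica of the block would be invalidated. */
  static List<String> replicasToInvalidate(List<Replica> replicas, int fileRedundancy) {
    int live = 0;
    for (Replica r : replicas) {
      if (r.state == AdminState.NORMAL) {
        live++;
      }
    }
    List<String> toInvalidate = new ArrayList<>();
    if (!shouldInvalidateDecommissionedRedundancy(live, fileRedundancy)) {
      return toInvalidate; // not enough live copies yet; keep everything
    }
    for (Replica r : replicas) {
      if (r.state != AdminState.NORMAL) {
        toInvalidate.add(r.datanode);
      }
    }
    return toInvalidate;
  }

  public static void main(String[] args) {
    List<Replica> replicas = Arrays.asList(
        new Replica("dn1", AdminState.NORMAL),
        new Replica("dn2", AdminState.NORMAL),
        new Replica("dn3", AdminState.DECOMMISSION_INPROGRESS));
    System.out.println(replicasToInvalidate(replicas, 2));
    System.out.println(replicasToInvalidate(replicas, 3));
  }
}
```

In this sketch, a redundancy of 2 is already met by the two live replicas, so only the copy on the decommissioning node is selected; with a redundancy of 3, nothing is invalidated.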




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 501332)
    Time Spent: 20m  (was: 10m)

> Invalidate block on decommissioning DataNode after replication
> --------------------------------------------------------------
>
>                 Key: HDFS-15634
>                 URL: https://issues.apache.org/jira/browse/HDFS-15634
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>            Reporter: Fengnan Li
>            Assignee: Fengnan Li
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Right now, when a DataNode starts decommissioning, the NameNode marks it as 
> decommissioning and its blocks are replicated to other DataNodes; the node 
> is then marked as decommissioned. The blocks on it are left untouched, since 
> they are not counted as live replicas.
> Proposal: Invalidate these blocks once they are replicated and there are 
> enough live replicas in the cluster.
> Reason: A recent shutdown of decommissioned DataNodes to finish the flow 
> caused a NameNode latency spike, since the NameNode needs to remove all of 
> the blocks from its memory and this step requires holding the write lock. If 
> we had gradually invalidated these blocks, the deletion would have been much 
> easier and faster.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

