[ 
https://issues.apache.org/jira/browse/HDFS-16438?focusedWorklogId=715695&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-715695
 ]

ASF GitHub Bot logged work on HDFS-16438:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 26/Jan/22 13:58
            Start Date: 26/Jan/22 13:58
    Worklog Time Spent: 10m 
      Work Description: sodonnel commented on pull request #3928:
URL: https://github.com/apache/hadoop/pull/3928#issuecomment-1022221913


   I am not sure if it is safe to drop the read lock while iterating over the 
blocks in a StorageInfo as the blocks could change when you drop the lock.
   
   Looking at the code in DatanodeStorageInfo, we can add a block to the list, 
which always adds to the head of the list. The worst case here is that we don't 
consider some block, but it would be caught by the rescan at the end, so that 
is not a big problem.
   
   However, we can also remove a block. Let's say we stop iterating at blockX 
and drop the lock.
   
   Then an IBR (incremental block report) comes in and gets processed, removing 
that block from the storage.
   
   The iterator will be holding a reference to that blockInfo object as 
`current`, and it is expected to call next on it to get the next BlockInfo.
   
   The remove block code in BlockInfo.java looks like:
   
   ```  
   BlockInfo listRemove(BlockInfo head, DatanodeStorageInfo storage) {
       if (head == null) {
         return null;
       }
       int dnIndex = this.findStorageInfo(storage);
       if (dnIndex < 0) { // this block is not on the data-node list
         return head;
       }
   
       BlockInfo next = this.getNext(dnIndex);
       BlockInfo prev = this.getPrevious(dnIndex);
       this.setNext(dnIndex, null);
       this.setPrevious(dnIndex, null);
       if (prev != null) {
         prev.setNext(prev.findStorageInfo(storage), next);
       }
       if (next != null) {
         next.setPrevious(next.findStorageInfo(storage), prev);
       }
       if (this == head) { // removing the head
         head = next;
       }
       return head;
     }
   ```
   
   It basically unlinks the block from the linked list and then sets the block's 
own prev and next pointers to null. Then the iterator calls `getNext` on it:
   
   ```
     BlockInfo getNext(int index) {
       assert this.triplets != null : "BlockInfo is not initialized";
       assert index >= 0 && index * 3 + 2 < triplets.length : "Index is out of bound";
       BlockInfo info = (BlockInfo)triplets[index * 3 + 2];
       assert info == null || info.getClass().getName().startsWith(
           BlockInfo.class.getName()) :
           "BlockInfo is expected at " + (index * 3 + 2);
       return info;
     }
   ```
   
   In the case where the current blockInfo has a null next, the iterator will 
get null, think it has reached the end of the list, and exit early by mistake.
   
   This would be a rare bug to hit, as you would need to drop the lock and then 
the next block to consume from the list would need to be removed while the lock 
is dropped, but it is one of those things that will happen from time to time 
and be very hard to explain.
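   
   To make the failure mode concrete, here is a minimal, self-contained sketch 
(toy classes, not the real `BlockInfo`/`DatanodeStorageInfo`) that models the 
unlink-and-null-out behaviour of `listRemove` and shows the iteration stopping 
early once the block it is parked on is removed:
   
   ```
   import java.util.ArrayList;
   import java.util.List;
   
   // Toy model of the per-storage doubly-linked block list.
   public class EarlyExitDemo {
   
     static class Block {
       final String name;
       Block prev, next;
       Block(String name) { this.name = name; }
     }
   
     // Mirrors the shape of BlockInfo.listRemove: unlink 'b' and null out its own links.
     static Block listRemove(Block head, Block b) {
       Block next = b.next;
       Block prev = b.prev;
       b.next = null;
       b.prev = null;
       if (prev != null) { prev.next = next; }
       if (next != null) { next.prev = prev; }
       return (b == head) ? next : head;
     }
   
     public static void main(String[] args) {
       // Build the list A -> B -> C -> D.
       Block a = new Block("A"), b = new Block("B"), c = new Block("C"), d = new Block("D");
       a.next = b; b.prev = a; b.next = c; c.prev = b; c.next = d; d.prev = c;
       Block head = a;
   
       List<String> visited = new ArrayList<>();
       Block current = head;
       while (current != null) {
         visited.add(current.name);
         if (current == b) {
           // "Lock dropped" here: a concurrent removal unlinks the block the
           // iterator is parked on, so its next pointer becomes null.
           head = listRemove(head, b);
         }
         current = current.next; // null for B after the removal -> early exit
       }
       System.out.println("Visited: " + visited); // prints [A, B]; C and D are never visited
     }
   }
   ```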
   
   5M blocks on a single storage is a lot - dropping and re-locking at the 
storage level was supposed to handle DNs with millions of blocks in total, with 
each individual storage holding only a small number.
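   
   For reference, the intended per-storage lock scoping looks roughly like the 
sketch below. This is a hedged sketch assembled from the discussion, not the 
actual DatanodeAdminBackoffMonitor code; the class and method names are the 
ones I believe apply, but treat them as assumptions:
   
   ```
   package org.apache.hadoop.hdfs.server.blockmanagement;
   
   import java.util.Iterator;
   
   import org.apache.hadoop.hdfs.server.namenode.FSNamesystem;
   
   // Sketch only: the read lock is acquired and released once per storage, so
   // the pause between storages only helps if no single storage holds millions
   // of blocks.
   class PerStorageScanSketch {
     static void scan(FSNamesystem namesystem, DatanodeDescriptor datanode) {
       for (DatanodeStorageInfo storage : datanode.getStorageInfos()) {
         namesystem.readLock();
         try {
           Iterator<BlockInfo> it = storage.getBlockIterator();
           while (it.hasNext()) {
             BlockInfo block = it.next();
             // examine 'block' while still holding the read lock ...
           }
         } finally {
           namesystem.readUnlock();
         }
         // Lock released here, so other work can make progress between storages.
       }
     }
   }
   ```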
   
   Does this pause occur once per storage at the start and end of decommission 
for each node?




Issue Time Tracking
-------------------

    Worklog Id:     (was: 715695)
    Time Spent: 1h 20m  (was: 1h 10m)

> Avoid holding read locks for a long time when scanDatanodeStorage
> -----------------------------------------------------------------
>
>                 Key: HDFS-16438
>                 URL: https://issues.apache.org/jira/browse/HDFS-16438
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: tomscut
>            Assignee: tomscut
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: image-2022-01-25-23-18-30-275.png
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> At the time of decommission, if we use {*}DatanodeAdminBackoffMonitor{*}, there 
> is a heavy operation: {*}scanDatanodeStorage{*}. If the number of blocks on a 
> storage is large (more than 5 million) and GC performance is also poor, it may 
> hold the *read lock* for a long time; we should optimize it.
> !image-2022-01-25-23-18-30-275.png|width=764,height=193!
> {code:java}
> 2021-12-22 07:49:01,279 INFO  namenode.FSNamesystem 
> (FSNamesystemLock.java:readUnlock(220)) - FSNamesystem scanDatanodeStorage 
> read lock held for 5491 ms via
> java.lang.Thread.getStackTrace(Thread.java:1552)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.readUnlock(FSNamesystemLock.java:222)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.readUnlock(FSNamesystem.java:1641)
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminBackoffMonitor.scanDatanodeStorage(DatanodeAdminBackoffMonitor.java:646)
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminBackoffMonitor.checkForCompletedNodes(DatanodeAdminBackoffMonitor.java:417)
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminBackoffMonitor.check(DatanodeAdminBackoffMonitor.java:300)
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminBackoffMonitor.run(DatanodeAdminBackoffMonitor.java:201)
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> java.lang.Thread.run(Thread.java:745)
>     Number of suppressed read-lock reports: 0
>     Longest read-lock held interval: 5491 {code}


