[ 
https://issues.apache.org/jira/browse/HDFS-16583?focusedWorklogId=776551&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776551
 ]

ASF GitHub Bot logged work on HDFS-16583:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 31/May/22 20:27
            Start Date: 31/May/22 20:27
    Worklog Time Spent: 10m 
      Work Description: sodonnel opened a new pull request, #4394:
URL: https://github.com/apache/hadoop/pull/4394

   ### Description of PR
   
   Avoid concurrently modifying the outOfServiceNodeBlocks map in the
DatanodeAdminDefaultMonitor when ending maintenance. The concurrent modification
can, in some circumstances, cause the monitor to get stuck in an infinite loop
while holding the NN write lock.
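   As a rough, illustrative sketch of the pattern (not the actual patch; the
   class and method names below are made up for this example), the idea is to
   defer the map removal until the iteration has finished:
   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.Map;
   import java.util.TreeMap;

   // Sketch only: mirrors the "collect, then remove" pattern, not Hadoop's classes.
   class DeferredRemovalSketch {
     private final TreeMap<String, List<Long>> outOfServiceNodeBlocks = new TreeMap<>();

     void check() {
       List<String> toRemove = new ArrayList<>();
       for (Map.Entry<String, List<Long>> entry : outOfServiceNodeBlocks.entrySet()) {
         if (maintenanceExpired(entry.getKey())) {
           // Do not remove from outOfServiceNodeBlocks here: that is a structural
           // modification under a live iterator. Record the key and remove later.
           toRemove.add(entry.getKey());
         }
       }
       // Safe: the iterator is no longer in use when the map is modified.
       toRemove.forEach(outOfServiceNodeBlocks::remove);
     }

     private boolean maintenanceExpired(String dn) {
       return false; // placeholder for the real expiry check
     }
   }
   ```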
   
   The issue does not affect the DatanodeAdminBackoffMonitor, as it is 
implemented slightly differently.
   
   More details in the Jira https://issues.apache.org/jira/browse/HDFS-16583
   
   This is a backport for branch-3.2 as the newer branches have diverged 
significantly.
   
   ### How was this patch tested?
   
   Existing tests cover it.
   
   




Issue Time Tracking
-------------------

    Worklog Id:     (was: 776551)
    Time Spent: 1.5h  (was: 1h 20m)

> DatanodeAdminDefaultMonitor can get stuck in an infinite loop
> -------------------------------------------------------------
>
>                 Key: HDFS-16583
>                 URL: https://issues.apache.org/jira/browse/HDFS-16583
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0, 3.3.4
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> We encountered a case where the decommission monitor in the namenode got 
> stuck for about 6 hours. The logs give:
> {code}
> 2022-05-15 01:09:25,490 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager: Stopping 
> maintenance of dead node 10.185.3.132:50010
> 2022-05-15 01:10:20,918 INFO org.apache.hadoop.http.HttpServer2: Process 
> Thread Dump: jsp requested
> <Thread dump>
> 2022-05-15 01:19:06,810 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: 
> PendingReconstructionMonitor timed out blk_4501753665_3428271426
> 2022-05-15 01:19:06,810 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: 
> PendingReconstructionMonitor timed out blk_4501753659_3428271420
> 2022-05-15 01:19:06,810 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: 
> PendingReconstructionMonitor timed out blk_4501753662_3428271423
> 2022-05-15 01:19:06,810 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: 
> PendingReconstructionMonitor timed out blk_4501753663_3428271424
> 2022-05-15 06:00:57,281 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager: Stopping 
> maintenance of dead node 10.185.3.34:50010
> 2022-05-15 06:00:58,105 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem write lock 
> held for 17492614 ms via
> java.lang.Thread.getStackTrace(Thread.java:1559)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:263)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:220)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1601)
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager$Monitor.run(DatanodeAdminManager.java:496)
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> java.lang.Thread.run(Thread.java:748)
>       Number of suppressed write-lock reports: 0
>       Longest write-lock held interval: 17492614
> {code}
> We have only the one thread dump, triggered by the FC:
> {code}
> Thread 80 (DatanodeAdminMonitor-0):
>   State: RUNNABLE
>   Blocked count: 16
>   Waited count: 453693
>   Stack:
>     
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager$Monitor.check(DatanodeAdminManager.java:538)
>     
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager$Monitor.run(DatanodeAdminManager.java:494)
>     java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>     
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>     
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>     
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     java.lang.Thread.run(Thread.java:748)
> {code}
> This was the line of code:
> {code}
>     private void check() {
>       final Iterator<Map.Entry<DatanodeDescriptor, AbstractList<BlockInfo>>>
>           it = new CyclicIteration<>(outOfServiceNodeBlocks,
>               iterkey).iterator();
>       final LinkedList<DatanodeDescriptor> toRemove = new LinkedList<>();
>       while (it.hasNext() && !exceededNumBlocksPerCheck() && namesystem
>           .isRunning()) {
>         numNodesChecked++;
>         final Map.Entry<DatanodeDescriptor, AbstractList<BlockInfo>>
>             entry = it.next();
>         final DatanodeDescriptor dn = entry.getKey();
>         AbstractList<BlockInfo> blocks = entry.getValue();
>         boolean fullScan = false;
>         if (dn.isMaintenance() && dn.maintenanceExpired()) {
>           // If maintenance expires, stop tracking it.
>           stopMaintenance(dn);
>           toRemove.add(dn);
>           continue;
>         }
>         if (dn.isInMaintenance()) {
>           // The dn is IN_MAINTENANCE and the maintenance hasn't expired yet.
>           continue;  //// >>> This line
>         }
> {code}
> With only one sample we cannot say for certain whether it is stuck in an 
> infinite loop, but I suspect it is.
> The problem is twofold:
> 1) When we call stopMaintenance(dn), which we must have done since it logged 
> "Stopping maintenance of dead node", the code looks like:
> {code}
>        if (dn.isMaintenance() && dn.maintenanceExpired()) {
>           // If maintenance expires, stop tracking it.
>           stopMaintenance(dn);
>           toRemove.add(dn);
>           continue;
>         }
> {code}
> Notice we add the DN to the toRemove list - this is to avoid concurrent 
> modification of the outOfServiceNodeBlocks map. However, inside 
> stopMaintenance we see:
> {code}
>       pendingNodes.remove(node);
>       outOfServiceNodeBlocks.remove(node);
> {code}      
> This should cause a ConcurrentModificationException, which I think will 
> normally leave the decommission monitor broken until the NN is restarted 
> (again, I am not sure).
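> As a minimal illustration of point 1 (not Hadoop code; the class name below is 
> made up for this example), removing a key from a TreeMap behind the back of an 
> active entry-set iterator fails on the next call to next():
> {code}
> import java.util.Iterator;
> import java.util.Map;
> import java.util.TreeMap;
>
> public class TreeMapCmeDemo {
>   public static void main(String[] args) {
>     TreeMap<Integer, Integer> map = new TreeMap<>();
>     map.put(0, 0);
>     map.put(2, 2);
>     map.put(4, 4);
>
>     Iterator<Map.Entry<Integer, Integer>> it = map.entrySet().iterator();
>     it.next();     // returns the entry for key 0
>     map.remove(2); // structural modification outside the iterator
>     it.next();     // throws ConcurrentModificationException here
>   }
> }
> {code}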
> 2) In some cases a ConcurrentModificationException is not thrown, because the 
> iterator is a CyclicIterator. If the DN removed from maintenance is the one 
> after the iterator start key, and it is the second-to-last element in the 
> navigable map, I believe the CyclicIterator will go into an infinite loop. No 
> ConcurrentModificationException is thrown, because the underlying TreeMap 
> iterator has no next element left, so a new TreeMap iterator is created, and 
> the "stop key" for the cyclic iterator is never found in the map, as it has 
> now been removed, so the iteration just cycles forever. I think that is what 
> happened here.
> This test in TestCyclicIteration proves the infinite loop is possible:
> {code}
>   @Test
>   public void testInfiniteLoop() {
>     final NavigableMap<Integer, Integer> map = new TreeMap<Integer, Integer>();
>     map.put(0, 0);
>     map.put(2, 2);
>     map.put(4, 4);
>     map.put(6, 6);
>     System.out.println("map=" + map);
>     // Create the iterator to start at 2, which means the first element
>     // returned should be 4.
>     // Inside the TreeMap iterator, next will already be set to 6
>     // after 4 is returned.
>     final Iterator<Map.Entry<Integer, Integer>> it =
>         new CyclicIteration<>(map, 2).iterator();
>     int i = 0;
>     while (it.hasNext()) {
>       Map.Entry<Integer, Integer> e = it.next();
>       if (i == 0) {
>         map.remove(4);
>       }
>       System.out.println("Iteration #" + i++);
>     }
>   }
> {code}
> The root cause of the decommission monitor problem is the concurrent 
> modification of the outOfServiceNodeBlocks map - so we should avoid that.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
