[ 
https://issues.apache.org/jira/browse/HDFS-16583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-16583.
------------------------------------
    Resolution: Fixed

> DatanodeAdminDefaultMonitor can get stuck in an infinite loop
> -------------------------------------------------------------
>
>                 Key: HDFS-16583
>                 URL: https://issues.apache.org/jira/browse/HDFS-16583
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0, 3.3.4
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> We encountered a case where the decommission monitor in the namenode got 
> stuck for about 6 hours. The logs give:
> {code}
> 2022-05-15 01:09:25,490 INFO org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager: Stopping maintenance of dead node 10.185.3.132:50010
> 2022-05-15 01:10:20,918 INFO org.apache.hadoop.http.HttpServer2: Process Thread Dump: jsp requested
> <Thread dump>
> 2022-05-15 01:19:06,810 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: PendingReconstructionMonitor timed out blk_4501753665_3428271426
> 2022-05-15 01:19:06,810 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: PendingReconstructionMonitor timed out blk_4501753659_3428271420
> 2022-05-15 01:19:06,810 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: PendingReconstructionMonitor timed out blk_4501753662_3428271423
> 2022-05-15 01:19:06,810 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: PendingReconstructionMonitor timed out blk_4501753663_3428271424
> 2022-05-15 06:00:57,281 INFO org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager: Stopping maintenance of dead node 10.185.3.34:50010
> 2022-05-15 06:00:58,105 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem write lock held for 17492614 ms via
> java.lang.Thread.getStackTrace(Thread.java:1559)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:263)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:220)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1601)
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager$Monitor.run(DatanodeAdminManager.java:496)
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> java.lang.Thread.run(Thread.java:748)
>       Number of suppressed write-lock reports: 0
>       Longest write-lock held interval: 17492614
> {code}
> We only have the one thread dump triggered by the FC:
> {code}
> Thread 80 (DatanodeAdminMonitor-0):
>   State: RUNNABLE
>   Blocked count: 16
>   Waited count: 453693
>   Stack:
>     org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager$Monitor.check(DatanodeAdminManager.java:538)
>     org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager$Monitor.run(DatanodeAdminManager.java:494)
>     java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>     java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>     java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>     java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     java.lang.Thread.run(Thread.java:748)
> {code}
> This was the line of code:
> {code}
>     private void check() {
>       final Iterator<Map.Entry<DatanodeDescriptor, AbstractList<BlockInfo>>>
>           it = new CyclicIteration<>(outOfServiceNodeBlocks,
>               iterkey).iterator();
>       final LinkedList<DatanodeDescriptor> toRemove = new LinkedList<>();
>       while (it.hasNext() && !exceededNumBlocksPerCheck() && namesystem
>           .isRunning()) {
>         numNodesChecked++;
>         final Map.Entry<DatanodeDescriptor, AbstractList<BlockInfo>>
>             entry = it.next();
>         final DatanodeDescriptor dn = entry.getKey();
>         AbstractList<BlockInfo> blocks = entry.getValue();
>         boolean fullScan = false;
>         if (dn.isMaintenance() && dn.maintenanceExpired()) {
>           // If maintenance expires, stop tracking it.
>           stopMaintenance(dn);
>           toRemove.add(dn);
>           continue;
>         }
>         if (dn.isInMaintenance()) {
>           // The dn is IN_MAINTENANCE and the maintenance hasn't expired yet.
>           continue;  //// >>> This line
>         }
> {code}
> With only one sample we cannot say for certain that it is stuck in an 
> infinite loop, but I suspect it is.
> The problem is twofold:
> 1) When we call stopMaintenance(dn), which we must have done because the 
> "Stopping maintenance of dead node" message was logged, the code looks like:
> {code}
>        if (dn.isMaintenance() && dn.maintenanceExpired()) {
>           // If maintenance expires, stop tracking it.
>           stopMaintenance(dn);
>           toRemove.add(dn);
>           continue;
>         }
> {code}
> Notice we add the DN to the toRemove list; this is to avoid concurrent 
> modification of the outOfServiceNodeBlocks map. However, inside 
> stopMaintenance we see:
> {code}
>       pendingNodes.remove(node);
>       outOfServiceNodeBlocks.remove(node);
> {code}
> This should cause a ConcurrentModificationException, which I think will 
> normally leave the decommission monitor broken until the NN is restarted 
> (again I am not sure).
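> For illustration only (a self-contained sketch, not HDFS code): removing an 
> entry from a TreeMap while a plain iterator over it is live fails fast on the 
> next iterator step:
> {code}
> import java.util.Map;
> import java.util.TreeMap;
> 
> public class FailFastDemo {
>   public static void main(String[] args) {
>     TreeMap<Integer, Integer> map = new TreeMap<>();
>     map.put(0, 0);
>     map.put(2, 2);
>     map.put(4, 4);
>     // Iterate the entry set directly (no CyclicIteration involved).
>     for (Map.Entry<Integer, Integer> e : map.entrySet()) {
>       // Structural modification of the backing map while a plain TreeMap
>       // iterator is live: the next iterator step throws
>       // java.util.ConcurrentModificationException.
>       map.remove(4);
>     }
>   }
> }
> {code}
> The CyclicIteration case in point 2) below is different precisely because it 
> can swap in a brand-new TreeMap iterator and so dodge this fail-fast check.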
> 2) In some cases a ConcurrentModificationException is not thrown, because the 
> iterator is a CyclicIterator. If the DN removed from maintenance is the one 
> just after the iterator start key, and it is the second-last element in the 
> navigable map, I believe the CyclicIterator will go into an infinite loop. No 
> ConcurrentModificationException gets thrown because, when the underlying 
> TreeMap iterator has no next element, a new TreeMap iterator is created, and 
> the "stop key" for the cyclic iterator is never found in the map (it has been 
> removed), so the iteration just cycles forever. I think that is what happened 
> here.
> This test in TestCyclicIteration proves the infinite loop is possible:
> {code}
>   @Test
>   public void testInfiniteLoop() {
>     final NavigableMap<Integer, Integer> map = new TreeMap<Integer, Integer>();
>     map.put(0, 0);
>     map.put(2, 2);
>     map.put(4, 4);
>     map.put(6, 6);
>     System.out.println("map=" + map);
>     // Create iterator to start at 2, which means the first element returned
>     // should be 4.
>     // Inside the treemap iterator, next will already be set to 6
>     // after 4 is returned.
>     final Iterator<Map.Entry<Integer, Integer>>
>         it =  new CyclicIteration<>(map, 2).iterator();
>     int i = 0;
>     while(it.hasNext()) {
>       Map.Entry<Integer, Integer> e = it.next();
>       if (i == 0) {
>         map.remove(4);
>       }
>       System.out.println("Iteration #" + i++);
>     }
>   }
> {code}
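> Why the loop never terminates follows from the cyclic-iteration pattern 
> itself. Roughly (a simplified sketch of that pattern, illustrative only, not 
> the actual Hadoop CyclicIteration source): the iterator remembers the first 
> entry it returned as its stop marker, and when the underlying iterator runs 
> out it silently starts a fresh one over the whole map. The fresh iterator has 
> a new modCount baseline, so the earlier removal is never detected, and once 
> the remembered stop entry has been removed the stop condition can never be met.
> {code}
> import java.util.Iterator;
> import java.util.Map;
> import java.util.NavigableMap;
> 
> // Simplified sketch of the cyclic-iteration pattern (illustrative only).
> class SimpleCyclicIterator<K, V> implements Iterator<Map.Entry<K, V>> {
>   private final NavigableMap<K, V> map;
>   private Iterator<Map.Entry<K, V>> it;
>   private final Map.Entry<K, V> first; // the "stop" entry
>   private Map.Entry<K, V> next;
>   private boolean hasNext;
> 
>   SimpleCyclicIterator(NavigableMap<K, V> map, K startKey) {
>     this.map = map;
>     this.it = map.tailMap(startKey, false).entrySet().iterator();
>     this.hasNext = !map.isEmpty();
>     this.first = hasNext ? fetch() : null;
>     this.next = first;
>   }
> 
>   private Map.Entry<K, V> fetch() {
>     if (!it.hasNext()) {
>       // Wrap around with a brand-new iterator: no fail-fast check against
>       // modifications made before this point.
>       it = map.entrySet().iterator();
>     }
>     return it.next();
>   }
> 
>   @Override
>   public boolean hasNext() {
>     return hasNext;
>   }
> 
>   @Override
>   public Map.Entry<K, V> next() {
>     Map.Entry<K, V> curr = next;
>     next = fetch();
>     // Termination depends on meeting "first" again. If the caller has removed
>     // that entry from the map, this never becomes false.
>     hasNext = !next.equals(first);
>     return curr;
>   }
> }
> {code}
> With the map and start key from the test above, the remembered stop entry is 
> (4, 4); once it is removed, the wrap-around iterator keeps yielding 0, 2 and 6 
> and the stop condition is never hit.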
> The root cause of the decommission monitor problem is the concurrent 
> modification of outOfServiceNodeBlocks, so we should avoid that.
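> One way to avoid it (an illustrative sketch only, not necessarily the 
> committed patch) is to make sure nothing removes entries from 
> outOfServiceNodeBlocks while the CyclicIteration is in progress, relying 
> instead on the deferred toRemove list that check() already builds and doing 
> the removals only after the loop:
> {code}
> // Illustrative sketch only; names mirror the check() snippet above, this is
> // not necessarily the committed HDFS patch.
> final LinkedList<DatanodeDescriptor> toRemove = new LinkedList<>();
> while (it.hasNext() && !exceededNumBlocksPerCheck() && namesystem.isRunning()) {
>   final Map.Entry<DatanodeDescriptor, AbstractList<BlockInfo>> entry = it.next();
>   final DatanodeDescriptor dn = entry.getKey();
>   if (dn.isMaintenance() && dn.maintenanceExpired()) {
>     stopMaintenance(dn);   // must not touch outOfServiceNodeBlocks here
>     toRemove.add(dn);      // defer the removal instead
>     continue;
>   }
>   // ... remaining per-node checks ...
> }
> // Safe now: the CyclicIteration over outOfServiceNodeBlocks has finished.
> for (DatanodeDescriptor dn : toRemove) {
>   outOfServiceNodeBlocks.remove(dn);
> }
> {code}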


