[ https://issues.apache.org/jira/browse/HDFS-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17375421#comment-17375421 ]
Daniel Ma edited comment on HDFS-15796 at 7/6/21, 9:57 AM:
-----------------------------------------------------------

[~sodonnell] Thanks for reviewing. Actually, you missed the for loop here:

{code:java}
// code placeholder
synchronized (pendingReconstruction) {
  List<DatanodeStorageInfo> targets = pendingReconstruction
      .getTargets(rw.getBlock());
  if (targets != null) {
    for (DatanodeStorageInfo dn : targets) {
      if (!excludedNodes.contains(dn.getDatanodeDescriptor())) {
        excludedNodes.add(dn.getDatanodeDescriptor());
      }
    }
  }
}
{code}

The problem happens when the code above tries to traverse the DataNodes stored in the pendingReconstruction object while the DataNode list is also being modified elsewhere.
In other words, if you modify a List (add or delete an element) while iterating over it, a ConcurrentModificationException is thrown.
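To make the failure mode described above concrete, here is a minimal, self-contained sketch. It is not HDFS code: the class name CmeSketch, both method names, and the "dn1".."dn4" strings are hypothetical stand-ins, and the modification is done from the same thread only so that the result is deterministic. It shows that a structural change to an ArrayList during a for-each loop trips the iterator's fail-fast check, and that iterating over a copy does not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

// Hypothetical illustration only; none of these names come from HDFS.
class CmeSketch {

    // Iterates a live list and adds an element mid-loop.
    // Returns true if the iterator reports the concurrent modification.
    public static boolean iterateWhileModifying() {
        List<String> targets = new ArrayList<>(Arrays.asList("dn1", "dn2", "dn3"));
        try {
            for (String dn : targets) {
                if (dn.equals("dn1")) {
                    // Structural modification while an iterator is active.
                    targets.add("dn4");
                }
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    // Iterates a snapshot copy instead of the live list.
    // Returns the number of elements visited.
    public static int iterateOverSnapshot() {
        List<String> targets = new ArrayList<>(Arrays.asList("dn1", "dn2", "dn3"));
        List<String> snapshot = new ArrayList<>(targets);
        int visited = 0;
        for (String dn : snapshot) {
            if (dn.equals("dn1")) {
                // The live list changes, but the snapshot being iterated does not.
                targets.add("dn4");
            }
            visited++;
        }
        return visited;
    }

    public static void main(String[] args) {
        System.out.println("CME on live list: " + iterateWhileModifying());
        System.out.println("Elements visited via snapshot: " + iterateOverSnapshot());
    }
}
```

In a multi-threaded setting the copy only helps if it is taken while holding the same lock that the modifying code holds; otherwise the copy itself can race with a writer. Whether that applies to the list returned by getTargets() depends on how that list is shared, which this sketch does not model.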
> ConcurrentModificationException error happens on NameNode occasionally
> ----------------------------------------------------------------------
>
>                 Key: HDFS-15796
>                 URL: https://issues.apache.org/jira/browse/HDFS-15796
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 3.1.1
>            Reporter: Daniel Ma
>            Priority: Critical
>         Attachments: 0001-HDFS-15796.patch
>
>
> ConcurrentModificationException error happens on NameNode occasionally.
>
> {code:java}
> 2021-01-23 20:21:18,107 | ERROR | RedundancyMonitor | RedundancyMonitor thread received Runtime exception. | BlockManager.java:4746
> java.util.ConcurrentModificationException
>   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909)
>   at java.util.ArrayList$Itr.next(ArrayList.java:859)
>   at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1907)
>   at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1859)
>   at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4862)
>   at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:4729)
>   at java.lang.Thread.run(Thread.java:748)
> {code}

--
This message was sent by Atlassian Jira (v8.3.4#803005)