[ https://issues.apache.org/jira/browse/HDFS-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17370995#comment-17370995 ]
Daniel Ma edited comment on HDFS-15796 at 6/29/21, 2:33 AM:
------------------------------------------------------------

[~weichiu] I have no idea what conditions reproduce this problem. It seems the targets object is modified elsewhere while computeReconstructionWorkForBlocks is in progress.

{quote}
// Step 2: choose target nodes for each reconstruction task
for (BlockReconstructionWork rw : reconWork) {
  // Exclude all of the containing nodes from being targets.
  // This list includes decommissioning or corrupt nodes.
  final Set<Node> excludedNodes = new HashSet<>(rw.getContainingNodes());

  List<DatanodeStorageInfo> targets = pendingReconstruction
      .getTargets(rw.getBlock());
  if (targets != null) {
    for (DatanodeStorageInfo dn : targets) {
      if (!excludedNodes.contains(dn.getDatanodeDescriptor())) {
        excludedNodes.add(dn.getDatanodeDescriptor());
      }
    }
  }

  // choose replication targets: NOT HOLDING THE GLOBAL LOCK
  final BlockPlacementPolicy placementPolicy =
      placementPolicies.getPolicy(rw.getBlock().getBlockType());
  rw.chooseTargets(placementPolicy, storagePolicySuite, excludedNodes);
}
{quote}

> ConcurrentModificationException error happens on NameNode occasionally
> ----------------------------------------------------------------------
>
> Key: HDFS-15796
> URL: https://issues.apache.org/jira/browse/HDFS-15796
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs
> Affects Versions: 3.1.1
> Reporter: Daniel Ma
> Priority: Critical
>
> ConcurrentModificationException error happens on NameNode occasionally.
>
> {code:java}
> 2021-01-23 20:21:18,107 | ERROR | RedundancyMonitor | RedundancyMonitor thread received Runtime exception. | BlockManager.java:4746
> java.util.ConcurrentModificationException
>     at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909)
>     at java.util.ArrayList$Itr.next(ArrayList.java:859)
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1907)
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1859)
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4862)
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:4729)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
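For readers unfamiliar with the failure mode in the stack trace, the following is a minimal, single-threaded sketch of how an ArrayList iterator fails fast when the list is structurally modified during a for-each loop, and how iterating over a copy avoids it. It is not HDFS code and not necessarily the fix adopted for this issue. The class name TargetsIterationSketch, the methods iterateLiveList and iterateSnapshot, and the "dn1".."dn4" strings are hypothetical stand-ins. The in-loop add() only simulates, deterministically, a modification that in the NameNode would presumably come from another thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

public class TargetsIterationSketch {

  // Iterates the caller's list directly and mutates it midway, standing in
  // for "something else changed the targets list during iteration".
  // Returns true if the fail-fast iterator threw.
  static boolean iterateLiveList(List<String> targets) {
    try {
      for (String dn : targets) {
        if (dn.equals("dn1")) {
          targets.add("dn4"); // structural modification during iteration
        }
      }
      return false;
    } catch (ConcurrentModificationException e) {
      return true;
    }
  }

  // Iterates over a snapshot taken up front, so later changes to the
  // original list cannot invalidate the iterator. Returns the number of
  // elements visited.
  static int iterateSnapshot(List<String> targets) {
    List<String> snapshot = new ArrayList<>(targets);
    int visited = 0;
    for (String dn : snapshot) {
      if (dn.equals("dn1")) {
        targets.add("dn4"); // modifies the original, not the snapshot
      }
      visited++;
    }
    return visited;
  }

  public static void main(String[] args) {
    List<String> a = new ArrayList<>(Arrays.asList("dn1", "dn2", "dn3"));
    System.out.println("live list threw CME: " + iterateLiveList(a));

    List<String> b = new ArrayList<>(Arrays.asList("dn1", "dn2", "dn3"));
    System.out.println("snapshot visited: " + iterateSnapshot(b));
  }
}
```

In a genuinely multi-threaded setting the copy is only safe if it is taken while holding whatever lock guards the writers; copying an ArrayList that another thread is modifying at the same moment is itself a race.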