[ 
https://issues.apache.org/jira/browse/HDFS-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17370995#comment-17370995
 ] 

Daniel Ma edited comment on HDFS-15796 at 6/29/21, 2:33 AM:
------------------------------------------------------------

[~weichiu] I don't know what conditions reproduce this problem. It seems 
the targets object is modified elsewhere while 
computeReconstructionWorkForBlocks is in progress.
{quote}
// Step 2: choose target nodes for each reconstruction task
for (BlockReconstructionWork rw : reconWork) {
  // Exclude all of the containing nodes from being targets.
  // This list includes decommissioning or corrupt nodes.
  final Set<Node> excludedNodes = new HashSet<>(rw.getContainingNodes());
  List<DatanodeStorageInfo> targets = pendingReconstruction
      .getTargets(rw.getBlock());
  if (targets != null) {
    for (DatanodeStorageInfo dn : targets) {
      if (!excludedNodes.contains(dn.getDatanodeDescriptor())) {
        excludedNodes.add(dn.getDatanodeDescriptor());
      }
    }
  }

  // choose replication targets: NOT HOLDING THE GLOBAL LOCK
  final BlockPlacementPolicy placementPolicy =
      placementPolicies.getPolicy(rw.getBlock().getBlockType());
  rw.chooseTargets(placementPolicy, storagePolicySuite, excludedNodes);
}
{quote}
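
To illustrate the suspected failure mode, here is a minimal standalone sketch. It is not HDFS code: the class CmeSketch, its methods, and the "dn1".."dn4" strings are hypothetical stand-ins, and the second writer is simulated on the same thread so the result is deterministic. It shows that a for-each loop over an ArrayList fails in ArrayList$Itr.next() once the list is structurally modified mid-iteration, and that iterating over a copy does not. It is not a proposed patch. In real multi-threaded code the copy would itself have to be taken under whatever lock guards the writers, which is an assumption about the surrounding code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CmeSketch {

    // Iterates the live list while it is structurally modified mid-loop.
    // Returns true if ConcurrentModificationException was thrown.
    static boolean iterateLiveList() {
        List<String> targets = new ArrayList<>(Arrays.asList("dn1", "dn2", "dn3"));
        Set<String> excluded = new HashSet<>();
        try {
            for (String dn : targets) {
                excluded.add(dn);
                if (dn.equals("dn1")) {
                    // Stands in for another thread adding a target.
                    targets.add("dn4");
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Iterates a snapshot, so later changes to the live list are not seen
    // by the iterator. Returns the collected exclusion set.
    static Set<String> iterateSnapshot() {
        List<String> targets = new ArrayList<>(Arrays.asList("dn1", "dn2", "dn3"));
        List<String> snapshot = new ArrayList<>(targets);
        Set<String> excluded = new HashSet<>();
        for (String dn : snapshot) {
            excluded.add(dn);
            if (dn.equals("dn1")) {
                // Same simulated writer; it only touches the live list.
                targets.add("dn4");
            }
        }
        return excluded;
    }

    public static void main(String[] args) {
        System.out.println("live list threw CME: " + iterateLiveList());
        System.out.println("snapshot collected: " + iterateSnapshot().size() + " nodes");
    }
}
```

The snapshot variant trades one extra list allocation per block for an iterator that cannot observe later writes. Whether that is acceptable here, or whether getTargets should return a copy instead, is left open.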
 



> ConcurrentModificationException error happens on NameNode occasionally
> ----------------------------------------------------------------------
>
>                 Key: HDFS-15796
>                 URL: https://issues.apache.org/jira/browse/HDFS-15796
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 3.1.1
>            Reporter: Daniel Ma
>            Priority: Critical
>
> ConcurrentModificationException error happens on NameNode occasionally.
>  
> {code:java}
> 2021-01-23 20:21:18,107 | ERROR | RedundancyMonitor | RedundancyMonitor thread received Runtime exception.  | BlockManager.java:4746
> java.util.ConcurrentModificationException
>       at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909)
>       at java.util.ArrayList$Itr.next(ArrayList.java:859)
>       at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1907)
>       at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1859)
>       at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4862)
>       at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:4729)
>       at java.lang.Thread.run(Thread.java:748)
> {code}
>  
>  


