[ https://issues.apache.org/jira/browse/HDFS-6428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002045#comment-14002045 ]
Colin Patrick McCabe commented on HDFS-6428:
--------------------------------------------

bq. Do we know what else is modifying bpSlices and causing the CME? Hopefully we aren't masking another bug.

It seems like {{FsDatasetImpl#shutdownBlockPool}} calls {{FsVolumeList#removeBlockPool}}, which then calls {{FsVolumeImpl#shutdownBlockPool}}. That last function removes the entry from {{bpSlices}}.

The key to understanding this change is to realize that {{FsVolumeImpl#shutdownBlockPool}} is always called under the {{FsDatasetImpl}} lock, since the first function in the chain (inside {{FsDatasetImpl}}) is synchronized.

So to sum up: if we don't want to get CMEs, we need to add {{synchronized (dataset)}} blocks around the code that accesses {{bpSlices}}. It looks like this has been done in a few places, but not in others, and there is a TODO in the code to fix this.

Yongjun, can you put a {{synchronized (dataset)}} block inside {{FsVolumeImpl#shutdownBlockPool}}? Otherwise, it is not obvious that this must be done under the lock. Since Java monitors are re-entrant, this will work fine.

+1 once this is addressed.

> TestWebHdfsWithMultipleNameNodes failed with ConcurrentModificationException
> ----------------------------------------------------------------------------
>
>                 Key: HDFS-6428
>                 URL: https://issues.apache.org/jira/browse/HDFS-6428
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: webhdfs
>            Reporter: Yongjun Zhang
>            Assignee: Yongjun Zhang
>         Attachments: HDFS-6428.001.patch
>
>
> TestWebHdfsWithMultipleNameNodes failed as follows:
> {code}
> Running org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes
> Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 8.643 sec <<< FAILURE! - in org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes
> org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes  Time elapsed: 3.771 sec  <<< ERROR!
> java.util.ConcurrentModificationException: null
>       at java.util.HashMap$HashIterator.nextEntry(HashMap.java:894)
>       at java.util.HashMap$EntryIterator.next(HashMap.java:934)
>       at java.util.HashMap$EntryIterator.next(HashMap.java:932)
>       at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.shutdown(FsVolumeImpl.java:251)
>       at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.shutdown(FsVolumeList.java:249)
>       at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.shutdown(FsDatasetImpl.java:1389)
>       at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:1304)
>       at org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNodes(MiniDFSCluster.java:1555)
>       at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1530)
>       at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1514)
>       at org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes.shutdownCluster(TestWebHdfsWithMultipleNameNodes.java:99)
> {code}
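
For illustration, the locking pattern discussed in the comment on this issue can be sketched roughly as follows. This is a minimal sketch and not the HDFS code: {{DatasetSketch}} and {{VolumeSketch}} are hypothetical stand-ins for {{FsDatasetImpl}} and {{FsVolumeImpl}}, {{addBlockPool}} and {{blockPoolCount}} are invented helpers, and the {{String}} map values are placeholders for the real slice objects. It assumes, as the comment states, that the dataset object's monitor is the lock guarding {{bpSlices}}.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for FsDatasetImpl; its monitor guards bpSlices. */
class DatasetSketch {
  final VolumeSketch volume = new VolumeSketch(this);

  // Synchronized entry point at the head of the call chain.
  synchronized void shutdownBlockPool(String bpid) {
    volume.shutdownBlockPool(bpid);
  }
}

/** Hypothetical stand-in for FsVolumeImpl, which owns the bpSlices map. */
class VolumeSketch {
  private final DatasetSketch dataset;
  // Placeholder values; the real map holds per-block-pool slice objects.
  private final Map<String, String> bpSlices = new HashMap<>();

  VolumeSketch(DatasetSketch dataset) {
    this.dataset = dataset;
  }

  void addBlockPool(String bpid) {
    synchronized (dataset) {
      bpSlices.put(bpid, "slice-" + bpid);
    }
  }

  void shutdownBlockPool(String bpid) {
    // The caller already holds the dataset monitor. Taking it again is
    // harmless because Java monitors are re-entrant, and it makes the
    // locking requirement visible at the point of mutation.
    synchronized (dataset) {
      bpSlices.remove(bpid);
    }
  }

  /** Iterates over bpSlices under the same lock; returns entries visited. */
  int shutdown() {
    synchronized (dataset) {
      int visited = 0;
      for (Map.Entry<String, String> entry : bpSlices.entrySet()) {
        visited++; // the real code would shut down each slice here
      }
      return visited;
    }
  }

  int blockPoolCount() {
    synchronized (dataset) {
      return bpSlices.size();
    }
  }
}
```

Because both the removal and the iteration hold the same monitor, another thread cannot remove an entry from the map while {{shutdown}} is in the middle of its loop, which is the interleaving that makes {{HashMap}} iterators throw {{ConcurrentModificationException}}.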