[ https://issues.apache.org/jira/browse/HDFS-6428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002045#comment-14002045 ]
Colin Patrick McCabe commented on HDFS-6428:
--------------------------------------------
bq. Do we know what else is modifying bpSlices and causing the CME? Hopefully we aren't masking another bug.
It seems like {{FsDatasetImpl#shutdownBlockPool}} calls
{{FsVolumeList#removeBlockPool}}, which in turn calls
{{FsVolumeImpl#shutdownBlockPool}}. That last method removes the entry from
{{bpSlices}}. The key to understanding this change is realizing that
{{FsVolumeImpl#shutdownBlockPool}} always runs under the {{FsDatasetImpl}}
lock, because the first method in the chain ({{FsDatasetImpl#shutdownBlockPool}})
is synchronized.
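A minimal sketch of the call chain described above. {{Dataset}}, {{VolumeList}}, and {{Volume}} are hypothetical, heavily simplified stand-ins for {{FsDatasetImpl}}, {{FsVolumeList}}, and {{FsVolumeImpl}}; the real classes have different fields and signatures. The sketch only illustrates why the removal from {{bpSlices}} happens while the dataset monitor is held: the outermost method is {{synchronized}}, and {{Thread.holdsLock}} confirms the lock is still owned two calls further down.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simplified model of the shutdownBlockPool call chain.
// Names mirror the HDFS classes; fields and signatures are assumptions.
public class CallChainSketch {

  static class Dataset {                         // stands in for FsDatasetImpl
    final VolumeList volumes = new VolumeList(this);

    // synchronized: the Dataset monitor is held for the whole chain below.
    synchronized void shutdownBlockPool(String bpid) {
      volumes.removeBlockPool(bpid);
    }
  }

  static class VolumeList {                      // stands in for FsVolumeList
    final List<Volume> list = new ArrayList<>();

    VolumeList(Dataset dataset) {
      list.add(new Volume(dataset));
    }

    void removeBlockPool(String bpid) {          // not synchronized itself
      for (Volume v : list) {
        v.shutdownBlockPool(bpid);
      }
    }
  }

  static class Volume {                          // stands in for FsVolumeImpl
    final Dataset dataset;
    final Map<String, Object> bpSlices = new HashMap<>();
    boolean lockHeldDuringRemoval;

    Volume(Dataset dataset) { this.dataset = dataset; }

    void shutdownBlockPool(String bpid) {
      // Record whether the caller's Dataset monitor is held at this point.
      lockHeldDuringRemoval = Thread.holdsLock(dataset);
      bpSlices.remove(bpid);
    }
  }

  /** Returns true if the entry was removed while the Dataset lock was held. */
  static boolean lockHeldDuringRemoval() {
    Dataset ds = new Dataset();
    Volume v = ds.volumes.list.get(0);
    v.bpSlices.put("BP-1", new Object());
    ds.shutdownBlockPool("BP-1");
    return v.lockHeldDuringRemoval && v.bpSlices.isEmpty();
  }

  public static void main(String[] args) {
    System.out.println(lockHeldDuringRemoval()); // prints "true"
  }
}
```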
So, to sum up: to avoid CMEs, we need to add {{synchronized (dataset)}} blocks
around the code that accesses {{bpSlices}}. This has been done in a few places
but not in others, and there is a TODO in the code to fix it.
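A sketch of that pattern, under the assumption that a plain {{Object}} can stand in for the {{FsDatasetImpl}} monitor. The class and method names here ({{GuardedSlicesSketch}}, {{shutdownAll}}, {{race}}) are hypothetical. Both the iteration (the role {{FsVolumeImpl#shutdown}} plays in the stack trace) and the removal (the role of {{FsVolumeImpl#shutdownBlockPool}}) take the same lock, so the {{HashMap}} iterator can never observe a concurrent structural change.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: every access to bpSlices, reads and writes alike,
// is wrapped in synchronized (dataset), so iteration never overlaps removal.
public class GuardedSlicesSketch {

  private final Object dataset = new Object();   // stands in for the FsDatasetImpl lock
  private final Map<String, Integer> bpSlices = new HashMap<>();

  GuardedSlicesSketch(int pools) {
    for (int i = 0; i < pools; i++) {
      bpSlices.put("BP-" + i, i);
    }
  }

  // Plays the role of FsVolumeImpl#shutdown: iterates over bpSlices.
  int shutdownAll() {
    synchronized (dataset) {
      int sum = 0;
      for (Map.Entry<String, Integer> e : bpSlices.entrySet()) {
        sum += e.getValue();
      }
      return sum;
    }
  }

  // Plays the role of FsVolumeImpl#shutdownBlockPool: removes one entry.
  void shutdownBlockPool(String bpid) {
    synchronized (dataset) {
      bpSlices.remove(bpid);
    }
  }

  /** Runs iteration and removal concurrently; returns any exception the reader saw. */
  static Throwable race(int pools) throws InterruptedException {
    GuardedSlicesSketch vol = new GuardedSlicesSketch(pools);
    AtomicReference<Throwable> failure = new AtomicReference<>();
    Thread reader = new Thread(() -> {
      try {
        for (int i = 0; i < 2_000; i++) {
          vol.shutdownAll();
        }
      } catch (Throwable t) {
        failure.set(t);
      }
    });
    Thread remover = new Thread(() -> {
      for (int i = 0; i < pools; i++) {
        vol.shutdownBlockPool("BP-" + i);
      }
    });
    reader.start();
    remover.start();
    reader.join();
    remover.join();
    return failure.get();
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(race(1_000)); // prints "null": no ConcurrentModificationException
  }
}
```

Dropping either {{synchronized}} block makes a {{ConcurrentModificationException}} possible again, although a race like this is timing-dependent and will not necessarily reproduce on every run.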
Yongjun, can you put a {{synchronized (dataset)}} block inside
{{FsVolumeImpl#shutdownBlockPool}}? Otherwise, it is not obvious that this
method must be called under the lock. Since Java monitors are reentrant, this
will work fine.
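A small sketch of why the extra block is safe, again using hypothetical stand-ins ({{Dataset}} for {{FsDatasetImpl}}, {{Volume}} for {{FsVolumeImpl}}) rather than the real classes. The outer method already owns the dataset monitor when the inner {{synchronized (dataset)}} block runs; because Java intrinsic locks are reentrant, the same thread re-enters immediately instead of deadlocking.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the suggested change: the inner method takes the
// dataset monitor explicitly even though its caller already holds it.
public class ReentrantMonitorSketch {

  static class Dataset {                         // stands in for FsDatasetImpl
    final Volume volume = new Volume(this);

    synchronized void shutdownBlockPool(String bpid) {
      volume.shutdownBlockPool(bpid);            // Dataset monitor already held here
    }
  }

  static class Volume {                          // stands in for FsVolumeImpl
    final Dataset dataset;
    final Map<String, Object> bpSlices = new HashMap<>();

    Volume(Dataset dataset) { this.dataset = dataset; }

    void shutdownBlockPool(String bpid) {
      // The explicit block documents that the lock is required. If the caller
      // already owns the monitor, re-entering it succeeds immediately because
      // intrinsic locks are reentrant, so this cannot self-deadlock.
      synchronized (dataset) {
        bpSlices.remove(bpid);
      }
    }
  }

  /** Acquires the Dataset monitor twice on one thread and removes an entry. */
  static boolean removeUnderNestedLock() {
    Dataset ds = new Dataset();
    ds.volume.bpSlices.put("BP-1", new Object());
    ds.shutdownBlockPool("BP-1");                // outer + inner acquisition, same thread
    return ds.volume.bpSlices.isEmpty();
  }

  public static void main(String[] args) {
    System.out.println(removeUnderNestedLock()); // prints "true"
  }
}
```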
+1 once this is addressed.
> TestWebHdfsWithMultipleNameNodes failed with ConcurrentModificationException
> ----------------------------------------------------------------------------
>
> Key: HDFS-6428
> URL: https://issues.apache.org/jira/browse/HDFS-6428
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: webhdfs
> Reporter: Yongjun Zhang
> Assignee: Yongjun Zhang
> Attachments: HDFS-6428.001.patch
>
>
> TestWebHdfsWithMultipleNameNodes failed as follows:
> {code}
> Running org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes
> Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 8.643 sec <<< FAILURE! - in org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes
> org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes Time elapsed: 3.771 sec <<< ERROR!
> java.util.ConcurrentModificationException: null
> at java.util.HashMap$HashIterator.nextEntry(HashMap.java:894)
> at java.util.HashMap$EntryIterator.next(HashMap.java:934)
> at java.util.HashMap$EntryIterator.next(HashMap.java:932)
> at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.shutdown(FsVolumeImpl.java:251)
> at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.shutdown(FsVolumeList.java:249)
> at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.shutdown(FsDatasetImpl.java:1389)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:1304)
> at org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNodes(MiniDFSCluster.java:1555)
> at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1530)
> at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1514)
> at org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes.shutdownCluster(TestWebHdfsWithMultipleNameNodes.java:99)
> {code}
--
This message was sent by Atlassian JIRA
(v6.2#6252)