[ 
https://issues.apache.org/jira/browse/HDFS-6428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002045#comment-14002045
 ] 

Colin Patrick McCabe commented on HDFS-6428:
--------------------------------------------

bq. Do we know what else is modifying bpSlices and causing the CME? Hopefully 
we aren't masking another bug.

It seems like {{FsDatasetImpl#shutdownBlockPool}} calls 
{{FsVolumeList#removeBlockPool}}, which then calls 
{{FsVolumeImpl#shutdownBlockPool}}.  That last function removes the entry from 
{{bpSlices}}.  The key to understanding this change is to realize that 
{{FsVolumeImpl#shutdownBlockPool}} is always called under the {{FsDatasetImpl}} 
lock, since the first function in the chain (inside {{FsDatasetImpl}}) is 
synchronized.

So to sum up: if we don't want to get CMEs, we need to add 
{{synchronized (dataset)}} blocks around the code that accesses {{bpSlices}}.  
It looks like this has been done in a few places, but not in others, and there 
is a TODO in the code to fix this.
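
A minimal sketch of that locking pattern, not the actual Hadoop code: {{VolumeSketch}}, {{addBlockPool}} and the {{String}} map values are hypothetical stand-ins, and only the idea of guarding every {{bpSlices}} access with the same {{dataset}} monitor is taken from the discussion above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for FsVolumeImpl; only the locking pattern is the point.
class VolumeSketch {
    // Stands in for the FsDatasetImpl instance whose monitor guards bpSlices.
    private final Object dataset;
    private final Map<String, String> bpSlices = new HashMap<>();

    VolumeSketch(Object dataset) {
        this.dataset = dataset;
    }

    void addBlockPool(String bpid) {
        synchronized (dataset) {
            bpSlices.put(bpid, "slice-" + bpid);
        }
    }

    // Structural modification of the map, done under the dataset lock.
    void shutdownBlockPool(String bpid) {
        synchronized (dataset) {
            bpSlices.remove(bpid);
        }
    }

    // Iteration over the map. The lock is held for the whole loop, so no
    // other thread can remove an entry while the iterator is live.
    int shutdown() {
        int visited = 0;
        synchronized (dataset) {
            for (Map.Entry<String, String> entry : bpSlices.entrySet()) {
                visited++; // a real volume would shut down entry.getValue() here
            }
        }
        return visited;
    }

    public static void main(String[] args) throws InterruptedException {
        VolumeSketch vol = new VolumeSketch(new Object());
        for (int i = 0; i < 1000; i++) {
            vol.addBlockPool("BP-" + i);
        }
        Thread remover = new Thread(() -> {
            for (int i = 0; i < 1000; i++) {
                vol.shutdownBlockPool("BP-" + i);
            }
        });
        remover.start();
        for (int i = 0; i < 200; i++) {
            vol.shutdown(); // never throws ConcurrentModificationException
        }
        remover.join();
        System.out.println("entries left: " + vol.shutdown());
    }
}
```

If the {{synchronized (dataset)}} block is dropped from either the iterating method or the removing method, the same two threads can hit the {{ConcurrentModificationException}} from the stack trace.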

Yongjun, can you put a {{synchronized (dataset)}} block inside 
{{FsVolumeImpl#shutdownBlockPool}}?  Otherwise, it is not obvious that this 
method must be called under the lock.  Since Java monitors are re-entrant, 
re-acquiring the lock there will work fine.
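
To illustrate the re-entrancy point, here is a small sketch. The class and method names are made up for illustration and do not mirror the real {{FsDatasetImpl}} / {{FsVolumeImpl}} signatures; it only shows that a thread already holding a monitor can enter another {{synchronized}} block on the same object without blocking.

```java
// Hypothetical names throughout; only the re-entrant locking behaviour is real.
class ReentrantSketch {
    private final Object dataset = new Object();
    private int removed = 0;

    // Plays the role of the synchronized entry point at the top of the chain.
    void outerShutdownBlockPool() {
        synchronized (dataset) {
            innerShutdownBlockPool();
        }
    }

    // Plays the role of the innermost method. It takes the lock itself, so it
    // is safe whether or not the caller already holds it.
    void innerShutdownBlockPool() {
        synchronized (dataset) {
            // Inside the block the current thread always owns the monitor.
            if (!Thread.holdsLock(dataset)) {
                throw new IllegalStateException("dataset lock not held");
            }
            removed++;
        }
    }

    int removedCount() {
        synchronized (dataset) {
            return removed;
        }
    }

    public static void main(String[] args) {
        ReentrantSketch sketch = new ReentrantSketch();
        sketch.outerShutdownBlockPool(); // nested acquisition, no deadlock
        sketch.innerShutdownBlockPool(); // direct call, lock taken here
        System.out.println("removed = " + sketch.removedCount());
    }
}
```

The inner block costs little when the lock is already held, and it makes the locking requirement visible at the point where {{bpSlices}} is actually modified.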

+1 once this is addressed.

> TestWebHdfsWithMultipleNameNodes failed with ConcurrentModificationException
> ----------------------------------------------------------------------------
>
>                 Key: HDFS-6428
>                 URL: https://issues.apache.org/jira/browse/HDFS-6428
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: webhdfs
>            Reporter: Yongjun Zhang
>            Assignee: Yongjun Zhang
>         Attachments: HDFS-6428.001.patch
>
>
> TestWebHdfsWithMultipleNameNodes failed as follows:
> {code}
> Running org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes
> Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 8.643 sec <<< FAILURE! - in org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes
> org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes  Time elapsed: 3.771 sec  <<< ERROR!
> java.util.ConcurrentModificationException: null
>         at java.util.HashMap$HashIterator.nextEntry(HashMap.java:894)
>         at java.util.HashMap$EntryIterator.next(HashMap.java:934)
>         at java.util.HashMap$EntryIterator.next(HashMap.java:932)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.shutdown(FsVolumeImpl.java:251)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.shutdown(FsVolumeList.java:249)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.shutdown(FsDatasetImpl.java:1389)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:1304)
>         at org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNodes(MiniDFSCluster.java:1555)
>         at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1530)
>         at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1514)
>         at org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes.shutdownCluster(TestWebHdfsWithMultipleNameNodes.java:99)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
