[ https://issues.apache.org/jira/browse/HDFS-11515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15951783#comment-15951783 ]
Wei-Chiu Chuang commented on HDFS-11515:
----------------------------------------

The latest patch is good, and I just saw a few nits:

In the unit test, please make sure the test has a timeout; right now it is commented out:
{code}
  @Test//(timeout = 180000)
{code}
Also, would you mind moving the test to TestSnapshotDeletion.java? It's really my fault for placing it in TestRenameWithSnapshots. +1 after that.

P.S. This is not related to your patch, but while reviewing it I noticed that the synchronization block in the following code is not needed at all and only introduces extra overhead. I'll post a jira soon.
{code}
  public boolean nodeIncluded(INode node) {
    INode resolvedNode = resolveINodeReference(node);
    synchronized (includedNodes) {
      if (!includedNodes.contains(resolvedNode)) {
        includedNodes.add(resolvedNode);
        return false;
      }
    }
    return true;
  }
{code}
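For illustration only, a minimal sketch of the same method with the synchronized block dropped. It assumes, as argued above, that includedNodes is never touched by more than one thread during a single content summary computation; the field and helper names are taken from the snippet above, so treat this as a sketch rather than the actual change.
{code}
  // Sketch only (not part of the patch under review): the same check
  // without the synchronized block.
  public boolean nodeIncluded(INode node) {
    INode resolvedNode = resolveINodeReference(node);
    // Set.add() returns false when the element is already present, so the
    // contains()/add() pair also collapses into a single call.
    return !includedNodes.add(resolvedNode);
  }
{code}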
> -du throws ConcurrentModificationException
> ------------------------------------------
>
>                 Key: HDFS-11515
>                 URL: https://issues.apache.org/jira/browse/HDFS-11515
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode, shell
>    Affects Versions: 2.8.0, 3.0.0-alpha2
>            Reporter: Wei-Chiu Chuang
>            Assignee: Istvan Fajth
>         Attachments: HDFS-11515.001.patch, HDFS-11515.002.patch, HDFS-11515.003.patch, HDFS-11515.test.patch
>
>
> HDFS-10797 fixed a disk summary (-du) bug, but it introduced a new bug.
> The bug can be reproduced by running the following commands:
> {noformat}
> bash-4.1$ hdfs dfs -mkdir /tmp/d0
> bash-4.1$ hdfs dfsadmin -allowSnapshot /tmp/d0
> Allowing snaphot on /tmp/d0 succeeded
> bash-4.1$ hdfs dfs -touchz /tmp/d0/f4
> bash-4.1$ hdfs dfs -mkdir /tmp/d0/d1
> bash-4.1$ hdfs dfs -createSnapshot /tmp/d0 s1
> Created snapshot /tmp/d0/.snapshot/s1
> bash-4.1$ hdfs dfs -mkdir /tmp/d0/d1/d2
> bash-4.1$ hdfs dfs -mkdir /tmp/d0/d1/d3
> bash-4.1$ hdfs dfs -mkdir /tmp/d0/d1/d2/d4
> bash-4.1$ hdfs dfs -mkdir /tmp/d0/d1/d3/d5
> bash-4.1$ hdfs dfs -createSnapshot /tmp/d0 s2
> Created snapshot /tmp/d0/.snapshot/s2
> bash-4.1$ hdfs dfs -rmdir /tmp/d0/d1/d2/d4
> bash-4.1$ hdfs dfs -rmdir /tmp/d0/d1/d2
> bash-4.1$ hdfs dfs -rmdir /tmp/d0/d1/d3/d5
> bash-4.1$ hdfs dfs -rmdir /tmp/d0/d1/d3
> bash-4.1$ hdfs dfs -du -h /tmp/d0
> du: java.util.ConcurrentModificationException
> 0 0 /tmp/d0/f4
> {noformat}
> A ConcurrentModificationException forced du to terminate abruptly.
> Correspondingly, the NameNode log has the following error:
> {noformat}
> 2017-03-08 14:32:17,673 WARN org.apache.hadoop.ipc.Server: IPC Server handler 4 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getContentSummary from 10.0.0.198:49957 Call#2 Retry#0
> java.util.ConcurrentModificationException
>         at java.util.HashMap$HashIterator.nextEntry(HashMap.java:922)
>         at java.util.HashMap$KeyIterator.next(HashMap.java:956)
>         at org.apache.hadoop.hdfs.server.namenode.ContentSummaryComputationContext.tallyDeletedSnapshottedINodes(ContentSummaryComputationContext.java:209)
>         at org.apache.hadoop.hdfs.server.namenode.INode.computeAndConvertContentSummary(INode.java:507)
>         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.getContentSummary(FSDirectory.java:2302)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getContentSummary(FSNamesystem.java:4535)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getContentSummary(NameNodeRpcServer.java:1087)
>         at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getContentSummary(AuthorizationProviderProxyClientProtocol.java:563)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getContentSummary(ClientNamenodeProtocolServerSideTranslatorPB.java:873)
>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2216)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2212)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2210)
> {noformat}
> The bug is due to an improper use of HashSet, not concurrent operations.
> Basically, a HashSet cannot be updated while an iterator is traversing it.
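To make that last point concrete, here is a standalone toy example (not the HDFS code) showing that a HashSet's fail-fast iterator throws ConcurrentModificationException in a single thread as soon as the set is structurally modified mid-iteration; this is the same failure mode as the HashMap$HashIterator frames in the stack trace above.
{code}
import java.util.HashSet;
import java.util.Set;

public class HashSetIterationDemo {
  public static void main(String[] args) {
    Set<String> visited = new HashSet<>();
    visited.add("d1");
    visited.add("d2");

    // Single thread, no concurrency involved: the for-each loop drives the
    // set's fail-fast iterator, and add() is a structural modification, so
    // the next iteration step throws ConcurrentModificationException.
    for (String node : visited) {
      visited.add(node + "/child");
    }
  }
}
{code}
A common general remedy is to iterate over a copy of the set, or to collect the additions separately and merge them into the set after the loop finishes.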