[ 
https://issues.apache.org/jira/browse/HDFS-9052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14746329#comment-14746329
 ] 

Alex Ivanov commented on HDFS-9052:
-----------------------------------

[~jingzhao], please let me know if you have any additional comments on this 
since we're trying to figure out how to work around this problem in our 
production clusters.

> deleteSnapshot runs into AssertionError
> ---------------------------------------
>
>                 Key: HDFS-9052
>                 URL: https://issues.apache.org/jira/browse/HDFS-9052
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Alex Ivanov
>
> CDH 5.0.5 upgraded from CDH 5.0.0 (Hadoop 2.3)
> Upon deleting a snapshot, we run into the following assertion error. The 
> scenario is as follows:
> 1. We have a program that deletes snapshots in reverse chronological order.
> 2. The program deletes a couple of hundred snapshots successfully but runs 
> into the following exception:
> java.lang.AssertionError: Element already exists: 
> element=useraction.log.crypto, DELETED=[useraction.log.crypto]
> 3. There seems to be an issue with that snapshot: a file that normally gets 
> overwritten in every snapshot is added to the SnapshotDiff delete queue 
> twice.
> 4. Once deleteSnapshot has been run on the problematic snapshot, a restarted 
> Namenode cannot start again until the transaction is removed from the 
> EditLog.
> 5. Sometimes the bad snapshot can be deleted but the prior snapshot seems to 
> "inherit" the same issue.
> 6. The error below is from Namenode startup, when the DELETE_SNAPSHOT 
> transaction is replayed from the EditLog.
> 2015-09-01 22:59:59,140 INFO  [IPC Server handler 0 on 8022] BlockStateChange 
> (BlockManager.java:logAddStoredBlock(2342)) - BLOCK* addStoredBlock: blockMap 
> updated: 10.52.209.77:1004 is added to 
> blk_1080833995_7093259{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, 
> replicas=[ReplicaUnderConstruction[[DISK]DS-16de62e5-f6e2-4ea7-aad9-f8567bded7d7:NORMAL|FINALIZED]]}
>  size 0
> 2015-09-01 22:59:59,140 INFO  [IPC Server handler 0 on 8022] BlockStateChange 
> (BlockManager.java:logAddStoredBlock(2342)) - BLOCK* addStoredBlock: blockMap 
> updated: 10.52.209.77:1004 is added to 
> blk_1080833996_7093260{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, 
> replicas=[ReplicaUnderConstruction[[DISK]DS-1def2b07-d87f-49dd-b14f-ef230342088d:NORMAL|FINALIZED]]}
>  size 0
> 2015-09-01 22:59:59,141 ERROR [IPC Server handler 0 on 8022] 
> namenode.FSEditLogLoader (FSEditLogLoader.java:loadEditRecords(232)) - 
> Encountered exception on operation DeleteSnapshotOp 
> [snapshotRoot=/data/tenants/pdx-svt.baseline84/wddata, 
> snapshotName=s2015022614_maintainer_soft_del, 
> RpcClientId=7942c957-a7cf-44c1-880d-6eea690e1b19, RpcCallId=1]
> 2015-09-01 22:59:59,141 ERROR [IPC Server handler 0 on 8022] 
> namenode.FSEditLogLoader (FSEditLogLoader.java:loadEditRecords(232)) - 
> Encountered exception on operation DeleteSnapshotOp 
> [snapshotRoot=/data/tenants/pdx-svt.baseline84/wddata, 
> snapshotName=s2015022614_maintainer_soft_del, 
> RpcClientId=7942c957-a7cf-44c1-880d-6eea690e1b19, RpcCallId=1]
> java.lang.AssertionError: Element already exists: 
> element=useraction.log.crypto, DELETED=[useraction.log.crypto]
>         at org.apache.hadoop.hdfs.util.Diff.insert(Diff.java:193)
>         at org.apache.hadoop.hdfs.util.Diff.delete(Diff.java:239)
>         at org.apache.hadoop.hdfs.util.Diff.combinePosterior(Diff.java:462)
>         at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.initChildren(DirectoryWithSnapshotFeature.java:293)
>         at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.iterator(DirectoryWithSnapshotFeature.java:303)
>         at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDeletedINode(DirectoryWithSnapshotFeature.java:531)
>         at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:823)
>         at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:714)
>         at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:684)
>         at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:830)
>         at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:714)
>         at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.INodeDirectorySnapshottable.removeSnapshot(INodeDirectorySnapshottable.java:341)
>         at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.deleteSnapshot(SnapshotManager.java:238)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:667)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:224)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:133)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:802)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:783)
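
For readers following the trace, here is a minimal sketch of the failure mode it suggests: a sorted "deleted" list that refuses to record the same name twice. This is a simplified, hypothetical model and not the actual org.apache.hadoop.hdfs.util.Diff code. The class name DeletedListSketch and its methods are made up for illustration, and the exception message is only shaped to match the one in the log above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model of a snapshot diff's sorted "deleted" list.
// It is NOT the Hadoop implementation; it only mirrors the check that the
// stack trace above appears to hit.
class DeletedListSketch {
    private final List<String> deleted = new ArrayList<>();

    // Records a deletion while keeping the list sorted.
    // A second deletion of the same name is treated as a broken invariant.
    void delete(String name) {
        int i = Collections.binarySearch(deleted, name);
        if (i >= 0) {
            // Name is already present in the deleted list.
            throw new AssertionError("Element already exists: element=" + name
                + ", DELETED=" + deleted);
        }
        // binarySearch returns -(insertionPoint) - 1 when the key is absent.
        deleted.add(-i - 1, name);
    }

    // Read-only view of the current deleted list.
    List<String> view() {
        return Collections.unmodifiableList(deleted);
    }

    public static void main(String[] args) {
        DeletedListSketch diff = new DeletedListSketch();
        diff.delete("useraction.log.crypto");
        try {
            // Same file name queued for deletion a second time.
            diff.delete("useraction.log.crypto");
        } catch (AssertionError e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch the first delete succeeds and the second one throws, which is consistent with the description in item 3 of a file ending up in the delete queue twice. Why the real diff lists get into that state is not shown here.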



