[ https://issues.apache.org/jira/browse/HBASE-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736461#comment-13736461 ]
Jerry He commented on HBASE-8760:
---------------------------------

The patch is working well up to step 12; I have not been able to re-create the problem there.

However, I have seen problems and exceptions in both 0.94 and 0.95.2 during steps 13 and 14, for a second-level snapshot and clone. For example, in 0.94:

{code}
hbase(main):005:0> snapshot 'TestTable_clone', 'my_snapshot2'

ERROR: org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: Snapshot { ss=my_snapshot2 table=TestTable_clone type=FLUSH } had an error. my_snapshot2 not found in proclist []
    at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:359)
    at org.apache.hadoop.hbase.master.HMaster.isSnapshotDone(HMaster.java:2185)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
    at java.lang.reflect.Method.invoke(Method.java:611)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
Caused by: org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException via Failed taking snapshot { ss=my_snapshot2 table=TestTable_clone type=FLUSH } due to exception:Missing parent hfile for: TestTable=83935cdbb327ac84f45a7248f4d58173-048d68de11a042e9aba294ab336ddbf3.630c188f55575e0cce497ba342b562bb:org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Missing parent hfile for: TestTable=83935cdbb327ac84f45a7248f4d58173-048d68de11a042e9aba294ab336ddbf3.630c188f55575e0cce497ba342b562bb
    at org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:85)
    at org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.rethrowExceptionIfFailed(TakeSnapshotHandler.java:282)
    at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:349)
    ... 7 more
Caused by: org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Missing parent hfile for: TestTable=83935cdbb327ac84f45a7248f4d58173-048d68de11a042e9aba294ab336ddbf3.630c188f55575e0cce497ba342b562bb
    at org.apache.hadoop.hbase.master.snapshot.MasterSnapshotVerifier.verifyStoreFile(MasterSnapshotVerifier.java:223)
    at org.apache.hadoop.hbase.master.snapshot.MasterSnapshotVerifier.access$000(MasterSnapshotVerifier.java:85)
    at org.apache.hadoop.hbase.master.snapshot.MasterSnapshotVerifier$1.storeFile(MasterSnapshotVerifier.java:209)
    at org.apache.hadoop.hbase.util.FSVisitor.visitRegionStoreFiles(FSVisitor.java:115)
{code}

From the logs, in this failure, 630c188f55575e0cce497ba342b562bb is a region in TestTable_clone that went through its own split. It was gone (not even in .archive) after its split, but somehow there are still links/references to it in TestTable_clone.

TestTable_clone has 3M+ rows. It could go through compactions and splits on its own. That seems to have confused the snapshot operations.

If you need the relevant master/region server logs, I can send them to you or attach them here.

> possible loss of data in snapshot taken after region split
> ----------------------------------------------------------
>
> Key: HBASE-8760
> URL: https://issues.apache.org/jira/browse/HBASE-8760
> Project: HBase
> Issue Type: Bug
> Components: snapshots
> Affects Versions: 0.94.8, 0.95.1
> Reporter: Jerry He
> Fix For: 0.98.0, 0.95.2, 0.94.12
>
> Attachments: HBase-8760-0.94.8.patch, HBase-8760-0.94.8-v1.patch,
> HBASE-8760-0.94-v4.patch, HBASE-8760-thz-v0.patch, HBASE-8760-thz-v1.patch,
> HBASE-8760-thz-v2.patch, HBASE-8760-thz-v3.patch, HBASE-8760-v4.patch
>
>
> Right after a region split but before the daughter regions are compacted, we
> have two daughter regions containing Reference files to the parent hfiles.
> If we take a snapshot right at that moment, the snapshot will succeed, but it
> will only contain the daughter Reference files. Since there is no hold on the
> parent hfiles, they will be deleted by the HFile Cleaner soon after they are
> no longer needed by the daughter regions.
> At a minimum, we need to keep these parent hfiles from being deleted.
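
For anyone reading the "Missing parent hfile for:" name in the trace, here is a minimal Python sketch that splits it into its parts. It assumes the name has the layout `<table>=<region>-<hfile>.<parent region>`, which is inferred from the trace and from the observation that the trailing 630c188f... is the split parent region. The function and field names are hypothetical; this is not HBase code.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout, inferred from the name in the stack trace:
#   <table>=<region>-<hfile>.<parent region>
# "<table>=<region>-<hfile>" looks like a link to an hfile of the source table;
# the optional ".<parent region>" suffix marks a Reference left by a split.
_NAME = re.compile(
    r"^(?P<table>[^=]+)=(?P<region>[0-9a-f]+)-(?P<hfile>[0-9a-f]+)"
    r"(?:\.(?P<parent>[0-9a-f]+))?$"
)


class StoreFileName(NamedTuple):
    table: str             # table that owns the linked hfile
    region: str            # encoded region name holding the hfile
    hfile: str             # hfile name
    parent: Optional[str]  # split parent region, or None for a plain link


def parse_store_file_name(name: str) -> StoreFileName:
    """Split a link/reference file name into its components."""
    m = _NAME.match(name)
    if m is None:
        raise ValueError(f"unrecognized store file name: {name!r}")
    return StoreFileName(m["table"], m["region"], m["hfile"], m["parent"])


if __name__ == "__main__":
    print(parse_store_file_name(
        "TestTable=83935cdbb327ac84f45a7248f4d58173-"
        "048d68de11a042e9aba294ab336ddbf3.630c188f55575e0cce497ba342b562bb"
    ))
```

Read this way, the failing file is a reference from a daughter of region 630c188f... to a linked hfile in the original TestTable, which is consistent with the parent region having been removed after its split.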