[ 
https://issues.apache.org/jira/browse/HBASE-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736461#comment-13736461
 ] 

Jerry He commented on HBASE-8760:
---------------------------------

The patch is working well up to step 12. I've not been able to re-create the 
problem.

But I have seen problems and exceptions in both 0.94 and 0.95.2 during steps 13 
and 14, when taking a second-level snapshot and clone.

For example, in 0.94:
{code}
hbase(main):005:0> snapshot 'TestTable_clone', 'my_snapshot2'

ERROR: org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: Snapshot { ss=my_snapshot2 table=TestTable_clone type=FLUSH } had an error.  my_snapshot2 not found in proclist []
        at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:359)
        at org.apache.hadoop.hbase.master.HMaster.isSnapshotDone(HMaster.java:2185)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
        at java.lang.reflect.Method.invoke(Method.java:611)
        at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
Caused by: org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException via Failed taking snapshot { ss=my_snapshot2 table=TestTable_clone type=FLUSH } due to exception:Missing parent hfile for: TestTable=83935cdbb327ac84f45a7248f4d58173-048d68de11a042e9aba294ab336ddbf3.630c188f55575e0cce497ba342b562bb:org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Missing parent hfile for: TestTable=83935cdbb327ac84f45a7248f4d58173-048d68de11a042e9aba294ab336ddbf3.630c188f55575e0cce497ba342b562bb
        at org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:85)
        at org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.rethrowExceptionIfFailed(TakeSnapshotHandler.java:282)
        at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:349)
        ... 7 more
Caused by: org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Missing parent hfile for: TestTable=83935cdbb327ac84f45a7248f4d58173-048d68de11a042e9aba294ab336ddbf3.630c188f55575e0cce497ba342b562bb
        at org.apache.hadoop.hbase.master.snapshot.MasterSnapshotVerifier.verifyStoreFile(MasterSnapshotVerifier.java:223)
        at org.apache.hadoop.hbase.master.snapshot.MasterSnapshotVerifier.access$000(MasterSnapshotVerifier.java:85)
        at org.apache.hadoop.hbase.master.snapshot.MasterSnapshotVerifier$1.storeFile(MasterSnapshotVerifier.java:209)
        at org.apache.hadoop.hbase.util.FSVisitor.visitRegionStoreFiles(FSVisitor.java:115)
{code}
From the logs, in this failure, 630c188f55575e0cce497ba342b562bb is a region 
in TestTable_clone that went through its own split. It was gone (not even in 
.archive) after its split.
But somehow there are still links/references to it in TestTable_clone.
TestTable_clone has more than 3 million rows, so it can go through compactions 
and splits on its own. That seems to have confused the snapshot operations.
If you need the relevant master/region server logs, I can send them to you or 
attach them here.
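
To make the failing file name easier to read, here is a small Python sketch 
that splits it into its parts. The layout it assumes, 
<table>=<region>-<hfile>.<parent region>, is inferred from the name in the 
trace above and from the logs; the regex and the parse_reference_name helper 
are illustrative only and are not HBase code.

```python
import re

# Assumed layout (inferred from the trace, not taken from HBase source):
#   <table>=<region>-<hfile>.<parent region>
# The part before the dot looks like a link to an hfile of the original
# table; the suffix after the dot is the split parent region.
REFERENCE_NAME = re.compile(
    r"^(?P<table>[^=]+)=(?P<region>[0-9a-f]+)-(?P<hfile>[0-9a-f]+)"
    r"\.(?P<parent>[0-9a-f]+)$"
)


def parse_reference_name(name):
    """Split a reference file name into table, region, hfile and parent."""
    match = REFERENCE_NAME.match(name)
    if match is None:
        raise ValueError("not a reference-to-link name: %r" % name)
    return match.groupdict()


if __name__ == "__main__":
    failing = (
        "TestTable=83935cdbb327ac84f45a7248f4d58173-"
        "048d68de11a042e9aba294ab336ddbf3.630c188f55575e0cce497ba342b562bb"
    )
    for key, value in sorted(parse_reference_name(failing).items()):
        print(key, value)
```

Read this way, the parent field is the 630c188f... region of TestTable_clone 
that is gone after its split, while the table, region and hfile fields point 
back at the original TestTable.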
                
> possible loss of data in snapshot taken after region split
> ----------------------------------------------------------
>
>                 Key: HBASE-8760
>                 URL: https://issues.apache.org/jira/browse/HBASE-8760
>             Project: HBase
>          Issue Type: Bug
>          Components: snapshots
>    Affects Versions: 0.94.8, 0.95.1
>            Reporter: Jerry He
>             Fix For: 0.98.0, 0.95.2, 0.94.12
>
>         Attachments: HBase-8760-0.94.8.patch, HBase-8760-0.94.8-v1.patch, 
> HBASE-8760-0.94-v4.patch, HBASE-8760-thz-v0.patch, HBASE-8760-thz-v1.patch, 
> HBASE-8760-thz-v2.patch, HBASE-8760-thz-v3.patch, HBASE-8760-v4.patch
>
>
> Right after a region split but before the daughter regions are compacted, we 
> have two daughter regions containing Reference files to the parent hfiles.
> If we take a snapshot right at that moment, the snapshot will succeed, but it 
> will only contain the daughter Reference files. Since there is no hold on the 
> parent hfiles, the HFile Cleaner will delete them soon after the daughter 
> regions no longer need them.
> At a minimum, we need to keep these parent hfiles from being deleted. 
