Szabolcs Bukros created HBASE-23995:
---------------------------------------

             Summary: Snapshoting a splitting region results in corrupted 
snapshot
                 Key: HBASE-23995
                 URL: https://issues.apache.org/jira/browse/HBASE-23995
             Project: HBase
          Issue Type: Bug
          Components: snapshots
    Affects Versions: 2.0.2
            Reporter: Szabolcs Bukros


The problem seems to originate from the fact that while the region split itself 
runs under a lock, the compactions that follow it run in separate threads. 
Alternatively, the use of space quota policies can prevent compaction after a 
split, which leads to the same issue.

In both cases the resulting snapshot keeps the split status of the parent 
region but does not keep the references to the daughter regions, because they 
(the splitA and splitB qualifiers) are stored separately in the meta table and 
do not propagate with the snapshot.

This is important because in the freshly cloned table CatalogJanitor will find 
the parent region, realize it is in split state, and, because it cannot find 
the daughter region references (they have not propagated), assume the parent 
can be cleaned up and delete it. The archived region used in the snapshot only 
has a back reference to the now also archived parent region, and if the 
snapshot is deleted they both get cleaned up. Unfortunately the daughter 
regions only contain hfile links, so at this point the data is lost.
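
The cleanup decision described above can be illustrated with a small toy model 
in plain Ruby. This is only a sketch of the reasoning, not the actual 
CatalogJanitor code: MetaRow, parent_removable? and the region names are all 
hypothetical, and the real check in HBase is more involved.

```ruby
# Toy model of the cleanup decision described above; this is NOT HBase code.
# MetaRow and parent_removable? are hypothetical names used for illustration.
MetaRow = Struct.new(:region, :split, :split_a, :split_b)

# daughters_with_refs: names of daughter regions that still hold
# references (for example hfile links) pointing back at the parent.
def parent_removable?(row, daughters_with_refs)
  return false unless row.split # only split parents are cleanup candidates
  daughters = [row.split_a, row.split_b].compact
  # With no recorded daughters there is nothing to check, so the parent
  # looks unreferenced and is treated as safe to delete.
  daughters.none? { |d| daughters_with_refs.include?(d) }
end

# Source table: the meta row still carries the splitA/splitB qualifiers.
source = MetaRow.new("parent", true, "daughterA", "daughterB")
# Cloned table: split state is kept, daughter references are missing.
cloned = MetaRow.new("parent", true, nil, nil)
still_referencing = ["daughterA", "daughterB"]

puts parent_removable?(source, still_referencing) # prints false
puts parent_removable?(cloned, still_referencing) # prints true
```

In this model the parent is kept in the source table because its recorded 
daughters still reference it, while the cloned parent has no recorded 
daughters and is therefore treated as removable.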

How to reproduce:
{code:java}
hbase shell <<EOF
create 'test', 'cf'
(0...2000).each{|i| put "test", "row#{i}", "cf:col", "val"}
flush 'test'
split 'test'
snapshot 'test', 'testshot'
EOF
{code}
This should ensure the snapshot is taken before the compaction can finish, 
even with a small amount of data.
{code:java}
sudo -u hbase hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot testshot -copy-to hdfs://target:8020/apps/hbase/data/imported-snapshot
{code}
I export the snapshot to make the use case cleaner, but deleting both the 
snapshot and the original table after cloning should have the same effect.
{code:java}
clone_snapshot 'testshot', 'test2'
delete_snapshot "testshot"
{code}
I'm not sure what the best way to fix this would be. Preventing snapshots 
while a region is in split state would make snapshot creation problematic. 
Forcing compaction to run as part of the split thread would make it rather 
slow. Propagating the daughter region references could prevent the deletion of 
the cloned parent region, and the data would no longer be broken, but I'm not 
sure we have logic in place that could pick up the pieces and finish the split 
process.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
