[ 
https://issues.apache.org/jira/browse/HBASE-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13743522#comment-13743522
 ] 

Jerry He commented on HBASE-8760:
---------------------------------

Hi, Matteo

Thank you for the time and effort you spent on this JIRA!  There has been more 
complexity, and more problems, than anticipated.

I applied HBASE-9207, HBASE-9233, and then the HBASE-8760-0.94-v8.patch on my 
0.94 cluster.

I ran through the test steps outlined in my previous comment a few times, 
sometimes with minor changes to the steps.

There is one more issue (hopefully the last one!).
We should not include the offline regions' ServerName in the online snapshot 
procedure. Otherwise the snapshot procedure will time out while waiting for 
the obsolete ServerName if the ServerName has changed, e.g. after a restart.
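
To illustrate the idea, here is a minimal sketch of the filtering, not the 
actual addendum code. RegionLocation, SnapshotServerFilter and 
serversForOnlineSnapshot are hypothetical names made up for this example, and 
server names are plain strings in the host,port,startcode form. It only shows 
that servers recorded for offline regions are left out of the set the 
procedure waits on.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class SnapshotServerFilter {

    /** Hypothetical stand-in for a region and its last recorded location. */
    public static final class RegionLocation {
        final String regionName;
        final String serverName; // "host,port,startcode"; may be stale
        final boolean offline;   // e.g. a split parent

        public RegionLocation(String regionName, String serverName, boolean offline) {
            this.regionName = regionName;
            this.serverName = serverName;
            this.offline = offline;
        }
    }

    /**
     * Collects the servers that should take part in an online snapshot.
     * Offline regions are skipped, so a server name that was recorded
     * before a restart is never added to the set we wait on.
     */
    public static Set<String> serversForOnlineSnapshot(List<RegionLocation> regions) {
        Set<String> servers = new LinkedHashSet<>();
        for (RegionLocation region : regions) {
            if (region.offline || region.serverName == null) {
                continue; // nothing is serving this region; do not wait for it
            }
            servers.add(region.serverName);
        }
        return servers;
    }

    public static void main(String[] args) {
        // The parent still points at the server instance from before the restart.
        List<RegionLocation> regions = Arrays.asList(
            new RegionLocation("parent", "rs1.example.com,60020,1000", true),
            new RegionLocation("daughterA", "rs1.example.com,60020,2000", false),
            new RegionLocation("daughterB", "rs2.example.com,60020,3000", false));

        System.out.println(serversForOnlineSnapshot(regions));
    }
}
```

In this example the parent's stale server name is dropped, so only the two 
servers hosting the online daughters are asked to participate.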

I have attached HBASE-8760-0.94-v8-addendum.patch. It applies on top of 
HBASE-8760-0.94-v8.patch.

With this applied, I have not seen any failures or exceptions during testing. 
The row counts always match, and the logs are clean, with no errors or 
exceptions.
                
> possible loss of data in snapshot taken after region split
> ----------------------------------------------------------
>
>                 Key: HBASE-8760
>                 URL: https://issues.apache.org/jira/browse/HBASE-8760
>             Project: HBase
>          Issue Type: Bug
>          Components: snapshots
>    Affects Versions: 0.94.8, 0.95.1
>            Reporter: Jerry He
>             Fix For: 0.98.0, 0.94.12, 0.96.0
>
>         Attachments: HBase-8760-0.94.8.patch, HBase-8760-0.94.8-v1.patch, 
> HBASE-8760-0.94-v4.patch, HBASE-8760-0.94-v5.patch, HBASE-8760-0.94-v6.patch, 
> HBASE-8760-0.94-v7.patch, HBASE-8760-0.94-v8-addendum.patch, 
> HBASE-8760-0.94-v8.patch, HBASE-8760-thz-v0.patch, HBASE-8760-trunk-v8.patch, 
> HBASE-8760-v4.patch, v4-patch-testing-0.94.zip, v4-patch-testing-0.95.2.zip
>
>
> Right after a region split but before the daughter regions are compacted, we 
> have two daughter regions containing Reference files to the parent hfiles.
> If we take a snapshot right at that moment, the snapshot will succeed, but it 
> will only contain the daughter Reference files. Since there is no hold on the 
> parent hfiles, they will be deleted by the HFile Cleaner soon after they are 
> no longer needed by the daughter regions.
> At a minimum, we need to keep these parent hfiles from being deleted. 

