[ https://issues.apache.org/jira/browse/HBASE-20878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16542395#comment-16542395 ]

Allan Yang commented on HBASE-20878:
------------------------------------

Uploaded a new patch with the testing code removed.
I think checking the server is OK; it acts as a safety fence. If we close the
regions and the RS hosting them is still online, we can safely proceed to the
next state. The chance of the RS crashing right around the region close and
this check is very small, and if it does happen, aborting the merge is still
a safe thing to do.
On the other hand, checking the region itself (e.g. looking for
recovered.edits) is a bit of a hack, I think. What if we bring DLR (distributed
log replay) back? Then there would be no recovered.edits to check.
Still, this is open for discussion. Another question: should we just abort the
merge, or should we retry as you suggested above, [~Apache9]?
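A minimal Java sketch of the server-check "safety fence" described in this comment, under a deliberately simplified model. The names `MergeFenceSketch`, `Decision` and `checkAfterClose` are hypothetical: they are not HBase's actual MergeTableRegionsProcedure code or any HBase API, and servers are modeled as plain strings. The sketch only shows the decision rule: after the regions are closed, proceed if every server that hosted them is still online, otherwise abort the merge.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the "safety fence" idea; none of these names
// come from HBase, they are illustrative only.
final class MergeFenceSketch {

  // Outcome of the check run after the regions to merge have been closed.
  enum Decision { PROCEED, ABORT }

  private MergeFenceSketch() {}

  /**
   * Decides whether the merge may move on to its next state.
   *
   * @param hostingServers servers that hosted the regions being merged
   *                       at the time they were closed
   * @param onlineServers  servers the master currently considers online
   * @return PROCEED if every hosting server is still online (the regions
   *         were closed normally on a live server), otherwise ABORT so the
   *         regions can be reopened and replay their recovered edits
   *         instead of being merged
   */
  static Decision checkAfterClose(List<String> hostingServers, Set<String> onlineServers) {
    for (String server : hostingServers) {
      if (!onlineServers.contains(server)) {
        // A hosting server is gone: its WAL may hold edits for these
        // regions, so merging now could lose data.
        return Decision.ABORT;
      }
    }
    return Decision.PROCEED;
  }
}
```

Whether `ABORT` should instead lead to a retry is exactly the open question above; the sketch leaves that choice to its caller.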

> Data loss if merging regions while ServerCrashProcedure executing
> -----------------------------------------------------------------
>
>                 Key: HBASE-20878
>                 URL: https://issues.apache.org/jira/browse/HBASE-20878
>             Project: HBase
>          Issue Type: Bug
>          Components: amv2
>    Affects Versions: 3.0.0, 2.1.0, 2.0.1
>            Reporter: Allan Yang
>            Assignee: Allan Yang
>            Priority: Critical
>             Fix For: 3.0.0, 2.0.2, 2.1.1
>
>         Attachments: HBASE-20878.branch-2.0.001.patch, 
> HBASE-20878.branch-2.0.002.patch
>
>
> In MergeTableRegionsProcedure, we close the regions to merge using
> UnassignProcedure. But if the RS hosting these regions crashes, a
> ServerCrashProcedure will execute at the same time. The UnassignProcedures
> will be blocked until all logs are split. But since these regions are closed
> for merging, they won't be opened again, so the recovered.edits in the region
> dir won't be replayed, and data will be lost.
> I provided a test to reproduce this case. I strongly suspect the split region
> procedure also has this kind of problem. I will check later.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
