[jira] [Updated] (HBASE-9158) Serious bug in cyclic replication

Lars Hofhansl (JIRA) Thu, 08 Aug 2013 00:25:54 -0700

     [ 
https://issues.apache.org/jira/browse/HBASE-9158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Lars Hofhansl updated HBASE-9158:
---------------------------------

    Attachment: 9158-0.94.txt

Here's a possible fix in doMiniBatchMutation.

This is probably not correct, as we now have multiple calls to 
HLog.appendNoSync(...) and hence log-appending is no longer atomic; what if the 
3rd call fails?

                
> Serious bug in cyclic replication
> ---------------------------------
>
>                 Key: HBASE-9158
>                 URL: https://issues.apache.org/jira/browse/HBASE-9158
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.98.0, 0.95.1, 0.94.10
>            Reporter: Lars Hofhansl
>            Priority: Critical
>             Fix For: 0.98.0, 0.95.2, 0.94.11
>
>         Attachments: 9158-0.94.txt
>
>
> While studying the code for HBASE-7709, I found a serious bug in the current 
> cyclic replication code. The problem is here in HRegion.doMiniBatchMutation:
> {code}
>       Mutation first = batchOp.operations[firstIndex].getFirst();
>       txid = this.log.appendNoSync(regionInfo, this.htableDescriptor.getName(),
>                walEdit, first.getClusterId(), now, this.htableDescriptor);
> {code}
> Note that edits replicated from a remote cluster and local edits might 
> interleave in the WAL, and we might also receive edits from multiple remote 
> clusters. Hence that <walEdit> might contain edits from many clusters, but 
> all of them are labeled with the clusterId of the first Mutation.
> Fixing this in doMiniBatchMutation seems tricky to do efficiently (imagine we 
> get a batch with cluster1, cluster2, cluster1, cluster2, ..., in that case 
> each edit would have to be its own batch). The coprocessor handling would 
> also be difficult.
> The other option is to create batches of Puts grouped by cluster id in 
> ReplicationSink.replicateEntries(...). This is not as general, but equally 
> correct. This is the approach I would favor.
> Lastly, this is very hard to verify in a unit test.
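
The grouping idea favored above can be sketched roughly as follows. This is a minimal illustration only, not the attached patch and not HBase code: `ClusterBatcher`, `Edit`, and `groupByCluster` are hypothetical names, and a plain string stands in for a Put. It shows one way to split an interleaved stream of edits into per-cluster batches, so that each batch could be applied (and logged) under a single cluster id.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch: none of these types are real HBase classes.
final class ClusterBatcher {

    /** Stand-in for a replicated edit: its originating cluster id plus an opaque payload. */
    static final class Edit {
        final UUID clusterId;
        final String payload;

        Edit(UUID clusterId, String payload) {
            this.clusterId = clusterId;
            this.payload = payload;
        }
    }

    /**
     * Groups edits by originating cluster id. The map iterates clusters in
     * first-seen order, and each list keeps the arrival order of its edits.
     */
    static Map<UUID, List<Edit>> groupByCluster(List<Edit> edits) {
        Map<UUID, List<Edit>> batches = new LinkedHashMap<>();
        for (Edit edit : edits) {
            batches.computeIfAbsent(edit.clusterId, id -> new ArrayList<>()).add(edit);
        }
        return batches;
    }
}
```

With an input of cluster1, cluster2, cluster1, cluster2 this yields two batches of two edits each, rather than four single-edit batches. Note that grouping changes the relative order of edits from different clusters; whether that is acceptable is an assumption this sketch does not examine.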

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
