[ https://issues.apache.org/jira/browse/HBASE-9158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13734063#comment-13734063 ]
Jean-Daniel Cryans commented on HBASE-9158:
-------------------------------------------

So I tried the test with and without the patch to ReplicationSink, and it behaves as expected in both cases 100% of the time.

+1

> Serious bug in cyclic replication
> ---------------------------------
>
>                 Key: HBASE-9158
>                 URL: https://issues.apache.org/jira/browse/HBASE-9158
>             Project: HBase
>          Issue Type: Bug
>   Affects Versions: 0.98.0, 0.95.1, 0.94.10
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>            Priority: Critical
>             Fix For: 0.98.0, 0.95.2, 0.94.11
>
>         Attachments: 9158-0.94.txt, 9158-0.94-v2.txt, 9158-0.94-v3.txt,
>                      9158-0.94-v4.txt, 9158-trunk-v1.txt, 9158-trunk-v2.txt,
>                      9158-trunk-v3.txt, 9158-trunk-v4.txt
>
>
> While studying the code for HBASE-7709, I found a serious bug in the current
> cyclic replication code. The problem is here in HRegion.doMiniBatchMutation:
> {code}
>       Mutation first = batchOp.operations[firstIndex].getFirst();
>       txid = this.log.appendNoSync(regionInfo,
>           this.htableDescriptor.getName(),
>           walEdit, first.getClusterId(), now, this.htableDescriptor);
> {code}
> Now note that edits replicated from a remote cluster and local edits might
> interleave in the WAL; we might also receive edits from multiple remote
> clusters. Hence that <walEdit> might have edits from many clusters in it, but
> all are just labeled with the clusterId of the first Mutation.
> Fixing this in doMiniBatchMutation seems tricky to do efficiently (imagine we
> get a batch with cluster1, cluster2, cluster1, cluster2, ..., in that case
> each edit would have to be its own batch). The coprocessor handling would
> also be difficult.
> The other option is to create batches of Puts grouped by the cluster id in
> ReplicationSink.replicateEntries(...). This is not as general, but equally
> correct. This is the approach I would favor.
> Lastly, this is very hard to verify in a unit test.
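
For illustration only, the grouping approach favored in the issue description could look roughly like the sketch below. It is not the attached patch and does not use HBase classes: Edit, groupByCluster and ClusterBatchSketch are hypothetical names, and the cluster id is assumed to be a java.util.UUID. The point is that each resulting batch carries edits from exactly one cluster, so labeling a whole batch with a single clusterId is safe.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

public class ClusterBatchSketch {

    /** Hypothetical stand-in for a replicated edit: originating cluster plus an opaque payload. */
    public static final class Edit {
        public final UUID clusterId;
        public final String payload;

        public Edit(UUID clusterId, String payload) {
            this.clusterId = clusterId;
            this.payload = payload;
        }
    }

    /**
     * Groups edits by originating cluster id. Edits from the same cluster
     * keep their relative order; batches are ordered by first appearance.
     */
    public static Map<UUID, List<Edit>> groupByCluster(List<Edit> edits) {
        Map<UUID, List<Edit>> batches = new LinkedHashMap<>();
        for (Edit e : edits) {
            batches.computeIfAbsent(e.clusterId, k -> new ArrayList<>()).add(e);
        }
        return batches;
    }

    public static void main(String[] args) {
        UUID cluster1 = UUID.randomUUID();
        UUID cluster2 = UUID.randomUUID();

        // The interleaved case from the description: cluster1, cluster2, cluster1, cluster2.
        List<Edit> incoming = Arrays.asList(
            new Edit(cluster1, "a"), new Edit(cluster2, "b"),
            new Edit(cluster1, "c"), new Edit(cluster2, "d"));

        // Each batch can now be applied on its own, tagged with one cluster id.
        for (Map.Entry<UUID, List<Edit>> batch : groupByCluster(incoming).entrySet()) {
            System.out.println(batch.getKey() + " -> " + batch.getValue().size() + " edits");
        }
    }
}
```

With the interleaved input above, this yields two batches of two edits each rather than four single-edit batches, which is the efficiency concern raised about fixing the problem in doMiniBatchMutation.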