[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vasu Mariyala updated HBASE-7709:
---------------------------------
    Attachment: HBASE-7709-rev5.patch
                0.95-trunk-rev4.patch

Attaching 0.95-trunk-rev4.patch (0.95 and trunk), which stores the cluster IDs as a list rather than a set. The first cluster ID in the list is the originating cluster, and the subsequent entries record the replication path. HBASE-7709-rev5.patch (0.94) contains the changes that make the 0.94 API the same as the API of 0.95 and trunk. These patches primarily address the monitoring issues mentioned by [~jeffreyz].

> Infinite loop possible in Master/Master replication
> ---------------------------------------------------
>
>                 Key: HBASE-7709
>                 URL: https://issues.apache.org/jira/browse/HBASE-7709
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 0.94.6, 0.95.1
>            Reporter: Lars Hofhansl
>            Assignee: Vasu Mariyala
>             Fix For: 0.98.0, 0.94.12, 0.96.0
>
>         Attachments: 095-trunk.patch, 0.95-trunk-rev1.patch,
> 0.95-trunk-rev2.patch, 0.95-trunk-rev3.patch, 0.95-trunk-rev4.patch,
> HBASE-7709.patch, HBASE-7709-rev1.patch, HBASE-7709-rev2.patch,
> HBASE-7709-rev3.patch, HBASE-7709-rev4.patch, HBASE-7709-rev5.patch
>
>
> We just discovered the following scenario:
> # Clusters A and B are set up in master/master replication.
> # By accident we had Cluster C replicate to Cluster A.
> Now all edits originating from C will be bouncing between A and B. Forever!
> The reason is that when the edit comes in from C, the cluster ID is already set
> and won't be reset.
> We have a couple of options here:
> # Optionally only support master/master (not cycles of more than two
> clusters). In that case we can always reset the cluster ID in the
> ReplicationSource. That means that cycles > 2 will now have the data cycle
> forever. This is the only option that requires no changes in the HLog format.
> # Instead of a single cluster ID per edit, maintain an (unordered) set of
> cluster IDs that have seen this edit. Then in ReplicationSource we drop any
> edit that the sink has seen already. This is the cleanest approach, but it
> might need a lot of data stored per edit if there are many clusters involved.
> # Maintain a configurable counter of the maximum cycle size we want to
> support. Could default to 10 (maybe even lower). Store a hop count in the
> WAL, and the ReplicationSource increases that hop count on each hop. If we're
> over the max, just drop the edit.
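
For readers following the discussion, the cluster-ID-list idea can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not code from the attached patches: the names `ReplicationLoopSketch`, `Edit`, `shouldShip`, `forwardedTo`, and `replicate` are hypothetical, and the real ReplicationSource and HLog classes are not modeled. Each edit carries the list of cluster IDs it has visited (originating cluster first), and a source ships an edit only to a sink that is not yet in that list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

class ReplicationLoopSketch {

    /** Hypothetical stand-in for a WAL edit; not an HBase class. */
    static final class Edit {
        final String payload;
        // First entry is the originating cluster; later entries are the replication path.
        final List<UUID> clusterIds;

        Edit(String payload, List<UUID> clusterIds) {
            this.payload = payload;
            this.clusterIds = Collections.unmodifiableList(new ArrayList<>(clusterIds));
        }

        static Edit originatingAt(String payload, UUID origin) {
            return new Edit(payload, Collections.singletonList(origin));
        }

        /** Copy of this edit as it would be stored at the sink: path extended by the sink's ID. */
        Edit forwardedTo(UUID sinkClusterId) {
            List<UUID> path = new ArrayList<>(clusterIds);
            path.add(sinkClusterId);
            return new Edit(payload, path);
        }
    }

    /** Source-side check: drop any edit the sink cluster has already seen. */
    static boolean shouldShip(Edit edit, UUID sinkClusterId) {
        return !edit.clusterIds.contains(sinkClusterId);
    }

    /**
     * Walks a toy topology (cluster ID -> list of peer sinks) and returns how many
     * times the edit is shipped. The walk terminates because every shipment adds a
     * cluster ID that was not yet in the path, and the set of clusters is finite.
     */
    static int replicate(Edit edit, UUID current, Map<UUID, List<UUID>> peers) {
        int shipped = 0;
        for (UUID sink : peers.getOrDefault(current, Collections.emptyList())) {
            if (shouldShip(edit, sink)) {
                shipped += 1 + replicate(edit.forwardedTo(sink), sink, peers);
            }
        }
        return shipped;
    }

    public static void main(String[] args) {
        UUID a = UUID.randomUUID();
        UUID b = UUID.randomUUID();
        UUID c = UUID.randomUUID();

        // The scenario from the report: A <-> B master/master, plus C -> A by accident.
        Map<UUID, List<UUID>> peers = new HashMap<>();
        peers.put(a, Collections.singletonList(b));
        peers.put(b, Collections.singletonList(a));
        peers.put(c, Collections.singletonList(a));

        // C -> A -> B, then B refuses to ship back to A because A is already in the path.
        System.out.println(replicate(Edit.originatingAt("put", c), c, peers)); // prints 2
        // A -> B, then B refuses to ship back to A.
        System.out.println(replicate(Edit.originatingAt("put", a), a, peers)); // prints 1
    }
}
```

With a single cluster ID per edit, the edit from C would keep its origin ID C and neither A nor B would ever recognize it as its own, which is the loop described above. Keeping the IDs in an ordered list rather than a set costs the same membership check here, and additionally preserves the origin and the path an edit took.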