[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569623#comment-13569623 ]
Ted Yu commented on HBASE-7709:
-------------------------------

Looking at HLogKey#readFields():
{code}
    if (version.atLeast(Version.INITIAL)) {
      if (in.readBoolean()) {
{code}
From the javadoc of readBoolean():
Reads one input byte and returns true if that byte is nonzero, false if that byte is zero.

I think there is room to implement option #3 in the description.
We can introduce a new version (two, considering compression) where we write, instead of true, the number of hops that the HLog.Entry has gone through, starting with 1.
A byte should suffice for this purpose.

+1 on documenting this intricacy for 0.94.x in the refguide.

I think we should create several subtasks for this JIRA.

> Infinite loop possible in Master/Master replication
> ---------------------------------------------------
>
>                 Key: HBASE-7709
>                 URL: https://issues.apache.org/jira/browse/HBASE-7709
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>            Reporter: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.6
>
>
> We just discovered the following scenario:
> # Cluster A and B are set up in master/master replication.
> # By accident we had Cluster C replicate to Cluster A.
> Now all edits originating from C will be bouncing between A and B. Forever!
> The reason is that when the edits come in from C the cluster ID is already set and won't be reset.
> We have a couple of options here:
> # Optionally only support master/master (not cycles of more than two clusters). In that case we can always reset the cluster ID in the ReplicationSource. That means that now cycles > 2 will have the data cycle forever. This is the only option that requires no changes in the HLog format.
> # Instead of a single cluster ID per edit, maintain an (unordered) set of cluster IDs that have seen this edit. Then in ReplicationSource we drop any edit that the sink has seen already. This is the cleanest approach, but it might need a lot of data stored per edit if there are many clusters involved.
> # Maintain a configurable counter of the maximum cycle size we want to support. Could default to 10 (even maybe even just). Store a hop-count in the WAL and the ReplicationSource increases that hop-count on each hop. If we're over the max, just drop the edit.
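
A rough sketch of the hop-count idea from the comment (option #3), to show why reusing the boolean's byte stays readable by older code: readBoolean() returns true for any nonzero byte, so a hop count of 1 or more still reads as true. The class HopCountSketch, its method names, and MAX_HOPS are hypothetical and are not part of HBase; the limit of 10 is only the default floated in the description.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration only; this is not HBase's HLogKey.
final class HopCountSketch {
    // Assumed limit, taken from the "could default to 10" remark in option #3.
    static final int MAX_HOPS = 10;

    // New-style writer: stores the hop count (1..255) in the single byte
    // where the old format wrote the boolean `true`.
    static byte[] writeHops(int hops) {
        if (hops < 1 || hops > 255) {
            throw new IllegalArgumentException("hop count must fit in one nonzero byte");
        }
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeByte(hops);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Old-style reader: any nonzero byte is still seen as `true`.
    static boolean readAsOldReader(byte[] data) {
        try {
            return new DataInputStream(new ByteArrayInputStream(data)).readBoolean();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // New-style reader: interprets the same byte as an unsigned hop count.
    static int readHops(byte[] data) {
        try {
            return new DataInputStream(new ByteArrayInputStream(data)).readUnsignedByte();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // An edit whose hop count has gone over the maximum would be dropped.
    static boolean overLimit(int hops) {
        return hops > MAX_HOPS;
    }
}
```

In this sketch a replication source would call readHops, add one, and skip the edit when overLimit returns true. A byte written by an old writer as true (value 1) reads back as a hop count of 1, which matches the "starting with 1" convention in the comment.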