[ 
https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13597711#comment-13597711
 ] 

Enis Soztutar commented on HBASE-7709:
--------------------------------------

I like option #2 better than this. It is simpler. Jeff's idea is good, but 
has the problem of dealing with topology changes. If the topology changes 
in a way that makes the normal route to a cluster longer, then all the updates 
afterwards will be dropped unless we somehow clear the cached mappings. This 
brings in an operational burden of cleaning the caches of downstream clusters 
once the admin changes the topology upstream. 
{code}
A -> B <-> C is changed to A -> B -> D -> C -> B 
{code}

Orthogonal to this, we should also be dropping the edits at the replication 
source, not the sink. Otherwise we are doubling the network cost in cyclic 
cases. #2 also helps with this, because we can detect the sink cluster's id 
and filter out the edits that the sink has already seen. 
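
A minimal sketch of source-side filtering under option #2, assuming each edit 
carries the set of cluster ids that have seen it. The names used here 
(SourceSideFilter, Edit, shouldShip, filterForSink) are hypothetical and are 
not the actual ReplicationSource API; the sketch only illustrates the check.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;

class SourceSideFilter {

    // Hypothetical stand-in for a WAL edit that records every cluster that has seen it.
    static final class Edit {
        final String payload;
        final Set<UUID> seenClusterIds;

        Edit(String payload, Set<UUID> seenClusterIds) {
            this.payload = payload;
            this.seenClusterIds = Collections.unmodifiableSet(new HashSet<>(seenClusterIds));
        }
    }

    // An edit is worth shipping only if the sink cluster has not seen it yet.
    static boolean shouldShip(Edit edit, UUID sinkClusterId) {
        return !edit.seenClusterIds.contains(sinkClusterId);
    }

    // Drops edits the sink has already seen, and stamps the local cluster id
    // on the ones that are shipped, so nothing is sent over the network in vain.
    static List<Edit> filterForSink(List<Edit> batch, UUID localClusterId, UUID sinkClusterId) {
        List<Edit> toShip = new ArrayList<>();
        for (Edit edit : batch) {
            if (!shouldShip(edit, sinkClusterId)) {
                continue; // dropped at the source, before any network cost
            }
            Set<UUID> seen = new HashSet<>(edit.seenClusterIds);
            seen.add(localClusterId);
            toShip.add(new Edit(edit.payload, seen));
        }
        return toShip;
    }
}
```

With A <-> B and C -> A, an edit written on C reaches A carrying {C}, reaches B 
carrying {C, A}, and is then dropped on B because the sink A is already in the 
set, so it never travels back to A.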

We can use a similar dynamic dictionary encoding for storing the set of 
cluster ids. We can do it as a follow-up optimization.
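
One way such a dictionary could look, as a sketch only: each cluster id is 
assigned a small integer code the first time it is seen, and edits store the 
codes instead of full 16-byte UUIDs. The class and method names are 
hypothetical, and how the dictionary would be persisted or shared between the 
writer and the reader is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

class ClusterIdDictionary {
    private final Map<UUID, Integer> codeById = new HashMap<>();
    private final List<UUID> idByCode = new ArrayList<>();

    // Returns the existing code for a cluster id, or assigns the next free one.
    int encode(UUID clusterId) {
        Integer code = codeById.get(clusterId);
        if (code == null) {
            code = idByCode.size();
            codeById.put(clusterId, code);
            idByCode.add(clusterId);
        }
        return code;
    }

    // Maps a code back to the cluster id it was assigned to.
    UUID decode(int code) {
        return idByCode.get(code);
    }

    // Encodes a set of cluster ids as a sorted array of codes.
    int[] encodeSet(Collection<UUID> clusterIds) {
        int[] codes = new int[clusterIds.size()];
        int i = 0;
        for (UUID id : clusterIds) {
            codes[i++] = encode(id);
        }
        Arrays.sort(codes);
        return codes;
    }
}
```

With only a handful of clusters in a topology, each code fits in a byte or 
two, which keeps the per-edit overhead of option #2 small.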


                
> Infinite loop possible in Master/Master replication
> ---------------------------------------------------
>
>                 Key: HBASE-7709
>                 URL: https://issues.apache.org/jira/browse/HBASE-7709
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 0.95.0, 0.94.6
>            Reporter: Lars Hofhansl
>            Assignee: Jeffrey Zhong
>             Fix For: 0.95.0, 0.94.7
>
>
> We just discovered the following scenario:
> # Clusters A and B are set up in master/master replication
> # By accident we had Cluster C replicate to Cluster A.
> Now all edits originating from C will be bouncing between A and B. Forever!
> The reason is that when the edit comes in from C, the cluster ID is already 
> set and won't be reset.
> We have a couple of options here:
> # Optionally only support master/master (not cycles of more than two 
> clusters). In that case we can always reset the cluster ID in the 
> ReplicationSource. That means that now cycles > 2 will have the data cycle 
> forever. This is the only option that requires no changes in the HLog format.
> # Instead of a single cluster id per edit, maintain an (unordered) set of 
> cluster ids that have seen this edit. Then in ReplicationSource we drop any 
> edit that the sink has seen already. This is the cleanest approach, but it 
> might need a lot of data stored per edit if there are many clusters involved.
> # Maintain a configurable counter of the maximum cycle size we want to 
> support. Could default to 10 (even maybe even just). Store a hop-count in the 
> WAL and the ReplicationSource increases that hop-count on each hop. If we're 
> over the max, just drop the edit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
