[ 
https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738999#comment-13738999
 ] 

Lars Hofhansl edited comment on HBASE-7709 at 8/13/13 11:22 PM:
----------------------------------------------------------------

Thanks all for the great work on this.

We currently have a pair of clusters in two datacenters in a master/master 
setup and want to migrate one of them to a new datacenter.  I'm trying to 
determine whether this patch will be required for us and would love it if 
someone would double-check my thinking.

Currently we have:
A -> B, B -> A

1. Set up C, create presplit tables with replication_scope enabled on them.
2. Add peer B -> C  (New state A -> B, B -> A, B -> C)
3. Copy each table from B -> C
4. Stop applications in A
5. Wait for queues from A -> B to clear
6. Remove peer A -> B (New state B -> A, B -> C)
7. Remove peer B -> A (New state B -> C)
8. Add peer C -> B (New state B -> C, C -> B)
9. Start applications in C

Since we can live with applications running in only a single datacenter for a 
period of time, we never need writes from one cluster to replicate into a 
downstream loop.  Therefore I don't think this patch is required for this 
migration.  Does that sound correct?  In other words, does the state 
(A <-> B) -> C still trigger the problem?
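
One way to sanity-check the reasoning above is a toy simulation of the peer topology after each step. This is only a sketch, not HBase code. It assumes the behaviour described in the issue: each edit carries the ID of the cluster where it was written, every cluster forwards edits it receives to all of its peers, and a source skips an edit only when the sink is the edit's origin. The function name `replicate` and the `STATES` table are made up for illustration, and the model ignores the table copy and queue draining.

```python
from collections import deque

def replicate(peers, origin, max_hops=50):
    """Simulate one edit written at `origin` under the single-cluster-ID rule.

    peers maps a source cluster to the set of its sink clusters.
    Returns (clusters that received the edit, True if it was still
    circulating after max_hops hops).
    """
    reached = set()
    queue = deque([(origin, 0)])  # (cluster holding the edit, hops so far)
    while queue:
        holder, hops = queue.popleft()
        if hops >= max_hops:
            return reached, True
        for sink in sorted(peers.get(holder, ())):
            if sink == origin:  # source skips edits that originated at the sink
                continue
            reached.add(sink)
            queue.append((sink, hops + 1))
    return reached, False

# Peer topology after each step of the plan (source -> set of sinks).
STATES = {
    "start":  {"A": {"B"}, "B": {"A"}},
    "step 2": {"A": {"B"}, "B": {"A", "C"}},
    "step 6": {"B": {"A", "C"}},
    "step 7": {"B": {"C"}},
    "step 8": {"B": {"C"}, "C": {"B"}},
}

for name, peers in STATES.items():
    for origin in "ABC":
        reached, loops = replicate(peers, origin)
        print(f"{name}: edit from {origin} reaches {sorted(reached)}, loops={loops}")
```

Under these assumptions no state in the plan keeps an edit circulating, whichever cluster it is written in, while the topology from the issue description (A <-> B plus C -> A) does. That is an argument about the model only and would still need confirmation against the actual replication code.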

Edit by LarsH: fixed formatting.
                
      was (Author: davelatham): same text as above, before the formatting fix.

> Infinite loop possible in Master/Master replication
> ---------------------------------------------------
>
>                 Key: HBASE-7709
>                 URL: https://issues.apache.org/jira/browse/HBASE-7709
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 0.94.6, 0.95.1
>            Reporter: Lars Hofhansl
>             Fix For: 0.98.0, 0.95.2, 0.94.12
>
>         Attachments: HBASE-7709.patch, HBASE-7709-rev1.patch
>
>
>  We just discovered the following scenario:
> # Clusters A and B are set up in master/master replication.
> # By accident we had Cluster C replicate to Cluster A.
> Now all edits originating from C will bounce between A and B. Forever!
> The reason is that when an edit comes in from C, the cluster ID is already 
> set and won't be reset.
> We have a couple of options here:
> # Optionally only support master/master (not cycles of more than two 
> clusters). In that case we can always reset the cluster ID in the 
> ReplicationSource. That means that now cycles > 2 will have the data cycle 
> forever. This is the only option that requires no changes in the HLog format.
> # Instead of a single cluster ID per edit, maintain an (unordered) set of 
> cluster IDs that have seen this edit. Then in ReplicationSource we drop any 
> edit that the sink has already seen. This is the cleanest approach, but it 
> might need a lot of data stored per edit if there are many clusters involved.
> # Maintain a configurable counter of the maximum cycle size we want to 
> support. Could default to 10 (even maybe even just). Store a hop-count in the 
> WAL and have the ReplicationSource increase that hop-count on each hop. If 
> we're over the max, just drop the edit.
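
The scenario and two of the proposed options can be compared with a small simulation. This is a hedged sketch, not the patch attached to the issue. All names here (`count_shipments`, `single_id`, `seen_set`, `make_hop_limit`) are hypothetical. Combining the hop limit with the existing origin check is an assumption, and the `cap` parameter exists only so the endless loop terminates in the demo.

```python
from collections import deque

# A <-> B master/master, plus the accidental C -> A peer.
PEERS = {"A": {"B"}, "B": {"A"}, "C": {"A"}}

def count_shipments(peers, origin, should_ship, cap=100):
    """Count how often one edit is shipped; stop at `cap` to expose endless loops.

    Each in-flight copy is (holder, seen, hops): the cluster holding it, the
    clusters that have already applied it, and the hops taken so far.
    """
    shipments = 0
    queue = deque([(origin, frozenset([origin]), 0)])
    while queue and shipments < cap:
        holder, seen, hops = queue.popleft()
        for sink in sorted(peers.get(holder, ())):
            if should_ship(origin, sink, seen, hops):
                shipments += 1
                queue.append((sink, seen | {sink}, hops + 1))
    return shipments

def single_id(origin, sink, seen, hops):
    # Current behaviour as described: only the originating cluster is skipped.
    return sink != origin

def seen_set(origin, sink, seen, hops):
    # Option 2: drop the edit if the sink has already seen it.
    return sink not in seen

def make_hop_limit(max_hops):
    # Option 3 (assumed to keep the origin check): drop once max_hops is reached.
    def rule(origin, sink, seen, hops):
        return sink != origin and hops < max_hops
    return rule

print(count_shipments(PEERS, "C", single_id))           # hits the cap: never stops
print(count_shipments(PEERS, "C", seen_set))            # C -> A, A -> B, then dropped
print(count_shipments(PEERS, "C", make_hop_limit(10)))  # stops after 10 hops
```

In this model the set-based rule stops after two shipments, while the hop limit only bounds the damage: the edit still bounces between A and B until the counter runs out.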

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
