[ 
https://issues.apache.org/jira/browse/CASSANDRA-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867008#comment-13867008
 ] 

sankalp kohli commented on CASSANDRA-6440:
------------------------------------------

I think I did not explain it well or I am confused :)
The problem is this: say you have 3 nodes in a cluster (A, B and C) with RF=2, 
and you trigger repair on B. It will repair 2 ranges. 
In one range, the replicas will be A and B, and in the other, B and C. The 
getNeighbors method in AES will be called twice, once per range. 
So if an operator triggers the repair like this, it will not work for both 
ranges, since their neighbors are different: 
nodetool repair -hosts A,B,C keyspace 

The same problem will occur if someone runs a repair on all keyspaces and they 
have different RFs. The neighbors will not be the same, so whatever you provide 
in -hosts will be wrong. 
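
To make the 3-node example concrete, here is a minimal sketch in Python. It is 
not Cassandra code: the ring model (each range lives on its owner plus the next 
RF-1 nodes clockwise) and the function names replicas_by_range, 
neighbors_per_range and unusable_hosts are assumptions made up for 
illustration, and the "every requested host must be a neighbor" check is a 
hypothetical strict validation, not necessarily what the patch does.

```python
from typing import Dict, List, Set


def replicas_by_range(ring: List[str], rf: int) -> Dict[str, List[str]]:
    """Map each range (named after its owning node) to its replicas.

    Assumption: simple placement, i.e. the owner plus the next rf-1 nodes
    clockwise on the ring.
    """
    n = len(ring)
    return {ring[i]: [ring[(i + k) % n] for k in range(rf)] for i in range(n)}


def neighbors_per_range(ring: List[str], rf: int, node: str) -> Dict[str, Set[str]]:
    """For every range that `node` replicates, return the other replicas."""
    return {
        owner: set(replicas) - {node}
        for owner, replicas in replicas_by_range(ring, rf).items()
        if node in replicas
    }


def unusable_hosts(requested: Set[str], node: str, neighbors: Set[str]) -> Set[str]:
    """Hosts from a -hosts style list that are not replicas of this range."""
    return requested - neighbors - {node}


if __name__ == "__main__":
    ring = ["A", "B", "C"]
    requested = {"A", "B", "C"}  # as in: nodetool repair -hosts A,B,C keyspace
    for owner, neighbors in neighbors_per_range(ring, 2, "B").items():
        bad = unusable_hosts(requested, "B", neighbors)
        print(f"range owned by {owner}: neighbors={sorted(neighbors)}, "
              f"not a neighbor={sorted(bad)}")
```

In this model, repairing B covers two ranges with different neighbor sets, so 
the single host list A,B,C contains a non-neighbor for each of them. Changing 
the RF changes the neighbor sets as well, which is the multi-keyspace case.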

One way to fix this might be to do the checks well before the getNeighbors 
method in AES. 
Another way is to support -hosts only if you are repairing a single range, and 
only on keyspaces with the same RF. 
   

> Repair should allow repairing particular endpoints to reduce WAN usage. 
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6440
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6440
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: sankalp kohli
>            Assignee: sankalp kohli
>            Priority: Minor
>         Attachments: 6440_repair.log, JIRA-6440-v2.diff, JIRA-6440.diff
>
>
> The way we send out data that does not match over the WAN can be improved. 
> Example: Say there are four nodes (A, B, C, D) which are replicas of a range 
> we are repairing. A and B are in DC1, and C and D are in DC2. If A does not 
> have the data which the other replicas have, then we will have the following 
> streams:
> 1) A to B and back
> 2) A to C and back (goes over WAN)
> 3) A to D and back (goes over WAN)
> One way to reduce the WAN traffic is this:
> 1) Repair A and B only with each other, and C and D only with each other, 
> starting at the same time t. 
> 2) Once these repairs have finished, A,B and C,D are in sync with respect to 
> time t. 
> 3) Now run a repair between A and C. The streams which are exchanged as a 
> result of the diff will also be streamed to B and D via A and C (A and C 
> behave like proxies for the streams).
> For a replication of DC1:2,DC2:2, the WAN traffic will be reduced by 50%, and 
> by even more for higher replication factors.
> Another easy way to do this is to have the repair command take the nodes 
> which you want to repair with. Then we can do something like this:
> 1) Run repair between (A and B) and (C and D)
> 2) Run repair between (A and C)
> 3) Run repair between (A and B) and (C and D)
> But this will increase the traffic inside the DC, as we won't be proxying.
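
The 50% figure in the quoted description can be checked with a rough count. 
This is only a sketch under a simplified model, assumed here for illustration: 
A is the only out-of-sync node, every stream exchange costs the same, and only 
the exchanges leaving A for the remote DC are counted. The function names 
wan_streams and wan_reduction are hypothetical.

```python
def wan_streams(remote_replicas: int, proxy: bool) -> int:
    """Count WAN stream exchanges between node A and the remote DC.

    Without proxying, A exchanges streams with every remote replica.
    With proxying, A talks to a single remote node, which forwards the
    streams to the other replicas inside its own DC.
    """
    if remote_replicas < 1:
        raise ValueError("need at least one remote replica")
    return 1 if proxy else remote_replicas


def wan_reduction(remote_replicas: int) -> float:
    """Fraction of WAN exchanges saved by proxying, in this model."""
    direct = wan_streams(remote_replicas, proxy=False)
    proxied = wan_streams(remote_replicas, proxy=True)
    return 1 - proxied / direct


if __name__ == "__main__":
    for replicas in (2, 3, 4):
        print(f"remote replicas={replicas}: saved {wan_reduction(replicas):.0%}")
```

With two replicas in the remote DC (DC1:2,DC2:2) the model gives one WAN 
exchange instead of two, and the saving grows as the remote replica count 
goes up.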



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
