[ 
https://issues.apache.org/jira/browse/CASSANDRA-9876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415478#comment-15415478
 ] 

Paulo Motta commented on CASSANDRA-9876:
----------------------------------------

Thanks for the follow-up. Updated patch and dtests LGTM.

bq. The reason why I added in the check for a token range was that the repair 
code as it is now doesn’t actually add only the common ranges between the 
specified hosts. I wasn’t sure if this was the intended behavior or a bug.

You're right, thanks for pointing this out. I had the {{-pr}} option in 
mind, but it seems it has not been possible to combine {{-pr}} and {{-hosts}} 
since CASSANDRA-7317. In fact, this limitation was discussed on the 
parent ticket CASSANDRA-6440, and it appears to be the expected behavior.

bq. If this is intended behavior, then forcing the user to specify a token 
range that is common between the nodes prevents that exception from being 
thrown. Otherwise the error message, “Repair requires at least two endpoints 
that are neighbours before it can continue” can be confusing to the operator 
since the two specified nodes may actually share a common range.

Agreed. In any case, I updated the error message to the following to make it 
clearer when {{--pull}} is not specified:
{noformat}
Specified hosts [127.0.0.2, 127.0.0.1] do not share range 
(-3074457345618258503,3074457345618258602] needed for repair. Either restrict 
repair ranges with -st/-et options, or specify one of the neighbors that share 
this range with this node: [/127.0.0.3, /127.0.0.4, /127.0.0.6].
{noformat}
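
For illustration only, here is a minimal sketch of how a check like this could build the message above. It is not the code from the patch: the class {{HostRangeCheck}}, the method {{check}}, and its parameters are hypothetical names, and hosts are modelled as plain strings rather than Cassandra's endpoint types.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper, not part of Cassandra: names and types are assumptions.
public final class HostRangeCheck {
    private HostRangeCheck() {}

    /**
     * Returns an error message if none of the specified hosts (other than the
     * local node) is a neighbor for the given range, or empty if the repair
     * of that range can proceed.
     */
    public static Optional<String> check(String range,
                                         Set<String> neighbors,
                                         Collection<String> specifiedHosts,
                                         String localHost) {
        // Keep only the specified hosts that replicate this range with the local node.
        Set<String> usable = new LinkedHashSet<>(specifiedHosts);
        usable.remove(localHost);
        usable.retainAll(neighbors);

        if (!usable.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(String.format(
            "Specified hosts %s do not share range %s needed for repair. "
            + "Either restrict repair ranges with -st/-et options, or specify one of "
            + "the neighbors that share this range with this node: %s.",
            specifiedHosts, range, neighbors));
    }
}
```

Naming both the offending range and the valid neighbors in the message is what lets the operator choose between narrowing the range and picking a different host.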

When trying to reproduce this, I noticed two minor problems with the repair 
command, so I included two ninja commits to fix them (could you have a look?):

1. When there is an exception while running repair, the 
[RepairRunner|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/tools/RepairRunner.java#L108]
 prints a {{\[2016-08-10 09:16:41,291\] null}} message after the actual error 
message due to {{RepairRunnable}} not including any message in the {{COMPLETE}} 
event on 
[fireErrorAndComplete|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/repair/RepairRunnable.java#L106],
 so I added a {{Repair command #x finished with error}} message to avoid null 
being printed when there is an error during repair.
2. Currently the {{\-\-dc}} and {{\-\-hosts}} options are mutually exclusive on 
[ActiveRepairService.getNeighbors|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/ActiveRepairService.java#L226],
 but if you specify them together the {{--hosts}} option is silently ignored, 
so I added a minor check to avoid combining these two options.
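
As a rough sketch of the second fix, a guard like the following fails fast instead of silently ignoring one of the two options. The class {{RepairOptionCheck}}, the method {{validate}}, and the exception text are assumptions made for illustration, not the code in the branch.

```java
import java.util.Collection;

// Hypothetical helper, not part of Cassandra: names and message are assumptions.
public final class RepairOptionCheck {
    private RepairOptionCheck() {}

    /**
     * Rejects a repair request that names both data centers and hosts,
     * rather than letting one filter silently override the other.
     */
    public static void validate(Collection<String> dataCenters, Collection<String> hosts) {
        if (!dataCenters.isEmpty() && !hosts.isEmpty()) {
            throw new IllegalArgumentException(
                "Cannot combine data center and host filters in the same repair.");
        }
    }
}
```

Throwing before any neighbor computation means the operator sees the conflict immediately instead of getting a repair over an unexpected set of nodes.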

The updated branch and CI submission links are below:

||trunk||
|[branch|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:trunk-9876]|
|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-9876-testall/lastCompletedBuild/testReport/]|
|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-9876-dtest/lastCompletedBuild/testReport/]|

Once the CI results look good and you have verified the additional changes, I 
will mark this as ready to commit. Can you open a dtest pull request to 
https://github.com/riptano/cassandra-dtest ?

> One way targeted repair
> -----------------------
>
>                 Key: CASSANDRA-9876
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9876
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: sankalp kohli
>            Assignee: Geoffrey Yu
>            Priority: Minor
>             Fix For: 3.x
>
>         Attachments: 9876-dtest-master.txt, 9876-trunk-v2.txt, 9876-trunk.txt
>
>
> Many applications use C* by writing to one local DC. The other DC is used 
> when the local DC is unavailable. When the local DC becomes available, we 
> want to run a targeted repair b/w one endpoint from each DC to minimize the 
> data transfer over WAN.  In this case, it will be helpful to do a one way 
> repair in which data will only be streamed from other DC to local DC instead 
> of streaming the data both ways. This will further minimize the traffic over 
> WAN. This feature should only be supported if a targeted repair is run 
> involving 2 hosts.


