[ https://issues.apache.org/jira/browse/CASSANDRA-9876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415478#comment-15415478 ]
Paulo Motta commented on CASSANDRA-9876:
----------------------------------------

Thanks for the follow-up. Updated patch and dtests LGTM.

bq. The reason why I added in the check for a token range was that the repair code as it is now doesn’t actually add only the common ranges between the specified hosts. I wasn’t sure if this was the intended behavior or a bug.

You're right, thanks for pointing this out. I had the {{-pr}} option in mind, but it seems it has not been possible to combine {{-pr}} and {{-hosts}} since CASSANDRA-7317. As a matter of fact, this limitation was discussed on the parent ticket CASSANDRA-6440, and it seems to be the expected behavior.

bq. If this is intended behavior, then forcing the user to specify a token range that is common between the nodes prevents that exception from being thrown. Otherwise the error message, “Repair requires at least two endpoints that are neighbours before it can continue” can be confusing to the operator since the two specified nodes may actually share a common range.

Agreed. In any case, I updated the error message to the following to make it clearer when {{--pull}} is not specified:
{noformat}
Specified hosts [127.0.0.2, 127.0.0.1] do not share range (-3074457345618258503,3074457345618258602] needed for repair. Either restrict repair ranges with -st/-et options, or specify one of the neighbors that share this range with this node: [/127.0.0.3, /127.0.0.4, /127.0.0.6].
{noformat}

When trying to reproduce this, I noticed two minor problems with the repair command, so I included 2 ninja commits to fix them (could you have a look?):
1. When there is an exception while running repair, the [RepairRunner|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/tools/RepairRunner.java#L108] prints a {{\[2016-08-10 09:16:41,291\] null}} message after the actual error message, because {{RepairRunnable}} does not include any message in the {{COMPLETE}} event on [fireErrorAndComplete|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/repair/RepairRunnable.java#L106]. I added a {{Repair command #x finished with error}} message to avoid null being printed when there is an error during repair.
2. Currently the {{\-\-dc}} and {{\-\-hosts}} options are mutually exclusive on [ActiveRepairService.getNeighbors|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/ActiveRepairService.java#L226], but if you specify them together the {{--hosts}} option is silently ignored, so I added a minor check to prevent combining these two options.

Updated branch and CI submission links are below:
||trunk||
|[branch|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:trunk-9876]|
|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-9876-testall/lastCompletedBuild/testReport/]|
|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-9876-dtest/lastCompletedBuild/testReport/]|

Once CI results look good and you have verified the additional changes, I will mark this as ready to commit. Can you open a dtest pull request to https://github.com/riptano/cassandra-dtest ?

> One way targeted repair
> -----------------------
>
>                 Key: CASSANDRA-9876
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9876
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: sankalp kohli
>            Assignee: Geoffrey Yu
>            Priority: Minor
>             Fix For: 3.x
>
>         Attachments: 9876-dtest-master.txt, 9876-trunk-v2.txt, 9876-trunk.txt
>
>
> Many applications use C* by writing to one local DC.
> The other DC is used
> when the local DC is unavailable. When the local DC becomes available, we
> want to run a targeted repair b/w one endpoint from each DC to minimize the
> data transfer over WAN. In this case, it will be helpful to do a one way
> repair in which data will only be streamed from other DC to local DC instead
> of streaming the data both ways. This will further minimize the traffic over
> WAN. This feature should only be supported if a targeted repair is run
> involving 2 hosts.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)