[jira] [Commented] (CASSANDRA-3200) Repair: compare all trees together (for a given range/cf) instead of by pair in isolation
    [ https://issues.apache.org/jira/browse/CASSANDRA-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609614#comment-16609614 ]

Marcus Eriksson commented on CASSANDRA-3200:
--------------------------------------------

While reviewing CASSANDRA-14693 I realised that the dtests for this were never committed. Could you have a quick look, [~bdeggleston]?

dtest branch: https://github.com/krummas/cassandra-dtest/commits/marcuse/3200
circle run: https://circleci.com/gh/krummas/cassandra/tree/marcuse%2Ffor_3200_dtests

> Repair: compare all trees together (for a given range/cf) instead of by pair in isolation
> -----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3200
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3200
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Assignee: Marcus Eriksson
>            Priority: Minor
>              Labels: repair
>             Fix For: 4.0
>
> Currently, repair compares merkle trees by pair, in isolation from any other tree. Concretely, if I have three nodes A, B and C (RF=3) with A and B in sync, but C having some range r inconsistent with both A and B (which are consistent with each other), we will do the following transfers of r: A -> C, C -> A, B -> C, C -> B.
> The fact that we do both A -> C and C -> A is fine, because we cannot know which of A or C is more up to date. However, the transfer B -> C is useless provided we do A -> C, if A and B are in sync. Not doing that transfer will be a 25% improvement in that case. With RF=5 and only one node inconsistent with all the others, that's almost a 40% improvement, etc.
> Given that this situation of one node being out of sync while the others are in sync is probably fairly common (one node died so it is behind), this could be a fair improvement in what is transferred. In the case where we use repair to completely rebuild a node, this will be a dramatic improvement, because it will avoid the rebuilt node getting RF times the data it should get.
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
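The percentages in the issue description above can be checked with a few lines of arithmetic. The sketch below is not Cassandra code; the class and method names are made up for illustration, and it assumes exactly one replica is out of sync for range r while all other replicas agree with each other.

```java
// Hypothetical sketch, not Cassandra code: counts transfers of a range r when
// exactly one replica is out of sync and all the other replicas agree.
final class RepairTransferCount {
    // Pairwise comparison: the stale node differs from each of the (rf - 1)
    // in-sync nodes, and every differing pair streams in both directions.
    static int pairwise(int rf) {
        return 2 * (rf - 1);
    }

    // All trees compared together: the stale node receives r from only one
    // in-sync node, but still sends its copy to each of the (rf - 1) others.
    static int allTogether(int rf) {
        return 1 + (rf - 1);
    }

    // Fraction of transfers avoided by comparing all trees together.
    static double savedFraction(int rf) {
        return (pairwise(rf) - allTogether(rf)) / (double) pairwise(rf);
    }

    public static void main(String[] args) {
        for (int rf : new int[] {3, 5}) {
            System.out.println("RF=" + rf + ": " + pairwise(rf) + " vs "
                               + allTogether(rf) + " transfers, saved fraction "
                               + savedFraction(rf));
        }
    }
}
```

Under these assumptions RF=3 gives 4 versus 3 transfers (25% fewer), and RF=5 gives 8 versus 5 (37.5%, the "almost 40%" in the description).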
[jira] [Commented] (CASSANDRA-3200) Repair: compare all trees together (for a given range/cf) instead of by pair in isolation
    [ https://issues.apache.org/jira/browse/CASSANDRA-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16281801#comment-16281801 ]

Marcus Eriksson commented on CASSANDRA-3200:
--------------------------------------------

the tests keep dying, but I've rerun the suspicious ones from the successful run above locally and they pass without problems. I'll get this committed.

> Repair: compare all trees together (for a given range/cf) instead of by pair in isolation
> -----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3200
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3200
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Assignee: Marcus Eriksson
>            Priority: Minor
>              Labels: repair
>             Fix For: 4.x
[jira] [Commented] (CASSANDRA-3200) Repair: compare all trees together (for a given range/cf) instead of by pair in isolation
    [ https://issues.apache.org/jira/browse/CASSANDRA-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16273511#comment-16273511 ]

Blake Eggleston commented on CASSANDRA-3200:
--------------------------------------------

The last test run seems to have died. I restarted it [here|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/448/]. Assuming there aren't any related failures, I'm +1.

> Repair: compare all trees together (for a given range/cf) instead of by pair in isolation
> -----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3200
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3200
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Assignee: Marcus Eriksson
>            Priority: Minor
>              Labels: repair
>             Fix For: 4.x
[jira] [Commented] (CASSANDRA-3200) Repair: compare all trees together (for a given range/cf) instead of by pair in isolation
    [ https://issues.apache.org/jira/browse/CASSANDRA-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16270669#comment-16270669 ]

Marcus Eriksson commented on CASSANDRA-3200:
--------------------------------------------

pushed another commit with the review fixes here: https://github.com/krummas/cassandra/commits/marcuse/CASSANDRA-3200

tests ran here: https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/440/ - looks like one of the new tests failed but it passes locally and it looks like an environment issue. Rerunning [here|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/441/] to make sure

dtest branch is: https://github.com/krummas/cassandra-dtest/commits/marcuse/mt_calcs

> Repair: compare all trees together (for a given range/cf) instead of by pair in isolation
> -----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3200
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3200
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Assignee: Marcus Eriksson
>            Priority: Minor
>              Labels: repair
>             Fix For: 4.x
[jira] [Commented] (CASSANDRA-3200) Repair: compare all trees together (for a given range/cf) instead of by pair in isolation
    [ https://issues.apache.org/jira/browse/CASSANDRA-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16202751#comment-16202751 ]

Blake Eggleston commented on CASSANDRA-3200:
--------------------------------------------

bq. yeah I agree it duplicates a lot of code, but they also do different things - the asymmetric ones don't need the merkle trees for example since we compare everything outside of this class now. Let me know if you see a straight-forward way to do it. I'll try to break out the common code in a separate class. Hopefully the non-symmetric classes can be removed once we have confidence this works as well.

Good point, I wasn’t paying attention to the stuff going on in their respective base classes.

bq. indentation looks good to me and they look good on github or am I misunderstanding you?

The formatting of the matrices looks good, they just look weird starting at column 0 when the rest of the method / comment is indented 8 spaces. iow, something like this:
{code}
        /*
        ... something ...
              A   B   C   D   E
          A       =   x   x   x
          B           x   x   x
          C               x   x
          D                   =
        */
{code}

Second round of review:

Everything looks good for the most part, and your optimization / stream reduction stuff makes sense. There are just a few minor things:

HostDifferences:
* {{hasDifferencesFor}} isEmpty check is unnecessary

ReducedDifferenceHolder
* Probably don’t need this class, ImmutableMap should be fine

RepairOption
* default for optimizeStreams seems to be false, but javadoc says it’s true

AsymmetricLocalSyncTask
* uncomment or remove logger info statement at line 95

AsymmetricSyncTask
* startTime is compared to Long.MIN_VALUE in {{finished}}, but it's never initialized to that value. Unless I’m mistaken, long fields that aren’t explicitly initialized to some value become 0 by default, so that branch in {{finished}} will always run, even if {{run}} wasn’t called.

> Repair: compare all trees together (for a given range/cf) instead of by pair in isolation
> -----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3200
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3200
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Assignee: Marcus Eriksson
>            Priority: Minor
>              Labels: repair
>             Fix For: 4.x
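The {{startTime}} remark in the review above rests on a Java language rule: a long field without an initializer starts at 0. The sketch below illustrates that rule with two hypothetical stand-in classes; it is not the actual AsymmetricSyncTask code, and it assumes the check has the form {{startTime != Long.MIN_VALUE}}.

```java
// Hypothetical stand-ins for the pattern described in the review; this is not
// the actual AsymmetricSyncTask code.
final class StartTimeDefaultDemo {
    static final class UninitializedTask {
        private long startTime; // no initializer: defaults to 0, not Long.MIN_VALUE

        boolean looksStarted() {
            return startTime != Long.MIN_VALUE; // true even though nothing ran
        }
    }

    static final class InitializedTask {
        private long startTime = Long.MIN_VALUE; // explicit "not started" sentinel

        void run() {
            startTime = System.currentTimeMillis();
        }

        boolean looksStarted() {
            return startTime != Long.MIN_VALUE;
        }
    }

    public static void main(String[] args) {
        System.out.println(new UninitializedTask().looksStarted()); // true
        InitializedTask task = new InitializedTask();
        System.out.println(task.looksStarted()); // false
        task.run();
        System.out.println(task.looksStarted()); // true
    }
}
```

Initializing the field to the sentinel is one way to make the check meaningful; whether the real class should do that or drop the check is for the patch author to decide.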
[jira] [Commented] (CASSANDRA-3200) Repair: compare all trees together (for a given range/cf) instead of by pair in isolation
    [ https://issues.apache.org/jira/browse/CASSANDRA-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16192808#comment-16192808 ]

Marcus Eriksson commented on CASSANDRA-3200:
--------------------------------------------

haha yeah that was horribly unreadable coming back to it, sorry about that. Refactored version is up [here|https://github.com/krummas/cassandra/commits/marcuse/CASSANDRA-3200], hopefully it makes more sense now.

bq. User facing:
bq. symmetric/asymmetric nodetool naming option is ambiguous, not sure what a better name would be, maybe something about reducing or optimizing streams?

made it {{--optimise-streams}} (or {{-os}})

bq. should be off by default

done

bq. AsymmetricSyncRequest/SyncTasks:
bq. Could we just add a one-way flag to the existing requests / tasks? The new asymmetric classes duplicate most of the symmetric tasks code (I think). In the case of local sync task, the pullRepair flag is basically doing this already.

yeah I agree it duplicates a lot of code, but they also do different things - the asymmetric ones don't need the merkle trees for example since we compare everything outside of this class now. Let me know if you see a straight-forward way to do it. I'll try to break out the common code in a separate class. Hopefully the non-symmetric classes can be removed once we have confidence this works as well.

bq. IncomingRepairStreamTracker
bq. fixing the container thing as mentioned above may fix this, but it’s difficult to figure out how this works. A top level java doc explaining how the duplicate streams are identified and reduced would be nice.

hopefully the refactor makes this clearer, the class is now tiny and only actually tracks the incoming streams for each node

bq. The class name doesn’t seem appropriate. Not all the streams are incoming, and it’s not tracking any continuous processes. Maybe RepairStreamReducer or RepairStreamOptimizer?

Moved the reduce logic to {{ReduceHelper}}

bq. Should be in the repair package.

done

bq. IncomingRepairStreamTrackerTest
bq. Should throw exception instead of printing stack trace in static block

done

bq. Fix indentation of matrices in test comments

indentation looks good to me and they look good on [github|https://github.com/krummas/cassandra/commit/f1cfc3af5206bd3f804bcf06d19b95e390466caa#diff-ec9d6471b46f4098dbce446cdbd25828R85] or am I misunderstanding you?

bq. The content of the `differences` map, as set up in testSimpleReducing doesn’t make sense to me, why would node C be in node A’s map, but not vice versa?

this is the way we compare trees - we compare A to C, but not the other way, and it makes no sense adding the C -> A difference since we don't need it for the calculations

bq. I think it would be clearer to alias the contents of addresses 0-4 to static variables like A, B, C, etc. Parsing out the array indices when reading through the tests is difficult to follow.

done

> Repair: compare all trees together (for a given range/cf) instead of by pair in isolation
> -----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3200
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3200
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Assignee: Marcus Eriksson
>            Priority: Minor
>              Labels: repair
>             Fix For: 4.x
[jira] [Commented] (CASSANDRA-3200) Repair: compare all trees together (for a given range/cf) instead of by pair in isolation
    [ https://issues.apache.org/jira/browse/CASSANDRA-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16190560#comment-16190560 ]

Blake Eggleston commented on CASSANDRA-3200:
--------------------------------------------

First round of review

First, I think if there’s going to be a widely used data structure that has more than 1-2 levels of nested containers, it’s time to make some (simple) dedicated classes. For instance, IncomingRepairStreamTracker consumes and operates on {{Map>>}}. What each part of this structure represents, and the intended effect of each collection method call, is not clear. Same sort of thing with {{Map >}}. Rolling these structures into classes, as well as putting the raw container manipulation behind more meaningfully named methods, will make this patch much easier to understand. It will also allow you to test your container manipulation logic and actual algorithm logic separately.

Some more specific stuff:

User facing:
* symmetric/asymmetric nodetool naming option is ambiguous, not sure what a better name would be, maybe something about reducing or optimizing streams?
* should be off by default

AsymmetricSyncRequest/SyncTasks:
* Could we just add a one-way flag to the existing requests / tasks? The new asymmetric classes duplicate most of the symmetric tasks code (I think). In the case of local sync task, the pullRepair flag is basically doing this already.

IncomingRepairStreamTracker
* fixing the container thing as mentioned above may fix this, but it’s difficult to figure out how this works. A top level java doc explaining how the duplicate streams are identified and reduced would be nice.
* The class name doesn’t seem appropriate. Not all the streams are incoming, and it’s not tracking any continuous processes. Maybe RepairStreamReducer or RepairStreamOptimizer?
* Should be in the repair package.

IncomingRepairStreamTrackerTest
* Should throw exception instead of printing stack trace in static block
* Fix indentation of matrices in test comments
* The content of the `differences` map, as set up in testSimpleReducing doesn’t make sense to me, why would node C be in node A’s map, but not vice versa?
* I think it would be clearer to alias the contents of addresses 0-4 to static variables like A, B, C, etc. Parsing out the array indices when reading through the tests is difficult to follow.

> Repair: compare all trees together (for a given range/cf) instead of by pair in isolation
> -----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3200
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3200
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Assignee: Marcus Eriksson
>            Priority: Minor
>              Labels: repair
>             Fix For: 4.x
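The first review point (rolling nested containers into a small dedicated class with meaningfully named methods) might look roughly like the sketch below. This is only an illustration of the idea, not the class from the patch: the name PairwiseDifferences, its methods, and the use of plain strings for nodes and ranges are all assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: wraps a nested map (node -> other node -> differing
// ranges) behind named methods. Nodes and ranges are plain strings here.
final class PairwiseDifferences {
    private final Map<String, Map<String, List<String>>> perNode = new HashMap<>();

    // Records that 'node' and 'other' disagree on 'range'. Each pair is stored
    // once, under the node passed first (a one-directional comparison).
    void add(String node, String other, String range) {
        perNode.computeIfAbsent(node, k -> new HashMap<>())
               .computeIfAbsent(other, k -> new ArrayList<>())
               .add(range);
    }

    // Ranges recorded for (node, other) in that direction; empty if none.
    List<String> differingRanges(String node, String other) {
        return perNode.getOrDefault(node, Collections.emptyMap())
                      .getOrDefault(other, Collections.emptyList());
    }

    // True if the pair disagrees on anything, whichever way it was stored.
    boolean differ(String a, String b) {
        return !differingRanges(a, b).isEmpty() || !differingRanges(b, a).isEmpty();
    }
}
```

With a wrapper like this, the container handling can be unit-tested on its own, separately from the stream-reduction algorithm that consumes it.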
[jira] [Commented] (CASSANDRA-3200) Repair: compare all trees together (for a given range/cf) instead of by pair in isolation
    [ https://issues.apache.org/jira/browse/CASSANDRA-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16182268#comment-16182268 ]

Marcus Eriksson commented on CASSANDRA-3200:
--------------------------------------------

rebased and fixed a few forgotten todos, also removed the pointless generics in IncomingRepairStreamTracker

> Repair: compare all trees together (for a given range/cf) instead of by pair in isolation
> -----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3200
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3200
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Assignee: Marcus Eriksson
>            Priority: Minor
>              Labels: repair
>             Fix For: 4.x
[jira] [Commented] (CASSANDRA-3200) Repair: compare all trees together (for a given range/cf) instead of by pair in isolation
    [ https://issues.apache.org/jira/browse/CASSANDRA-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16014821#comment-16014821 ]

sankalp kohli commented on CASSANDRA-3200:
------------------------------------------

I think this will help a lot if you have many replicas. Reopening to see if we can work on it.

> Repair: compare all trees together (for a given range/cf) instead of by pair in isolation
> -----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3200
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3200
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Priority: Minor
>              Labels: repair
[jira] [Commented] (CASSANDRA-3200) Repair: compare all trees together (for a given range/cf) instead of by pair in isolation
    [ https://issues.apache.org/jira/browse/CASSANDRA-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756479#comment-13756479 ]

Sylvain Lebresne commented on CASSANDRA-3200:
---------------------------------------------

Now that I think about it, last time I looked at this seriously I intended to do it at the hash level, which would be ideal. I.e., compare all leaves together so you can say things like: A and B agree on this sub-range but C doesn't, while on that other sub-range it's C and B that agree but not A.

A simpler solution could be to do it at the tree level. That is, just do a first pass identifying which nodes fully agree on their tree, and when that's the case cut down the number of streaming jobs to do. Namely, if A and B are fully in sync, then there is no point in doing both A-C and B-C. This might already save a bunch of transfers for a relatively reasonable effort, though I suspect that on heavily updated CFs it will be rare for two trees to fully agree, due to timing issues.

> Repair: compare all trees together (for a given range/cf) instead of by pair in isolation
> -----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3200
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3200
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>            Priority: Minor
>              Labels: repair

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
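The tree-level idea in the comment above (first find nodes whose trees fully agree, then avoid redundant streaming between groups) can be sketched as follows. This is a simplified illustration, not the eventual implementation: each tree is reduced to a single root hash, so only full agreement is detected, and the class name, method name and string-based stream format are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a tree-level stream plan for one range/cf.
final class TreeLevelStreamPlan {
    // nodeToRootHash: node name -> root hash of that node's merkle tree.
    // Returns "source -> target" streams: each node receives from exactly one
    // representative of every group of nodes whose tree differs from its own.
    static List<String> plan(Map<String, String> nodeToRootHash) {
        // First pass: group nodes that fully agree (same root hash).
        Map<String, List<String>> groups = new LinkedHashMap<>();
        nodeToRootHash.forEach((node, hash) ->
            groups.computeIfAbsent(hash, h -> new ArrayList<>()).add(node));

        // Second pass: one incoming stream per (target node, foreign group).
        List<String> streams = new ArrayList<>();
        for (Map.Entry<String, String> target : nodeToRootHash.entrySet()) {
            for (Map.Entry<String, List<String>> group : groups.entrySet()) {
                if (!group.getKey().equals(target.getValue())) {
                    String representative = group.getValue().get(0);
                    streams.add(representative + " -> " + target.getKey());
                }
            }
        }
        return streams;
    }
}
```

For A and B in sync and C differing, this yields the three streams A -> C, C -> A and C -> B, and skips B -> C, matching the example in the issue description.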
[jira] [Commented] (CASSANDRA-3200) Repair: compare all trees together (for a given range/cf) instead of by pair in isolation
    [ https://issues.apache.org/jira/browse/CASSANDRA-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13104361#comment-13104361 ]

Sylvain Lebresne commented on CASSANDRA-3200:
---------------------------------------------

Yes, having a not-bulky/continuous/incremental/ponies-powered repair would be nice. It's worth looking into it and I'm not even saying I won't help with that. That being said, I've heard a number of ideas on that (including the discussion on CASSANDRA-2699) and I have yet to be fully convinced by one of those ideas. I do think it's not a simple problem. So until proved otherwise, the ETA for CASSANDRA-2699 is unknown and unlikely to be in the very near future.

In the meantime, repair is there and used by people. Besides, while I understand that the past suckiness of the repair process may push one to think that we should throw everything away and use something completely new, I think it would be wise to first ask ourselves if we can't improve/build on what we have to make it good enough first. In particular, repair is already able to work on any token range. It would be relatively easy, for instance, to run more repairs on smaller ranges. That, plus the fact that both (validation) compaction and streaming can now be throttled, could make repair much less bulky at very little cost (in development time/new bugs potentially added).

And to get back to the issue at hand, it's actually not a complicated patch (given how repair works nowadays) and a very isolated one in what it will touch, so I see no reason why it wouldn't make it during the 1.0 series, while any potential replacement solution is almost guaranteed to not make it before 1.1 *at best*.

> Repair: compare all trees together (for a given range/cf) instead of by pair in isolation
> -----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3200
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3200
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>            Priority: Minor
>              Labels: repair
>             Fix For: 1.0.1
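The suggestion above to run more repairs on smaller ranges comes down to cutting a token range into contiguous pieces. The sketch below shows only that arithmetic; it is not Cassandra code, RangeSplitter and split are made-up names, and it assumes a non-wrapping range (start, end] with start < end.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: splits a token range into smaller sub-ranges that
// could each be repaired separately.
final class RangeSplitter {
    // Splits (start, end] into 'parts' contiguous sub-ranges, returned as
    // {start, end} pairs. Assumes start < end (no wrap-around) and parts >= 1.
    // If the range is narrower than 'parts', some sub-ranges will be empty.
    static List<BigInteger[]> split(BigInteger start, BigInteger end, int parts) {
        BigInteger width = end.subtract(start);
        BigInteger count = BigInteger.valueOf(parts);
        List<BigInteger[]> result = new ArrayList<>();
        BigInteger left = start;
        for (int i = 1; i <= parts; i++) {
            // The last boundary is pinned to 'end' so nothing is lost to rounding.
            BigInteger right = (i == parts)
                ? end
                : start.add(width.multiply(BigInteger.valueOf(i)).divide(count));
            result.add(new BigInteger[] {left, right});
            left = right;
        }
        return result;
    }
}
```

Each sub-range ends where the next begins, so repairing all of them covers the original range exactly once.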
[jira] [Commented] (CASSANDRA-3200) Repair: compare all trees together (for a given range/cf) instead of by pair in isolation
    [ https://issues.apache.org/jira/browse/CASSANDRA-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13104436#comment-13104436 ]

Jonathan Ellis commented on CASSANDRA-3200:
-------------------------------------------

bq. it's actually not a complicated patch

Really? Doesn't it require a lot more coordination between replicas?

> Repair: compare all trees together (for a given range/cf) instead of by pair in isolation
> -----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3200
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3200
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>            Priority: Minor
>              Labels: repair
>             Fix For: 1.0.1
[jira] [Commented] (CASSANDRA-3200) Repair: compare all trees together (for a given range/cf) instead of by pair in isolation
    [ https://issues.apache.org/jira/browse/CASSANDRA-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1310#comment-1310 ]

Sylvain Lebresne commented on CASSANDRA-3200:
---------------------------------------------

bq. Doesn't it require a lot more coordination between replicas?

No. For a given range and cf, we already wait to have all the trees for that range and cf before scheduling the streaming repair.

> Repair: compare all trees together (for a given range/cf) instead of by pair in isolation
> -----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3200
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3200
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>            Priority: Minor
>              Labels: repair
>             Fix For: 1.0.1