[jira] [Commented] (CASSANDRA-9143) Improving consistency of repairAt field across replicas
[ https://issues.apache.org/jira/browse/CASSANDRA-9143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15868022#comment-15868022 ] Sylvain Lebresne commented on CASSANDRA-9143:

Probably not a big deal, but I noticed that after this patch {{CompactionManager.submitAntiCompaction()}} is now unused. Assuming that was intended, could one of you clean it up ([~bdeggleston] and [~krummas])?

> Improving consistency of repairAt field across replicas
>
> Key: CASSANDRA-9143
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9143
> Project: Cassandra
> Issue Type: Improvement
> Reporter: sankalp kohli
> Assignee: Blake Eggleston
> Fix For: 4.0
>
> We currently send an anticompaction request to all replicas. During this, a
> node will split sstables and mark the appropriate ones repaired.
> The problem is that this could fail on some replicas for many reasons,
> leading to problems in the next repair.
> This is what I am suggesting to improve it.
> 1) Send an anticompaction request to all replicas. This can be done at the session level.
> 2) During anticompaction, sstables are split but not marked repaired.
> 3) When we get a positive ack from all replicas, the coordinator will send another message called markRepaired.
> 4) On getting this message, replicas will mark the appropriate sstables as repaired.
> This will reduce the window of failure. We can also think of "hinting" the
> markRepaired message if required.
> Also the sstables which are streamed can be marked as repaired like it is done now.
--
This message was sent by Atlassian JIRA (v6.3.15#6346)
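The four-step protocol in the ticket description can be sketched as follows. All class, message, and method names here are illustrative stand-ins, not Cassandra's actual internals or message verbs:

```java
import java.util.*;

// Illustrative model of the proposed two-phase flow: anticompaction marks
// sstables pending, and only a coordinator-driven markRepaired message
// promotes them to repaired. All names are hypothetical sketches.
public class MarkRepairedSketch
{
    enum State { UNREPAIRED, PENDING, REPAIRED }

    static class Replica
    {
        final Map<String, State> sstables = new HashMap<>();

        Replica(String... names)
        {
            for (String n : names)
                sstables.put(n, State.UNREPAIRED);
        }

        // Step 2: split out the sstables for the repaired range, but only
        // mark them pending; nothing is promoted yet.
        boolean handleAnticompaction(Set<String> range)
        {
            for (String s : range)
                sstables.put(s, State.PENDING);
            return true; // positive ack back to the coordinator
        }

        // Step 4: promote pending sstables once the coordinator confirms.
        void handleMarkRepaired(Set<String> range)
        {
            for (String s : range)
                sstables.put(s, State.REPAIRED);
        }
    }

    // Steps 1 and 3: send anticompaction to all replicas, and only send
    // markRepaired after every replica has acked.
    static boolean coordinate(List<Replica> replicas, Set<String> range)
    {
        for (Replica r : replicas)
            if (!r.handleAnticompaction(range))
                return false; // nothing was promoted, so nothing to undo
        for (Replica r : replicas)
            r.handleMarkRepaired(range);
        return true;
    }

    public static void main(String[] args)
    {
        List<Replica> replicas = List.of(new Replica("a", "b"), new Replica("a", "b"));
        if (!coordinate(replicas, Set.of("a")))
            throw new AssertionError("coordination failed");
        for (Replica r : replicas)
        {
            if (r.sstables.get("a") != State.REPAIRED) throw new AssertionError();
            if (r.sstables.get("b") != State.UNREPAIRED) throw new AssertionError();
        }
        System.out.println("all replicas consistent");
    }
}
```

The point of the split is that a replica failure before step 3 leaves every node in the same (unpromoted) state, rather than some nodes repaired and some not.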
[jira] [Commented] (CASSANDRA-9143) Improving consistency of repairAt field across replicas
[ https://issues.apache.org/jira/browse/CASSANDRA-9143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15854579#comment-15854579 ] Blake Eggleston commented on CASSANDRA-9143:

Created dtest PR: https://github.com/riptano/cassandra-dtest/pull/1436
[jira] [Commented] (CASSANDRA-9143) Improving consistency of repairAt field across replicas
[ https://issues.apache.org/jira/browse/CASSANDRA-9143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15850448#comment-15850448 ] Blake Eggleston commented on CASSANDRA-9143:

They look good to me. I've pulled in your fixes, rebased and squashed against trunk, and pushed up to a new branch.

| [trunk|https://github.com/bdeggleston/cassandra/tree/9143-trunk-squashed] | [dtest|http://cassci.datastax.com/view/Dev/view/bdeggleston/job/bdeggleston-9143-trunk-squashed-dtest/] | [testall|http://cassci.datastax.com/view/Dev/view/bdeggleston/job/bdeggleston-9143-trunk-squashed-testall/] |

I've also addressed your comments on my dtest branch [here|https://github.com/bdeggleston/cassandra-dtest/commits/9143]
[jira] [Commented] (CASSANDRA-9143) Improving consistency of repairAt field across replicas
[ https://issues.apache.org/jira/browse/CASSANDRA-9143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15849059#comment-15849059 ] Marcus Eriksson commented on CASSANDRA-9143:

Ok, this LGTM now - pushed up a few final fixes/nits here: https://github.com/krummas/cassandra/commits/blake/9143

Could you rebase+squash+rerun tests if my fixes above look ok to you?
[jira] [Commented] (CASSANDRA-9143) Improving consistency of repairAt field across replicas
[ https://issues.apache.org/jira/browse/CASSANDRA-9143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15829017#comment-15829017 ] Jeff Jirsa commented on CASSANDRA-9143:

I know [~krummas] has review (and he's certainly the most qualified and most likely to give the best feedback here), but I took a few notes as I read through the patches:

- ARS {{registerParentRepairSession}} assert is inverted
- https://github.com/bdeggleston/cassandra/commit/8e7eb081625b1749716f60bcb109ade8c84d8558#diff-93e6fa14f908d0ce3c24d56fbf484ba3R88 - double-checked locking needs volatile
- The comment in CompactionStrategyManager about 2 strategies per data dir is now misleading, if not incorrect (we also have pendingRepairs, which may be multiple other strategies?): https://github.com/bdeggleston/cassandra/blob/9143-trunk/src/java/org/apache/cassandra/db/compaction/CompactionStrategyManager.java#L59
- {{ConsistentSession.AbstractBuilder}} doesn't need to be public (especially with other {{AbstractBuilder}}s in the codebase)
- {{LocalSessions.start()}} loads rows creating builders that should always work, but we've seen in the past (like CASSANDRA-12700) that we shouldn't rely on all of those values being correct - maybe you can try to explicitly handle invalid rows being returned? If a row/session is incomplete, maybe that's failed by definition?
- If you remove the enum definition for the anticompaction request (https://github.com/bdeggleston/cassandra/commit/76ee1a667818c5c72aa513c4a75777b1400cb69d#diff-9a5c76380064186d8f89003e1bab73bfL46), and we're in a mixed cluster for some reason and get that verb on the wire, we'll throw - perhaps we should instead keep that verb, but log+ignore it if received.
Less importantly:

- https://github.com/bdeggleston/cassandra/commit/8e7eb081625b1749716f60bcb109ade8c84d8558#diff-93e6fa14f908d0ce3c24d56fbf484ba3R303 - {{needsCleanup()}} could be renamed to avoid potentially confusing it with {{CompactionManager.needsCleanup()}}, especially since {{PendingRepairManager}} is very much like {{CompactionManager}} (it seems you may have started this in https://github.com/bdeggleston/cassandra/commit/ade7fe3373a1b44da02caaefc3180503d298e92b)
- https://github.com/bdeggleston/cassandra/blob/9143-trunk/src/java/org/apache/cassandra/repair/consistent/LocalSessions.java#L605 - log the from address as well, otherwise the log message is much less useful?
- https://github.com/bdeggleston/cassandra/blob/9143-trunk/src/java/org/apache/cassandra/repair/consistent/LocalSessions.java#L619 - could pass the address for better logging here as well (and in pretty much all of these: {{handlePrepareMessage}}, {{handleFinalizeCommitMessage}}, etc.)
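The "double-checked locking needs volatile" note can be illustrated with a minimal, generic Java sketch (this is not the actual code from the patch). Without {{volatile}} on the field, a second thread can observe a reference to a partially constructed object:

```java
// Correct double-checked locking: the volatile write/read pair gives the
// happens-before edge that makes the unsynchronized first check safe.
public class LazySingleton
{
    // volatile is required; without it this pattern is broken
    private static volatile LazySingleton instance;

    private final int value;

    private LazySingleton()
    {
        value = 42;
    }

    public static LazySingleton getInstance()
    {
        LazySingleton local = instance; // first (unsynchronized) check
        if (local == null)
        {
            synchronized (LazySingleton.class)
            {
                local = instance; // second check, under the lock
                if (local == null)
                    instance = local = new LazySingleton();
            }
        }
        return local;
    }

    public int value()
    {
        return value;
    }

    public static void main(String[] args)
    {
        if (getInstance() != getInstance())
            throw new AssertionError("expected a single instance");
        if (getInstance().value() != 42)
            throw new AssertionError();
        System.out.println("single instance, value=" + getInstance().value());
    }
}
```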
[jira] [Commented] (CASSANDRA-9143) Improving consistency of repairAt field across replicas
[ https://issues.apache.org/jira/browse/CASSANDRA-9143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15709562#comment-15709562 ] Marcus Eriksson commented on CASSANDRA-9143:

bq. I'd say we should keep full repairs simple. Don't do anti-compaction on them, and don't make them consistent.

sounds good
[jira] [Commented] (CASSANDRA-9143) Improving consistency of repairAt field across replicas
[ https://issues.apache.org/jira/browse/CASSANDRA-9143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15709261#comment-15709261 ] Blake Eggleston commented on CASSANDRA-9143:

bq. Should we prioritize the pending-repair-cleanup compactions?

Makes sense.

bq. Is there any point in doing anticompaction after repair with -full repairs? Can we always do consistent repairs? We would need to anticompact already repaired sstables into pending, but that should not be a big problem?

Good point. I'd say we should keep full repairs simple. Don't do anti-compaction on them, and don't make them consistent. Given the newness and relative complexity of consistent repair, it would be smart to have a full workaround in case we find a problem with it. If we're not going to do anti-compaction though, we should preserve the repairedAt values of the sstables we're streaming around as part of a full repair. That will make it possible to fix corrupted or lost data in the repair buckets without adversely affecting the next incremental repair.

bq. In handleStatusRequest - if we don't have the local session, we should probably return that the session is failed?

That makes sense
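The handleStatusRequest behavior agreed above can be sketched like this. The class, state names, and method shape here are illustrative, not the actual {{LocalSessions}} API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Sketch: a status request for a session this node no longer knows about
// is answered as FAILED, so the requester can make progress instead of
// waiting on a session that will never complete.
public class SessionStatusSketch
{
    enum State { PREPARING, REPAIRING, FINALIZED, FAILED }

    private final Map<UUID, State> localSessions = new HashMap<>();

    void register(UUID sessionId, State state)
    {
        localSessions.put(sessionId, state);
    }

    // Unknown session => report it as failed
    State handleStatusRequest(UUID sessionId)
    {
        return localSessions.getOrDefault(sessionId, State.FAILED);
    }

    public static void main(String[] args)
    {
        SessionStatusSketch node = new SessionStatusSketch();
        UUID known = UUID.randomUUID();
        node.register(known, State.REPAIRING);
        if (node.handleStatusRequest(known) != State.REPAIRING)
            throw new AssertionError();
        if (node.handleStatusRequest(UUID.randomUUID()) != State.FAILED)
            throw new AssertionError();
        System.out.println("unknown sessions reported as FAILED");
    }
}
```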
[jira] [Commented] (CASSANDRA-9143) Improving consistency of repairAt field across replicas
[ https://issues.apache.org/jira/browse/CASSANDRA-9143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15705274#comment-15705274 ] Marcus Eriksson commented on CASSANDRA-9143:

Looks good in general - comments:

* Rename the cleanup compaction task, very confusing wrt the current cleanup compactions
* Should we prioritize the pending-repair-cleanup compactions?
** If we don't, we might compare different datasets - a repair fails half way through, one node happens to move the pending data to unrepaired, the operator retriggers repair, and we would compare different datasets. If we instead move the data back as quickly as possible, we minimize this window
** It would also help the next normal compactions, as we might be able to include more sstables in the repaired/unrepaired strategies
* Is there any point in doing anticompaction after repair with -full repairs? Can we always do consistent repairs? We would need to anticompact already repaired sstables into pending, but that should not be a big problem?
* In CompactionManager#getSSTablesToValidate we still mark all unrepaired sstables as repairing - we don't need to do that for consistent repairs. And if we can do consistent repair for -full as well, all that code can be removed
* In handleStatusRequest - if we don't have the local session, we should probably return that the session is failed?
* Fixed some minor nits here: https://github.com/krummas/cassandra/commit/24ef8b2f6df98431d66519ee12452df3db84fd7d
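The prioritization idea above amounts to ordering the compaction queue so that "move pending-repair sstables back" tasks run before ordinary compactions, shrinking the window where nodes hold different datasets. A minimal sketch, with a hypothetical task model rather than Cassandra's actual CompactionManager:

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Sketch: pending-repair cleanup tasks jump ahead of normal compactions.
public class CompactionPrioritySketch
{
    // Declaration order encodes priority: lower ordinal runs first.
    enum Kind { PENDING_REPAIR_CLEANUP, NORMAL }

    record Task(Kind kind, String name) {}

    private final PriorityQueue<Task> queue =
        new PriorityQueue<>(Comparator.comparingInt((Task t) -> t.kind().ordinal()));

    void submit(Task task)
    {
        queue.add(task);
    }

    Task next()
    {
        return queue.poll();
    }

    public static void main(String[] args)
    {
        CompactionPrioritySketch mgr = new CompactionPrioritySketch();
        mgr.submit(new Task(Kind.NORMAL, "stcs-bucket"));
        mgr.submit(new Task(Kind.PENDING_REPAIR_CLEANUP, "failed-session"));
        // Even though it was submitted second, the cleanup task runs first.
        if (mgr.next().kind() != Kind.PENDING_REPAIR_CLEANUP)
            throw new AssertionError("cleanup should be scheduled first");
        System.out.println("pending-repair cleanup scheduled first");
    }
}
```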
[jira] [Commented] (CASSANDRA-9143) Improving consistency of repairAt field across replicas
[ https://issues.apache.org/jira/browse/CASSANDRA-9143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15668657#comment-15668657 ] Blake Eggleston commented on CASSANDRA-9143:

[~krummas], I pushed up a commit with better documentation. The javadoc I added to the {{ConsistentSession}} class is a fairly comprehensive description of the entire process.
[jira] [Commented] (CASSANDRA-9143) Improving consistency of repairAt field across replicas
[ https://issues.apache.org/jira/browse/CASSANDRA-9143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15657211#comment-15657211 ] Marcus Eriksson commented on CASSANDRA-9143:

[~bdeggleston] just did a first read-through, and it looks quite straightforward, but I think one thing missing that would help with the review would be adding more comments, detailing the new message flow, error scenarios etc.
[jira] [Commented] (CASSANDRA-9143) Improving consistency of repairAt field across replicas
[ https://issues.apache.org/jira/browse/CASSANDRA-9143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15649189#comment-15649189 ] Blake Eggleston commented on CASSANDRA-9143:

[~krummas] do you have time to review this?
[jira] [Commented] (CASSANDRA-9143) Improving consistency of repairAt field across replicas
[ https://issues.apache.org/jira/browse/CASSANDRA-9143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15649177#comment-15649177 ] Blake Eggleston commented on CASSANDRA-9143:

| [trunk|https://github.com/bdeggleston/cassandra/tree/9143-trunk] | [dtest|http://cassci.datastax.com/view/Dev/view/bdeggleston/job/bdeggleston-9143-trunk-dtest/] | [testall|http://cassci.datastax.com/view/Dev/view/bdeggleston/job/bdeggleston-9143-trunk-testall/] |
| [3.0|https://github.com/bdeggleston/cassandra/tree/9143-3.0] | [dtest|http://cassci.datastax.com/view/Dev/view/bdeggleston/job/bdeggleston-9143-3.0-dtest/] | [testall|http://cassci.datastax.com/view/Dev/view/bdeggleston/job/bdeggleston-9143-3.0-testall/] |

[dtest branch|https://github.com/bdeggleston/cassandra-dtest/tree/9143]

I've tried to break this up into logical commits for each component of the change to make reviewing easier. The new incremental repair would work as follows:
# persist the session locally on each repair participant
# anti-compact all unrepaired sstables intersecting with the range being repaired into a pending repair bucket
# perform validation/sync against the sstables segregated in the pending anti-compaction step
# perform a 2PC to promote pending repair sstables into repaired
#* if this, or the validation/sync phase, fails, the sstables are moved back into unrepaired

Since incremental repair is the default in 3.0, I've also included a patch which fixes the consistency problems in 3.0 and is backwards compatible with the existing repair. That said, I'm not really convinced that making a change like this to repair in 3.0.x is a great idea. I'd be more in favor of disabling incremental repair, or at least not making it the default, in 3.0.x. The compaction that gets kicked off after streamed sstables are added to the cfs means that whether repaired data is ultimately placed in the repaired or unrepaired bucket by anti-compaction is basically a crapshoot.
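The sstable lifecycle in the flow above, including the rollback step, can be sketched as a small state machine. Bucket and method names are illustrative, not Cassandra internals:

```java
// Sketch: anticompaction moves unrepaired data into a pending-repair
// bucket, and the 2PC outcome either promotes it to repaired or rolls it
// back to unrepaired on any failure.
public class PendingRepairLifecycle
{
    enum Bucket { UNREPAIRED, PENDING_REPAIR, REPAIRED }

    private Bucket bucket = Bucket.UNREPAIRED;

    Bucket bucket()
    {
        return bucket;
    }

    // Step 2: segregate the data for the session without promoting it.
    void anticompact()
    {
        bucket = Bucket.PENDING_REPAIR;
    }

    // Step 4: promote only when every participant committed; a
    // validation/sync or commit failure sends the data back.
    void finishSession(boolean allParticipantsCommitted)
    {
        bucket = allParticipantsCommitted ? Bucket.REPAIRED : Bucket.UNREPAIRED;
    }

    public static void main(String[] args)
    {
        PendingRepairLifecycle ok = new PendingRepairLifecycle();
        ok.anticompact();
        ok.finishSession(true);
        if (ok.bucket() != Bucket.REPAIRED)
            throw new AssertionError();

        PendingRepairLifecycle failed = new PendingRepairLifecycle();
        failed.anticompact();
        failed.finishSession(false); // e.g. validation failed on a replica
        if (failed.bucket() != Bucket.UNREPAIRED)
            throw new AssertionError();

        System.out.println("promote on commit, rollback on failure");
    }
}
```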
[jira] [Commented] (CASSANDRA-9143) Improving consistency of repairAt field across replicas
[ https://issues.apache.org/jira/browse/CASSANDRA-9143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15648298#comment-15648298 ] Blake Eggleston commented on CASSANDRA-9143:

Just wanted to point out that [~pauloricardomg] found another source of repaired data inconsistency in CASSANDRA-10446. Since streamed data includes the repairedAt value for the in-progress session, if the session fails, it's possible that a node will consider data repaired that another node may have never seen.
[jira] [Commented] (CASSANDRA-9143) Improving consistency of repairAt field across replicas
[ https://issues.apache.org/jira/browse/CASSANDRA-9143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15449782#comment-15449782 ] sankalp kohli commented on CASSANDRA-9143:

I think we should fix this in 3.0, as incremental repair is the default in 3.0. Without this patch, it will be very difficult to run incremental repair in 3.0.
[jira] [Commented] (CASSANDRA-9143) Improving consistency of repairAt field across replicas
[ https://issues.apache.org/jira/browse/CASSANDRA-9143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15445062#comment-15445062 ] Marcus Eriksson commented on CASSANDRA-9143:

I think the approach makes sense - my only worry is that if repair fails we will have increased the number of sstables on the node, and for LCS we might have to drop those new sstables back to L0 due to other compactions going on during the repair.
[jira] [Commented] (CASSANDRA-9143) Improving consistency of repairAt field across replicas
[ https://issues.apache.org/jira/browse/CASSANDRA-9143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15440135#comment-15440135 ] Blake Eggleston commented on CASSANDRA-9143:

bq. One approach would be to skip upfront anti-compaction if the unrepaired set is above some size threshold

The larger a repair job is, the more likely it is you'll see inconsistencies caused by compaction. The cost of inconsistencies will increase as well. My thinking was that we would add something like {{Map}} to the compaction manager, and let the sstable silos work normally. I don't know if it would make sense, but we could use a noop strategy for jobs under some size threshold.

bq. some safety mechanism (timeout, etc) that releases sstables from the pending repair bucket

Seems reasonable
[jira] [Commented] (CASSANDRA-9143) Improving consistency of repairAt field across replicas
[ https://issues.apache.org/jira/browse/CASSANDRA-9143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15440059#comment-15440059 ] Paulo Motta commented on CASSANDRA-9143:

bq. Both are really manifestations of the same root problem: incremental repair behaves unpredictably because data being repaired isn't kept separate from unrepaired data during repair. Maybe we should expand the problem description, and close CASSANDRA-8858 as a dupe?

Thanks for clarifying. We should definitely update the title and description, since a more general problem is being tackled here than the one originally stated on the ticket. I agree we should close CASSANDRA-8858, since it will be superseded by this.

bq. We’d have to be optimistic and anti-compact all the tables and ranges we’re going to be repairing prior to validation. Obviously, failed ranges would have to be re-anticompacted back into unrepaired. The cost of this would have to be compared to the higher network io caused by the current state of things, and the frequency of failed ranges.

I think that's a good idea, and it could also help mitigate repair impact on vnodes due to the multiple flushes needed to run validations for every vnode (CASSANDRA-9491, CASSANDRA-10862), since we would only validate the anti-compacted sstables from the beginning of the parent repair session.

On the other hand, we should think carefully about how sstables in the pending repair bucket will be handled, since holding back compaction of these sstables for a long time could lead to poor read performance and extra compaction I/O after repair. For frequently running incremental repair this shouldn't be a problem, since repairs should be fast, but if many unrepaired sstables pile up (or in the case of full repairs), then this could become a problem. One approach would be to skip upfront anti-compaction if the unrepaired set is above some size threshold (or for full repairs) and fall back to anti-compaction at the end as done now.

Also, there should probably be some safety mechanism (timeout, etc.) that releases sstables from the pending repair bucket if they're there for a long time, as Marcus suggested on CASSANDRA-5351.
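The safety mechanism discussed above could be as simple as an age check over the pending bucket — an illustrative sketch with made-up names and an arbitrary timeout, not a proposal for the real configuration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch of a timeout-based safety valve: sstables that sit in the
// pending-repair bucket longer than some timeout fall back to the
// unrepaired bucket. All names and the timeout value are hypothetical.
class PendingTimeout {
    static final long TIMEOUT_MILLIS = 24L * 60 * 60 * 1000; // arbitrary: 1 day

    // Given sstable -> time it entered the pending bucket, return the
    // sstables that have overstayed and should be released to unrepaired.
    static List<String> expired(Map<String, Long> pendingSince, long now) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Long> e : pendingSince.entrySet())
            if (now - e.getValue() > TIMEOUT_MILLIS)
                out.add(e.getKey());
        return out;
    }
}
```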
[jira] [Commented] (CASSANDRA-9143) Improving consistency of repairAt field across replicas
[ https://issues.apache.org/jira/browse/CASSANDRA-9143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15439978#comment-15439978 ] Blake Eggleston commented on CASSANDRA-9143:

[~krummas], [~yukim], [~pauloricardomg], any thoughts here?
[jira] [Commented] (CASSANDRA-9143) Improving consistency of repairAt field across replicas
[ https://issues.apache.org/jira/browse/CASSANDRA-9143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15436253#comment-15436253 ] sankalp kohli commented on CASSANDRA-9143:

I was not aware that we were mixing repaired and unrepaired data during compaction when I created this ticket. If we plan to move anticompaction to the first phase, we can fix both issues with this change instead of splitting them. Since there is not much overlap here, I think we should fix both here.
[jira] [Commented] (CASSANDRA-9143) Improving consistency of repairAt field across replicas
[ https://issues.apache.org/jira/browse/CASSANDRA-9143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15436054#comment-15436054 ] Blake Eggleston commented on CASSANDRA-9143:

bq. it sounds slightly different from the original problem description

Both are really manifestations of the same root problem: incremental repair behaves unpredictably because data being repaired isn't kept separate from unrepaired data during repair. Maybe we should expand the problem description, and close CASSANDRA-8858 as a dupe?

bq. How do you plan to perform anti-compaction up-front?

We’d have to be optimistic and anti-compact all the tables and ranges we’re going to be repairing prior to validation. Obviously, failed ranges would have to be re-anticompacted back into unrepaired. The cost of this would have to be compared to the higher network I/O caused by the current state of things, and the frequency of failed ranges.

bq. I propose we start with the original idea of adding a 2PC to anti-compaction as suggested in the ticket description and perhaps on top of that pursue anti-compaction checkpoints/hints in a separate ticket

This only solves part of the problem. We’re still leaking repaired data during compaction. I think it makes sense to talk about the overarching problem of keeping repaired and unrepaired data separate first. We can still handle each of the cases separately if it makes sense to.
[jira] [Commented] (CASSANDRA-9143) Improving consistency of repairAt field across replicas
[ https://issues.apache.org/jira/browse/CASSANDRA-9143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435885#comment-15435885 ] Paulo Motta commented on CASSANDRA-9143:

bq. Since sstables compacted since the beginning of a repair are excluded from anticompaction, normal compaction is enough to create large inconsistencies of the data each node considers repaired. This will cause repaired data to be considered unrepaired, which will cause a lot of unnecessary streaming on the next repair.

While this is a relevant problem, it sounds slightly different from the original problem description, which is to improve the consistency of the repairedAt field, which can become inconsistent when a node fails mid-anticompaction at the end of the parent repair session. Do you plan to tackle only the original problem, or also the problem of losing repair information from compacted sstables during repair (which is a somewhat harder problem)?

bq. We do the anticompaction up front, but put the anticompacted data into the pending bucket.

How do you plan to perform anti-compaction up-front? As Marcus pointed out, we defer anti-compaction to the end of the parent repair session to avoid re-anti-compacting multi-range sstables as repair progresses, so we need a strategy here to avoid or minimize that. But we could perhaps let operators trade off increased I/O for more accurate repair information with anti-compaction checkpoints during long-running repairs.

So I propose we start with the original idea of adding a 2PC to anti-compaction, as suggested in the ticket description, and perhaps on top of that pursue anti-compaction checkpoints/hints in a separate ticket?
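For reference, the 2PC from the ticket description reduces to tracking phase-1 (split) acks and firing markRepaired only once every replica has acked. A toy model of that coordinator logic, with no real messaging and purely illustrative names:

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of the two-phase flow proposed in the ticket description:
// phase 1 splits sstables on every replica without marking them
// repaired; phase 2 (markRepaired) fires only after every replica has
// acked phase 1. Replicas are represented by string identifiers.
class TwoPhaseRepairCoordinator {
    private final Set<String> replicas;
    private final Set<String> acked = new HashSet<>();
    private boolean markRepairedSent = false;

    TwoPhaseRepairCoordinator(Set<String> replicas) {
        this.replicas = Set.copyOf(replicas);
    }

    // Called when a replica reports its anticompaction split is done.
    // Returns true exactly once: when the last ack arrives and the
    // coordinator should send the markRepaired message.
    boolean onAnticompactionAck(String replica) {
        if (!replicas.contains(replica) || markRepairedSent)
            return false;
        acked.add(replica);
        if (acked.equals(replicas)) {
            markRepairedSent = true; // phase 2: replicas flip repairedAt
            return true;
        }
        return false;
    }
}
```

The window of failure shrinks because sstables are only marked repaired after all replicas have confirmed the split, rather than each replica marking independently.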
[jira] [Commented] (CASSANDRA-9143) Improving consistency of repairAt field across replicas
[ https://issues.apache.org/jira/browse/CASSANDRA-9143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435649#comment-15435649 ] Blake Eggleston commented on CASSANDRA-9143:

Since sstables compacted since the beginning of a repair are excluded from anticompaction, normal compaction is enough to create large inconsistencies in the data each node considers repaired. This will cause repaired data to be considered unrepaired, which will cause a lot of unnecessary streaming on the next repair.

At a high level, the least complicated solution I’ve come up with is: add a ‘pending repair’ bucket to the existing repaired and unrepaired sstable buckets. We do the anticompaction up front, but put the anticompacted data into the pending bucket. From here, the repair proceeds normally against the pending sstables, with the streamed sstables also going into the pending bucket. Once all nodes have completed streaming, the pending sstables are moved into the repaired bucket, or back into unrepaired if there’s a failure. This should keep each replica’s notion of what’s repaired identical, and minimize over-streaming caused by large chunks of repaired data being classified as unrepaired.
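The bucket lifecycle described above can be summarized as a small state machine — a sketch of the intended transitions, not the real code:

```java
// Sketch of the proposed sstable lifecycle: UNREPAIRED -> PENDING when
// a repair session anticompacts up front, then PENDING -> REPAIRED on
// success or back to UNREPAIRED on failure. Enum and method names are
// illustrative.
class RepairBucket {
    enum State { UNREPAIRED, PENDING, REPAIRED }

    // Up-front anticompaction moves unrepaired sstables into pending;
    // already-repaired sstables are untouched.
    static State onRepairStart(State s) {
        return s == State.UNREPAIRED ? State.PENDING : s;
    }

    // At session end, only pending sstables move: to repaired if all
    // nodes finished streaming, otherwise back to unrepaired.
    static State onSessionEnd(State s, boolean success) {
        if (s != State.PENDING)
            return s;
        return success ? State.REPAIRED : State.UNREPAIRED;
    }
}
```

Because streamed sstables also enter the pending bucket, every replica applies the same transition at the same point, which is what keeps their notions of "repaired" identical.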
[jira] [Commented] (CASSANDRA-9143) Improving consistency of repairAt field across replicas
[ https://issues.apache.org/jira/browse/CASSANDRA-9143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487879#comment-14487879 ] sankalp kohli commented on CASSANDRA-9143:

Makes sense. We can send the anticompaction request at the parent session level and still do what I am suggesting.
[jira] [Commented] (CASSANDRA-9143) Improving consistency of repairAt field across replicas
[ https://issues.apache.org/jira/browse/CASSANDRA-9143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487798#comment-14487798 ] Marcus Eriksson commented on CASSANDRA-9143:

The reason we send anticompaction requests at the parent repair session level is that we want to do as little actual anticompaction as possible. That is, if the entire sstable is contained within the repaired range, we don't actually do any anticompaction on it; we just change the sstable metadata. So, with an RF=3 cluster and a nodetool repair -inc, we would not anticompact the same sstables 3 times; instead we would just update the metadata for the sstables on that node.
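Marcus's fast path amounts to a containment check per sstable: fully covered sstables get a metadata-only mutation, partial overlaps need a real split, and non-overlapping ones are skipped. A simplified sketch using plain longs in place of tokens (ignoring wrap-around ranges, which real token ranges would need to handle):

```java
// Simplified sketch of the anticompaction decision per sstable.
// Ranges are inclusive [first, last] and wrap-around is ignored.
class AnticompactionPlanner {
    enum Action { MUTATE_METADATA_ONLY, SPLIT, SKIP }

    // sstFirst/sstLast: the sstable's token span;
    // repFirst/repLast: the repaired range.
    static Action plan(long sstFirst, long sstLast, long repFirst, long repLast) {
        boolean overlaps = sstFirst <= repLast && sstLast >= repFirst;
        if (!overlaps)
            return Action.SKIP; // nothing in this sstable was repaired
        boolean contained = sstFirst >= repFirst && sstLast <= repLast;
        // Fully contained: no rewrite needed, just flip repairedAt in
        // the metadata. Partial overlap: split repaired from unrepaired.
        return contained ? Action.MUTATE_METADATA_ONLY : Action.SPLIT;
    }
}
```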
[jira] [Commented] (CASSANDRA-9143) Improving consistency of repairAt field across replicas
[ https://issues.apache.org/jira/browse/CASSANDRA-9143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486455#comment-14486455 ] sankalp kohli commented on CASSANDRA-9143:

Another improvement we can do is to not send anticompaction requests if there are no successful ranges.