[jira] [Commented] (CASSANDRA-15553) Preview repair should include sstables from finalized incremental repair sessions
[ https://issues.apache.org/jira/browse/CASSANDRA-15553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17045933#comment-17045933 ] David Capwell commented on CASSANDRA-15553: --- I like the use of blocking filters, glad we don't transfer them =D Some of the tests make assumptions about time using thread.sleep; this could make them flaky. With CASSANDRA-15564 I setup utilities for querying repair state, so if that does end up happening we can always block waiting until the repair table is updating. I made the comment that the tests only pass because we look at the notifications, but with CASSANDRA-15564 that will be fixed so can ignore that comment. A test that would be good to add would be 1) run preview 2) delay the merle tree so the coordinator doesn't see it yet 3) run IR, wait for it to complete 4) unblock merle tree's If I am reading this correctly, the preview will fail even though the validation wouldn't. I am +1 > Preview repair should include sstables from finalized incremental repair > sessions > - > > Key: CASSANDRA-15553 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15553 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Normal > Fix For: 4.0-alpha > > > When running a preview repair we currently grab all repaired sstables, > problem is that we depend on compaction to move the sstables from pending to > repaired so we might have different data marked repaired on different nodes. > Including any sstables from finalized incremental repair sessions as repaired > will solve this. > Another problem is that validations don't start at exactly the same time on > different nodes, so if an incremental repair finishes while the preview > repair is running we might also validate the wrong repaired set. We should > fail the preview repair if an intersecting incremental repair finishes during > the preview repair. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15553) Preview repair should include sstables from finalized incremental repair sessions
[ https://issues.apache.org/jira/browse/CASSANDRA-15553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17045773#comment-17045773 ] Blake Eggleston commented on CASSANDRA-15553: - +1, and thanks for addressing each point in it's own commit. Makes it way easier to review the changes > Preview repair should include sstables from finalized incremental repair > sessions > - > > Key: CASSANDRA-15553 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15553 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Normal > Fix For: 4.0-alpha > > > When running a preview repair we currently grab all repaired sstables, > problem is that we depend on compaction to move the sstables from pending to > repaired so we might have different data marked repaired on different nodes. > Including any sstables from finalized incremental repair sessions as repaired > will solve this. > Another problem is that validations don't start at exactly the same time on > different nodes, so if an incremental repair finishes while the preview > repair is running we might also validate the wrong repaired set. We should > fail the preview repair if an intersecting incremental repair finishes during > the preview repair. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15553) Preview repair should include sstables from finalized incremental repair sessions
[ https://issues.apache.org/jira/browse/CASSANDRA-15553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044977#comment-17044977 ] David Capwell commented on CASSANDRA-15553: --- Spoke to Marcus. I am fine with this patch failing preview rather than producing a false positive match for mismatch, though we should also improve it so we can remove this restriction later (just not this JIRA). Ill try to look at the update later today > Preview repair should include sstables from finalized incremental repair > sessions > - > > Key: CASSANDRA-15553 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15553 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Normal > Fix For: 4.0-alpha > > > When running a preview repair we currently grab all repaired sstables, > problem is that we depend on compaction to move the sstables from pending to > repaired so we might have different data marked repaired on different nodes. > Including any sstables from finalized incremental repair sessions as repaired > will solve this. > Another problem is that validations don't start at exactly the same time on > different nodes, so if an incremental repair finishes while the preview > repair is running we might also validate the wrong repaired set. We should > fail the preview repair if an intersecting incremental repair finishes during > the preview repair. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15553) Preview repair should include sstables from finalized incremental repair sessions
[ https://issues.apache.org/jira/browse/CASSANDRA-15553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044304#comment-17044304 ] Marcus Eriksson commented on CASSANDRA-15553: - Pushed another [commit|https://github.com/krummas/cassandra/commit/b28d63aed8b09f49936b06b7ec436c53ca3e5ec9] which should handle the case where FINALIZE_COMMIT_MSG is lost or delayed - if we encounter any pending non-finalized intersecting sstables we fail the preview repair. > Preview repair should include sstables from finalized incremental repair > sessions > - > > Key: CASSANDRA-15553 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15553 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Normal > Fix For: 4.0-alpha > > > When running a preview repair we currently grab all repaired sstables, > problem is that we depend on compaction to move the sstables from pending to > repaired so we might have different data marked repaired on different nodes. > Including any sstables from finalized incremental repair sessions as repaired > will solve this. > Another problem is that validations don't start at exactly the same time on > different nodes, so if an incremental repair finishes while the preview > repair is running we might also validate the wrong repaired set. We should > fail the preview repair if an intersecting incremental repair finishes during > the preview repair. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15553) Preview repair should include sstables from finalized incremental repair sessions
[ https://issues.apache.org/jira/browse/CASSANDRA-15553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043222#comment-17043222 ] Marcus Eriksson commented on CASSANDRA-15553: - bq. CassandraValidationIterator.java:61 - stray whitespace at top of class def https://github.com/krummas/cassandra/commit/142d11c1539ed677dfd4e1f21fb09a6fd68b79fc bq. LocalSessions$Listener - it would be nice to have a more descriptive method name `onIncrementalStateChange` or something. We have at least a few listeners with generic `onStateChange` methods https://github.com/krummas/cassandra/commit/c4248ddce35a14549e617379fc2fabdd275ee3a1 (made it {{onIRStateChange}}) bq. RepairSession#stateChange - we should return after forcingShutdown. Alternatively, we could use Iterables.any(ranges(), r -> r.intersects(session.ranges) instead of iterating over the ranges. https://github.com/krummas/cassandra/commit/dd8cafea5debfb9cf90794ed3f9cc4a258681717 bq. RepairSession#involvesTables - includesSSTables or containsSSTables might be a better name https://github.com/krummas/cassandra/commit/8f212ad77e53c0823a56fbc0d7da04f085200f56 ({{includesTables}} - we are not dealing with sstables here) > Preview repair should include sstables from finalized incremental repair > sessions > - > > Key: CASSANDRA-15553 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15553 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Normal > Fix For: 4.0-alpha > > > When running a preview repair we currently grab all repaired sstables, > problem is that we depend on compaction to move the sstables from pending to > repaired so we might have different data marked repaired on different nodes. > Including any sstables from finalized incremental repair sessions as repaired > will solve this. > Another problem is that validations don't start at exactly the same time on > different nodes, so if an incremental repair finishes while the preview > repair is running we might also validate the wrong repaired set. We should > fail the preview repair if an intersecting incremental repair finishes during > the preview repair. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15553) Preview repair should include sstables from finalized incremental repair sessions
[ https://issues.apache.org/jira/browse/CASSANDRA-15553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17040575#comment-17040575 ] David Capwell commented on CASSANDRA-15553: --- If compaction only compacted <= "largest contiguous" then it would work, the big risk is if there are bugs in IR where we can't promote that because of gaps (caused by ephemeral issues such as message loss, so would require CASSANDRA-15566) > Preview repair should include sstables from finalized incremental repair > sessions > - > > Key: CASSANDRA-15553 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15553 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Normal > Fix For: 4.0-alpha > > > When running a preview repair we currently grab all repaired sstables, > problem is that we depend on compaction to move the sstables from pending to > repaired so we might have different data marked repaired on different nodes. > Including any sstables from finalized incremental repair sessions as repaired > will solve this. > Another problem is that validations don't start at exactly the same time on > different nodes, so if an incremental repair finishes while the preview > repair is running we might also validate the wrong repaired set. We should > fail the preview repair if an intersecting incremental repair finishes during > the preview repair. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15553) Preview repair should include sstables from finalized incremental repair sessions
[ https://issues.apache.org/jira/browse/CASSANDRA-15553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17040573#comment-17040573 ] David Capwell commented on CASSANDRA-15553: --- That would be fine; min is still smaller than the max one seen, so that wouldn't break the snapshot isolation. > Preview repair should include sstables from finalized incremental repair > sessions > - > > Key: CASSANDRA-15553 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15553 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Normal > Fix For: 4.0-alpha > > > When running a preview repair we currently grab all repaired sstables, > problem is that we depend on compaction to move the sstables from pending to > repaired so we might have different data marked repaired on different nodes. > Including any sstables from finalized incremental repair sessions as repaired > will solve this. > Another problem is that validations don't start at exactly the same time on > different nodes, so if an incremental repair finishes while the preview > repair is running we might also validate the wrong repaired set. We should > fail the preview repair if an intersecting incremental repair finishes during > the preview repair. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15553) Preview repair should include sstables from finalized incremental repair sessions
[ https://issues.apache.org/jira/browse/CASSANDRA-15553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17040546#comment-17040546 ] Blake Eggleston commented on CASSANDRA-15553: - {quote}assuming compaction also maintains repairedAt {quote} not in a way that would make this possible. Once an sstable gets into the repaired bucket, it will be compacted with other repaired sstables, potentially with different repaired at timestamps (and keeping the min timestamp, I believe) . > Preview repair should include sstables from finalized incremental repair > sessions > - > > Key: CASSANDRA-15553 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15553 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Normal > Fix For: 4.0-alpha > > > When running a preview repair we currently grab all repaired sstables, > problem is that we depend on compaction to move the sstables from pending to > repaired so we might have different data marked repaired on different nodes. > Including any sstables from finalized incremental repair sessions as repaired > will solve this. > Another problem is that validations don't start at exactly the same time on > different nodes, so if an incremental repair finishes while the preview > repair is running we might also validate the wrong repaired set. We should > fail the preview repair if an intersecting incremental repair finishes during > the preview repair. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15553) Preview repair should include sstables from finalized incremental repair sessions
[ https://issues.apache.org/jira/browse/CASSANDRA-15553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17040506#comment-17040506 ] David Capwell commented on CASSANDRA-15553: --- Took a look and had to look closer at IR messaging, what I see is the following IR messaging is fire-and-forget pattern, so any ephemeral issues lead to messages not being seen (tests show this CASSANDRA-15564 and have been reported as issues with current repair CASSANDRA-15566). This patch relies on the FINALIZE_COMMIT_MSG being seen on the coordinator of the IR preview repair in order to detect conflict, but the message is seen asynchronously so may see this on the participants while validation is running and seen on the coordinator after all validations have been seen on the coordinator (so session is already complete); in this case you have the same issue as reported by this JIRA. This patch also affectively blocks preview and IR running for the same range as the preview will fail with conflict*, so IR should stop scheduling if preview is running, and preview should not be scheduled while IR is running (else we waste the resources on validation); effectively what ever is scheduling the repairs will have to be enhanced to handle this which adds more complexity to operators. I actually wonder if we can remove this restriction. What it looks like to me is that repairedAt is system time (aka, could have drift, could roll backwards, etc.), but we could keep track of largest one and make sure this counter is monotonic. With a data structure of * largest contiguous commit (long) * inFlight (array of long) We could make sure that we (coordinator) always produce a repairedAt larger than any we know of, and this lets preview take a snapshot of the state at the start of coordination. With this snapshot, we filter for repaired and repairedAt <= largest contiguous commit snapshot; this should give preview repair effectively snapshot isolation (assuming compaction also maintains repairedAt). * In CASSANDRA-15564 I show that preview doesn't properly check session failures, run [this test|https://github.com/apache/cassandra/pull/446/files#diff-af4a07a2b44695f510dddb0c102e1953R28] and [this one|https://github.com/apache/cassandra/pull/446/files#diff-ca9f3b43ad8ff955d6ddd2ef4d2b6904R28] without the change in the JIRA to see it. The reason your tests are different is because you don't use nodetool and directly monitor notifications. > Preview repair should include sstables from finalized incremental repair > sessions > - > > Key: CASSANDRA-15553 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15553 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Normal > Fix For: 4.0-alpha > > > When running a preview repair we currently grab all repaired sstables, > problem is that we depend on compaction to move the sstables from pending to > repaired so we might have different data marked repaired on different nodes. > Including any sstables from finalized incremental repair sessions as repaired > will solve this. > Another problem is that validations don't start at exactly the same time on > different nodes, so if an incremental repair finishes while the preview > repair is running we might also validate the wrong repaired set. We should > fail the preview repair if an intersecting incremental repair finishes during > the preview repair. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org