[jira] [Commented] (CASSANDRA-15553) Preview repair should include sstables from finalized incremental repair sessions

2020-02-26 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17045933#comment-17045933
 ] 

David Capwell commented on CASSANDRA-15553:
---

I like the use of blocking filters, glad we don't transfer them =D

Some of the tests make assumptions about time using thread.sleep; this could 
make them flaky.  With CASSANDRA-15564 I setup utilities for querying repair 
state, so if that does end up happening we can always block waiting until the 
repair table is updating.

I made the comment that the tests only pass because we look at the 
notifications, but with CASSANDRA-15564 that will be fixed so can ignore that 
comment.

A test that would be good to add would be
1) run preview
2) delay the merle tree so the coordinator doesn't see it yet
3) run IR, wait for it to complete
4) unblock merle tree's

If I am reading this correctly, the preview will fail even though the 
validation wouldn't.

I am +1

> Preview repair should include sstables from finalized incremental repair 
> sessions
> -
>
> Key: CASSANDRA-15553
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15553
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 4.0-alpha
>
>
> When running a preview repair we currently grab all repaired sstables, 
> problem is that we depend on compaction to move the sstables from pending to 
> repaired so we might have different data marked repaired on different nodes. 
> Including any sstables from finalized incremental repair sessions as repaired 
> will solve this.
> Another problem is that validations don't start at exactly the same time on 
> different nodes, so if an incremental repair finishes while the preview 
> repair is running we might also validate the wrong repaired set. We should 
> fail the preview repair if an intersecting incremental repair finishes during 
> the preview repair.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15553) Preview repair should include sstables from finalized incremental repair sessions

2020-02-26 Thread Blake Eggleston (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17045773#comment-17045773
 ] 

Blake Eggleston commented on CASSANDRA-15553:
-

+1, and thanks for addressing each point in it's own commit. Makes it way 
easier to review the changes

> Preview repair should include sstables from finalized incremental repair 
> sessions
> -
>
> Key: CASSANDRA-15553
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15553
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 4.0-alpha
>
>
> When running a preview repair we currently grab all repaired sstables, 
> problem is that we depend on compaction to move the sstables from pending to 
> repaired so we might have different data marked repaired on different nodes. 
> Including any sstables from finalized incremental repair sessions as repaired 
> will solve this.
> Another problem is that validations don't start at exactly the same time on 
> different nodes, so if an incremental repair finishes while the preview 
> repair is running we might also validate the wrong repaired set. We should 
> fail the preview repair if an intersecting incremental repair finishes during 
> the preview repair.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15553) Preview repair should include sstables from finalized incremental repair sessions

2020-02-25 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044977#comment-17044977
 ] 

David Capwell commented on CASSANDRA-15553:
---

Spoke to Marcus.  I am fine with this patch failing preview rather than 
producing a false positive match for mismatch, though we should also improve it 
so we can remove this restriction later (just not this JIRA).

Ill try to look at the update later today

> Preview repair should include sstables from finalized incremental repair 
> sessions
> -
>
> Key: CASSANDRA-15553
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15553
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 4.0-alpha
>
>
> When running a preview repair we currently grab all repaired sstables, 
> problem is that we depend on compaction to move the sstables from pending to 
> repaired so we might have different data marked repaired on different nodes. 
> Including any sstables from finalized incremental repair sessions as repaired 
> will solve this.
> Another problem is that validations don't start at exactly the same time on 
> different nodes, so if an incremental repair finishes while the preview 
> repair is running we might also validate the wrong repaired set. We should 
> fail the preview repair if an intersecting incremental repair finishes during 
> the preview repair.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15553) Preview repair should include sstables from finalized incremental repair sessions

2020-02-25 Thread Marcus Eriksson (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044304#comment-17044304
 ] 

Marcus Eriksson commented on CASSANDRA-15553:
-

Pushed another 
[commit|https://github.com/krummas/cassandra/commit/b28d63aed8b09f49936b06b7ec436c53ca3e5ec9]
 which should handle the case where FINALIZE_COMMIT_MSG is lost or delayed - if 
we encounter any pending non-finalized intersecting sstables we fail the 
preview repair.

> Preview repair should include sstables from finalized incremental repair 
> sessions
> -
>
> Key: CASSANDRA-15553
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15553
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 4.0-alpha
>
>
> When running a preview repair we currently grab all repaired sstables, 
> problem is that we depend on compaction to move the sstables from pending to 
> repaired so we might have different data marked repaired on different nodes. 
> Including any sstables from finalized incremental repair sessions as repaired 
> will solve this.
> Another problem is that validations don't start at exactly the same time on 
> different nodes, so if an incremental repair finishes while the preview 
> repair is running we might also validate the wrong repaired set. We should 
> fail the preview repair if an intersecting incremental repair finishes during 
> the preview repair.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15553) Preview repair should include sstables from finalized incremental repair sessions

2020-02-24 Thread Marcus Eriksson (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043222#comment-17043222
 ] 

Marcus Eriksson commented on CASSANDRA-15553:
-

bq. CassandraValidationIterator.java:61 - stray whitespace at top of class def

https://github.com/krummas/cassandra/commit/142d11c1539ed677dfd4e1f21fb09a6fd68b79fc

bq. LocalSessions$Listener - it would be nice to have a more descriptive method 
name `onIncrementalStateChange` or something. We have at least a few listeners 
with generic `onStateChange` methods

https://github.com/krummas/cassandra/commit/c4248ddce35a14549e617379fc2fabdd275ee3a1
 (made it {{onIRStateChange}})

bq. RepairSession#stateChange - we should return after forcingShutdown. 
Alternatively, we could use Iterables.any(ranges(), r -> 
r.intersects(session.ranges) instead of iterating over the ranges.

https://github.com/krummas/cassandra/commit/dd8cafea5debfb9cf90794ed3f9cc4a258681717

bq. RepairSession#involvesTables - includesSSTables or containsSSTables might 
be a better name

https://github.com/krummas/cassandra/commit/8f212ad77e53c0823a56fbc0d7da04f085200f56
 ({{includesTables}} - we are not dealing with sstables here)

> Preview repair should include sstables from finalized incremental repair 
> sessions
> -
>
> Key: CASSANDRA-15553
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15553
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 4.0-alpha
>
>
> When running a preview repair we currently grab all repaired sstables, 
> problem is that we depend on compaction to move the sstables from pending to 
> repaired so we might have different data marked repaired on different nodes. 
> Including any sstables from finalized incremental repair sessions as repaired 
> will solve this.
> Another problem is that validations don't start at exactly the same time on 
> different nodes, so if an incremental repair finishes while the preview 
> repair is running we might also validate the wrong repaired set. We should 
> fail the preview repair if an intersecting incremental repair finishes during 
> the preview repair.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15553) Preview repair should include sstables from finalized incremental repair sessions

2020-02-19 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17040575#comment-17040575
 ] 

David Capwell commented on CASSANDRA-15553:
---

If compaction only compacted <= "largest contiguous" then it would work, the 
big risk is if there are bugs in IR where we can't promote that because of gaps 
(caused by ephemeral issues such as message loss, so would require 
CASSANDRA-15566)

> Preview repair should include sstables from finalized incremental repair 
> sessions
> -
>
> Key: CASSANDRA-15553
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15553
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 4.0-alpha
>
>
> When running a preview repair we currently grab all repaired sstables, 
> problem is that we depend on compaction to move the sstables from pending to 
> repaired so we might have different data marked repaired on different nodes. 
> Including any sstables from finalized incremental repair sessions as repaired 
> will solve this.
> Another problem is that validations don't start at exactly the same time on 
> different nodes, so if an incremental repair finishes while the preview 
> repair is running we might also validate the wrong repaired set. We should 
> fail the preview repair if an intersecting incremental repair finishes during 
> the preview repair.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15553) Preview repair should include sstables from finalized incremental repair sessions

2020-02-19 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17040573#comment-17040573
 ] 

David Capwell commented on CASSANDRA-15553:
---

That would be fine; min is still smaller than the max one seen, so that 
wouldn't break the snapshot isolation.

> Preview repair should include sstables from finalized incremental repair 
> sessions
> -
>
> Key: CASSANDRA-15553
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15553
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 4.0-alpha
>
>
> When running a preview repair we currently grab all repaired sstables, 
> problem is that we depend on compaction to move the sstables from pending to 
> repaired so we might have different data marked repaired on different nodes. 
> Including any sstables from finalized incremental repair sessions as repaired 
> will solve this.
> Another problem is that validations don't start at exactly the same time on 
> different nodes, so if an incremental repair finishes while the preview 
> repair is running we might also validate the wrong repaired set. We should 
> fail the preview repair if an intersecting incremental repair finishes during 
> the preview repair.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15553) Preview repair should include sstables from finalized incremental repair sessions

2020-02-19 Thread Blake Eggleston (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17040546#comment-17040546
 ] 

Blake Eggleston commented on CASSANDRA-15553:
-

{quote}assuming compaction also maintains repairedAt
{quote}
not in a way that would make this possible. Once an sstable gets into the 
repaired bucket, it will be compacted with other repaired sstables, potentially 
with different repaired at timestamps (and keeping the min timestamp, I 
believe) .

> Preview repair should include sstables from finalized incremental repair 
> sessions
> -
>
> Key: CASSANDRA-15553
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15553
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 4.0-alpha
>
>
> When running a preview repair we currently grab all repaired sstables, 
> problem is that we depend on compaction to move the sstables from pending to 
> repaired so we might have different data marked repaired on different nodes. 
> Including any sstables from finalized incremental repair sessions as repaired 
> will solve this.
> Another problem is that validations don't start at exactly the same time on 
> different nodes, so if an incremental repair finishes while the preview 
> repair is running we might also validate the wrong repaired set. We should 
> fail the preview repair if an intersecting incremental repair finishes during 
> the preview repair.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15553) Preview repair should include sstables from finalized incremental repair sessions

2020-02-19 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17040506#comment-17040506
 ] 

David Capwell commented on CASSANDRA-15553:
---

Took a look and had to look closer at IR messaging, what I see is the following

IR messaging is fire-and-forget pattern, so any ephemeral issues lead to 
messages not being seen (tests show this CASSANDRA-15564 and have been reported 
as issues with current repair CASSANDRA-15566).  This patch relies on the 
FINALIZE_COMMIT_MSG being seen on the coordinator of the IR preview repair in 
order to detect conflict, but the message is seen asynchronously so may see 
this on the participants while validation is running and seen on the 
coordinator after all validations have been seen on the coordinator (so session 
is already complete); in this case you have the same issue as reported by this 
JIRA.

This patch also affectively blocks preview and IR running for the same range as 
the preview will fail with conflict*, so IR should stop scheduling if preview 
is running, and preview should not be scheduled while IR is running (else we 
waste the resources on validation); effectively what ever is scheduling the 
repairs will have to be enhanced to handle this which adds more complexity to 
operators.

I actually wonder if we can remove this restriction.  What it looks like to me 
is that repairedAt is system time (aka, could have drift, could roll backwards, 
etc.), but we could keep track of largest one and make sure this counter is 
monotonic.  With a data structure of 

* largest contiguous commit (long)
* inFlight (array of long)

We could make sure that we (coordinator) always produce a repairedAt larger 
than any we know of, and this lets preview take a snapshot of the state at the 
start of coordination. With this snapshot, we filter for repaired and 
repairedAt <= largest contiguous commit snapshot; this should give preview 
repair effectively snapshot isolation (assuming compaction also maintains 
repairedAt).

* In CASSANDRA-15564 I show that preview doesn't properly check session 
failures, run [this 
test|https://github.com/apache/cassandra/pull/446/files#diff-af4a07a2b44695f510dddb0c102e1953R28]
 and [this 
one|https://github.com/apache/cassandra/pull/446/files#diff-ca9f3b43ad8ff955d6ddd2ef4d2b6904R28]
 without the change in the JIRA to see it.  The reason your tests are different 
is because you don't use nodetool and directly monitor notifications.

> Preview repair should include sstables from finalized incremental repair 
> sessions
> -
>
> Key: CASSANDRA-15553
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15553
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 4.0-alpha
>
>
> When running a preview repair we currently grab all repaired sstables, 
> problem is that we depend on compaction to move the sstables from pending to 
> repaired so we might have different data marked repaired on different nodes. 
> Including any sstables from finalized incremental repair sessions as repaired 
> will solve this.
> Another problem is that validations don't start at exactly the same time on 
> different nodes, so if an incremental repair finishes while the preview 
> repair is running we might also validate the wrong repaired set. We should 
> fail the preview repair if an intersecting incremental repair finishes during 
> the preview repair.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org