[ https://issues.apache.org/jira/browse/CASSANDRA-8272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16006545#comment-16006545 ]
Sergio Bossa commented on CASSANDRA-8272:
-----------------------------------------

bq. my main point is that on principle we should be careful to look at the whole solution before comparing it to alternatives and deciding which one we "stick with". I've seen simple solutions get pretty messy once you fix all edge cases to the point that it wasn't the best solution anymore.

Of course. I did give the tombstones solution some thought of my own (as I'm sure [~adelapena] did as well), and I found it considerably more complex than the current one, with little or no gain in return. It is also, something I didn't mention before, not really complete for indexes covering multiple columns, or if we ever want to support multiple indexes per row: in such cases, mixing tombstones and valid column values for all combinations would easily turn into a mess IMHO, while actually returning the row and post-filtering it later is cleaner and less error prone.

Note that we could still "skim" the row when we detect it's related to a stale entry and keep only the index-related columns (and easily add a merging step in the future for the multiple-index case): this would buy us the performance optimization you mentioned above, but I find it slightly error prone and I'd rather go with a functionally complete solution first.

bq. It's in particular not true that fixing this bug will be "invalidated when filtering is applied"

I disagree here: if filtering is applied on top of index results, you'll still get wrong results, which is confusing to me (as a user). I understand filtering is also orthogonal, so what about fixing filtering (that is, moving to coordinator-side filtering) only when indexes are present?

bq. That [fixing other index implementations] I agree is something we should consider. Though tbh, I have doubts we can have a solution that is completely index agnostic.

Of course. But we can still provide some API (i.e. the {{isSatisfiedBy()}} you mentioned) that they can leverage. And if we do this kind of work on the SASI-enabled branches, we'll have two different index implementations to validate our API against.

bq. One thing that hasn't been mentioned is that the fix has impact on upgrades. Namely, in a mixed cluster, some replica will start to return invalid results and if the coordinator isn't upgraded yet, it won't filter those, which means we'll return invalid entries.

Excellent point! And definitely something to avoid.

bq. That does mean we should consider starting to filter entries on index queries coordinator-side in 3.0/3.11 (even though we never return them), and only do the replica-side parts in 4.0, with a fat warning that you need to only upgrade to 4.0 from a 3.X version that has the coordinator-side fix.

Mmmhhhh ... clunky. And error prone, as the 3.X code would probably be untestable. Couldn't the replica detect the coordinator version and return results accordingly?

bq. Worth noting that this doesn't entirely fly for index using custom indexes: we'd need to have them implement the CustomExpression#isSatisfiedBy method in 3.X in that scheme since we need it for the coordinator-side filtering as well, but making that method abstract in 3.X is, as said above, a breaking change.

I'm not sure I get why you _have to_ make it abstract: I think it's fine to leave it as it is and warn users that they'll have to override it on upgrade if they want consistent results. And for those implementations that can't implement it, we should maybe add an {{isConsistent}} predicate to disable "consistent filtering" altogether.
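To make the "return the row and post-filter later" idea concrete, here is a minimal Java sketch of a coordinator-side filter driven by an {{isSatisfiedBy()}}-style check, with an {{isConsistent}} opt-out. Everything in it is hypothetical and for illustration only: {{Row}}, {{IndexExpression}}, {{equalsTo}} and {{postFilter}} are invented names and not Cassandra classes, and the real read path is considerably more involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustration only: none of these types exist in Cassandra. */
public class CoordinatorPostFilterSketch {

    /** A row as the coordinator sees it after reconciling replica responses. */
    static final class Row {
        final int key;
        final Map<String, String> columns;

        Row(int key, Map<String, String> columns) {
            this.key = key;
            this.columns = columns;
        }
    }

    /** Hypothetical stand-in for an index expression such as v = 'foo'. */
    interface IndexExpression {
        boolean isSatisfiedBy(Row row);

        /** Implementations that cannot re-check a row would return false to opt out. */
        default boolean isConsistent() {
            return true;
        }
    }

    /** Builds an equality expression on a single column. */
    static IndexExpression equalsTo(String column, String value) {
        return row -> value.equals(row.columns.get(column));
    }

    /** Drops reconciled rows that no longer satisfy the expression. */
    static List<Row> postFilter(List<Row> reconciled, IndexExpression expression) {
        if (!expression.isConsistent()) {
            return reconciled; // opted out: results are passed through unchanged
        }
        List<Row> kept = new ArrayList<>();
        for (Row row : reconciled) {
            if (expression.isSatisfiedBy(row)) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Row 0 was matched by a stale index entry but now carries v = 'bar'.
        List<Row> reconciled = List.of(
                new Row(0, Map.of("v", "bar")),
                new Row(1, Map.of("v", "foo")));
        List<Row> kept = postFilter(reconciled, equalsTo("v", "foo"));
        System.out.println("rows kept: " + kept.size());
    }
}
```

The opt-out is a default method here so that existing implementations keep compiling, which mirrors the concern about not making the method abstract in 3.X.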
> 2ndary indexes can return stale data
> ------------------------------------
>
>                 Key: CASSANDRA-8272
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8272
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sylvain Lebresne
>            Assignee: Andrés de la Peña
>             Fix For: 3.0.x
>
>
> When replicas return 2ndary index results, it's possible for a single replica
> to return a stale result and that result will be sent back to the user,
> potentially failing the CL contract.
> For instance, consider 3 replicas A, B and C, and the following situation:
> {noformat}
> CREATE TABLE test (k int PRIMARY KEY, v text);
> CREATE INDEX ON test(v);
> INSERT INTO test(k, v) VALUES (0, 'foo');
> {noformat}
> with every replica up to date. Now, suppose that the following queries are
> done at {{QUORUM}}:
> {noformat}
> UPDATE test SET v = 'bar' WHERE k = 0;
> SELECT * FROM test WHERE v = 'foo';
> {noformat}
> then, if A and B acknowledge the update but C responds to the read before
> having applied the update, then the now stale result will be returned (since
> C will return it and A or B will return nothing).
> A potential solution would be that when we read a tombstone in the index (and
> provided we make the index inherit the gcGrace of its parent CF), instead of
> skipping that tombstone, we'd insert in the result a corresponding range
> tombstone.
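
The scenario in the quoted description can be traced with a small Java sketch. It assumes a simplified last-write-wins merge keyed on {{k}}; {{Versioned}}, {{reconcile}} and {{matching}} are hypothetical names invented for this illustration and do not correspond to Cassandra code, and the timestamps 1 and 2 are arbitrary. It compares what the coordinator ends up with when the up-to-date replica returns nothing for {{v = 'foo'}} against what it would see if that replica returned the current row for reconciliation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustration only: a toy model of the QUORUM read in the issue description. */
public class StaleIndexReadSketch {

    /** Value of column v for primary key k, with the timestamp of the write. */
    static final class Versioned {
        final int key;
        final String v;
        final long timestamp;

        Versioned(int key, String v, long timestamp) {
            this.key = key;
            this.v = v;
            this.timestamp = timestamp;
        }
    }

    /** Last-write-wins merge of the replica responses, keyed by k. */
    static Map<Integer, Versioned> reconcile(List<List<Versioned>> responses) {
        Map<Integer, Versioned> merged = new HashMap<>();
        for (List<Versioned> response : responses) {
            for (Versioned cell : response) {
                merged.merge(cell.key, cell, (a, b) -> a.timestamp >= b.timestamp ? a : b);
            }
        }
        return merged;
    }

    /** Keys whose reconciled value still equals the queried value. */
    static List<Integer> matching(Map<Integer, Versioned> merged, String wanted) {
        List<Integer> keys = new ArrayList<>();
        for (Versioned cell : merged.values()) {
            if (wanted.equals(cell.v)) {
                keys.add(cell.key);
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        Versioned stale = new Versioned(0, "foo", 1L); // replica C: UPDATE not applied yet
        Versioned fresh = new Versioned(0, "bar", 2L); // replica A: UPDATE applied

        // Behaviour described in the issue: A has no index hit for 'foo' and returns nothing.
        List<Integer> reported = matching(
                reconcile(List.of(List.<Versioned>of(), List.of(stale))), "foo");

        // Assumed alternative: A returns its current row, so the merge sees the newer value.
        List<Integer> alternative = matching(
                reconcile(List.of(List.of(fresh), List.of(stale))), "foo");

        System.out.println("reported=" + reported + " alternative=" + alternative);
    }
}
```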