[ https://issues.apache.org/jira/browse/CASSANDRA-8272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16039285#comment-16039285 ]
Andrés de la Peña commented on CASSANDRA-8272: ---------------------------------------------- Here is a new version of the patch for 3.11 and trunk: ||[3.11|https://github.com/apache/cassandra/compare/cassandra-3.11...adelapena:454617607063bfb554b841f0d891798404faf0b1]||[trunk|https://github.com/apache/cassandra/compare/trunk...adelapena:1416d9b082d7f93b187cbf67abd9a917735c4804]||[dtests|https://github.com/riptano/cassandra-dtest/compare/master...adelapena:CASSANDRA-8272]|| Unfortunately cassci service is not working right now, I'll add the test results as soon as it comes back to live. bq. No, I don't think we have to return all the rows not satisfying the index. I believe only returning those that are before the {{n}} th "valid" entry is enough. I don't think it's different from how we handle tombstones here: we don't return all tombstones, just the ones before the {{n}} th live results. bq. Note that both with those new "invalid" entries and with tombstones, it's possible that post-resolution on the coordinator we end up being short on results. That is, a "valid" result from A is canceled by a tombstone/"invalid" result of B and vice-versa and we end up with less results than requested. But that's where the short-read protection from {{DataResolver}} kicks in. Indeed, short-read protection solves the problem, so I have left the {{DataLimits.Counter}} as a stopping transformation. I have added [some dtests|https://github.com/adelapena/cassandra-dtest/blob/CASSANDRA-8272/secondary_indexes_test.py#L1205-L1343] checking these scenarios with indexes. bq. As an aside, had a very very quick scan of the patch, and I'll also note that in {{StorageProxy}} and {{SinglePartitionReadCommand.Group.executeInternal}}, using only the post-processor of the 1st command would break if the index actually makes assumption based on the command it's passed on, so it feels dodgy and I think we sould make sure it's applied to each command result individually. Yes, it is dodgy. I have changed it to apply the post processing to each command in the group. As we said, the patch for 3.11 only contains the changes in the coordinator side. I have added [a test|https://github.com/adelapena/cassandra/blob/8272-3.11/test/unit/org/apache/cassandra/index/CustomIndexTest.java#L804-L871] in {{CustomIndexTest}} that uses [a custom index implementation|https://github.com/adelapena/cassandra/blob/8272-3.11/test/unit/org/apache/cassandra/index/CustomIndexTest.java#L1180-L1256] to validate coordinator side filtering. The patch for trunk also modifies regular secondary indexes to send stale rows. SASI don't uses the mechanism because of the aforementioned problem with expressions evaluation and text analysis, I think we should fix this in a separate ticket. Please let me know what do you think. > 2ndary indexes can return stale data > ------------------------------------ > > Key: CASSANDRA-8272 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8272 > Project: Cassandra > Issue Type: Bug > Reporter: Sylvain Lebresne > Assignee: Andrés de la Peña > Fix For: 3.0.x > > > When replica return 2ndary index results, it's possible for a single replica > to return a stale result and that result will be sent back to the user, > potentially failing the CL contract. > For instance, consider 3 replicas A, B and C, and the following situation: > {noformat} > CREATE TABLE test (k int PRIMARY KEY, v text); > CREATE INDEX ON test(v); > INSERT INTO test(k, v) VALUES (0, 'foo'); > {noformat} > with every replica up to date. Now, suppose that the following queries are > done at {{QUORUM}}: > {noformat} > UPDATE test SET v = 'bar' WHERE k = 0; > SELECT * FROM test WHERE v = 'foo'; > {noformat} > then, if A and B acknowledge the insert but C respond to the read before > having applied the insert, then the now stale result will be returned (since > C will return it and A or B will return nothing). > A potential solution would be that when we read a tombstone in the index (and > provided we make the index inherit the gcGrace of it's parent CF), instead of > skipping that tombstone, we'd insert in the result a corresponding range > tombstone. -- This message was sent by Atlassian JIRA (v6.3.15#6346) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org