[ https://issues.apache.org/jira/browse/CASSANDRA-8272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030870#comment-16030870 ]
Sylvain Lebresne commented on CASSANDRA-8272:
---------------------------------------------

bq. So, if the query limit requires n rows, we should return not more than {{n}} rows satisfying the filter, and *all* the rows not satisfying the index but being pointed by a deleted index entry.

No, I don't think we have to return *all* the rows not satisfying the index. I believe only returning those that are *before* the {{n}}th "valid" entry is enough. I don't think it's different from how we handle tombstones here: we don't return all tombstones, just the ones before the {{n}}th live result.

Note that both with those new "invalid" entries and with tombstones, it's possible that post-resolution on the coordinator we end up being short on results. That is, a "valid" result from A is canceled by a tombstone/"invalid" result from B and vice versa, and we end up with fewer results than requested. But that's where the short-read protection from {{DataResolver}} kicks in.

> 2ndary indexes can return stale data
> ------------------------------------
>
>                 Key: CASSANDRA-8272
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8272
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sylvain Lebresne
>            Assignee: Andrés de la Peña
>             Fix For: 3.0.x
>
>
> When replicas return 2ndary index results, it's possible for a single replica
> to return a stale result and that result will be sent back to the user,
> potentially failing the CL contract.
> For instance, consider 3 replicas A, B and C, and the following situation:
> {noformat}
> CREATE TABLE test (k int PRIMARY KEY, v text);
> CREATE INDEX ON test(v);
> INSERT INTO test(k, v) VALUES (0, 'foo');
> {noformat}
> with every replica up to date.
> Now, suppose that the following queries are
> done at {{QUORUM}}:
> {noformat}
> UPDATE test SET v = 'bar' WHERE k = 0;
> SELECT * FROM test WHERE v = 'foo';
> {noformat}
> then, if A and B acknowledge the update but C responds to the read before
> having applied the update, the now stale result will be returned (since
> C will return it and A or B will return nothing).
> A potential solution would be that when we read a tombstone in the index (and
> provided we make the index inherit the gcGrace of its parent CF), instead of
> skipping that tombstone, we'd insert in the result a corresponding range
> tombstone.
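
To make the limit rule from the comment concrete, here is a minimal Python sketch. It is not Cassandra code: {{Entry}}, {{replica_response}} and {{resolve}} are hypothetical names made up for illustration, and the sketch assumes that an "invalid" entry from any replica always supersedes a "valid" one for the same key (real resolution compares timestamps). Each replica returns every entry up to and including its n-th "valid" one, and the coordinator can then end up short of the limit.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    key: int     # key of the row the index entry points at
    valid: bool  # True if the row still matches the filter on this replica


def replica_response(entries: List[Entry], limit: int) -> List[Entry]:
    """Return entries in index order, stopping after the limit-th valid one.

    Invalid entries located before that point are kept; later ones are not.
    """
    out: List[Entry] = []
    valid_seen = 0
    for entry in entries:
        if valid_seen == limit:
            break
        out.append(entry)
        if entry.valid:
            valid_seen += 1
    return out


def resolve(responses: List[List[Entry]], limit: int) -> List[int]:
    """Merge replica responses (assumption: invalid always beats valid)."""
    invalid = {e.key for r in responses for e in r if not e.valid}
    valid = {e.key for r in responses for e in r if e.valid}
    return sorted(valid - invalid)[:limit]


# Replica A considers key 2 invalid, replica B considers key 1 invalid.
a = [Entry(1, True), Entry(2, False), Entry(3, True), Entry(4, True)]
b = [Entry(1, False), Entry(2, True), Entry(3, True), Entry(4, True)]

limit = 2
rows = resolve([replica_response(a, limit), replica_response(b, limit)], limit)
print(rows)               # [3]
print(len(rows) < limit)  # True: fewer rows than requested
```

With a limit of 2, both replicas stop at key 3, keys 1 and 2 cancel each other out on the coordinator, and only one row survives. That is the short-on-results case the comment leaves to the short-read protection in {{DataResolver}}.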