[ 
https://issues.apache.org/jira/browse/CASSANDRA-8272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16039285#comment-16039285
 ] 

Andrés de la Peña commented on CASSANDRA-8272:
----------------------------------------------

Here is a new version of the patch for 3.11 and trunk:

||[3.11|https://github.com/apache/cassandra/compare/cassandra-3.11...adelapena:454617607063bfb554b841f0d891798404faf0b1]||[trunk|https://github.com/apache/cassandra/compare/trunk...adelapena:1416d9b082d7f93b187cbf67abd9a917735c4804]||[dtests|https://github.com/riptano/cassandra-dtest/compare/master...adelapena:CASSANDRA-8272]||

Unfortunately cassci service is not working right now, I'll add the test 
results as soon as it comes back to live.

bq. No, I don't think we have to return all the rows not satisfying the index. 
I believe only returning those that are before the {{n}} th "valid" entry is 
enough. I don't think it's different from how we handle tombstones here: we 
don't return all tombstones, just the ones before the {{n}} th live results.
bq. Note that both with those new "invalid" entries and with tombstones, it's 
possible that post-resolution on the coordinator we end up being short on 
results. That is, a "valid" result from A is canceled by a tombstone/"invalid" 
result of B and vice-versa and we end up with less results than requested. But 
that's where the short-read protection from {{DataResolver}} kicks in.

Indeed, short-read protection solves the problem, so I have left the 
{{DataLimits.Counter}} as a stopping transformation. I have added [some 
dtests|https://github.com/adelapena/cassandra-dtest/blob/CASSANDRA-8272/secondary_indexes_test.py#L1205-L1343]
 checking these scenarios with indexes.

bq. As an aside, had a very very quick scan of the patch, and I'll also note 
that in {{StorageProxy}} and 
{{SinglePartitionReadCommand.Group.executeInternal}}, using only the 
post-processor of the 1st command would break if the index actually makes 
assumption based on the command it's passed on, so it feels dodgy and I think 
we sould make sure it's applied to each command result individually.

Yes, it is dodgy. I have changed it to apply the post processing to each 
command in the group. 

As we said, the patch for 3.11 only contains the changes in the coordinator 
side. I have added [a 
test|https://github.com/adelapena/cassandra/blob/8272-3.11/test/unit/org/apache/cassandra/index/CustomIndexTest.java#L804-L871]
 in {{CustomIndexTest}} that uses [a custom index 
implementation|https://github.com/adelapena/cassandra/blob/8272-3.11/test/unit/org/apache/cassandra/index/CustomIndexTest.java#L1180-L1256]
 to validate coordinator side filtering.

The patch for trunk also modifies regular secondary indexes to send stale rows. 
SASI don't uses the mechanism because of the aforementioned problem with 
expressions evaluation and text analysis, I think we should fix this in a 
separate ticket.

Please let me know what do you think.

> 2ndary indexes can return stale data
> ------------------------------------
>
>                 Key: CASSANDRA-8272
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8272
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sylvain Lebresne
>            Assignee: Andrés de la Peña
>             Fix For: 3.0.x
>
>
> When replica return 2ndary index results, it's possible for a single replica 
> to return a stale result and that result will be sent back to the user, 
> potentially failing the CL contract.
> For instance, consider 3 replicas A, B and C, and the following situation:
> {noformat}
> CREATE TABLE test (k int PRIMARY KEY, v text);
> CREATE INDEX ON test(v);
> INSERT INTO test(k, v) VALUES (0, 'foo');
> {noformat}
> with every replica up to date. Now, suppose that the following queries are 
> done at {{QUORUM}}:
> {noformat}
> UPDATE test SET v = 'bar' WHERE k = 0;
> SELECT * FROM test WHERE v = 'foo';
> {noformat}
> then, if A and B acknowledge the insert but C respond to the read before 
> having applied the insert, then the now stale result will be returned (since 
> C will return it and A or B will return nothing).
> A potential solution would be that when we read a tombstone in the index (and 
> provided we make the index inherit the gcGrace of it's parent CF), instead of 
> skipping that tombstone, we'd insert in the result a corresponding range 
> tombstone.  



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to