[ https://issues.apache.org/jira/browse/CASSANDRA-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16007964#comment-16007964 ]
Sylvain Lebresne commented on CASSANDRA-8273:
---------------------------------------------

bq. Obviously, moving the filtering to the coordinator would remove that problem, but doing so would, on top of not being trivial to implement, have serious performance impact since we can't know in advance how much data will be filtered and we may have to redo queries to replicas multiple times.

That comment (from the description) is pretty old and isn't entirely accurate anymore, so I want to amend it and expand on it. While it's obviously still true that moving filtering coordinator-side has performance impacts, it's now fairly trivial to do post-CASSANDRA-8099. Basically, I believe we just need to move the {{RowFilter#filter}} call that is currently in {{ReadCommand#executeLocally()}} to post-coordinator-reconciliation. Typically, to the {{postReconciliationProcessing()}} method that {{PartitionRangeReadCommand}} has, which we would simply generalize to all {{ReadCommand}} (that is, add it to {{SinglePartitionReadCommand}} as well). In particular, while it's still true that we'll have to redo queries when filtering makes us fall short on a first try, the "short read protection" from {{DataResolver}} actually handles this for us reasonably nicely.

Of course, there are the performance concerns, which concretely come in 2 flavors:
# we'll transfer everything that is filtered from the replica to the coordinator, while we don't today;
# as a consequence, and as mentioned above, we'll (usually) have to do multiple coordinator<->replica round trips to get a particular count of final rows, when only one is needed today.

I do want to note the following though:
* For CL.ONE, and as noted by Robert above, this is not really a big deal. There is actually no impact if you use a token-aware client.
If you don't, then we could theoretically push the filtering to the replica in that specific case, but honestly, if you care about performance, you should be using token-awareness, so I'm not convinced it's even worth adding any complexity for this (at the very least for a v1: we don't currently ship the CL with queries to replicas, and while I'm sure we'll want to change that for other reasons at some point, I don't think we should bother here).
* For higher CLs, the impact is definitely bigger, but here's the thing: if you use a higher CL, that implies you actually care about and _rely on_ the CL guarantees, so I think no amount of performance matters if we don't fulfill those guarantees, and not fixing a known correctness issue because it impacts performance is imo backwards. I'll also note that while the 2nd flavor will certainly have an impact, the short-read protection from {{DataResolver}} is actually not too stupid about this: it will "regulate" its 2nd query based on how much was filtered on the 1st one, to limit the impact somewhat. Not awesome, but better than nothing.

Anyway, I'm personally in favor of fixing this by moving filtering coordinator-side: while this has a performance impact, we shouldn't be fast at the expense of correctness. And I have no clue how to fix this replica-side, and no-one has offered a proper option for that in ~3 years. Let's make things correct now, and _then_ we can think about how to optimize. I also want to remind, for context, that {{ALLOW FILTERING}} is something we strongly advertise as not-a-great-idea for anything performance-sensitive in the first place, so that's imo all the more reason not to agonize over performance too much and to favor correctness first and foremost.
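To make the correctness argument concrete, here is a small toy model in Python of the scenario from the description: three replicas, a {{QUORUM}} read touching a stale replica, and the filter {{v1 = 'foo' AND v2 = 1}}. This is purely illustrative (the data structures, function names, and last-write-wins merge are invented for the sketch; it is not Cassandra code), but it shows why filtering replica-side lets a stale row win while filtering after reconciliation does not:

```python
# Each replica stores {key: (v1, v2, write_timestamp)} -- a deliberately
# simplified model, not Cassandra's actual storage format.
replicas = {
    "A": {0: ("foo", 2, 2)},  # applied the UPDATE (v2 = 2, newer timestamp)
    "B": {0: ("foo", 2, 2)},  # applied the UPDATE
    "C": {0: ("foo", 1, 1)},  # stale: still has the original INSERT
}

def row_filter(row):
    v1, v2, _ = row
    return v1 == "foo" and v2 == 1

def merge(rows, key, row):
    # Last-write-wins reconciliation on the write timestamp.
    existing = rows.get(key)
    if existing is None or row[2] > existing[2]:
        rows[key] = row

def replica_side_filtering(quorum):
    # Current behaviour: each replica filters locally, then the coordinator
    # reconciles whatever survived. The up-to-date replicas filter out their
    # (newer) row, so it never reaches the coordinator to shadow C's stale row.
    results = {}
    for name in quorum:
        for k, row in replicas[name].items():
            if row_filter(row):
                merge(results, k, row)
    return results

def coordinator_side_filtering(quorum):
    # Proposed behaviour: reconcile all replica responses first, then filter
    # the merged (most recent) rows.
    merged = {}
    for name in quorum:
        for k, row in replicas[name].items():
            merge(merged, k, row)
    return {k: r for k, r in merged.items() if row_filter(r)}

print(replica_side_filtering(["A", "C"]))      # stale row is returned
print(coordinator_side_filtering(["A", "C"]))  # correctly empty
```

With a quorum of A and C, the replica-side version returns C's stale row, while the coordinator-side version merges A's newer row over it first and then (correctly) filters everything out.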
> Allow filtering queries can return stale data
> ---------------------------------------------
>
>                 Key: CASSANDRA-8273
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8273
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sylvain Lebresne
>
> Data filtering is done replica-side. That means that a single replica with
> stale data may make the whole query return that stale data.
> For instance, consider 3 replicas A, B and C, and the following situation:
> {noformat}
> CREATE TABLE test (k int PRIMARY KEY, v1 text, v2 int);
> CREATE INDEX ON test(v1);
> INSERT INTO test(k, v1, v2) VALUES (0, 'foo', 1);
> {noformat}
> with every replica up to date. Now, suppose that the following queries are
> done at {{QUORUM}}:
> {noformat}
> UPDATE test SET v2 = 2 WHERE k = 0;
> SELECT * FROM test WHERE v1 = 'foo' AND v2 = 1;
> {noformat}
> then, if A and B acknowledge the update but C responds to the read before
> having applied the update, the now-stale result will be returned. Let's note
> that this is a problem related to filtering, not 2ndary indexes.
> This issue shares similarities with CASSANDRA-8272, but contrary to that
> former issue, I'm not sure how to fix it. Obviously, moving the filtering to
> the coordinator would remove that problem, but doing so would, on top of not
> being trivial to implement, have serious performance impact since we can't
> know in advance how much data will be filtered and we may have to redo
> queries to replicas multiple times.

This message was sent by Atlassian JIRA (v6.3.15#6346)