[ https://issues.apache.org/jira/browse/CASSANDRA-8087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14253305#comment-14253305 ]
Sylvain Lebresne commented on CASSANDRA-8087:
---------------------------------------------

bq. It then gets passed to CFS.makeExtendedFilter(), which makes an ExtendedFilter with maxResults as the limit.

Right, but what I mean is that when we actually query the underlying partition, the slice filter count might be a lot more than what we care for (it could be Integer.MAX_VALUE if there wasn't any LIMIT on the statement in the first place) and if that's the case, we will read a lot more than we should. This will only be true for the first partition, because after that we will update the SliceQueryFilter at the end of the loop of {{CFS.filter()}}, but still, it's potentially inefficient for that first partition and might even end up blowing up the heap if the partition is big, which defeats the purpose of paging. I'll note that provided we don't blow up the heap, the resultSet returned to the user will be fine since we'll trim it in SelectStatement, but it's still a bug (provided I'm not missing something).

Anyway, this is unrelated to this issue, and even if we fix it we can make sure to never set that count to 1 to not break the fix of this issue. So +1 on the patch.

> Multiple non-DISTINCT rows returned when page_size set
> ------------------------------------------------------
>
>                 Key: CASSANDRA-8087
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8087
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Adam Holmberg
>            Assignee: Tyler Hobbs
>            Priority: Minor
>             Fix For: 2.0.12
>
>         Attachments: 8087-2.0-v2.txt, 8087-2.0.txt
>
>
> Using the following statements to reproduce:
> {code}
> CREATE TABLE test (
>     k int,
>     p int,
>     s int static,
>     PRIMARY KEY (k, p)
> );
> INSERT INTO test (k, p) VALUES (1, 1);
> INSERT INTO test (k, p) VALUES (1, 2);
> SELECT DISTINCT k, s FROM test ;
> {code}
> Native clients that set result_page_size in the query message receive
> multiple non-distinct rows back (one per clustered value p in row k).
> This is only reproduced on 2.0.10. It does not appear in 2.1.0.
> It does not appear in cqlsh for 2.0.10 because cqlsh uses Thrift there.
> See https://datastax-oss.atlassian.net/browse/PYTHON-164 for background



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
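
The over-read described in the comment above (an unbounded per-partition count on the first partition, tightened only at the end of each loop iteration) can be illustrated with a small Python model. This is a sketch under stated assumptions, not Cassandra's code: `read_partition`, `scan`, and `clamp_first` are hypothetical names, and partitions are modeled as plain lists of cells.

```python
from typing import Dict, List, Tuple

# Stand-in for Integer.MAX_VALUE, i.e. a statement with no LIMIT.
UNBOUNDED = 2**31 - 1


def read_partition(cells: List[int], count: int) -> List[int]:
    """Hypothetical slice read: materialize at most `count` cells."""
    return cells[:count]


def scan(partitions: Dict[int, List[int]], wanted: int,
         clamp_first: bool) -> Tuple[int, int]:
    """Return (rows_returned, cells_read) for a multi-partition scan.

    `wanted` models the number of rows the caller cares about (a page).
    With clamp_first=False the first partition is read with an unbounded
    count, which is the inefficiency the comment describes.
    """
    count = wanted if clamp_first else UNBOUNDED
    remaining = wanted
    rows_returned = 0
    cells_read = 0
    for cells in partitions.values():
        if remaining <= 0:
            break
        fetched = read_partition(cells, count)
        cells_read += len(fetched)
        kept = fetched[:remaining]  # later trimming keeps the result correct
        rows_returned += len(kept)
        remaining -= len(kept)
        count = remaining  # the count is only tightened at the end of the loop
    return rows_returned, cells_read


partitions = {1: list(range(1000)), 2: list(range(1000))}
unclamped = scan(partitions, wanted=10, clamp_first=False)
clamped = scan(partitions, wanted=10, clamp_first=True)
```

In this model both calls return the same number of rows, but the unclamped scan materializes the whole first partition before trimming, which is where the heap pressure would come from on a large partition.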