[jira] [Commented] (CASSANDRA-8087) Multiple non-DISTINCT rows returned when page_size set

Sylvain Lebresne (JIRA) Fri, 19 Dec 2014 03:38:04 -0800

    [ https://issues.apache.org/jira/browse/CASSANDRA-8087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14253305#comment-14253305 ]


Sylvain Lebresne commented on CASSANDRA-8087:
---------------------------------------------

bq. It then gets passed to CFS.makeExtendedFilter(), which makes an 
ExtendedFilter with maxResults as the limit.

Right, but what I mean is that when we actually query the underlying partition, 
the slice filter count might be a lot more than what we care for (it could be 
Integer.MAX_VALUE if there wasn't any LIMIT on the statement in the first 
place), and if that's the case, we will read a lot more than we should. This 
will only be true for the first partition, because after that we update 
the SliceQueryFilter at the end of the loop in {{CFS.filter()}}. Still, 
it's potentially inefficient for that first partition and might even end 
up blowing up the heap if the partition is big, which defeats the purpose of 
paging. I'll note that, provided we don't blow up the heap, the result set 
returned to the user will be fine since we'll trim it in SelectStatement, but 
it's still a bug (provided I'm not missing something).
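
To make the concern concrete, here is a toy model in Python. It is not Cassandra code: {{scan}}, {{read_partition}} and {{clamp_first}} are made-up names, and a partition is just a list of cells. It only sketches the idea that an unbounded count on the first partition read materialises the whole partition even though the result is trimmed afterwards.

```python
# Stand-in for Java's Integer.MAX_VALUE (the "no LIMIT" case).
NO_LIMIT = 2**31 - 1


def read_partition(cells, slice_count):
    """Toy partition read: materialise at most slice_count cells."""
    return cells[:slice_count]


def scan(partitions, max_results, clamp_first):
    """Return (rows kept, cells read per partition) for one page.

    Hypothetical model: `clamp_first` decides whether the per-partition
    count is bounded by the page limit before the first partition is read.
    """
    remaining = max_results
    # Unclamped, the first read uses whatever count the statement carried.
    slice_count = remaining if clamp_first else NO_LIMIT
    rows, cells_read = [], []
    for cells in partitions:
        if remaining <= 0:
            break
        fetched = read_partition(cells, slice_count)
        cells_read.append(len(fetched))
        kept = fetched[:remaining]  # models the later trimming of the result
        rows.extend(kept)
        remaining -= len(kept)
        # Models updating the filter count at the end of each loop iteration.
        slice_count = remaining
    return rows, cells_read
```

With a 1000-cell first partition and a page limit of 5, both variants hand back the same 5 rows, but the unclamped one reads all 1000 cells of the first partition to get there.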

Anyway, this is unrelated to this issue, and even if we fix it we can make sure 
never to set that count to 1, so as not to break the fix for this issue. So +1 
on the patch.


> Multiple non-DISTINCT rows returned when page_size set
> ------------------------------------------------------
>
>                 Key: CASSANDRA-8087
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8087
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Adam Holmberg
>            Assignee: Tyler Hobbs
>            Priority: Minor
>             Fix For: 2.0.12
>
>         Attachments: 8087-2.0-v2.txt, 8087-2.0.txt
>
>
> Using the following statements to reproduce:
> {code}
> CREATE TABLE test (
>                 k int,
>                 p int,
>                 s int static,
>                 PRIMARY KEY (k, p)
>             );
> INSERT INTO test (k, p) VALUES (1, 1);
> INSERT INTO test (k, p) VALUES (1, 2);
> SELECT DISTINCT k, s FROM test ;
> {code}
> Native clients that set result_page_size in the query message receive 
> multiple non-distinct rows back (one per clustered value p in row k).
> This has only been reproduced on 2.0.10; it does not appear in 2.1.0.
> It does not appear in cqlsh for 2.0.10 because cqlsh uses Thrift there.
> See https://datastax-oss.atlassian.net/browse/PYTHON-164 for background



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
