[ https://issues.apache.org/jira/browse/CASSANDRA-9028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14381688#comment-14381688 ]
Sylvain Lebresne commented on CASSANDRA-9028:
---------------------------------------------

Well, the trace does say that all sstables have been "touched" as you said, and they have, but "touching" an sstable is a world away from reading the entire partition into memory.

The reason your first query "touches" 2 sstables is that the code does not know which sstable will have results for the query, how many results each will have, or which results will sort first. This is not particularly abnormal: there is only so much the storage engine can deduce without reading any data. But this doesn't change the fact that as little as possible is read from each sstable, and we certainly don't retrieve entire partitions unless we have to.

The reason the 2nd request actually only hits a single sstable is that this request is more restricted, and the engine is able to use that additional restriction to eliminate one of the sstables.

For completeness' sake, I'll note that there is actually an optimization we're contemplating in CASSANDRA-8180 to avoid "touching" sstables in some cases. This might or might not help your first query; I honestly haven't looked closely enough at the example to say. It won't make a terribly huge difference in any case.

> Optimize LIMIT execution to mitigate need for a full partition scan
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-9028
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9028
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: API, Core
>            Reporter: jonathan lacefield
>         Attachments: Data.1.json, Data.2.json, Data.3.json, test.ddl,
> tracing.out
>
>
> Currently, a SELECT statement for a single Partition Key that contains a
> LIMIT X clause will fetch an entire partition from a node and place the
> partition into memory prior to applying the limit clause and returning
> results to be served to the client via the coordinator.
> This JIRA is to request an optimization for the CQL LIMIT clause to avoid the
> entire partition retrieval step, and instead only retrieve the components to
> satisfy the LIMIT condition.
> Ideally, any LIMIT X would avoid the need to retrieve a full partition. This
> may not be possible though. As a compromise, it would still be incredibly
> beneficial if a LIMIT 1 clause could be optimized to only retrieve the
> "latest" item. Ideally a LIMIT 1 would "operationally behave" the same way
> as a Clustering Key WHERE clause where the "latest", i.e. LIMIT 1 field, col
> value was specified.
> We can supply some trace results to help show the difference between 2
> different queries that perform the same logical function if desired.
> For example, a query that returns the latest value for a clustering col
> where QUERY 1 uses a LIMIT 1 clause and QUERY 2 uses a WHERE <clustering col>
> = <latest value>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
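
To illustrate the distinction drawn in the comment between "touching" an sstable and reading a whole partition, here is a minimal Python sketch. It is not Cassandra's code: the names `sstable_rows` and `select_with_limit`, the in-memory row lists, and the read counter are all hypothetical, and it only assumes that each sstable yields its rows for the partition already sorted in clustering order. A lazy merge has to pull at least one row from every candidate source to learn which sorts first, but it can stop as soon as the limit is satisfied.

```python
import heapq
from itertools import islice


def sstable_rows(name, rows, reads):
    """Lazily yield the sorted rows of one hypothetical sstable, counting each row read."""
    for row in rows:
        reads[name] = reads.get(name, 0) + 1
        yield row


def select_with_limit(sstables, limit):
    """Merge rows from all sstables in clustering order and stop after `limit` rows.

    `sstables` maps a (hypothetical) sstable name to a list of
    (clustering_key, value) tuples, already sorted by clustering_key.
    Returns the selected rows and the number of rows read per sstable.
    """
    reads = {}
    iterators = [sstable_rows(name, rows, reads) for name, rows in sstables.items()]
    # heapq.merge consumes its inputs lazily: it needs the head of every
    # iterator to decide which row sorts first, but nothing beyond that
    # until more rows are requested.
    merged = heapq.merge(*iterators)
    return list(islice(merged, limit)), reads


if __name__ == "__main__":
    sstables = {
        "sstable-1": [(1, "a"), (4, "d"), (7, "g")],
        "sstable-2": [(2, "b"), (3, "c"), (9, "i")],
    }
    rows, reads = select_with_limit(sstables, limit=1)
    print(rows)   # the first row in clustering order
    print(reads)  # rows read per sstable
```

In this toy model, a LIMIT 1 query reads from both sstables because either could hold the first row, yet it reads only one row from each rather than all six.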