[ https://issues.apache.org/jira/browse/CASSANDRA-7844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14113463#comment-14113463 ]
Sylvain Lebresne commented on CASSANDRA-7844: --------------------------------------------- tl;dr use {{DISTINCT}} ({{SELECT DISTINCT next_id FROM friends WHERE user='user1'}}). If you do a query like {{SELECT next_id FROM friends WHERE user='user1'}}, we will query the whole partition the same way we would do it if the partition key was the only thing selected (and you'll have to use {{DISTINCT}} in that case too to avoid it). See CASSANDRA-7305 for background. Now you may ask: ok, but I have a {{LIMIT 1}} in my query, so why does it scan the whole 6 tombstoned cells. Good question. That part lies in how we count row in {{SliceQueryFilter}}. Because cells are not grouped in CQL rows in the underlying storage engine, we have to decompose cell names and make sure we only count 1 for all the cells pertaining to the same CQL row. Since we only know that we've gathered all the cells of a particular CQL row when we get a cell from another CQL row, the stopping condition is {{columnCounter.live() > limit}}. In other words, with {{LIMIT 1}}, we stop as soon as we've seen the first cell of the 2nd live CQL row. But tombstones don't count towards the live count, so if you have 1 live CQL row following by scores of tombstones, we will read all the tombstones, because they are not "the first cell of the 2nd *live* row". I'll note that all this is not related to static columns in any way. It would be possible to change the logic to make a difference between "I've started counting the Xth row but there may still have more cells for that CQL row to add" and "I've counted X row but I've seen a tombstone that don't belong to the Xth row so if my limit is X I'm done". That said, it would probably be a bit ugly, and that would force us to decompose the cell names of tombstones which we currently don't do, so while it would improve what is a fairly degenerate case (remember that the solution to the example given in this ticket is to use {{DISTINCT}}), it would add a small overhead in general. Overall, I'm tempted to leave things as they are: we're hoping to make the storage engine a bit more aware of CQL rows in the future, and that will probably make that kind of things go away without requiring ugly special casing. > Fetching a single static column requires scanning to the first live CQL row > --------------------------------------------------------------------------- > > Key: CASSANDRA-7844 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7844 > Project: Cassandra > Issue Type: Bug > Components: Core > Reporter: Nicolas Favre-Felix > Labels: perfomance > > Reading a single static column seems to do more work than needed, scanning > the partition until the first live CQL row before returning a value. > As I understand, static columns are stored separately from clustered rows > (see CASSANDRA-6956 for an issue that arised from this storage model). > Nevertheless, Cassandra doesn't optimize for the case where only static > columns and partition key dimensions are retrieved. > Selecting a static column on its own is possible: > {code} > > create table friends (user text, next_id int static, friend_id int, email > > text, primary key(user,friend_id)); > > insert into friends (user, next_id) values ('user1', 1); > > select * from friends where user = 'user1'; > user | friend_id | next_id | email > -------+-----------+---------+------- > user1 | null | 1 | null > (1 rows) > {code} > Let's insert and delete some clustered data: > {code} > > insert into friends (user, next_id, friend_id, email) values ('user1', 2, > > 1, 'abc@foo'); > > insert into friends (user, next_id, friend_id, email) values ('user1', 3, > > 2, 'def@foo'); > > insert into friends (user, next_id, friend_id, email) values ('user1', 4, > > 3, 'ghi@foo'); > > select * from friends where user = 'user1'; > user | friend_id | next_id | email > -------+-----------+---------+--------- > user1 | 1 | 4 | abc@foo > user1 | 2 | 4 | def@foo > user1 | 3 | 4 | ghi@foo > (3 rows) > > delete from friends where user = 'user1' and friend_id = 1; > > delete from friends where user = 'user1' and friend_id = 2; > > delete from friends where user = 'user1' and friend_id = 3; > {code} > And then fetch the static column again: > {code} > > TRACING ON > Now tracing requests. > > select next_id from friends where user = 'user1' limit 1; > next_id > --------- > 4 > (1 rows) > Tracing session: 597cc970-2e27-11e4-932f-c551d8e65d14 > activity | > timestamp | source | source_elapsed > ---------------------------------------------------------------------------+--------------+-----------+---------------- > execute_cql3_query | > 13:18:46,792 | 127.0.0.1 | 0 > Parsing SELECT next_id from friends where user = 'user1' LIMIT 1; | > 13:18:46,792 | 127.0.0.1 | 59 > Preparing statement | > 13:18:46,792 | 127.0.0.1 | 125 > Executing single-partition query on friends | > 13:18:46,792 | 127.0.0.1 | 357 > Acquiring sstable references | > 13:18:46,792 | 127.0.0.1 | 369 > Merging memtable tombstones | > 13:18:46,792 | 127.0.0.1 | 381 > Skipped 0/0 non-slice-intersecting sstables, included 0 due to tombstones | > 13:18:46,792 | 127.0.0.1 | 445 > Merging data from memtables and 0 sstables | > 13:18:46,792 | 127.0.0.1 | 460 > Read 1 live and 6 tombstoned cells | > 13:18:46,792 | 127.0.0.1 | 504 > Request complete | > 13:18:46,792 | 127.0.0.1 | 711 > {code} > > > We went over tombstones instead of returning the static column immediately. > Is this possibly related to CASSANDRA-7085? -- This message was sent by Atlassian JIRA (v6.2#6252)