[ 
https://issues.apache.org/jira/browse/CASSANDRA-16226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Caleb Rackliffe updated CASSANDRA-16226:
----------------------------------------
    Test and Documentation Plan: 
The patch includes a series of new tests in {{SSTablesIteratedTest}} that 
verify expected numbers of SSTables read for several query types across compact 
and non-compact tables. They should serve as reasonable documentation and 
guardrails against further regression.

The official [docs on compact 
storage|https://cassandra.apache.org/doc/latest/cql/appendices.html#appendix-c-dropping-compact-storage]
 and the in-tree docs (in {{ddl.rst}}) will need some rework as well, both to 
indicate that it will live on in 4.0, and to take into account the concerns in 
this issue.

  was:
The patch includes a series of new tests in {{SSTablesIteratedTest}} that 
verify expected numbers of SSTables read for several query types across compact 
and non-compact tables. They should serve as reasonable documentation and 
guardrails against further regression.

The official [docs on compact 
storage|https://cassandra.apache.org/doc/latest/cql/appendices.html#appendix-c-dropping-compact-storage]
 will need some rework as well, both to indicate that it will live on in 4.0, 
and to take into account the concerns in this issue.


> COMPACT STORAGE SSTables created before 3.0 are not correctly skipped by 
> timestamp due to missing primary key liveness info
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-16226
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16226
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Legacy/Local Write-Read Paths
>            Reporter: Caleb Rackliffe
>            Assignee: Caleb Rackliffe
>            Priority: Normal
>              Labels: perfomance, upgrade
>             Fix For: 3.0.x, 3.11.x, 4.0-beta
>
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> This was discovered while tracking down a spike in the number of SSTables 
> per read for a COMPACT STORAGE table after a 2.1 -> 3.0 upgrade. Before 3.0, 
> there is no direct analog of 3.0's primary key liveness info. When we upgrade 
> 2.1 COMPACT STORAGE SSTables to the mf format, we simply don't write row 
> timestamps, even if the original mutations were INSERTs. On read, when we 
> look at SSTables in order from newest to oldest max timestamp, we expect to 
> have this primary key liveness information to determine whether we can skip 
> older SSTables after finding completely populated rows.
>
> For example, say I have three SSTables in a COMPACT STORAGE table with max 
> timestamps 1000, 2000, and 3000. There are many rows in a particular 
> partition, making filtering on the min and max clustering effectively a 
> no-op. All data is inserted, and there are no partial updates. A fully 
> specified row with timestamp 2500 exists in the SSTable with a max timestamp 
> of 3000. With a proper row timestamp in hand, we can easily ignore the 
> SSTables with max timestamps of 1000 and 2000. Without it, we read 3 
> SSTables instead of 1, which likely means a significant performance 
> regression.
>
> The following test illustrates this difference in behavior between 2.1 and 
> 3.0:
> https://github.com/maedhroz/cassandra/commit/84ce9242bedd735ca79d4f06007d127de6a82800
>
> A solution here might be as simple as having 
> {{SinglePartitionReadCommand#canRemoveRow()}} only inspect primary key 
> liveness information for non-compact/CQL tables. Tombstones seem to be 
> handled at a level above that anyway. (One potential problem is whether 
> that distinction will continue to exist in 4.0; dropping compact storage 
> from a table doesn't magically make primary key liveness information 
> appear.)
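
The skipping behavior described in the issue can be modeled with a small, self-contained sketch. This is an illustration only, not Cassandra's code: the class {{SSTableSkipSketch}}, its nested {{Row}} and {{SSTable}} types, the {{NO_TIMESTAMP}} marker, and the {{checkLiveness}} flag are all hypothetical names, and the real {{SinglePartitionReadCommand#canRemoveRow()}} works on per-column cells rather than a single cell timestamp. The sketch reproduces the three-SSTable example (max timestamps 3000, 2000, 1000, with a fully specified row at timestamp 2500 in the newest SSTable) and counts how many SSTables are read with a row timestamp, without one, and with the liveness check skipped as the issue proposes for compact tables.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model of timestamp-ordered SSTable skipping. All names are hypothetical.
class SSTableSkipSketch {
    // Marker meaning "no primary key liveness info was written for this row".
    static final long NO_TIMESTAMP = Long.MIN_VALUE;

    static final class Row {
        final long livenessTimestamp; // row (primary key liveness) timestamp, or NO_TIMESTAMP
        final long cellTimestamp;     // timestamp of the row's (fully populated) cells

        Row(long livenessTimestamp, long cellTimestamp) {
            this.livenessTimestamp = livenessTimestamp;
            this.cellTimestamp = cellTimestamp;
        }
    }

    static final class SSTable {
        final long maxTimestamp;
        final Row row; // null when this SSTable holds no data for the queried row

        SSTable(long maxTimestamp, Row row) {
            this.maxTimestamp = maxTimestamp;
            this.row = row;
        }
    }

    // True when the row found so far is strictly newer than anything the next SSTable can contain.
    static boolean canSkipOlder(Row row, long nextMaxTimestamp, boolean checkLiveness) {
        if (row == null) {
            return false;
        }
        if (checkLiveness
                && (row.livenessTimestamp == NO_TIMESTAMP || row.livenessTimestamp <= nextMaxTimestamp)) {
            return false; // missing or too-old row timestamp: the older SSTable must be read
        }
        return row.cellTimestamp > nextMaxTimestamp;
    }

    // Walks SSTables from newest to oldest max timestamp and counts how many are read.
    static int sstablesRead(List<SSTable> newestFirst, boolean checkLiveness) {
        Row result = null;
        int read = 0;
        for (SSTable sstable : newestFirst) {
            if (canSkipOlder(result, sstable.maxTimestamp, checkLiveness)) {
                break; // every remaining SSTable is older still
            }
            read++;
            if (result == null && sstable.row != null) {
                result = sstable.row; // simplification: keep the newest version, no merging
            }
        }
        return read;
    }

    // The example from the issue: the queried row lives only in the newest SSTable.
    static List<SSTable> example(long livenessTimestamp) {
        return Arrays.asList(
                new SSTable(3000, new Row(livenessTimestamp, 2500)),
                new SSTable(2000, null),
                new SSTable(1000, null));
    }

    static int readsWithRowTimestamp() {
        return sstablesRead(example(2500), true);
    }

    static int readsWithoutRowTimestamp() {
        return sstablesRead(example(NO_TIMESTAMP), true);
    }

    static int readsWithCompactAwareCheck() {
        return sstablesRead(example(NO_TIMESTAMP), false);
    }

    public static void main(String[] args) {
        System.out.println("with row timestamp:         " + readsWithRowTimestamp());      // 1
        System.out.println("without row timestamp:      " + readsWithoutRowTimestamp());   // 3
        System.out.println("liveness check skipped:     " + readsWithCompactAwareCheck()); // 1
    }
}
```

With the liveness check skipped, the cell timestamps alone are enough to stop after the first SSTable, which is the effect the proposed change aims for on compact tables. Whether that is safe in every case (for example, after compact storage is dropped) is the open question raised at the end of the description.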



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
