[ https://issues.apache.org/jira/browse/CASSANDRA-16226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17221542#comment-17221542 ]
Caleb Rackliffe edited comment on CASSANDRA-16226 at 10/27/20, 4:16 PM:
------------------------------------------------------------------------

Hi [~kornelpal]. Take a look at {{UpdateStatement#addUpdateForKey()}}...

{noformat}
// We update the row timestamp (ex-row marker) only on INSERT (#6782)
// Further, COMPACT tables semantic differs from "CQL3" ones in that a row exists only if it has
// a non-null column, so we don't want to set the row timestamp for them.
if (type.isInsert() && cfm.isCQLTable())
    params.addPrimaryKeyLivenessInfo();
{noformat}

...and {{LegacyLayout}}...

{noformat}
else if (column.isPrimaryKeyColumn() && metadata.isCQLTable())
{noformat}

COMPACT tables will never have primary key liveness info, even if they were created in 3.0 or later, so running {{upgradesstables}} doesn't help (at least as far as I can tell). The patch I've posted simply restores the way this optimization worked for COMPACT tables before the 3.0 storage engine rewrite.
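To make the write-side rule in the quoted {{addUpdateForKey()}} excerpt concrete, here is a minimal Java sketch. It is a hypothetical model, not Cassandra's API: {{PkLivenessRule}}, {{StatementKind}}, and {{primaryKeyLiveness()}} are illustrative names, and "no row timestamp" is represented as an empty {{OptionalLong}}.

```java
import java.util.OptionalLong;

// Hypothetical model, not Cassandra's API: which writes record a row timestamp
// (primary key liveness info, the 3.0 successor of the row marker).
class PkLivenessRule {
    enum StatementKind { INSERT, UPDATE }

    /** Returns the row timestamp a write would record, or empty if it records none. */
    static OptionalLong primaryKeyLiveness(StatementKind kind, boolean isCqlTable, long writeTimestamp) {
        // Mirrors the quoted condition: type.isInsert() && cfm.isCQLTable()
        if (kind == StatementKind.INSERT && isCqlTable)
            return OptionalLong.of(writeTimestamp);
        // UPDATEs and every write to a COMPACT table leave the row timestamp empty.
        return OptionalLong.empty();
    }

    public static void main(String[] args) {
        System.out.println(primaryKeyLiveness(StatementKind.INSERT, true, 2500));  // OptionalLong[2500]
        System.out.println(primaryKeyLiveness(StatementKind.INSERT, false, 2500)); // OptionalLong.empty
        System.out.println(primaryKeyLiveness(StatementKind.UPDATE, true, 2500));  // OptionalLong.empty
    }
}
```

In this model the outcome depends only on the statement type and the table kind, never on which version wrote the data, which is consistent with the observation that rewriting SSTables for a COMPACT table has no row timestamp to recover.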
> COMPACT STORAGE SSTables created before 3.0 are not correctly skipped by timestamp due to missing primary key liveness info
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-16226
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16226
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Legacy/Local Write-Read Paths
>            Reporter: Caleb Rackliffe
>            Assignee: Caleb Rackliffe
>            Priority: Normal
>              Labels: perfomance, upgrade
>             Fix For: 3.0.x, 3.11.x, 4.0-beta
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> This was discovered while tracking down a spike in the number of SSTables
> per read for a COMPACT STORAGE table after a 2.1 -> 3.0 upgrade. Before 3.0,
> there was no direct analog of 3.0's primary key liveness info. When we
> upgrade 2.1 COMPACT STORAGE SSTables to the mf format, we simply don't write
> row timestamps, even if the original mutations were INSERTs. On read, when
> we look at SSTables in order from newest to oldest max timestamp, we expect
> to have this primary key liveness information to determine whether we can
> skip older SSTables after finding completely populated rows.
>
> For example, say I have three SSTables in a COMPACT STORAGE table with max
> timestamps 1000, 2000, and 3000. There are many rows in a particular
> partition, making filtering on the min and max clustering effectively a
> no-op. All data is inserted, and there are no partial updates. A fully
> specified row with timestamp 2500 exists in the SSTable with a max timestamp
> of 3000. With a proper row timestamp in hand, we know that nothing in the
> SSTables with max timestamps of 1000 and 2000 can be newer than that row, so
> we can easily ignore them. Without it, we read 3 SSTables instead of 1,
> which likely means a significant performance regression.
>
> The following test illustrates this difference in behavior between 2.1 and
> 3.0:
> https://github.com/maedhroz/cassandra/commit/84ce9242bedd735ca79d4f06007d127de6a82800
>
> A solution here might be as simple as having
> {{SinglePartitionReadCommand#canRemoveRow()}} inspect primary key liveness
> information only for non-compact/CQL tables. Tombstones seem to be handled
> at a level above that anyway. (One potential problem is whether the
> distinction will continue to exist in 4.0; dropping compact storage from a
> table doesn't magically make primary key liveness information appear.)
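The three-SSTable example and the proposed {{canRemoveRow()}} change can be simulated with the following minimal Java sketch. It is a hypothetical model, not Cassandra code: {{TimestampOrderedRead}}, {{SSTable}}, {{sstablesRead()}}, the {{requirePkLiveness}} flag, and the {{NO_TIMESTAMP}} sentinel are illustrative names, and its {{canRemoveRow()}} only borrows the idea of the real method, not its signature. It also ignores tombstones, bloom filters, and clustering filters, and assumes the two older SSTables hold older versions of the same row.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation (not Cassandra code) of a timestamp-ordered single-row read.
class TimestampOrderedRead {
    /** Stand-in for "no primary key liveness info"; every COMPACT row looks like this. */
    static final long NO_TIMESTAMP = Long.MIN_VALUE;

    /** One SSTable's version of the row: its row timestamp and per-column cell timestamps. */
    record SSTable(long maxTimestamp, long pkLiveness, Map<String, Long> cells) {}

    /**
     * Early-termination check. requirePkLiveness=true models 3.0 checking the row timestamp
     * for every table; false models the proposal of skipping that check for COMPACT tables.
     */
    static boolean canRemoveRow(boolean requirePkLiveness, long pkLiveness, Map<String, Long> cells,
                                List<String> requested, long nextMaxTimestamp) {
        // NO_TIMESTAMP (Long.MIN_VALUE) is never newer than the next SSTable, so a missing row timestamp fails here.
        if (requirePkLiveness && pkLiveness <= nextMaxTimestamp)
            return false;
        // Every requested column must be present and newer than anything left to read.
        for (String column : requested) {
            Long ts = cells.get(column);
            if (ts == null || ts <= nextMaxTimestamp)
                return false;
        }
        return true;
    }

    /** Returns how many SSTables are read before the row can be returned. */
    static int sstablesRead(List<SSTable> sstables, List<String> requested, boolean requirePkLiveness) {
        List<SSTable> ordered = new ArrayList<>(sstables);
        // Visit SSTables from newest to oldest max timestamp.
        ordered.sort((a, b) -> Long.compare(b.maxTimestamp(), a.maxTimestamp()));
        long pkLiveness = NO_TIMESTAMP;
        Map<String, Long> merged = new HashMap<>();
        int read = 0;
        for (int i = 0; i < ordered.size(); i++) {
            SSTable s = ordered.get(i);
            read++;
            // Reconcile: the newest row timestamp and the newest cell per column win.
            pkLiveness = Math.max(pkLiveness, s.pkLiveness());
            s.cells().forEach((column, ts) -> merged.merge(column, ts, Long::max));
            boolean more = i + 1 < ordered.size();
            if (more && canRemoveRow(requirePkLiveness, pkLiveness, merged, requested, ordered.get(i + 1).maxTimestamp()))
                break;
        }
        return read;
    }

    public static void main(String[] args) {
        List<String> requested = List.of("v");
        List<SSTable> cql = List.of(
            new SSTable(1000, 900, Map.of("v", 900L)),
            new SSTable(2000, 1800, Map.of("v", 1800L)),
            new SSTable(3000, 2500, Map.of("v", 2500L)));
        List<SSTable> compact = List.of(
            new SSTable(1000, NO_TIMESTAMP, Map.of("v", 900L)),
            new SSTable(2000, NO_TIMESTAMP, Map.of("v", 1800L)),
            new SSTable(3000, NO_TIMESTAMP, Map.of("v", 2500L)));
        System.out.println(sstablesRead(cql, requested, true));      // 1
        System.out.println(sstablesRead(compact, requested, true));  // 3 (row timestamp check always required)
        System.out.println(sstablesRead(compact, requested, false)); // 1 (check skipped for COMPACT)
    }
}
```

With the row timestamp check applied to a COMPACT table, the missing liveness info forces all three SSTables to be read; skipping that check and relying on the requested cells alone lets the read stop after the SSTable with max timestamp 3000, as it does for the CQL table.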