[ https://issues.apache.org/jira/browse/CASSANDRA-15369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17216645#comment-17216645 ]
Marcus Eriksson commented on CASSANDRA-15369:
---------------------------------------------

Just one more question: before, in [searchIterator()|https://github.com/apache/cassandra/blob/ccab496d2d37c86341d364dea6c27513fda27331/src/java/org/apache/cassandra/db/partitions/AbstractBTreePartition.java#L139] we returned EMPTY_STATIC_ROW; [now|https://github.com/apache/cassandra/pull/473/files#diff-6e27ca8fc225036969f774910f2142568fcf85ab588a100c5d3484ac412048f3R121] we return null. Why is that?

I don't think it makes a difference, but it would probably be good to add a comment explaining why we return null.

> Fake row deletions and range tombstones, causing digest mismatch and sstable
> growth
> -----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15369
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15369
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Coordination, Local/Memtable, Local/SSTable
>            Reporter: Benedict Elliott Smith
>            Assignee: Zhao Yang
>            Priority: Normal
>             Fix For: 3.0.x, 3.11.x, 4.0-beta
>
>
> As assessed in CASSANDRA-15363, we generate fake row deletions and fake
> tombstone markers under various circumstances:
>  * If we perform a clustering key query (or select a compact column):
>    * Serving from a {{Memtable}}, we will generate fake row deletions
>    * Serving from an sstable, we will generate fake row tombstone markers
>  * If we perform a slice query, we will generate only fake row tombstone
>    markers for any range tombstone that begins or ends outside of the limit
>    of the requested slice
>  * If we perform a multi-slice or IN query, this will occur for each
>    slice/clustering
> Unfortunately, these different behaviours can lead to very different data
> being stored in sstables until a full repair is run. When we read-repair, we
> only send these fake deletions or range tombstones. A fake row deletion, a
> clustering RT and a slice RT each produce a different digest. So for each
> single point lookup we can produce a digest mismatch twice, and until a full
> repair is run we can encounter an unlimited number of digest mismatches
> across different overlapping queries.
> Relatedly, this seems a more problematic variant of our atomicity failures
> caused by our monotonic reads, since RTs can have an atomic effect across (up
> to) the entire partition, whereas the propagation may happen on an
> arbitrarily small portion. If the RT exists on only one node, this could
> plausibly lead to a fairly problematic scenario if that node fails before
> the range can be repaired.
> At the very least, this behaviour can lead to an almost unlimited amount of
> extraneous data being stored until the range is repaired and compaction
> happens to overwrite the sub-range RTs and row deletions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org