[ https://issues.apache.org/jira/browse/CASSANDRA-15369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17206597#comment-17206597 ]
Zhao Yang commented on CASSANDRA-15369:
---------------------------------------

[~marcuse] Do you still plan to review this ticket? Should I find another reviewer? Thanks.

> Fake row deletions and range tombstones, causing digest mismatch and sstable growth
> -----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15369
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15369
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Coordination, Local/Memtable, Local/SSTable
>            Reporter: Benedict Elliott Smith
>            Assignee: Zhao Yang
>            Priority: Normal
>             Fix For: 3.0.x, 3.11.x, 4.0-beta, 4.0-triage
>
> As assessed in CASSANDRA-15363, we generate fake row deletions and fake tombstone markers under various circumstances:
> * If we perform a clustering key query (or select a compact column):
>   * Serving from a {{Memtable}}, we will generate fake row deletions
>   * Serving from an sstable, we will generate fake row tombstone markers
> * If we perform a slice query, we will generate only fake row tombstone markers for any range tombstone that begins or ends outside of the limit of the requested slice
> * If we perform a multi-slice or IN query, this will occur for each slice/clustering
>
> Unfortunately, these different behaviours can lead to very different data being stored in sstables until a full repair is run. When we read-repair, we only send these fake deletions or range tombstones, and a fake row deletion, a clustering RT, and a slice RT each produce a different digest. So for each single point lookup we can produce a digest mismatch twice, and until a full repair is run we can encounter an unlimited number of digest mismatches across different overlapping queries.
>
> Relatedly, this seems to be a more problematic variant of the atomicity failures caused by our monotonic reads, since RTs can have an atomic effect across (up to) the entire partition, whereas the propagation may happen on an arbitrarily small portion. If the RT exists on only one node, this could plausibly lead to a fairly problematic scenario if that node fails before the range can be repaired.
>
> At the very least, this behaviour can lead to an almost unlimited amount of extraneous data being stored until the range is repaired and compaction happens to overwrite the sub-range RTs and row deletions.
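For context on the mismatch described above, here is a minimal, self-contained Java sketch of why a fake row deletion and a logically equivalent pair of range tombstone bound markers hash to different values. This is not Cassandra's actual digest code; the kind bytes, serialization layout, and class name are invented purely for illustration of the mechanism.

{code:java}
// Hypothetical sketch -- NOT Cassandra's real digest path. It mimics how a
// per-replica content digest is accumulated over unfiltereds (rows and RT
// markers), to show why two replicas answering the same query with
// equivalent but differently-shaped deletions produce different digests.
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DigestMismatchSketch {
    // Invented kind tags; the real code serializes the unfiltered's kind
    // plus its clustering and deletion-time fields into the digest.
    static final byte ROW_DELETION   = 1;
    static final byte RT_BOUND_OPEN  = 2;
    static final byte RT_BOUND_CLOSE = 3;

    static void update(MessageDigest md, byte kind, String clustering, long deletionTime) {
        md.update(kind);
        md.update(clustering.getBytes(StandardCharsets.UTF_8));
        md.update(ByteBuffer.allocate(8).putLong(deletionTime).array());
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        long deletionTime = 1570000000L;

        // Replica A served the point lookup from a Memtable: the shadowing
        // range tombstone surfaces as a fake *row deletion* for key "k1".
        MessageDigest a = MessageDigest.getInstance("MD5");
        update(a, ROW_DELETION, "k1", deletionTime);

        // Replica B served the same lookup from an sstable: the same logical
        // deletion surfaces as a pair of *range tombstone bound markers*.
        MessageDigest b = MessageDigest.getInstance("MD5");
        update(b, RT_BOUND_OPEN, "k1", deletionTime);
        update(b, RT_BOUND_CLOSE, "k1", deletionTime);

        // Same shadowed data, different serialized shape => digest mismatch.
        System.out.println(java.util.Arrays.equals(a.digest(), b.digest())); // false
    }
}
{code}

Because each replica serializes the same logical deletion in a different shape, the digests disagree on every such read, which is what triggers the repeated read repairs and the sstable growth described in the ticket.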