[ https://issues.apache.org/jira/browse/CASSANDRA-15369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17055162#comment-17055162 ]

Benedict Elliott Smith commented on CASSANDRA-15369:
----------------------------------------------------

I think _probably_ it is preferable to generate fake row deletions where 
possible, since their semantics are much better than range tombstones.  If the 
user is lucky, they might never see a range tombstone.

Since there is currently no way to deal with range tombstones anyway, that 
needs a separate effort, so it's probably reasonable to leave the cases that 
_require_ fake RTs unsolved for now.  We will need to guarantee either that 
RTs are replicated as inserted (without any of the subdivisions we currently 
produce) or that they are only accounted for in the digest via non-RT data 
(since otherwise there seems to be no way to ensure a consistent digest for a 
response).  Either way, it's probably better to do our best to avoid the 
scenario altogether, and use row deletions wherever possible.
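
To make the digest problem concrete, here is a minimal sketch. It is a toy model, not Cassandra code: clusterings are plain integers, deletions are tuples, and the helper names (`row_deletion`, `range_tombstone`, `clip`, `covers`, `digest`) are all hypothetical. It only illustrates that one underlying range tombstone, reported as a fake row deletion, as an RT clipped to a single clustering, or as an RT clipped to a slice, hashes to three different digests even though each form deletes the queried row.

```python
import hashlib

# Toy model only: none of these names are Cassandra APIs.

def row_deletion(clustering, timestamp):
    """A deletion of exactly one row."""
    return ("row", clustering, timestamp)

def range_tombstone(start, end, timestamp):
    """A deletion of every clustering in the inclusive range [start, end]."""
    return ("rt", start, end, timestamp)

def clip(rt, start, end):
    """Restrict a range tombstone to the queried bounds [start, end]."""
    _, s, e, ts = rt
    return range_tombstone(max(s, start), min(e, end), ts)

def covers(deletion, clustering):
    """True if the deletion shadows the given clustering."""
    if deletion[0] == "row":
        return deletion[1] == clustering
    return deletion[1] <= clustering <= deletion[2]

def digest(deletions):
    """Hash the serialized deletions; any hash function works for this sketch."""
    h = hashlib.sha256()
    for d in sorted(deletions, key=repr):
        h.update(repr(d).encode("utf-8"))
    return h.hexdigest()

# One real range tombstone covering clusterings 0..10, written at timestamp 5.
original = range_tombstone(0, 10, 5)

# Three ways a response for clustering 3 might represent it.
as_row = row_deletion(3, 5)       # fake row deletion
as_point_rt = clip(original, 3, 3)  # fake RT for a clustering lookup
as_slice_rt = clip(original, 2, 4)  # fake RT for a slice query over 2..4

for d in (as_row, as_point_rt, as_slice_rt):
    print(d, covers(d, 3), digest([d])[:12])
```

All three representations cover clustering 3, yet their digests differ, which is the mismatch the two options above (replicating RTs exactly as inserted, or keeping RTs out of the digest) are meant to rule out.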

> Fake row deletions and range tombstones, causing digest mismatch and sstable 
> growth
> -----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15369
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15369
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Coordination, Local/Memtable, Local/SSTable
>            Reporter: Benedict Elliott Smith
>            Priority: Normal
>             Fix For: 4.0, 3.0.x, 3.11.x
>
>
> As assessed in CASSANDRA-15363, we generate fake row deletions and fake 
> tombstone markers under various circumstances:
>  * If we perform a clustering key query (or select a compact column):
>  ** Serving from a {{Memtable}}, we will generate fake row deletions
>  ** Serving from an sstable, we will generate fake row tombstone markers
>  * If we perform a slice query, we will generate only fake row tombstone 
> markers for any range tombstone that begins or ends outside of the limit of 
> the requested slice
>  * If we perform a multi-slice or IN query, this will occur for each 
> slice/clustering
>
> Unfortunately, these different behaviours can lead to very different data 
> being stored in sstables until a full repair is run.  When we read-repair, we 
> only send these fake deletions or range tombstones.  A fake row deletion, a 
> clustering RT and a slice RT each produce a different digest.  So for each 
> single point lookup we can produce a digest mismatch twice, and until a full 
> repair is run we can encounter an unlimited number of digest mismatches 
> across different overlapping queries.
>
> Relatedly, this seems a more problematic variant of our atomicity failures 
> caused by our monotonic reads, since RTs can have an atomic effect across (up 
> to) the entire partition, whereas the propagation may happen on an 
> arbitrarily small portion.  If the RT exists on only one node, this could 
> plausibly lead to a fairly problematic scenario if that node fails before the 
> range can be repaired.
>
> At the very least, this behaviour can lead to an almost unlimited amount of 
> extraneous data being stored until the range is repaired and compaction 
> happens to overwrite the sub-range RTs and row deletions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

