[ 
https://issues.apache.org/jira/browse/CASSANDRA-15368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971743#comment-16971743
 ] 

Benedict Elliott Smith commented on CASSANDRA-15368:
----------------------------------------------------

Hi [~dimitarndimitrov],

I think you have it the wrong way around; in your parlance, we need:

* oldMemtable.accepts(<HW>) returns false
* oldMemtable.accepts(<LW>) returns false
* newMemtable.accepts(<HW>) returns true
* newMemtable.accepts(<LW>) returns true

If you look at the new documentation introduced in CASSANDRA-15367 
[here|https://github.com/belliottsmith/cassandra/commit/ed6adf5eabe62f8ce6a1341e0c5423ba53036197#diff-f0a15c3588b56c5ce53ece7c48e325b5R109],
 you'll see that there is a region at the start of all memtables where some 
records from the prior {{group}}, that may have arbitrarily delayed obtaining 
their {{ReplayPosition}}, are intermixed with those of the later group.  This 
region is essentially owned by both memtables, but only the later memtable 
invalidates the relevant commit log records.  The problem occurs if the earlier 
flush fails (and we do not terminate the process), _or_ if the process 
terminates with the later flush having completed but the earlier one not 
(since on restart we use the start/end {{ReplayPosition}} recorded in the 
sstable to invalidate the commit log in the same way).
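
As a rough illustration of the overlap described above, here is a toy model in Java. It is only a sketch: {{LogEntry}}, {{Flushed}} and {{lostPositions}} are hypothetical names invented for this example, not Cassandra's real classes, and commit log positions are simplified to plain integers. It assumes each completed flush invalidates one contiguous range of the log, and that a record is lost when its position is invalidated while the memtable holding it never flushed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: these names are hypothetical, not Cassandra's real classes.
class FlushOverlapSketch {

    // A commit log record: its position and the memtable that holds its data.
    static final class LogEntry {
        final int position;
        final String memtable;
        LogEntry(int position, String memtable) {
            this.position = position;
            this.memtable = memtable;
        }
    }

    // A completed flush, which invalidates the contiguous log range [from, to).
    static final class Flushed {
        final String memtable;
        final int from;
        final int to;
        Flushed(String memtable, int from, int to) {
            this.memtable = memtable;
            this.from = from;
            this.to = to;
        }
    }

    // Positions that are invalidated in the log but whose memtable never flushed.
    static List<Integer> lostPositions(List<LogEntry> log, List<Flushed> flushes) {
        List<Integer> lost = new ArrayList<>();
        for (LogEntry e : log) {
            boolean persisted = false;   // the owning memtable reached an sstable
            boolean invalidated = false; // some flushed range covers this position
            for (Flushed f : flushes) {
                if (f.memtable.equals(e.memtable)) persisted = true;
                if (e.position >= f.from && e.position < f.to) invalidated = true;
            }
            if (invalidated && !persisted) lost.add(e.position);
        }
        return lost;
    }

    // The switch to "new" happens at position 4, but one write of the old
    // group obtains its position late and lands at 5.
    static List<LogEntry> exampleLog() {
        return Arrays.asList(
            new LogEntry(1, "old"), new LogEntry(2, "old"), new LogEntry(3, "old"),
            new LogEntry(4, "new"), new LogEntry(5, "old"),
            new LogEntry(6, "new"), new LogEntry(7, "new"));
    }

    public static void main(String[] args) {
        List<LogEntry> log = exampleLog();
        // Only the later flush completed: position 5 is invalidated but never persisted.
        System.out.println(lostPositions(log, Arrays.asList(new Flushed("new", 4, 8))));
        // Both flushes completed: nothing is lost.
        System.out.println(lostPositions(log,
            Arrays.asList(new Flushed("old", 1, 4), new Flushed("new", 4, 8))));
    }
}
```

In this model the first call reports position 5 as lost and the second reports nothing, which matches both scenarios: a failed earlier flush without process termination, and a restart where only the later flush had completed.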



> Failing to flush Memtable without terminating process results in permanent 
> data loss
> ------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15368
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15368
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Commit Log, Local/Memtable
>            Reporter: Benedict Elliott Smith
>            Priority: Normal
>             Fix For: 4.0, 2.2.x, 3.0.x, 3.11.x
>
>
> {{Memtable}} do not contain records that cover a precise contiguous range of 
> {{ReplayPosition}}, since there are only weak ordering constraints when 
> rolling over to a new {{Memtable}} - the last operations for the old 
> {{Memtable}} may obtain their {{ReplayPosition}} after the first operations 
> for the new {{Memtable}}.
> Unfortunately, we treat the {{Memtable}} range as contiguous, and invalidate 
> the entire range on flush.  Ordinarily we only invalidate records when all 
> prior {{Memtable}} have also successfully flushed.  However, in the event of 
> a flush failure that does not terminate the process (either because of disk 
> failure policy, or because it is a software error), the later flush is able 
> to invalidate the region of the commit log that includes records that should 
> have been flushed in the prior {{Memtable}}.
> More problematically, this can also occur on restart without any associated 
> flush failure, as we use commit log boundaries written to our flushed 
> sstables to filter {{ReplayPosition}} on recovery, which is meant to 
> replicate our {{Memtable}} flush behaviour above.  However, we do not know 
> that earlier flushes have completed, and they may complete successfully 
> out-of-order.  So any flush that completes before the process terminates, but 
> began after another flush that _doesn’t_ complete before the process 
> terminates, has the potential to cause permanent data loss.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
