[
https://issues.apache.org/jira/browse/CASSANDRA-9669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15283315#comment-15283315
]
Benedict commented on CASSANDRA-9669:
-------------------------------------
I'm not sure if the behaviour for locking on the base table when flushing
indexes was present when I wrote this patch, but both that sync (in
SecondaryIndexManager) and this one (in truncation) are ill-advised:
* Synchronizing on a base table for an action that is triggered by an action
that itself must synchronize on the base table is asking for trouble
* Synchronizing in either case over a lengthy operation that may itself
synchronize is also asking for trouble
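
To make the first hazard concrete, here is a minimal sketch in plain Java. It is not Cassandra code: the class name, the baseTable monitor and the single-threaded flushExecutor are all hypothetical stand-ins. The caller holds the base table monitor while waiting on a task it triggered, and that task needs the same monitor, so the task cannot progress until the caller gives up.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class NestedSyncSketch {
    // Returns true if the triggered task finished while the caller still
    // held the monitor, false if the wait timed out first.
    static boolean triggeredActionCompletes(long timeoutMillis) {
        final Object baseTable = new Object();          // hypothetical stand-in for the base table
        ExecutorService flushExecutor = Executors.newSingleThreadExecutor();
        try {
            synchronized (baseTable) {
                // The triggered action also synchronizes on the base table,
                // so it blocks for as long as the caller holds the monitor.
                Future<?> indexFlush = flushExecutor.submit(() -> {
                    synchronized (baseTable) {
                        // pretend to flush an index here
                    }
                });
                try {
                    indexFlush.get(timeoutMillis, TimeUnit.MILLISECONDS);
                    return true;
                } catch (TimeoutException e) {
                    return false;                       // without the timeout this wait would never end
                } catch (InterruptedException | ExecutionException e) {
                    throw new IllegalStateException(e);
                }
            }
        } finally {
            flushExecutor.shutdownNow();
        }
    }
}
```

Under these assumptions the call returns false for any timeout, because the task can only acquire the monitor after the caller has left the synchronized block.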
As it is, I threw the synchronized in there along with some other modifications
to make the truncation of views less obviously broken. I think it's probably
still broken, just less obviously so, so it may as well be taken out to make
this right.
> If sstable flushes complete out of order, on restart we can fail to replay
> necessary commit log records
> -------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-9669
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9669
> Project: Cassandra
> Issue Type: Bug
> Components: Local Write-Read Paths
> Reporter: Benedict
> Assignee: Benedict
> Priority: Critical
> Labels: correctness
> Fix For: 2.2.7, 3.7, 3.0.7
>
>
> While {{postFlushExecutor}} ensures it never expires CL entries out-of-order,
> on restart we simply take the maximum replay position of any sstable on disk,
> and ignore anything prior.
> It is quite possible for there to be two flushes triggered for a given table,
> and for the second to finish first by virtue of containing a much smaller
> quantity of live data (or perhaps the disk is just under less pressure). If
> we crash before the first sstable has been written, then on restart the data
> it would have represented will disappear, since we will not replay the CL
> records.
> This looks to be a bug present since time immemorial, and also seems pretty
> serious.
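
The failure in the quoted description can be sketched as follows. This is an illustration only, not the patch: the Flushed class and both method names are hypothetical, and commit log positions are simplified to plain longs. replayFromMax models the restart behaviour described above (take the maximum replay position of any sstable on disk); replayFromContiguous is one possible alternative, assumed here purely for contrast, that stops at the first gap in flushed coverage.

```java
import java.util.ArrayList;
import java.util.List;

final class ReplayPositionSketch {
    // Hypothetical summary of one flushed sstable: the commit log range
    // [start, end) whose writes it contains.
    static final class Flushed {
        final long start;
        final long end;
        Flushed(long start, long end) { this.start = start; this.end = end; }
    }

    // Behaviour described in the report: replay from the maximum position
    // of any sstable on disk and ignore everything before it.
    static long replayFromMax(List<Flushed> onDisk) {
        long max = 0;
        for (Flushed f : onDisk) max = Math.max(max, f.end);
        return max;
    }

    // Assumed alternative: only skip the prefix of the log that is covered
    // by sstables without a gap.
    static long replayFromContiguous(List<Flushed> onDisk) {
        List<Flushed> sorted = new ArrayList<>(onDisk);
        sorted.sort((a, b) -> Long.compare(a.start, b.start));
        long covered = 0;
        for (Flushed f : sorted) {
            if (f.start > covered) break;   // gap: an earlier flush never reached disk
            covered = Math.max(covered, f.end);
        }
        return covered;
    }
}
```

If the first flush covers [0, 100) and the second covers [100, 120), and only the second reaches disk before a crash, replayFromMax returns 120, so the records in [0, 100) are never replayed. The contiguous rule returns 0 in that case and 120 once both sstables are present.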