[ https://issues.apache.org/jira/browse/CASSANDRA-15367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027080#comment-17027080 ]
Benedict Elliott Smith commented on CASSANDRA-15367:
----------------------------------------------------

bq. but I’m not sure if it’s worth addressing

I don't think any deadlock is acceptable to ignore. Hmm. If we don't go with one of the other approaches I've suggested, I'll have to find some time in a week to see if there's a variant of this suggested approach that works in this respect.

bq. <random-idea>

I think this is something I have proposed before, but it's not trivial. I had planned to implement something like this as part of my work addressing this problem, but decided not to given the complexity. The idea would be to introduce a linked-list of deferred updates, and merge them either on future reads or writes. However, ensuring everyone sees a consistent view with this approach, while minimising duplicated work and ensuring progress, is less trivial than I imagined when I proposed it a while ago.

bq. About removing the lock, I’m sure 15511 will help with contention, and we should commit it, however I think there will still be pathological cases where faster updates won’t be enough

We can benchmark this specific scenario, but all we really care about is if the aggregate behaviour for all 21 operations is good enough to warrant removal of the lock, and the commensurate reduction in complexity when reasoning about the system (that has been _amply_ demonstrated by this ticket). IMO, the performance numbers from 15511 more than cross this threshold, but we can certainly explore further verification work to be certain.
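The deferred-update idea from the comment can be sketched roughly as follows. This is a minimal, hypothetical Java illustration, not Cassandra code: the class name `DeferredPartition` and the modelling of a partition as a single `long` whose updates are commutative additions are assumptions made only to keep it short. Writers prepend to an immutable linked list with a CAS and never take a mutex; readers fold the pending list into the base value and try to publish the merged result.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch (not Cassandra code): a partition modelled as a single long
// whose updates are commutative deltas, buffered in a lock-free linked list.
final class DeferredPartition {
    // Immutable node in the list of deferred (not yet merged) updates.
    private static final class Node {
        final long delta;
        final Node next;

        Node(long delta, Node next) {
            this.delta = delta;
            this.next = next;
        }
    }

    // Merged base value plus pending updates, replaced atomically as one unit.
    private static final class State {
        final long base;
        final Node pending;

        State(long base, Node pending) {
            this.base = base;
            this.pending = pending;
        }
    }

    private final AtomicReference<State> state = new AtomicReference<>(new State(0L, null));

    // Writers never take a mutex: they prepend a node and retry the CAS on contention.
    void write(long delta) {
        while (true) {
            State cur = state.get();
            State next = new State(cur.base, new Node(delta, cur.pending));
            if (state.compareAndSet(cur, next)) {
                return;
            }
        }
    }

    // Readers fold pending deltas into a private copy of the base value, then try to
    // publish the merged state. If a writer raced in, the CAS fails and the merge is
    // simply redone by a later read: duplicated work, but no update is lost.
    long read() {
        State cur = state.get();
        long merged = cur.base;
        for (Node n = cur.pending; n != null; n = n.next) {
            merged += n.delta;
        }
        if (cur.pending != null) {
            state.compareAndSet(cur, new State(merged, null));
        }
        return merged;
    }
}
```

Because each `State` is immutable, every reader merges a consistent snapshot. The caveats from the comment still show through: a reader whose CAS loses a race throws its merge away (duplicated work), and nothing bounds the pending list if reads are rare. Real partition updates would need a proper reconciliation function rather than addition, so this sketch sidesteps much of the complexity the comment alludes to.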
> Memtable memory allocations may deadlock
> ----------------------------------------
>
>                 Key: CASSANDRA-15367
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15367
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Commit Log, Local/Memtable
>            Reporter: Benedict Elliott Smith
>            Assignee: Benedict Elliott Smith
>            Priority: Normal
>             Fix For: 4.0, 2.2.x, 3.0.x, 3.11.x
>
>
> * Under heavy contention, we guard modifications to a partition with a mutex, for the lifetime of the memtable.
> * Memtables block for the completion of all {{OpOrder.Group}} started before their flush began
> * Memtables permit operations from this cohort to fall-through to the following Memtable, in order to guarantee a precise commitLogUpperBound
> * Memtable memory limits may be lifted for operations in the first cohort, since they block flush (and hence block future memory allocation)
>
> With very unfortunate scheduling
> * A contended partition may rapidly escalate to a mutex
> * The system may reach memory limits that prevent allocations for the new Memtable’s cohort (C2)
> * An operation from C2 may hold the mutex when this occurs
> * Operations from a prior Memtable’s cohort (C1), for a contended partition, may fall-through to the next Memtable
> * The operations from C1 may execute after the above is encountered by those from C2

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
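The scenario in the issue description amounts to a cycle of waits. As a rough illustration, and only under the simplifying assumption that each party can be treated as a single node in a wait-for graph, the hypothetical Java sketch below (the `WaitForGraph` class and its node labels are invented for this example, not Cassandra identifiers) encodes those waits and finds the cycle with a depth-first search. The C2 operation holding the partition mutex waits for memory; memory is only reclaimed once the prior memtable flushes; the flush waits for the C1 cohort's {{OpOrder.Group}}; and the fallen-through C1 operation waits for the mutex.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model (not Cassandra code) of the waits described in CASSANDRA-15367.
// An edge "A -> B" means A cannot make progress until B does.
final class WaitForGraph {
    private final Map<String, List<String>> edges = new LinkedHashMap<>();

    void waitsFor(String waiter, String holder) {
        edges.computeIfAbsent(waiter, k -> new ArrayList<>()).add(holder);
        edges.computeIfAbsent(holder, k -> new ArrayList<>());
    }

    // Depth-first search; returns one cycle (first node repeated at the end) or an empty list.
    List<String> findCycle() {
        Map<String, Integer> state = new HashMap<>(); // absent = unvisited, 1 = on path, 2 = done
        Deque<String> path = new ArrayDeque<>();
        for (String start : edges.keySet()) {
            List<String> cycle = visit(start, state, path);
            if (!cycle.isEmpty()) {
                return cycle;
            }
        }
        return Collections.emptyList();
    }

    private List<String> visit(String node, Map<String, Integer> state, Deque<String> path) {
        Integer s = state.get(node);
        if (s != null && s == 2) {
            return Collections.emptyList();
        }
        if (s != null && s == 1) {
            // node is already on the current path, so the path from node onwards is a cycle
            List<String> cycle = new ArrayList<>();
            boolean inCycle = false;
            for (String p : path) {
                inCycle |= p.equals(node);
                if (inCycle) {
                    cycle.add(p);
                }
            }
            cycle.add(node);
            return cycle;
        }
        state.put(node, 1);
        path.addLast(node);
        for (String next : edges.get(node)) {
            List<String> cycle = visit(next, state, path);
            if (!cycle.isEmpty()) {
                return cycle;
            }
        }
        path.removeLast();
        state.put(node, 2);
        return Collections.emptyList();
    }

    // The scenario from the ticket; the labels are illustrative only.
    static WaitForGraph ticketScenario(boolean partitionMutex) {
        WaitForGraph g = new WaitForGraph();
        g.waitsFor("C2 op holding partition mutex", "memtable memory");
        g.waitsFor("memtable memory", "flush of prior memtable");
        g.waitsFor("flush of prior memtable", "C1 op (fell through)");
        if (partitionMutex) {
            g.waitsFor("C1 op (fell through)", "C2 op holding partition mutex");
        }
        return g;
    }
}
```

Calling `ticketScenario(true).findCycle()` returns the four waits above as a closed loop. Passing `false` drops the mutex edge, roughly modelling the removal of the partition lock discussed in the comment, and the graph becomes acyclic.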