[ 
https://issues.apache.org/jira/browse/CASSANDRA-15367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027080#comment-17027080
 ] 

Benedict Elliott Smith commented on CASSANDRA-15367:
----------------------------------------------------

bq.  but I’m not sure if it’s worth addressing

I don't think any deadlock is acceptable to ignore.  Hmm.  If we don't go with 
one of the other approaches I've suggested, I'll have to find some time in a 
week to see if there's a variant of this suggested approach that works in this 
respect.

bq. <random-idea>

I think this is something I have proposed before, but it's not trivial.  I had 
planned to implement something like this as part of my work addressing this 
problem, but decided not to given the complexity.  The idea would be to 
introduce a linked-list of deferred updates, and merge them either on future 
reads or writes, but ensuring everyone sees a consistent view with this 
approach, while minimising duplicated work and ensuring progress, is less 
trivial than I imagined when I proposed it a while ago.
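
To make the shape of the idea concrete, here is a minimal sketch of a lock-free list of deferred updates that a later reader or writer detaches and merges. All names here ({{DeferredUpdates}}, {{defer}}, {{drain}}) are hypothetical and do not exist in Cassandra. It deliberately leaves out the hard parts mentioned above: giving every reader a consistent view, avoiding duplicated merge work, and guaranteeing progress.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch only: not Cassandra code, and not a complete design.
final class DeferredUpdates<T> {
    // Immutable singly linked node; the newest update sits at the head.
    private static final class Node<T> {
        final T update;
        final Node<T> next;
        Node(T update, Node<T> next) { this.update = update; this.next = next; }
    }

    private final AtomicReference<Node<T>> head = new AtomicReference<>();

    // A writer that would otherwise contend pushes its update here instead of blocking.
    void defer(T update) {
        Node<T> current;
        do {
            current = head.get();
        } while (!head.compareAndSet(current, new Node<>(update, current)));
    }

    // A later reader or writer atomically detaches the whole list
    // and receives the updates oldest first, ready to be merged.
    List<T> drain() {
        Deque<T> ordered = new ArrayDeque<>();
        for (Node<T> n = head.getAndSet(null); n != null; n = n.next)
            ordered.addFirst(n.update);
        return new ArrayList<>(ordered);
    }

    public static void main(String[] args) {
        DeferredUpdates<String> pending = new DeferredUpdates<>();
        pending.defer("a=1");
        pending.defer("b=2");
        System.out.println(pending.drain()); // [a=1, b=2]
        System.out.println(pending.drain()); // []
    }
}
```

In this sketch, whoever calls {{drain}} owns the detached updates, so a concurrent reader that arrives before they are merged and published would miss them. That gap is exactly the consistency problem that makes the real thing non-trivial.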

bq. About removing the lock, I’m sure 15511 will help with contention, and we 
should commit it, however I think there will still be pathological cases where 
faster updates won’t be enough

We can benchmark this specific scenario, but all we really care about is if the 
aggregate behaviour for all 21 operations is good enough to warrant removal of 
the lock, and the commensurate reduction in complexity when reasoning about the 
system (that has been _amply_ demonstrated by this ticket).  IMO, the 
performance numbers from 15511 more than cross this threshold, but we can 
certainly explore further verification work to be certain.

> Memtable memory allocations may deadlock
> ----------------------------------------
>
>                 Key: CASSANDRA-15367
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15367
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Commit Log, Local/Memtable
>            Reporter: Benedict Elliott Smith
>            Assignee: Benedict Elliott Smith
>            Priority: Normal
>             Fix For: 4.0, 2.2.x, 3.0.x, 3.11.x
>
>
> * Under heavy contention, we guard modifications to a partition with a mutex, 
> for the lifetime of the memtable.
> * Memtables block for the completion of all {{OpOrder.Group}} started before 
> their flush began
> * Memtables permit operations from this cohort to fall-through to the 
> following Memtable, in order to guarantee a precise commitLogUpperBound
> * Memtable memory limits may be lifted for operations in the first cohort, 
> since they block flush (and hence block future memory allocation)
>
> With very unfortunate scheduling:
> * A contended partition may rapidly escalate to a mutex
> * The system may reach memory limits that prevent allocations for the new 
> Memtable’s cohort (C2) 
> * An operation from C2 may hold the mutex when this occurs
> * Operations from a prior Memtable’s cohort (C1), for a contended partition, 
> may fall-through to the next Memtable
> * The operations from C1 may execute after the above is encountered by those 
> from C2
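
One way to read the scheduling in the description above is as a cycle in a wait-for graph. The sketch below is an illustrative model only: the node labels are my paraphrase of the description, the edge from the C1 operation to the mutex holder is my reading of it rather than something it states outright, and {{WaitForGraph}} is a hypothetical helper, not a Cassandra class.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model: each participant waits for at most one other participant.
final class WaitForGraph {
    private final Map<String, String> waitsFor = new LinkedHashMap<>();

    // Records that `waiter` cannot proceed until `blocker` does.
    void waits(String waiter, String blocker) {
        waitsFor.put(waiter, blocker);
    }

    // Follows the chain of waits from `start`; revisiting a node means a deadlock cycle.
    boolean hasCycleFrom(String start) {
        Set<String> seen = new HashSet<>();
        for (String node = start; node != null; node = waitsFor.get(node))
            if (!seen.add(node))
                return true;
        return false;
    }

    public static void main(String[] args) {
        WaitForGraph g = new WaitForGraph();
        // C2 holds the partition mutex and is blocked on a memory allocation.
        g.waits("C2 op (holds partition mutex)", "memory release");
        // Memory is only released once the old memtable has flushed.
        g.waits("memory release", "flush of old memtable");
        // The flush waits for every operation in the earlier cohort.
        g.waits("flush of old memtable", "C1 op");
        // Assumed edge: the C1 op fell through to the new memtable and needs the same mutex.
        g.waits("C1 op", "C2 op (holds partition mutex)");
        System.out.println(g.hasCycleFrom("C1 op")); // true
    }
}
```

Removing any one edge breaks the cycle in this model. Removing the lock, for example, takes away the last edge, because the C1 operation no longer waits on the C2 operation.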


