[ 
https://issues.apache.org/jira/browse/CASSANDRA-15367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023512#comment-17023512
 ] 

Benedict Elliott Smith commented on CASSANDRA-15367:
----------------------------------------------------

It's unclear to me which of these two approaches is best. One permits 
competition amongst operations that are allowed to grow memtable memory usage 
without bound, and the other potentially reduces parallelism briefly.  The 
number of parallelism reductions is limited to the number of memtable flushes 
on the system, which can actually be quite frequent, but probably not frequent 
enough to matter.  However, the number of operations competing to create 
memory pressure on memtables is also relatively small.  Still, neither is a 
desirable outcome.

I wonder if a mixture of the two approaches would be a good idea, though.  We 
could adopt your suggestion of ignoring the lock for operations that could 
deadlock (potentially by marking them during the critical section, which is 
likely an imperceptible cost given the rate of barrier issue), so that we do 
not harm parallelism.  We would also prevent operations from re-allocating 
memory into memtable space, since this memory cannot be reclaimed until a 
(slow) flush occurs, which could harm node stability when we're past our 
memtable limit.
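
As a rough illustration of how the two rules might combine, here is a minimal 
sketch. It is only one possible reading of the idea, not Cassandra code: the 
class MixedApproachSketch, the Action values and both boolean inputs are 
hypothetical names invented for this example.

```java
// Hypothetical decision table for the mixed approach; not Cassandra code.
final class MixedApproachSketch {
    enum Action {
        LOCK_THEN_ALLOCATE,        // normal path: take the partition lock, allocate as usual
        SKIP_LOCK,                 // marked at the barrier: bypass the lock to avoid deadlock
        SKIP_LOCK_NO_REALLOCATION  // as above, but do not allocate memtable memory again
    }

    // markedAtBarrier: assumed flag set while the barrier is issued, for
    //                  operations that could otherwise deadlock on the lock.
    // overMemtableLimit: assumed flag meaning the memtable limit is exceeded.
    static Action decide(boolean markedAtBarrier, boolean overMemtableLimit) {
        if (!markedAtBarrier) {
            return Action.LOCK_THEN_ALLOCATE;
        }
        // Memory placed in memtable space is only reclaimed by a (slow) flush,
        // so past the limit the marked operation must not allocate there again.
        return overMemtableLimit ? Action.SKIP_LOCK_NO_REALLOCATION : Action.SKIP_LOCK;
    }

    public static void main(String[] args) {
        System.out.println(decide(false, true));
        System.out.println(decide(true, false));
        System.out.println(decide(true, true));
    }
}
```

The sketch leaves open what an operation does instead of allocating in 
memtable space; that choice is not settled in this comment.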

> Memtable memory allocations may deadlock
> ----------------------------------------
>
>                 Key: CASSANDRA-15367
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15367
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Commit Log, Local/Memtable
>            Reporter: Benedict Elliott Smith
>            Assignee: Benedict Elliott Smith
>            Priority: Normal
>             Fix For: 4.0, 2.2.x, 3.0.x, 3.11.x
>
>
> * Under heavy contention, we guard modifications to a partition with a mutex, 
> for the lifetime of the memtable.
> * Memtables block for the completion of all {{OpOrder.Group}} started before 
> their flush began
> * Memtables permit operations from this cohort to fall-through to the 
> following Memtable, in order to guarantee a precise commitLogUpperBound
> * Memtable memory limits may be lifted for operations in the first cohort, 
> since they block flush (and hence block future memory allocation)
>
> With very unfortunate scheduling:
> * A contended partition may rapidly escalate to a mutex
> * The system may reach memory limits that prevent allocations for the new 
> Memtable’s cohort (C2) 
> * An operation from C2 may hold the mutex when this occurs
> * Operations from a prior Memtable’s cohort (C1), for a contended partition, 
> may fall-through to the next Memtable
> * The operations from C1 may execute after the above is encountered by those 
> from C2
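
The scheduling described above can be read as a wait-for cycle. The sketch 
below is a toy model under that reading, not Cassandra code: WaitForSketch and 
the party names are hypothetical. The three edges assume that the flush waits 
for the C1 operation, the C1 operation waits for the mutex held by the C2 
operation, and the C2 operation waits for memory that only the flush can free.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: hypothetical names, not Cassandra classes.
final class WaitForSketch {
    // Simplification: each party waits on at most one other party.
    private final Map<String, String> waitsOn = new HashMap<>();

    void addWait(String waiter, String blocker) {
        waitsOn.put(waiter, blocker);
    }

    // Follows the wait chain from start; returns true if the chain
    // loops back onto a party that was already visited.
    boolean reachesCycle(String start) {
        Set<String> seen = new HashSet<>();
        String current = start;
        while (current != null && seen.add(current)) {
            current = waitsOn.get(current);
        }
        return current != null;
    }

    static WaitForSketch scenario() {
        WaitForSketch g = new WaitForSketch();
        // The flush blocks until every operation in the earlier cohort completes.
        g.addWait("flush of old memtable", "C1 operation");
        // The C1 operation fell through to the new memtable and needs the mutex.
        g.addWait("C1 operation", "C2 operation");
        // The C2 operation holds the mutex while blocked on memory the flush would free.
        g.addWait("C2 operation", "flush of old memtable");
        return g;
    }

    public static void main(String[] args) {
        System.out.println("deadlock cycle: " + scenario().reachesCycle("C2 operation"));
    }
}
```

Removing any one of the three edges breaks the cycle in this model.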



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
