[ https://issues.apache.org/jira/browse/CASSANDRA-15367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023133#comment-17023133 ]
Blake Eggleston edited comment on CASSANDRA-15367 at 1/25/20 12:16 AM:
-----------------------------------------------------------------------

Just so I understand, the scenario you’re describing could be described like this?
* there are 2 tables, T1 & T2, writing against OpGroup 1 (OP[1])
* a write comes in for T1 and is assigned to op group 1 (W1[1])
* T2 starts to flush, OpGroup is bumped to OP[2]. T2 is now waiting on W1[1]
* a write comes in for T1 and is assigned to op group 2 (W2[2])
* W2[2] acquires a lock, but is unable to allocate memory until T2 flushes
* W1[1] is blocking the T2 flush, and is unable to acquire the lock
* deadlock

was (Author: bdeggleston):
Just so I understand, the scenario you’re describing could be described like this?
* there are 2 tables, T1 & T2, writing against OpGroup 1 (OP[1])
* a write comes in for T1 and is assigned to op group 1 (W1[1])
* T2 starts to flush, OpGroup is bumped to OP[2]. T2 is now waiting on W1[1]
* a write comes in for T1 and is assigned to op group 2 (W2[2])
* W2[2] acquires a lock, but is unable to allocate memory until T2 flushes
* W1[0] is blocking the T2 flush, and is unable to acquire the lock
* deadlock

> Memtable memory allocations may deadlock
> ----------------------------------------
>
>                 Key: CASSANDRA-15367
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15367
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Commit Log, Local/Memtable
>            Reporter: Benedict Elliott Smith
>            Assignee: Benedict Elliott Smith
>            Priority: Normal
>             Fix For: 4.0, 2.2.x, 3.0.x, 3.11.x
>
>
> * Under heavy contention, we guard modifications to a partition with a mutex,
> for the lifetime of the memtable.
> * Memtables block for the completion of all {{OpOrder.Group}} started before
> their flush began
> * Memtables permit operations from this cohort to fall-through to the
> following Memtable, in order to guarantee a precise commitLogUpperBound
> * Memtable memory limits may be lifted for operations in the first cohort,
> since they block flush (and hence block future memory allocation)
>
> With very unfortunate scheduling
> * A contended partition may rapidly escalate to a mutex
> * The system may reach memory limits that prevent allocations for the new
> Memtable’s cohort (C2)
> * An operation from C2 may hold the mutex when this occurs
> * Operations from a prior Memtable’s cohort (C1), for a contended partition,
> may fall-through to the next Memtable
> * The operations from C1 may execute after the above is encountered by those
> from C2
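
For readers following the thread, the scenario in the comment can be pictured as a wait-for graph: each participant is blocked on exactly one other participant, and a deadlock is a cycle in that graph. The sketch below is only an illustration of that reasoning in Python. The labels (W1[1], W2[2], "T2 flush") are taken from the comment; the WAITS_FOR table and the find_cycle helper are hypothetical names invented for this example and are not Cassandra code.

```python
from typing import Dict, List, Optional

# Wait-for graph for the scenario in the comment: each actor maps to the one
# actor it is blocked on. The labels are illustrative, not Cassandra identifiers.
WAITS_FOR: Dict[str, str] = {
    "W1[1]": "W2[2]",     # W1[1] needs the partition lock that W2[2] holds
    "W2[2]": "T2 flush",  # W2[2] cannot allocate memory until T2 has flushed
    "T2 flush": "W1[1]",  # the flush waits for every write in op group 1
}


def find_cycle(waits_for: Dict[str, str], start: str) -> Optional[List[str]]:
    """Follow wait-for edges from `start`; return the cycle reached, or None."""
    path: List[str] = []
    position: Dict[str, int] = {}
    node: Optional[str] = start
    while node is not None:
        if node in position:
            # We came back to an actor already on the path: that is the cycle.
            return path[position[node]:] + [node]
        position[node] = len(path)
        path.append(node)
        node = waits_for.get(node)  # None when the actor is not blocked
    return None


if __name__ == "__main__":
    cycle = find_cycle(WAITS_FOR, "W1[1]")
    print(" -> ".join(cycle) if cycle else "no deadlock")
```

Removing any one edge from the table (for example, if W2[2] were not holding the lock while waiting for memory) makes find_cycle return None, which is the sense in which all three conditions in the comment have to coincide.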