[ https://issues.apache.org/jira/browse/CASSANDRA-15367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023361#comment-17023361 ]
Benedict Elliott Smith edited comment on CASSANDRA-15367 at 1/25/20 1:00 AM:
-----------------------------------------------------------------------------

bq. I don’t understand how that could be done without reintroducing the contention gc problem.

Simply by making it fast enough. CASSANDRA-15511 manages to get the costs significantly lower than today, even with 16-threads actively spinning on a dual-socket 24-core machine. It manages to compete fairly well even against itself with locking enabled (although the lock variant does manage lower garbage, it isn't dramatic overall). The spreadsheet I posted compares a number of possible approaches, settings and workloads.

Of course, there would still be the potential for some wasted work that could be usefully spent elsewhere, but I think the gains are minimal. There are also other options available to us for minimising contention besides a lock, that I've broached before (e.g. on failure to update, tag the write onto a linked-list and merge lazily on read, potentially attempting to apply the merge to the tree each time to reduce duplicated work).

bq. either merge these two classes, or make one control the other

It might well be possible to merge the management of these things, it's an interesting idea and something to consider.

bq. Rough example with lazy naming here

Thanks. I'll have to dig into that next week, failing to reckon with it right now.


was (Author: benedict):
bq. I don’t understand how that could be done without reintroducing the contention gc problem.

Simply by making it fast enough. CASSANDRA-15511 manages to get the costs significantly lower than today, even with 16-threads actively spinning on a dual-socket 24-core machine. It manages to compete fairly well even against itself with locking enabled (although the lock variant does manage lower garbage, it isn't dramatic overall). The spreadsheet I posted compares a number of possible approaches, settings and workloads.

Of course, there would still be the potential for some wasted work that could be usefully spent elsewhere, but I think the gains are minimal. There are also other options available to us for minimising contention besides a lock, that I've broached before (on failure to update, tag the write onto a linked-list and merge lazily on read, potentially attempting to apply the merge to the tree each time to reduce duplicated work).

bq. either merge these two classes, or make one control the other

It might well be possible to merge the management of these things, it's an interesting idea and something to consider.

bq. Rough example with lazy naming here

Thanks. I'll have to dig into that next week, failing to reckon with it right now.

> Memtable memory allocations may deadlock
> ----------------------------------------
>
>                 Key: CASSANDRA-15367
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15367
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Commit Log, Local/Memtable
>            Reporter: Benedict Elliott Smith
>            Assignee: Benedict Elliott Smith
>            Priority: Normal
>             Fix For: 4.0, 2.2.x, 3.0.x, 3.11.x
>
>
> * Under heavy contention, we guard modifications to a partition with a mutex,
> for the lifetime of the memtable.
> * Memtables block for the completion of all {{OpOrder.Group}} started before
> their flush began
> * Memtables permit operations from this cohort to fall-through to the
> following Memtable, in order to guarantee a precise commitLogUpperBound
> * Memtable memory limits may be lifted for operations in the first cohort,
> since they block flush (and hence block future memory allocation)
> With very unfortunate scheduling
> * A contended partition may rapidly escalate to a mutex
> * The system may reach memory limits that prevent allocations for the new
> Memtable’s cohort (C2)
> * An operation from C2 may hold the mutex when this occurs
> * Operations from a prior Memtable’s cohort (C1), for a contended partition,
> may fall-through to the next Memtable
> * The operations from C1 may execute after the above is encountered by those
> from C2
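
For illustration only, the non-locking option mentioned in the comment (on failure to update, tag the write onto a linked-list and merge lazily on read) might look roughly like the sketch below. Everything in it is an assumption made for the example: the class {{LazyMergePartition}}, its methods, the {{HashMap}} standing in for the partition's tree, and integer addition standing in for a commutative merge. It is not the code in CASSANDRA-15511 or Cassandra's actual partition implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: all names here are invented for illustration and are
// not part of Cassandra's API.
final class LazyMergePartition {
    // One deferred write, linked to the writes that were deferred before it.
    private static final class Pending {
        final String key; final int delta; final Pending next;
        Pending(String key, int delta, Pending next) { this.key = key; this.delta = delta; this.next = next; }
    }

    // Immutable snapshot: the merged map (standing in for the tree) plus unmerged writes.
    private static final class State {
        final Map<String, Integer> merged; final Pending pending;
        State(Map<String, Integer> merged, Pending pending) { this.merged = merged; this.pending = pending; }
    }

    private final AtomicReference<State> ref =
        new AtomicReference<>(new State(new HashMap<>(), null));

    // Copy the merged map and apply one write. Addition is the commutative
    // merge assumed here, so the order in which writes are folded in is irrelevant.
    private static Map<String, Integer> apply(Map<String, Integer> base, String key, int delta) {
        Map<String, Integer> copy = new HashMap<>(base);
        copy.merge(key, delta, Integer::sum);
        return copy;
    }

    void write(String key, int delta) {
        State cur = ref.get();
        // Fast path: a single attempt at the expensive copy-and-merge.
        if (ref.compareAndSet(cur, new State(apply(cur.merged, key, delta), cur.pending)))
            return;
        // Contended path: retry by prepending a small node rather than redoing the merge.
        do {
            cur = ref.get();
        } while (!ref.compareAndSet(cur, new State(cur.merged, new Pending(key, delta, cur.pending))));
    }

    // Returns a consistent view; callers must treat the returned map as read-only.
    Map<String, Integer> read() {
        State cur = ref.get();
        if (cur.pending == null)
            return cur.merged;
        Map<String, Integer> folded = new HashMap<>(cur.merged);
        for (Pending p = cur.pending; p != null; p = p.next)
            folded.merge(p.key, p.delta, Integer::sum);
        // Best effort: publish the folded result so later reads skip this work.
        ref.compareAndSet(cur, new State(folded, null));
        return folded;
    }
}
```

In this sketch a writer that loses the race never blocks and never repeats the costly merge; it only allocates a small list node per retry. The trade-off is that reads may have to fold pending writes themselves until one of them manages to publish the merged result.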