[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836818#comment-13836818 ]
Benedict commented on CASSANDRA-5549:
-------------------------------------

Without switch lock, we won't have anything preventing writes from coming through when we're over-burdened with memory used by memtables. What I'd like to suggest is effectively a global Semaphore, with permits equal to the size allocated for memtables: on KS.apply(RM) we estimate the size of the RM and take that many permits. Once we've added the RM and know better how much it occupies, we adjust the Semaphore to (more) accurately reflect the amount of memory in use. When we flush a memtable, we release permits equal to the *estimated size* of each RM.

This may be pushing the boat out, but it would probably mean we no longer rely on memtable live metering/scanning for size estimation, which we could retire. Either way we're estimating the size, but with this approach we keep *tight* control over the (estimated) memory allocated to memtables, whereas at the moment we have some tricks that we hope keep it there. If we estimate space used cautiously, we should be better able to guarantee no OOM, at least from this part of the code.

I have a *reasonably* straightforward scheme for estimating the size used by an RM that should be as good as what we currently have. The basic premise is to calculate the average space used by an item in a ConcurrentSkipListMap by metering, at startup, a map of, say, 1M entries, and rounding up. If we depend on CASSANDRA-6271 we can easily calculate the exact overhead for the BTrees; otherwise we can use a similar metering approach for SnapTreeMap. So we have an overhead per row and per value. Separately, we track how much space we are using in a given memtable's slab allocator. We use the RM's data size only for the initial estimate, to decide whether we have room, and ignore it once the RM is actually added, as it will then be accounted for in the slab allocator.
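A minimal sketch of the global-Semaphore idea, assuming one permit per byte of memtable space and a budget that fits in an int. The names MemtableMemoryLimiter, MemtableAccount, reserve, adjust and onFlush are hypothetical, not Cassandra's API. The upward correction after a write uses java.util.concurrent.Semaphore's protected reducePermits, which never blocks (the count may go negative), so a write that has already been applied is not stalled; later writers wait instead. On flush it releases whatever the memtable was charged, i.e. the sum of the adjusted estimates.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical global memtable memory budget: one permit per byte. */
final class MemtableMemoryLimiter {
    /** Exposes Semaphore's protected reducePermits, which shrinks the
     *  available count without blocking and may drive it negative. */
    private static final class AdjustableSemaphore extends Semaphore {
        AdjustableSemaphore(int permits) { super(permits, true); } // fair: large requests are not starved
        void reduce(int n) { reducePermits(n); }
    }

    /** Hypothetical helper: permits currently charged to one memtable. */
    static final class MemtableAccount {
        private final AtomicInteger held = new AtomicInteger();
    }

    private final AdjustableSemaphore permits;

    MemtableMemoryLimiter(int memtableSpaceBytes) {
        this.permits = new AdjustableSemaphore(memtableSpaceBytes);
    }

    /** Before applying a mutation: block until its estimated size fits. */
    void reserve(MemtableAccount memtable, int estimatedBytes) {
        permits.acquireUninterruptibly(estimatedBytes);
        memtable.held.addAndGet(estimatedBytes);
    }

    /** After applying it: correct the charge to the better-known size. */
    void adjust(MemtableAccount memtable, int estimatedBytes, int actualBytes) {
        int delta = actualBytes - estimatedBytes;
        if (delta > 0) {
            permits.reduce(delta);     // never blocks; later writers wait instead
        } else if (delta < 0) {
            permits.release(-delta);   // over-estimated: give the excess back
        }
        memtable.held.addAndGet(delta);
    }

    /** On flush: return everything that memtable was charged. */
    void onFlush(MemtableAccount memtable) {
        int charged = memtable.held.getAndSet(0);
        if (charged > 0) {
            permits.release(charged);
        }
    }

    int available() { return permits.availablePermits(); }

    public static void main(String[] args) {
        MemtableMemoryLimiter limiter = new MemtableMemoryLimiter(1_000);
        MemtableAccount memtable = new MemtableAccount();
        limiter.reserve(memtable, 300);          // estimate before the write
        limiter.adjust(memtable, 300, 350);      // measured size was larger
        System.out.println(limiter.available()); // 650
        limiter.onFlush(memtable);
        System.out.println(limiter.available()); // 1000
    }
}
```

In this sketch a single estimate larger than the whole budget would block forever, so a real version would need to cap or reject such requests.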
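The size-estimation scheme described in the comment can be sketched as a pair of fixed overheads, one per row and one per value, calibrated once at startup and rounded up, plus the mutation's raw data size for the up-front check only. This is a hypothetical illustration: MutationSizeEstimator and its methods are not Cassandra code, the per-value overhead of 32 bytes is an arbitrary example value, and the actual metering of a 1M-entry map (for example with a memory meter library) is assumed to happen elsewhere and is passed in as a number.

```java
/** Hypothetical estimator: fixed per-row and per-value structural overheads. */
final class MutationSizeEstimator {
    private final long perRowOverhead;
    private final long perValueOverhead;

    MutationSizeEstimator(long perRowOverhead, long perValueOverhead) {
        this.perRowOverhead = perRowOverhead;
        this.perValueOverhead = perValueOverhead;
    }

    /** Average bytes per entry, rounded up, from the measured footprint
     *  of a map holding `entries` entries. */
    static long calibrate(long measuredBytes, long entries) {
        return (measuredBytes + entries - 1) / entries; // ceiling division
    }

    /** Structural (index) overhead only; data bytes live in the slab allocator. */
    long overhead(int rows, int values) {
        return rows * perRowOverhead + values * perValueOverhead;
    }

    /** Up-front estimate, used only to decide whether there is room. */
    long estimate(int rows, int values, long dataBytes) {
        return overhead(rows, values) + dataBytes;
    }

    public static void main(String[] args) {
        long perRow = calibrate(47_500_000L, 1_000_000L); // 47.5 rounds up to 48
        MutationSizeEstimator est = new MutationSizeEstimator(perRow, 32);
        System.out.println(est.estimate(1, 3, 200)); // 344
        // Once applied, the 200 data bytes are counted by the slab allocator,
        // so only the structural overhead is added to the memtable's total.
        System.out.println(est.overhead(1, 3));      // 144
    }
}
```

Keeping the data size out of the post-apply accounting avoids counting the same bytes twice: the memtable's total is then the summed structural overhead plus whatever its slab allocator reports.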
> Remove Table.switchLock
> -----------------------
>
>                 Key: CASSANDRA-5549
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5549
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jonathan Ellis
>            Assignee: Vijay
>              Labels: performance
>             Fix For: 2.1
>
>         Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png
>
>
> As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write
> path. ReentrantReadWriteLock is not lightweight, even if there is no
> contention per se between readers and writers of the lock (in Cassandra,
> memtable updates and switches).

--
This message was sent by Atlassian JIRA
(v6.1#6144)