[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836818#comment-13836818 ]

Benedict commented on CASSANDRA-5549:
-------------------------------------

Without the switch lock, we won't have anything preventing writes from coming 
through when we're over-burdened with memory used by memtables.

What I'd like to suggest is effectively a global Semaphore, with permits equal 
to the size allocated for memtables; on KS.apply(RM) we estimate the size of 
the RM and take that many permits. Once we've added the RM and know better how 
much it occupies, we adjust the Semaphore to (more) accurately reflect the 
amount of memory in use. When we flush a memtable we release permits equal to 
the *estimated size* of each RM.
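
A minimal sketch of what that permit pool could look like, assuming we count 
permits in KB (java.util.concurrent.Semaphore holds an int) and a hypothetical 
MemtableMemoryLimiter hooked into KS.apply(RM) and the flush path -- names and 
wiring here are illustrative, not a proposed API:

{code:java}
import java.util.concurrent.Semaphore;

// Hypothetical sketch only: MemtableMemoryLimiter is not an existing class. Permits
// are counted in KB because Semaphore is int-sized; a real patch might prefer a
// custom long-based synchronizer.
public class MemtableMemoryLimiter
{
    // Semaphore.reducePermits() is protected, so subclass to expose a non-blocking
    // downward adjustment for when a RM turns out bigger than estimated.
    private static final class AdjustableSemaphore extends Semaphore
    {
        AdjustableSemaphore(int permits) { super(permits); }
        void reduce(int permits) { super.reducePermits(permits); }
    }

    private final AdjustableSemaphore permits;

    public MemtableMemoryLimiter(long memtableSpaceInBytes)
    {
        permits = new AdjustableSemaphore(toPermits(memtableSpaceInBytes));
    }

    // At the top of KS.apply(RM): block until there's room for the estimated size,
    // so writes stall instead of OOMing when memtables are over budget.
    public void allocate(long estimatedBytes) throws InterruptedException
    {
        permits.acquire(toPermits(estimatedBytes));
    }

    // Once the RM has been added and we know better how much it occupies: hand back
    // any over-estimate, or charge the shortfall without blocking the write.
    public void adjust(long estimatedBytes, long actualBytes)
    {
        int delta = toPermits(actualBytes) - toPermits(estimatedBytes);
        if (delta > 0)
            permits.reduce(delta);
        else if (delta < 0)
            permits.release(-delta);
    }

    // On memtable flush: release everything the flushed memtable was charged for.
    public void releaseOnFlush(long bytesCharged)
    {
        permits.release(toPermits(bytesCharged));
    }

    private static int toPermits(long bytes)
    {
        return (int) Math.min(Integer.MAX_VALUE, Math.max(1, bytes / 1024));
    }
}
{code}

The upward adjustment uses reducePermits() rather than acquire() so an 
already-applied write never blocks; only new writes wait for permits to come 
back.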

This may be pushing the boat out, but it would probably mean we no longer rely 
on memtable live metering/scanning for size estimation, which we could retire. 
Either way we're estimating the size, but with this approach we keep *tight* 
control over the (estimated) memory allocated to memtables, whereas at the 
moment we have some tricks that we hope keep it within bounds. If we estimate 
space used cautiously, we should be able to better guarantee no OOM, at least 
from this part of the code. 

I have a *reasonably* straightforward scheme for estimating the size used by a 
RM that should be as good as what we currently have. The basic premise is to 
calculate the average space used by an entry in ConcurrentSkipListMap by 
metering, at startup, a map of, say, 1M entries, and rounding up. If we depend 
on CASSANDRA-6271 we can easily calculate the exact overhead for the BTrees; 
otherwise we can do a similar metering approach for SnapTreeMap. So we have an 
overhead per row and per value. Separately we track how much space we are using 
for a given memtable's slab allocator. We use the RM's data size only for the 
initial estimation, to decide if we have room, and ignore it once it's actually 
added, as it will be accounted for in the slab allocator.
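
A rough sketch of that estimation, to make it concrete. The class name, the 
crude heap-delta calibration and the way the constants are wired are all 
assumptions on my part; the real numbers would presumably come from the 
metering we already have, or from the exact BTree overheads if CASSANDRA-6271 
lands:

{code:java}
import java.util.concurrent.ConcurrentSkipListMap;

// Illustrative only: names and the heap-delta metering are assumptions, not a
// proposed API.
public final class MemtableSizeEstimator
{
    private final long perRowOverhead;   // measured bytes of map overhead per row
    private final long perValueOverhead; // measured bytes of overhead per column/value

    public MemtableSizeEstimator(long perRowOverhead, long perValueOverhead)
    {
        this.perRowOverhead = perRowOverhead;
        this.perValueOverhead = perValueOverhead;
    }

    // Used only for the up-front "do we have room?" check in KS.apply(RM). The RM's
    // data size is counted here but ignored after the write is applied, since from
    // then on it lives in the slab allocator, which is tracked separately.
    public long estimate(int rowCount, int valueCount, long rmDataSize)
    {
        return rowCount * perRowOverhead + valueCount * perValueOverhead + rmDataSize;
    }

    // Crude startup calibration: populate a skip list of ~1M entries and divide the
    // heap delta by the entry count, rounding up so the estimate stays cautious.
    public static long meterSkipListEntryOverhead(int entries)
    {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        long before = rt.totalMemory() - rt.freeMemory();
        ConcurrentSkipListMap<Long, Object> map = new ConcurrentSkipListMap<>();
        Object value = new Object();
        for (long i = 0; i < entries; i++)
            map.put(i, value);
        System.gc();
        long after = rt.totalMemory() - rt.freeMemory();
        long perEntry = (after - before + entries - 1) / entries;
        map.clear(); // keep the map reachable until after the second measurement
        return Math.max(0, perEntry);
    }
}
{code}

Rounding the measured per-entry cost up keeps the estimate on the cautious 
side, in line with the no-OOM goal above.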



> Remove Table.switchLock
> -----------------------
>
>                 Key: CASSANDRA-5549
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5549
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jonathan Ellis
>            Assignee: Vijay
>              Labels: performance
>             Fix For: 2.1
>
>         Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png
>
>
> As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write 
> path.  ReentrantReadWriteLock is not lightweight, even if there is no 
> contention per se between readers and writers of the lock (in Cassandra, 
> memtable updates and switches).



--
This message was sent by Atlassian JIRA
(v6.1#6144)
