[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887295#comment-13887295 ] Benedict commented on CASSANDRA-5549: - Awesome! I've fixed the formatting, added a couple of comments and opened a pull request. > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Benedict > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887229#comment-13887229 ] Jonathan Ellis commented on CASSANDRA-5549: --- (Please submit a pull request for jamm-0.2.6. Note that jamm uses "normal" brace-on-same-line convention.) > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Benedict > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887225#comment-13887225 ] Jonathan Ellis commented on CASSANDRA-5549: --- Committed! > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Benedict > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886121#comment-13886121 ] Benedict commented on CASSANDRA-5549: - I've rebased from trunk and fixed the bugs and pushed (-f) to [merge-5549|https://github.com/belliottsmith/cassandra/tree/merge-5549]. There were two small issues: in RangeTombstoneList I was setting start[i] = null before I copied its value; and in FBUtilities.hashToBigInteger() my "equivalency" rewrite wasn't so equivalent after all, so I reverted it. Whoops. There are still three new tests failing, but this is because we fixed a bug and so DefsTest no longer times out; CASSANDRA-6636 addresses these new issues. > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Benedict > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884283#comment-13884283 ] Jonathan Ellis commented on CASSANDRA-5549: --- (tried to squash only-5549 onto trunk to see if it was something that got fixed recently in trunk and got a zillion conflicts, so rebasing that would be useful in any case) > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Benedict > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884285#comment-13884285 ] Benedict commented on CASSANDRA-5549: - Right, I'll take a look first thing tomorrow. Don't trust myself to actually correct anything right now. I've got a squashed/rebased version that's only a week or so old, so I'll update that against your changes and upload. > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Benedict > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884276#comment-13884276 ] Jonathan Ellis commented on CASSANDRA-5549: --- I'm seeing tests fail on both your branch and mine that do not fail in trunk, e.g. RangeTombstoneListTest. > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Benedict > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882598#comment-13882598 ] Jonathan Ellis commented on CASSANDRA-5549: --- Fair enough. > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Benedict > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882591#comment-13882591 ] Benedict commented on CASSANDRA-5549: - I'm not totally convinced this is better, but I think I can make it worth with the off-heap stuff. I think it will get a little ugly, as we'll need two allocators as well - we'll need some method in the AbstractAllocator interface to getOffHeapPartner() and getOnHeapPartner(), or something, so that Memtable can avoid special casing the OffHeapAllocator. But the OnHeapPartner won't actually support allocating BBs, it will just be for querying and updating amounts of memory. Not relishing it, but don't mind pushing it off until we have to address that problem. > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Benedict > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882570#comment-13882570 ] Jonathan Ellis commented on CASSANDRA-5549: --- Okay, went with plan B of unifying MT/Pool and MO/Allocator. Pretty happy with how this worked out: https://github.com/jbellis/cassandra/commits/5549-3 In my mind, off heap stuff can now be added by creating an appropriate Allocator class, and a Pool implementation that owns two sub-pools. (I'm seeing a bunch of test failures but I don't think it's from this refactor.) > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Benedict > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881668#comment-13881668 ] Benedict commented on CASSANDRA-5549: - MemoryTracker -> PoolLedger MemoryOwner -> AllocatorLedger ? > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Benedict > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881666#comment-13881666 ] Jonathan Ellis commented on CASSANDRA-5549: --- bq. I can see your criticism of Pool being in fact two pools, but I think it can also be viewed as just one. It only ever allocates one kind of resource, but in some cases those resources may occupy memory both on and off heap. Well, the real point is that we have two things (Pool/Allocator, MT/MO) that operate on (connected on-off heap reservoirs / a single reservoir) and the names should reflect that relationship. > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Benedict > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881626#comment-13881626 ] Benedict commented on CASSANDRA-5549: - bq. I think what I'm having a hard time with is, Pool should be a reservoir that we allocate out of and instead it's a container of two such reservoirs (MT) I can see your criticism of Pool being in fact two pools, but I think it can also be viewed as just one. It only ever allocates one kind of resource, but in some cases those resources may occupy memory both on and off heap. The confusing thing, perhaps, is that we overload the trackers/owners to track not only what the pool itself is using, but also what overheads we incur in using the pool from Memtable. This is a bit unclean conceptually, but much cleaner in implementation. I also think that renaming MT to Pool would be confusing. Certainly an OffHeapPool will legitimately _pool_ (and cache) the resources it allocates, so it could get confusing if we call the thing logging the amount the pool, and the thing actually pooling something else. MemoryOwner -> Allocator I'm very much a -1 on, as it really doesn't allocate anything. That said, I can see where you're coming from. Only, I have already introduced AllocatorGroup into my follow on patch, which I think probably makes LinkedAllocatorsGroup pretty confusing :-) I want to offer some alternatives, but I can't think of anything better. MemoryTracker -> Ledger? MemoryOwner -> Account? Pool -> Reservoir works for me > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Benedict > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881589#comment-13881589 ] Jonathan Ellis commented on CASSANDRA-5549: --- I kind of like the Linked prefix since we can apply that at the MO level too: MemoryTracker -> Pool Pool -> LinkedPools MemoryOwner -> Allocator PoolAllocator -> LinkedAllocators > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Benedict > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881588#comment-13881588 ] Jonathan Ellis commented on CASSANDRA-5549: --- Aside: we have one (two) MemoryOwner per Memtable right? I don't think it's worth obfuscating {{owns}} with an updater instead of just using AtomicLong. > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Benedict > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881584#comment-13881584 ] Jonathan Ellis commented on CASSANDRA-5549: --- I think what I'm having a hard time with is, Pool should be a reservoir that we allocate out of and instead it's a container of two such reservoirs (MT). I propose renaming: MemoryTracker -> Pool Pool -> LinkedPools? DualPool? OnOffHeapPools? Reservoir? PoolPool? > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Benedict > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13875379#comment-13875379 ] Benedict commented on CASSANDRA-5549: - Merged your changes, and pushed a version without NBQ et al. One thing I rolled back was your deletion of the commented out code. I uncommented it. That was a massive oversight of something I was using for stress testing the memory management stuff for slab allocators :-/ > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Benedict > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874878#comment-13874878 ] Jonathan Ellis commented on CASSANDRA-5549: --- Pushed seal back to issue and includes to isAfter. > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Benedict > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874840#comment-13874840 ] Benedict commented on CASSANDRA-5549: - bq. The problem is that it's not "just" a memory fence, but is tied to the Group behavior. (Once issued, the old group is sealed and a new Group started.) True, but from the point of view of the Barrier API, that doesn't really seem like important information to me, although it is a necessity of function that it happens. I think it is important to convey, especially, that the Barrier itself doesn't really exist until you call this method, it's just a place holder, which seal() doesn't seem to. A few alternatives: fix(), impose(), slice(). Coming up a bit empty on really good alternatives, though, and probably not worth much more discussion. I am not completely against seal(), as the word sounds good when said with Barrier. So whatever works for you. > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Benedict > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874831#comment-13874831 ] Jonathan Ellis commented on CASSANDRA-5549: --- bq. issue() is the normal parlance when talking about memory fences The problem is that it's not "just" a memory fence, but is tied to the Group behavior. (Once issued, the old group is sealed and a new Group started.) bq. isAfter()? WFM. > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Benedict > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874824#comment-13874824 ] Benedict commented on CASSANDRA-5549: - issue() is the normal parlance when talking about memory fences, in my experience, which this is roughly a high level model of, so I would like to keep it. However we could go with partition()? Or I can wrack my brains some more... OpOrdering -> OpOrder is good. I'm definitely with you on Group as well, now. I'm not fond of includes(). Maybe isAfter()? > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Benedict > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874817#comment-13874817 ] Jonathan Ellis commented on CASSANDRA-5549: --- In the meantime, I've pushed a few more renames for your enjoyment. (OpOrdering -> OpOrder, issue -> seal, accept -> includes.) Of these, I am least fond of {{seal}} which I think is better than issue (which could mean a lot of different things including creation) but still not super explanatory withoutreading javadoc. Maybe {{sealGroup}} or even {{sealCurrentGroup}}? > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Benedict > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874815#comment-13874815 ] Benedict commented on CASSANDRA-5549: - bq. and we'll be good to go here. Awesome. bq. let's build this on top of CLQ for now I'll upload a patch later on today. Got errands to run and don't want to rush swapping it in WaitQueue, as it requires a slightly different approach. > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Benedict > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874798#comment-13874798 ] Jonathan Ellis commented on CASSANDRA-5549: --- With an alternative identified for CASSANDRA-6557, let's build this on top of CLQ for now. I'll finish review of the Pool stuff and we'll be good to go here. > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Benedict > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874786#comment-13874786 ] Benedict commented on CASSANDRA-5549: - I realised as I was going to sleep this morning (strangely the best time to realise you've made a mistake, I find) that my fix for the bugfix was itself busted. Re-uploaded minor tweak to the branch. > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Benedict > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874262#comment-13874262 ] Benedict commented on CASSANDRA-5549: - FYI, on an unrelated note I've merged your changes and pushed an unrelated bug fix to [5549|https://github.com/belliottsmith/cassandra/tree/only-5549] > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Benedict > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874191#comment-13874191 ] Benedict commented on CASSANDRA-5549: - bq. When does CLQ block? Are you thinking of LBQ? It's double locked, so lock to read, lock to write. Also double-lock to Iterator remove. So by NonBlockingQueue, I mean WaitFreeQueue, not NoBlockingMethodQueue :-) bq. So if we can call NBQ an optimization If it weren't for WaitQueue I'd say it were just an optimisation, but it's likely to cause real wasted cycles. I've seen it happening, it's not a theoretical risk, the only question is how big the problem would be given the use case of WaitQueue and the answer is, I'm not sure. There's a reasonable chance it will only trigger wasted cycles infrequently, in which case we can address it later. CASSANDRA-6557 really needs it to fix the bug I spotted, though, so we'll have to put it in before release anyway. Not sure that buys you much respite! > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Benedict > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874178#comment-13874178 ] Jonathan Ellis commented on CASSANDRA-5549: --- When does CLQ block? Are you thinking of LBQ? As a progress update: I'm done with OpOrdering, CFS, Memtable, et al., and now looking at the plumbing (NBQ and the Allocator stuff). So if we can call NBQ an optimization and split it out, great. Otherwise, I don't see a good place to split things. :) > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Benedict > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874175#comment-13874175 ] Benedict commented on CASSANDRA-5549: - Maybe split : - WaitQueue, NBQ - BTree UpdateFunction - OpOrdering + CL utilisation - WaitQueue timing + CL timing - ObjectSizes + JAMM - RoW ? > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Benedict > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874168#comment-13874168 ] Benedict commented on CASSANDRA-5549: - bq. WaitQueue has changed enough that git no longer recognizes the new one as sharing history with the old one. Can you summarize? Sure: - It includes the aforementioned improvements with NBQ (including now being non-blocking, except in the case of signaller/waiter trampling on each other occasionally in pthread world). - It abstracts the Signal, so that we can have other implementations: wait for all, wait for any, for example. - It supports timing the amount of time spent waiting. I have toyed with splitting this out into another ticket, but I did it whilst I was going, as I thought having the amount of time spent waiting on Memtable (or CL) flush would be a helpful metric. That said, perhaps we could try splitting this whole thing into a few smaller tickets, make it more manageable for you to review. I kind of suspect the main body won't diminish a lot though, which is why I didn't in the first place. But every little probably helps :-) > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Benedict > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874158#comment-13874158 ] Benedict commented on CASSANDRA-5549: - bq. We have ANBQ -> NBQV -> NBQ, with no other implementations. NBQV and NBQ should be kept separate, as having two different queues modifying state could get ugly. They could strictly speaking probably be merged at the moment, but unpicking them in future might be difficult, and I'm not sure it's helpful. ABQ I use for OffHeap memtables, as I have a more memory efficient queue for storing references necessary for GC (at least until we move more off-heap). bq. What is the difference between NBQ and CLQ? I'd be inclined to introduce NBQ in a separate ticket. FIrstly, NBQ is non-blocking, CLQ is not. This is a nice property to have in and of itself, as we now have an almost totally non-blocking read and write path (actually codahale is now one of the only main blocking points, other than the obvious memtable/CL flushes), which maybe not immediately useful may allow us some neat optimisations in the near future. Secondly, NBQ provides some very useful methods, most specifically appendIfTail() and removeHeadIf(). Even though CLQ *could* offer these methods, it doesn't, and it makes certain things very difficult to write. These are mostly useful for the off-heap memtables, but I also use them for CASSANDRA-6557 to fix a potentially dangerous bug, and make the code easier to understand. Lastly, and the only reason to include it in _this_ ticket, is the snap() functionality that allows a persistent view of the queue that (mostly) does not change. In WaitQueue, since it is now used extensively in more highly concurrent places with spurious wake ups, this is really essential: waiting threads can (actually very easily) wake-up, find they have no work to do, and go back to sleep again _before the signalling thread has reached the end of the queue_, meaning they get woken up again... potentially repeatedly. This is especially a risk due to the use of pthread mutexes to implement LockSupport.park/unpark - whilst the signalling thread releases the waiting thread's mutex before waking it, if that thread is state changing at that time anyway, the signalling thread will wait (a full scheduler delay), and this can happen repeatedly. > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Benedict > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874043#comment-13874043 ] Jonathan Ellis commented on CASSANDRA-5549: --- We have ANBQ -> NBQV -> NBQ, with no other implementations. Can we combine these? What is the difference between NBQ and CLQ? I'd be inclined to introduce NBQ in a separate ticket. > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Benedict > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874005#comment-13874005 ] Jonathan Ellis commented on CASSANDRA-5549: --- WaitQueue has changed enough that git no longer recognizes the new one as sharing history with the old one. Can you summarize? > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Benedict > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873300#comment-13873300 ] Benedict commented on CASSANDRA-5549: - bq. Do we rely on accept returning true before issue, or can we make that IllegalState? It's a necessity of the semantics. For accept() to be properly synchronised with issue(), it needs to be called as soon as the Barrier is exposed, so that false can happen the instant issue() happens. Probably should have commented that better. The semantics of accept remain consistent, though - namely that it only returns true for any operations started before the (eventual) issue(). > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Benedict > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873026#comment-13873026 ] Jonathan Ellis commented on CASSANDRA-5549: --- Do we rely on accept returning true before issue, or can we make that IllegalState? > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Benedict > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13871924#comment-13871924 ] Benedict commented on CASSANDRA-5549: - For accepts() to work correctly the Ordered on which it is called MUST be exposed atomically wrt changing the "current" Ordered/Group. This is at the very least difficult with the scheme you suggest, and I think probably impossible. At least, not without synchronising on the OpOrdered, and doing everything that Barrier does, in the caller. Note I have scrapped a lengthy message about how *expiring* atomically would also be ugly (and still would be necessary to some extent, making the code even uglier in the caller than it is in Barrier), as I hadn't yet realised that exposing atomically would be nearly impossible, which I hope underlines how difficult this is, and that encapsulating as much of it as possible in Barrier is a Good Thing. So that we only have to get it right the once. Stylistically I also think it is *much* clearer to separate the two concepts. > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Benedict > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13871641#comment-13871641 ] Jonathan Ellis commented on CASSANDRA-5549: --- Pushed more refactorage to my branch. I have a nagging feeling that OpOrdering could be done with two classes instead of three, merging Barrier and Ordered: {code} public void consume() { SharedState state = this.state; state.setReplacement(new State()) state.doSomethingToPrepareForBarrier(); state.opGroup = ordering.currentActiveGroup(); state.opGroup.expire() state.opGroup.await(); this.state = state.getReplacement(); state.doSomethingWithExclusiveAccess(); } public void produce() { Group opGroup = ordering.start(); try { state.doProduceWork(); } finally { opGroup.finishOne(); } } {code} (We could still provide an accepts() method for the benefit of getMemtableFor, but I don't see that requiring a 3rd class either.) > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Benedict > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867933#comment-13867933 ] Benedict commented on CASSANDRA-5549: - I've merged your changes and removed all of the mentioned unnecessary changes, as well as improved some javadoc, and fixed the bug in PostFlush. Same repository as before. > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Benedict > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867399#comment-13867399 ] Benedict commented on CASSANDRA-5549: - bq. I get that, but it's good hygiene to keep things self contained (and I don't have to review it until then . Sure, and I shouldn't get into bad habits anyway. I was already feeling a guilty about this. I should have split the branch a long time ago and never got into this problem :-) bq. But if it's not critical then I'm definitely partial to this form as being easier to understand. It's definitely not, at least not yet. We allocate so much on the write path this will disappear in the noise. But it would be nice to be slowly moving towards a day when that isn't the case, and baby steps in the right direction help. bq. How good are you at reading JIT assembly? Not good enough, but this is probably a good opportunity to fix that a little. As much as it isn't super important right now, it's a useful thing in and of itself. > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Benedict > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867396#comment-13867396 ] Jonathan Ellis commented on CASSANDRA-5549: --- bq. I should mention that I hope to introduce a patch next week that relies on most of these things we'll be stripping out I get that, but it's good hygiene to keep things self contained (and I don't have to review it until then :). bq. It's also not trivial to design a test that tells you for sure that it does/doesn't How good are you at reading JIT assembly? :) But if it's not critical then I'm definitely partial to this form as being easier to understand. > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Benedict > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867373#comment-13867373 ] Benedict commented on CASSANDRA-5549: - As to your change at 6e029a3ee4943a30af078a5c2b6d483c348ffc1f, I'm not totally convinced it will remain stack allocated, due to the allocation being split into two locations, that would need to be coalesced into one by the compiler. It's possible it does this, but it's not obvious that it would. It's also not trivial to design a test that tells you for sure that it does/doesn't. I doubt it will make a huge difference to performance, especially at the moment, though. > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Benedict > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867345#comment-13867345 ] Benedict commented on CASSANDRA-5549: - bq. Looks like the memtables.isEmpty path in Flush.run will result in PostFlush running with a null lastReplayPosition which will cause NPE. Damn. Should have been more careful with my ninja bugfix. bq. PoolAllocator.stateUpdater is unused. Should transition be CAS-ing? Used by OffHeapAllocator. Can remove it until then. bq. PoolAllocator.Gc is also unused. Let's take this out and add back in when needed. Makes sense bq. PoolAllocator.setDiscarding and writeBarrier.markBlocking are both commented as allowing allocations to exceed the memory limit. Only isBlocking actually appears to affect this. You're absolutely correct: PoolAllocator.setDiscarding() no longer does this. I narrowed the scope under which overshoot was permitted by introducing markBlocking, but apparently didn't keep the javadocs up to date. bq. There are several unused methods in OpOrdering. (If we're planning to use this in a future patch, let's also leave these out for now.) I am a little less comfortable about this, as the most important unused methods are safe*, which are somewhat interwoven with functionality. Although we can remove them relatively cleanly if you prefer. bq. Signal javadoc ends mid-sentence. Would be nice to have javadoc on all the methods too. Whoops. bq. Why wrap Pool constructor parameters in Setup object? Simpler to just inline the helper methods into the constructor. Do you mean to have protected methods we can call from within the constructor and overridden by the extending class? This is another case of not needed just yet, but if you want to provide a custom PoolCleaner or MemoryTracker, you need to construct them in the Pool constructor so that the object is fully constructed and can be passed to the PoolCleaner and MemoryTracker constructors. Can always introduce this in my next patch though. bq. Should we bother allowing Pool cleaner to be null? Good point. We could not. At the moment the only pools I intend to create without a cleaner in the near future, I probably intend to create "unbounded", so it actually should be fine (the cleaner will never be called). We can always introduce the checks later if we need them. I should mention that I hope to introduce a patch next week that relies on most of these things we'll be stripping out. But it shouldn't make a huge difference either way. > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Benedict > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867306#comment-13867306 ] Jonathan Ellis commented on CASSANDRA-5549: --- Comments so far: Looks like the memtables.isEmpty path in Flush.run will result in PostFlush running with a null lastReplayPosition which will cause NPE. PoolAllocator.stateUpdater is unused. Should transition be CAS-ing? PoolAllocator.Gc is also unused. Let's take this out and add back in when needed. PoolAllocator.setDiscarding and writeBarrier.markBlocking are both commented as allowing allocations to exceed the memory limit. Only isBlocking actually appears to affect this. There are several unused methods in OpOrdering. (If we're planning to use this in a future patch, let's also leave these out for now.) Signal javadoc ends mid-sentence. Would be nice to have javadoc on all the methods too. Why wrap Pool constructor parameters in Setup object? Simpler to just inline the helper methods into the constructor. Should we bother allowing Pool cleaner to be null? OCD at https://github.com/jbellis/cassandra/commits/5549-2. 6e029a3ee4943a30af078a5c2b6d483c348ffc1f is the only non-cosmetic one. > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Benedict > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864173#comment-13864173 ] Benedict commented on CASSANDRA-5549: - I just realised there was a fairly glaring bug with Flush/PostFlush. I was 'optimising' the path to discard CL segments when there were no 2is, but this was dangerous because of potential reordering of flushes (so a later flush could mark CLS unused when an earlier flush was still in progress). I've fixed and uploaded. > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Benedict > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13863148#comment-13863148 ] Benedict commented on CASSANDRA-5549: - Largely the same as before, the mutation thread blocks until enough memory becomes available to complete the request. The difference is, depending on where/why the breach occurs, it may not block until after it completes its modification (as some of the memory bookkeeping is batched to the end for ease and speed). Any call to MemoryOwner.allocate() is potentially a blocking call, if there isn't enough room available to satisfy the allocation. > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Benedict > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13863143#comment-13863143 ] Jonathan Ellis commented on CASSANDRA-5549: --- What happens now when I hit my memtable memory ceiling, before flush makes more room? > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Benedict > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861886#comment-13861886 ] Benedict commented on CASSANDRA-5549: - I have a patch available for this [here|https://github.com/belliottsmith/cassandra/tree/only-5549] I've been a little reticent to post it, as it's a bit of a monster of a patch, but I think I've now done my best to keep it well commented and mostly limit unnecessary changes. There are some changes that may appear over engineered for their current use, but I am using these in a continuation of this patch for off-heap memtables. I'll describe some of these below, but unpicking still useful changes seemed wasteful. If they get in the way of review we can revisit that decision. There are several main areas of updates: 1) Removal of switchLock itself: The main work here is actually in the OpOrdering synchronisation class. This class explains itself, so I won't go into detail here, but provides an easy mechanism for ensuring we can coordinate our updates to Memtables so that we know what CL position they contain data to, and to know when the memtable is safe to be written to disk. The actual flushing of the memtable has been refactored a little also, to keep ordering guarantees. 2) Allocators and Memory Management: by removing the switch lock, we get rid of our ability to control heap growth by row mutations. To fix this, I've created the concept of a PoolAllocator, with associated Pool that has fixed memory limits. Any allocation requires the pool to allot room from its limit to the allocator (this is dealt with by MemoryTracker and MemoryOwner). This required a lot of minor modifications all over the place, to make measurement of object sizes at modification time cheap and accurate. Mostly I've achieved this by modifying jamm - a new branch is [here|https://github.com/belliottsmith/jamm/tree/guess] so that it will always give us a useful answer. Wherever we used to be using ObjectSizes adhoc in a class (generally incorrectly it turns out, not unsurprisingly as the API isn't obvious) I now *always* call measure() on an instance of the object and store that in a static field, and use simpler methods for any dynamic space use. Worth noting: I've renamed IMeasureableMemory.memorySize() to excessHeapSize(), and I've modified (where applicable) its value to only count data we wouldn't otherwise be storing. This only makes a difference in a few places, but I think is an important distinction. This change also makes any limit on flush queue size irrelevant, so the metric we use for controlling flushing is instead a ratio of in-use-memory to memory-limit, ignoring any already flushing data, which once breached will trigger a flush of the largest CFS. 3) Some concurrency primitives: NonBlockingQueue (and related classes) and WaitQueue. NonBlockingQueue is used more extensively in the off heap changes, but I leave it in here because it improves WaitQueue a lot, and we rely on WaitQueue much more with the proliferation of the OpOrdering operations. It helps us move much closer to completely non-blocking read/write operations also. We also use it to get rid of the Thread.yield() in SlabAllocator. I've aimed to keep NBQ as simple as possible. 4) CommitLog has been updated to use OpOrdering, and also includes a bug fix. I considered splitting this into a separate ticket, but it's such a tiny proportion of the overall changes I'm not sure it warrants it. The bug fix we may want to split out if this takes a while to go through. > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Benedict > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838802#comment-13838802 ] Benedict commented on CASSANDRA-5549: - For a given run of the JVM the overhead is constant for each type of object allocated, and the objects allocated can be predicted accurately given the number of columns we are storing. I've done object size measurement before :-) I don't see anything in CASSANDRA-4860 that is surprising, but perhaps I've missed something specific you're worrying about? In general it's dealing with "miscalculating" the portion of a ByteBuffer we're referencing. This is a concern for live bytes, not retained bytes, and my scheme outlined above was for retained bytes which is what we care about for memory constraints, but I will also be replacing the live bytes calculation, since it will be easy to do at the same time. But the same approach works, care is just needed. > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Benedict > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838796#comment-13838796 ] Vijay commented on CASSANDRA-5549: -- Well it is not exactly a constant overhead you might want to look at o.a.c.u.ObjectSizes (CASSANDRA-4860)... > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Benedict > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838782#comment-13838782 ] Benedict commented on CASSANDRA-5549: - bq. how does the switch RW Lock to a kind of CAS operation change this schematics? In forceFlush() we obtain the writeLock, but do not relinquish it until we have successfully added to the flushWriters queue. The flushWriters queue length also configures how often we should flush, so that once it is full, we are effectively "out of memory". This is hardly a *precise* mechanism for memory control, but it is the one we currently use, and it definitely needs a replacement. bq. IMHO, that might not be good enough since Java's memory over head is not considered. And calculating the object size is not cheap either I don't see your concerns here? We can easily and cheaply calculate the costs - we precompute the overheads, and simply apply them on a per row and per key basis. The overheads are pretty fixed for both - for SnapTreeMap they're exactly the same for each key, and with CASSANDRA-6271 they are relatively easily computable (or simply countable, in log(32,N) time - probably I will opt for computing first, then counting after insertion to get exact amount used). Java's memory overhead is included in any calculation. > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Benedict > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838627#comment-13838627 ] Vijay commented on CASSANDRA-5549: -- {quote} Without switch lock, we won't have anything preventing writes coming through when we're over-burdened with memory use by memtables. {quote} I should be missing something, how does the switch RW Lock to a kind of CAS operation change this schematics? Are we talking about additional requirement/enhancements to this ticket? {quote} When we flush a memtable we release permits equal to the estimated size of each RM {quote} IMHO, that might not be good enough since Java's memory over head is not considered. And calculating the object size is not cheap either > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Vijay > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836818#comment-13836818 ] Benedict commented on CASSANDRA-5549: - Without switch lock, we won't have anything preventing writes coming through when we're over-burdened with memory use by memtables. What I'd like to suggest is effectively a global Semaphore, with permits equal to the size allocated for memtables; on KS.apply(RM) we estimate the size of the RM and take that many permits. Once we've added the RM and know better how much it occupies, we adjust the Semaphore to (more) accurately reflect the amount of memory in use. When we flush a memtable we release permits equal to the *estimated size* of each RM. This may be pushing the boat out, but would probably result in not relying on memtable live metering/scanning for size estimation, which we could retire. Either way we're estimating the size, but with this approach we're keeping *tight* control over the (estimated) memory allocated to memtables, whereas at the moment we have some tricks that we hope keep it there. If we estimate space used cautiously, we should be able to better guarantee no OOM, at least from this part of the code. I have a *reasonably* straight forward scheme for estimating size used by a RM that should be as good as we currently have. Basic premise is to calculate average space used by an item in ConcurrentSkipListMap using metering at startup with a map of size, say, 1M entries, rounded up. If we depend on CASSANDRA-6271 we can easily calculate exact overhead for the BTrees, or otherwise can do a similar metering approach for SnapTreeMap. So we have an overhead per row and per value. Separately we track how much space we are using for a given memtable's slab allocator. We use the RM's data size only for the initial estimation, to decide if we have room, and ignore it once it's actually added, as it will be accounted for in the slaballocator. > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Vijay > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13835123#comment-13835123 ] Benedict commented on CASSANDRA-5549: - I suggest the following (somewhat complex seeming approach), building on my patch for 3578: We extract the CLS.Allocation object into a CommitState object that is allocated in CFS.apply() prior to performing CL.add(). The CLS.AppendLock object is rewritten to be a more custom job which we will for now call MutationBarrier, and extracted along with CommitState. MutationBarrier will be a synchronisation primitive that permits issuing periodic barriers that ensure operations started prior to the barrier have all completed, and also can be used to create a token on an about-to-be-issued barrier that ensures operations started after the barrier (when it *is* issued) know not to interfere with any state of an object that is using the barrier. This was probably unclear, but the steps of CFS.apply() using it may clarify it: # Allocate CommitState, register operation with MutationBarrier # CL.add() - on exit, has updated CommitState with position and segment of replay position.* # Checks Memtable's BarrierToken to see if we are permitted to modify: #* if it's absent we simply make our modification and scoot; #* if it's present and permits us to make our modification, we do so but ALSO update a ReplayPosition property (with cas, ensuring it is >= the one we have in CommitState) #* if it's present and does not permit us to modify, we look up its replacement Memtable and repeat # Release our hold on the MutationBarrier, signalling any waiters When we flush a memtable, we: # Set the ReplayPosition # Create the replacement Memtable and chain it from ourselves # Set the BarrierToken # Wait on the Barrier # Flush to disk, using the ReplayPosition we have maintained Note that we will no longer perform any reference counting on the Memtables. I will ensure that the mutation calls are all non-blocking, but may for correctness and simplicity make those attempting to flush/issue a new barrier take out a (possibly spin-) lock on the MB in order to issue a Token atomically. Thoughts? \*CL.sync() will use the MB to fulfil the AppendLock role also. CL.add() will release a hold on the MB that is related to CL only, to permit CL to proceed immediately. > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Vijay > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13781214#comment-13781214 ] Vijay commented on CASSANDRA-5549: -- Hi Ryan, Can you give a shot at https://github.com/Vijay2win/cassandra/commits/5549-v2 on 10M keys atleast. Rebased [~jbellis] branch I moved the CommitLogAllocator forceFlush back to separate thread, removed isDirty boolean since isClean is called in a separate thread and hence shouldn't help performance on writes, rest is all Jonathan... My benchmark on 32 physical core machine shows a better performance than earlier. ~72 vs ~84 http://pastebin.com/GRPMUcSB > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Vijay > Labels: performance > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750764#comment-13750764 ] Ryan McGuire commented on CASSANDRA-5549: - Looks pretty similar on the eight core machines: !5549-sunnyvale.png! [Data for this benchmark here|http://ryanmcguire.info/ds/graph/graph.html?stats=stats.remove_switchlock.sunnyvale.json&metric=interval_op_rate&operation=stress-write&smoothing=1] > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750213#comment-13750213 ] Ryan McGuire commented on CASSANDRA-5549: - Yes, this is on the quad cores. I'll try the other cluster. > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750203#comment-13750203 ] Jonathan Ellis commented on CASSANDRA-5549: --- Was this on the 4 core machines? Can you test on 8? > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750184#comment-13750184 ] Ryan McGuire commented on CASSANDRA-5549: - [~jbellis] I ran a benchmark against your branch comparing it to the prior commit. I didn't see any difference in write performance: !5549-removed-switchlock.png! [Data for this benchmark here|http://ryanmcguire.info/ds/graph/graph.html?stats=stats.removed_switchlock.json&metric=interval_op_rate&operation=stress-write&smoothing=1] > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis > Fix For: 2.1 > > Attachments: 5549-removed-switchlock.png > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13651624#comment-13651624 ] Jonathan Ellis commented on CASSANDRA-5549: --- Rebased my switchLock-removal patch from CASSANDRA-5064. Sylvain's comments from then: # I see nothing that prevents flushing the same memtable multiple times. # getting the commit log context and switching the memtable is not done atomically with respect to writes. So a write can be pushed in the commit log after the context we're getting but still reach the memtable we're about to flush. For normal update, this is mostly inefficient in that we'll kept commit logs around long than necessary and potentially replay some update unnecessarily, but for counter this is a bug. # it's also possible that for postFlush tasks to not be scheduled in the order the commit log context were acquired. So we could discard a commitlog for which the data is not yet fully flushed. I haven't yet addressed these concerns; it's just a straight-up rebase so far. However, unit tests pass, so it's probably Good Enough to see what kind of performance impact this would have. /cc [~danielnorberg] > Remove Table.switchLock > --- > > Key: CASSANDRA-5549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis > Fix For: 2.0 > > > As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write > path. ReentrantReadWriteLock is not lightweight, even if there is no > contention per se between readers and writers of the lock (in Cassandra, > memtable updates and switches). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira