[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13986642#comment-13986642 ]

Michael Shuler commented on CASSANDRA-6694:

Comparing the default 2.1 dtest runs with offheap dtest runs shows the same results:
http://cassci.datastax.com/job/cassandra-2.1_dtest/132/
http://cassci.datastax.com/job/cassandra-2.1_offheap_dtest/4/

The 2.1 unit tests got a default config with this commit for memtable_allocation_type: offheap_objects and they appear pretty stable. (some new unit test errors have come up since CASSANDRA-6855, but no big regression appears to have come from this commit)

Slightly More Off-Heap Memtables

Key: CASSANDRA-6694
URL: https://issues.apache.org/jira/browse/CASSANDRA-6694
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Benedict
Assignee: Benedict
Labels: performance
Fix For: 2.1 beta2
Attachments: 6694.fix1.txt

The Off Heap memtables introduced in CASSANDRA-6689 don't go far enough, as the on-heap overhead is still very large. It should not be tremendously difficult to extend these changes so that we allocate entire Cells off-heap, instead of multiple BBs per Cell (with all their associated overhead).

The goal (if possible) is to reach an overhead of 16-bytes per Cell (plus 4-6 bytes per cell on average for the btree overhead, for a total overhead of around 20-22 bytes). This translates to 8-byte object overhead, 4-byte address (we will do alignment tricks like the VM to allow us to address a reasonably large memory space, although this trick is unlikely to last us forever, at which point we will have to bite the bullet and accept a 24-byte per cell overhead), and 4-byte object reference for maintaining our internal list of allocations, which is unfortunately necessary since we cannot safely (and cheaply) walk the object graph we allocate otherwise, which is necessary for (allocation-) compaction and pointer rewriting.

The ugliest thing here is going to be implementing the various CellName instances so that they may be backed by native memory OR heap memory.

-- This message was sent by Atlassian JIRA (v6.2#6252)
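
As a rough illustration of the arithmetic in the issue description, the sketch below adds up the quoted per-Cell overhead and shows one way an "alignment trick like the VM" can work: an aligned offset is stored in 32 bits by shifting out the low bits that are always zero. The class name, the 8-byte alignment, and every helper method here are assumptions made for illustration; none of this is taken from the patch.

```java
final class CellOverheadSketch {
    // Figures quoted in the issue description.
    static final int OBJECT_HEADER_BYTES = 8;
    static final int COMPRESSED_ADDRESS_BYTES = 4;
    static final int ALLOCATION_LIST_REF_BYTES = 4;

    // Assumption: allocations are 8-byte aligned, so the low 3 bits of an offset are always zero.
    static final int ALIGNMENT_SHIFT = 3;

    /** Sum of the three per-Cell components named in the description. */
    static int perCellOverhead() {
        return OBJECT_HEADER_BYTES + COMPRESSED_ADDRESS_BYTES + ALLOCATION_LIST_REF_BYTES;
    }

    /** Packs an aligned, non-negative offset into 32 bits by dropping the alignment bits. */
    static int compress(long offset) {
        if (offset < 0 || (offset & ((1L << ALIGNMENT_SHIFT) - 1)) != 0)
            throw new IllegalArgumentException("offset must be non-negative and aligned");
        long shifted = offset >>> ALIGNMENT_SHIFT;
        if (shifted > 0xFFFFFFFFL)
            throw new IllegalArgumentException("offset outside the addressable range");
        return (int) shifted;
    }

    /** Restores the original offset; the int is treated as unsigned. */
    static long decompress(int compressed) {
        return (compressed & 0xFFFFFFFFL) << ALIGNMENT_SHIFT;
    }

    /** Number of bytes reachable through a 32-bit compressed offset. */
    static long addressableBytes() {
        return 1L << (32 + ALIGNMENT_SHIFT);
    }

    public static void main(String[] args) {
        long offset = 24L << 30; // an offset 24 GiB past the base
        System.out.println("per-cell overhead: " + perCellOverhead() + " bytes");
        System.out.println("addressable: " + (addressableBytes() >> 30) + " GiB");
        System.out.println("round trip ok: " + (decompress(compress(offset)) == offset));
    }
}
```

Under these assumptions a 4-byte address covers 32 GiB of native memory, which is why the description warns that the trick will not hold forever.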
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13984175#comment-13984175 ]

Benedict commented on CASSANDRA-6694:

I've pushed another slightly tweaked [branch|https://github.com/belliottsmith/cassandra/tree/6694-final2]. This is taken from Pavel's branch, except I fixed two of the equals() tidies (they were missing outer parentheses) and reintroduced the _use_ of the MemtableAllocator in CFS.FlushLargestMemtable and CFS.logFlush for indexes, but only if there is an underlying CFS to the index (so no API change).

I've also rebased to latest 2.1.

[~slebresne] if you're happy with AbstractNativeCell can you commit when you get a chance? Thanks
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13984208#comment-13984208 ]

Benedict commented on CASSANDRA-6694:

Pushed one last change, based on comments offline by Sylvain, to construct the full text representation of the Identifier when we can't find it in CFMetaData.
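
The fallback described in this comment might look roughly like the following sketch: look the raw column name up in the table metadata, and if it is missing (never predefined, or dropped in the meantime), rebuild the identifier's text from the bytes instead of failing. The map is a hypothetical stand-in for CFMetaData, the method names are invented for illustration, and the sketch assumes names are UTF-8 text, which the real comparator-driven code need not do.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

final class IdentifierLookupSketch {
    // Hypothetical stand-in for the column definitions held by CFMetaData.
    private final Map<ByteBuffer, String> definedColumns = new HashMap<>();

    void define(String name) {
        definedColumns.put(bytes(name), name);
    }

    void drop(String name) {
        definedColumns.remove(bytes(name));
    }

    /** Returns the predefined identifier if present; otherwise rebuilds its text from the raw bytes. */
    String identifierFor(ByteBuffer rawName) {
        String known = definedColumns.get(rawName);
        if (known != null)
            return known;
        // Fallback path: the column was never predefined or has since been dropped.
        // duplicate() keeps the caller's buffer position untouched while decoding.
        return StandardCharsets.UTF_8.decode(rawName.duplicate()).toString();
    }

    static ByteBuffer bytes(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        IdentifierLookupSketch metadata = new IdentifierLookupSketch();
        metadata.define("age");
        System.out.println(metadata.identifierFor(bytes("age")));
        metadata.drop("age"); // e.g. a concurrent ALTER TABLE ... DROP
        System.out.println(metadata.identifierFor(bytes("age")));
    }
}
```

The point of the design is that a lookup miss is treated as a normal case rather than an assertion failure, since the metadata can change underneath a live memtable.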
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13983417#comment-13983417 ]

Benedict commented on CASSANDRA-6694:

I've pushed a completed branch [here|https://github.com/belliottsmith/cassandra/tree/6694-reorg2]

I've taken to completion your flattening of the PoolAllocator and DataAllocator hierarchies, implemented DecoratedKey, reintroduced the extra unit tests, fixed some bugs with the Cell hierarchy, slightly rejigged the data layout for native cell to simplify offset calculation and fixed a performance regression and the message digest optimisation. The only thing I haven't done is the refactors I would like to perform before we finally commit this, so as to make review easier for others.

Note I'm still running dtests and doing some final vetting, but I wanted to post this message now as I reckon this version is most likely ready and this is somewhat time critical, and because I want to avoid any duplicated effort in getting a final patch together.

I think I've addressed your concerns [~iamaleksey], however with the following notes:

bq. getAllocator() doesn’t belong to SecondaryIndex, API-wise. CFS#logFlush() and CFS.FLCF#run() should just use SecondaryIndexManager#getIndexesNotBackedByCfs() and get their allocators directly instead of using SIM#getIndexes() and checking for null.

This was a conscious decision to permit custom 2i use our allocators and count towards book keeping for memory utilisation.

bq. Composite/CellName/CellNameType/etc#copy() all now have an extra CFMetaData argument, while only NativeCell really uses it. Can we isolate its usage to a NativeCell-specific methods and leave the rest alone?

Not sure how we do that when either can be present when you want to perform these calls. Possible I'm missing something obvious though, so please do let me know :)
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13983425#comment-13983425 ]

Benedict commented on CASSANDRA-6694:

Oh, also, [~iamaleksey]: Your assertion about super columns and sparse composites appears to be broken by the CliTest somehow. I haven't investigated, but this is why I introduced that branch. I've stripped out the condition and just always take that branch if we fail to lookup in cfMetaData, to deal with names being dropped whilst we aren't expecting, so it no longer assumes this but also copes with it.
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13983666#comment-13983666 ]

Aleksey Yeschenko commented on CASSANDRA-6694:

bq. This was a conscious decision to permit custom 2i use our allocators and count towards book keeping for memory utilisation.

I feel like you are lacking context here, wrt what custom 2i actually are and what implementations we have. The ones that exist don't need it, so IMO this was a wrong decision, even if conscious. Have a look at DSE's Solr implementation if curious.
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13983670#comment-13983670 ]

Benedict commented on CASSANDRA-6694:

I don't mind dropping it, but it seems a harmless addition for users who implement their own buffered writes for secondary indexes, so that they can consider the amount of data they are using for their own state when deciding which CFs to flush. The fact that DSE doesn't do this doesn't mean it isn't useful.
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13983675#comment-13983675 ]

Aleksey Yeschenko commented on CASSANDRA-6694:

Sure, but I prefer to add stuff when it's clear that there is someone who actually needs it, and not some hypothetical user that doesn't exist.
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13983708#comment-13983708 ]

Benedict commented on CASSANDRA-6694:

To make shipping easier, I've pushed a rebased and squashed branch [here|https://github.com/belliottsmith/cassandra/tree/6694-reorg2-rebase]
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13983709#comment-13983709 ]

Pavel Yaskevich commented on CASSANDRA-6694:

Here is a [new branch|https://github.com/xedin/cassandra/compare/6694-final] with all of the changes from 6694-reorg2 squashed, plus a couple of commits to clean up and remove the secondary index getAllocator, which is unnecessary right now. I was about to push similar refactoring for memtable pools to my branch, which made review much faster :)

I'm +1 on the combination of the squashed changes and the cleanup which is in my branch. I'm still not sure about the CellName implementation in AbstractNativeCell tho; that is not my realm, so it would be nice if Sylvain (or somebody as close to that code, if anybody) could take a look...
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13980847#comment-13980847 ]

Benedict commented on CASSANDRA-6694:

On the whole it looks good, but I have the following comments/concerns:

# DecoratedKey still isn't implemented (should be a relatively minor addition)
# The performance regression for MessageDigest updating is still there
# AbstractCell.localCopy(..MemtableAllocator) needs to be overridden; as it is you'll always get a regular Cell back
# You're still using static method implementations, it looks like? Cell.diff and Cell.reconcile
# I'm not a fan of mixing the util.memory hierarchy with knowledge of the memtable hierarchy. If we plan on this, I'd much prefer to move the whole lot into e.g. db.memtable; this might make most sense anyway
# I'd like to move the Cell implementations out of db into something (e.g. .memtable) as it's very crowded in there, and they're a dozen or so related classes that are easily extracted
# Given how many different kinds of allocator we now have (including IAllocator), I'd really like to rename AbstractAllocator to something more descriptive like ByteBufferAllocator

Still need to verify all of the changes within the Cells, as comparison is currently tricky due to different hierarchy confusing git, but on the whole this branch is good if we address these concerns.
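
The third concern in the list above (localCopy always returning a regular Cell) can be illustrated with a small sketch. The classes below are simplified stand-ins, not the real Cassandra types, and the allocator argument of the real method is left out to keep the example short; the only point is that each subclass has to override the copy method or its extra state is lost.

```java
final class LocalCopySketch {
    /** Simplified stand-in for a regular cell. */
    static class Cell {
        final String name;
        final long timestamp;

        Cell(String name, long timestamp) {
            this.name = name;
            this.timestamp = timestamp;
        }

        /** Base implementation: always produces a plain Cell. */
        Cell localCopy() {
            return new Cell(name, timestamp);
        }
    }

    /** Simplified stand-in for a cell with a TTL. */
    static class ExpiringCell extends Cell {
        final int ttlSeconds;

        ExpiringCell(String name, long timestamp, int ttlSeconds) {
            super(name, timestamp);
            this.ttlSeconds = ttlSeconds;
        }

        // Without this override the inherited localCopy() would return a plain Cell
        // and the TTL would silently disappear from the copy.
        @Override
        ExpiringCell localCopy() {
            return new ExpiringCell(name, timestamp, ttlSeconds);
        }
    }

    public static void main(String[] args) {
        Cell original = new ExpiringCell("c", 1L, 60);
        Cell copy = original.localCopy();
        System.out.println(copy instanceof ExpiringCell);
    }
}
```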
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13981434#comment-13981434 ]

Pavel Yaskevich commented on CASSANDRA-6694:

1-2, 5-7 I will address once the main functionality is settled.

bq. AbstractCell.localCopy(..MemtableAllocator) needs to be overridden; as it is you'll always get a regular Cell back

Good catch, I forgot to change that before I pushed. Have amended it to the original allocator commit and force pushed to my branch, so it's available.
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13980585#comment-13980585 ]

Pavel Yaskevich commented on CASSANDRA-6694:

I have pushed allocation pools and minor refactoring to [my branch|https://github.com/xedin/cassandra/compare/CASSANDRA-6694], and also addressed some of the problems from [~iamaleksey]'s comment, except the concerns about the CellName implementation for AbstractNativeCell, which are mutual.
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1394#comment-1394 ]

Aleksey Yeschenko commented on CASSANDRA-6694:

h3. Benedict’s original branch

ABTC.ColumnUpdater#apply() calls update.reconcile(existing) and skips localCopy() if reconciled == existing. This means that we should optimise all reconcile() implementations to prioritise the argument cell in case of ties for optimal savings (ties happen often enough: from retries and whatnot, plus potentially counter updates if we decide to do that thing when batch commit log is enabled). Currently we do the opposite. It would be easiest to simply swap the call to existing.reconcile(update).

getAllocator() doesn’t belong to SecondaryIndex, API-wise. CFS#logFlush() and CFS.FLCF#run() should just use SecondaryIndexManager#getIndexesNotBackedByCfs() and get their allocators directly instead of using SIM#getIndexes() and checking for null.

Composite/CellName/CellNameType/etc#copy() all now have an extra CFMetaData argument, while only NativeCell really uses it. Can we isolate its usage to NativeCell-specific methods and leave the rest alone?

At least NativeCell#cql3ColumnName() can throw NPE when calling metadata.getColumnDefinition(buffer).name. Just because it’s SIMPLE_SPARSE doesn’t mean all the column names are predefined - it’s legal to insert non-predefined cells w/ default_validator validator via Thrift/CQL2.

NativeCell#copy(), COMPOUND_SPARSE branch - there is no way a compound sparse comparator and cfType = Super can coexist. Supers are all compound dense.

Generally, NativeCell methods seem to assume a bit too much about the sizes and about what can and what can’t be present/absent. You can’t even guarantee presence of a ColumnIdentifier for COMPOUND_SPARSE, and yet NativeCell#copy() would throw an AssertionError if that’s the case. And CFMetaData is mutable, too, and it is possible to remove a column via ALTER TABLE at any time. I’m not comfortable +1-ing it until Sylvain has a look at at least these bits (just the NativeCell methods).

The Allocator hierarchy is confusing, as are the names there - I won’t claim to have understood it entirely. The ‘Data’ prefix in DataAllocator is absolutely meaningless in the context. Maybe MemtableAllocator would be more meaningful? I don’t have suggestions for the rest of the names or for making that hierarchy more straightforward, but I can live with it as it is.

I very much dislike the Impl thing though. This is an uncomfortable step back in Cell* hierarchy readability. Basic things like IDEA’s Find Usages on Cell.Impl#localCopy() not showing the Counter/Expiring/Deleted counterparts’ usages are annoying. This is my largest, and really the only fundamental, issue with the branch.

Other than that, and too many assumptions in certain NativeCell methods, I’m okay with the branch. Overall it looks reasonable, and is actually less invasive than I was afraid it would be.

Nits: AbstractMemory formatting is all messed up.

h3. Pavel’s refactoring branch

Doesn’t build (although trivial-ish to make it build) and is incomplete (as expected), and that does complicate judging the ugliness of the result.

Same issues and potential issues in AbstractNativeCell’s methods as in NativeCell methods in the other branch.

Can’t form an opinion on Pavel’s Allocator/Pool approach, because it’s not here yet, and I’m not sure I got it right from just reading the comments.

This *Cell hierarchy, though, I feel a lot more comfortable with. I feel strongly that we should borrow the Impl-less Cell hierarchy from this branch, if nothing else (and there isn’t much else yet) - this is my biggest issue with the original.

As for the rest of it - time is running low, and we have to ship 2.1 eventually. Any chance you could flesh it out in the next few days, maybe until Monday, Pavel? If not, I’m not sure if we should block beta2 further :\
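The tie-breaking point raised at the top of the review above can be illustrated with a minimal sketch. SketchCell, ReconcileSketch and their members are hypothetical stand-ins invented for illustration, not Cassandra's Cell API; only the idea (return the already-stored cell on a tie so that the identity check lets the caller skip the copy) comes from the comment.

```java
// Hypothetical stand-in for a cell; this is not Cassandra's Cell API.
final class SketchCell {
    final long timestamp;
    final String value;

    SketchCell(long timestamp, String value) {
        this.timestamp = timestamp;
        this.value = value;
    }

    // Picks the winner of a merge. On a complete tie the argument wins,
    // so update.reconcile(existing) hands back 'existing' itself.
    SketchCell reconcile(SketchCell other) {
        if (timestamp != other.timestamp)
            return timestamp > other.timestamp ? this : other;
        int cmp = value.compareTo(other.value);
        if (cmp != 0)
            return cmp > 0 ? this : other;
        return other;
    }
}

final class ReconcileSketch {
    static int copies = 0; // counts how often a cell had to be copied

    // Stand-in for copying a cell into memtable-owned memory.
    static SketchCell localCopy(SketchCell cell) {
        copies++;
        return new SketchCell(cell.timestamp, cell.value);
    }

    // Mirrors the described flow: reconcile, then copy only if the update won.
    static SketchCell apply(SketchCell existing, SketchCell update) {
        SketchCell reconciled = update.reconcile(existing);
        return reconciled == existing ? existing : localCopy(reconciled);
    }

    public static void main(String[] args) {
        SketchCell existing = new SketchCell(10L, "a");
        SketchCell retry = new SketchCell(10L, "a"); // identical write, e.g. a client retry
        SketchCell newer = new SketchCell(11L, "b");
        System.out.println(apply(existing, retry) == existing);
        System.out.println(apply(existing, newer).value);
        System.out.println("copies made: " + copies);
    }
}
```

The alternative suggested in the comment, calling existing.reconcile(update) with implementations that prefer the receiver on ties, would presumably give the same saving without touching every reconcile() implementation.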
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13975540#comment-13975540 ] Benedict commented on CASSANDRA-6694: - [~iamaleksey] have you had a chance to take a look and form an opinion? I'm happy to proceed with either approach, but we want to get a move on with one or the other if we intend to include this in 2.1. Slightly More Off-Heap Memtables Key: CASSANDRA-6694 URL: https://issues.apache.org/jira/browse/CASSANDRA-6694 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Assignee: Benedict Labels: performance Fix For: 2.1 beta2 The Off Heap memtables introduced in CASSANDRA-6689 don't go far enough, as the on-heap overhead is still very large. It should not be tremendously difficult to extend these changes so that we allocate entire Cells off-heap, instead of multiple BBs per Cell (with all their associated overhead). The goal (if possible) is to reach an overhead of 16-bytes per Cell (plus 4-6 bytes per cell on average for the btree overhead, for a total overhead of around 20-22 bytes). This translates to 8-byte object overhead, 4-byte address (we will do alignment tricks like the VM to allow us to address a reasonably large memory space, although this trick is unlikely to last us forever, at which point we will have to bite the bullet and accept a 24-byte per cell overhead), and 4-byte object reference for maintaining our internal list of allocations, which is unfortunately necessary since we cannot safely (and cheaply) walk the object graph we allocate otherwise, which is necessary for (allocation-) compaction and pointer rewriting. The ugliest thing here is going to be implementing the various CellName instances so that they may be backed by native memory OR heap memory. -- This message was sent by Atlassian JIRA (v6.2#6252)
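The "alignment tricks" behind the 4-byte address in the description can be sketched roughly as follows. This assumes 8-byte-aligned allocations addressed relative to a base, in the style of the JVM's compressed object pointers; the class, constants and method names are hypothetical, and the ticket does not say which alignment the patch actually uses. The constants also spell out the 8 + 4 + 4 = 16 byte target.

```java
// A sketch under stated assumptions; none of these names come from the patch.
final class CompressedAddressSketch {
    static final int ALIGN_SHIFT = 3;           // assumes 8-byte alignment
    static final int OBJECT_HEADER_BYTES = 8;   // figures taken from the ticket description
    static final int ADDRESS_BYTES = 4;
    static final int ALLOCATION_REF_BYTES = 4;

    // Per-cell on-heap overhead targeted by the ticket, excluding the btree share.
    static int perCellOverhead() {
        return OBJECT_HEADER_BYTES + ADDRESS_BYTES + ALLOCATION_REF_BYTES;
    }

    // How many bytes a 32-bit compressed address can reach past the base.
    static long addressableBytes() {
        return (1L << 32) << ALIGN_SHIFT;
    }

    // Stores an aligned 64-bit address as a 32-bit offset counted in aligned units.
    static int compress(long base, long address) {
        long offset = address - base;
        if (offset < 0 || (offset & ((1L << ALIGN_SHIFT) - 1)) != 0)
            throw new IllegalArgumentException("address must be aligned and not below base");
        long units = offset >>> ALIGN_SHIFT;
        if (units > 0xFFFFFFFFL)
            throw new IllegalArgumentException("address out of compressed range");
        return (int) units;
    }

    // Recovers the full address; the mask treats the int as unsigned.
    static long decompress(long base, int compressed) {
        return base + ((compressed & 0xFFFFFFFFL) << ALIGN_SHIFT);
    }

    public static void main(String[] args) {
        long base = 4096L;
        long address = base + 24L * 1024 * 1024 * 1024; // 24 GiB past the base
        int compressed = compress(base, address);
        System.out.println(decompress(base, compressed) == address);
        System.out.println(addressableBytes() + " bytes addressable");
        System.out.println(perCellOverhead() + " bytes per cell");
    }
}
```

Once the native pool outgrows that range, the address would need a full 8 bytes, which is where the 24-byte figure in the description comes from.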
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13975677#comment-13975677 ]

Aleksey Yeschenko commented on CASSANDRA-6694:

[~benedict] It's on the top of my TODO list - so I'm looking at your branch now. Haven't looked at Pavel's yet. Need a few days, unless something more urgent distracts me (which is very unlikely).
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13972524#comment-13972524 ]

Benedict commented on CASSANDRA-6694:

So, on the whole I really don't perceive this approach as better: there's a great deal of code duplication now (set to get worse still when you finish the refactor for DecoratedKey) between each of the correspondingly named cell implementations. Personally I think the Impl approach is neater as a result of avoiding that (this may be more pronounced if we decide to optimise equals() as you suggested). That said, if this moves us forwards I can live with it, if you can address point 1 below. There are a few problems though:

# I am *very* opposed to a public setPeer() method. This is a deal breaker for me - but it can be avoided with a bit more refactoring.
# Your optimised updateDigest function is actually much slower than the old implementation for all but the smallest values: an optimised version needs to batch the contents into an array (stored in a ThreadLocal) and call updateDigest with the array, unless the total size is very small (there's a crossover point on my laptop of about 12 bytes, under which it's faster to call update(byte)).
# AbstractNativeCell.getBytes actually calls setBytes
# excessHeapSize... should be unsharedHeapSize...
# There should be no hashCode method in Buffer\*Cell - I removed these for a reason. Because we can have a Cell that is a CellName, and vice-versa, using a Cell as a key for a map is likely dangerous. Since we don't do it anywhere, it's safe to simply remove the methods.

There may be other minor issues; I'll hold off giving it a formal review until we decide the direction we're going.

To respond to a few of your comments:

bq. CounterUpdateCell interface is missing as well as NativeCounterUpdateCell implementation to match it.

There shouldn't be one for the time being - we can never construct one.

bq. CounterUpdateCell should be BufferCounterUpdateCell as it extends BufferCell

Same reason - it doesn't exist as either or, so I made a conscious decision to leave it as a CounterUpdateCell: the fact that it extends BufferCell is kind of unimportant. Its purpose is somewhat different, and I think it is better left named CounterUpdateCell, as that is its purpose (to carry a counter update as far as the memtable, and no further).

bq. Impl classes extends another Impl classes which doesn't make much sense as all of the methods are static.

This brings in the namespace of the extended class' static methods, which is useful.

bq. When taken out of context like that it doesn't really make sense but what I meant, there are situation where we don't really need to get BB from the CellName but can transfer bytes directly (especially for the native cell implementations).

Sure, but again: scope of ticket, and care needs to be taken when doing this (e.g. your updateDigest modifications)
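Point 2 of the comment above (batching through a thread-local array, with a byte-at-a-time path below a small crossover size) might look roughly like this. It is a sketch only: a direct ByteBuffer stands in for native memory, the class and method names are hypothetical, the 4096-byte chunk size is an arbitrary assumption, and the 12-byte threshold is just the figure reported for one laptop.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// A sketch under stated assumptions; not the code from either branch.
final class DigestBatchSketch {
    static final int SMALL_THRESHOLD = 12; // crossover quoted in the comment; machine-dependent
    static final int CHUNK_SIZE = 4096;    // arbitrary scratch size chosen for this sketch

    // One reusable scratch array per thread, so batching allocates nothing per call.
    private static final ThreadLocal<byte[]> SCRATCH =
            ThreadLocal.withInitial(() -> new byte[CHUNK_SIZE]);

    static void updateDigest(MessageDigest digest, ByteBuffer source) {
        ByteBuffer view = source.duplicate(); // leave the caller's position untouched
        if (view.remaining() < SMALL_THRESHOLD) {
            // Tiny values: feeding single bytes avoids the copy into the scratch array.
            while (view.hasRemaining())
                digest.update(view.get());
            return;
        }
        byte[] scratch = SCRATCH.get();
        while (view.hasRemaining()) {
            int n = Math.min(scratch.length, view.remaining());
            view.get(scratch, 0, n);      // bulk copy out of the (off-heap) buffer
            digest.update(scratch, 0, n); // one digest call per chunk
        }
    }

    static byte[] md5Of(ByteBuffer source) {
        MessageDigest digest = newMd5();
        updateDigest(digest, source);
        return digest.digest();
    }

    // Reference result computed directly from a heap array.
    static byte[] md5Of(byte[] source) {
        return newMd5().digest(source);
    }

    private static MessageDigest newMd5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[10_000];
        for (int i = 0; i < data.length; i++)
            data[i] = (byte) i;
        ByteBuffer offHeap = ByteBuffer.allocateDirect(data.length);
        offHeap.put(data);
        offHeap.flip();
        System.out.println(java.util.Arrays.equals(md5Of(offHeap), md5Of(data)));
    }
}
```

Where the crossover actually lies would need measuring on the target hardware.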
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13972776#comment-13972776 ]

Pavel Yaskevich commented on CASSANDRA-6694:

To address all of your comments: this is not intended for any kind of review yet, it is just a demonstration of the idea. That's why I basically carried over all of the methods from the original implementations and didn't rename or move stuff. Also I'm fine if methods in both implementations are going to return constant values like serializationFlags or isMarkedForDeleted. Apart from that there is not much code duplication, and the duplication is going to be minimized further when hashCode and other methods go away, which would probably only leave us with dataSize and serializedSize duplication, but I guess we can come up with something clever for native cells there too.

Regarding the point about updateDigest - it's meant more as a representation of the kind of things we can do if we have two different implementations of it; it is not optimized for performance yet.

bq. There shouldn't be one for the time being - we can never construct one.

and

bq. Same reason - it doesn't exist as either or, so I made a conscious decision to leave it as a CounterUpdateCell: the fact that it extends BufferCell is kind of unimportant. Its purpose is somewhat different, and I think it is better left named CounterUpdateCell, as that is its purpose (to carry a counter update as far as the memtable, and no further).

It is constructed in ColumnFamily and ColumnSerializer. If there is supposed to be only one implementation for now, let's name it appropriately and use it like all the other buffered cells.

bq. This brings in the namespace of the extended class' static methods, which is useful.

But why do we care, and what does it give us, as those interfaces are called directly and static methods don't override each other?

bq. Sure, but again: scope of ticket, and care needs to be taken when doing this (e.g. your updateDigest modifications)

I don't really follow what you are implying with that: the scope is to introduce native implementations as optimized as possible, so why do we miss out on such low hanging fruit?...
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13972796#comment-13972796 ]

Benedict commented on CASSANDRA-6694:

bq. the scope is to introduce native implementations -as optimized as possible-

Otherwise we need to do a lot more than the changes you are suggesting :)

bq. Also I'm fine if methods in both implementations are going to return constant values like serializationFlags or isMarkedForDeleted

Well, these are still duplication - it is not clear as a result where the definition of these behaviours lives. If the semantics change in future, it may introduce errors unnecessarily. Either way equals(), reconcile() and validateFields() will still be issues. You don't seem to have implemented most of these methods yet (looks like your code doesn't actually compile). These methods are each non-trivial amounts of code duplication, equals() especially so if we optimise it as you want to. CounterCell.diff() will also need to be duplicated.

But, like I said, I can probably live with all of this if we address the setPeer() issue. equals() should probably still end up in a shared static method, at the very least, though.
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13972832#comment-13972832 ]

Marcus Eriksson commented on CASSANDRA-6694:

I'm +1 on [~benedict]'s branch (have not looked at the one by [~xedin] yet).

Nits:
* A few methods in Cell.Impl look redundant, isMarkedForDelete/isLive for example, kept around for symmetry?
* License header in DeletedCell and ExpiringCell
* Javadoc comment in NativeAllocator looks wrong
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973220#comment-13973220 ]

Pavel Yaskevich commented on CASSANDRA-6694:

bq. Well, these are still duplication - it is not clear as a result where the definition of these behaviours lives. If the semantics change in future, it may introduce errors unnecessarily. Either way equals(), reconcile() and validateFields() will still be issues. You don't seem to have implemented most of these methods yet (looks like your code doesn't actually compile). These methods are each non-trivial amounts of code duplication, equals() especially so if we optimise it as you want to. CounterCell.diff() will also need to be duplicated.

Most of the duplicated methods are methods with static behavior which is not going to change, e.g. isMarkedForDelete, getMarkedForDeleteAt or serializationFlags. CounterCell.diff and reconcile are living in the interface for now. I will address the setPeer(long) problem and hashCode.
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973226#comment-13973226 ]

Benedict commented on CASSANDRA-6694:

bq. CounterCell.diff and reconcile are living in the interface for now

Ah. This is a Java 8 only feature, which is why I missed it. Not really feasible.
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973297#comment-13973297 ]

Pavel Yaskevich commented on CASSANDRA-6694:

I'm not talking about default methods in interfaces, I'm just saying that I added static diff/reconcile to CounterCell for now :)
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973395#comment-13973395 ]

Aleksey Yeschenko commented on CASSANDRA-6694:

bq. Its purpose is somewhat different, and I think it is better left named CounterUpdateCell, as that is its purpose (to carry a counter update as far as the memtable, and no further).

FWIW it doesn't even make it to a memtable in 2.1, ever. That said, not calling it BufferCounterUpdateCell would bother my consistency OCD, a lot, and I'm not done with counters until 3.0. Can you do my OCD a tiny favor and name it consistently with the other implementations? (: Thanks.

bq. There should be no hashCode method in Buffer*Cell - I removed these for a reason. Because we can have a Cell that is a CellName, and vice-versa, using a Cell as a key for a map is likely dangerous. Since we don't do it anywhere, it's safe to simply remove the methods.

Maybe we should just throw UnsupportedOperationException then, but leave the methods? I agree that using Cell-s as keys is very unlikely, but stuff like this has bitten us before.

Haven't read either branch yet, but planning to soon, just wanted to jump at the opportunity to bikeshed a bit.
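The suggestion above (keep hashCode() but make it fail loudly) can be sketched as follows. UnhashableCellSketch is a hypothetical class made up for illustration, not one of the Buffer*Cell classes; the point is only that any accidental use as a hash key surfaces immediately instead of silently misbehaving.

```java
// Hypothetical example class; not part of either branch.
final class UnhashableCellSketch {
    final String name;
    final long timestamp;

    UnhashableCellSketch(String name, long timestamp) {
        this.name = name;
        this.timestamp = timestamp;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o)
            return true;
        if (!(o instanceof UnhashableCellSketch))
            return false;
        UnhashableCellSketch that = (UnhashableCellSketch) o;
        return timestamp == that.timestamp && name.equals(that.name);
    }

    // Kept on purpose: hash-based collections call this, so misuse as a key fails fast.
    @Override
    public int hashCode() {
        throw new UnsupportedOperationException("cells must not be used as hash keys");
    }

    public static void main(String[] args) {
        java.util.Map<UnhashableCellSketch, String> map = new java.util.HashMap<>();
        try {
            map.put(new UnhashableCellSketch("a", 1L), "x");
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```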
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973405#comment-13973405 ]

Benedict commented on CASSANDRA-6694:

bq. Can you do my OCD a tiny favor and name it consistently with the other implementations? (: Thanks.

Sure. I have a preference to keep it that way, but not a strong one.

bq. Maybe we should just throw UnsupportedOperationException then, but leave the methods? I agree that using Cell-s as keys is very unlikely, but stuff like this has bitten us before.

Also sure.
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973435#comment-13973435 ]

Pavel Yaskevich commented on CASSANDRA-6694:

Regarding the hashCode: that's what we do. I do it in AbstractCell now; Benedict does it in both BufferCell and NativeCell.
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973534#comment-13973534 ]

Pavel Yaskevich commented on CASSANDRA-6694:

OK, the hashCode and setPeer changes are now pushed to the same branch. AbstractNativeCell is now independent of NativeAllocation because NativeAllocator returns the aligned peer directly, which allows the peer field in AbstractNativeCell to be made final. I have also moved the set/get logic for the data size associated with the pointer into NativeAllocator, since it is basically the allocator's metadata. IMO that is a bit cleaner than how it is done in Benedict's branch, where NativeAllocation tracks the pointer offset for the size (internalPeer() { return peer + 4; }) but NativeAllocator takes care of allocating the 4 additional bytes on top of the requested size.
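To make the size-prefix layout under discussion concrete, here is a minimal sketch. It is not code from either branch: the class SizePrefixSketch and its methods are hypothetical, and a direct ByteBuffer stands in for raw native memory. It only illustrates the idea that the allocation is 4 bytes larger than the requested size and that every access goes through an internalPeer()-style offset.

```java
import java.nio.ByteBuffer;

/** Hypothetical model of a size-prefixed allocation; a direct ByteBuffer stands in for raw native memory. */
public class SizePrefixSketch {
    /** Bytes reserved in front of the cell data to hold its length. */
    public static final int SIZE_PREFIX = 4;

    /** Reserve the requested size plus the prefix and record the size at offset 0. */
    public static ByteBuffer allocate(int dataSize) {
        ByteBuffer region = ByteBuffer.allocateDirect(dataSize + SIZE_PREFIX);
        region.putInt(0, dataSize);
        return region;
    }

    /** Analogue of internalPeer() { return peer + 4; }: the first usable byte after the prefix. */
    public static int internalPeer(int peer) {
        return peer + SIZE_PREFIX;
    }

    /** Reads the size back from the prefix. */
    public static int dataSize(ByteBuffer region) {
        return region.getInt(0);
    }

    /** Writes a long at a data-relative offset, skipping the prefix. */
    public static void setLong(ByteBuffer region, int offset, long value) {
        region.putLong(internalPeer(0) + offset, value);
    }

    /** Reads a long at a data-relative offset, skipping the prefix. */
    public static long getLong(ByteBuffer region, int offset) {
        return region.getLong(internalPeer(0) + offset);
    }

    public static void main(String[] args) {
        ByteBuffer cell = allocate(16);
        setLong(cell, 8, 42L);
        System.out.println(dataSize(cell) + " data bytes, long at offset 8 = " + getLong(cell, 8));
    }
}
```

The open question in the thread is only which class owns the prefix; the arithmetic is the same either way.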
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973538#comment-13973538 ]

Benedict commented on CASSANDRA-6694:

I don't think this is the right approach: with the changes we are making, we are pretty much precluding doing anything fancy with GC (we'll have to rely on malloc for now). As such, the size no longer provides any useful bookkeeping information to the NativeAllocator. It should be dealt with entirely in AbstractNativeCell; its concept of size is entirely unique to it for now. This also, separately, makes packing structs of NativeCell a lot more straightforward.
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973571#comment-13973571 ]

Pavel Yaskevich commented on CASSANDRA-6694:

I just don't like that in NativeAllocation we assume that NativeAllocator has reserved 4 bytes for us. So I decided to put everything into NativeAllocator and return only the useful space, so we don't have to add 4 every time we need a peer. It could be done in AbstractNativeCell, which would allocate size + 4, or it could be done in NativeAllocator, which would report on demand how big the allocation was based on the area pointer it returned (which is what NativeAllocator.getDataSize(areaPointer) does). Either of those places (AbstractNativeCell or NativeAllocator) works for me.
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973582#comment-13973582 ]

Benedict commented on CASSANDRA-6694:

The only reason it was happening in NativeAllocator was to support moving the peer around (so you need to know how much memory you're copying). NativeAllocation assuming it has (i.e. _being defined as having_) a size prefix is fine when it is tightly coupled with NativeAllocator (like it is in my branch), but once you have it as a final field in another object, NativeAllocator should simply have no say in the matter. It never needs to know the size of the allocation, so we should just redefine what our AbstractNativeCell considers to be its size in its sizeOf() calculation.
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973595#comment-13973595 ]

Pavel Yaskevich commented on CASSANDRA-6694:

Sure, if you like that better I will change it right away. Anyhow, if we need it in the allocator for some reason, we can change it.
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973618#comment-13973618 ]

Pavel Yaskevich commented on CASSANDRA-6694:

Done. I have force-pushed to my branch; AbstractNativeCell now handles the size and NativeAllocator has nothing to do with it.
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973633#comment-13973633 ]

Benedict commented on CASSANDRA-6694:

Thanks. Although it looks like you haven't updated any of the offsets to work with the new layout?

As to the other changes you've made: I do not like the pollution of PoolAllocator with supportsNative(). Since this branch is supposed to be pushing idiomatic Java usage, let's stick to using interfaces for specialisation, since we can.
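The preference for interfaces over a supportsNative() flag can be illustrated with a small sketch. All names here (Allocator, NativeCapableAllocator, HeapAllocator, OffHeapAllocator, allocateCell) are hypothetical and are not the DataAllocator/DataPool interfaces from the actual branch; the point is only that the capability is expressed as a type rather than as a boolean method on the base class.

```java
import java.nio.ByteBuffer;

public class InterfaceSpecialisationSketch {
    /** Base contract every allocator satisfies (hypothetical stand-in for PoolAllocator). */
    public interface Allocator {
        ByteBuffer allocate(int size);
    }

    /** The extra capability is a type of its own, not a supportsNative() flag on the base. */
    public interface NativeCapableAllocator extends Allocator {
        ByteBuffer allocateNative(int size);
    }

    public static final class HeapAllocator implements Allocator {
        public ByteBuffer allocate(int size) {
            return ByteBuffer.allocate(size);
        }
    }

    public static final class OffHeapAllocator implements NativeCapableAllocator {
        public ByteBuffer allocate(int size) {
            return ByteBuffer.allocate(size);
        }

        public ByteBuffer allocateNative(int size) {
            return ByteBuffer.allocateDirect(size);
        }
    }

    /** Callers specialise on the interface; heap-only allocators never see native methods. */
    public static ByteBuffer allocateCell(Allocator allocator, int size) {
        if (allocator instanceof NativeCapableAllocator)
            return ((NativeCapableAllocator) allocator).allocateNative(size);
        return allocator.allocate(size);
    }

    public static void main(String[] args) {
        System.out.println(allocateCell(new HeapAllocator(), 8).isDirect());
        System.out.println(allocateCell(new OffHeapAllocator(), 8).isDirect());
    }
}
```

With this shape, a heap-only allocator cannot be asked for native memory by mistake, because the method does not exist on its type.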
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973645#comment-13973645 ]

Pavel Yaskevich commented on CASSANDRA-6694:

It does, though: internalPeer() adds 4 and internalSize() subtracts 4, and all get/set methods use internalPeer() + offset.

Regarding supportsNative() and allocateNative (and I was waiting for that): I did it that way because I don't want to put time into adding the DataAllocator and DataPool interfaces that your code has just yet. Once it's decided which way we want to go, I will remove allocateNative and do the proper work there. This is still intended as just a presentation of the idea for how to handle Cell without Impl classes.
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973647#comment-13973647 ]

Benedict commented on CASSANDRA-6694:

bq. This still intended as just an idea presentation for how to handle Cell without Impl classes.

OK, cool. Glad we're staying on topic :)

bq. Why it does - internalPeer does + 4 and internalSize does - 4

My mistake. I was expecting to see the static OFFSET fields updated. We should probably optimise that before we finish up (now that we can), but it is obviously fine for now.
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13970487#comment-13970487 ]

Pavel Yaskevich commented on CASSANDRA-6694:

Also, it seems that for some of the methods (e.g. updateDigest, delta, dataSize, diff, reconcile, hashCode) it would be much better to have native implementations that work with the underlying bytes directly from day one. Some of them, for example, use value().remaining(), value().compareTo(), value().duplicate(), or name.toByteBuffer(), converting data from one representation to another for no real reason, so we can actually end up generating a lot more temporary objects than we anticipate.

There is another concern related to the value() method, which converts a pointer to a DirectBuffer: the problem is that (at least in OpenJDK, and I think Oracle has done the same) initialization of that class is synchronized and creates a PhantomReference, which with most collectors will only be purged by a Full GC.
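The concern about temporary objects can be sketched as follows. This is an illustration only, not the updateDigest from either branch: DigestSketch and its methods are hypothetical, MD5 is assumed purely for the example, and a direct ByteBuffer read by absolute index stands in for raw native memory access. One path creates a duplicate() view per call, the other feeds the same bytes without any intermediate buffer object.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DigestSketch {
    private static MessageDigest md5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Buffer-based path: duplicate() allocates a temporary ByteBuffer view on every call. */
    public static byte[] digestViaBuffer(ByteBuffer value) {
        MessageDigest md = md5();
        md.update(value.duplicate());
        return md.digest();
    }

    /** Direct path: feed bytes by absolute index, with no intermediate buffer object. */
    public static byte[] digestDirect(ByteBuffer memory, int offset, int length) {
        MessageDigest md = md5();
        for (int i = 0; i < length; i++)
            md.update(memory.get(offset + i));
        return md.digest();
    }

    public static void main(String[] args) {
        ByteBuffer value = ByteBuffer.allocateDirect(8);
        value.putLong(0, 123456789L);
        boolean same = java.util.Arrays.equals(digestViaBuffer(value), digestDirect(value, 0, 8));
        System.out.println("digests match: " + same);
    }
}
```

Both paths produce the same digest; they differ only in how many short-lived objects are created along the way.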
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13970603#comment-13970603 ]

Benedict commented on CASSANDRA-6694:

bq. for now we are allocating BufferCounterCell which allows us to use CounterCell.Impl.reconcile for both implementations

We only allocate a new object in the case that the reconcile result isn't one or the other of the original inputs. This object is incredibly short-lived, and we decided it was easier than passing the allocator through for reconcile. This may be slightly worse than we'd like as a result of the different cellname layouts, but that can be smoothed over with time. It's cleaner in the Memtable + ABC code to keep the pool allocation separate from this reconcile, and it's small fry compared to the other stuff we're doing on write of counters.

bq. it would be much better to have native implementations which work with underlying bytes directly from day one

Agreed that some of these would be nice. However, we rarely (if ever) call these, and as per your comments wrt zero-copy (CASSANDRA-6842), if we aren't worried about copying the contents, we shouldn't be worried about allocating temporary objects. Now, there are some methods that would be nice to have native implementations of sooner rather than later (e.g. updateDigest), but I don't think they're by any means _essential_. What is going to be *far* more impactful is CASSANDRA-6755, as this has a reasonably large negative impact on name lookups (and to a lesser degree slicing) from a memtable record.

That said, some of them would be quite easy to implement. So I'm not totally opposed to delivering them from day 1; I just wanted this patch set to be clearly readable and well contained. It's pretty big as it is. I think it would be nice to put any of these optimisations in a second ticket.

If you're suggesting we drop the Impl hierarchy entirely from this patchset and just duplicate the methods and optimise, I can maybe get behind that. However, optimising reconcile() and equals() gets ugly quickly if you want to be able to deal with either side of the equation being one or the other (often we'll reconcile different kinds, but not always). So we would still need a shared implementation, but one that is capable of detecting the kind of Cell on each side and selecting the correct version of the method. That leaves very few methods we'd be optimising in Native*Cell, so most of the code would be duplicated unnecessarily if we take that route. But I can live with that if the reviewers can all live with the increased size of the patchset.

bq. name.toByteBuffer() convert data from one representation to another for no real reason

Why do you say no real reason? This is the serialization format, so we have to convert to it. That's the definition of what toByteBuffer() should return. We only call it when writing to disk or to the network, and it is no different from the original implementation in that regard. That's not to say we cannot change this with time, but there's not much we can do yet.

bq. There is another concern related to ... DirectBuffer ... initialization of that class is synchronized and creates PhantomReference

I construct it using unsafe, which skips all constructors. So there is no synchronization or PhantomReference creation.
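The remark about constructing an object "using unsafe, which skips all constructors" refers to behaviour that can be demonstrated in isolation. The sketch below is not the patch's DirectBuffer code: AllocateInstanceSketch and Tracked are hypothetical names, and the example only assumes that sun.misc.Unsafe is reachable through its theUnsafe field, as it is on common HotSpot JDKs. It shows that allocateInstance returns an object whose constructor and field initialisers never ran.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class AllocateInstanceSketch {
    /** Hypothetical class whose constructor has observable effects. */
    public static final class Tracked {
        public int value = 7;        // field initialisers run as part of construction
        public boolean constructed;  // set only by the constructor

        public Tracked() {
            constructed = true;
        }
    }

    /** Obtains the Unsafe singleton reflectively (an assumption about the running JDK). */
    private static Unsafe unsafe() {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return (Unsafe) field.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Allocates a Tracked instance without running any constructor. */
    public static Tracked viaUnsafe() {
        try {
            return (Tracked) unsafe().allocateInstance(Tracked.class);
        } catch (InstantiationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Tracked skipped = viaUnsafe();
        Tracked normal = new Tracked();
        System.out.println(skipped.constructed + " " + skipped.value);
        System.out.println(normal.constructed + " " + normal.value);
    }
}
```

Because no constructor runs, any side effects it would have had (such as synchronization or registering a reference for cleanup) do not happen, and the caller is responsible for putting the object into a valid state.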
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13972205#comment-13972205 ]

Pavel Yaskevich commented on CASSANDRA-6694:

So here is the [branch|https://github.com/xedin/cassandra/compare/CASSANDRA-6694] which implements my idea of how to get rid of the Impl classes for Cell (plus an optimized updateDigest for both Cell implementations and a couple of other things). I left DecoratedKey alone for now. The work is not fully complete yet, but only a couple of nits are missing: I need to change a couple of places to use CFMetaData and clone native cells, so I decided not to do that if we are not going to go with this code.

Regarding [~benedict]'s reorg branch, I found a couple of problems:
# internalGetLong(long, long) is actually meant to be internalSetLong(long, long) in AbstractMemory;
# CounterUpdateCell should be BufferCounterUpdateCell, as it extends BufferCell;
# the CounterUpdateCell interface is missing, as well as a NativeCounterUpdateCell implementation to match it.

bq. Why do you say no real reason? This is the serialization format, so we have to convert to it. That's the definition of what toByteBuffer() should return. We only call it when writing to disk or to the network, and is no different from the original implementation in that regard. That's not to say with time we cannot change this, but there's not much we can do yet.

When taken out of context like that it doesn't really make sense, but what I meant is that there are situations where we don't really need to get a BB from the CellName and can transfer bytes directly (especially for the native cell implementations).

bq. I construct it using unsafe, which skips all constructors. So there is no synchronization or PhantomReference creation.

Right, we should be good there, my bad.
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13970462#comment-13970462 ]

Pavel Yaskevich commented on CASSANDRA-6694:

[~benedict] While working on avoiding the Impl classes and looking closer at the code, I have a question which, knowing that the future is going to be totally off-heap, makes sense to ask now. The current Native*Cell classes re-use Impl code from static implementations of interfaces, but some of the methods, e.g. reconcile for Counter(Update)Cell, need to generate a new object in certain conditions (for now we are allocating BufferCounterCell, which allows us to use CounterCell.Impl.reconcile for both implementations). Do you have an action plan regarding the required changes there for the next step in this series, when we are not going to copy things back to heap?
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965066#comment-13965066 ]

Pavel Yaskevich commented on CASSANDRA-6694:

[~jbellis] I will leave this alone if you and others are fine with maintaining the code as it is in the patch set. The discussion I'm trying to have, and I presume others are interested too, is centered around the question of whether there is a better (cleaner, if you will) way to organize Cell that avoids unnecessary field allocation and keeps us from introducing static Impl classes, containing only static methods, that extend each other. I still don't understand why we would extend one class that has only static methods from another with the same method layout (e.g. DeletedCell.Impl extends Cell.Impl), which results in a bigger constant pool per class and has the bytecode implications that I have previously described. From my point of view, it looks like we are basically trying to rebuild inside Cassandra what the JVM already provides as a platform.
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965465#comment-13965465 ]
Jonathan Ellis commented on CASSANDRA-6694:
bq. why we can't have a simple implementation of the cell which has one buffer + metadata about component sizes (which could also be encoded) instead of having buffer per component in the name (if composite) + buffer for value + long timestamp
I think this is the key question, so I want to back out of the Impl rabbit hole for a minute to address that. This would absolutely simplify things a great deal in terms of the Allocator design. The problem is that it has a much bigger impact on the rest of the code, and the consensus from the last ticket was, "We want to have off-heap as an option, but we want the default to stay on-heap and change as little as possible." So, I agree that what you are saying is cleaner, but I think we should push it out to 3.0 given the constraints for 2.1.
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965481#comment-13965481 ]
Jonathan Ellis commented on CASSANDRA-6694:
bq. Can we decide if we actually want to have Cell (and derivatives) as this patch set proposes (with static Impl static classes which is OOP unfriendly to say the least) or do something else (question raised back in CASSANDRA-6689)?
If we accept the NativeCell/BufferCell distinction above, then the combination of optimization and lack of multiple inheritance drives this design or something like it. Specifically, we want NativeCell to be both a Cell and a NativeAllocation, so Benedict has (reasonably, IMO) chosen to extend NativeAllocation and leave the common Cell methods in a utility Impl class.
(IMO the right OOP approach would be to extend Cell, making it an abstract class instead of an interface, and have NativeCell hold a NativeAllocation as a field instead of extending it. But then we're increasing the memory overhead of a NativeCell by almost 50%, which directly impacts our main goal here.)
I can see reasonable alternatives for where exactly the static utility methods live: put them in the BufferCell classes and have the Native classes reuse them that way, or put them in a separate class entirely. I'm okay with either of those options, but I don't really see them as strictly better than the Impl choice (which has the advantage of encapsulating which interface specifically they deal with, distinct from the Buffer or Native subclasses).
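The design Jonathan describes (a Cell interface, a native implementation that extends the allocation class, a buffer implementation, and shared logic in a static Impl class) can be sketched roughly as follows. All names are hypothetical, simplified stand-ins rather than the classes in the patch set, and a direct ByteBuffer plus offset stands in for a raw native address so the sketch runs without Unsafe or JNA.

```java
import java.nio.ByteBuffer;

// Hypothetical, simplified stand-in for the Cell interface.
interface SketchCell {
    int nameSize();
    int valueSize();
    int dataSize();

    // Shared logic lives in static methods so both hierarchies can reuse it
    // without a common abstract base class.
    final class Impl {
        private Impl() {}

        static int dataSize(SketchCell cell) {
            // name bytes + value bytes + 8-byte timestamp (assumed layout)
            return cell.nameSize() + cell.valueSize() + Long.BYTES;
        }
    }
}

// Stand-in for the allocation base class the native cell must extend.
abstract class SketchNativeAllocation {
    protected final ByteBuffer region; // stand-in for native memory
    protected final int offset;        // stand-in for the compressed address

    SketchNativeAllocation(ByteBuffer region, int offset) {
        this.region = region;
        this.offset = offset;
    }
}

// Is both an allocation (by inheritance) and a cell (by interface).
final class SketchNativeCell extends SketchNativeAllocation implements SketchCell {
    private SketchNativeCell(ByteBuffer region, int offset) {
        super(region, offset);
    }

    // Writes the two size fields at the given offset and returns a view of them.
    static SketchNativeCell write(ByteBuffer region, int offset, int nameSize, int valueSize) {
        region.putInt(offset, nameSize);
        region.putInt(offset + Integer.BYTES, valueSize);
        return new SketchNativeCell(region, offset);
    }

    public int nameSize()  { return region.getInt(offset); }
    public int valueSize() { return region.getInt(offset + Integer.BYTES); }
    public int dataSize()  { return SketchCell.Impl.dataSize(this); }
}

// On-heap variant: one buffer per component, as before the patch.
final class SketchBufferCell implements SketchCell {
    private final ByteBuffer name;
    private final ByteBuffer value;
    private final long timestamp;

    SketchBufferCell(ByteBuffer name, ByteBuffer value, long timestamp) {
        this.name = name;
        this.value = value;
        this.timestamp = timestamp;
    }

    public int nameSize()  { return name.remaining(); }
    public int valueSize() { return value.remaining(); }
    public int dataSize()  { return SketchCell.Impl.dataSize(this); }
    long timestamp()       { return timestamp; }
}
```

The native cell carries no fields of its own beyond what the allocation base class holds, which is the memory argument made in the comment; the alternative of holding the allocation as a field would add a reference per cell.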
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965487#comment-13965487 ]
Jonathan Ellis commented on CASSANDRA-6694:
bq. Is it essential to move everything to the separate package .data ?
If I may bikeshed a bit, "data" is a fairly meaningless term in the Cassandra context and I would prefer to name it "cells" instead. Otherwise, I think it's a reasonable refactor.
My initial reaction was that moving things to different packages should totally be a separate commit, but the new interfaces don't share a whole lot with the old classes other than the name. So even that doesn't really bother me, but if Pavel or Marcus still want that to facilitate review, then it's a reasonable request.
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965493#comment-13965493 ]
Benedict commented on CASSANDRA-6694:
I agree "data" is a bit meaningless - and, in fact, I started with "cells". But the package includes DecoratedKey / RowPosition, so "data" became the easiest, most encompassing term. More than open to better suggestions.
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965502#comment-13965502 ]
Jonathan Ellis commented on CASSANDRA-6694:
Simple solution: leave DK and RP where they are. :)
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965562#comment-13965562 ]
Benedict commented on CASSANDRA-6694:
Well, the only fly in that ointment is that they also have Buffer and Native implementations, and the DataAllocator allocates them as well as cells. So separating them seems a bit strange - but I'm not too fussed tbh.
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963907#comment-13963907 ]
Pavel Yaskevich commented on CASSANDRA-6694:
bq. I'm saying performance critical code is impacted when you have virtual method calls that cannot be optimised by the VM (i.e. those with multiple implementations). I meant CASSANDRA-6553 and CASSANDRA-6934
Which means that if we actually optimize AbstractType and its derivatives to work directly with the underlying bytes, the whole problem could be resolved? That's why I want to understand why we can't have a simple implementation of the cell which has one buffer + metadata about component sizes (which could also be encoded), instead of having a buffer per component in the name (if composite) + a buffer for the value + a long timestamp. Maybe it would be easier to offload all of the work to AbstractType instead of trying to optimize at the Cell level?
I went through the JVM instruction set doc (specifically http://docs.oracle.com/javase/specs/jvms/se7/html/jvms-6.html#jvms-6.5.invokespecial and http://docs.oracle.com/javase/specs/jvms/se7/html/jvms-6.html#jvms-6.5.invokestatic). Those instructions are not that different, and both have to do a lookup in the constant_pool of the class, so I'm wondering if it's virtual calls that create a problem or if it's something else masked by that...
It also looks like the static Impl scheme (as in the patch set) would execute the same number of instructions, because the compiler emits *aload_0* (this) in both cases, whether it is followed by invoke{special, virtual} or invokestatic, and more instructions in the static Impl form if we pass something other than this. Generally, when callers use methods from a super class or interface (as it is right now for e.g. Cell.dataSize()), the compiler would emit *aload_0, invokevirtual #offset* directly to the Cell method, whereas with static Impl it has to do that multiple times: *aload_0, invokevirtual #offset* (to the method DeletedCell.dataSize()) and then internally *aload_0, invokestatic #offset* (to DeletedCell.Impl.dataSize()), which means a longer constant_pool walk.
bq. Then what exactly do we win? We still have to have two hierarchies and the same modularization. Also the potential ease of optimizations for comparison disappear, and we still have increased indirection and virtual method call costs. If this is the suggestion, I am very -1, as the payoff is very small, the work nontrivial and the negatives substantial.
The wins are, primarily, less object overhead (the ultimate goal of all this) and maintainability of the code. We basically have Cell implementations based on type - expired, deleted, counter, client (the last one being used mostly by Thrift) - as it is right now, so no Buffered* or Native* variants, plus allocators of 3 types (maybe we actually don't need the one which allocates DirectBuffer and can just go with the JNA-backed one) which allocate raw bytes. Cell reconcile, equals, dataSize and other methods become straightforward. Also, as we consider Composite as a complete entity, storing components as contiguous blocks would reduce container overhead and speed up comparisons by exploiting spatial locality.
[~jbellis] mentioned this: "My preferred solution would be, stop extracting the name so often by itself. Spot checking the code, it seems we usually do this just to simplify a comparison, so this could in principle just be done with the Cell object rather than just the name." I think that would further benefit the approach that I'm describing.
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963923#comment-13963923 ]
Benedict commented on CASSANDRA-6694:
bq. less object overhead
There is no reduced overhead from the current patch.
bq. Also, as we consider Composite as a complete entity, storing components as contiguous blocks would reduce container overhead + speeds up comparisons by exploiting spatial locality
You seem to be backtracking to the prior suggestion of only one implementation. I am potentially ok with this, but see my prior comment for concerns and complications. The -1 was to having what we have now except with an extra level of indirection (i.e. one packed Cell implementation, and one componentised like we had before this patch). Also, I would prefer to avoid the extra indirection + virtual method costs of having another inner object representation, within which we need another offset.
The JVM instruction set is beside the point. The point is what hotspot will do: with a single implementor, or a static method with a small enough bytecode representation, the call will be inlined. Note I said multiple implementation virtual method. With the option you suggest we will need an extra virtual invocation cost with every access to the underlying bytes, some extra math to access the right location, and one extra object field reference to locate the position we're offsetting from. These costs mount up rapidly.
Hmm. No, I now note your client implementation: what exactly is this one? Please clarify, as the thrift cell is going to need to be compared with the other implementations, and suddenly much of any benefit will disappear. The best way to make comparisons cheap and easy is to have both sides of the comparison have at least the same layout. If we have to either virtual invoke or instanceof check for every comparison, and have a different code path for comparing each type of representation, there will be a performance impact. As such, the only main benefit of this approach is eliminated in my eyes. Also, how will this client implementation achieve its various functions, and define its type? Seems like you'll need a duplicate hierarchy still.
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13964564#comment-13964564 ]
Pavel Yaskevich commented on CASSANDRA-6694:
bq. The JVM instruction set is besides the point. The point is what hotspot will do: with a single implementor or static method of small enough bytecode representation, it will be inlined. Note I said multiple implementation virtual method. With the option you suggest we will need an extra virtual invocation cost with every access to the underlying bytes, some extra math to access the right location, and one extra object field reference to locate the position we're offsetting from. These costs mount up rapidly.
How is that beside the point when you claim that method calls with multiple implementations are slower (and do not get inlined) compared to static method invocations spread across multiple classes, which is basically a constant_pool reimplementation in your code?... What I claim is that it doesn't matter whether you override a method multiple times or call a static method which calls another static method, like your patch does for DeletedCell: e.g. {Native, Buffer}DeletedCell.cellDataSize() calls DeletedCell.Impl.cellDataSize(this), which transfers to Cell.Impl.cellDataSize(this). Just make an example, disassemble the classes (with javap -c or similar) and see what bytecode was generated.
Also, for the inlining problem, I would like to see proof of why those methods are not getting inlined (are they even touched by the JIT?), obtained by enabling logging with -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompilation -XX:+PrintInlining and sharing the output. Otherwise the claim that multiple implementation virtual methods are slow is just empty rhetoric.
bq. Hmm. No, I now note your client implementation: what exactly is this one? Please clarify, as the thrift cell is going to need to be compared with the other implementations, and suddenly much of any benefit will disappear. The best way to make comparisons cheap and easy is to have both sides of the comparison have at least the same layout. If we have to either virtual invoke or instanceof check for every comparison, and a different code path for comparing each type of representation, there will be a performance impact. As such the only main benefit of this approach is eliminated in my eyes. Also, how will this client implementation achieve its various functions, and define its type? Seems like you'll need a duplicate hierarchy still.
That was just a suggestion for a temporary container between the client transport and the memtable. As those buffers are already allocated separately by Thrift, it seems reasonable to have Cell work with those buffers. It would take more memory for the ByteBuffer containers passed from Thrift, but the cell comparison logic should not change, because the cells would operate on a common container type. It's a similar concept to what Netty does with a ByteBuf gathered from other ByteBuf pieces.
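The kind of evidence Pavel asks for could be gathered with a small probe program. The sketch below is an illustration only: the class and method names are hypothetical, it is not code from the patch, and it makes no claim about what the JIT will actually decide. It exercises one call site that sees three implementations of an interface method and one that calls a small static method, so that the diagnostic output can be compared.

```java
// Hypothetical probe, not code from the patch set.
final class InlineProbe {
    interface Sized { int dataSize(); }

    // Three implementations, so the interface call site sees multiple receiver types.
    static final class A implements Sized {
        final int payload; A(int p) { payload = p; }
        public int dataSize() { return payload + 8; }
    }
    static final class B implements Sized {
        final int payload; B(int p) { payload = p; }
        public int dataSize() { return payload + 8; }
    }
    static final class C implements Sized {
        final int payload; C(int p) { payload = p; }
        public int dataSize() { return payload + 8; }
    }

    // Static equivalent of the same computation.
    static int dataSize(int payload) { return payload + 8; }

    static Sized[] cells(int n) {
        Sized[] cells = new Sized[n];
        for (int i = 0; i < n; i++)
            cells[i] = i % 3 == 0 ? new A(i) : i % 3 == 1 ? new B(i) : new C(i);
        return cells;
    }

    static int[] payloads(int n) {
        int[] payloads = new int[n];
        for (int i = 0; i < n; i++) payloads[i] = i;
        return payloads;
    }

    static long sumVirtual(Sized[] cells) {
        long sum = 0;
        for (Sized cell : cells) sum += cell.dataSize(); // interface dispatch
        return sum;
    }

    static long sumStatic(int[] payloads) {
        long sum = 0;
        for (int payload : payloads) sum += dataSize(payload); // static call
        return sum;
    }

    public static void main(String[] args) {
        Sized[] cells = cells(1024);
        int[] payloads = payloads(1024);
        long virtualTotal = 0, staticTotal = 0;
        // Repeat enough times for the JIT to consider both loops hot.
        for (int i = 0; i < 20_000; i++) {
            virtualTotal += sumVirtual(cells);
            staticTotal += sumStatic(payloads);
        }
        System.out.println(virtualTotal + " " + staticTotal);
    }
}
```

After compiling, the bytecode can be inspected with `javap -c InlineProbe`, and the inlining decisions logged by running `java -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompilation -XX:+PrintInlining InlineProbe`, which are the options named in the comment.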
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13962658#comment-13962658 ]
Benedict commented on CASSANDRA-6694:
bq. Can we decide if we actually want to have Cell (and derivatives) as this patch set proposes (with static Impl static classes which is OOP unfriendly to say the least) or do something else (question raised back in CASSANDRA-6689)?
Can we have something more concrete than "something else" as a suggestion?
bq. Is it essential to move everything to the separate package .data ?
No refactoring is essential - however, it is much cleaner given all of the new classes.
bq. Maybe there is a way which allows us to still have key/value/timestamp as fields, so we should only change callers method/class signatures instead? In general the idea would be to keep a single implementation of the Cell and add a generic placeholder instead of ByteBuffer.
This seems to miss the entire purpose of this patch, which is to reduce the heap consumption of each Cell. If we use another placeholder, we will no doubt only *increase* the memory consumption, not decrease it; or, at best, reduce it only fractionally for off-heap and increase it for on-heap implementations. Neither is really acceptable, and either would make this whole patch a bit worthless.
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962679#comment-13962679 ]
Pavel Yaskevich commented on CASSANDRA-6694:

If I had something more concrete you would see a patch for it, but here I am trying to start a discussion. I think [~jbellis] mentioned that it might be better to reduce usage of the column names instead of merging cell with column name (if I remember correctly).

Regarding the moving stuff around - if it's not essential then we can do it at the very last stage once we done with all more important changes which are plenty.

Regarding placeholders idea, if we allocate contiguous region for the whole cell we can just have memory object + 1 int (or was it even short?...) field which marks the end of the column name at that buffer, as column timestamp is a fixed size long we know exactly where column value ends, that also helps with spatial locality in most of the cases.
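The placeholder idea in this comment (one contiguous region per cell, plus a single int marking the end of the column name) could look roughly like the Java sketch below. It is only an illustration: the class name ContiguousCell, the name | timestamp | value ordering and the use of a direct ByteBuffer as a stand-in for the "memory object" are all assumptions, not code from any patch on this ticket.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of a single-region cell; layout and names are assumptions.
final class ContiguousCell {
    private final ByteBuffer region; // whole cell: name | 8-byte timestamp | value
    private final int nameEnd;       // offset one past the last byte of the name

    ContiguousCell(ByteBuffer name, long timestamp, ByteBuffer value) {
        nameEnd = name.remaining();
        ByteBuffer buf = ByteBuffer.allocateDirect(nameEnd + Long.BYTES + value.remaining());
        // duplicate() leaves the callers' buffer positions untouched
        buf.put(name.duplicate()).putLong(timestamp).put(value.duplicate());
        buf.flip();
        region = buf;
    }

    ByteBuffer name() {
        return slice(0, nameEnd);
    }

    // The timestamp is a fixed-size long, so it always sits right after the name.
    long timestamp() {
        return region.getLong(nameEnd);
    }

    // Everything after the timestamp, up to the end of the region, is the value.
    ByteBuffer value() {
        return slice(nameEnd + Long.BYTES, region.limit());
    }

    private ByteBuffer slice(int from, int to) {
        ByteBuffer dup = region.duplicate();
        dup.limit(to);
        dup.position(from);
        return dup.slice();
    }
}
```

Only nameEnd has to be stored next to the region; the timestamp and value offsets follow from it, which is the point the comment makes about the fixed-size timestamp.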
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962691#comment-13962691 ]
Benedict commented on CASSANDRA-6694:

bq. I think Jonathan Ellis mentioned that it might be better to reduce usage of the column names instead of merging cell with column name (if I remember correctly)

I don't recall this suggestion. Perhaps you are referring to the suggestion that we not extract the cell names from the cell as often as we do, for the purpose of comparison, in order to reduce garbage production?

bq. Regarding placeholders idea, if we allocate contiguous region for the whole cell we can just have memory object + 1 int (or was it even short?...) field which marks the end of the column name at that buffer, as column timestamp is a fixed size long we know exactly where column value ends, that also helps with spatial locality in most of the cases

In this case, this suggestion has much more complex problems:
# More (multiple implementation) virtual method invocations (as shown by CASSANDRA-6993 this can have meaningfully negative performance implications)
# Major refactor of AbstractType hierarchy to prevent bytebuffer allocation on comparison
# More object allocation in the request threads due to having to re-pack all of any parameters into a Cell with a single buffer, as opposed to just dropping them in place
# At which point it would make most sense to refactor (and mostly eliminate) the entirety of CASSANDRA-5417, as we're almost always pumping the result straight into a Cell anyway, so extracting the components into separate buffers and repacking them into a single buffer in the Cell is very wasteful

That said, it is *viable*. It has some advantages too: the comparisons between Native and Buffer cells are much more easily optimised. Many of these changes may well need to happen in the natural course of things anyway as we optimise the native implementation. But it has comparatively wide-ranging implications for the current on-heap use case that might be a bit too much to bite off right now.

bq. if it's not essential then we can do it at the very last stage once we done with all more important changes which are plenty

I disagree. It makes the patch more complicated to *not* move it around. Because something is not essential does not mean it is not the better option.
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13963063#comment-13963063 ]
Jonathan Ellis commented on CASSANDRA-6694:

bq. It makes the patch more complicated to not move it around.

I thought Pavel was referring to moving existing classes into new packages, but I may be mistaken.
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13963100#comment-13963100 ]
Benedict commented on CASSANDRA-6694:

Yes, but I've gutted those classes into interfaces and introduced new sister classes. And modelling those (in my head, at least) is very difficult when it's not easy to see what classes relate to each other at a glance, and the db package is overpopulated as it is. So anything I need to do to make it possible to think about whilst writing it, I assume is going to be helpful for anybody else reading it.

But if you both want to change that bit, I can rebase again. Since I've written it, it doesn't matter so much to me now.
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13963531#comment-13963531 ]
Pavel Yaskevich commented on CASSANDRA-6694:

bq. More (multiple implementation) virtual method invocations (as shown by CASSANDRA-6993 this can have meaningfully negative performance implications)

I'm getting mixed signals here, are you claiming that JVM does a bad job or OOP is broken in general? Also CASSANDRA-6993 seems to point to a different problem.

bq. Major refactor of AbstractType hierarchy to prevent bytebuffer allocation on comparison

I don't see a problem with this if it spares most of the changes in allocators and Cell*/DecoratedKey rewrites.

bq. More object allocation in the request threads due to having to re-pack all of any parameters into a Cell with a single buffer, as opposed to just dropping them in place

We can have a Cell separate implementation with multiple buffers as Thrift allocates them anyway which we are going to be transformed to linear ones once they get into memtable as we have to reallocate there.

bq. At which point it would make most sense to refactor (and mostly eliminate) the entirety of CASSANDRA-5417, as we're almost always pumping the result straight into a Cell anyway, so extracting the components into separate buffers and repacking them into a single buffer in the Cell is very wasteful

Is it good or bad to refactor it? [~slebresne]/[~jbellis] WDYT?
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13963619#comment-13963619 ]
Benedict commented on CASSANDRA-6694:

bq. I'm getting mixed signals here, are you claiming that JVM does a bad job or OOP is broken in general? Also CASSANDRA-6993 seems to point to a different problem.

I'm saying performance-critical code is impacted when you have virtual method calls that cannot be optimised by the VM (i.e. those with multiple implementations). I meant CASSANDRA-6553 and CASSANDRA-6934.

bq. We can have a Cell separate implementation with multiple buffers as Thrift allocates them anyway which we are going to be transformed to linear ones once they get into memtable as we have to reallocate there.

Then what exactly do we win? We still have to have two hierarchies and the same modularisation. Also, the potential ease of optimisation for comparisons disappears, and we still have increased indirection and virtual method call costs. If this is the suggestion, I am very -1, as the payoff is very small, the work nontrivial and the negatives substantial.
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962580#comment-13962580 ]
Pavel Yaskevich commented on CASSANDRA-6694:

I looked through the code and here are a couple of questions that I have/had:
# Can we decide if we actually want to have Cell (and derivatives) as this patch set proposes (with static Impl static classes which is OOP unfriendly to say the least) or do something else (question raised back in CASSANDRA-6689)?
# Is it essential to move everything to the separate package .data?
# Do we want to have a number of pool/allocator implementations which allocate different types of objects, or is it possible to make a generic container (for ByteBuffer/Memory) which would basically be a pointer to a bigger buffer that holds all of the components (name/value/timestamp), so we can have a limited number of allocators/pools to maintain ([~jbellis] described the same vision in one of his previous comments)... Maybe there is a way which allows us to still have key/value/timestamp as fields, so we should only change callers' method/class signatures instead?

In general the idea would be to keep a single implementation of the Cell and add a generic placeholder instead of ByteBuffer.
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959841#comment-13959841 ]
Benedict commented on CASSANDRA-6694:

rebased and pushed -f
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960112#comment-13960112 ]
Jonathan Ellis commented on CASSANDRA-6694:

Is there a case to be made here that there's more abstraction than necessary? Because I'm still having trouble wrapping my head around it.
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960117#comment-13960117 ]
Benedict commented on CASSANDRA-6694:

Well, it's probably indicative of something wrong, but I don't think it's the level of abstraction. Probably I can re-organise it to make it clearer, though.
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960187#comment-13960187 ]
Benedict commented on CASSANDRA-6694:

Rebased, reorganised and pushed to [6694-reorg|https://github.com/belliottsmith/cassandra/tree/6694-reorg]

Does that make it clearer what's going on?
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960192#comment-13960192 ]
Benedict commented on CASSANDRA-6694:

We basically have:
BBAllocator (and implementors)
BBPool + BBPoolAllocator (and implementors)
NativePool + NativeAllocator

BBPoolAllocator creates a BBAllocator per session, by wrapping the session's OpOrder.Group
BBAllocator is used to construct Buffer* implementations (necessary without further refactor, as that's how CellName implementors work, and don't want to rip those apart in this commit);
DataAllocator wraps the above to create arbitrary implementations (i.e. Native* or Buffer*, atm)
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960864#comment-13960864 ]
Pavel Yaskevich commented on CASSANDRA-6694:

Sorry guys, I've been busy with multiple things this week, will try to take a look at this on weekend.
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13958124#comment-13958124 ]
Jonathan Ellis commented on CASSANDRA-6694:

Can you give an overview of all the classes in memory/ and how they fit together?
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13958166#comment-13958166 ]

Benedict commented on CASSANDRA-6694:
---
Sure. There are now four main concepts that interact in various ways: ByteBufferAllocator, Pool/PoolAllocator, ByteBufferPool.Allocator (and its implementors) and NativeAllocator.

# ByteBufferAllocator, as the name suggests, is a straightforward abstraction for the allocation/cloning of NIO ByteBuffers. It does not directly support any concept of pooling, nor understand the use of OpOrder for write guarding.
# Pool and PoolAllocator are now independent of any concept of *what* they allocate - they simply manage the idea of the memory resources, and leave the actual allocation to the implementing class.
# ByteBufferPool.Allocator is the combination of PoolAllocator and BBA, although it itself isn't a BBA - it constructs a context BBA when given a writeOp that is guarding the allocation. This helps to keep the concept of write guarded pooled allocations cleanly separated from simple BBA allocations, whilst using the same code paths. Note that BBP.A is _abstract_ and is implemented by SlabAllocator and HeapPool.Allocator. We might consider renaming SlabAllocator to HeapSlabAllocator to keep naming consistent and help clarity.
# NativeAllocator is, by contrast, the extension of PoolAllocator that supports native allocations - that is, any object that extends NativeAllocation.
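To make the relationship between these four concepts easier to picture, here is a heavily simplified sketch. It is not the code from the patch: every signature, the `WriteOp` stand-in for an OpOrder write guard, and the `PooledBufferAllocator` and `HeapAllocator` names are assumptions made for illustration, and NativeAllocator is only mentioned in a comment.

```java
import java.nio.ByteBuffer;

final class AllocatorLayoutSketch {
    // 1. Plain allocation/cloning of NIO buffers; no pooling, no write guarding.
    interface ByteBufferAllocator {
        ByteBuffer allocate(int size);

        default ByteBuffer cloneBuffer(ByteBuffer src) {
            ByteBuffer copy = allocate(src.remaining());
            copy.put(src.duplicate()); // duplicate() leaves src's position untouched
            copy.flip();
            return copy;
        }
    }

    // Hypothetical stand-in for the write operation that guards an allocation.
    static final class WriteOp {}

    // 2. The pool only accounts for memory; it does not know what is allocated.
    static final class Pool {
        private final long limit;
        private long used;

        Pool(long limit) { this.limit = limit; }

        synchronized boolean tryReserve(long bytes) {
            if (used + bytes > limit) return false;
            used += bytes;
            return true;
        }

        synchronized long used() { return used; }
    }

    static abstract class PoolAllocator {
        protected final Pool pool;
        PoolAllocator(Pool pool) { this.pool = pool; }
    }

    // 3. Pooled buffer allocation: not itself a ByteBufferAllocator, but able
    //    to build a contextual one bound to a guarding write operation.
    static abstract class PooledBufferAllocator extends PoolAllocator {
        PooledBufferAllocator(Pool pool) { super(pool); }

        abstract ByteBuffer allocate(int size, WriteOp op);

        ByteBufferAllocator forOp(WriteOp op) {
            return size -> allocate(size, op);
        }
    }

    static final class HeapAllocator extends PooledBufferAllocator {
        HeapAllocator(Pool pool) { super(pool); }

        @Override
        ByteBuffer allocate(int size, WriteOp op) {
            if (!pool.tryReserve(size)) throw new IllegalStateException("pool exhausted");
            return ByteBuffer.allocate(size);
        }
    }

    // 4. A NativeAllocator would extend PoolAllocator in the same way, but hand
    //    out native allocations rather than ByteBuffers (omitted here).

    public static void main(String[] args) {
        Pool pool = new Pool(1024);
        ByteBufferAllocator guarded = new HeapAllocator(pool).forOp(new WriteOp());
        ByteBuffer copy = guarded.cloneBuffer(ByteBuffer.wrap(new byte[] {1, 2, 3}));
        System.out.println(copy.remaining() + " bytes cloned, pool used " + pool.used());
    }
}
```

The point of the split is visible in `forOp`: callers that only know about ByteBufferAllocator keep working, while the pooled, write-guarded path stays a separate type.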
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13946848#comment-13946848 ]

Benedict commented on CASSANDRA-6694:
---
Pushed a rebased branch against latest CASSANDRA-6689
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947041#comment-13947041 ]

Jonathan Ellis commented on CASSANDRA-6694:
---
So our classic Cell is ByteBuffer name + ByteBuffer value + long timestamp. After 6689 the buffers can be off-heap but just having the DirectBuffer objects on heap is still a lot. So we want to introduce NativeCell which is basically just a pointer to off-heap memory containing all 3.

Why do we need all the Pool and Allocator changes for that? Can't we allocate NativeCell on the old SlabAllocator?
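The "pointer to off-heap memory containing all 3" idea can be sketched as follows. This is an illustration only: the field layout is invented, and a direct ByteBuffer plus an int offset stands in for the raw native address the real NativeCell would presumably hold. The class name and the `write` helper are hypothetical; `sizeOf` echoes a static method mentioned elsewhere in this thread, but its signature here is assumed.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class NativeCellSketch {
    // Assumed layout: [long timestamp][int nameLength][int valueLength][name][value]
    private static final int TIMESTAMP_OFFSET = 0;
    private static final int NAME_LENGTH_OFFSET = 8;
    private static final int VALUE_LENGTH_OFFSET = 12;
    private static final int HEADER_SIZE = 16;

    private final ByteBuffer region; // stand-in for a native memory region
    private final int offset;        // where this cell starts inside the region

    private NativeCellSketch(ByteBuffer region, int offset) {
        this.region = region;
        this.offset = offset;
    }

    static int sizeOf(byte[] name, byte[] value) {
        return HEADER_SIZE + name.length + value.length;
    }

    // Serialises all three components contiguously and returns a thin handle.
    static NativeCellSketch write(ByteBuffer region, int offset, byte[] name, byte[] value, long timestamp) {
        region.putLong(offset + TIMESTAMP_OFFSET, timestamp);
        region.putInt(offset + NAME_LENGTH_OFFSET, name.length);
        region.putInt(offset + VALUE_LENGTH_OFFSET, value.length);
        ByteBuffer writer = region.duplicate();
        writer.position(offset + HEADER_SIZE);
        writer.put(name).put(value);
        return new NativeCellSketch(region, offset);
    }

    long timestamp() {
        return region.getLong(offset + TIMESTAMP_OFFSET);
    }

    byte[] name() {
        return read(offset + HEADER_SIZE, region.getInt(offset + NAME_LENGTH_OFFSET));
    }

    byte[] value() {
        int nameLength = region.getInt(offset + NAME_LENGTH_OFFSET);
        return read(offset + HEADER_SIZE + nameLength, region.getInt(offset + VALUE_LENGTH_OFFSET));
    }

    private byte[] read(int start, int length) {
        byte[] out = new byte[length];
        ByteBuffer reader = region.duplicate();
        reader.position(start);
        reader.get(out);
        return out;
    }

    public static void main(String[] args) {
        byte[] name = "col".getBytes(StandardCharsets.UTF_8);
        byte[] value = {0, 0, 0, 7};
        ByteBuffer region = ByteBuffer.allocateDirect(64);
        NativeCellSketch cell = write(region, 0, name, value, 1234L);
        System.out.println(new String(cell.name(), StandardCharsets.UTF_8) + " @ " + cell.timestamp());
    }
}
```

Compared with the classic layout, the heap side shrinks to one small handle object per cell instead of a cell object plus two buffer objects.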
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947318#comment-13947318 ]

Benedict commented on CASSANDRA-6694:
---
bq. Why do we need all the Pool and Allocator changes for that? Can't we allocate NativeCell on the old SlabAllocator?

I want to avoid the ugly situation where one allocator can make two kinds of allocation, another can only make one, and it's not clear how or why, or which kind to allocate. I've separated the types of allocators, is all, and introduced an extra abstraction, DataAllocator, which encapsulates how one or the other type of allocator can be used to construct cells/keys, and also how to clean them up.
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13944885#comment-13944885 ]

Sylvain Lebresne commented on CASSANDRA-6694:
---
bq. I see a 25% throughput improvement using offheap_objects as the allocator type vs either on/off heap buffers.

That's definitely good to know, but does that suggest that without this, there isn't much notable performance difference between on and off heap buffers? Because if that's the case, I'm still of the opinion that it could be worth moving this to 3.0 on the argument that we've moved stuff in 2.1 last minute enough as it is.
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13944889#comment-13944889 ]

Benedict commented on CASSANDRA-6694:
---
This specific patch only permits larger amounts of data to be retained in memtables; the only speed-wise performance implications of this are the ones stated here, i.e. improved write throughput through reduced write amplification and the writing of larger files.

For this workload there's basically no difference between on and off-heap (CASSANDRA-6689) ByteBuffer backed storage, if that's what you're asking, because the on-heap overhead still heavily outweighs the off-heap utilisation. This would not be true for workloads with large per-column payloads.
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13944357#comment-13944357 ]

Benedict commented on CASSANDRA-6694:
---
FTR, with a very simple and short performance comparison, simulating writing a lot of small (integer) fields, using

cassandra-stress write n=40 -col size=fixed\(4\) n=fixed\(100\)

I see a 25% throughput improvement using offheap_objects as the allocator type vs either on/off heap buffers. I should expect to see performance improve further as the length of the test increases, as write amplification takes its toll more rapidly on the heap buffers.
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13944212#comment-13944212 ]

Benedict commented on CASSANDRA-6694:
---
Patch available [here|https://github.com/belliottsmith/cassandra/tree/6694-moreoffheap]
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13922529#comment-13922529 ]

Benedict commented on CASSANDRA-6694:
---
Quick update: I realised I had accidentally included a partial image of the changes I was making for CASSANDRA-6781 in the offheap2c I uploaded. I've fixed the repository by rolling FastByteComparisons back, since that shouldn't have been included in this ticket. I've also uploaded another [tree|https://github.com/belliottsmith/cassandra/tree/offheap2c+6781] which includes 6781 for anyone who wants to performance test. I'm in the process of looking at this now.
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13917897#comment-13917897 ]

Benedict commented on CASSANDRA-6694:
---
Pushed another update, with a slight modification that improves the behaviour when deleting stale entries in KeysSearcher and CompositesSearcher
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13916722#comment-13916722 ]

Benedict commented on CASSANDRA-6694:
---
Uploaded [offheap2c|https://github.com/belliottsmith/cassandra/tree/offheap2c] which contains [~slebresne]'s changes for ByteBuffer and underscore removal (since this is actually all in code only in this patch it hopefully shouldn't be too much trouble for future merges)

-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13909343#comment-13909343 ]

Benedict commented on CASSANDRA-6694:
---
bq. My preferred solution would be, stop extracting the name so often by itself

Splitting this out into a separate ticket: CASSANDRA-6755
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13908198#comment-13908198 ]

Benedict commented on CASSANDRA-6694:
---
Pushed an updated branch [here|https://github.com/belliottsmith/cassandra/tree/offheap2a]

This is merged with latest changes in 2.1, and patches up a few oversights as well as squashes the zillions of commits (I've left the prior branch intact in case a reviewer wants to walk the history, particularly across the refactor).

[~mshuler]: For testing this, we're interested in inducing worst case behaviour by, e.g., getting to a steady state where the memtables are full (might want to jack memtable_cleanup_threshold to near 1, so we can push it further. Also, that makes me realise we should assert it is 1 at startup :-) ), but using a heavily clustered schema; say at least 3 clustering columns, all small, e.g. ints. Use a small value column as well; maybe a double. Then pack a LOT of CQL rows into a single partition. Say 1M+. Then we want to see what performance is like querying from this. I expect it to be worse, but the question is how much worse, and can we tolerate it?

Probably want to run variants of this with a mixed workload as well, but I expect any difference will be lost in the noise of the current garbage heavy write layer.

Make sure you run this test connecting through native protocol, and for comparison test against the same version but with memtable_allocator set to heap_slab. Make sure you tune memtable thresholds so they both retain the same amount of data :-)
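One way to picture the worst-case workload described in the comment above is the small generator below. It only builds CQL strings and does no I/O. The keyspace and table name (`ks.wide`), the column names, and the choice of 100 values per clustering column are all hypothetical; the comment itself only asks for at least three small clustering columns, a small value column such as a double, and 1M+ rows in a single partition.

```java
final class WidePartitionWorkloadSketch {
    // Hypothetical choice: 100 values per clustering column gives 100^3 rows.
    static final int VALUES_PER_CLUSTERING_COLUMN = 100;

    static String createTable() {
        return "CREATE TABLE ks.wide (pk int, c1 int, c2 int, c3 int, v double, "
             + "PRIMARY KEY (pk, c1, c2, c3))";
    }

    static long rowsPerPartition() {
        long n = VALUES_PER_CLUSTERING_COLUMN;
        return n * n * n;
    }

    // Maps a row index onto the three clustering values, most significant first.
    static int[] clusteringFor(long rowIndex) {
        if (rowIndex < 0 || rowIndex >= rowsPerPartition())
            throw new IllegalArgumentException("row index out of range: " + rowIndex);
        long n = VALUES_PER_CLUSTERING_COLUMN;
        return new int[] {
            (int) (rowIndex / (n * n)),
            (int) ((rowIndex / n) % n),
            (int) (rowIndex % n)
        };
    }

    // Every row goes to the same partition (pk = 0) to build one huge partition.
    static String insertFor(long rowIndex) {
        int[] c = clusteringFor(rowIndex);
        return "INSERT INTO ks.wide (pk, c1, c2, c3, v) VALUES (0, "
             + c[0] + ", " + c[1] + ", " + c[2] + ", " + (rowIndex * 0.5) + ")";
    }

    public static void main(String[] args) {
        System.out.println(createTable());
        System.out.println(insertFor(0));
        System.out.println(insertFor(rowsPerPartition() - 1));
        System.out.println(rowsPerPartition() + " rows in one partition");
    }
}
```

Running the same statements once per allocator setting, with memtable thresholds tuned so both retain the same amount of data, would match the comparison the comment asks for.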
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13908336#comment-13908336 ]

Sylvain Lebresne commented on CASSANDRA-6694:
---
I know I'll sound like a killjoy, but quickly skimming over the patch, this is a big patch that moves a lot of code, and more importantly to me, seems to add a lot of clutter in places that a priori don't seem all that much related (everywhere in the CQL3 code typically). I'm not saying that such changes aren't necessary, nor really criticizing the approach (I certainly haven't looked at the patch closely enough for that), just remarking that this is a lot of changes, and, it seems to me, a non-negligible amount of added clutter.

With that in mind, I don't necessarily have a problem with this change in theory if such a change has a demonstrable non-negligible positive impact on performance, but I do have 2 small worries:

# I would want to see tests that demonstrate the non-negligible positive impact on performance before committing such a change, not *just* tests that show this doesn't degrade performance. This might be the intent of everyone and I might just be stating the obvious, and my apologies for the distraction if that's the case, but the comments above seem to mainly mention the "check it doesn't degrade things" part, and so I just want to make sure we're on the same page. Are we?
# This is marked for 2.1-beta2. Is it really reasonable to target big (and rather intrusive) changes like that for 2.1 at this point? Does that really give us the proper time to review such changes and evaluate the gains it gives us before committing it?
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13908521#comment-13908521 ]

Jonathan Ellis commented on CASSANDRA-6694:
---

bq. The basic idea is that we have made Cell and DecoratedKey both interfaces, and we have buffer and native implementations. The native implementations squash the implementation of CellName into the same object, so that we can avoid any allocation overhead, and so we don't need to allocate a new object every time we read the name.

With you so far.

bq. As a result we have had to go a little anti-OOP; DecoratedKey and *Cell are now interfaces, with static implementation modules, the methods of which are invoked by each implementation with themselves as the first parameter.

Not sure I follow; I only see static methods sizeOf and construct in NativeCell, for instance.

bq. without CASSANDRA-6697 we allocate a lot of ByteBuffers temporarily, i.e. whenever we read the constituents of the name or the contents of the cell

My preferred solution would be to stop extracting the name by itself so often. Spot checking the code, it seems we usually do this just to simplify a comparison, so in principle the comparison could be done with the Cell object rather than just the name.
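The suggestion of comparing through the Cell object rather than an extracted name can be sketched roughly as follows. Everything here is hypothetical: the NameSource interface, its two implementations, and the plain unsigned byte-wise ordering are assumptions for illustration, and a direct ByteBuffer stands in for native memory. The point is only that the comparison reads bytes in place and allocates no temporary name object.

```java
import java.nio.ByteBuffer;

// Hypothetical view of a cell that exposes its name bytes in place.
interface NameSource {
    int nameLength();
    byte nameByte(int index);
}

// Name stored on-heap in a byte array.
final class HeapNamedCell implements NameSource {
    private final byte[] name;
    HeapNamedCell(byte[] name) { this.name = name.clone(); }
    public int nameLength() { return name.length; }
    public byte nameByte(int index) { return name[index]; }
}

// Name stored in a direct buffer (a stand-in for native memory).
final class DirectNamedCell implements NameSource {
    private final ByteBuffer memory;
    DirectNamedCell(byte[] name) {
        memory = ByteBuffer.allocateDirect(name.length);
        memory.put(name); // absolute reads below do not depend on the position
    }
    public int nameLength() { return memory.capacity(); }
    public byte nameByte(int index) { return memory.get(index); }
}

final class CellNameComparison {
    private CellNameComparison() {}

    // Unsigned lexicographic comparison of two names, with no allocation.
    static int compare(NameSource a, NameSource b) {
        int common = Math.min(a.nameLength(), b.nameLength());
        for (int i = 0; i < common; i++) {
            int cmp = Integer.compare(a.nameByte(i) & 0xFF, b.nameByte(i) & 0xFF);
            if (cmp != 0)
                return cmp;
        }
        // A name that is a strict prefix of the other sorts first.
        return Integer.compare(a.nameLength(), b.nameLength());
    }
}
```

A real comparator would also have to respect the column types of composite names; this sketch ignores that.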
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13908524#comment-13908524 ]

Jonathan Ellis commented on CASSANDRA-6694:
---

NB: While I'm all for simplifying the RefAction impact if possible, I see it as similar to the Allocator we got used to passing to addColumn for the benefit of counter contexts, so it doesn't really bother me a whole lot [more] in principle.
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13908537#comment-13908537 ]

Benedict commented on CASSANDRA-6694:
---

bq. Not sure I follow, I only see static methods sizeOf and construct in NativeCell for instance.

These are in a nested class called Impl within each *Cell interface, so it is shared between Buffer*Cell and Native*Cell.

bq. My preferred solution would be, stop extracting the name so often by itself. Spot checking the code, it seems we usually do this just to simplify a comparison, so this could in principle just be done with the Cell object rather than just the name.

I'll have a closer look and see how easy this would be. As it happens, I've made (but not yet published) some changes to FastByteComparisons that might be extensible to make this work without object instantiation. If we eliminated allocation for name comparisons we would probably get _most_ of the benefit, so this might be workable.
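The "Impl within each *Cell interface" arrangement can be pictured with a minimal sketch like the one below. It is not the patch's code: SketchCell, its two implementations, and the methods shown are hypothetical, and a direct ByteBuffer stands in for native memory. What it does show is the pattern being discussed: shared logic lives in static methods of a nested class, and each implementation calls them with itself as the first parameter.

```java
import java.nio.ByteBuffer;

// Hypothetical, simplified stand-in for a *Cell interface.
interface SketchCell {
    ByteBuffer value();
    long timestamp();
    int dataSize();
    boolean isNewerThan(SketchCell other);

    // Static implementation module shared by all implementations.
    final class Impl {
        private Impl() {}

        static int dataSize(SketchCell me) {
            return me.value().remaining() + Long.BYTES;
        }

        static boolean isNewerThan(SketchCell me, SketchCell other) {
            return me.timestamp() > other.timestamp();
        }
    }
}

// Heap-backed implementation.
final class BufferSketchCell implements SketchCell {
    private final ByteBuffer value;
    private final long timestamp;

    BufferSketchCell(ByteBuffer value, long timestamp) {
        this.value = value.duplicate();
        this.timestamp = timestamp;
    }

    public ByteBuffer value() { return value.duplicate(); }
    public long timestamp() { return timestamp; }
    public int dataSize() { return Impl.dataSize(this); }
    public boolean isNewerThan(SketchCell other) { return Impl.isNewerThan(this, other); }
}

// "Native" implementation: one contiguous region holding timestamp then value.
final class NativeSketchCell implements SketchCell {
    private final ByteBuffer memory; // layout: 8-byte timestamp, then value bytes

    NativeSketchCell(ByteBuffer value, long timestamp) {
        memory = ByteBuffer.allocateDirect(Long.BYTES + value.remaining());
        memory.putLong(0, timestamp);
        ByteBuffer dst = memory.duplicate();
        dst.position(Long.BYTES);
        dst.put(value.duplicate());
    }

    public ByteBuffer value() {
        ByteBuffer view = memory.duplicate();
        view.position(Long.BYTES);
        return view.slice();
    }

    public long timestamp() { return memory.getLong(0); }
    public int dataSize() { return Impl.dataSize(this); }
    public boolean isNewerThan(SketchCell other) { return Impl.isNewerThan(this, other); }
}
```

Note that value() on the native variant still creates a temporary ByteBuffer view on every call, which is the kind of short-lived garbage the thread is concerned about.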
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13908605#comment-13908605 ]

Benedict commented on CASSANDRA-6694:
---

So, at the very least we use it on each read when we return the clustering columns. We also do this for the value itself; also in KeySearcher and CompositeSearcher whilst filtering; and DecoratedKey.key() is called in a _lot_ of places.

This _may_ also make the comparison code a little unpleasant (or maybe not, but we have a few different kinds of comparison to deal with, and we need to make sure there's minimal penalty)... but I think this is probably still the right way forward. It won't eliminate as much garbage, but it should leave garbage at worst linear in the number of columns and independent of the _size_ of columns, and mostly short lived. That is probably good enough, and it is definitely more sane in scope.
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904708#comment-13904708 ]

Benedict commented on CASSANDRA-6694:
---

Initial patch for this is available [here|https://github.com/belliottsmith/cassandra/tree/offheap2]

The basic idea is that we have made Cell and DecoratedKey both interfaces, and we have buffer and native implementations. The native implementations squash the implementation of CellName into the same object, so that we can avoid any allocation overhead, and so we don't need to allocate a new object every time we read the name.

As a result we have had to go a little anti-OOP; DecoratedKey and *Cell are now interfaces, with static implementation modules, the methods of which are invoked by each implementation with themselves as the first parameter. This isn't super pretty, but it isn't super ugly either. The ugliest thing here is that I flatten the logic from db.composites all into NativeCell, but it turns out this is actually really not very hard; they behave _mostly_ the same.

I've also quite widely refactored the stuff introduced in CASSANDRA-5549 and CASSANDRA-6689: the PoolAllocator in utils.memory now only defines methods for managing the memory use of the pool; what it *means* to allocate is now left to its descendants to define. We now split them up into two camps: ByteBufferPool and NativePool (renamed from OffHeapPool). The former's allocators implement ByteBufferAllocator (formerly AbstractAllocator), whereas the NativeAllocator allocates NativeAllocations. With me still? These NativeAllocations form the basis for any objects stored off-heap.

Anyway, these PoolAllocators are now utilised by *Data*Allocators in the db package tree; these are comparatively simple, and I wanted to keep the guts of the memory management in utils.memory. These DataAllocator instances simply know how to clone DecoratedKey and Cell instances, and also how to tidy up any unused references.

Some notes:
- This (and CASSANDRA-6689) have negative implications for Thrift at the moment, as I have to copy any data on-heap in order to return it to thrift. Unfortunately this can only easily be rectified by modifying thrift so that we have method calls we can override in the worker tasks, that are invoked when starting and finishing the servicing of a request.
- I've settled for a 24-byte object, as I really needed to keep some extra information on-heap. We can definitely tighten this in the future, but I think it probably isn't worth doing at this stage.
- As things stand, without CASSANDRA-6697 we allocate a lot of ByteBuffers temporarily, i.e. whenever we read the constituents of the name or the contents of the cell.

It would be good to get some testing resources allocated to the first and last points to see if we should be trying to fix them. We should decide if we want CASSANDRA-6697, preferably before we go live with a final 2.1 release. What we really need to do is run a number of tests against schemas with a lot of composite columns, and see what effect there is on garbage collections and latency metrics.

Slightly More Off-Heap Memtables Key: CASSANDRA-6694 URL: https://issues.apache.org/jira/browse/CASSANDRA-6694 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Assignee: Benedict Labels: performance Fix For: 2.1
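The split described in this comment, where the base allocator only accounts for pool memory and subclasses decide what allocation means, might look roughly like the sketch below. All class names and signatures are hypothetical and much simpler than the real utils.memory classes; a direct ByteBuffer stands in for a raw native allocation, and reclamation is omitted.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical base class: tracks memory use against a limit and nothing else.
abstract class SketchPoolAllocator {
    private final long limit;
    private final AtomicLong used = new AtomicLong();

    SketchPoolAllocator(long limit) { this.limit = limit; }

    // Reserve size bytes if the pool has room; lock-free via compare-and-set.
    protected final boolean tryAcquire(long size) {
        while (true) {
            long current = used.get();
            long next = current + size;
            if (next > limit)
                return false;
            if (used.compareAndSet(current, next))
                return true;
        }
    }

    final long used() { return used.get(); }
}

// One camp: allocation means handing out heap ByteBuffers.
final class SketchByteBufferAllocator extends SketchPoolAllocator {
    SketchByteBufferAllocator(long limit) { super(limit); }

    ByteBuffer allocate(int size) {
        if (!tryAcquire(size))
            throw new IllegalStateException("pool exhausted");
        return ByteBuffer.allocate(size);
    }
}

// Other camp: allocation means handing out an off-heap region.
final class SketchNativeAllocator extends SketchPoolAllocator {
    // Stand-in for a native allocation; a real one would hold a raw address.
    static final class NativeAllocation {
        final ByteBuffer memory;
        NativeAllocation(ByteBuffer memory) { this.memory = memory; }
    }

    SketchNativeAllocator(long limit) { super(limit); }

    NativeAllocation allocate(int size) {
        if (!tryAcquire(size))
            throw new IllegalStateException("pool exhausted");
        return new NativeAllocation(ByteBuffer.allocateDirect(size));
    }
}
```

Keeping the accounting in the base class means both camps share one notion of "how full is the pool", while the return types stay specific to each camp.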
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904728#comment-13904728 ]

Jonathan Ellis commented on CASSANDRA-6694:
---

I bet [~mshuler] could help with the testing.