[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964564#comment-13964564 ]
Pavel Yaskevich commented on CASSANDRA-6694:
--------------------------------------------

bq. The JVM instruction set is besides the point. The point is what hotspot will do: with a single implementor or static method of small enough bytecode representation, it will be inlined. Note I said "multiple implementation" virtual method. With the option you suggest we will need an extra virtual invocation cost with every access to the underlying bytes, some extra math to access the right location, and one extra object field reference to locate the position we're offsetting from. These costs mount up rapidly.

How is that beside the point, when you claim that virtual methods with multiple implementations are slower (and not inlined) compared to static method invocations from multiple classes, which is basically a constant-pool reimplementation in your code? What I claim is that it doesn't matter whether you override a method multiple times or call a static method which calls another static method, as your patch does for DeletedCell: e.g. \{Native, Buffer\}DeletedCell.cellDataSize() calls DeletedCell.Impl.cellDataSize(this), which in turn transfers to Cell.Impl.cellDataSize(this). As an exercise, just disassemble the classes (with javap -c or similar) and see what bytecode was generated. As for the inlining problem, I would like to see proof of why those methods are not getting inlined (are they even touched by the JIT?) by enabling logging with -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompilation -XX:+PrintInlining and sharing the output; otherwise the claim that a "multiple implementation" virtual method is slow is just empty rhetoric.

bq. Hmm. No, I now note your "client" implementation: what exactly is this one? Please clarify, as the thrift cell is going to need to be compared with the other implementations, and suddenly much of any benefit will disappear. The best way to make comparisons cheap and easy is to have both sides of the comparison have at least the same layout.
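The two dispatch styles being debated can be sketched as below. Note this is an illustrative reconstruction, not the actual Cassandra classes: the names mirror the patch's \{Native, Buffer\}DeletedCell -> DeletedCell.Impl -> Cell.Impl chain but are simplified, and the byte counts are placeholders.

```java
// Sketch of the two dispatch styles under discussion; names and sizes are
// illustrative, not the actual Cassandra 2.1 code.
public class DispatchSketch {
    // Style 1: a "multiple implementation" virtual method -- the call site
    // Cell.cellDataSize() has more than one concrete implementor.
    interface Cell { int cellDataSize(); }
    static final class BufferCell implements Cell {
        public int cellDataSize() { return 16; }
    }
    static final class NativeCell implements Cell {
        public int cellDataSize() { return 24; }
    }

    // Style 2: the patch's pattern -- thin per-class entry points forwarding
    // through static Impl helpers, which still branch on the concrete type:
    // {Native,Buffer}DeletedCell.cellDataSize()
    //   -> DeletedCell.Impl.cellDataSize(this)
    //   -> Cell.Impl.cellDataSize(this)
    static final class CellImpl {
        static int cellDataSize(Cell c) {
            return (c instanceof NativeCell) ? 24 : 16;
        }
    }
    static final class DeletedCellImpl {
        static int cellDataSize(Cell c) { return CellImpl.cellDataSize(c); }
    }

    public static void main(String[] args) {
        Cell cell = new NativeCell();
        System.out.println(cell.cellDataSize());                 // prints 24 (virtual dispatch)
        System.out.println(DeletedCellImpl.cellDataSize(cell));  // prints 24 (static chain)
        // Compare the generated bytecode with: javap -c DispatchSketch
        // Observe JIT inlining decisions by running with:
        //   -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompilation -XX:+PrintInlining
    }
}
```

Disassembling the compiled classes and running with the diagnostic flags above is how one would verify which of these call chains HotSpot actually inlines, which is the evidence being requested here.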
If we have to either virtual invoke or instanceof check for every comparison, and a different code path for comparing each type of representation, there will be a performance impact. As such the only main benefit of this approach is eliminated in my eyes. Also, how will this "client" implementation achieve its various functions, and define its type? Seems like you'll need a duplicate hierarchy still.

That was just a suggestion for a temporary container between the client transport and the memtable. As those buffers are already allocated separately by thrift, it seems reasonable to have Cell work with them; it would take more memory for the ByteBuffer containers passed from Thrift, but the cell comparison logic should not change, because both sides would operate on the common container type. It's a similar concept to what Netty does with a ByteBuf gathered from other ByteBuf pieces.

> Slightly More Off-Heap Memtables
> --------------------------------
>
>                 Key: CASSANDRA-6694
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6694
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>              Labels: performance
>             Fix For: 2.1 beta2
>
>
> The Off Heap memtables introduced in CASSANDRA-6689 don't go far enough, as
> the on-heap overhead is still very large. It should not be tremendously
> difficult to extend these changes so that we allocate entire Cells off-heap,
> instead of multiple BBs per Cell (with all their associated overhead).
> The goal (if possible) is to reach an overhead of 16 bytes per Cell (plus 4-6
> bytes per cell on average for the btree overhead, for a total overhead of
> around 20-22 bytes).
> This translates to 8-byte object overhead, 4-byte
> address (we will do alignment tricks like the VM to allow us to address a
> reasonably large memory space, although this trick is unlikely to last us
> forever, at which point we will have to bite the bullet and accept a 24-byte
> per cell overhead), and 4-byte object reference for maintaining our internal
> list of allocations, which is unfortunately necessary since we cannot safely
> (and cheaply) walk the object graph we allocate otherwise, which is necessary
> for (allocation-) compaction and pointer rewriting.
> The ugliest thing here is going to be implementing the various CellName
> instances so that they may be backed by native memory OR heap memory.

--
This message was sent by Atlassian JIRA
(v6.2#6252)