[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables

Benedict (JIRA) Thu, 17 Apr 2014 01:56:34 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972524#comment-13972524
 ]


Benedict commented on CASSANDRA-6694:
-------------------------------------

So, on the whole I really don't perceive this approach as better: there's a 
great deal of code duplication now (set to get worse still when you finish the 
refactor for DecoratedKey), between each of the correspondingly named cell 
implementations. Personally I think the Impl approach is neater as a result of 
avoiding that (this may be more pronounced if we decide to optimise equals() is 
you suggested). That said, if this moves us forwards I can live with it, if you 
can address point 1 below.

There are a few problems though:

# I am *very* opposed to a public setPeer() method. This is a deal breaker for 
me - but it can be avoided with a bit more refactoring.
# Your optimised updateDigest function is actually much slower than the old 
implementation for all but the smallest values: an optimised version needs to 
batch the contents into an array (stored in a ThreadLocal) and call 
updateDigest with the array, unless the total size is very small (there's a 
crossover point on my laptop of about 12 bytes, under which it's faster to call 
update(byte)).
# AbstractNativeCell.getBytes actually calls setBytes
# excessHeapSize... should be unsharedHeapSize...
# There should be no hashCode method in Buffer\*Cell - I removed these for a 
reason. Because we can have a Cell that is a CellName, and vice-versa, using a 
Cell as a key for a map is likely dangerous. Since we don't do it anywhere, 
it's safe to simply remove the methods.

There may be other minor issues, I'll hold off giving it a formal review until 
we decide the direction we're going. To respond to a few of your comments:

bq. CounterUpdateCell interface is missing as well as NativeCounterUpdateCell 
implementation to match it.

There shouldn't be one for the time being - we can never construct one.

bq. CounterUpdateCell should be BufferCounterUpdateCell as it extends BufferCell

Same reason - it doesn't exist as either or, so I made a conscious decision to 
leave it as a CounterUpdateCell: the fact that it extends BufferCell is kind of 
unimportant. It's purpose is somewhat different, and I think it is better left 
named CounterUpdateCell, as that is its purpose (to carry a counter update as 
far as the memtable, and no further).

bq. Impl classes extends another Impl classes which doesn't make much sense as 
all of the methods are static.

This brings in the namespace of the extended class' static methods, which is 
useful.

bq. When taken out of context like that it doesn't really make sense but what I 
meant, there are situation where we don't really need to get BB from the 
CellName but can transfer bytes directly (especially for the native cell 
implementations).

Sure, but again: scope of ticket, and care needs to be taken when doing this 
(e.g. your updateDigest modifications)

> Slightly More Off-Heap Memtables
> --------------------------------
>
>                 Key: CASSANDRA-6694
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6694
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>              Labels: performance
>             Fix For: 2.1 beta2
>
>
> The Off Heap memtables introduced in CASSANDRA-6689 don't go far enough, as 
> the on-heap overhead is still very large. It should not be tremendously 
> difficult to extend these changes so that we allocate entire Cells off-heap, 
> instead of multiple BBs per Cell (with all their associated overhead).
> The goal (if possible) is to reach an overhead of 16-bytes per Cell (plus 4-6 
> bytes per cell on average for the btree overhead, for a total overhead of 
> around 20-22 bytes). This translates to 8-byte object overhead, 4-byte 
> address (we will do alignment tricks like the VM to allow us to address a 
> reasonably large memory space, although this trick is unlikely to last us 
> forever, at which point we will have to bite the bullet and accept a 24-byte 
> per cell overhead), and 4-byte object reference for maintaining our internal 
> list of allocations, which is unfortunately necessary since we cannot safely 
> (and cheaply) walk the object graph we allocate otherwise, which is necessary 
> for (allocation-) compaction and pointer rewriting.
> The ugliest thing here is going to be implementing the various CellName 
> instances so that they may be backed by native memory OR heap memory.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables

Reply via email to