[ 
https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-2252:
--------------------------------------

    Attachment: 2252-v3.txt

Rebased most of Stu's latest.  Changed getLiveSize to only add in waste from 
the allocator instead of double-counting the rest.  Enabled 
MemoryMeter.omitSharedBufferOverhead, which is super untested.

CFS.getColumnFamily was getting passed an allocator, but the parameter doesn't 
actually do anything.  (I removed it.)  Was this supposed to be used during 
counter reconcile somehow?

Passing allocator throughout the CF+SC+[Super|Counter|Deleted|Expiring]Column 
hierarchy is ugly and error-prone.  (I found and fixed one error while 
rebasing, where a method taking an allocator parameter called the default 
addColumn, instead of the addColumn-with-allocator.)  Perhaps moving allocator 
to AbstractColumnContainer could fix this?
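
A minimal sketch of that suggestion, not code from the patch: every name 
below (Allocator, CountingAllocator, ColumnContainerSketch) is hypothetical. 
It assumes the container is handed its allocator once at construction, so 
there is a single addColumn and no allocator-less overload to call by mistake.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the allocator discussed above.
interface Allocator {
    ByteBuffer copy(ByteBuffer src);
}

// Copies into fresh heap buffers and counts calls so the behaviour can be checked.
class CountingAllocator implements Allocator {
    int calls = 0;

    @Override
    public ByteBuffer copy(ByteBuffer src) {
        calls++;
        ByteBuffer dst = ByteBuffer.allocate(src.remaining());
        dst.put(src.duplicate());   // duplicate() leaves the caller's position untouched
        dst.flip();
        return dst;
    }
}

// Hypothetical container: the allocator is fixed at construction time, so
// callers cannot pick a code path that bypasses it.
class ColumnContainerSketch {
    private final Allocator allocator;
    private final List<ByteBuffer> columnNames = new ArrayList<>();

    ColumnContainerSketch(Allocator allocator) {
        this.allocator = allocator;
    }

    // The only way to add a column; it always goes through the allocator.
    void addColumn(ByteBuffer name) {
        columnNames.add(allocator.copy(name));
    }

    int size() {
        return columnNames.size();
    }
}
```

With this shape the allocator no longer appears in method signatures below 
the container, which is the class of mistake described above.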

Not thrilled with the current alternatives for moving slabs off-heap.  Our 
options are to 
- use allocateDirect with all the problems that relying on finalization brings 
(see: CASSANDRA-2521), as well as requiring users to manually tune the JVM 
direct buffer ceiling (or face a flood of System.gc() calls courtesy of 
allocateDirect when the ceiling is reached).
- use JNA + manual free, which will require doing reference counting for 
memtables the way we do for sstables post-CASSANDRA-2521.  Otherwise if a 
thread that had the memtable in its list of historical memtables to merge from 
tries to read, you segfault.  (This is NOT the same as the JNA 179 segfaults, 
which are fixed in 3.3.0.)
- stick with on-heap slabs
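
The reference counting that the JNA option would need can be sketched as 
follows.  This is an illustration under assumptions, not the CASSANDRA-2521 
implementation: RefCountedRegion and its free callback are hypothetical, and 
the callback stands in for whatever native free the real code would perform.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical wrapper around a manually freed memory region.
class RefCountedRegion {
    // Starts at 1: the owner's reference, dropped when the memtable is retired.
    private final AtomicInteger refs = new AtomicInteger(1);
    // Assumed callback that performs the actual free.
    private final Runnable free;

    RefCountedRegion(Runnable free) {
        this.free = free;
    }

    // Returns false if the region has already been freed; a reader must then
    // skip this memtable instead of touching its memory.
    boolean acquire() {
        while (true) {
            int n = refs.get();
            if (n <= 0) return false;
            if (refs.compareAndSet(n, n + 1)) return true;
        }
    }

    // Runs the free callback exactly once, when the last reference is dropped.
    void release() {
        if (refs.decrementAndGet() == 0) free.run();
    }
}
```

A reader would call acquire() before reading and release() in a finally 
block; the flush path calls release() once to drop the owner's reference.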

I'd say off-heap slabs don't matter that much, but they would make the 
promotion failure problems you saw go away completely.

I'm also not a big fan of slabbing everything in sight.  Slabbing keys 
associated with memtables makes sense (and is done in my rebase).  Row keys 
and column names during sstable build, I'm skeptical of -- if your rows are 
small enough that they finish before new -> old promotion, then it doesn't 
matter.  And if they are so large that they do not, then your rate of key 
allocation is glacial and again it shouldn't matter.  But if we WERE to slab 
these, the right way to do it would be per-sstable, not per-IndexSummary.

There is no logical unit of slabbing for the key cache; we shouldn't be doing 
that at all.

I have an alternative idea to reduce non-memtable fragmentation: adding region 
recycling post-flush.  Once a slab has been promoted to old gen, it stays 
there and is reused, instead of being GC'd and replaced with a slab in new gen 
again.

(This would also mitigate the main downside of allocateDirect.)

We'd still probably want some kind of delayed release of slabs so write load 
spikes don't permanently chew up your entire heap.
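
A rough sketch of region recycling with a bound on retained slabs.  SlabPool 
and its parameters are hypothetical, and a fixed cap (maxRetained) is used 
here as a simpler stand-in for the delayed release mentioned above.  It also 
assumes a slab is only recycled once no reader can still reference it.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

// Hypothetical pool: flushed slabs are kept for reuse, up to maxRetained.
class SlabPool {
    private final int slabSize;
    private final int maxRetained;
    private final ArrayDeque<ByteBuffer> free = new ArrayDeque<>();

    SlabPool(int slabSize, int maxRetained) {
        this.slabSize = slabSize;
        this.maxRetained = maxRetained;
    }

    // Reuses a recycled slab when one is available, otherwise allocates a new one.
    synchronized ByteBuffer take() {
        ByteBuffer slab = free.poll();
        if (slab == null) return ByteBuffer.allocate(slabSize);
        slab.clear();   // reset position and limit; old contents get overwritten
        return slab;
    }

    // Called after a flush. Returns false when the pool is full, in which case
    // the slab is dropped and left to the garbage collector.
    synchronized boolean recycle(ByteBuffer slab) {
        if (free.size() >= maxRetained) return false;
        free.push(slab);
        return true;
    }

    synchronized int retained() {
        return free.size();
    }
}
```

The cap keeps a write spike from pinning the whole heap; a time-based release 
would replace the size check in recycle().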


> off-heap memtables
> ------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 1.0
>
>         Attachments: 0001-add-MemtableAllocator.txt, 
> 0002-add-off-heap-MemtableAllocator-support.txt, 2252-v3.txt, merged-2252.tgz
>
>   Original Estimate: 0.4h
>  Remaining Estimate: 0.4h
>
> The memtable design practically actively fights Java's GC design.  Todd 
> Lipcon gave a good explanation over on HBASE-3455.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
