[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920720#comment-13920720
 ] 

Benedict commented on CASSANDRA-6689:
-------------------------------------

bq.  sort of RCU (i'm looking at you OpOrder)

What do you mean here? If you mean read-copy-update, OpOrder is nothing like 
this.

bq. I'm not sure what is to retain here if we do that copy when we send to the 
wire

Ultimately, I would like to avoid doing this copying before sending to the 
wire. In my fairly rough-and-ready benchmarks, using RefAction.allocateOnHeap() 
on top of this copying drops wire transfer speeds for thrift by about 10%, so 
copying clearly has a cost. Possibly this cost is due to unavoidably copying 
data you don't necessarily want to serialise, but it seems to be there. If we 
want to get in-memory read operations to 10x their current performance, we 
can't go cutting any corners.
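
To make the distinction concrete, here is a minimal sketch of the two paths. It is not code from the patch on this ticket: the class CopyVsView and both method names are hypothetical, and it only uses plain java.nio. One path copies an off-heap record into a fresh on-heap array before it is handed on; the other hands out a read-only view over the same memory.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class CopyVsView {

    // Copy path: one new on-heap array plus one bulk copy per record.
    static byte[] copyToHeap(ByteBuffer src) {
        ByteBuffer dup = src.duplicate();       // own position/limit, same memory
        byte[] out = new byte[dup.remaining()];
        dup.get(out);                           // bulk copy into the heap array
        return out;
    }

    // View path: no bytes are copied, but the caller must guarantee the
    // underlying memory is not reclaimed while the view is in use.
    static ByteBuffer view(ByteBuffer src) {
        return src.asReadOnlyBuffer();
    }

    public static void main(String[] args) {
        // Stand-in for a record held off-heap (assumption for illustration).
        ByteBuffer offHeap = ByteBuffer.allocateDirect(16);
        offHeap.put("record".getBytes(StandardCharsets.UTF_8));
        offHeap.flip();

        byte[] copy = copyToHeap(offHeap);
        ByteBuffer shared = view(offHeap);
        System.out.println(copy.length + " bytes copied, "
                + shared.remaining() + " bytes visible through the view");
    }
}
```

The view path is only safe if something guarantees the memory outlives the reader, which is the lifetime question the rest of this comment is about.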

bq. introducing separate gc

I've stated clearly what this introduces as a benefit: overwrite workloads no 
longer cause excessive flushes.

bq.  things but as we have a fixed number of threads it is going to work out 
the same way as for buffering open files in the steady system state

Your next sentence states how this is a large cause of memory consumption, so 
surely we should be using that memory if possible for other uses (returning it 
to the buffer cache, or using it internally for more caching)?

bq. Temporary memory allocated by readers is exactly what we should be managing 
at the first place because they allocate the most and it always the biggest 
concern for us

I agree we should be moving to managing this as well, but I disagree about 
how we should be managing it. In the medium term we should be bringing the 
buffer cache in process, so that we can answer some queries without handing off 
to the mutation stage (anything known to be non-blocking and fast should be 
answered immediately by the thread that processed the connection). At that 
point we will benefit from shared use of the memory pool, concrete control 
over how much memory readers are using, and zero-copy reads from the buffer 
cache. I hope we may be able to do this for 3.0.

bq. do a simple memcpy test and see how much mb/s can you get from copying from 
one pre-allocated pool to another

Are you performing a full object tree copy, and doing this with a running 
system to see how it affects the performance of other system components? If 
not, it doesn't seem to be a useful comparison. Note that this will still 
create a tremendous amount of heap churn, as most of the memory used by objects 
right now is on-heap. So copying the records is almost certainly no better for 
young gen pressure than what we currently do - in fact, *it probably makes the 
situation worse*.

bq. it's not the memtable which creates the most of the noise and memory 
presure in the system (even tho it uses big chunk of heap) 

It may not be causing the young gen pressure you're seeing, but it certainly 
offers some benefit here: keeping more rows in memory means recent queries are 
more likely to be answered with zero allocation, which reduces young gen 
pressure. It is also a foundation for improving the row cache and introducing a 
shared page cache which could bring us closer to zero allocation reads.

It's also not clear to me how you would manage the reclaim of the off-heap 
allocations without OpOrder. Do you mean to use off-heap buffers only for 
readers, or to ref-count any memory as you're reading it? Not using off-heap 
memory for the memtables would negate the main original point of this ticket: 
to support larger memtables, thus reducing write amplification. Ref-counting 
incurs overhead linear in the size of the result set, much like copying, and is 
also fiddly to get right (I'm not convinced it's cleaner or neater), whereas 
OpOrder incurs overhead proportional to the number of times you reclaim. So if 
you're using OpOrder, all you're really talking about is a new RefAction: 
copyToAllocator() or something. So it doesn't notably reduce complexity, it 
just reduces the quality of the end result.
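
As a rough illustration of that cost difference, here is a deliberately simplified sketch. It is NOT the OpOrder class from the patch: ReclaimCostSketch, RefCounted, Group and SimpleOrder are hypothetical names, and it assumes a single reclaiming thread. With per-record ref-counting a reader pays two atomic updates for every record it touches; with operation grouping a reader pays two per operation, and the remaining work falls on the reclaimer, once per reclaim.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ReclaimCostSketch {

    /** Per-record count: readers retain and release every record they touch. */
    static final class RefCounted {
        private final AtomicInteger refs = new AtomicInteger(1); // owner's reference

        boolean retain() {
            while (true) {
                int n = refs.get();
                if (n == 0) return false;                  // already freed
                if (refs.compareAndSet(n, n + 1)) return true;
            }
        }

        /** True when the caller dropped the last reference and may free. */
        boolean release() {
            return refs.decrementAndGet() == 0;
        }
    }

    /** A set of in-flight operations that started before some barrier. */
    static final class Group {
        final AtomicInteger running = new AtomicInteger();

        void close() {
            running.decrementAndGet();
        }
    }

    static final class SimpleOrder {
        private volatile Group current = new Group();

        /** Called once per read operation, however many records it touches. */
        Group start() {
            while (true) {
                Group g = current;
                g.running.incrementAndGet();
                if (g == current) return g;
                g.running.decrementAndGet();   // barrier issued meanwhile: retry
            }
        }

        /**
         * Called once per reclaim, after the memory has been unlinked so that
         * new operations cannot reach it. Later operations join a fresh group;
         * the returned group is the one the reclaimer must wait on.
         */
        Group issueBarrier() {
            Group old = current;
            current = new Group();
            return old;
        }
    }

    /** The unlinked memory may be freed once the old group is quiescent. */
    static boolean quiescent(Group g) {
        return g.running.get() == 0;
    }

    public static void main(String[] args) {
        int records = 1000;

        // Ref-counting: bookkeeping grows with the size of the result set.
        int refCountUpdates = 0;
        for (int i = 0; i < records; i++) {
            RefCounted r = new RefCounted();
            r.retain();
            refCountUpdates++;
            r.release();
            refCountUpdates++;
        }

        // Grouping: one start and one close, regardless of records read.
        SimpleOrder order = new SimpleOrder();
        Group op = order.start();
        Group old = order.issueBarrier();      // reclaimer wants to free memory
        boolean safeBefore = quiescent(old);   // reader still in flight
        op.close();
        boolean safeAfter = quiescent(old);    // reader finished

        System.out.println("ref-count updates: " + refCountUpdates);
        System.out.println("safe to free before close: " + safeBefore
                + ", after close: " + safeAfter);
    }
}
```

A real implementation would also need to order successive barriers and let the reclaimer block rather than poll; the sketch leaves that out to keep the cost comparison visible.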


> Partially Off Heap Memtables
> ----------------------------
>
>                 Key: CASSANDRA-6689
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>             Fix For: 2.1 beta2
>
>         Attachments: CASSANDRA-6689-small-changes.patch
>
>
> Move the contents of ByteBuffers off-heap for records written to a memtable.
> (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)
