[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923029#comment-13923029
 ] 

Benedict edited comment on CASSANDRA-6689 at 3/6/14 9:07 PM:
-------------------------------------------------------------

bq. Is there a branch/patch to see all of the changes involved?

Yes, 
[offheap2c+6781|https://github.com/belliottsmith/cassandra/tree/offheap2c+6781] 
and [offheap2c|https://github.com/belliottsmith/cassandra/tree/offheap2c+6781]

The former includes performance enhancements for DirectByteBuffer use, from a 
separate ticket.

bq.  I'm just saying that all of the read/write requests are short lived enough 
to stay inside young generation region
Well, yes and no. We have to wait until any client finishes processing the 
data, so there's no absolute guarantee they'll not survive. But either way, 
ParNew pauses are almost as bad as full GC, only they happen much more often. 
300ms pauses are not a good thing, and if we can then reduce the size of YG so 
that when these pauses do happen they're shorter (or, say, maybe even use G1GC) 
then that's even better.

bq. Sorry but I still don't get it, do you mean lock-free/non-blocking or that 
it does no syscalls or something similar? But that doesn't matter for pauses as 
much as allocation throughput and fragmentation of Java GC.

I mean no worker threads have to stop in order for memory to be reclaimed. 
There is no STW for reclaiming memory. This is unrelated to the lock freedom.

bq. Java is the worst of cache locality with it's object placement anyway 

Well, exactly. That's my point here :-)

bq.  it adds another complications not related to this very ticket.

I have no intention of doing it in this ticket, just indicating it as a very 
useful improvement.

bq. which is how we usually do things for Cassandra project

It looks like you really have two steps, if I boil it down: 1) implement 
copying approach; 2) if slow, implement this approach? 

The issue with this, really, though, is that if we simply allocate ByteBuffers 
off-heap like we have in this ticket, we get none of the benefit of increased 
memtable capacity, since the on-heap overheads are still huge. Since that's one 
of the main goals here, it seems problematic to lose out on it - we don't need 
to test to find out what the result would be. This ticket was supposed to only 
be a stepping stone. 

Possibly we could scale CASSANDRA-6694 back somewhat, removing any support for 
RefAction.refer(), and always performing a copy onto heap from the NativeCell 
implementations, and spending some time ripping out any of the GC or any of the 
code at Memtable discard for coping with RefAction.refer(). But honestly this 
seems like a waste of effort to me, as the majority of the code would remain, 
we'll just not have as good a final solution. But it could be done if that is 
the community preference. We could probably split it up into further commits, 
but each commit adds the potential for more errors in my book, when we have a 
good solution that is ready to go.

Maximally separated timeline for separated commits of 6694 would be:
# introduce concurrency primitives
# introduce .memory and .data refactor, but only for ByteBuffer allocators
# introduce all .data.Native* implementations and a cut down native allocator, 
using OpOrder to guard copying like we currently do for referencing (we need to 
use something, and it is simpler than ref counting or anything else)
# introduce RefAction
# introduce GC

I would rather not split it up as, as I say, each new patch is an opportunity 
to mess up, but it could be done. We can do performance testing to our hearts 
content at each stage, although personally I think such testing would not be 
sufficient to demonstrate no benefit to the current approach, as even with 
little benefit seen it presupposes no future performance benefit is capped by 
the slower solution. So I would push for the final patch anyway. That said, I 
would be surprised if we did not see any improvement by comparison.


was (Author: benedict):
bq. Is there a branch/patch to see all of the changes involved?

Yes, 
[offheap2c+6781|https://github.com/belliottsmith/cassandra/tree/offheap2c+6781] 
and [offheap2c|https://github.com/belliottsmith/cassandra/tree/offheap2c+6781]

The former includes performance enhancements for DirectByteBuffer use, from a 
separate ticket.

bq.  I'm just saying that all of the read/write requests are short lived enough 
to stay inside young generation region
Well, yes and no. We have to wait until any client finishes processing the 
data, so there's no absolute guarantee they'll not survive. But either way, 
ParNew pauses are almost as bad as full GC, only they happen much more often. 
300ms pauses are not a good thing, and if we can then reduce the size of YG so 
that when these pauses do happen they're shorter (or, say, maybe even use G1GC) 
then that's even better.

bq. Sorry but I still don't get it, do you mean lock-free/non-blocking or that 
it does no syscalls or something similar? But that doesn't matter for pauses as 
much as allocation throughput and fragmentation of Java GC.

I mean no worker threads have to stop in order for memory to be reclaimed. 
There is no STW for reclaiming memory. This is unrelated to the lock freedom.

bq. Java is the worst of cache locality with it's object placement anyway 

Well, exactly. That's my point here :-)

bq.  it adds another complications not related to this very ticket.

I have no intention of doing it in this ticket, just indicating it as a very 
useful improvement.

bq. which is how we usually do things for Cassandra project

It looks like you really have two steps, if I boil it down: 1) implement 
copying approach; 2) if slow, implement this approach? 

The issue with this, really, though, is that if we simply allocate ByteBuffers 
off-heap like we have in this ticket, we get none of the benefit of increased 
memtable capacity, since the on-heap overheads are still huge. Since that's one 
of the main goals here, it seems problematic to lose out on it - we don't need 
to test to find out what the result would be. This ticket was supposed to only 
be a stepping stone. 

Possibly we could scale CASSANDRA-6694 back somewhat, removing any support for 
RefAction.refer(), and always performing a copy onto heap from the NativeCell 
implementations, and spending some time ripping out any of the GC or any of the 
code at Memtable discard for coping with RefAction.refer(). But honestly this 
seems like a waste of effort to me, as the majority of the code would remain, 
we'll just not have as good a final solution. But it could be done if that is 
the community preference. We could probably split it up into further commits, 
but each commit adds the potential for more errors in my book, when we have a 
good solution that is ready to go.

Maximally separated timeline for separated commits of 6694 would be:
# introduce concurrency primitives
# introduce .memory and .data refactor, but only for ByteBuffer allocators, and 
RefAction, but only allocateOnHeap
# introduce all .data.Native* implementations and a cut down native allocator, 
using OpOrder to guard copying like we currently do for referencing (we need to 
use something, and it is simpler than ref counting or anything else)
# introduce RefAction.refer(), GC, etc. (i.e. final patch)

I would rather not split it up as, as I say, each new patch is an opportunity 
to mess up, but it could be done. We can do performance testing to our hearts 
content at each stage, although personally I think such testing would not be 
sufficient to demonstrate no benefit to the current approach, as even with 
little benefit seen it presupposes no future performance benefit is capped by 
the slower solution. So I would push for the final patch anyway. That said, I 
would be surprised if we did not see any improvement by comparison.

> Partially Off Heap Memtables
> ----------------------------
>
>                 Key: CASSANDRA-6689
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>             Fix For: 2.1 beta2
>
>         Attachments: CASSANDRA-6689-small-changes.patch
>
>
> Move the contents of ByteBuffers off-heap for records written to a memtable.
> (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to