[ https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922259#comment-13922259 ]

Benedict commented on CASSANDRA-6689:
-------------------------------------

bq. Well, it seems like you never operated a real Cassandra cluster, did you? 

You seem to have interpreted my query as an attack on the veracity of your 
statement. It was not. I only wanted more specific facts that could be used to 
target a solution, and preferably a new ticket on which to discuss them.

This discussion shows every sign of becoming unproductive, so after this comment 
I do not think I have anything useful to add; I will leave it to the committers 
to decide whether to include this work or to wait for you to produce your 
alternative:

# Any scheme that copies data will inherently incur greater GC pressure, as we 
then copy for memtable reads as well as disk reads. Object overhead is in fact 
_larger_ than the payload for many workloads, so even with arenas this effect 
is neither eliminated nor appreciably ameliorated.
# Temporary reader space (and hence your approach) is *not* predictable: it is 
proportional not to the number of readers, but to the number and size of the 
columns they read. In fact it is larger even than that, since we probably have 
to copy anything we *might* want to use, not just the columns that end up in 
the result set (given the way the code is encapsulated, this is what I do 
currently when copying on-heap; anything else would introduce notable 
complexity).
# We appear to be in agreement that your approach has higher costs associated 
with it. Further, copying potentially GB/s of (randomly located) data destroys 
the CPU cache, reduces peak memory bandwidth by inducing strobes, consumes 
bandwidth directly, and wastes CPU cycles waiting on the random lookups; all to 
no good purpose. We should be reducing these costs, not introducing more.
# It is simply not clear, despite your assertion of clarity, how you would 
reclaim any freed memory without "separate GC" (what else is GC but this 
reclamation?), whatever you want to call it, when it will be interspersed with 
non-freed memory; nor how you would guard the non-atomic copying (ref-counting, 
OpOrder, a lock: which?). For concreteness, a sketch of one such guard follows 
this list. Without this information it is not clear to me that your approach 
would be any simpler either.
# Your approach is, as it stands, poorly defined vaporware.
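
The sort of ref-counting guard point 4 asks about might look like the sketch 
below. It is purely illustrative (hypothetical names, not code from any patch), 
but it makes the point: once readers must take and release references before 
touching memory, reclamation is deferred behind the last reader, i.e. it is 
garbage collection in all but name.

{code:java}
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical ref-count guard for non-atomic reads/copies of off-heap memory.
final class OffHeapRegion
{
    // starts at 1: the reference held by the owning memtable
    private final AtomicInteger refs = new AtomicInteger(1);

    /** Returns true if the caller may safely read or copy from this region. */
    boolean ref()
    {
        while (true)
        {
            int cur = refs.get();
            if (cur == 0)
                return false; // already freed; the reader must fall back or retry
            if (refs.compareAndSet(cur, cur + 1))
                return true;
        }
    }

    void unref()
    {
        if (refs.decrementAndGet() == 0)
            free(); // dropping the last reference actually releases the memory
    }

    private void free()
    {
        // e.g. Unsafe.freeMemory(peer) -- elided
    }
}
{code}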

Some further advantages specific to my approach:
# Pauseless operation, so improved predictability
# Absolute bound on memory utilisation, which can be rolled out to other data 
structures, further improving overall performance predictability (a minimal 
sketch of such a bound follows this list)
# Lock-freedom and low overhead, so we move closer to being able to answer 
queries directly from the messaging threads themselves, improving latency and 
throughput
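
To illustrate the second advantage: the absolute bound can be as simple as a 
global atomic budget that every off-heap allocation reserves against. A minimal 
sketch, with the limit and names purely illustrative:

{code:java}
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a hard, global cap on off-heap memtable memory.
final class BoundedAllocator
{
    private static final long LIMIT = 512L << 20;          // e.g. 512MB hard cap
    private static final AtomicLong ALLOCATED = new AtomicLong();

    /** Reserve size bytes against the global budget; false if over the cap. */
    static boolean reserve(long size)
    {
        while (true)
        {
            long cur = ALLOCATED.get();
            if (cur + size > LIMIT)
                return false;          // over budget: caller waits for a flush to release space
            if (ALLOCATED.compareAndSet(cur, cur + size))
                return true;
        }
    }

    static void release(long size)
    {
        ALLOCATED.addAndGet(-size);
    }
}
{code}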

An alternative approach needs, IMO, to demonstrate clear superiority over the 
patch that is already available, especially when it will take further work to 
produce. It is not clear to me that your solution is superior in any regard, 
nor any simpler. It is also demonstrably less predictable and more costly, so I 
struggle to see how it could be considered preferable.

Also: 
bq. would that keep memtable around longer than expected

I'm not sure why you suppose this would be so. We can already happily reclaim 
any subportion of a region or memtable, so there is no reason to think this 
would be necessary, even if they resided in the same structure.
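
To make that concrete: if each region simply tracks its live bytes, freeing any 
sub-allocation is a decrement, and the region is recycled the moment its count 
reaches zero, independent of the memtable's lifetime. A purely illustrative 
sketch (hypothetical names, not the actual allocator):

{code:java}
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical per-region accounting: sub-portions can be freed individually,
// and the region is recycled as soon as nothing in it remains live.
final class Region
{
    private final AtomicLong liveBytes = new AtomicLong();

    void onAllocate(long size)
    {
        liveBytes.addAndGet(size);
    }

    void onFree(long size)
    {
        if (liveBytes.addAndGet(-size) == 0)
            recycle(); // whole region reusable, regardless of the memtable's lifetime
    }

    private void recycle()
    {
        // return the backing memory to a shared pool -- elided
    }
}
{code}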

bq. there seems to be a slowdown once the off-heap feature is enabled, which is 
no surprise once you look at how much complexity it actually adds

This is certainly addressable. I have performance tested the off-heap feature 
by itself to some extent: it competes with Java GC for throughput (beating it 
as the number of live objects increases) whilst being _pauseless_, so the 
complexity you refer to is no slouch and is highly unlikely to be the culprit. 
There are issues with the way we manage IO for direct byte buffers, but I have 
addressed these in CASSANDRA-6781.
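
For anyone unfamiliar with the basic mechanism: the move the ticket describes 
amounts to copying a cell's payload into a slice of a direct (off-heap) 
ByteBuffer, leaving only a small fixed-size reference on the Java heap. A 
minimal, illustrative sketch (not the patch itself):

{code:java}
import java.nio.ByteBuffer;

// Hypothetical bump-allocating copier: payloads live off-heap in a direct
// buffer; callers receive a read-only view into the region.
final class OffHeapCopier
{
    private final ByteBuffer region = ByteBuffer.allocateDirect(1 << 20); // 1MB region, illustrative

    /** Copies an on-heap value into the off-heap region; returns a view of it. */
    synchronized ByteBuffer copy(ByteBuffer onHeap)
    {
        int length = onHeap.remaining();
        if (region.remaining() < length)
            throw new IllegalStateException("region exhausted"); // real code would open a new region

        ByteBuffer slice = region.duplicate();
        slice.limit(region.position() + length);     // window over the newly reserved bytes
        region.position(slice.limit());              // bump the allocation cursor
        slice.duplicate().put(onHeap.duplicate());   // copy without disturbing either buffer's state
        return slice.asReadOnlyBuffer();
    }
}
{code}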


> Partially Off Heap Memtables
> ----------------------------
>
>                 Key: CASSANDRA-6689
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>             Fix For: 2.1 beta2
>
>         Attachments: CASSANDRA-6689-small-changes.patch
>
>
> Move the contents of ByteBuffers off-heap for records written to a memtable.
> (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)
