[ https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922259#comment-13922259 ]
Benedict commented on CASSANDRA-6689:
-------------------------------------

bq. Well, it seems like you never operated a real Cassandra cluster, did you?

You seem to have interpreted my query as an attack on the veracity of your statement. It was not. I only wanted more specific facts that could be used to target a solution, and preferably a new ticket on which to discuss them. This discussion has all the hallmarks of approaching unproductivity, so after this I do not think I have anything useful to add, and will leave the committers to decide whether to include this work or to wait for you to produce your alternative:

# Any scheme that copies data will inherently incur greater GC pressure, as we then copy for memtable reads as well as disk reads. Object overhead is in fact _larger_ than the payload for many workloads, so even if we have arenas this effect is not eliminated, or even appreciably ameliorated.
# Temporary reader space (and hence your approach) is *not* predictable: it is proportional not to the number of readers, but to the number and size of the columns the readers read. In fact it is larger than this, as we probably have to copy anything we *might* want to use (given the way the code is encapsulated, this is what I do currently when copying on-heap - anything else would introduce notable complexity), not just the columns that end up in the result set.
# We appear to be in agreement that your approach has higher costs associated with it. Further, copying potentially GB/s of (randomly located) data destroys the CPU cache, reduces peak memory bandwidth by inducing strobes, consumes bandwidth directly, and wastes CPU cycles waiting on the random lookups - all to no good purpose. We should be reducing these costs, not introducing more.
# It is simply not clear, despite your assertion of clarity, how you would reclaim any freed memory without "separate GC" (what else is GC but this reclamation?), whatever you want to call it, when it will be interspersed with non-freed memory; nor how you would guard the non-atomic copying (ref-counting, OpOrder, a lock: which?). Without this information it is not clear to me that it would be any simpler either.
# Your approach is currently (and still poorly defined) vaporware.

Some further advantages specific to my approach:

# Pauseless operation, so improved predictability
# An absolute bound on memory utilisation, which can be rolled out to other data structures, further improving overall performance predictability
# Lock-freedom and low overhead, so we move closer to being able to answer queries directly from the messaging threads themselves, improving latency and throughput

An alternative approach needs, IMO, to demonstrate clear superiority over the patch that is already available, especially when it will incur further work to produce. It is not clear to me that your solution is superior in any regard, nor any simpler. It also appears to be demonstrably less predictable and more costly, so I struggle to see how it could be considered preferable.

Also:

bq. would that keep memtable around longer than expected

I'm not sure why you suppose this would be so. We can already happily reclaim any subportion of a region or memtable, so there is no reason to think this would be necessary, even if they resided in the same structure.

bq. there seems to be a low once off-heap feature is enabled which is no surprise once you look at how much complexity does it actually add

This is certainly addressable. The off-heap feature by itself I have performance tested somewhat: it competes with Java GC for throughput (beating it as the number of live objects increases), whilst being _pauseless_, so the complexity you refer to is no slouch and highly unlikely to be the culprit.
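To make the guarding question above concrete, here is a minimal, hypothetical sketch of the ref-counting option: readers take a reference on an off-heap region before reading from it and release it afterwards, so the backing memory is only reclaimed once the last concurrent reader has finished. This is an illustration only, not Cassandra's actual implementation (the patch under discussion uses OpOrder rather than per-region ref-counts), and the class and method names are invented for the example:

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical ref-counted off-heap region (names invented for illustration).
// The count starts at 1 for the owning memtable; readers take extra references.
// Memory is logically reclaimed only when the count reaches zero.
final class RefCountedRegion
{
    private final ByteBuffer buffer;
    private final AtomicInteger refs = new AtomicInteger(1);
    private volatile boolean freed = false;

    RefCountedRegion(int capacity)
    {
        this.buffer = ByteBuffer.allocateDirect(capacity);
    }

    // Attempt to take a reference; returns false if the region is already gone,
    // so a reader must check the result before touching the buffer.
    boolean reference()
    {
        while (true)
        {
            int cur = refs.get();
            if (cur == 0)
                return false;                     // too late: region reclaimed
            if (refs.compareAndSet(cur, cur + 1))
                return true;
        }
    }

    // Drop a reference; the owner calls this once when discarding the memtable.
    void release()
    {
        if (refs.decrementAndGet() == 0)
            freed = true;                         // a real impl would return memory to an allocator
    }

    boolean isFreed()
    {
        return freed;
    }
}
```

Note that even in this sketch every read pays two atomic read-modify-writes, which is part of why a coarser-grained guard such as OpOrder can be preferable on the hot path.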
There are issues with the way we manage IO for direct byte buffers, but I have addressed these in CASSANDRA-6781.

> Partially Off Heap Memtables
> ----------------------------
>
>                 Key: CASSANDRA-6689
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>             Fix For: 2.1 beta2
>
>         Attachments: CASSANDRA-6689-small-changes.patch
>
>
> Move the contents of ByteBuffers off-heap for records written to a memtable.
> (See comments for details)

--
This message was sent by Atlassian JIRA
(v6.2#6252)