[ https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922861#comment-13922861 ]
Benedict commented on CASSANDRA-6689:
-------------------------------------

bq. Before that is addressed, I'm -1 of this

These are already addressed in CASSANDRA-6694.

bq. Object overhead would stay inside ParNew bounds (for (< p999))

The more we rely on staying within ParNew bounds, the more often we will exceed them; and reducing the number of ParNew runs is a good thing in itself. You said you have 300ms ParNew pauses occurring every second? So reducing both the maximum latency and the total latency is surely a good thing.

bq. as the main idea is to have those pools of a fixed size

How does this work without knowing the maximum size of a result set? We can't have a client block forever because we didn't provide enough room in the pools. We could potentially have it error instead, but that seems inelegant to me when it can be avoided. It also seems a suboptimal way to introduce back pressure, since it only affects concurrent reads / large reads. IMO we should raise a ticket specifically to address back pressure, and try to come up with a good all-round solution to the problem.

bq. Let's say we live in the modern NUMA world, so we are going to do the following pin the group threads to CPU cores so we have fixed scope of allocation of different things, that why there is no significant bus pressure for copy among other things JVM/Cassandra does with memory

It would be great to be more NUMA aware, but this is not about traffic over the interconnect; it is about the arrays/memory banks themselves, and it doesn't address any of the other negative consequences. You'll struggle to get more than a few GB/s of bandwidth out of a modern CPU when copying object trees (even shallow ones - they're still randomly distributed), and we don't want to waste any of that bandwidth if we can avoid it.

bq. What do you mean by this, we still leave on the JVM, do we not? Also what would it do in the low memory situation? allocate from heap? wait? This is not pauseless operation.
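To make the kind of pauseless reclaim at issue here concrete, a minimal hypothetical sketch (this is *not* the actual patch's code, and the class and method names are made up for illustration): an off-heap region can be reference-counted, so the last thread to release it reclaims the memory immediately on that thread, with no safepoint or collector pause involved:

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: an off-heap region reclaimed by reference counting
// rather than by a collector pause. The last releaser frees the region on
// its own thread, so no stop-the-world phase is ever needed.
public class RefCountedRegion {
    private final ByteBuffer memory;
    private final AtomicInteger refs = new AtomicInteger(1); // creator holds one ref
    private volatile boolean freed = false;

    public RefCountedRegion(int capacity) {
        memory = ByteBuffer.allocateDirect(capacity);
    }

    // A reader takes a reference before touching the memory; returns false
    // if the region has already been reclaimed.
    public boolean ref() {
        int count;
        do {
            count = refs.get();
            if (count == 0)
                return false;
        } while (!refs.compareAndSet(count, count + 1));
        return true;
    }

    // Dropping the last reference reclaims immediately on the calling
    // thread; real code would return the slab to a pool for reuse here.
    public void unref() {
        if (refs.decrementAndGet() == 0)
            freed = true;
    }

    public boolean isFreed() {
        return freed;
    }
}
```

The trade-off is that every reader pays a CAS on acquire/release, but reclamation cost is spread across ordinary operations instead of being concentrated into ParNew/FullGC pauses.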
I did not mean to imply pauseless globally, but the memory reclaim operations introduced here are pauseless, thus reducing pauses overall: wherever we would previously have incurred a ParNew/FullGC pause to reclaim this memory, we would not here.

bq. We won't be able to answer queries directly from the messaging threads for the number of reasons not even indirectly related to your approach, at least for not breaking SEDA, which also supposed to be a safe guide for over utilization.

I'm not sure why you think this would be a bad thing. It would only help for CL=1, but we are often benchmarked using that, so it's important to be fast there if possible, and there are definitely a number of our users who are okay with CL=1 and for whom faster responses would be great. Faster query answering should reduce over-utilisation, assuming some back pressure is built into MessagingService, or the co-ordinator manages its outstanding proxied requests to ensure it isn't overwhelmed by the responses.

bq. The same way as jemalloc or any other allocator does it, it least that is not reinventing the wheel.

Do you mean you would use jemalloc for every allocation? In that case there are further costs incurred for crossing the JNA barrier so frequently, almost certainly outweighing any benefit of using jemalloc. Otherwise we would need to maintain free-lists ourselves, or perform compacting GC. Personally I think compacting GC is actually much simpler.

> Partially Off Heap Memtables
> ----------------------------
>
>                 Key: CASSANDRA-6689
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>             Fix For: 2.1 beta2
>
>         Attachments: CASSANDRA-6689-small-changes.patch
>
> Move the contents of ByteBuffers off-heap for records written to a memtable.
> (See comments for details)

--
This message was sent by Atlassian JIRA
(v6.2#6252)