[ https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922861#comment-13922861 ]

Benedict commented on CASSANDRA-6689:
-------------------------------------

bq. Before that is addressed, I'm -1 on this

These are already addressed in CASSANDRA-6694.

bq. Object overhead would stay inside ParNew bounds (for < p999)

The more we rely on staying within ParNew bounds, the more often we will 
exceed them; and reducing the number of ParNew runs is also a good thing. You 
said you have 300ms ParNew pauses occurring every second? So reducing both the 
maximum latency and the total latency is surely a good thing?

bq.  as the main idea is to have those pools of a fixed size

How does this work without knowing the maximum size of a result set? We can't 
have a client block forever because we didn't provide enough room in the pools. 
Potentially we could have it error, but this seems inelegant to me, when it can 
be avoided. It also seems a suboptimal way to introduce back pressure, since it 
only affects concurrent reads / large reads. We should raise a ticket 
specifically to address back pressure, IMO, and try to come up with a good 
all-round solution to the problem.
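To make the concern concrete, here is a minimal sketch of a fixed-size pool (the names FixedPool/reserve are hypothetical, not Cassandra APIs): a blocking reserve stalls a client forever when a single result set needs more bytes than the pool will ever hold, while the erroring variant at least fails fast.

```java
import java.util.concurrent.Semaphore;

// Hypothetical fixed-size byte pool. A blocking reserve() can never return
// if the request exceeds total capacity; tryReserve() errors out instead.
class FixedPool
{
    private final Semaphore permits;
    private final int capacityBytes;

    FixedPool(int capacityBytes)
    {
        this.capacityBytes = capacityBytes;
        this.permits = new Semaphore(capacityBytes);
    }

    // Blocking reserve: the client waits forever if bytes > capacityBytes,
    // because that many permits will never be released at once.
    void reserve(int bytes) throws InterruptedException
    {
        permits.acquire(bytes);
    }

    // Erroring alternative: fail fast rather than block the client forever.
    boolean tryReserve(int bytes)
    {
        if (bytes > capacityBytes)
            return false; // could never succeed, even with the pool empty
        return permits.tryAcquire(bytes);
    }

    void release(int bytes)
    {
        permits.release(bytes);
    }
}
```

Either behaviour only throttles large or concurrent reads, which is why it seems a poor general back-pressure mechanism.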

bq. Let's say we live in the modern NUMA world, so we are going to do the 
following: pin the group threads to CPU cores so we have a fixed scope of 
allocation for different things; that way there is no significant bus pressure 
from copying, among the other things the JVM/Cassandra does with memory

It would be great to be more NUMA aware, but this is not about traffic over the 
interconnect; it is about the arrays/memory banks themselves, and NUMA 
awareness doesn't address any of the other negative consequences. You'll 
struggle to get more than a few GB/s of bandwidth out of a modern CPU given 
that we are copying object trees (even shallow ones - they're still randomly 
distributed), and we don't want to waste any of that if we can avoid it.

bq. What do you mean by this? We still live on the JVM, do we not? Also, what 
would it do in a low-memory situation? Allocate from heap? Wait? This is not a 
pauseless operation.

I did not mean to imply pauseless globally, but the memory reclaim operations 
introduced here are pauseless, thus reducing pauses overall: wherever we would 
previously have had a pause from ParNew/FullGC to reclaim memory, we would not 
have one here.
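As a rough illustration of what "pauseless reclaim" means here (illustrative names, not the actual patch): an off-heap region is released on whichever thread drops the last reference to it, so reclamation needs no collector involvement at all.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: a reference-counted off-heap region, freed the moment
// the last reference is dropped, on the releasing thread itself - no
// stop-the-world collection is involved in reclaiming the memory.
class RefCountedRegion
{
    private final AtomicInteger refs = new AtomicInteger(1); // creator holds one ref
    private volatile boolean freed = false;

    // Take a reference; CAS loop so we never resurrect an already-freed region.
    boolean ref()
    {
        while (true)
        {
            int n = refs.get();
            if (n == 0)
                return false; // already freed
            if (refs.compareAndSet(n, n + 1))
                return true;
        }
    }

    // Drop a reference; the last unref reclaims the region immediately.
    void unref()
    {
        if (refs.decrementAndGet() == 0)
            freed = true; // stand-in for the native free; no GC pause needed
    }

    boolean isFreed()
    {
        return freed;
    }
}
```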

bq. We won't be able to answer queries directly from the messaging threads, for 
a number of reasons not even indirectly related to your approach - at the least 
so as not to break SEDA, which is also supposed to be a safeguard against 
over-utilization.

I'm not sure why you think this would be a bad thing. It would only help for 
CL=1, but we are often benchmarked using this, so it's an important thing to be 
fast on if possible, and there are definitely a number of our users who are 
okay with CL=1 for whom faster responses would be great. Faster query answering 
should reduce over-utilisation, assuming some back-pressure is built into 
MessagingService, or the co-ordinator manages its outstanding proxied requests 
to ensure it isn't overwhelmed by the responses.

bq. The same way as jemalloc or any other allocator does it; at least that is 
not reinventing the wheel.

Do you mean you would use jemalloc for every allocation? In which case there 
are further costs incurred for crossing the JNA barrier so frequently, almost 
certainly outweighing any benefit to using jemalloc. Otherwise we would need to 
maintain free-lists ourselves, or perform compacting GC. Personally I think 
compacting GC is actually much simpler.
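For contrast, here is a minimal sketch of the "maintain free-lists ourselves" option (illustrative names, not a proposal): fixed-size slots carved from one off-heap region and recycled in-process, so no native malloc/free call happens per allocation - but the bookkeeping this implies is exactly what compacting GC would replace.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical free-list slab: one off-heap region divided into fixed-size
// slots, with freed slots recycled from an in-process list. No JNA/native
// call per allocation, but the free-list state must be maintained by hand.
class SlabFreeList
{
    private final ByteBuffer region;
    private final Deque<Integer> free = new ArrayDeque<>();

    SlabFreeList(int slots, int slotSize)
    {
        this.region = ByteBuffer.allocateDirect(slots * slotSize);
        for (int i = 0; i < slots; i++)
            free.push(i * slotSize); // every slot starts on the free list
    }

    // Returns the byte offset of a free slot, or -1 if the slab is exhausted.
    int allocate()
    {
        Integer offset = free.poll();
        return offset == null ? -1 : offset;
    }

    // Recycles a slot: pure in-process bookkeeping, no native free().
    void release(int offset)
    {
        free.push(offset);
    }
}
```

Note the fragmentation and sizing problems this leaves unsolved for variable-size records, which is part of why compacting GC seems the simpler path.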



> Partially Off Heap Memtables
> ----------------------------
>
>                 Key: CASSANDRA-6689
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>             Fix For: 2.1 beta2
>
>         Attachments: CASSANDRA-6689-small-changes.patch
>
>
> Move the contents of ByteBuffers off-heap for records written to a memtable.
> (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)
