[ 
https://issues.apache.org/jira/browse/CASSANDRA-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082081#comment-17082081
 ] 

ZhaoYang commented on CASSANDRA-15229:
--------------------------------------

{quote}
Recirculating immediately will lead to greater inefficiency in allocation, as 
we will attempt to reuse partially freed chunks in preference to entirely freed 
chunks, leading to a great deal more churn in the active blocks. This will 
affect the networking pooling as much as the chunk cache.
{quote}

In networking, most of the time a buffer is released immediately after 
allocation, and with {{recycleWhenFree=false}} a fully freed chunk is reused 
instead of being recycled to the global list. Partial recycling is unlikely to 
affect networking usage. I am happy to test it.
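
To illustrate the lifecycle I mean (a minimal sketch with made-up names, not the actual BufferPool code): a networking buffer is returned almost immediately, so its chunk becomes fully free again and, with {{recycleWhenFree=false}}, is reused in place instead of going back to the global list.

{code:java}
// Sketch only: a chunk that is reset and reused locally once every buffer
// carved from it has been released, mirroring the recycleWhenFree=false path.
final class ChunkSketch
{
    private static final int CAPACITY = 64 * 1024;
    private int freeBytes = CAPACITY;
    private final boolean recycleWhenFree = false; // the setting under discussion

    void allocate(int size)
    {
        freeBytes -= size;
    }

    void release(int size)
    {
        freeBytes += size;
        if (freeBytes == CAPACITY && !recycleWhenFree)
            reset(); // fully free: reuse in place, no round trip to the global list
    }

    private void reset()
    {
        // in the real pool this marks all space available to the owning thread again
    }
}
{code}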

{quote}
 At the very least this behaviour should be enabled only for the ChunkCache, 
but ideally might have e.g. two queues, one with guaranteed-free chunks, 
another (perhaps for ease a superset) containing those chunks that might or 
mightn't be free.
{quote}

It's a good idea to have a separate queue and give partially freed chunks lower 
priority than fully freed chunks. That way, by the time a partially freed chunk 
is reused, it will likely have more free space than if it had been reused 
immediately.
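
Something like the following (hypothetical names, just to sketch the priority): the pool always polls the guaranteed-free queue first and only falls back to partially freed chunks, so they get more time to accumulate free space.

{code:java}
import java.util.ArrayDeque;
import java.util.Queue;

// Sketch of the two-queue reuse order: fully freed chunks are preferred,
// partially freed chunks are only reused when nothing fully free is left.
final class TwoQueueReusePolicy<C>
{
    private final Queue<C> fullyFree = new ArrayDeque<>();
    private final Queue<C> partiallyFree = new ArrayDeque<>();

    void onFullyFreed(C chunk)     { fullyFree.add(chunk); }
    void onPartiallyFreed(C chunk) { partiallyFree.add(chunk); }

    /** null means neither queue has a candidate, i.e. allocate a new chunk. */
    C nextChunkToReuse()
    {
        C chunk = fullyFree.poll();
        return chunk != null ? chunk : partiallyFree.poll();
    }
}
{code}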

{quote}if using Unsafe.allocateMemory wouldn't be simpler, more efficient, less 
risky and produce less fragmentation.
{quote}

It is simpler, but not as efficient. Without slab allocation, won't it create 
fragmentation in system direct memory?

I tested with {{ByteBuffer#allocateDirect}} and {{Unsafe#allocateMemory}}; in 
both cases latencies are slightly worse than the baseline.
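
For reference, the two per-allocation paths compared against the slab baseline look roughly like this (sketch only; the commits linked below are what was actually tested):

{code:java}
import java.lang.reflect.Field;
import java.nio.ByteBuffer;
import sun.misc.Unsafe;

// Sketch of the two allocation paths tested above; neither amortises
// allocations across a slab the way the BufferPool does.
public final class DirectAllocationSketch
{
    public static void main(String[] args) throws Exception
    {
        // Path 1: ByteBuffer.allocateDirect - released later via the GC/cleaner.
        ByteBuffer buffer = ByteBuffer.allocateDirect(16 * 1024);

        // Path 2: Unsafe.allocateMemory - must be freed explicitly.
        Field field = Unsafe.class.getDeclaredField("theUnsafe");
        field.setAccessible(true);
        Unsafe unsafe = (Unsafe) field.get(null);

        long address = unsafe.allocateMemory(16 * 1024);
        try
        {
            // a real integration would wrap this address in a ByteBuffer
        }
        finally
        {
            unsafe.freeMemory(address);
        }
    }
}
{code}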

By the way, I think it would be nice to add a new metric to track direct 
ByteBuffer allocations outside of the buffer pool, because they may be held by 
the chunk cache for a long time.
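
Roughly what I have in mind (just a sketch, not wired into Cassandra's metrics):

{code:java}
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicLong;

// Sketch: count bytes of direct ByteBuffers allocated outside the BufferPool,
// e.g. when the chunk cache falls back to allocateDirect and holds on to them.
final class OverflowAllocationMetric
{
    static final AtomicLong bytesAllocatedOutsidePool = new AtomicLong();

    static ByteBuffer allocateOutsidePool(int size)
    {
        bytesAllocatedOutsidePool.addAndGet(size); // exposed as a metric/gauge in practice
        return ByteBuffer.allocateDirect(size);
    }
}
{code}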

Chunk cache with 
[ByteBuffer.allocateDirect|https://github.com/jasonstack/cassandra/commit/c3f286c1148d13f00364872413733822a4a2c475]:
 !15229-direct.png|width=600,height=400!

Chunk cache with 
[Unsafe.allocateMemory|https://github.com/jasonstack/cassandra/commit/3dadd884ff0d8e19d3dd46a07a290762755df312]:
 !15229-unsafe.png|width=600,height=400!

> BufferPool Regression
> ---------------------
>
>                 Key: CASSANDRA-15229
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15229
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Caching
>            Reporter: Benedict Elliott Smith
>            Assignee: ZhaoYang
>            Priority: Normal
>             Fix For: 4.0, 4.0-beta
>
>         Attachments: 15229-count.png, 15229-direct.png, 15229-hit-rate.png, 
> 15229-recirculate-count.png, 15229-recirculate-hit-rate.png, 
> 15229-recirculate-size.png, 15229-recirculate.png, 15229-size.png, 
> 15229-unsafe.png
>
>
> The BufferPool was never intended to be used for a {{ChunkCache}}, and we 
> need to either change our behaviour to handle uncorrelated lifetimes or use 
> something else.  This is particularly important with the default chunk size 
> for compressed sstables being reduced.  If we address the problem, we should 
> also utilise the BufferPool for native transport connections like we do for 
> internode messaging, and reduce the number of pooling solutions we employ.
> Probably the best thing to do is to improve BufferPool’s behaviour when used 
> for things with uncorrelated lifetimes, which essentially boils down to 
> tracking those chunks that have not been freed and re-circulating them when 
> we run out of completely free blocks.  We should probably also permit 
> instantiating separate {{BufferPool}}, so that we can insulate internode 
> messaging from the {{ChunkCache}}, or at least have separate memory bounds 
> for each, and only share fully-freed chunks.
> With these improvements we can also safely increase the {{BufferPool}} chunk 
> size to 128KiB or 256KiB, to guarantee we can fit compressed pages and reduce 
> the amount of global coordination and per-allocation overhead.  We don’t need 
> 1KiB granularity for allocations, nor 16 byte granularity for tiny 
> allocations.


