[ https://issues.apache.org/jira/browse/CASSANDRA-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082202#comment-17082202 ]

Benedict Elliott Smith commented on CASSANDRA-15229:
----------------------------------------------------

bq. In networking, most of the time, buffers will be released immediately after 
allocation, and with recycleWhenFree=false, fully freed chunks will be reused 
instead of being recycled to the global list. Partial recycle is unlikely to 
affect networking usage. I am happy to test it.

It is famously difficult to prove a negative, particularly via external 
testing.  It will be untrue in some circumstances, most notably large message 
processing (which happens asynchronously).  I would need to review the buffer 
control flow in messaging to confirm it is sufficiently low risk to modify the 
behaviour here, so I would prefer we not modify it in a way that is not easily 
verified.

bq. will it create fragmentation in system direct memory?

It cannot easily be ruled out entirely, but since this data will be allocated 
mostly in its own virtual page space (all allocations are much larger than a 
normal page), it hopefully shouldn't be an insurmountable problem for most 
allocators, given the effectively unlimited virtual page space on modern 
systems.

bq. I tested with "ByteBuffer#allocateDirect" and "Unsafe#allocateMemory"; both 
latencies are slightly worse than baseline.

Did you perform the simple optimisation of rounding up to the >= 2KiB boundary 
(for equivalent behaviour), then re-using any buffer that is correctly sized 
when evicting to make room for a new item?  It might well be possible to make 
this yet more efficient than {{BufferPool}} by reducing this boundary to e.g. 
1KiB, or perhaps as little as 512B.
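
To make that concrete, here is a rough illustrative sketch (not Cassandra code; 
the class, method names and eviction hook are my own assumptions) of rounding 
each request up to a fixed boundary and re-using evicted buffers of the same 
rounded capacity:

{code:java}
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Illustrative only: round every request up to a fixed boundary, and when the
// cache evicts an entry, park its buffer for the next allocation of the same
// rounded capacity instead of freeing it.
final class RoundedBufferReuse
{
    static final int BOUNDARY = 2048; // 2KiB for equivalent behaviour; could try 1KiB or 512B

    // one queue of reusable buffers per rounded capacity
    private final Map<Integer, Queue<ByteBuffer>> freeBySize = new ConcurrentHashMap<>();

    static int roundUp(int size)
    {
        return (size + BOUNDARY - 1) & -BOUNDARY; // BOUNDARY must be a power of two
    }

    ByteBuffer allocate(int size)
    {
        int rounded = roundUp(size);
        Queue<ByteBuffer> queue = freeBySize.get(rounded);
        ByteBuffer reused = queue == null ? null : queue.poll();
        return reused != null ? reused : ByteBuffer.allocateDirect(rounded);
    }

    // invoked from the cache's eviction hook when making room for a new item
    void recycle(ByteBuffer evicted)
    {
        evicted.clear();
        freeBySize.computeIfAbsent(evicted.capacity(), k -> new ConcurrentLinkedQueue<>())
                  .offer(evicted);
    }
}
{code}

Lowering the boundary to 1KiB or 512B only changes the rounding; the structure 
stays the same.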

So if I were doing this myself, I think I would start at this point and, if 
necessary, move towards further reusing the buffers we already have in the 
cache - since it is already a pool of them.  I would just be looking to smooth 
out the random distribution of sizes used with e.g. a handful of queues, each 
containing a single size of buffer and at most a handful of items (sketched 
below).  This feels like a simpler solution to me, particularly as it does not 
affect any other pool users.
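
Again purely illustrative (the class and the bound of four are my own 
assumptions, not existing code), bounding each per-size queue to a handful of 
entries might look like:

{code:java}
import java.nio.ByteBuffer;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative only: keep at most a handful of evicted buffers per size class,
// falling back to a plain direct allocation when a queue is empty, and simply
// dropping a buffer when its queue is already full.
final class BoundedSizeClassQueues
{
    private static final int MAX_PER_SIZE = 4; // "at most a handful of items each"

    private final ConcurrentMap<Integer, ArrayBlockingQueue<ByteBuffer>> queues = new ConcurrentHashMap<>();

    ByteBuffer take(int roundedSize)
    {
        ArrayBlockingQueue<ByteBuffer> queue = queues.get(roundedSize);
        ByteBuffer buffer = queue == null ? null : queue.poll();
        return buffer != null ? buffer : ByteBuffer.allocateDirect(roundedSize);
    }

    void put(ByteBuffer evicted)
    {
        evicted.clear();
        queues.computeIfAbsent(evicted.capacity(), k -> new ArrayBlockingQueue<>(MAX_PER_SIZE))
              .offer(evicted); // returns false and drops the buffer if the queue is full
    }
}
{code}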

However, I’m not doing the work (nor, perhaps, reviewing it), so if you are 
willing to at least enable the behaviour only for the {{ChunkCache}} - so that 
this change cannot have any unintended negative effect on users not expected to 
benefit - my main concern will be alleviated.


> BufferPool Regression
> ---------------------
>
>                 Key: CASSANDRA-15229
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15229
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Caching
>            Reporter: Benedict Elliott Smith
>            Assignee: ZhaoYang
>            Priority: Normal
>             Fix For: 4.0, 4.0-beta
>
>         Attachments: 15229-count.png, 15229-direct.png, 15229-hit-rate.png, 
> 15229-recirculate-count.png, 15229-recirculate-hit-rate.png, 
> 15229-recirculate-size.png, 15229-recirculate.png, 15229-size.png, 
> 15229-unsafe.png
>
>
> The BufferPool was never intended to be used for a {{ChunkCache}}, and we 
> need to either change our behaviour to handle uncorrelated lifetimes or use 
> something else.  This is particularly important with the default chunk size 
> for compressed sstables being reduced.  If we address the problem, we should 
> also utilise the BufferPool for native transport connections like we do for 
> internode messaging, and reduce the number of pooling solutions we employ.
> Probably the best thing to do is to improve BufferPool’s behaviour when used 
> for things with uncorrelated lifetimes, which essentially boils down to 
> tracking those chunks that have not been freed and re-circulating them when 
> we run out of completely free blocks.  We should probably also permit 
> instantiating separate {{BufferPool}}, so that we can insulate internode 
> messaging from the {{ChunkCache}}, or at least have separate memory bounds 
> for each, and only share fully-freed chunks.
> With these improvements we can also safely increase the {{BufferPool}} chunk 
> size to 128KiB or 256KiB, to guarantee we can fit compressed pages and reduce 
> the amount of global coordination and per-allocation overhead.  We don’t need 
> 1KiB granularity for allocations, nor 16 byte granularity for tiny 
> allocations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
