[ 
https://issues.apache.org/jira/browse/CASSANDRA-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17077539#comment-17077539
 ] 

Stefania Alborghetti commented on CASSANDRA-15229:
--------------------------------------------------

{quote}My view is that having a significant proportion of memory wasted to 
fragmentation is a serious bug, regardless of the total amount of memory that 
is wasted.
{quote}
 

That's absolutely true. However, it's also true that none of our users reported 
any problems when the cache was 512 MB and the default file access mode was 
mmap. Perhaps there are users in open source who reported problems; I haven't 
done a Jira search. So my point was simply that we should be mindful of changing 
critical code late in a release cycle if the existing code is performing 
adequately.
{quote}It's not poorly suited to long-lived buffers, is it? Only to buffers 
with widely divergent lifetimes.
{quote}
I implied that lifetimes are divergent, since we're trying to support a cache; 
sorry about the confusion.

 
{quote}Honestly, given chunks are normally the same size, simply re-using the 
evicted buffer if possible, and if not allocating new system memory, seems 
probably sufficient to me.
{quote}
I'm not too sure that chunks are normally the same size. For data files, they 
depend on the compression parameters or on the partition sizes, both of which 
can differ between tables. Also, surely indexes would use different chunk sizes? 
We observed that the chunk cache gradually tends to shift from buffers coming 
from data files to buffers coming from index files, as indexes are accessed more 
frequently. We have a different index implementation, though.
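
To make the trade-off concrete, here is a minimal sketch of the "reuse the 
evicted buffer if possible, otherwise allocate new system memory" idea being 
discussed. This is not code from the branch; all names are hypothetical, and it 
only illustrates why mismatched chunk sizes defeat simple reuse.
{code:java}
// Hypothetical sketch only: reuse a buffer released by eviction when its
// capacity matches the request, otherwise fall back to a fresh allocation.
// Names are illustrative and do not come from the Cassandra code base.
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentLinkedQueue;

final class EvictedBufferRecycler
{
    // Buffers handed back by cache eviction, kept around for possible reuse.
    private final ConcurrentLinkedQueue<ByteBuffer> evicted = new ConcurrentLinkedQueue<>();

    ByteBuffer allocate(int size)
    {
        ByteBuffer candidate = evicted.poll();
        if (candidate != null && candidate.capacity() == size)
        {
            candidate.clear();
            return candidate;
        }
        // Sizes differ (e.g. different compression chunk lengths per table,
        // or index buffers vs data buffers), so reuse fails and new system
        // memory is allocated; the mismatched buffer is simply dropped here.
        return ByteBuffer.allocateDirect(size);
    }

    void onEvict(ByteBuffer buffer)
    {
        evicted.offer(buffer);
    }
}
{code}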

 
{quote}{quote}I'll try to share some code so you can have a clearer picture.
{quote}
Thanks, that sounds great. I may not get to it immediately, but look forward to 
taking a look hopefully soon.
{quote}
I've dropped some files on this 
[branch|https://github.com/stef1927/cassandra/tree/15229-4.0]. The buffer pool 
is in org.apache.cassandra.utils.memory.buffers.  The starting point is the 
[BufferPool|https://github.com/apache/cassandra/compare/trunk...stef1927:15229-4.0#diff-72046b5d367f6e120594b58c973bed71R24]
 and its concrete implementations or the 
[BufferFactory|https://github.com/apache/cassandra/compare/trunk...stef1927:15229-4.0#diff-4fc5fae1de112fc5eb0bd865af532f0aR31].
 I've also dropped some related utility classes but not all of them, so clearly 
the code doesn't compile and the unit tests are also missing.
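
For anyone who doesn't want to open the branch, the rough shape of that 
pool/factory split might look something like the outline below. This is only a 
guess based on the class names mentioned above, not the actual code in 
org.apache.cassandra.utils.memory.buffers.
{code:java}
// Hypothetical outline only: an abstract pool with concrete implementations
// behind it, plus a factory that hands out buffers. The real classes on the
// branch may differ substantially.
import java.nio.ByteBuffer;

public abstract class BufferPool
{
    protected final long maxPoolSize; // total memory the pool may hand out, in bytes

    protected BufferPool(long maxPoolSize)
    {
        this.maxPoolSize = maxPoolSize;
    }

    /** Obtain a buffer of the requested size, drawing on pooled memory if possible. */
    public abstract ByteBuffer allocate(int size);

    /** Return a buffer previously obtained from this pool. */
    public abstract void release(ByteBuffer buffer);
}

interface BufferFactory
{
    /** Create a new buffer; implementations decide between heap, direct or pooled memory. */
    ByteBuffer newBuffer(int size);
}
{code}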

 

> BufferPool Regression
> ---------------------
>
>                 Key: CASSANDRA-15229
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15229
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Caching
>            Reporter: Benedict Elliott Smith
>            Assignee: ZhaoYang
>            Priority: Normal
>             Fix For: 4.0, 4.0-beta
>
>
> The BufferPool was never intended to be used for a {{ChunkCache}}, and we 
> need to either change our behaviour to handle uncorrelated lifetimes or use 
> something else.  This is particularly important with the default chunk size 
> for compressed sstables being reduced.  If we address the problem, we should 
> also utilise the BufferPool for native transport connections like we do for 
> internode messaging, and reduce the number of pooling solutions we employ.
> Probably the best thing to do is to improve BufferPool’s behaviour when used 
> for things with uncorrelated lifetimes, which essentially boils down to 
> tracking those chunks that have not been freed and re-circulating them when 
> we run out of completely free blocks.  We should probably also permit 
> instantiating separate {{BufferPool}}, so that we can insulate internode 
> messaging from the {{ChunkCache}}, or at least have separate memory bounds 
> for each, and only share fully-freed chunks.
> With these improvements we can also safely increase the {{BufferPool}} chunk 
> size to 128KiB or 256KiB, to guarantee we can fit compressed pages and reduce 
> the amount of global coordination and per-allocation overhead.  We don’t need 
> 1KiB granularity for allocations, nor 16 byte granularity for tiny 
> allocations.
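
As a minimal illustration of the re-circulation idea sketched in the description 
above (all names hypothetical; this is not the proposed patch), re-using 
partially freed chunks when no fully free chunk remains could look roughly like 
this:
{code:java}
// Illustrative sketch of re-circulating partially freed chunks when no fully
// free chunk is available, as the description suggests. Hypothetical names;
// not the actual BufferPool implementation.
import java.util.ArrayDeque;
import java.util.Comparator;
import java.util.PriorityQueue;
import java.util.Queue;

final class ChunkRecirculation
{
    static final class Chunk
    {
        final int capacity;
        int free;

        Chunk(int capacity) { this.capacity = capacity; this.free = capacity; }
        boolean isFullyFree() { return free == capacity; }
    }

    private final Queue<Chunk> fullyFree = new ArrayDeque<>();
    // Chunks that still have live allocations, ordered so the one with the
    // most reusable space comes out first.
    private final PriorityQueue<Chunk> partiallyFree =
        new PriorityQueue<>(Comparator.comparingInt((Chunk c) -> -c.free));

    Chunk chunkForNewAllocation()
    {
        Chunk chunk = fullyFree.poll();
        if (chunk != null)
            return chunk;
        // Out of completely free chunks: re-circulate a partially freed one
        // instead of growing the pool with more system memory.
        return partiallyFree.poll();
    }

    void onFree(Chunk chunk)
    {
        if (chunk.isFullyFree())
            fullyFree.offer(chunk);   // only fully freed chunks go back to the shared list
        else
            partiallyFree.offer(chunk);
    }
}
{code}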


