[ https://issues.apache.org/jira/browse/CASSANDRA-8897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485058#comment-14485058 ]
Benedict commented on CASSANDRA-8897:
-------------------------------------

bq. for page alignment we create a bigger buffer and slice it on an aligned buffer, is there a better way to do this?

No, but you can (and should) allocate a large block of buffers so that you only have to truncate one unit of alignment for all of them - say 512KB/1MB chunks, from which we slice smaller buffers.

bq. then they get evicted if they get cold.

The problem with the strategy you've taken is that we only evict entire queues, meaning we aren't very flexible. We also evict everything if the server is quiet for a period. This could lead to an odd situation: say, an infrequent spurt of traffic with an uncommon page size, with a steady drip of queries using that size, and then a 0.5s drop in the regular main type of traffic, with this main traffic now never getting to cache its buffers. More typically it's likely to lead to a random allocation of memory between the pools. There is also a race condition that could leak memory.

There are a lot of ways to skin this cat, but my suggestion would perhaps be much simpler, since we don't much mind the object allocation of the buffer wrapper, just the main body of it. Although we could avoid that too, so here are two suggestions.

Simpler:
* Have a shared queue for all buffer sizes, of slabs of some size, which are page aligned
* On allocation we increment a count, slice the buffer size we need from the current slab, and set the buffer's attachment field to the slab it came from (or have a map from parent buffer to slab)
* On deallocation we decrement the count, and if it has hit zero we recycle the slab
* If we want to be smart, we can track valid ranges we can slice from, but I don't think that's necessary. One thing we can do, though, is collect all of the buffers we need to service a single read request upfront, so that they all have the same lifespan and we don't promote fragmentation. Perhaps as a follow-up ticket.
* If we exceed our limit, we allocate a buffer of exactly the size we need (and don't bother page aligning)

A little more complex (but not necessarily better):
* Have a separate queue for each buffer size/type, still allocating slabs
* Maintain each slab in a globally shared LRU queue, and a local stack
* Serve requests from the top slab on the stack; when it's exhausted, pop it; when the slab is fully (or perhaps partially, if the stack is empty) available again, push it back onto the top of the stack
* If the stack is empty and there is available room, allocate a new slab; otherwise deallocate the oldest shared slab; if that slab is still in use, allocate a buffer of exactly the size we want, non-page-aligned

These are just suggestions; there are lots of possibilities when building a cache/pool like this.

bq. at the moment only the compressed RAR uses direct allocation

We should probably switch all readers to use direct. In fact, we should probably not allocate heap buffers in any situation where it isn't absolutely necessary.

> Remove FileCacheService, instead pooling the buffers
> ----------------------------------------------------
>
>         Key: CASSANDRA-8897
>         URL: https://issues.apache.org/jira/browse/CASSANDRA-8897
>     Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>    Reporter: Benedict
>    Assignee: Stefania
>     Fix For: 3.0
>
>
> After CASSANDRA-8893, a RAR will be a very lightweight object and will not
> need caching, so we can eliminate this cache entirely. Instead we should have
> a pool of buffers that are page-aligned.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
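The "simpler" scheme in the comment above (a shared queue of slabs, a per-slab count of outstanding slices, and an exact-size unpooled allocation once the pool's limit is exceeded) might be sketched roughly as below. All class and method names ({{SlabPool}}, {{allocate}}, {{free}}) are illustrative, not Cassandra's actual API, and the sketch uses plain heap buffers and a parent map in place of the page-aligned direct allocations and buffer attachment field the comment describes:

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.IdentityHashMap;

// Hypothetical sketch of the "simpler" pooling scheme: one shared queue of
// fixed-size slabs; each allocation slices from the current slab and bumps a
// count; each free decrements it, and when a slab's count reaches zero the
// slab is recycled. Oversize requests fall through to an exact, unpooled
// allocation, as the last bullet suggests.
final class SlabPool
{
    static final int SLAB_SIZE = 1 << 16; // 64KB slabs for the example

    static final class Slab
    {
        // Production code would allocateDirect() one extra page and slice at
        // the first page-aligned offset; allocate() keeps this sketch runnable.
        final ByteBuffer memory = ByteBuffer.allocate(SLAB_SIZE);
        int outstanding; // buffers sliced out and not yet freed
        int position;    // next free offset within the slab
    }

    private final ArrayDeque<Slab> recycled = new ArrayDeque<>();
    // Stands in for the buffer's "attachment" field: sliced buffer -> parent slab.
    private final IdentityHashMap<ByteBuffer, Slab> parent = new IdentityHashMap<>();
    private Slab current;

    synchronized ByteBuffer allocate(int size)
    {
        if (size > SLAB_SIZE)
            return ByteBuffer.allocate(size); // oversize: exact size, unpooled

        if (current == null || current.position + size > SLAB_SIZE)
        {
            Slab next = recycled.poll();      // reuse a fully freed slab if any
            current = next != null ? next : new Slab();
        }

        ByteBuffer window = current.memory.duplicate();
        window.position(current.position).limit(current.position + size);
        ByteBuffer result = window.slice();   // view over [position, position+size)
        current.position += size;
        current.outstanding++;
        parent.put(result, current);
        return result;
    }

    synchronized void free(ByteBuffer buffer)
    {
        Slab slab = parent.remove(buffer);
        if (slab == null)
            return; // oversize, unpooled buffer: let GC reclaim it

        if (--slab.outstanding == 0)
        {
            slab.position = 0;        // fully free again: reset and recycle
            if (slab != current)
                recycled.add(slab);
        }
    }
}
```

Note that a retired slab (one that filled up while some of its slices were still live) is reachable only through the parent map until its last slice is freed, at which point it re-enters the recycle queue whole; that is the recycling step the second and third bullets describe.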