[ https://issues.apache.org/jira/browse/CASSANDRA-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082202#comment-17082202 ]
Benedict Elliott Smith edited comment on CASSANDRA-15229 at 4/14/20, 12:02 AM:
-------------------------------------------------------------------------------

bq. In networking, most of the time, a buffer will be released immediately after allocation, and with recycleWhenFree=false a fully freed chunk will be reused instead of being recycled to the global list. Partial-recycle is unlikely to affect networking usage. I am happy to test it.

It is famously difficult to prove a negative, particularly via external testing. It will be untrue in some circumstances, most notably large message processing (which happens asynchronously). I would need to review the buffer control flow in messaging to confirm it is sufficiently low risk to modify the behaviour here, so I would prefer we not modify it in a way that is not easily verified.

bq. will it create fragmentation in system direct memory?

-Not easily completely ruled out, but given this data will be allocated mostly in its own virtual page space (given all allocations are much larger than a normal page), it hopefully shouldn't be an insurmountable problem for most allocators, given the availability of almost unlimited virtual page space on modern systems.-

edit: while this may be true, it's a bit of a stretch, as I haven't looked at any modern allocator remotely recently and should not extrapolate in this way (it's probably not something to worry about anyway if we're allocating relatively regular sizes).

bq. I tested with "ByteBuffer#allocateDirect" and "Unsafe#allocateMemory"; both latencies are slightly worse than baseline.

Did you perform the simple optimisation of rounding up to the >= 2KiB boundary (for equivalent behaviour), then re-using any buffer that is correctly sized when evicting to make room for a new item? It might well be possible to make this yet more efficient than {{BufferPool}} by reducing this boundary to e.g. 1KiB, or perhaps as little as 512B.

So if I were doing this myself, I think I would start at this point and, if necessary, move towards further reusing the buffers we already have in the cache - since it is already a pool of them. I would just be looking to smooth out the random distribution of sizes used with e.g. a handful of queues, each containing a single size of buffer and at most a handful of items. This feels like a simpler solution to me, particularly as it does not affect any other pool users.

However, I'm not doing the work (nor maybe reviewing it), so if you are willing to at least enable the behaviour only for the ChunkCache, so this change cannot have any unintended negative effect for those users not expected to benefit, my main concern will be alleviated.
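[Editorial note] A minimal sketch of that suggestion, for illustration only: the class and member names here ({{SizeClassedBufferCache}}, {{allocate}}, {{release}}, {{BOUNDARY}}) are hypothetical and not existing Cassandra APIs. Requested capacities are rounded up to a fixed boundary, and a handful of small per-size queues hold buffers released by eviction so that the next read of the same size can reuse them.

{code:java}
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Queue;

// Hypothetical sketch, not Cassandra code: a per-size-class recycler the
// ChunkCache could use so evicted buffers are reused rather than freed.
final class SizeClassedBufferCache
{
    private static final int BOUNDARY = 2048;         // round capacities up to 2KiB (could be 1KiB or 512B)
    private static final int MAX_FREE_PER_SIZE = 8;   // keep at most a handful of spare buffers per size

    private final Map<Integer, Queue<ByteBuffer>> freeBySize = new ConcurrentHashMap<>();

    static int roundUp(int capacity)
    {
        return (capacity + BOUNDARY - 1) & -BOUNDARY; // BOUNDARY is a power of two
    }

    ByteBuffer allocate(int capacity)
    {
        int size = roundUp(capacity);
        Queue<ByteBuffer> queue = freeBySize.computeIfAbsent(size, k -> new ConcurrentLinkedQueue<>());
        ByteBuffer buffer = queue.poll();
        if (buffer == null)
            buffer = ByteBuffer.allocateDirect(size); // no spare buffer of this size: allocate a new one
        buffer.clear();
        buffer.limit(capacity);
        return buffer;
    }

    // called when the cache evicts an entry; the buffer is parked for reuse
    void release(ByteBuffer buffer)
    {
        Queue<ByteBuffer> queue = freeBySize.computeIfAbsent(buffer.capacity(), k -> new ConcurrentLinkedQueue<>());
        if (queue.size() < MAX_FREE_PER_SIZE)
            queue.offer(buffer);
        // otherwise drop it and let the GC/Cleaner reclaim the direct memory
    }
}
{code}

Since compressed chunk sizes cluster around a few values, only a few size classes would ever be populated in practice, which is what keeps the per-size queues small.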
> BufferPool Regression
> ---------------------
>
>                 Key: CASSANDRA-15229
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15229
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Caching
>            Reporter: Benedict Elliott Smith
>            Assignee: ZhaoYang
>            Priority: Normal
>             Fix For: 4.0, 4.0-beta
>
>         Attachments: 15229-count.png, 15229-direct.png, 15229-hit-rate.png, 15229-recirculate-count.png, 15229-recirculate-hit-rate.png, 15229-recirculate-size.png, 15229-recirculate.png, 15229-size.png, 15229-unsafe.png
>
>
> The BufferPool was never intended to be used for a {{ChunkCache}}, and we need to either change our behaviour to handle uncorrelated lifetimes or use something else. This is particularly important with the default chunk size for compressed sstables being reduced. If we address the problem, we should also utilise the BufferPool for native transport connections like we do for internode messaging, and reduce the number of pooling solutions we employ.
>
> Probably the best thing to do is to improve BufferPool's behaviour when used for things with uncorrelated lifetimes, which essentially boils down to tracking those chunks that have not been freed and re-circulating them when we run out of completely free blocks. We should probably also permit instantiating separate {{BufferPool}}s, so that we can insulate internode messaging from the {{ChunkCache}}, or at least have separate memory bounds for each, and only share fully-freed chunks.
>
> With these improvements we can also safely increase the {{BufferPool}} chunk size to 128KiB or 256KiB, to guarantee we can fit compressed pages and reduce the amount of global coordination and per-allocation overhead.
> We don’t need 1KiB granularity for allocations, nor 16 byte granularity for tiny allocations.
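[Editorial note] As a rough illustration of the re-circulation idea in the description - hypothetical names throughout ({{RecirculatingPool}}, {{Chunk}}, {{take}}, {{park}}), not the real {{BufferPool}} internals, with buffer carving and synchronisation omitted - chunks whose buffers have not all been freed are parked on a side queue and handed out again once no fully free chunk remains:

{code:java}
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch, not the real BufferPool: partially freed chunks are
// tracked and re-circulated once no fully free chunk remains.
final class RecirculatingPool
{
    static final int CHUNK_SIZE = 128 * 1024;       // e.g. a 128KiB chunk

    static final class Chunk
    {
        int freeBytes = CHUNK_SIZE;                 // bytes not currently handed out as buffers
        boolean fullyFree() { return freeBytes == CHUNK_SIZE; }
    }

    private final Queue<Chunk> fullyFree = new ArrayDeque<>();
    private final Queue<Chunk> partiallyFree = new ArrayDeque<>();

    // hand a chunk to an allocating thread; prefer fully free chunks, and when
    // none remain, re-circulate a chunk whose buffers have not all been freed
    Chunk take()
    {
        Chunk chunk = fullyFree.poll();
        if (chunk == null)
            chunk = partiallyFree.poll();
        if (chunk == null)
            chunk = new Chunk();                    // nothing to recycle: allocate a fresh chunk
        return chunk;
    }

    // park a chunk once its owner has finished carving buffers from it;
    // buffers handed out earlier may still be live, so it may not be fully free
    void park(Chunk chunk)
    {
        if (chunk.fullyFree())
            fullyFree.offer(chunk);
        else
            partiallyFree.offer(chunk);             // tracked so it can be re-circulated later
    }
}
{code}

Instantiating one such pool per consumer (one for internode messaging, one for the {{ChunkCache}}) would give each its own memory bound, in line with the description, while fully-freed chunks could still be returned to a shared reserve.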