[ https://issues.apache.org/jira/browse/CASSANDRA-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101998#comment-14101998 ]
Benedict commented on CASSANDRA-6726: ------------------------------------- bq. Buffer sizes do vary Other than the compressed input size, where do they vary? (I might have missed it). The input size *should* be bound by the chunk size; if it's larger we have a negative compression ratio, which should be rare, in which case I am happy either _not_ caching them, or with bumping the size of all buffers we allocate to accommodate the slight extra room (should be very minimal). bq. so do the required buffer types I'm not sure this dramatically affects the code, but also with CASSANDRA-7039 this should no longer be the case, so I'd prefer not to add too much complexity that will be soon made redundant. bq. I tried to define a flexible way to solve this which can be reused in other parts of Cassandra that require buffers. I appreciate this, however I do not think there are other places where it will be meaningfully useful, now or in the immediate future, and so I'd prefer not to pollute the namespace of our utilities (which is already unpleasant enough wrt pooling) especially as there is increased complexity in maintaining this more general pool. I'm not completely against the pooling you have defined, I just don't see it as better than the simpler approach. If you can make a compelling case, please do so. bq. My usual workflow is to first create a patch ... then .... a lot of measurements and comparisons in the process. It's good to state when you post a patch that you don't consider it complete (and perhaps outline things you intend to cover still) so that review can be better targeted; otherwise it will be assumed the patch is approximately final. Don't go too overboard with micro-measurement. The particular impact here is likely to be hard to measure in benchmarks, especially with confidence the results are generalizable, but it's a reasonably obvious win for systems experiencing GC pressure and we have plenty of other things to address. At the moment there are a lot of major performance improvements to make with big impacts, so I prefer not to get bogged down in careful measurement unless it's clear the codepath can have significant performance impact (as in, at least measurable single digit percentages in common workloads) bq. For ByteArrayCompressor.uncompress... Either is fine - the additional complexity is not significant, but forbidding the path is also fine. So long as we avoid a scenario where we can later accidentally introduce it as a common (and GC churn inducing) path. > Recycle CompressedRandomAccessReader/RandomAccessReader buffers independently > of their owners, and move them off-heap when possible > ----------------------------------------------------------------------------------------------------------------------------------- > > Key: CASSANDRA-6726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6726 > Project: Cassandra > Issue Type: Improvement > Reporter: Benedict > Assignee: Branimir Lambov > Priority: Minor > Labels: performance > Fix For: 3.0 > > Attachments: cassandra-6726.patch > > > Whilst CRAR and RAR are pooled, we could and probably should pool the buffers > independently, so that they are not tied to a specific sstable. It may be > possible to move the RAR buffer off-heap, and the CRAR sometimes (e.g. Snappy > may possibly support off-heap buffers) -- This message was sent by Atlassian JIRA (v6.2#6252)