[ 
https://issues.apache.org/jira/browse/CASSANDRA-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101998#comment-14101998
 ] 

Benedict commented on CASSANDRA-6726:
-------------------------------------

bq. Buffer sizes do vary

Other than the compressed input size, where do they vary? (I might have missed 
it). The input size *should* be bound by the chunk size; if it's larger we have 
a negative compression ratio, which should be rare, in which case I am happy 
either _not_ caching them, or with bumping the size of all buffers we allocate 
to accommodate the slight extra room (should be very minimal).

bq. so do the required buffer types

I'm not sure this dramatically affects the code, but also with CASSANDRA-7039 
this should no longer be the case, so I'd prefer not to add too much complexity 
that will be soon made redundant.

bq. I tried to define a flexible way to solve this which can be reused in other 
parts of Cassandra that require buffers. 

I appreciate this, however I do not think there are other places where it will 
be meaningfully useful, now or in the immediate future, and so I'd prefer not 
to pollute the namespace of our utilities (which is already unpleasant enough 
wrt pooling) especially as there is increased complexity in maintaining this 
more general pool. I'm not completely against the pooling you have defined, I 
just don't see it as better than the simpler approach. If you can make a 
compelling case, please do so.

bq. My usual workflow is to first create a patch ... then .... a lot of 
measurements and comparisons in the process.

It's good to state when you post a patch that you don't consider it complete 
(and perhaps outline things you intend to cover still) so that review can be 
better targeted; otherwise it will be assumed the patch is approximately final. 
Don't go too overboard with micro-measurement. The particular impact here is 
likely to be hard to measure in benchmarks, especially with confidence the 
results are generalizable, but it's a reasonably obvious win for systems 
experiencing GC pressure and we have plenty of other things to address. At the 
moment there are a lot of major performance improvements to make with big 
impacts, so I prefer not to get bogged down in careful measurement unless it's 
clear the codepath can have significant performance impact (as in, at least 
measurable single digit percentages in common workloads)

bq. For ByteArrayCompressor.uncompress...

Either is fine - the additional complexity is not significant, but forbidding 
the path is also fine. So long as we avoid a scenario where we can later 
accidentally introduce it as a common (and GC churn inducing) path.

> Recycle CompressedRandomAccessReader/RandomAccessReader buffers independently 
> of their owners, and move them off-heap when possible
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6726
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6726
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Benedict
>            Assignee: Branimir Lambov
>            Priority: Minor
>              Labels: performance
>             Fix For: 3.0
>
>         Attachments: cassandra-6726.patch
>
>
> Whilst CRAR and RAR are pooled, we could and probably should pool the buffers 
> independently, so that they are not tied to a specific sstable. It may be 
> possible to move the RAR buffer off-heap, and the CRAR sometimes (e.g. Snappy 
> may possibly support off-heap buffers)



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to