[ https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15257210#comment-15257210 ]

Pavel Yaskevich commented on CASSANDRA-5863:
--------------------------------------------

bq. The key itself is a small and fixed part of the overhead (all objects it 
references are already found elsewhere); there are also on-heap support 
structures within the implementing cache which are bigger. Though that's not 
trivial, we could also account for those, but I don't know how that helps cache 
management and sizing for the user.

The problem I see with this is the same as for any other data structure on the 
JVM: if we don't account for the additional overhead, at some point it will 
blow up and it won't be pretty, especially if we don't account for the internal 
size of the data structure which holds the cache and other overhead like keys 
and their containers. Can we claim with certainty that at some capacity its 
actual size in memory is not going to be 2x or 3x? If yes, then let's leave it 
as it is today; otherwise we need to do something about it right away.
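To make the concern concrete, here is a minimal sketch (hypothetical names and 
a made-up overhead constant, not anything in the current patch) of what 
counting the per-entry overhead in the capacity weight could look like:

{code:java}
// Sketch only: fold a rough per-entry estimate of the key object, its
// container and the cache's internal node into the weight used for capacity
// accounting, instead of counting just the buffer payload.
final class WeighedEntry
{
    // Hypothetical constant; the real value would have to be measured
    // (key object header, references, hash-map node, etc.).
    static final long PER_ENTRY_OVERHEAD_BYTES = 128;

    static long weigh(long keySizeOnHeap, long bufferCapacity)
    {
        return keySizeOnHeap + bufferCapacity + PER_ENTRY_OVERHEAD_BYTES;
    }
}
{code}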

bq. I'm sorry, I do not understand the problem – the code only relies on the 
position of the buffer and since buffer is cleared before the read, an end of 
stream (and only that) will result in an empty buffer; both read() and 
readByte() interpret this correctly.

Sorry, what I mean is: we might want to be more conservative and indicate early 
that the requested length is bigger than the number of available bytes. We 
already had a couple of bugs which were hard to debug because EOFException 
doesn't provide any useful information...
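For example, a guard along these lines (just a sketch, names are hypothetical) 
would surface the problem at the point of the request instead of via a bare 
EOFException later:

{code:java}
import java.io.EOFException;

// Sketch: fail early with a descriptive message when the requested length
// exceeds what the source can still provide.
final class ReadGuard
{
    static void checkAvailable(long requested, long available, long position) throws EOFException
    {
        if (requested > available)
            throw new EOFException(String.format("Requested %d bytes at position %d but only %d are available",
                                                 requested, position, available));
    }
}
{code}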

bq. I had added a return of the passed buffer for convenience but it also adds 
possibility for error – changed the return of the method to void. On the other 
point, it does not make sense for the callee to return an (aligned) offset as 
the caller may need to have a better control over positioning before allocating 
the buffer – caching rebufferers, specifically, do.

and 

bq. This wasn't the case even before this ticket. When RAR requests rebuffering 
at a certain position, it can either have its buffer filled (direct case), or 
receive a view of a shared buffer that holds the data (mem-mapped case). There 
was a lot of clumsiness in RAR to handle the question of which of these is the 
case, does it own its buffer, should it be allocated or freed. The patch 
addresses this clumsiness as well as allowing for another type of advantageous 
buffer management.

I understand. I actually started with a proposal to return "void" but changed 
it later on because I saw a possibility to unify the bufferless case with the 
other implementations. Essentially the question is where the original data 
comes from - directly from the channel or from an already mmap'ed buffer - so 
maybe if we had a common interface for both cases and used it as a backend for 
the rebufferer, it would simplify things instead of putting that logic into the 
rebufferer itself? Just something to think about...
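Roughly something like this (purely a sketch with made-up names, not a proposal 
for the actual interface):

{code:java}
import java.nio.ByteBuffer;

// Hypothetical common backend: answers "where does the data come from" so the
// rebufferer doesn't need to care whether it owns the buffer it hands out.
interface ChunkSource extends AutoCloseable
{
    // A channel-backed implementation would fill (and own) a private buffer;
    // an mmap-backed one would return a duplicate/slice view of the shared
    // mapped region.
    ByteBuffer chunkAt(long position);

    @Override
    void close();
}
{code}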

bq. Interesting. Another possibility mentioned before is to implement 
compression in such a way that the compressed size matches the chunk size. Both 
are orthogonal and outside the scope of this ticket – let's open a new issue 
for that?

I'm fine if we make it a separate ticket, but I think we will have to tackle it 
first since it would directly affect the rebufferer/cache logic.


> In process (uncompressed) page cache
> ------------------------------------
>
>                 Key: CASSANDRA-5863
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5863
>             Project: Cassandra
>          Issue Type: Sub-task
>            Reporter: T Jake Luciani
>            Assignee: Branimir Lambov
>              Labels: performance
>             Fix For: 3.x
>
>
> Currently, for every read, the CRAR reads each compressed chunk into a 
> byte[], sends it to ICompressor, gets back another byte[] and verifies a 
> checksum.  
> This process is where the majority of time is spent in a read request.  
> Before compression, we would have zero-copy of data and could respond 
> directly from the page-cache.
> It would be useful to have some kind of Chunk cache that could speed up this 
> process for hot data, possibly off heap.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)