[ https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15257210#comment-15257210 ]
Pavel Yaskevich commented on CASSANDRA-5863:
--------------------------------------------

bq. The key itself is a small and fixed part of the overhead (all objects it references are already found elsewhere); there are also on-heap support structures within the implementing cache which are bigger. Though that's not trivial, we could also account for those, but I don't know how that helps cache management and sizing for the user.

The problem I see with this is the same as with any other data structure on the JVM: if we don't account for the additional overhead, at some point it will blow up and it won't be pretty. If we don't account for the internal size of the data structure which holds the cache, and for other overhead like keys and their containers, can we claim with certainty that at some capacity its actual size in memory is not going to be 2x or 3x the configured limit? If yes, then let's leave it as it is today; otherwise we need to do something about it right away.

bq. I'm sorry, I do not understand the problem – the code only relies on the position of the buffer and since buffer is cleared before the read, an end of stream (and only that) will result in an empty buffer; both read() and readByte() interpret this correctly.

Sorry, what I mean is: we might want to be more conservative and indicate early that the requested length is bigger than the number of available bytes. We already had a couple of bugs which were hard to debug because EOFException doesn't provide any useful information...

bq. I had added a return of the passed buffer for convenience but it also adds possibility for error – changed the return of the method to void. On the other point, it does not make sense for the callee to return an (aligned) offset as the caller may need to have a better control over positioning before allocating the buffer – caching rebufferers, specifically, do.

and

bq. This wasn't the case even before this ticket.
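To illustrate the accounting concern, a minimal sketch of charging each cache entry for more than its payload. The class name and the per-entry overhead constant are hypothetical, not measured Cassandra values; the point is only that payload-only weighing undercounts the real footprint.

```java
// Hypothetical sketch: weigh a cache entry as payload plus an estimate of
// the key object and the hash-table node that hold it. The 64-byte figure
// is an assumed placeholder, not a measured value.
public class EntryWeigher
{
    // Rough JVM bookkeeping per entry: key object header, map node, references.
    static final long PER_ENTRY_OVERHEAD = 64;

    static long weigh(int keySize, int payloadSize)
    {
        return PER_ENTRY_OVERHEAD + keySize + payloadSize;
    }

    public static void main(String[] args)
    {
        long naive = 4096;                 // payload-only accounting
        long charged = weigh(24, 4096);    // payload + key + node overhead
        // With payload-only accounting, the real footprint exceeds the
        // configured capacity by roughly charged/naive once the cache is full.
        System.out.println("naive=" + naive + " charged=" + charged);
    }
}
```

The gap between the two numbers is exactly the "2x or 3x" risk the comment raises: it grows with the entry count, so a capacity expressed only in payload bytes is not a bound on heap use.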
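The "indicate early" suggestion can be sketched as a bounds check before the copy, raising an exception that says what was requested and what was available rather than a bare EOFException. Names here are illustrative, not the actual RAR code.

```java
import java.io.EOFException;
import java.nio.ByteBuffer;

// Hypothetical sketch of failing early with context: verify the buffer holds
// the requested number of bytes before reading, and include both the request
// and the availability in the exception message.
public class BoundedRead
{
    static void readFully(ByteBuffer buffer, byte[] out, int length) throws EOFException
    {
        if (length > buffer.remaining())
            throw new EOFException("requested " + length + " bytes but only "
                                   + buffer.remaining() + " available");
        buffer.get(out, 0, length);
    }

    public static void main(String[] args) throws EOFException
    {
        ByteBuffer b = ByteBuffer.wrap(new byte[] { 1, 2, 3 });
        byte[] out = new byte[2];
        readFully(b, out, 2);          // fine: 2 <= 3 remaining
        try
        {
            readFully(b, out, 2);      // only 1 byte left: descriptive failure
        }
        catch (EOFException e)
        {
            System.out.println(e.getMessage());
        }
    }
}
```

A message like this would have pointed straight at the mismatch in the hard-to-debug cases mentioned above.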
bq. When RAR requests rebuffering at a certain position, it can either have its buffer filled (direct case), or receive a view of a shared buffer that holds the data (mem-mapped case). There was a lot of clumsiness in RAR to handle the question of which of these is the case, does it own its buffer, should it be allocated or freed. The patch addresses this clumsiness as well as allowing for another type of advantageous buffer management.

I understand; I actually started with the proposition to return "void", but I changed it later on because I saw a possibility to unify the bufferless implementation with the others. Essentially the question is where the original data comes from – directly from the channel or from an already mmap'ed buffer – so maybe if we had a common interface for both of these cases and used it as a backend for the rebufferer, it would simplify things instead of putting that logic into the rebufferer itself? Just something to think about...

bq. Interesting. Another possibility mentioned before is to implement compression in such a way that the compressed size matches the chunk size. Both are orthogonal and outside the scope of this ticket – let's open a new issue for that?

I'm fine if we make it a separate ticket, but I think we will have to tackle it first since it would directly affect the rebufferer/cache logic.


> In process (uncompressed) page cache
> ------------------------------------
>
>                 Key: CASSANDRA-5863
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5863
>             Project: Cassandra
>          Issue Type: Sub-task
>            Reporter: T Jake Luciani
>            Assignee: Branimir Lambov
>              Labels: performance
>             Fix For: 3.x
>
> Currently, for every read, the CRAR reads each compressed chunk into a
> byte[], sends it to ICompressor, gets back another byte[] and verifies a
> checksum.
> This process is where the majority of time is spent in a read request.
> Before compression, we would have zero-copy of data and could respond
> directly from the page-cache.
> It would be useful to have some kind of Chunk cache that could speed up this
> process for hot data, possibly off heap.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
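The "common interface as a backend for the rebufferer" idea from the discussion can be sketched as one interface answering where the data at a position comes from, with a channel-backed and an mmap-backed implementation behind it. All names here are illustrative, not the actual Cassandra Rebufferer API; plain arrays stand in for a channel and a mapped region.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: a single source abstraction hides whether a chunk is
// copied from the channel (caller-owned buffer) or is a view of a shared
// memory-mapped buffer (no copy). The rebufferer would only see ChunkSource.
interface ChunkSource
{
    /** Returns a buffer whose contents start at {@code position}. */
    ByteBuffer chunkAt(long position);
}

class MmapSource implements ChunkSource
{
    private final ByteBuffer mapped;   // stands in for a memory-mapped region

    MmapSource(ByteBuffer mapped) { this.mapped = mapped; }

    public ByteBuffer chunkAt(long position)
    {
        // mem-mapped case: hand out a view of the shared buffer, no copy
        ByteBuffer view = mapped.duplicate();
        view.position((int) position);
        return view.slice();
    }
}

class ChannelSource implements ChunkSource
{
    private final byte[] file;         // stands in for reading from a channel

    ChannelSource(byte[] file) { this.file = file; }

    public ByteBuffer chunkAt(long position)
    {
        // direct case: produce a buffer the caller owns and must manage
        return ByteBuffer.wrap(file, (int) position, file.length - (int) position).slice();
    }
}
```

Under this shape, the ownership question ("does RAR own its buffer, should it be allocated or freed") is answered per implementation instead of by branching inside the rebufferer.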
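A minimal on-heap sketch of the chunk cache the ticket asks for: decompressed chunks keyed by (file, chunk offset), so a hot read skips the read-decompress-checksum cycle. The LRU-via-LinkedHashMap policy and all names are illustrative assumptions; the ticket suggests the real cache would likely live off heap.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of an uncompressed chunk cache. A miss pays the full
// read+decompress cost once (modelled by the supplier); a hit returns the
// already-decompressed chunk. Eviction is access-order LRU on chunk count,
// standing in for a byte-budgeted, possibly off-heap implementation.
public class ChunkCache
{
    private final int maxChunks;
    private final Map<String, byte[]> chunks;

    ChunkCache(int maxChunks)
    {
        this.maxChunks = maxChunks;
        this.chunks = new LinkedHashMap<String, byte[]>(16, 0.75f, true)
        {
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest)
            {
                return size() > ChunkCache.this.maxChunks;
            }
        };
    }

    byte[] get(String file, long offset, Supplier<byte[]> readAndDecompress)
    {
        String key = file + "@" + offset;
        byte[] chunk = chunks.get(key);
        if (chunk == null)
        {
            chunk = readAndDecompress.get();   // miss: CRAR path runs once
            chunks.put(key, chunk);
        }
        return chunk;                           // hit: no byte[] churn, no checksum
    }
}
```

Keying by chunk offset rather than by read position is what ties this back to the rebufferer discussion: whichever backend produced the chunk, repeated reads within it resolve from the cache.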