[ https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15255531#comment-15255531 ]

Pavel Yaskevich commented on CASSANDRA-5863:
--------------------------------------------

*General Notes:*
* I think we should try to come up with better names for the rebufferer classes, so 
that their function is more obvious... Maybe something like {File, FS, Data}Reader 
with a load or read method instead of rebuffer.
* Maybe we should try to implement the idea I mentioned when I originally 
worked on the compression support about 5 years ago in CASSANDRA-47. It wasn't 
worth it at that time but might be more relevant now :), and it could make caching 
of compressed files a lot simpler. The idea: make compressors always return 
a buffer size aligned on PAGE_SIZE (default 512 bytes) and leave "holes" 
in the file by seeking to the next alignment boundary (rough sketch after this 
list). Over the years I've double-checked with multiple people familiar with the 
area that most modern/popular filesystems (NTFS, ext*, xfs, etc.) already support 
this: they won't allocate the unused blocks and will place the allocated ones 
close together. This is going to help here in the following ways:
    -- caches don't have to worry about the size/alignment of 
compressed/decompressed chunks;
    -- the compressed reader becomes very simple since it only has to align 
requested offsets (which allows removing the CompressionInfo segment);
    -- there is no need to keep uncompressed size information around since the data 
size is the same for the compressed and uncompressed cases (everything is already 
aligned);
    -- CRAR/CSF and all of their supporting classes are no longer required;
    -- and more, e.g. we could potentially just re-map compressed pages into 
decompressed ones on the fly and the cache wouldn't even have to know.
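
Something along these lines, just to illustrate the write path (class and method 
names are mine, not from the patch, and the exact slot layout is my assumption 
about how the alignment would be applied):

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Rough sketch of the alignment idea: every chunk is written at the offset it would
// have in the uncompressed file, and the tail of its slot is skipped rather than
// padded, so a sparse-file-aware filesystem (ext*, xfs, NTFS, ...) never allocates
// blocks for the unused pages.
final class AlignedChunkWriter
{
    static final int PAGE_SIZE = 512;   // alignment of compressor output

    private final FileChannel channel;
    private final int chunkLength;      // uncompressed chunk size, multiple of PAGE_SIZE

    AlignedChunkWriter(FileChannel channel, int chunkLength)
    {
        assert chunkLength % PAGE_SIZE == 0;
        this.channel = channel;
        this.chunkLength = chunkLength;
    }

    void writeChunk(long chunkIndex, ByteBuffer compressed) throws IOException
    {
        long slotStart = chunkIndex * (long) chunkLength;
        channel.position(slotStart);
        while (compressed.hasRemaining())
            channel.write(compressed);
        // no padding: seek to the start of the next slot and leave a "hole" behind
        channel.position(slotStart + chunkLength);
    }
}
{code}

A reader then maps an uncompressed position straight to a file offset 
(chunkIndex = position / chunkLength, fileOffset = chunkIndex * chunkLength), which 
is why no CompressionInfo offsets or uncompressed-size bookkeeping would be needed.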

*Code Notes:*
* Why does ReaderCache only account for the buffer size instead of key + buffer 
size in the weigher? This means the cache size is underestimated (see the weigher 
sketch after this list).
* A couple of instanceof checks kind of signal that we want to re-evaluate the 
rebufferer class hierarchy.
* ReaderCache#invalidateFile is not very efficient, O(n) in the size of the 
cache, and it is used by the cleanup of mmap'ed files, which might be a problem.
* (potential safety improvement) ChecksummedDataInput#readBuffer should validate 
the buffer size against the read length for the -1 case, because otherwise this 
might cause corruption.
* HintsReader - adds an unused import and a commented-out seek, both of which 
should be removed.
* Since CRAR no longer extends RAR, the header comment about that should be removed 
as well. As a matter of fact, since CRAR is now just a container of rebufferer 
implementations, maybe it makes sense to remove it altogether and just use RAR from 
CompressedSegmentedFile with different "rebuffer" backends, in other words move all 
of the rebufferers from CRAR into CSF?
* BufferlessRebufferer#rebuffer(long position, ByteBuffer buffer) at least 
requires better clarification of its parameters and return value, because in e.g. 
CRAR it's not exactly clear why an uncompressed buffer would be provided only to 
also be returned. Why can't the argument simply be filled and the return type 
changed to long, an (aligned) offset into the file (rough interface sketch after 
this list)? That would allow removing the Rebufferer#rebuffer(long) method and 
always letting callers provide the buffer to fill, since I only see it used in 
RAR#reBufferAt and LimitingRebufferer, both of which could be made to hold the 
actual buffer. This converges everything under BufferlessRebufferer, has 
ReaderCache and RAR handle buffers, and divides the responsibilities of buffer 
management and actual block storage handling between RAR and Rebufferer.
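
For the ReaderCache weigher point, roughly what I mean (sketched with Caffeine 
purely for illustration; the Key class and the overhead constant are placeholders, 
not the patch's code):

{code:java}
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.Weigher;
import java.nio.ByteBuffer;

final class ReaderCacheWeigherExample
{
    // placeholder key: (file path, chunk position)
    static final class Key
    {
        final String path;
        final long position;
        Key(String path, long position) { this.path = path; this.position = position; }
    }

    // rough per-entry charge for the key object, its fields and cache entry overhead
    private static final int KEY_OVERHEAD = 64;

    // charge key + buffer, not just the buffer, so maximumWeight reflects real usage
    private static final Weigher<Key, ByteBuffer> WEIGHER =
        (key, buffer) -> KEY_OVERHEAD + buffer.capacity();

    static Cache<Key, ByteBuffer> newCache(long maxWeightBytes)
    {
        return Caffeine.newBuilder()
                       .maximumWeight(maxWeightBytes)
                       .weigher(WEIGHER)
                       .build();
    }
}
{code}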
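
And for the last point, the kind of contract I have in mind (a sketch only, not 
what is currently in the patch):

{code:java}
import java.nio.ByteBuffer;

// The caller always supplies the buffer, the implementation fills it with the block
// covering the requested position, and the return value is the (aligned) file offset
// the buffer now starts at. Buffer ownership stays with the caller (RAR or
// ReaderCache), while the rebufferer only reads/decompresses blocks.
interface BufferlessRebufferer
{
    /**
     * Fill {@code buffer} with the data containing {@code position}.
     *
     * @param position file position the caller wants to read
     * @param buffer   destination buffer, owned and recycled by the caller
     * @return the aligned file offset of the first byte placed in {@code buffer}
     */
    long rebuffer(long position, ByteBuffer buffer);

    /** Size of the buffer the caller should pass in. */
    int chunkSize();
}
{code}

RAR#reBufferAt would then do something like 
{{long bufferOffset = rebufferer.rebuffer(position, buffer);}} followed by 
{{buffer.position((int) (position - bufferOffset));}} and ReaderCache can wrap any 
such rebufferer by keying entries on the aligned offset.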


> In process (uncompressed) page cache
> ------------------------------------
>
>                 Key: CASSANDRA-5863
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5863
>             Project: Cassandra
>          Issue Type: Sub-task
>            Reporter: T Jake Luciani
>            Assignee: Branimir Lambov
>              Labels: performance
>             Fix For: 3.x
>
>
> Currently, for every read, the CRAR reads each compressed chunk into a 
> byte[], sends it to ICompressor, gets back another byte[] and verifies a 
> checksum.  
> This process is where the majority of time is spent in a read request.  
> Before compression, we would have zero-copy of data and could respond 
> directly from the page-cache.
> It would be useful to have some kind of Chunk cache that could speed up this 
> process for hot data, possibly off heap.


