keith-turner opened a new issue, #6023:
URL: https://github.com/apache/accumulo/issues/6023

   **Is your feature request related to a problem? Please describe.**
   
   Accumulo caches uncompressed rfile blocks.  This can lead to data in cache 
taking up much more space than data on disk.
   
   **Describe the solution you'd like**
   
   Optionally allow storing compressed rfile blocks in the cache.
   
   Storing compressed rfile blocks in the cache would likely lead to more CPU 
usage at query time and would likely disable the feature that allows random 
lookups in cached blocks.
   
   **Describe alternatives you've considered**
   
   This could potentially be implemented without any changes to Accumulo.  The 
only drawback to that is we would always uncompress the data when reading from 
disk and then recompress it when storing in the cache. This could be expensive. 
To allow taking compressed rfile blocks directly from disk and storing them in 
cache would require a change in Accumulo because it always uncompresses before 
caching.
   
   Maybe it's best to leave the on-heap primary cache uncompressed and have a 
secondary cache (possibly off heap) that compresses blocks.  This could likely 
be done without any changes to Accumulo, as it could be done completely by 
plugins.  This may look like the following.
   
    1. Read the rfile block and uncompress it.
    2. Store the uncompressed rfile block in the primary cache.
    3. When the primary cache offloads a block to the secondary cache, it compresses the block.
    4. When the primary cache loads a block from the secondary cache, it uncompresses the block.
   
   Maybe that could provide good CPU and memory utilization.
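
   The four steps above could be sketched roughly as follows. This is a minimal, standalone illustration and not Accumulo's block cache plugin API: the class `TwoTierBlockCache`, its methods, and the use of `java.util.zip` for compression are all assumptions made for the example. The secondary tier here is an unbounded on-heap map for brevity; a real one would presumably be bounded and possibly off heap.

```java
import java.io.ByteArrayOutputStream;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical two-tier block cache; names are illustrative, not an Accumulo API. */
class TwoTierBlockCache {
  private final int primaryCapacity;
  // Secondary tier: compressed blocks keyed by block id (unbounded in this sketch).
  private final Map<String, byte[]> secondary = new HashMap<>();
  // Primary tier: uncompressed blocks in LRU (access) order.
  private final Map<String, byte[]> primary;

  TwoTierBlockCache(int primaryCapacity) {
    this.primaryCapacity = primaryCapacity;
    this.primary = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
        if (size() > TwoTierBlockCache.this.primaryCapacity) {
          // Step 3: compress the block as it is offloaded to the secondary cache.
          secondary.put(eldest.getKey(), compress(eldest.getValue()));
          return true;
        }
        return false;
      }
    };
  }

  /** Step 2: store an already uncompressed block in the primary cache. */
  void put(String blockId, byte[] uncompressed) {
    secondary.remove(blockId); // drop any stale compressed copy
    primary.put(blockId, uncompressed);
  }

  /** Returns the uncompressed block, or null if neither tier holds it. */
  byte[] get(String blockId) {
    byte[] hit = primary.get(blockId);
    if (hit != null) {
      return hit;
    }
    byte[] packed = secondary.remove(blockId);
    if (packed == null) {
      return null; // caller falls back to step 1: read from disk and uncompress
    }
    // Step 4: uncompress the block as it is loaded back into the primary cache.
    byte[] block = decompress(packed);
    primary.put(blockId, block);
    return block;
  }

  boolean isInSecondary(String blockId) {
    return secondary.containsKey(blockId);
  }

  /** Size in bytes of the compressed copy, or -1 if the block is not in the secondary tier. */
  int compressedSize(String blockId) {
    byte[] packed = secondary.get(blockId);
    return packed == null ? -1 : packed.length;
  }

  static byte[] compress(byte[] data) {
    Deflater deflater = new Deflater();
    deflater.setInput(data);
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    while (!deflater.finished()) {
      out.write(buf, 0, deflater.deflate(buf));
    }
    deflater.end();
    return out.toByteArray();
  }

  static byte[] decompress(byte[] data) {
    Inflater inflater = new Inflater();
    inflater.setInput(data);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    try {
      while (!inflater.finished()) {
        int n = inflater.inflate(buf);
        if (n == 0 && inflater.needsInput()) {
          throw new IllegalStateException("truncated compressed block");
        }
        out.write(buf, 0, n);
      }
    } catch (DataFormatException e) {
      throw new IllegalStateException(e);
    } finally {
      inflater.end();
    }
    return out.toByteArray();
  }
}
```

   In this sketch, compression cost is only paid when a block leaves the primary cache, so hot blocks stay uncompressed and remain usable for random lookups.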
   
   **Additional context**
   
   Noticed the difference in compressed vs uncompressed cache data when working 
on #6010.  That change emits data about compressed and uncompressed data read 
per scan.

