[ https://issues.apache.org/jira/browse/HBASE-5263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205667#comment-13205667 ]
Kannan Muthukkaruppan commented on HBASE-5263:
----------------------------------------------

Zhihong: Yes! Fixed it in place. I had a recursive reference going there... :)

> Preserving cached data on compactions through cache-on-write
> ------------------------------------------------------------
>
>                 Key: HBASE-5263
>                 URL: https://issues.apache.org/jira/browse/HBASE-5263
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Mikhail Bautin
>            Assignee: Mikhail Bautin
>            Priority: Minor
>
> We are tackling HBASE-3976 and HBASE-5230 to make sure we don't trash the
> block cache on compactions if cache-on-write is enabled. However, it would be
> ideal to reduce the effect compactions have on the cached data. For every
> block we are writing for a compacted file we can decide whether it needs to
> be cached based on whether the original blocks containing the same data were
> already in cache. More precisely, for every HFile reader in a compaction we
> can maintain a boolean flag saying whether the current key-value came from a
> disk IO or the block cache. In the HFile writer for the compaction's output
> we can maintain a flag that is set if any of the key-values in the block
> being written came from a cached block, use that flag at the end of a block
> to decide whether to cache-on-write the block, and reset the flag to false on
> a block boundary. If such an inclusive approach would still trash the cache,
> we could restrict the total number of blocks to be cached per an output
> HFile, switch to an "and" logic instead of "or" logic for deciding whether to
> cache an output file block, or only cache a certain percentage of output file
> blocks that contain some of the previously cached data.
> Thanks to Nicolas for this elegant online algorithm idea!
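
For illustration only, the per-block flag described in the issue could be sketched roughly as follows. This is not HBase code: `CompactionCacheSketch`, `OutputBlockTracker`, and `Mode` are hypothetical names, and a block boundary is approximated here by a fixed key-value count rather than a byte size. `Mode.ANY` corresponds to the "or" logic and `Mode.ALL` to the "and" alternative mentioned in the description.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these types exist in HBase.
public class CompactionCacheSketch {

    /** ANY = "or" logic from the issue text, ALL = the stricter "and" variant. */
    enum Mode { ANY, ALL }

    /** Records, for each output block, whether it would be cached on write. */
    static final class OutputBlockTracker {
        private final int kvsPerBlock; // assumption: stand-in for a byte-size limit
        private final Mode mode;
        private int kvsInCurrentBlock = 0;
        private boolean sawCachedKv = false;   // some key-value came from the block cache
        private boolean sawUncachedKv = false; // some key-value came from disk IO
        private final List<Boolean> decisions = new ArrayList<>();

        OutputBlockTracker(int kvsPerBlock, Mode mode) {
            this.kvsPerBlock = kvsPerBlock;
            this.mode = mode;
        }

        /** Called once per key-value, with the reader's "came from cache" flag. */
        void append(boolean cameFromCache) {
            if (cameFromCache) {
                sawCachedKv = true;
            } else {
                sawUncachedKv = true;
            }
            kvsInCurrentBlock++;
            if (kvsInCurrentBlock == kvsPerBlock) {
                finishBlock();
            }
        }

        /** Decides cache-on-write for the current block, then resets the flags. */
        void finishBlock() {
            if (kvsInCurrentBlock == 0) {
                return; // nothing buffered, no block to decide on
            }
            boolean cacheIt = (mode == Mode.ANY)
                    ? sawCachedKv
                    : (sawCachedKv && !sawUncachedKv);
            decisions.add(cacheIt);
            sawCachedKv = false;
            sawUncachedKv = false;
            kvsInCurrentBlock = 0;
        }

        List<Boolean> cacheDecisions() {
            return new ArrayList<>(decisions);
        }
    }

    public static void main(String[] args) {
        // Five key-values, two per output block; only the fourth came from cache.
        boolean[] fromCache = {false, false, false, true, false};
        OutputBlockTracker tracker = new OutputBlockTracker(2, Mode.ANY);
        for (boolean flag : fromCache) {
            tracker.append(flag);
        }
        tracker.finishBlock(); // flush the trailing partial block
        System.out.println(tracker.cacheDecisions()); // prints [false, true, false]
    }
}
```

With `Mode.ALL`, the same input would cache no output block, since the only block containing a cached key-value also contains one read from disk. The other mitigations in the description (a per-file cap or a percentage of eligible blocks) would sit on top of this decision and are not shown.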