
[jira] [Commented] (HBASE-5263) Preserving cached data on compactions through cache-on-write

2012-02-10 Thread Kannan Muthukkaruppan (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205645#comment-13205645
]

Kannan Muthukkaruppan commented on HBASE-5263:
--

Promising idea!

In terms of implementation details, it would be nice to avoid some
pathological cases, such as cold data (in the cache but almost on its way
out) becoming hot again. I am guessing a naive approach could have this
pitfall, but one that additionally takes into consideration the hotness of
the keys in the block and places the data at the appropriate position in the
blockcache LRU would not. I haven't thought through the implementation
details much, but wanted to throw out these initial thoughts at least.
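
One possible reading of the hotness-aware placement above, as a minimal sketch. It is written in Python for brevity rather than HBase's Java, and every name in it (TickLruCache, replace_on_compaction, and so on) is hypothetical and not an HBase API. The assumption is that a compacted block inherits the recency of the hottest cached source block instead of being inserted as most recently used, so nearly evicted data stays nearly evicted.

```python
class TickLruCache:
    """Toy LRU: each block carries a last-access tick; the lowest tick is evicted first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.clock = 0
        self.last_access = {}  # block id -> tick of its last access

    def _tick(self):
        self.clock += 1
        return self.clock

    def access(self, block_id):
        """A read hit or an ordinary insert makes the block most recently used."""
        self.last_access[block_id] = self._tick()
        self._evict()

    def replace_on_compaction(self, source_ids, new_id):
        """Swap cached source blocks for the compacted block, inheriting recency.

        Assumption: the output block is as hot as its hottest cached source.
        If no source block was cached, the output block is not cached either.
        """
        ticks = [self.last_access.pop(s) for s in source_ids if s in self.last_access]
        if ticks:
            self.last_access[new_id] = max(ticks)

    def _evict(self):
        while len(self.last_access) > self.capacity:
            coldest = min(self.last_access, key=self.last_access.get)
            del self.last_access[coldest]

    def eviction_order(self):
        """Block ids from coldest (evicted next) to hottest."""
        return sorted(self.last_access, key=self.last_access.get)
```

With a capacity of 3 and blocks a, b, c accessed in that order, compacting a into a2 leaves a2 first in line for eviction, whereas a naive cache-on-write insert would have moved it to the hot end.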

See also related idea by Liyin here: HBASE-5263. These could be complementary
approaches.

Preserving cached data on compactions through cache-on-write

Key: HBASE-5263
URL: https://issues.apache.org/jira/browse/HBASE-5263
Project: HBase
Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor

We are tackling HBASE-3976 and HBASE-5230 to make sure we don't trash the
block cache on compactions if cache-on-write is enabled. However, it would be
ideal to reduce the effect compactions have on the cached data. For every
block we are writing for a compacted file we can decide whether it needs to
be cached based on whether the original blocks containing the same data were
already in cache. More precisely, for every HFile reader in a compaction we
can maintain a boolean flag saying whether the current key-value came from a
disk IO or the block cache. In the HFile writer for the compaction's output
we can maintain a flag that is set if any of the key-values in the block
being written came from a cached block, use that flag at the end of a block
to decide whether to cache-on-write the block, and reset the flag to false on
a block boundary. If such an inclusive approach would still trash the cache,
we could restrict the total number of blocks to be cached per output HFile,
switch to "and" logic instead of "or" logic for deciding whether to cache an
output file block, or only cache a certain percentage of output file blocks
that contain some of the previously cached data.
Thanks to Nicolas for this elegant online algorithm idea!
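
The per-block decision described above can be sketched as follows. This is an illustration in Python, not HBase code: CompactionBlockWriter, kvs_per_block, combine and max_cached_blocks are hypothetical names, blocks are assumed to hold a fixed number of key-values, and the writer keeps one flag per key-value (rather than a single boolean) only so that the "or" and "and" variants can share the same code.

```python
class CompactionBlockWriter:
    """Decides, per output block, whether to cache-on-write it."""

    def __init__(self, kvs_per_block, combine=any, max_cached_blocks=None):
        self.kvs_per_block = kvs_per_block          # assumed fixed block size
        self.combine = combine                      # any = "or" logic, all = "and" logic
        self.max_cached_blocks = max_cached_blocks  # optional cap per output file
        self.flags = []        # came-from-cache flags for the block being built
        self.decisions = []    # one cache-on-write decision per finished block
        self.cached_blocks = 0

    def append(self, from_cache):
        """Record whether the key-value just written was read from a cached block."""
        self.flags.append(from_cache)
        if len(self.flags) == self.kvs_per_block:
            self.finish_block()

    def finish_block(self):
        """Decide for the current block, then reset state at the block boundary."""
        if not self.flags:
            return
        cache_it = self.combine(self.flags)
        capped = (self.max_cached_blocks is not None
                  and self.cached_blocks >= self.max_cached_blocks)
        if capped:
            cache_it = False
        if cache_it:
            self.cached_blocks += 1
        self.decisions.append(cache_it)
        self.flags = []
```

Passing combine=all gives the stricter "and" variant, and max_cached_blocks models the per-file cap. The percentage-based variant is not shown.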

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5263) Preserving cached data on compactions through cache-on-write

2012-02-10 Thread Zhihong Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205648#comment-13205648
 ] 

Zhihong Yu commented on HBASE-5263:
---

@Kannan:
I think you were referring to HBASE-5369.


