[jira] [Commented] (HBASE-23887) New L1 cache : AdaptiveLRU
[ https://issues.apache.org/jira/browse/HBASE-23887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17292048#comment-17292048 ] Ben Manes commented on HBASE-23887: --- Have you compared against the TinyLFU cache implementation? I'd be curious to see your results. That algorithm is also adaptive to tune itself towards LRU or MRU / LFU based on sampling workload's hit rate. There is a [simulator|http://example.com/] that you might find useful, as it supports a large number of trace formats. This is more useful than YCSB which is too synthetic to match cache access patterns, though is excellent for load testing. A benefit of the implementation is that it is concurrently O(1) rather than requiring a background O(N) scan. It looks like you compared only to LRU. What are your thoughts on TinyLFU? > New L1 cache : AdaptiveLRU > -- > > Key: HBASE-23887 > URL: https://issues.apache.org/jira/browse/HBASE-23887 > Project: HBase > Issue Type: Improvement > Components: BlockCache, Performance >Reporter: Danil Lipovoy >Assignee: Danil Lipovoy >Priority: Major > Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.2 > > Attachments: 1582787018434_rs_metrics.jpg, > 1582801838065_rs_metrics_new.png, BC_LongRun.png, > BlockCacheEvictionProcess.gif, BlockCacheEvictionProcess.gif, PR#1257.diff, > cmp.png, evict_BC100_vs_BC23.png, eviction_100p.png, eviction_100p.png, > eviction_100p.png, gc_100p.png, graph.png, image-2020-06-07-08-11-11-929.png, > image-2020-06-07-08-19-00-922.png, image-2020-06-07-12-07-24-903.png, > image-2020-06-07-12-07-30-307.png, image-2020-06-08-17-38-45-159.png, > image-2020-06-08-17-38-52-579.png, image-2020-06-08-18-35-48-366.png, > image-2020-06-14-20-51-11-905.png, image-2020-06-22-05-57-45-578.png, > image-2020-09-23-09-48-59-714.png, image-2020-09-23-10-06-11-189.png, > ratio.png, ratio2.png, read_requests_100pBC_vs_23pBC.png, requests_100p.png, > requests_100p.png, requests_new2_100p.png, requests_new_100p.png, scan.png, > scan_and_gets.png, scan_and_gets2.png, wave.png, ycsb_logs.zip > > > Hi! > I first time here, correct me please if something wrong. > All latest information is here: > [https://docs.google.com/document/d/1X8jVnK_3lp9ibpX6lnISf_He-6xrHZL0jQQ7hoTV0-g/edit?usp=sharing] > I want propose how to improve performance when data in HFiles much more than > BlockChache (usual story in BigData). The idea - caching only part of DATA > blocks. It is good becouse LruBlockCache starts to work and save huge amount > of GC. > Sometimes we have more data than can fit into BlockCache and it is cause a > high rate of evictions. In this case we can skip cache a block N and insted > cache the N+1th block. Anyway we would evict N block quite soon and that why > that skipping good for performance. > --- > Some information below isn't actual > --- > > > Example: > Imagine we have little cache, just can fit only 1 block and we are trying to > read 3 blocks with offsets: > 124 > 198 > 223 > Current way - we put the block 124, then put 198, evict 124, put 223, evict > 198. A lot of work (5 actions). > With the feature - last few digits evenly distributed from 0 to 99. When we > divide by modulus we got: > 124 -> 24 > 198 -> 98 > 223 -> 23 > It helps to sort them. Some part, for example below 50 (if we set > *hbase.lru.cache.data.block.percent* = 50) go into the cache. And skip > others. It means we will not try to handle the block 198 and save CPU for > other job. In the result - we put block 124, then put 223, evict 124 (3 > actions). > See the picture in attachment with test below. Requests per second is higher, > GC is lower. > > The key point of the code: > Added the parameter: *hbase.lru.cache.data.block.percent* which by default = > 100 > > But if we set it 1-99, then will work the next logic: > > > {code:java} > public void cacheBlock(BlockCacheKey cacheKey, Cacheable buf, boolean > inMemory) { > if (cacheDataBlockPercent != 100 && buf.getBlockType().isData()) > if (cacheKey.getOffset() % 100 >= cacheDataBlockPercent) > return; > ... > // the same code as usual > } > {code} > > Other parameters help to control when this logic will be enabled. It means it > will work only while heavy reading going on. > hbase.lru.cache.heavy.eviction.count.limit - set how many times have to run > eviction process that start to avoid of putting data to BlockCache > hbase.lru.cache.heavy.eviction.bytes.size.limit - set how many bytes have to > evicted each time that start to avoid of putting data to BlockCache > By default: if 10 times (100 secunds) evicted more than 10 MB (each time) > then we start to skip 50% of data blocks. > When heavy evitions process end then new logic off and will put into > BlockCache all block
[jira] [Commented] (HBASE-17028) New 2.0 blockcache (tinylfu) doesn't have inmemory partition, etc Update doc and codebase accordingly
[ https://issues.apache.org/jira/browse/HBASE-17028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16834105#comment-16834105 ] Ben Manes commented on HBASE-17028: --- If there is a desire to move it to a larger org, like Apache, that can be discussed. It would most likely mean that I'd step away as others take ownership. However as a library there is mostly maintenance with minor feature requests, so it could be overkill as not a lot of active development is necessary. The project was started without a lot of thought into package names, etc. that make it more professional sounding, as instead my desire was to write code and solve hard CS problems. Happy to find ways that increase the bus factor. > New 2.0 blockcache (tinylfu) doesn't have inmemory partition, etc Update doc > and codebase accordingly > - > > Key: HBASE-17028 > URL: https://issues.apache.org/jira/browse/HBASE-17028 > Project: HBase > Issue Type: Sub-task > Components: BlockCache >Reporter: stack >Assignee: Biju Nair >Priority: Major > Labels: BlockCache, TinyLFU > Attachments: HBASE-17028.patch > > > Intent is to make the parent tinylfu blockcache default on in 2.0 replacing > our old lru blockcache. This issue is about making it clear in doc and code > how the new blockcache differs from the old (You can put back the old lru > blockcache with config change). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16820436#comment-16820436 ] Ben Manes commented on HBASE-15560: --- Thank you [~apurtell], [~busbey], [~stack], [~ebortnik], and [~eshcar]! > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Affects Versions: 2.0.0 >Reporter: Ben Manes >Assignee: Ben Manes >Priority: Major > Fix For: 3.0.0, 2.3.0 > > Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > HBASE-15560.patch, bc.hit.count, bc.miss.count, branch-1.tinylfu.txt, gets, > run_ycsb_c.sh, run_ycsb_loading.sh, tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, though the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby track the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lack luster > result is because a synthetic Zipfian distribution is used, which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16809144#comment-16809144 ] Ben Manes commented on HBASE-15560: --- That's wonderful, thank you. > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Affects Versions: 2.0.0 >Reporter: Ben Manes >Assignee: Ben Manes >Priority: Major > Fix For: 3.0.0, 2.3.0 > > Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > HBASE-15560.patch, bc.hit.count, bc.miss.count, branch-1.tinylfu.txt, gets, > run_ycsb_c.sh, run_ycsb_loading.sh, tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, though the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby track the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lack luster > result is because a synthetic Zipfian distribution is used, which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16809108#comment-16809108 ] Ben Manes edited comment on HBASE-15560 at 4/3/19 6:56 PM: --- [~apurtell], I'm sorry for the hassle. In [2.7.0|https://github.com/ben-manes/caffeine/releases/tag/v2.7.0] we did migrate from JSR-305 annotations to ErrorProne's and CheckerFramework's. This was to be compatible Java 9's modules, which doesn't support split packages. I had forgotten HBase's need to handle all transitive dependencies explicitly. You could downgrade to {{2.6.2}} if this is too much trouble. was (Author: ben.manes): [~apurtell], I'm sorry for the hassle. In [2.7.0|[https://github.com/ben-manes/caffeine/releases/tag/v2.7.0]] we did migrate from JSR-305 annotations to ErrorProne's and CheckerFramework's. This was to be compatible Java 9's modules, which doesn't support split packages. I had forgotten HBase's need to handle all transitive dependencies explicitly. You could downgrade to `2.6.2` if this is too much trouble. > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Affects Versions: 2.0.0 >Reporter: Ben Manes >Assignee: Ben Manes >Priority: Major > Fix For: 3.0.0, 2.3.0 > > Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > bc.hit.count, bc.miss.count, branch-1.tinylfu.txt, gets, run_ycsb_c.sh, > run_ycsb_loading.sh, tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, though the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby track the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lack luster > result is because a synthetic Zipfian distribution is used, which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16809108#comment-16809108 ] Ben Manes commented on HBASE-15560: --- [~apurtell], I'm sorry for the hassle. In [2.7.0|[https://github.com/ben-manes/caffeine/releases/tag/v2.7.0]] we did migrate from JSR-305 annotations to ErrorProne's and CheckerFramework's. This was to be compatible Java 9's modules, which doesn't support split packages. I had forgotten HBase's need to handle all transitive dependencies explicitly. You could downgrade to `2.6.2` if this is too much trouble. > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Affects Versions: 2.0.0 >Reporter: Ben Manes >Assignee: Ben Manes >Priority: Major > Fix For: 3.0.0, 2.3.0 > > Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > bc.hit.count, bc.miss.count, branch-1.tinylfu.txt, gets, run_ycsb_c.sh, > run_ycsb_loading.sh, tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, though the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby track the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lack luster > result is because a synthetic Zipfian distribution is used, which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16802250#comment-16802250 ] Ben Manes commented on HBASE-15560: --- Caffeine is very much dependent on Java 8 to implement features, e.g. using the newer Map computation methods. I'm sorry if I didn't make that clear enough. > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Affects Versions: 2.0.0 >Reporter: Ben Manes >Assignee: Ben Manes >Priority: Major > Fix For: 3.0.0, 2.3.0 > > Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, bc.hit.count, > bc.miss.count, branch-1.tinylfu.txt, gets, run_ycsb_c.sh, > run_ycsb_loading.sh, tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, though the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby track the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lack luster > result is because a synthetic Zipfian distribution is used, which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16801995#comment-16801995 ] Ben Manes commented on HBASE-15560: --- Thanks [~apurtell] and [~busbey]. Can you please upgrade to 2.7.0 when updating the patch? It should be backwards compatible with bug fixes and improvements since the version in the current patch. > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Affects Versions: 2.0.0 >Reporter: Ben Manes >Assignee: Ben Manes >Priority: Major > Fix For: 3.0.0, 1.6.0, 2.3.0 > > Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > HBASE-15560.patch, HBASE-15560.patch, bc.hit.count, bc.miss.count, > branch-1.tinylfu.txt, gets, run_ycsb_c.sh, run_ycsb_loading.sh, tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, though the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby track the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lack luster > result is because a synthetic Zipfian distribution is used, which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16791215#comment-16791215 ] Ben Manes commented on HBASE-15560: --- Sorry to hear that. Thanks for trying to move this forward. The latest version now includes robust [adaptivity|http://highscalability.com/blog/2019/2/25/design-of-a-modern-cachepart-deux.html] to optimize against the workload. Let me know if I can be of help if this ever comes up again. > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Affects Versions: 2.0.0 >Reporter: Ben Manes >Assignee: Ben Manes >Priority: Major > Fix For: 3.0.0, 1.6.0, 2.3.0 > > Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > bc.hit.count, bc.miss.count, branch-1.tinylfu.txt, gets, run_ycsb_c.sh, > run_ycsb_loading.sh, tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, though the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby track the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lack luster > result is because a synthetic Zipfian distribution is used, which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16710972#comment-16710972 ] Ben Manes commented on HBASE-15560: --- Sorry that this dropped off my radar. I can summarize a few things stated before, with minor updates. h5. Current (SLRU) * Design: ** Reads perform a ConcurrentHashMap lookup, increment a global AtomicLong counter for the access time, and marks the block type as frequent ** Writes perform a ConcurrentHashMap update and notifies a thread if the cache overflows ** Thread wakes up every 10s or when notified. Performs an O(n lg n) sort, and evicts the recency/frequency segments up to watermarks * Benefits ** Provides scan resistance and captures simple frequency workloads. Is optimal for Zipf. ** Has minimal latencies at low/modest concurrency as does very little work on requesting threads * Costs ** At high concurrency, AtomicLong would be a synchronization bottleneck (~10M op/sec if I recall correctly). This probably does not matter due to disk I/O, network I/O, etc. resulting in modest thrashing on this counter. ** No back-pressure on writes if the cache cannot evict fast enough. However, the I/O involved may make this moot. ** Expected lower hit rates in real-world traces, based on the variety of workload we have examined (non-HBase, various freq/recency mixtures) h5. Caffeine (Proposed, TinyLFU) * Design: ** Reads perform a ConcurrentHashMap lookup, hash to a ring buffer (growable array of buffers), and tries to add the item (up to 3 times, may rehash). If the ring buffer is full or a state machine flag is marked, then tryLocks to schedule a task on an executor. ** Writes perform a ConcurrentHashMap update, add to a ring buffer (blocking if full), updates a state machine flag, and tryLocks to schedule a task on an executor. ** Executor drains the ring buffers, replays the events on the eviction policy, evicts if the cache has overflowed (default: ForkJoinPool.commonPool()). * Benefits ** Allows higher degree of read concurrency by not having a single point of contention (striped ring buffers) ** Offers back-pressure on writes if the eviction thread cannot keep up (deschedules writers by them taking the global lock if the buffer is full) ** Spreads out small chunks of O(1) work ** Allows more advanced policies / data-structures (TinyLFU, Hierarchical TimerWheel) => higher hit rates & more features * Costs ** Slightly higher penalties on read / write (no free lunch) ** Is more biased towards frequency (a negative if a recency-skewed workload) h5. Synopsis The SLRU is the cheapest (latency) and most optimal (hit rate) for synthetic Zipf testing. It was designed with those considerations in mind. Any other solution will trade higher latency for better hit rates and system behavior. The question is then if the latency difference is small enough (effectively noise) and the higher hit rate improves overall performance. *This can only be definitively answered by someone willing to canary an instance in a live environment.* My belief, from analyzing hit rates and their impacts on other applications, is that there will be a benefit. h5. TinyLFU improvements We have been exploring ways to improve TinyLFU-based policies in adversarial workloads (recency-biased). In those cases work is brought in, operated on repeatedly, and then never touched again. A good example of that is a transaction log or a distributed compilation cache (with local cache). In those workloads frequency is a negative signal, as by the time the score is high enough for retention the item is no longer worth retaining. We have been working on adaptive schemes by sampling the workload and adjusting based on its characteristics ([paper|https://drive.google.com/open?id=1CT2ASkfuG9qVya9Sn8ZUCZjrFSSyjRA_]). Both a naive hill climber and a statistics-based model correct the policy to the optimal hit rate. I hope to try [adaptive moment estimation|https://arxiv.org/abs/1412.6980], an advanced hill climber, which I believe will be the most robust and inexpensive mechanism (as proven by the ML community). This work will allow the cache to offer the best hit rate regardless of workload, which no other policy has been able to do so far. h5. Next Steps I don't think there is anything meaningful that I can offer to this ticket. If this was to go in, either a leap of faith by making it an option or someone in the community would have to prove the benefit. Without an environment or trace, we can't do more than discuss minor details from synthetic testing. > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Affects Versions: 2.0.0 >
[jira] [Commented] (HBASE-17339) Scan-Memory-First Optimization for Get Operations
[ https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15943573#comment-15943573 ] Ben Manes commented on HBASE-17339: --- I think its really difficult to tell, but I'd guess that there might be a small gain. Those 30M misses sound compulsory, meaning that they would occur regardless of the cache size. Therefore we'd expect an unbounded cache to have 87% hit rate at 400M accesses or 90% at 300M. If you're observing 80%, then at best there is 10% boost. If Bélády's optimal is lower then there is even less of a difference to boost by. It could be that SLRU captures frequency well enough that both policies are equivalent. The [MultiQueue paper|https://www.usenix.org/legacy/event/usenix01/full_papers/zhou/zhou.pdf] argues that 2nd level cache access patterns are frequency skewed. The LruBlockCache only retains if there were multiple accesses, not the counts, and tries to evict fairly across the buckets. Since TinyLFU captures a longer tail (freq. of items outside of the cache), there is a chance that it can make a better prediction. But we wouldn't know without an access trace to simulate with. I suspect that the high hit rate means there isn't much cache pollution to lower the hit rate, so a good enough victim is chosen. At the tail most of the entries have a relatively similar frequency, too. It would be fun to find out, but you probably won't think it was worth the effort. > Scan-Memory-First Optimization for Get Operations > - > > Key: HBASE-17339 > URL: https://issues.apache.org/jira/browse/HBASE-17339 > Project: HBase > Issue Type: Improvement >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Attachments: HBASE-17339-V01.patch, HBASE-17339-V02.patch, > HBASE-17339-V03.patch, HBASE-17339-V03.patch, HBASE-17339-V04.patch, > HBASE-17339-V05.patch, HBASE-17339-V06.patch, read-latency-mixed-workload.jpg > > > The current implementation of a get operation (to retrieve values for a > specific key) scans through all relevant stores of the region; for each store > both memory components (memstores segments) and disk components (hfiles) are > scanned in parallel. > We suggest to apply an optimization that speculatively scans memory-only > components first and only if the result is incomplete scans both memory and > disk. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-17339) Scan-Memory-First Optimization for Get Operations
[ https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15942296#comment-15942296 ] Ben Manes commented on HBASE-17339: --- Can you compare runs with the TinyLFU patch to observe the impact? > Scan-Memory-First Optimization for Get Operations > - > > Key: HBASE-17339 > URL: https://issues.apache.org/jira/browse/HBASE-17339 > Project: HBase > Issue Type: Improvement >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Attachments: HBASE-17339-V01.patch, HBASE-17339-V02.patch, > HBASE-17339-V03.patch, HBASE-17339-V03.patch, HBASE-17339-V04.patch, > HBASE-17339-V05.patch, HBASE-17339-V06.patch, read-latency-mixed-workload.jpg > > > The current implementation of a get operation (to retrieve values for a > specific key) scans through all relevant stores of the region; for each store > both memory components (memstores segments) and disk components (hfiles) are > scanned in parallel. > We suggest to apply an optimization that speculatively scans memory-only > components first and only if the result is incomplete scans both memory and > disk. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15760517#comment-15760517 ] Ben Manes commented on HBASE-15560: --- Sorry that I haven't had time to investigate this and wrap up the ticket. Its been the usual hectic end of year, but I will try to get to it soonish. > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Affects Versions: 2.0.0 >Reporter: Ben Manes >Assignee: Ben Manes > Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > bc.hit.count, bc.miss.count, branch-1.tinylfu.txt, gets, run_ycsb_c.sh, > run_ycsb_loading.sh, tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, though the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby track the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lack luster > result is because a synthetic Zipfian distribution is used, which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15648257#comment-15648257 ] Ben Manes commented on HBASE-15560: --- If you can give me the steps to reproduce your observation on HBase then I'll try to debug it locally. That way I don't keep you in limbo. > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Affects Versions: 2.0.0 >Reporter: Ben Manes >Assignee: Ben Manes > Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > bc.hit.count, bc.miss.count, branch-1.tinylfu.txt, gets, tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, though the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby track the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lack luster > result is because a synthetic Zipfian distribution is used, which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643283#comment-15643283 ] Ben Manes commented on HBASE-15560: --- Are the access keys not in the data set so that it is not found? I assumed a miss means query the cache, load, store into the cache. If queried again, it should be a cache hit. If that's correct then the value has no meaning and the keys are the access distributions. Any surrogate, like a hash, will be representative. So using the same Zipf distribution should give us similar results. But I might be mistaking how the cache is used in HBase and evaluating it incorrectly in isolation > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Affects Versions: 2.0.0 >Reporter: Ben Manes >Assignee: Ben Manes > Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > bc.hit.count, bc.miss.count, branch-1.tinylfu.txt, gets, tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, though the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby track the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lack luster > result is because a synthetic Zipfian distribution is used, which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643270#comment-15643270 ] Ben Manes commented on HBASE-15560: --- In the last run, {{Optimal}} is {{55.50%}}. Sorry for the typo. > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Affects Versions: 2.0.0 >Reporter: Ben Manes >Assignee: Ben Manes > Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > bc.hit.count, bc.miss.count, branch-1.tinylfu.txt, gets, tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, though the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby track the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lack luster > result is because a synthetic Zipfian distribution is used, which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643265#comment-15643265 ] Ben Manes commented on HBASE-15560: --- {{quote} Would you want the same dataset loaded too? {{quote}} That can't hurt, so unless its more work might as well. --- In my [simulator|https://github.com/ben-manes/caffeine/wiki/Simulator], I tried to emulate {{workload c}} using the following configuration, * maximum-size = (below) * source = "synthetic" * distribution = "zipfian" * zipfian.items = 1000 I then ran it with small caches to emulate your observation. {{LruBlockCache}} is an SLru variant, so I'm assuming it behaves similar to the theoretical version. ||Policy||max=5||max=10||max=25|| |Lru|13.10%|20.70%|35.60%| |SLru|25.90%|29.30|45.00%| |Caffeine|24.40%|32.30%|46.00%| |Optimal|35.20%|42.10%|45.50%| We see that at the smallest size, 5, Caffeine slightly under performs. However whether its slightly lower, equal, or higher varies on the run. This is due to the distribution generation and Caffeine's hashing having randomness, so across runs we see it pretty much on par. As the size increases we see them all stay pretty close. Since SLru is known to be optimal for Zipf, this at least is a good sign but does not explain your observations. > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Affects Versions: 2.0.0 >Reporter: Ben Manes >Assignee: Ben Manes > Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > bc.hit.count, bc.miss.count, branch-1.tinylfu.txt, gets, tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, though the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby track the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lack luster > result is because a synthetic Zipfian distribution is used, which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643227#comment-15643227 ] Ben Manes commented on HBASE-15560: --- Sorry for not getting to this over the weekend. A bit of a family scare which had a happy ending. An access trace is a log of the key hashes on a {{get}}. Then I can replay them offline with the simulator. The "weight" of an entry in {{workloadc}} claims to be 1kb uniformly. I wasn't sure if they were going to vary, e.g with large and small across the distribution. I do have ycsb integrated into the simulator for synthetic distributions so perhaps I can try to reproduce your observations that way. > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Affects Versions: 2.0.0 >Reporter: Ben Manes >Assignee: Ben Manes > Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > bc.hit.count, bc.miss.count, branch-1.tinylfu.txt, gets, tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, though the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby track the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lack luster > result is because a synthetic Zipfian distribution is used, which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15638231#comment-15638231 ] Ben Manes commented on HBASE-15560: --- I think my first step is to get an access trace (log the key's hash) so that I can run it through my simulator. Then I can verify that the policy behaves well when weights are ignored. The next step would be to re-run it but assigning weights. Do these vary in your test? I think YCSB had static so they were the same per entry. If not, then the argument would be a potential bug when trying to be weight-aware when promoting between the window/probation/protected regions. Right now the simulator is not weight-aware, so I'd hack a quick test based on the trace data. If all of those are good, then we need to go line-by-line to see where the two caches differ. It could be a miscalculation of an entry's weight or other subtle bugs. We'll know more if we can isolate it down to a small, repeatable test case to debug with. > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Affects Versions: 2.0.0 >Reporter: Ben Manes >Assignee: Ben Manes > Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > bc.hit.count, bc.miss.count, branch-1.tinylfu.txt, gets, tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, though the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby track the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lack luster > result is because a synthetic Zipfian distribution is used, which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15637546#comment-15637546 ] Ben Manes commented on HBASE-15560: --- Thanks [~stack]! I won't be able to dig into this until the weekend. If I understand you right, the concern is that the throughput is lower for smaller caches? That would imply a lower hit rate, so even the low penalty would be observable when accumulated. Maybe there's a bug in how the new cache uses the "replace" flag or doesn't cache on a weight limit? Since the cache is weighted, it might also be that a block exceeds the size of the window cache so there are more compulsory misses. I'd really like to step through a test case, but not sure how I'd isolate and repeat your observations atm. > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Affects Versions: 2.0.0 >Reporter: Ben Manes >Assignee: Ben Manes > Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > bc.hit.count, bc.miss.count, gets, tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, though the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby track the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lack luster > result is because a synthetic Zipfian distribution is used, which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15621071#comment-15621071 ] Ben Manes commented on HBASE-15560: --- [~stack], will you have time to test this soon? > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Affects Versions: 2.0.0 >Reporter: Ben Manes >Assignee: Ben Manes > Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, though the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby track the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lack luster > result is because a synthetic Zipfian distribution is used, which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15571276#comment-15571276 ] Ben Manes edited comment on HBASE-15560 at 10/13/16 8:37 AM: - The update latencies, except for average, were very similar. Since presumably not all entries fit in the cache then an update of a miss would trigger an eviction. It could be impact from the O(n lg n) Lru eviction thread, GC, or more coarse grained locking. Since this was run on a macbook rather than an isolated server, it could also be a background daemon kicking in. I think the important take away is not the absolute but that they are in the same ballpark. There isn't an outlier indicating the new implementation has a major degredation, e.g. due to locking or hit rates. [~eshcar]: To more directly answer your question, the update cost is [very close|https://github.com/ben-manes/caffeine/wiki/Benchmarks#write-100-1] to ConcurrentHashMap. This is because the locking overhead dominates, leaving enough spare cpu cycles to mask any other penalties being processed asynchronously. [~anoopsamjohn] In my original post the results mentioned were probably with no evictions. Because LruBlockCache penalizes only the eviction, whereas Caffeine spreads it out, one would expect Lru to have an advantage. But by Amdahl's law the potential speedup is very tiny, so it falls into the noise. A fresh test would be good. was (Author: ben.manes): The update latencies, except for average, were very similar. Since presumably not all entries fit in the cache then an update of a miss would trigger an eviction. It could be impact from the O(n lg n) Lru eviction thread, GC, or more coarse grained locking. Since this was run on a macbook rather than an isolated server, it could also be a background daemon kicking in. I think the important take away is not the absolute but that they are in the same ballpark. There isn't an outlier indicating the new implementation has a major degredation, e.g. due to locking or hit rates. [~eshcar]: To more directly answer your question, the update cost is [very close|https://github.com/ben-manes/caffeine/wiki/Benchmarks#write-100-1] to ConcurrentHashMap. This is because the locking overhead dominates, leaving enough spare cpu cycles to mask any other penalties being processed asynchronously. [~anoopamz] In my original post the results mentioned were probably with no evictions. Because LruBlockCache penalizes only the eviction, whereas Caffeine spreads it out, one would expect Lru to have an advantage. But by Amdahl's law the potential speedup is very tiny, so it falls into the noise. A fresh test would be good. > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Affects Versions: 2.0.0 >Reporter: Ben Manes >Assignee: Ben Manes > Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, though the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby track the usage pattern without incurring high > latencies > ([benchmarks|https://github.com
[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15571276#comment-15571276 ] Ben Manes commented on HBASE-15560: --- The update latencies, except for average, were very similar. Since presumably not all entries fit in the cache then an update of a miss would trigger an eviction. It could be impact from the O(n lg n) Lru eviction thread, GC, or more coarse grained locking. Since this was run on a macbook rather than an isolated server, it could also be a background daemon kicking in. I think the important take away is not the absolute but that they are in the same ballpark. There isn't an outlier indicating the new implementation has a major degredation, e.g. due to locking or hit rates. [~eshcar]: To more directly answer your question, the update cost is [very close|https://github.com/ben-manes/caffeine/wiki/Benchmarks#write-100-1] to ConcurrentHashMap. This is because the locking overhead dominates, leaving enough spare cpu cycles to mask any other penalties being processed asynchronously. [~anoopamz] In my original post the results mentioned were probably with no evictions. Because LruBlockCache penalizes only the eviction, whereas Caffeine spreads it out, one would expect Lru to have an advantage. But by Amdahl's law the potential speedup is very tiny, so it falls into the noise. A fresh test would be good. > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Affects Versions: 2.0.0 >Reporter: Ben Manes >Assignee: Ben Manes > Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, though the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby track the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lack luster > result is because a synthetic Zipfian distribution is used, which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570364#comment-15570364 ] Ben Manes commented on HBASE-15560: --- YCSB workload B states that each record is 1kb, so that is about 100mb (97mib). That's probably introduces some misses due to Java object overhead. Since LruBlockCache uses a high watermark and evicts to a low watermark, it could be aggressively under utilizing the capacity. So a higher hit rate might be understandable, in addition to the workload pattern's characteristics. > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Affects Versions: 2.0.0 >Reporter: Ben Manes >Assignee: Ben Manes > Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, though the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby track the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lack luster > result is because a synthetic Zipfian distribution is used, which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15559308#comment-15559308 ] Ben Manes commented on HBASE-15560: --- 1. I performed "git revert b952e64" 2. Configured YCSB workload B with the settings, {code} recordcount=10 operationcount=100 {code} 3. Started hbase server with the {{hbase-site.xml}} configuration, {code:xml} hfile.block.cache.size 0.1f hbase.regionserver.global.memstore.size 0.7f hfile.block.cache.policy Lru {code} 4. [Loaded and ran ycsb|https://github.com/brianfrankcooper/YCSB/tree/master/hbase098] with Lru and TinyLfu. h4. LruBlockCache {code} totalSize=96.67 MB, freeSize=2.32 MB, max=98.99 MB, blockCount=1793, accesses=4766387, hits=4081322, hitRatio=85.63%, cachingAccesses=4764133, cachingHits=4081322, cachingHitsRatio=85.67%, evictions=10402, evicted=681017, evictedPerRun=65.46981349740435 {code} {code} [OVERALL], RunTime(ms), 189753.0 [OVERALL], Throughput(ops/sec), 5270.008906315052 [TOTAL_GCS_PS_Scavenge], Count, 717.0 [TOTAL_GC_TIME_PS_Scavenge], Time(ms), 730.0 [TOTAL_GC_TIME_%_PS_Scavenge], Time(%), 0.38471065016099876 [TOTAL_GCS_PS_MarkSweep], Count, 0.0 [TOTAL_GC_TIME_PS_MarkSweep], Time(ms), 0.0 [TOTAL_GC_TIME_%_PS_MarkSweep], Time(%), 0.0 [TOTAL_GCs], Count, 717.0 [TOTAL_GC_TIME], Time(ms), 730.0 [TOTAL_GC_TIME_%], Time(%), 0.38471065016099876 [READ], Operations, 950125.0 [READ], AverageLatency(us), 152.8599626364952 [READ], MinLatency(us), 76.0 [READ], MaxLatency(us), 60959.0 [READ], 95thPercentileLatency(us), 215.0 [READ], 99thPercentileLatency(us), 253.0 [READ], Return=OK, 950125 [CLEANUP], Operations, 2.0 [CLEANUP], AverageLatency(us), 72164.0 [CLEANUP], MinLatency(us), 8.0 [CLEANUP], MaxLatency(us), 144383.0 [CLEANUP], 95thPercentileLatency(us), 144383.0 [CLEANUP], 99thPercentileLatency(us), 144383.0 [UPDATE], Operations, 49875.0 [UPDATE], AverageLatency(us), 215.8185664160401 [UPDATE], MinLatency(us), 125.0 [UPDATE], MaxLatency(us), 36159.0 [UPDATE], 95thPercentileLatency(us), 294.0 [UPDATE], 99thPercentileLatency(us), 484.0 [UPDATE], Return=OK, 49875 {code} h4. TinyLfuBlockCache {code} totalSize=98.98 MB, freeSize=4.07 KB, max=98.99 MB, blockCount=2112, accesses=4170109, hits=3794003, hitRatio=90.98%, cachingAccesses=4170112, cachingHits=3794005, cachingHitsRatio=90.98%, evictions=373994, evicted=37399 {code} {code} [OVERALL], RunTime(ms), 118390.0 [OVERALL], Throughput(ops/sec), 8446.659346228567 [TOTAL_GCS_PS_Scavenge], Count, 664.0 [TOTAL_GC_TIME_PS_Scavenge], Time(ms), 714.0 [TOTAL_GC_TIME_%_PS_Scavenge], Time(%), 0.6030914773207197 [TOTAL_GCS_PS_MarkSweep], Count, 0.0 [TOTAL_GC_TIME_PS_MarkSweep], Time(ms), 0.0 [TOTAL_GC_TIME_%_PS_MarkSweep], Time(%), 0.0 [TOTAL_GCs], Count, 664.0 [TOTAL_GC_TIME], Time(ms), 714.0 [TOTAL_GC_TIME_%], Time(%), 0.6030914773207197 [READ], Operations, 949956.0 [READ], AverageLatency(us), 112.233432916893 [READ], MinLatency(us), 75.0 [READ], MaxLatency(us), 61151.0 [READ], 95thPercentileLatency(us), 165.0 [READ], 99thPercentileLatency(us), 204.0 [READ], Return=OK, 949956 [CLEANUP], Operations, 2.0 [CLEANUP], AverageLatency(us), 59732.0 [CLEANUP], MinLatency(us), 8.0 [CLEANUP], MaxLatency(us), 119487.0 [CLEANUP], 95thPercentileLatency(us), 119487.0 [CLEANUP], 99thPercentileLatency(us), 119487.0 [UPDATE], Operations, 50044.0 [UPDATE], AverageLatency(us), 188.9981216529454 [UPDATE], MinLatency(us), 122.0 [UPDATE], MaxLatency(us), 36671.0 [UPDATE], 95thPercentileLatency(us), 257.0 [UPDATE], 99thPercentileLatency(us), 489.0 [UPDATE], Return=OK, 50044 {code} > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Affects Versions: 2.0.0 >Reporter: Ben Manes >Assignee: Ben Manes > Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the count
[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15549840#comment-15549840 ] Ben Manes commented on HBASE-15560: --- I can take another stab at (1) and work with [~eshcar] to ensure its validity. I should have time over this upcoming weekend. Like the paper's simulations, I can also run an anonymized trace to calculate the hit/miss curves of the two policies. The trace file would be a sequence of cache key hashes on a cache.get() call. While not taking into account entry sizes, it should tell us if the policy improves the efficiency in a realistic workload. That lends itself to being able to estimate the new response times, assuming all else is equal. Would an anonymized access trace be easy to acquire and share? > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Affects Versions: 2.0.0 >Reporter: Ben Manes >Assignee: Ben Manes > Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, though the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby track the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lack luster > result is because a synthetic Zipfian distribution is used, which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15546678#comment-15546678 ] Ben Manes commented on HBASE-15560: --- I know the frustration and agree that feature flags should have a clear deprecation cycle. You might want to consider a special deprecation annotation indicating the release (or date) that a flag should be removed by. A custom checkstyle / pmd rule would be easy to write and allow for validating in the build. If the flag is rot due to lack of testing then it pushes for a decision to be made. In general I would have performance tested this myself, but due to not being an HBase user that would be meaningless. Its been fun to provide the patch and work through the process, as requested by [~ebortnik], but I do need help on that last mile. So I am looking forward to digging into the results when we have some hard numbers. > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Affects Versions: 2.0.0 >Reporter: Ben Manes >Assignee: Ben Manes > Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, though the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby track the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lack luster > result is because a synthetic Zipfian distribution is used, which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15546599#comment-15546599 ] Ben Manes commented on HBASE-15560: --- This flag was to simplify evaluation. It can either be retained to allow a gradual rollout, e.g. feature flag in case users discover concerns, or removed. For providing a patch it seemed most respectful, on my part, to not try to force a switch. I'm fine removing the configuration once the team is confident in adopting the new policy. > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Affects Versions: 2.0.0 >Reporter: Ben Manes >Assignee: Ben Manes > Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, though the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby track the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lack luster > result is because a synthetic Zipfian distribution is used, which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15546102#comment-15546102 ] Ben Manes commented on HBASE-15560: --- As noted previously, please try with a real workload rather than the a synthetic. When [~eshcar] and I tried, we found that Lru was already optimized for YCSB making the difference negligible. Given the paper's real-world traces and Druid's experiences ([1|https://github.com/druid-io/druid/pull/3028], [2|https://groups.google.com/d/msg/druid-user/J-YMqt8wc5s/tv2VXa6pBwAJ]), TinyLFU appears promising. > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Affects Versions: 2.0.0 >Reporter: Ben Manes >Assignee: Ben Manes > Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, though the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby track the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lack luster > result is because a synthetic Zipfian distribution is used, which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Manes updated HBASE-15560: -- Attachment: HBASE-15560.patch > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Affects Versions: 2.0.0 >Reporter: Ben Manes >Assignee: Ben Manes > Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, though the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby track the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lack luster > result is because a synthetic Zipfian distribution is used, which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Manes updated HBASE-15560: -- Attachment: HBASE-15560.patch > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Affects Versions: 2.0.0 >Reporter: Ben Manes >Assignee: Ben Manes > Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, though the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby track the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lack luster > result is because a synthetic Zipfian distribution is used, which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15541173#comment-15541173 ] Ben Manes commented on HBASE-15560: --- We can add it now if that's desired, or defer on whether to add or remove depending on the team's consensus. Neither the Lru nor TinyLfu caches use the field in their policies and I don't see how it provides meaningful information to users. > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Affects Versions: 2.0.0 >Reporter: Ben Manes >Assignee: Ben Manes > Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > HBASE-15560.patch, HBASE-15560.patch, tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, though the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby track the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lack luster > result is because a synthetic Zipfian distribution is used, which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15530762#comment-15530762 ] Ben Manes commented on HBASE-15560: --- [~busbey], [~eshcar]: The remaining issues appear unrelated to my changes. > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Affects Versions: 2.0.0 >Reporter: Ben Manes >Assignee: Ben Manes > Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > HBASE-15560.patch, HBASE-15560.patch, tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, though the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby track the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lack luster > result is because a synthetic Zipfian distribution is used, which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15528256#comment-15528256 ] Ben Manes commented on HBASE-15560: --- [~busbey], I can't seem to trigger a new build. Can you please take a look? > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Affects Versions: 2.0.0 >Reporter: Ben Manes >Assignee: Ben Manes > Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > HBASE-15560.patch, HBASE-15560.patch, tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, though the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby track the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lack luster > result is because a synthetic Zipfian distribution is used, which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Manes updated HBASE-15560: -- Status: Patch Available (was: Open) > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Affects Versions: 2.0.0 >Reporter: Ben Manes >Assignee: Ben Manes > Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > HBASE-15560.patch, HBASE-15560.patch, tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, though the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby track the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lack luster > result is because a synthetic Zipfian distribution is used, which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Manes updated HBASE-15560: -- Status: Open (was: Patch Available) > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Affects Versions: 2.0.0 >Reporter: Ben Manes >Assignee: Ben Manes > Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > HBASE-15560.patch, HBASE-15560.patch, tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, though the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby track the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lack luster > result is because a synthetic Zipfian distribution is used, which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Manes updated HBASE-15560: -- Attachment: HBASE-15560.patch > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Affects Versions: 2.0.0 >Reporter: Ben Manes >Assignee: Ben Manes > Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > HBASE-15560.patch, HBASE-15560.patch, tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, though the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby track the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lack luster > result is because a synthetic Zipfian distribution is used, which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Manes updated HBASE-15560: -- Status: Patch Available (was: Open) > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Affects Versions: 2.0.0 >Reporter: Ben Manes >Assignee: Ben Manes > Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > HBASE-15560.patch, tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, though the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby track the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lack luster > result is because a synthetic Zipfian distribution is used, which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Manes updated HBASE-15560: -- Status: Open (was: Patch Available) Cancelling and re-submitting patch, hoping that Hadoop QA will pick it up this time. > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Affects Versions: 2.0.0 >Reporter: Ben Manes >Assignee: Ben Manes > Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > HBASE-15560.patch, tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, though the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby track the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lack luster > result is because a synthetic Zipfian distribution is used, which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Manes updated HBASE-15560: -- Attachment: HBASE-15560.patch Fixed JavaDoc. I am unable to reproduce the test failure locally when I run, $ mvn test -Dtest=org.apache.hadoop.hbase.filter.TestFilter > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Affects Versions: 2.0.0 >Reporter: Ben Manes >Assignee: Ben Manes > Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > HBASE-15560.patch, tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, though the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby track the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lack luster > result is because a synthetic Zipfian distribution is used, which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15525151#comment-15525151 ] Ben Manes commented on HBASE-15560: --- Thanks [~busbey]! I made the update and will fix the definition in my build. > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Affects Versions: 2.0.0 >Reporter: Ben Manes >Assignee: Ben Manes > Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, though the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby track the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lack luster > result is because a synthetic Zipfian distribution is used, which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Manes updated HBASE-15560: -- Attachment: HBASE-15560.patch > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Affects Versions: 2.0.0 >Reporter: Ben Manes >Assignee: Ben Manes > Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, though the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby track the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lack luster > result is because a synthetic Zipfian distribution is used, which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15524953#comment-15524953 ] Ben Manes commented on HBASE-15560: --- [~eshcar] all of the issues don't appear to be related to my changes. Do you know if there is anything I can do about it? > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Affects Versions: 2.0.0 >Reporter: Ben Manes >Assignee: Ben Manes > Attachments: HBASE-15560.patch, HBASE-15560.patch, tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, though the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby track the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lack luster > result is because a synthetic Zipfian distribution is used, which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Manes updated HBASE-15560: -- Attachment: HBASE-15560.patch Attached patch from review board > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Affects Versions: 2.0.0 >Reporter: Ben Manes >Assignee: Ben Manes > Attachments: HBASE-15560.patch, HBASE-15560.patch, tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, though the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby track the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lack luster > result is because a synthetic Zipfian distribution is used, which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15522658#comment-15522658 ] Ben Manes commented on HBASE-15560: --- The error appears to be a pre-commit check that fails for CaffeinatedBlockCache formatting. However, I renamed that to TinyLfuBlockCache, so either the report is old or the build is retaining stale files. Locally a compile is successful. > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Affects Versions: 2.0.0 >Reporter: Ben Manes >Assignee: Ben Manes > Attachments: HBASE-15560.patch, tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, though the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby track the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lack luster > result is because a synthetic Zipfian distribution is used, which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15514311#comment-15514311 ] Ben Manes commented on HBASE-15560: --- I think that I addressed all of the comments, except where noted as unclear. Please take another look when you have a chance. > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Affects Versions: 2.0.0 >Reporter: Ben Manes >Assignee: Ben Manes > Attachments: HBASE-15560.patch, tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, though the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby track the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lack luster > result is because a synthetic Zipfian distribution is used, which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15502152#comment-15502152 ] Ben Manes commented on HBASE-15560: --- Thanks for the reviews so far. When I originally wrote the patch I did enough for evaluation and tests seemed premature. I'll see what can be ported over from TestLruBlockCache, though I'd expect it to be minimal. I don't think it would be appropriate to write tests in HBase that make too many assumptions about the policy's behavior as that leads to relying on implementation details of an external library. BlockPriority is coupled to LruBlockCache and arguably it is a leaky abstraction by exposing it. I'd argue that the `BlockCache` should be redesigned to avoid [dogpiling|https://en.wikipedia.org/wiki/Cache_stampede] by computing through the cache and minimize implementation assumptions. `MemcachedBlockCache` silently ignores the flag, throws exceptions, and returns dummy values for methods that make no sense for it. If this is merged in, I'd like to see a similar ticket like [ACCUMULO-4466|https://issues.apache.org/jira/browse] to evaluate whether to make TinyLFU the default. The Lru implementation could then be removed. > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Affects Versions: 2.0.0 >Reporter: Ben Manes >Assignee: Ben Manes > Attachments: HBASE-15560.patch, tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, though the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby track the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lack luster > result is because a synthetic Zipfian distribution is used, which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15494432#comment-15494432 ] Ben Manes commented on HBASE-15560: --- Patch [submitted|https://reviews.apache.org/r/51932/] to review board. > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Affects Versions: 2.0.0 >Reporter: Ben Manes > Attachments: HBASE-15560.patch, tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, though the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby track the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lack luster > result is because a synthetic Zipfian distribution is used, which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Manes updated HBASE-15560: -- Affects Version/s: 2.0.0 Status: Patch Available (was: Open) > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Affects Versions: 2.0.0 >Reporter: Ben Manes > Attachments: HBASE-15560.patch, tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, though the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby track the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lack luster > result is because a synthetic Zipfian distribution is used, which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15484585#comment-15484585 ] Ben Manes commented on HBASE-15560: --- Thanks [~Apache9]. The rebase had no conflicts and I didn't notice any major changes in the caching area since the previous patch. > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Reporter: Ben Manes > Attachments: HBASE-15560.patch, tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, though the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby track the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lack luster > result is because a synthetic Zipfian distribution is used, which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Manes updated HBASE-15560: -- Attachment: HBASE-15560.patch Rebased and upgraded to Caffeine 2.3.3 > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Reporter: Ben Manes > Attachments: HBASE-15560.patch, tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, though the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby track the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lack luster > result is because a synthetic Zipfian distribution is used, which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15382616#comment-15382616 ] Ben Manes commented on HBASE-15560: --- Thanks! I'll track that one then. > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Reporter: Ben Manes > Attachments: tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, though the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby track the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lack luster > result is because a synthetic Zipfian distribution is used, which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15381075#comment-15381075 ] Ben Manes commented on HBASE-15560: --- Can we merge this in? > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Reporter: Ben Manes > Attachments: tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, though the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby track the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lack luster > result is because a synthetic Zipfian distribution is used, which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15275490#comment-15275490 ] Ben Manes commented on HBASE-15560: --- [Druid|http://druid.io/] was recently struck by [JDK-8078490 - Missed submissions in ForkJoinPool|https://bugs.openjdk.java.net/browse/JDK-8078490]. This caused the cache to stop evicting because the asynchronous task was never run due to a race in the executor. The result was either a memory leak (2.2.6) or halting due to the back pressure (2.3.0). The solution if you are running on an older JDK8 release is to use a different executor (e.g. same-thread). This critical bug effected 8u40 - 8u60 (current is 8u92) and broke any FJP usage, such as *CompletableFuture*. I confirmed this fix with Doug Lea when investigating. In other news, Cassandra recently adopted Caffeine for its [page cache](https://issues.apache.org/jira/browse/CASSANDRA-5863). Their analysis of the performance, hit rate, and scan tolerance were positive. I'm hoping to integrate the eviction policy into their off-heap cache ([OHC|https://github.com/snazy/ohc]), which uses LRU. That cache is extracted into a library so contributing there might make it easy for you to benefit as well. > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Reporter: Ben Manes > Attachments: tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, though the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby track the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lack luster > result is because a synthetic Zipfian distribution is used, which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15265087#comment-15265087 ] Ben Manes commented on HBASE-15560: --- Please let me know if there is anything I can do on my end to help. > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Reporter: Ben Manes > Attachments: tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, though the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby track the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lack luster > result is because a synthetic Zipfian distribution is used, which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15217054#comment-15217054 ] Ben Manes commented on HBASE-15560: --- Hi Michael, I agree that offer a choice is an unnecessary confusion. It made it easier to test via a flag and, if retained, could be used only transitionally to give users a release cycle to adjust. For the 100% case LruBlockCache's work is a hash table read and increment of a global counter (fetch-and-add instruction). Caffeine's is the hash table read, select a ring buffer, CAS append into it, and schedule an async drain if full on ForkJoinPool#commonPool() (which might cause a few more context switches). The differences were in a margin of error in LruBlockCache's favor, but a tiny penalty seems understandable. > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Reporter: Ben Manes > Attachments: tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, though the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby track the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lack luster > result is because a synthetic Zipfian distribution is used, which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Manes updated HBASE-15560: -- Description: LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and recency of the working set. It achieves concurrency by using an O( n ) background thread to prioritize the entries and evict. Accessing an entry is O(1) by a hash table lookup, recording its logical access time, and setting a frequency flag. A write is performed in O(1) time by updating the hash table and triggering an async eviction thread. This provides ideal concurrency and minimizes the latencies by penalizing the thread instead of the caller. However the policy does not age the frequencies and may not be resilient to various workload patterns. W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the frequency in a counting sketch, ages periodically by halving the counters, and orders entries by SLRU. An entry is discarded by comparing the frequency of the new arrival (candidate) to the SLRU's victim, and keeping the one with the highest frequency. This allows the operations to be performed in O(1) time and, though the use of a compact sketch, a much larger history is retained beyond the current working set. In a variety of real world traces the policy had [near optimal hit rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. Concurrency is achieved by buffering and replaying the operations, similar to a write-ahead log. A read is recorded into a striped ring buffer and writes to a queue. The operations are applied in batches under a try-lock by an asynchronous thread, thereby track the usage pattern without incurring high latencies ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). In YCSB benchmarks the results were inconclusive. For a large cache (99% hit rates) the two caches have near identical throughput and latencies with LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a 1-4% hit rate improvement and therefore lower latencies. The lack luster result is because a synthetic Zipfian distribution is used, which SLRU performs optimally. In a more varied, real-world workload we'd expect to see improvements by being able to make smarter predictions. The provided patch implements BlockCache using the [Caffeine|https://github.com/ben-manes/caffeine] caching library (see HighScalability [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). Edward Bortnikov and Eshcar Hillel have graciously provided guidance for evaluating this patch ([github branch|https://github.com/ben-manes/hbase/tree/tinylfu]). was: LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and recency of the working set. It achieves concurrency by using an O( n ) background thread to prioritize the entries and evict. Accessing an entry is O(1) by a hash table lookup, recording its logical access time, and setting a frequency flag. A write is performed in O(1) time by updating the hash table and triggering an async eviction thread. This provides ideal concurrency and minimizes the latencies by penalizing the thread instead of the caller. However the policy does not age the frequencies and may not be resilient to various workload patterns. W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the frequency in a counting sketch, ages periodically by halving the counters, and orders entries by SLRU. An entry is discarded by comparing the frequency of the new arrival (candidate) to the SLRU's victim, and keeping the one with the highest frequency. This allows the operations to be performed in O(1) time and, though the use of a compact sketch, a much larger history is retained beyond the current working set. In a variety of real world traces the policy had [near optimal hit rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. Concurrency is achieved by buffering and replaying the operations, similar to a write-ahead log. A read is recorded into a striped ring buffer and writes to a queue. The operations are applied in batches under a try-lock by an asynchronous thread, thereby track the usage pattern without incurring high latencies ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). In YCSB benchmarks the results were inconclusive. For a large cache (99% hit rates) the two caches have near identical throughput and latencies with LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a 1-4% hit rate improvement and therefore lower latencies. The lack luster result is because a synthetic Zipfian distribution is used, which SLRU performs optimally. In a more varied, real-world workload we'd expect to see improvements by being able to make smarter predictions. The provided patch implements BlockC
[jira] [Updated] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Manes updated HBASE-15560: -- Description: LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and recency of the working set. It achieves concurrency by using an O( n ) background thread to prioritize the entries and evict. Accessing an entry is O(1) by a hash table lookup, recording its logical access time, and setting a frequency flag. A write is performed in O(1) time by updating the hash table and triggering an async eviction thread. This provides ideal concurrency and minimizes the latencies by penalizing the thread instead of the caller. However the policy does not age the frequencies and may not be resilient to various workload patterns. W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the frequency in a counting sketch, ages periodically by halving the counters, and orders entries by SLRU. An entry is discarded by comparing the frequency of the new arrival (candidate) to the SLRU's victim, and keeping the one with the highest frequency. This allows the operations to be performed in O(1) time and, though the use of a compact sketch, a much larger history is retained beyond the current working set. In a variety of real world traces the policy had [near optimal hit rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. Concurrency is achieved by buffering and replaying the operations, similar to a write-ahead log. A read is recorded into a striped ring buffer and writes to a queue. The operations are applied in batches under a try-lock by an asynchronous thread, thereby track the usage pattern without incurring high latencies ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). In YCSB benchmarks the results were inconclusive. For a large cache (99% hit rates) the two caches have near identical throughput and latencies with LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a 1-4% hit rate improvement and therefore lower latencies. The lack luster result is because a synthetic Zipfian distribution is used, which SLRU performs optimally. In a more varied, real-world workload we'd expect to see improvements by being able to make smarter predictions. The provided patch implements BlockCache using the [Caffeine|https://github.com/ben-manes/caffeine] caching library (see HighScalability [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). Edward Bortnikov and Eshcar Hillel have graciously provided guidance for evaluating this patch ([github branch|https://github.com/ben-manes/hbase/tree/tinylfu]. was: LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and recency of the working set. It achieves concurrency by using an {noformat}O(n){noformat} background thread to prioritize the entries and evict. Accessing an entry is O(1) by a hash table lookup, recording its logical access time, and setting a frequency flag. A write is performed in O(1) time by updating the hash table and triggering an async eviction thread. This provides ideal concurrency and minimizes the latencies by penalizing the thread instead of the caller. However the policy does not age the frequencies and may not be resilient to various workload patterns. W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the frequency in a counting sketch, ages periodically by halving the counters, and orders entries by SLRU. An entry is discarded by comparing the frequency of the new arrival (candidate) to the SLRU's victim, and keeping the one with the highest frequency. This allows the operations to be performed in O(1) time and, though the use of a compact sketch, a much larger history is retained beyond the current working set. In a variety of real world traces the policy had [near optimal hit rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. Concurrency is achieved by buffering and replaying the operations, similar to a write-ahead log. A read is recorded into a striped ring buffer and writes to a queue. The operations are applied in batches under a try-lock by an asynchronous thread, thereby track the usage pattern without incurring high latencies ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). In YCSB benchmarks the results were inconclusive. For a large cache (99% hit rates) the two caches have near identical throughput and latencies with LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a 1-4% hit rate improvement and therefore lower latencies. The lack luster result is because a synthetic Zipfian distribution is used, which SLRU performs optimally. In a more varied, real-world workload we'd expect to see improvements by being able to make smarter predictions. The provided patch i
[jira] [Updated] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Manes updated HBASE-15560: -- Description: LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and recency of the working set. It achieves concurrency by using an {noformat}O(n){noformat} background thread to prioritize the entries and evict. Accessing an entry is O(1) by a hash table lookup, recording its logical access time, and setting a frequency flag. A write is performed in O(1) time by updating the hash table and triggering an async eviction thread. This provides ideal concurrency and minimizes the latencies by penalizing the thread instead of the caller. However the policy does not age the frequencies and may not be resilient to various workload patterns. W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the frequency in a counting sketch, ages periodically by halving the counters, and orders entries by SLRU. An entry is discarded by comparing the frequency of the new arrival (candidate) to the SLRU's victim, and keeping the one with the highest frequency. This allows the operations to be performed in O(1) time and, though the use of a compact sketch, a much larger history is retained beyond the current working set. In a variety of real world traces the policy had [near optimal hit rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. Concurrency is achieved by buffering and replaying the operations, similar to a write-ahead log. A read is recorded into a striped ring buffer and writes to a queue. The operations are applied in batches under a try-lock by an asynchronous thread, thereby track the usage pattern without incurring high latencies ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). In YCSB benchmarks the results were inconclusive. For a large cache (99% hit rates) the two caches have near identical throughput and latencies with LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a 1-4% hit rate improvement and therefore lower latencies. The lack luster result is because a synthetic Zipfian distribution is used, which SLRU performs optimally. In a more varied, real-world workload we'd expect to see improvements by being able to make smarter predictions. The provided patch implements BlockCache using the [Caffeine|https://github.com/ben-manes/caffeine] caching library (see HighScalability [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). Edward Bortnikov and Eshcar Hillel have graciously provided guidance for evaluating this patch ([github branch|https://github.com/ben-manes/hbase/tree/tinylfu]. was: LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and recency of the working set. It achieves concurrency by using an O(n) background thread to prioritize the entries and evict. Accessing an entry is O(1) by a hash table lookup, recording its logical access time, and setting a frequency flag. A write is performed in O(1) time by updating the hash table and triggering an async eviction thread. This provides ideal concurrency and minimizes the latencies by penalizing the thread instead of the caller. However the policy does not age the frequencies and may not be resilient to various workload patterns. W-TinyLFU ([research paper|W-TinyLFU: http://arxiv.org/pdf/1512.00727.pdf]) records the frequency in a counting sketch, ages periodically by halving the counters, and orders entries by SLRU. An entry is discarded by comparing the frequency of the new arrival (candidate) to the SLRU's victim, and keeping the one with the highest frequency. This allows the operations to be performed in O(1) time and, though the use of a compact sketch, a much larger history is retained beyond the current working set. In a variety of real world traces the policy had [near optimal hit rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. Concurrency is achieved by buffering and replaying the operations, similar to a write-ahead log. A read is recorded into a striped ring buffer and writes to a queue. The operations are applied in batches under a try-lock by an asynchronous thread, thereby track the usage pattern without incurring high latencies ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). In YCSB benchmarks the results were inconclusive. For a large cache (99% hit rates) the two caches have near identical throughput and latencies with LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a 1-4% hit rate improvement and therefore lower latencies. The lack luster result is because a synthetic Zipfian distribution is used, which SLRU performs optimally. In a more varied, real-world workload we'd expect to see improvements by being able to make smarter predictions. The provid
[jira] [Updated] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Manes updated HBASE-15560: -- Attachment: tinylfu.patch > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Reporter: Ben Manes > Attachments: tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O(n) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|W-TinyLFU: http://arxiv.org/pdf/1512.00727.pdf]) > records the frequency in a counting sketch, ages periodically by halving the > counters, and orders entries by SLRU. An entry is discarded by comparing the > frequency of the new arrival (candidate) to the SLRU's victim, and keeping > the one with the highest frequency. This allows the operations to be > performed in O(1) time and, though the use of a compact sketch, a much larger > history is retained beyond the current working set. In a variety of real > world traces the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A > read is recorded into a striped ring buffer and writes to a queue. The > operations are applied in > batches under a try-lock by an asynchronous thread, thereby track the usage > pattern without incurring high latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lack luster > result is because a synthetic Zipfian distribution is used, which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-15560) TinyLFU-based BlockCache
Ben Manes created HBASE-15560: - Summary: TinyLFU-based BlockCache Key: HBASE-15560 URL: https://issues.apache.org/jira/browse/HBASE-15560 Project: HBase Issue Type: Improvement Components: BlockCache Reporter: Ben Manes LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and recency of the working set. It achieves concurrency by using an O(n) background thread to prioritize the entries and evict. Accessing an entry is O(1) by a hash table lookup, recording its logical access time, and setting a frequency flag. A write is performed in O(1) time by updating the hash table and triggering an async eviction thread. This provides ideal concurrency and minimizes the latencies by penalizing the thread instead of the caller. However the policy does not age the frequencies and may not be resilient to various workload patterns. W-TinyLFU ([research paper|W-TinyLFU: http://arxiv.org/pdf/1512.00727.pdf]) records the frequency in a counting sketch, ages periodically by halving the counters, and orders entries by SLRU. An entry is discarded by comparing the frequency of the new arrival (candidate) to the SLRU's victim, and keeping the one with the highest frequency. This allows the operations to be performed in O(1) time and, though the use of a compact sketch, a much larger history is retained beyond the current working set. In a variety of real world traces the policy had [near optimal hit rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. Concurrency is achieved by buffering and replaying the operations, similar to a write-ahead log. A read is recorded into a striped ring buffer and writes to a queue. The operations are applied in batches under a try-lock by an asynchronous thread, thereby track the usage pattern without incurring high latencies ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). In YCSB benchmarks the results were inconclusive. For a large cache (99% hit rates) the two caches have near identical throughput and latencies with LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a 1-4% hit rate improvement and therefore lower latencies. The lack luster result is because a synthetic Zipfian distribution is used, which SLRU performs optimally. In a more varied, real-world workload we'd expect to see improvements by being able to make smarter predictions. The provided patch implements BlockCache using the [Caffeine|https://github.com/ben-manes/caffeine] caching library (see HighScalability [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). Edward Bortnikov and Eshcar Hillel have graciously provided guidance for evaluating this patch ([github branch|https://github.com/ben-manes/hbase/tree/tinylfu]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)