[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15549086#comment-15549086 ]
Eshcar Hillel commented on HBASE-15560:
---------------------------------------

Hi, can I add my view of this issue? I think the gap between what is required by the community and what can be provided is not that big.

1) [~ben.manes] you already have the results of the YCSB benchmark you ran with the initial patch. Can you rerun these tests with the latest patch and publish the results in some form? I suggest you publish the exact settings you used plus the raw results (rather than the lift). You can either present a comparison table of the mean latency plus the high (95th/99th) percentiles over different cache sizes, or depict the dynamics of the latency throughout the run in a graph (by using the '-s' flag -- I can explain offline), or best do both. If you dig into the region server log you can find records of the hit ratio, which you can also depict alongside the latency; that could be nice to see. These results would show that when combining HBase and Caffeine there is no overhead, and in some cases a measurable benefit, even in synthetic workloads.

2) [~stack] if the results of these experiments satisfy the community, then the default can be switched to TinyLFU, with LRU remaining optional, and pushed to master. This would allow the community to experiment with this feature more easily, and to modify it if necessary.

3) Ben briefly described the results of the benchmarks when using a static distribution. Here is my explanation of the results (Ben, feel free to correct me if I'm wrong): The distribution of the items is skewed but *static*, with a small (high-frequency) head and a long (low-frequency) tail. With a given cache size -- after the cache is warm -- the items at the head fill the second segment (which is 80% of the cache in TinyLFU) and the following items fill the first segment.
With LRU, from time to time items from the tail of the distribution cause evictions from the first segment, which later translate into cache misses and increased latency, while TinyLFU tends to keep the items with higher frequency in the cache, which results in fewer misses. As the size of the cache grows, fewer and fewer items are evicted from the cache and the difference diminishes. With a *dynamic* distribution, items are continuously evicted from the cache, and here the benefit of TinyLFU should be clearly pronounced. We have traces of production workloads that would potentially have a skewed dynamic probability. However, we can neither share them, nor do we currently have the resources to turn them into a running benchmark. We could try to make an effort in this direction if this becomes a make-it-or-break-it point.

Would this be acceptable: 1) [~ben.manes] runs the static YCSB benchmark; 2) [~stack] commits TinyLFU as the default; 3) benchmark with dynamic workloads, either by us or by others in the community.

> TinyLFU-based BlockCache
> ------------------------
>
>                 Key: HBASE-15560
>                 URL: https://issues.apache.org/jira/browse/HBASE-15560
>             Project: HBase
>          Issue Type: Improvement
>          Components: BlockCache
>    Affects Versions: 2.0.0
>            Reporter: Ben Manes
>            Assignee: Ben Manes
>     Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, tinylfu.patch
>
> LruBlockCache uses the Segmented LRU (SLRU) policy to capture the frequency and recency of the working set. It achieves concurrency by using an O(n) background thread to prioritize the entries and evict. Accessing an entry is O(1) via a hash table lookup, recording its logical access time, and setting a frequency flag. A write is performed in O(1) time by updating the hash table and triggering an async eviction thread. This provides ideal concurrency and minimizes the latencies by penalizing the thread instead of the caller.
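The read/eviction split described above can be sketched roughly as follows -- an illustrative toy, not HBase's actual LruBlockCache (the class and member names below are invented for the example): reads and writes are O(1) hash-table operations that only stamp a logical access time, while the O(n) prioritization cost is paid by a separate eviction pass that a background thread would run.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Toy sketch of the design: O(1) access recording a logical time,
// with eviction deferred to an O(n) pass over the entries.
class SketchLruCache<K, V> {
    private static final class Node<V> {
        final V value;
        volatile long accessTime;  // logical time, refreshed on each hit
        Node(V value, long time) { this.value = value; this.accessTime = time; }
    }

    private final Map<K, Node<V>> map = new ConcurrentHashMap<>();
    private final AtomicLong ticker = new AtomicLong();
    private final int capacity;

    SketchLruCache(int capacity) { this.capacity = capacity; }

    V get(K key) {                 // O(1): lookup plus timestamp update
        Node<V> node = map.get(key);
        if (node == null) return null;
        node.accessTime = ticker.incrementAndGet();
        return node.value;
    }

    void put(K key, V value) {     // O(1): insert; eviction runs separately
        map.put(key, new Node<>(value, ticker.incrementAndGet()));
    }

    // O(n) eviction pass; in the real cache this runs on a background
    // thread so callers never pay the prioritization cost themselves.
    void evict() {
        while (map.size() > capacity) {
            K victim = null;
            long oldest = Long.MAX_VALUE;
            for (Map.Entry<K, Node<V>> e : map.entrySet()) {
                if (e.getValue().accessTime < oldest) {
                    oldest = e.getValue().accessTime;
                    victim = e.getKey();
                }
            }
            map.remove(victim);
        }
    }
}
```

(The real LruBlockCache additionally keeps the SLRU segments and a frequency flag; this sketch only shows the cost model being discussed.)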
> However, the policy does not age the frequencies and may not be resilient to various workload patterns.
>
> W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the frequency in a counting sketch, ages periodically by halving the counters, and orders entries by SLRU. An entry is discarded by comparing the frequency of the new arrival (candidate) to the SLRU's victim, and keeping the one with the highest frequency. This allows the operations to be performed in O(1) time and, through the use of a compact sketch, a much larger history is retained beyond the current working set. In a variety of real-world traces the policy had [near optimal hit rates|https://github.com/ben-manes/caffeine/wiki/Efficiency].
>
> Concurrency is achieved by buffering and replaying the operations, similar to a write-ahead log. A read is recorded into a striped ring buffer and a write into a queue. The operations are applied in batches under a try-lock by an asynchronous thread, thereby tracking the usage pattern without incurring high latencies ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]).
>
> In YCSB benchmarks the results were inconclusive. For a large cache (99% hit rate) the two caches have near identical throughput and latencies, with LruBlockCache narrowly winning. At medium and small cache sizes, TinyLFU had a 1-4% hit rate improvement and therefore lower latencies. The lackluster result is because a synthetic Zipfian distribution is used, on which SLRU performs optimally. In a more varied, real-world workload we'd expect to see improvements by being able to make smarter predictions.
>
> The provided patch implements BlockCache using the [Caffeine|https://github.com/ben-manes/caffeine] caching library (see the HighScalability [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]).
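The sketch-plus-admission scheme described above can be illustrated with a small toy -- this is not Caffeine's implementation (its sketch uses packed 4-bit counters and a different hash mix); the class name, table sizes, and hash mixing below are invented for the example:

```java
// Toy count-min sketch with TinyLFU-style periodic aging (halving) and the
// admission test that keeps whichever of candidate/victim is more frequent.
class SketchTinyLfu {
    private final int[][] counters;  // depth x width count-min table
    private final int width;
    private final int sampleSize;    // aging period
    private int additions;           // increments since the last aging

    SketchTinyLfu(int width, int depth, int sampleSize) {
        this.counters = new int[depth][width];
        this.width = width;
        this.sampleSize = sampleSize;
    }

    // Simplistic per-row hash mix, invented for this example.
    private int index(Object key, int row) {
        int h = key.hashCode() * (31 * row + 17);
        return Math.floorMod(h, width);
    }

    void increment(Object key) {
        for (int row = 0; row < counters.length; row++) {
            counters[row][index(key, row)]++;
        }
        if (++additions == sampleSize) {
            age();
        }
    }

    int frequency(Object key) {      // estimate: minimum over the rows
        int min = Integer.MAX_VALUE;
        for (int row = 0; row < counters.length; row++) {
            min = Math.min(min, counters[row][index(key, row)]);
        }
        return min;
    }

    private void age() {             // halve all counters to forget the past
        additions /= 2;
        for (int[] row : counters) {
            for (int i = 0; i < row.length; i++) {
                row[i] /= 2;
            }
        }
    }

    // Admission on eviction: admit the candidate only if its estimated
    // frequency beats the SLRU victim's, otherwise keep the victim.
    boolean admit(Object candidate, Object victim) {
        return frequency(candidate) > frequency(victim);
    }
}
```

The halving step is what gives the "aging" the quoted description mentions: history decays geometrically, so the sketch tracks a much longer window than the cache's capacity while old popularity eventually fades.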
> Edward Bortnikov and Eshcar Hillel have graciously provided guidance for evaluating this patch ([github branch|https://github.com/ben-manes/hbase/tree/tinylfu]).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)