Ben Manes created HIVE-14983: -------------------------------- Summary: Evaluate TinyLFU cache Key: HIVE-14983 URL: https://issues.apache.org/jira/browse/HIVE-14983 Project: Hive Issue Type: Improvement Components: llap Reporter: Ben Manes
The ORC low-level cache is either FIFO or LRFU, with the latter being the default. Least-Recently-Freq-Used is an O(lg n) policy that tries to recency with frequency for the working set. It uses a heap for frequency ordering and a linked list for recency ordering. [TinyLFU|https://arxiv.org/pdf/1512.00727.pdf] is an O(1) policy that uses a sketch to probabilistically estimate an entry's frequency. Instead of focusing on eviction, the policy filters out low-value entries from entering the cache. It retains a larger history than the working set by retaining the frequency in a compact counter array (e.g. [4-bit CountMin Sketch|https://github.com/ben-manes/caffeine/blob/master/caffeine/src/main/java/com/github/benmanes/caffeine/cache/FrequencySketch.java]). Simulations show that a small LRU window + TinyLFU + SLRU main cache has the [best performance|https://github.com/ben-manes/caffeine/wiki/Efficiency]. If the project is ready to adopt Java 8 then it could use [Caffeine|https://github.com/ben-manes/caffeine], the successor to Guava's cache. It provides improved [concurrency|https://github.com/ben-manes/caffeine/wiki/Benchmarks] in addition to the higher hit rate. But I think extracting the project's CountMin Sketch for a custom implementation would be a win. The result would be simpler code and an improved hit rates. -- This message was sent by Atlassian JIRA (v6.3.4#6332)