[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069335#comment-15069335
 ] 

Ben Manes commented on SOLR-8241:
---------------------------------

[Benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks] of Caffeine 
show that the cache is ~33% as fast as an unbounded ConcurrentHashMap. Since an 
earlier version is already a dependency, the easiest proof of concept would be 
an adapter onto Solr's 
[Cache|https://github.com/apache/lucene-solr/blob/trunk/solr/solrj/src/java/org/apache/solr/common/util/Cache.java]
 interface. If the results are attractive, the next decision is whether to use 
Caffeine directly or to incorporate its ideas into a Solr cache instead.
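For illustration, such an adapter might be shaped roughly like this. The SolrCache interface below is a hypothetical mirror of the linked Cache.java (check the real file for the actual method signatures), and an access-ordered LinkedHashMap stands in for the Caffeine delegate so the sketch has no external dependencies:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical mirror of Solr's Cache contract; the method names here are
// assumptions for illustration, not copied from the real Cache.java.
interface SolrCache<K, V> {
  V put(K key, V val);
  V get(K key);
  V remove(K key);
  void clear();
}

// Adapter sketch: any Map-backed cache behind the Solr-style interface.
// In a real proof of concept the delegate would be a Caffeine cache
// (e.g. Caffeine.newBuilder().maximumSize(n).build().asMap()); here a
// bounded, access-ordered LinkedHashMap LRU keeps the sketch self-contained.
final class AdapterCache<K, V> implements SolrCache<K, V> {
  private final Map<K, V> delegate;

  AdapterCache(int maxSize) {
    this.delegate = new LinkedHashMap<K, V>(16, 0.75f, true) {
      @Override protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxSize; // evict the LRU entry past capacity
      }
    };
  }

  @Override public V put(K key, V val) { return delegate.put(key, val); }
  @Override public V get(K key) { return delegate.get(key); }
  @Override public V remove(K key) { return delegate.remove(key); }
  @Override public void clear() { delegate.clear(); }
}
```

The point of the adapter layer is that the delegate (Caffeine, a port, or the existing implementation) can be swapped without touching call sites.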

LRU and LFU only retain information about the current working set. That turns 
out to be a limitation: by capturing more history, a significantly better 
prediction (and hit rate) can be achieved. How that history is stored and used 
is where many newer policies differ (ARC, LIRS, 2Q, etc.). Regardless, they 
outperform LRU / LFU, sometimes by very wide margins, which makes choosing one 
very attractive. TinyLFU in particular is very easy to adapt onto an existing 
policy, because it works by filtering what enters the cache (admission) rather 
than reorganizing the order of what exits (eviction).
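As a rough sketch of the admission idea (heavily simplified, not Caffeine's implementation): a small count-min sketch estimates each key's historical frequency, and on eviction the incoming candidate replaces the victim only if it has been more popular:

```java
// Illustrative TinyLFU-style admission filter. A count-min sketch records
// accesses; frequency() is the minimum counter across rows, which bounds
// over-counting from hash collisions. The class and method names are
// hypothetical, chosen for this sketch.
final class TinyLfuAdmission {
  private final int[][] table; // one row of counters per hash function
  private final int width;

  TinyLfuAdmission(int width, int depth) {
    this.width = width;
    this.table = new int[depth][width];
  }

  // Record one access to the key in every row.
  void record(Object key) {
    for (int row = 0; row < table.length; row++) {
      table[row][index(key, row)]++;
    }
  }

  // Estimated frequency: the minimum counter across all rows.
  int frequency(Object key) {
    int min = Integer.MAX_VALUE;
    for (int row = 0; row < table.length; row++) {
      min = Math.min(min, table[row][index(key, row)]);
    }
    return min;
  }

  // Admission: keep the victim unless the candidate is more popular.
  boolean admit(Object candidate, Object victim) {
    return frequency(candidate) > frequency(victim);
  }

  // Cheap per-row hash; a production sketch would use stronger mixing
  // and periodically halve counters to age out stale history.
  private int index(Object key, int row) {
    int h = key.hashCode() * (31 * (row + 1) + 17);
    return Math.floorMod(h, width);
  }
}
```

Because the filter only answers "is the newcomer more popular than the victim?", it layers on top of whatever eviction policy already picks the victim.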

The [paper|http://arxiv.org/pdf/1512.00727.pdf] is a bit long, but a good read. 
The simulation code is very simple, though Caffeine's version isn't, because it 
also tackles the concurrency aspects.

> Evaluate W-TinyLfu cache
> ------------------------
>
>                 Key: SOLR-8241
>                 URL: https://issues.apache.org/jira/browse/SOLR-8241
>             Project: Solr
>          Issue Type: Wish
>          Components: search
>            Reporter: Ben Manes
>            Priority: Minor
>
> SOLR-2906 introduced an LFU cache and in-progress SOLR-3393 makes it O(1). 
> The discussions seem to indicate that the higher hit rate (vs LRU) is offset 
> by the slower performance of the implementation. An original goal appeared to 
> be to introduce ARC, a patented algorithm that uses ghost entries to retain 
> history information.
> My analysis of Window TinyLfu indicates that it may be a better option. It 
> uses a frequency sketch to compactly estimate an entry's popularity. It uses 
> LRU to capture recency and operate in O(1) time. When using available 
> academic traces the policy provides a near optimal hit rate regardless of the 
> workload.
> I'm getting ready to release the policy in Caffeine, which Solr already has a 
> dependency on. But, the code is fairly straightforward and a port into Solr's 
> caches instead is a pragmatic alternative. More interesting is what the 
> impact would be in Solr's workloads and feedback on the policy's design.
> https://github.com/ben-manes/caffeine/wiki/Efficiency
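The structure described in the issue could be sketched like this (heavily simplified and not Caffeine's code; the class name is hypothetical, and a plain HashMap of counters stands in for the compact frequency sketch). New entries land in a small LRU window; when the window overflows, its evicted entry competes against the main region's LRU victim, and the historically more popular of the two survives:

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

final class WindowTinyLfu<K, V> {
  // Stand-in for the frequency sketch: counts every access, including
  // misses, so history extends beyond the current working set.
  private final Map<K, Integer> freq = new HashMap<>();
  // Access-ordered maps: the first key in iteration order is the LRU entry.
  private final LinkedHashMap<K, V> window = new LinkedHashMap<>(16, 0.75f, true);
  private final LinkedHashMap<K, V> main = new LinkedHashMap<>(16, 0.75f, true);
  private final int windowSize;
  private final int mainSize;

  WindowTinyLfu(int windowSize, int mainSize) {
    this.windowSize = windowSize;
    this.mainSize = mainSize;
  }

  V get(K key) {
    freq.merge(key, 1, Integer::sum);
    V v = window.get(key);
    return (v != null) ? v : main.get(key);
  }

  void put(K key, V value) {
    freq.merge(key, 1, Integer::sum);
    window.put(key, value);
    if (window.size() <= windowSize) {
      return;
    }
    // Window overflow: its LRU entry becomes the admission candidate.
    K candidate = window.keySet().iterator().next();
    V candidateValue = window.remove(candidate);
    if (main.size() < mainSize) {
      main.put(candidate, candidateValue);
      return;
    }
    // TinyLFU admission: candidate must be more popular than the victim.
    K victim = main.keySet().iterator().next();
    if (freq.getOrDefault(candidate, 0) > freq.getOrDefault(victim, 0)) {
      main.remove(victim);
      main.put(candidate, candidateValue);
    } // otherwise the candidate is simply dropped
  }

  boolean contains(K key) {
    return window.containsKey(key) || main.containsKey(key);
  }
}
```

The window gives new arrivals a chance to build up frequency (good for recency bursts), while the admission filter protects the main region from one-hit wonders and scans.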



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
