[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15178872#comment-15178872
 ] 

Ben Manes commented on SOLR-8241:
---------------------------------

Percentile stats are best obtained through a metrics library. The stats that 
Caffeine provides are cumulative counters that increase monotonically over the 
lifetime of the cache, so a metrics reporter can easily calculate percentiles 
over a time window.

The only native time statistic is the load time (cost of computing the entry on 
a miss) because it adds to the user-facing latency. All cache operations are 
O(1) and designed for concurrency, so broadly tracking time would be 
prohibitively expensive given how slow the native time methods are. From 
benchmarks I think the cache offers enough headroom to not be a bottleneck, so 
tracking the hit rate and minimizing the miss penalty are probably the more 
interesting areas to monitor.
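
As a rough illustration of monitoring those two figures from monotonically 
increasing counters, the sketch below differences two snapshots to get the hit 
rate and the average miss penalty for a time window. The Snapshot class and the 
method names are hypothetical stand-ins for illustration, not Caffeine's API; 
with Caffeine the same counters would come from its own statistics snapshot.

```java
/** Sketch: windowed hit rate and miss penalty from cumulative counters. */
final class WindowedCacheStats {

  /** Hypothetical snapshot of cumulative counters; values only ever increase. */
  static final class Snapshot {
    final long hits;
    final long misses;
    final long loads;
    final long totalLoadTimeNanos;

    Snapshot(long hits, long misses, long loads, long totalLoadTimeNanos) {
      this.hits = hits;
      this.misses = misses;
      this.loads = loads;
      this.totalLoadTimeNanos = totalLoadTimeNanos;
    }
  }

  /** Hit rate for the requests made between two snapshots (1.0 if none). */
  static double hitRate(Snapshot earlier, Snapshot later) {
    long hits = later.hits - earlier.hits;
    long misses = later.misses - earlier.misses;
    long requests = hits + misses;
    return requests == 0 ? 1.0 : (double) hits / requests;
  }

  /** Average load time per miss, in nanoseconds, within the window (0.0 if none). */
  static double averageLoadPenaltyNanos(Snapshot earlier, Snapshot later) {
    long loads = later.loads - earlier.loads;
    long nanos = later.totalLoadTimeNanos - earlier.totalLoadTimeNanos;
    return loads == 0 ? 0.0 : (double) nanos / loads;
  }

  public static void main(String[] args) {
    // Made-up numbers purely for demonstration.
    Snapshot start = new Snapshot(900, 100, 100, 5_000_000L);
    Snapshot end = new Snapshot(1_700, 300, 300, 25_000_000L);
    System.out.println("window hit rate: " + hitRate(start, end));
    System.out.println("avg load penalty (ns): " + averageLoadPenaltyNanos(start, end));
  }
}
```

A reporter would take a snapshot on each tick and keep the previous one, so 
the cache itself never has to reset or window its counters.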

I'm not sure what my next steps are to assist here, so let me know if I can be 
of further help.

> Evaluate W-TinyLfu cache
> ------------------------
>
>                 Key: SOLR-8241
>                 URL: https://issues.apache.org/jira/browse/SOLR-8241
>             Project: Solr
>          Issue Type: Wish
>          Components: search
>            Reporter: Ben Manes
>            Priority: Minor
>         Attachments: SOLR-8241.patch
>
>
> SOLR-2906 introduced an LFU cache and in-progress SOLR-3393 makes it O(1). 
> The discussions seem to indicate that the higher hit rate (vs LRU) is offset 
> by the slower performance of the implementation. An original goal appeared to 
> be to introduce ARC, a patented algorithm that uses ghost entries to retain 
> history information.
>
> My analysis of Window TinyLfu indicates that it may be a better option. It 
> uses a frequency sketch to compactly estimate an entry's popularity, and it 
> uses LRU to capture recency and operate in O(1) time. On the available 
> academic traces, the policy provides a near-optimal hit rate regardless of 
> the workload.
>
> I'm getting ready to release the policy in Caffeine, which Solr already has a 
> dependency on. However, the code is fairly straightforward, so porting it 
> into Solr's caches instead is a pragmatic alternative. More interesting is 
> what the impact would be on Solr's workloads, along with feedback on the 
> policy's design.
>
> https://github.com/ben-manes/caffeine/wiki/Efficiency
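
To make the "frequency sketch" in the description above concrete, here is a 
minimal count-min style sketch that estimates how often a key has been seen 
using a fixed-size table instead of a per-key counter. It is a simplified 
illustration only: the class name, the depth of 4, the seed constants, and the 
plain int counters are all assumptions made here, and this is not Caffeine's 
actual implementation.

```java
/** Sketch: count-min style popularity estimate in fixed memory. */
final class FrequencySketchDemo {
  private static final int DEPTH = 4; // number of hash rows (assumed)
  // Arbitrary odd multipliers, one per row, used to derive different hashes.
  private static final int[] SEEDS = {0x9E3779B9, 0x85EBCA6B, 0xC2B2AE35, 0x27D4EB2F};

  private final int[][] table;
  private final int mask;

  /** @param width counters per row; must be a power of two */
  FrequencySketchDemo(int width) {
    if (width <= 0 || Integer.bitCount(width) != 1) {
      throw new IllegalArgumentException("width must be a positive power of two");
    }
    this.table = new int[DEPTH][width];
    this.mask = width - 1;
  }

  /** Maps a key to a column in the given row. */
  private int index(Object key, int row) {
    int h = key.hashCode() * SEEDS[row];
    h ^= h >>> 16; // mix high bits into the low bits used by the mask
    return h & mask;
  }

  /** Records one occurrence of the key by bumping one counter in each row. */
  void increment(Object key) {
    for (int row = 0; row < DEPTH; row++) {
      table[row][index(key, row)]++;
    }
  }

  /** Returns the smallest of the key's counters; may overestimate, never underestimates. */
  int estimate(Object key) {
    int min = Integer.MAX_VALUE;
    for (int row = 0; row < DEPTH; row++) {
      min = Math.min(min, table[row][index(key, row)]);
    }
    return min;
  }
}
```

An admission policy could compare the estimate of a new candidate against that 
of the eviction victim and keep whichever appears more popular, which is the 
role the description assigns to the sketch.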



