[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2018-12-05 Thread Ben Manes (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16710613#comment-16710613
 ] 

Ben Manes commented on SOLR-8241:
-

Another year, another ping!

Do you think that you'll have some time over the holidays or in 2019 to revisit 
this?

> Evaluate W-TinyLfu cache
> 
>
> Key: SOLR-8241
> URL: https://issues.apache.org/jira/browse/SOLR-8241
> Project: Solr
>  Issue Type: Wish
>  Components: search
>Reporter: Ben Manes
>Priority: Minor
> Attachments: SOLR-8241.patch, SOLR-8241.patch, SOLR-8241.patch, 
> proposal.patch
>
>
> SOLR-2906 introduced an LFU cache and in-progress SOLR-3393 makes it O(1). 
> The discussions seem to indicate that the higher hit rate (vs LRU) is offset 
> by the slower performance of the implementation. An original goal appeared to 
> be to introduce ARC, a patented algorithm that uses ghost entries to retain 
> history information.
> My analysis of Window TinyLfu indicates that it may be a better option. It 
> uses a frequency sketch to compactly estimate an entry's popularity. It uses 
> LRU to capture recency and operate in O(1) time. When using available 
> academic traces the policy provides a near optimal hit rate regardless of the 
> workload.
> I'm getting ready to release the policy in Caffeine, which Solr already has a 
> dependency on. But, the code is fairly straightforward and a port into Solr's 
> caches instead is a pragmatic alternative. More interesting is what the 
> impact would be in Solr's workloads and feedback on the policy's design.
> https://github.com/ben-manes/caffeine/wiki/Efficiency



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2017-12-30 Thread Ben Manes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16307000#comment-16307000
 ] 

Ben Manes commented on SOLR-8241:
-

Shawn, is this issue something you'd be interested in finalizing in the new 
year? If not, what are the next steps to resolve?

> Evaluate W-TinyLfu cache
> 
>
> Key: SOLR-8241
> URL: https://issues.apache.org/jira/browse/SOLR-8241
> Project: Solr
>  Issue Type: Wish
>  Components: search
>Reporter: Ben Manes
>Priority: Minor
> Attachments: SOLR-8241.patch, SOLR-8241.patch, SOLR-8241.patch, 
> proposal.patch
>
>
> SOLR-2906 introduced an LFU cache and in-progress SOLR-3393 makes it O(1). 
> The discussions seem to indicate that the higher hit rate (vs LRU) is offset 
> by the slower performance of the implementation. An original goal appeared to 
> be to introduce ARC, a patented algorithm that uses ghost entries to retain 
> history information.
> My analysis of Window TinyLfu indicates that it may be a better option. It 
> uses a frequency sketch to compactly estimate an entry's popularity. It uses 
> LRU to capture recency and operate in O(1) time. When using available 
> academic traces the policy provides a near optimal hit rate regardless of the 
> workload.
> I'm getting ready to release the policy in Caffeine, which Solr already has a 
> dependency on. But, the code is fairly straightforward and a port into Solr's 
> caches instead is a pragmatic alternative. More interesting is what the 
> impact would be in Solr's workloads and feedback on the policy's design.
> https://github.com/ben-manes/caffeine/wiki/Efficiency



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2017-02-17 Thread Ben Manes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15872244#comment-15872244
 ] 

Ben Manes commented on SOLR-8241:
-

[~Timothy055], solr master is now on 2.3.5 (to upgrade its usage in the block 
cache).

> Evaluate W-TinyLfu cache
> 
>
> Key: SOLR-8241
> URL: https://issues.apache.org/jira/browse/SOLR-8241
> Project: Solr
>  Issue Type: Wish
>  Components: search
>Reporter: Ben Manes
>Priority: Minor
> Attachments: proposal.patch, SOLR-8241.patch, SOLR-8241.patch, 
> SOLR-8241.patch
>
>
> SOLR-2906 introduced an LFU cache and in-progress SOLR-3393 makes it O(1). 
> The discussions seem to indicate that the higher hit rate (vs LRU) is offset 
> by the slower performance of the implementation. An original goal appeared to 
> be to introduce ARC, a patented algorithm that uses ghost entries to retain 
> history information.
> My analysis of Window TinyLfu indicates that it may be a better option. It 
> uses a frequency sketch to compactly estimate an entry's popularity. It uses 
> LRU to capture recency and operate in O(1) time. When using available 
> academic traces the policy provides a near optimal hit rate regardless of the 
> workload.
> I'm getting ready to release the policy in Caffeine, which Solr already has a 
> dependency on. But, the code is fairly straightforward and a port into Solr's 
> caches instead is a pragmatic alternative. More interesting is what the 
> impact would be in Solr's workloads and feedback on the policy's design.
> https://github.com/ben-manes/caffeine/wiki/Efficiency



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2017-01-25 Thread Ben Manes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15839058#comment-15839058
 ] 

Ben Manes commented on SOLR-8241:
-

[~elyograg]: Solr 6.4.0 was just released. Do you think we can make a 
commitment to resolve this for 6.5.0? We've iterated on the patch for about a 
year now.

> Evaluate W-TinyLfu cache
> 
>
> Key: SOLR-8241
> URL: https://issues.apache.org/jira/browse/SOLR-8241
> Project: Solr
>  Issue Type: Wish
>  Components: search
>Reporter: Ben Manes
>Priority: Minor
> Attachments: proposal.patch, SOLR-8241.patch, SOLR-8241.patch, 
> SOLR-8241.patch
>
>
> SOLR-2906 introduced an LFU cache and in-progress SOLR-3393 makes it O(1). 
> The discussions seem to indicate that the higher hit rate (vs LRU) is offset 
> by the slower performance of the implementation. An original goal appeared to 
> be to introduce ARC, a patented algorithm that uses ghost entries to retain 
> history information.
> My analysis of Window TinyLfu indicates that it may be a better option. It 
> uses a frequency sketch to compactly estimate an entry's popularity. It uses 
> LRU to capture recency and operate in O(1) time. When using available 
> academic traces the policy provides a near optimal hit rate regardless of the 
> workload.
> I'm getting ready to release the policy in Caffeine, which Solr already has a 
> dependency on. But, the code is fairly straightforward and a port into Solr's 
> caches instead is a pragmatic alternative. More interesting is what the 
> impact would be in Solr's workloads and feedback on the policy's design.
> https://github.com/ben-manes/caffeine/wiki/Efficiency



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2017-01-05 Thread Ben Manes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802076#comment-15802076
 ] 

Ben Manes commented on SOLR-8241:
-

I think the tests all passed last I checked with this new SolrCache, but I 
don't think we had made it the default yet so that might be a premature 
statement. If you want to upgrade only the 1.x usage, that would be a safe 
change to extract from this patch (a minor API tweak). If anything the later 
versions also have fewer bugs.

I'd love to see this patch land.

> Evaluate W-TinyLfu cache
> 
>
> Key: SOLR-8241
> URL: https://issues.apache.org/jira/browse/SOLR-8241
> Project: Solr
>  Issue Type: Wish
>  Components: search
>Reporter: Ben Manes
>Priority: Minor
> Attachments: SOLR-8241.patch, SOLR-8241.patch, SOLR-8241.patch, 
> proposal.patch
>
>
> SOLR-2906 introduced an LFU cache and in-progress SOLR-3393 makes it O(1). 
> The discussions seem to indicate that the higher hit rate (vs LRU) is offset 
> by the slower performance of the implementation. An original goal appeared to 
> be to introduce ARC, a patented algorithm that uses ghost entries to retain 
> history information.
> My analysis of Window TinyLfu indicates that it may be a better option. It 
> uses a frequency sketch to compactly estimate an entry's popularity. It uses 
> LRU to capture recency and operate in O(1) time. When using available 
> academic traces the policy provides a near optimal hit rate regardless of the 
> workload.
> I'm getting ready to release the policy in Caffeine, which Solr already has a 
> dependency on. But, the code is fairly straightforward and a port into Solr's 
> caches instead is a pragmatic alternative. More interesting is what the 
> impact would be in Solr's workloads and feedback on the policy's design.
> https://github.com/ben-manes/caffeine/wiki/Efficiency



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2017-01-05 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802010#comment-15802010
 ] 

Shawn Heisey commented on SOLR-8241:


This issue was filed by the author of Caffeine (Ben Manes) and does include 
upgrading the caffeine dependency already present.  I haven't checked yet, but 
presumably all the Solr tests still pass with the upgrade.

> Evaluate W-TinyLfu cache
> 
>
> Key: SOLR-8241
> URL: https://issues.apache.org/jira/browse/SOLR-8241
> Project: Solr
>  Issue Type: Wish
>  Components: search
>Reporter: Ben Manes
>Priority: Minor
> Attachments: SOLR-8241.patch, SOLR-8241.patch, SOLR-8241.patch, 
> proposal.patch
>
>
> SOLR-2906 introduced an LFU cache and in-progress SOLR-3393 makes it O(1). 
> The discussions seem to indicate that the higher hit rate (vs LRU) is offset 
> by the slower performance of the implementation. An original goal appeared to 
> be to introduce ARC, a patented algorithm that uses ghost entries to retain 
> history information.
> My analysis of Window TinyLfu indicates that it may be a better option. It 
> uses a frequency sketch to compactly estimate an entry's popularity. It uses 
> LRU to capture recency and operate in O(1) time. When using available 
> academic traces the policy provides a near optimal hit rate regardless of the 
> workload.
> I'm getting ready to release the policy in Caffeine, which Solr already has a 
> dependency on. But, the code is fairly straightforward and a port into Solr's 
> caches instead is a pragmatic alternative. More interesting is what the 
> impact would be in Solr's workloads and feedback on the policy's design.
> https://github.com/ben-manes/caffeine/wiki/Efficiency



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2017-01-05 Thread Timothy M. Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15801792#comment-15801792
 ] 

Timothy M. Rodriguez commented on SOLR-8241:


+1 for this issue.  Solr currently uses caffeine-1.0.1 in it's distribution, 
which can cause conflicts if you create any extensions that intend to use the 
new library.

> Evaluate W-TinyLfu cache
> 
>
> Key: SOLR-8241
> URL: https://issues.apache.org/jira/browse/SOLR-8241
> Project: Solr
>  Issue Type: Wish
>  Components: search
>Reporter: Ben Manes
>Priority: Minor
> Attachments: SOLR-8241.patch, SOLR-8241.patch, SOLR-8241.patch, 
> proposal.patch
>
>
> SOLR-2906 introduced an LFU cache and in-progress SOLR-3393 makes it O(1). 
> The discussions seem to indicate that the higher hit rate (vs LRU) is offset 
> by the slower performance of the implementation. An original goal appeared to 
> be to introduce ARC, a patented algorithm that uses ghost entries to retain 
> history information.
> My analysis of Window TinyLfu indicates that it may be a better option. It 
> uses a frequency sketch to compactly estimate an entry's popularity. It uses 
> LRU to capture recency and operate in O(1) time. When using available 
> academic traces the policy provides a near optimal hit rate regardless of the 
> workload.
> I'm getting ready to release the policy in Caffeine, which Solr already has a 
> dependency on. But, the code is fairly straightforward and a port into Solr's 
> caches instead is a pragmatic alternative. More interesting is what the 
> impact would be in Solr's workloads and feedback on the policy's design.
> https://github.com/ben-manes/caffeine/wiki/Efficiency



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2016-10-08 Thread Ben Manes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15558709#comment-15558709
 ] 

Ben Manes commented on SOLR-8241:
-

I think there is a small bug in the "hottest" ordering provided by Caffeine, so 
the warmed-up cache doesn't contain the desired entries. I believe this is a 
simple mistake of concatenating two lists in the wrong order, so that it 
chooses a luke-warm entry instead. I'm not sure how to test my changes to 
verify this with a custom jar in Ivy, though.

> Evaluate W-TinyLfu cache
> 
>
> Key: SOLR-8241
> URL: https://issues.apache.org/jira/browse/SOLR-8241
> Project: Solr
>  Issue Type: Wish
>  Components: search
>Reporter: Ben Manes
>Priority: Minor
> Attachments: SOLR-8241.patch, SOLR-8241.patch
>
>
> SOLR-2906 introduced an LFU cache and in-progress SOLR-3393 makes it O(1). 
> The discussions seem to indicate that the higher hit rate (vs LRU) is offset 
> by the slower performance of the implementation. An original goal appeared to 
> be to introduce ARC, a patented algorithm that uses ghost entries to retain 
> history information.
> My analysis of Window TinyLfu indicates that it may be a better option. It 
> uses a frequency sketch to compactly estimate an entry's popularity. It uses 
> LRU to capture recency and operate in O(1) time. When using available 
> academic traces the policy provides a near optimal hit rate regardless of the 
> workload.
> I'm getting ready to release the policy in Caffeine, which Solr already has a 
> dependency on. But, the code is fairly straightforward and a port into Solr's 
> caches instead is a pragmatic alternative. More interesting is what the 
> impact would be in Solr's workloads and feedback on the policy's design.
> https://github.com/ben-manes/caffeine/wiki/Efficiency



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2016-10-08 Thread Ben Manes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15558521#comment-15558521
 ] 

Ben Manes commented on SOLR-8241:
-

I can take a stab at tests, but its unclear what to include other than basic 
operations. Otherwise I'd defer to the library for deeper testing, e.g. scan 
resistance and efficiency. In those areas writing tests is for the author to 
have assurance that the library does what it claims. I'd prefer if someone 
obtained production traces instead, which I think would show you an interesting 
hit rate curve for how the policies stack up.

I'm pretty sure the current warming, which populates with the hottest entries 
first, should be good enough. Since reads dominate writes, the hot entries will 
quickly have a high frequency by the time an eviction is triggered. We can try 
to give the first few hot entries a small bump too, by adding a few accesses, 
to add an extra nudge.

> Evaluate W-TinyLfu cache
> 
>
> Key: SOLR-8241
> URL: https://issues.apache.org/jira/browse/SOLR-8241
> Project: Solr
>  Issue Type: Wish
>  Components: search
>Reporter: Ben Manes
>Priority: Minor
> Attachments: SOLR-8241.patch, SOLR-8241.patch
>
>
> SOLR-2906 introduced an LFU cache and in-progress SOLR-3393 makes it O(1). 
> The discussions seem to indicate that the higher hit rate (vs LRU) is offset 
> by the slower performance of the implementation. An original goal appeared to 
> be to introduce ARC, a patented algorithm that uses ghost entries to retain 
> history information.
> My analysis of Window TinyLfu indicates that it may be a better option. It 
> uses a frequency sketch to compactly estimate an entry's popularity. It uses 
> LRU to capture recency and operate in O(1) time. When using available 
> academic traces the policy provides a near optimal hit rate regardless of the 
> workload.
> I'm getting ready to release the policy in Caffeine, which Solr already has a 
> dependency on. But, the code is fairly straightforward and a port into Solr's 
> caches instead is a pragmatic alternative. More interesting is what the 
> impact would be in Solr's workloads and feedback on the policy's design.
> https://github.com/ben-manes/caffeine/wiki/Efficiency



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2016-10-08 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15558485#comment-15558485
 ] 

Shawn Heisey commented on SOLR-8241:


Taking a look at this today.

Would you be able to build some tests?  I can copy the existing LFUCache tests 
and modify them until they pass, but it would be better for somebody who knows 
how the cache is *supposed* to work to engineer those tests so they check for 
what *should* happen.  Best possible result would be that my assumptions for 
LFUCache will hold without changes, but that is probably not likely.

New cache warming loses old frequency information, as I already mentioned 
earlier.  I suspect that even without this being preserved, we're likely to see 
a generally higher hitrate than LRU.  I would like to preserve this information 
if possible, but I don't view it as a blocker.

> Evaluate W-TinyLfu cache
> 
>
> Key: SOLR-8241
> URL: https://issues.apache.org/jira/browse/SOLR-8241
> Project: Solr
>  Issue Type: Wish
>  Components: search
>Reporter: Ben Manes
>Priority: Minor
> Attachments: SOLR-8241.patch
>
>
> SOLR-2906 introduced an LFU cache and in-progress SOLR-3393 makes it O(1). 
> The discussions seem to indicate that the higher hit rate (vs LRU) is offset 
> by the slower performance of the implementation. An original goal appeared to 
> be to introduce ARC, a patented algorithm that uses ghost entries to retain 
> history information.
> My analysis of Window TinyLfu indicates that it may be a better option. It 
> uses a frequency sketch to compactly estimate an entry's popularity. It uses 
> LRU to capture recency and operate in O(1) time. When using available 
> academic traces the policy provides a near optimal hit rate regardless of the 
> workload.
> I'm getting ready to release the policy in Caffeine, which Solr already has a 
> dependency on. But, the code is fairly straightforward and a port into Solr's 
> caches instead is a pragmatic alternative. More interesting is what the 
> impact would be in Solr's workloads and feedback on the policy's design.
> https://github.com/ben-manes/caffeine/wiki/Efficiency



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2016-09-25 Thread Ben Manes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15521377#comment-15521377
 ] 

Ben Manes commented on SOLR-8241:
-

I took look to refresh myself on LFUCache and decay. I don't think there is an 
issue because TinyLFU has similar logic to age the frequencies asynchronously. 
It observes a sample of 10 * maximum size and then halves the counters. The 
difference is the counters are stored in an array, are 4-bit, and represent all 
items (not just those currently residing in the cache). This extended history 
and using frequency for admission (rather than eviction) is what allows the 
policy to have a superior hit rate and be amortized O(1).

> Evaluate W-TinyLfu cache
> 
>
> Key: SOLR-8241
> URL: https://issues.apache.org/jira/browse/SOLR-8241
> Project: Solr
>  Issue Type: Wish
>  Components: search
>Reporter: Ben Manes
>Priority: Minor
> Attachments: SOLR-8241.patch
>
>
> SOLR-2906 introduced an LFU cache and in-progress SOLR-3393 makes it O(1). 
> The discussions seem to indicate that the higher hit rate (vs LRU) is offset 
> by the slower performance of the implementation. An original goal appeared to 
> be to introduce ARC, a patented algorithm that uses ghost entries to retain 
> history information.
> My analysis of Window TinyLfu indicates that it may be a better option. It 
> uses a frequency sketch to compactly estimate an entry's popularity. It uses 
> LRU to capture recency and operate in O(1) time. When using available 
> academic traces the policy provides a near optimal hit rate regardless of the 
> workload.
> I'm getting ready to release the policy in Caffeine, which Solr already has a 
> dependency on. But, the code is fairly straightforward and a port into Solr's 
> caches instead is a pragmatic alternative. More interesting is what the 
> impact would be in Solr's workloads and feedback on the policy's design.
> https://github.com/ben-manes/caffeine/wiki/Efficiency



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2016-09-23 Thread Ben Manes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15517521#comment-15517521
 ] 

Ben Manes commented on SOLR-8241:
-

The cache does provide basic snapshot features ordered by the policy (hot/cold, 
young/old). You might be able to change perspectives by having the old 
searchers use a snapshot and rewarming the cache instance.

I do think it will be okay to recreate and warm, just not optimal. It looks 
like in my patch I did try to transfer over the hottest entries, so its 
probably alright.

> Evaluate W-TinyLfu cache
> 
>
> Key: SOLR-8241
> URL: https://issues.apache.org/jira/browse/SOLR-8241
> Project: Solr
>  Issue Type: Wish
>  Components: search
>Reporter: Ben Manes
>Priority: Minor
> Attachments: SOLR-8241.patch
>
>
> SOLR-2906 introduced an LFU cache and in-progress SOLR-3393 makes it O(1). 
> The discussions seem to indicate that the higher hit rate (vs LRU) is offset 
> by the slower performance of the implementation. An original goal appeared to 
> be to introduce ARC, a patented algorithm that uses ghost entries to retain 
> history information.
> My analysis of Window TinyLfu indicates that it may be a better option. It 
> uses a frequency sketch to compactly estimate an entry's popularity. It uses 
> LRU to capture recency and operate in O(1) time. When using available 
> academic traces the policy provides a near optimal hit rate regardless of the 
> workload.
> I'm getting ready to release the policy in Caffeine, which Solr already has a 
> dependency on. But, the code is fairly straightforward and a port into Solr's 
> caches instead is a pragmatic alternative. More interesting is what the 
> impact would be in Solr's workloads and feedback on the policy's design.
> https://github.com/ben-manes/caffeine/wiki/Efficiency



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2016-09-23 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15517462#comment-15517462
 ] 

Shawn Heisey commented on SOLR-8241:


Each searcher has its own cache instances.  It needs to have access to those, 
with information relevant to the previous commit point, until the searcher 
finally gets closed and disappears.  That might happen *after* the new searcher 
begins accepting requests and populating its cache, so I think they do need to 
be separate instances.

> Evaluate W-TinyLfu cache
> 
>
> Key: SOLR-8241
> URL: https://issues.apache.org/jira/browse/SOLR-8241
> Project: Solr
>  Issue Type: Wish
>  Components: search
>Reporter: Ben Manes
>Priority: Minor
> Attachments: SOLR-8241.patch
>
>
> SOLR-2906 introduced an LFU cache and in-progress SOLR-3393 makes it O(1). 
> The discussions seem to indicate that the higher hit rate (vs LRU) is offset 
> by the slower performance of the implementation. An original goal appeared to 
> be to introduce ARC, a patented algorithm that uses ghost entries to retain 
> history information.
> My analysis of Window TinyLfu indicates that it may be a better option. It 
> uses a frequency sketch to compactly estimate an entry's popularity. It uses 
> LRU to capture recency and operate in O(1) time. When using available 
> academic traces the policy provides a near optimal hit rate regardless of the 
> workload.
> I'm getting ready to release the policy in Caffeine, which Solr already has a 
> dependency on. But, the code is fairly straightforward and a port into Solr's 
> caches instead is a pragmatic alternative. More interesting is what the 
> impact would be in Solr's workloads and feedback on the policy's design.
> https://github.com/ben-manes/caffeine/wiki/Efficiency



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2016-09-23 Thread Ben Manes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15517433#comment-15517433
 ] 

Ben Manes commented on SOLR-8241:
-

Can you explain why a new instance is required and the entire cache swapped?

There is an open issue for supporting bulk refresh, but its been low on my list 
of priorities. Not sure if that would have worked for this rewarming process.

> Evaluate W-TinyLfu cache
> 
>
> Key: SOLR-8241
> URL: https://issues.apache.org/jira/browse/SOLR-8241
> Project: Solr
>  Issue Type: Wish
>  Components: search
>Reporter: Ben Manes
>Priority: Minor
> Attachments: SOLR-8241.patch
>
>
> SOLR-2906 introduced an LFU cache and in-progress SOLR-3393 makes it O(1). 
> The discussions seem to indicate that the higher hit rate (vs LRU) is offset 
> by the slower performance of the implementation. An original goal appeared to 
> be to introduce ARC, a patented algorithm that uses ghost entries to retain 
> history information.
> My analysis of Window TinyLfu indicates that it may be a better option. It 
> uses a frequency sketch to compactly estimate an entry's popularity. It uses 
> LRU to capture recency and operate in O(1) time. When using available 
> academic traces the policy provides a near optimal hit rate regardless of the 
> workload.
> I'm getting ready to release the policy in Caffeine, which Solr already has a 
> dependency on. But, the code is fairly straightforward and a port into Solr's 
> caches instead is a pragmatic alternative. More interesting is what the 
> impact would be in Solr's workloads and feedback on the policy's design.
> https://github.com/ben-manes/caffeine/wiki/Efficiency



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2016-09-23 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15517418#comment-15517418
 ] 

Shawn Heisey commented on SOLR-8241:


It's been so long since I wrote LFUCache that I'm having a hard time 
understanding the code.

I seem to remember that I intended warming to preserve the access counter on 
each entry when it was added to the new cache ... but I can't seem to find any 
evidence that I actually implemented this.  I think I might have implemented it 
in the faster replacement that never got finished.

I can't see a way with Caffeine to preserve the hitcounter and other relevance 
information when warming a new cache, which I think means that all warmed 
entries will be as relevant as anything new that ends up in the cache, and will 
therefore likely be the first to get evicted from a freshly warmed cache that 
happens to fill up, even if those particular entries were accessed millions of 
times in previous cache instances.

If there's a way to do the following we'd be OK: "copy the top Nth key from the 
old cache, preserving access info, and then replace the value with XXX"

Even without this capability, Caffeine would probably be overall more efficient 
than LRU, assuming that there's a reasonable span of time between commits that 
open new searchers.

> Evaluate W-TinyLfu cache
> 
>
> Key: SOLR-8241
> URL: https://issues.apache.org/jira/browse/SOLR-8241
> Project: Solr
>  Issue Type: Wish
>  Components: search
>Reporter: Ben Manes
>Priority: Minor
> Attachments: SOLR-8241.patch
>
>
> SOLR-2906 introduced an LFU cache and in-progress SOLR-3393 makes it O(1). 
> The discussions seem to indicate that the higher hit rate (vs LRU) is offset 
> by the slower performance of the implementation. An original goal appeared to 
> be to introduce ARC, a patented algorithm that uses ghost entries to retain 
> history information.
> My analysis of Window TinyLfu indicates that it may be a better option. It 
> uses a frequency sketch to compactly estimate an entry's popularity. It uses 
> LRU to capture recency and operate in O(1) time. When using available 
> academic traces the policy provides a near optimal hit rate regardless of the 
> workload.
> I'm getting ready to release the policy in Caffeine, which Solr already has a 
> dependency on. But, the code is fairly straightforward and a port into Solr's 
> caches instead is a pragmatic alternative. More interesting is what the 
> impact would be in Solr's workloads and feedback on the policy's design.
> https://github.com/ben-manes/caffeine/wiki/Efficiency



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2016-09-23 Thread Ben Manes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15517094#comment-15517094
 ] 

Ben Manes commented on SOLR-8241:
-

Expiration is tricky because it means the data is no longer valid to be 
consumed and should not be consumed. The middle ground here is to 
refreshAfterWrite, which serves stale entries and tries to asynchronously 
reload the value. That covers the common case by not penalizing active entries 
by evicting, while letting inactive ones expire.

That probably isn't enough and its impossible to cover all use-cases. So 
instead its more of a data structure to (hopefully) be malleable to have custom 
workarounds. The CacheWriter can be used to create a victim cache, which a 
CacheLoader could retrieve from. So you could let expired entries populate the 
victim and be promoted back into the cache, sometimes within the same atomic 
operation. Then a rewarming could clear the victim when its done as its 
contents are unnecessary. So something like this is might be workable.

> Evaluate W-TinyLfu cache
> 
>
> Key: SOLR-8241
> URL: https://issues.apache.org/jira/browse/SOLR-8241
> Project: Solr
>  Issue Type: Wish
>  Components: search
>Reporter: Ben Manes
>Priority: Minor
> Attachments: SOLR-8241.patch
>
>
> SOLR-2906 introduced an LFU cache and in-progress SOLR-3393 makes it O(1). 
> The discussions seem to indicate that the higher hit rate (vs LRU) is offset 
> by the slower performance of the implementation. An original goal appeared to 
> be to introduce ARC, a patented algorithm that uses ghost entries to retain 
> history information.
> My analysis of Window TinyLfu indicates that it may be a better option. It 
> uses a frequency sketch to compactly estimate an entry's popularity. It uses 
> LRU to capture recency and operate in O(1) time. When using available 
> academic traces the policy provides a near optimal hit rate regardless of the 
> workload.
> I'm getting ready to release the policy in Caffeine, which Solr already has a 
> dependency on. But, the code is fairly straightforward and a port into Solr's 
> caches instead is a pragmatic alternative. More interesting is what the 
> impact would be in Solr's workloads and feedback on the policy's design.
> https://github.com/ben-manes/caffeine/wiki/Efficiency



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2016-09-23 Thread Ben Manes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15517090#comment-15517090
 ] 

Ben Manes commented on SOLR-8241:
-

Expiration is tricky because it means the data is no longer valid to be 
consumed and should not be consumed. The middle ground here is to 
refreshAfterWrite, which serves stale entries and tries to asynchronously 
reload the value. That covers the common case by not penalizing active entries 
by evicting, while letting inactive ones expire.

That probably isn't enough and its impossible to cover all use-cases. So 
instead its more of a data structure to (hopefully) be malleable to have custom 
workarounds. The CacheWriter can be used to create a victim cache, which a 
CacheLoader could retrieve from. So you could let expired entries populate the 
victim and be promoted back into the cache, sometimes within the same atomic 
operation. Then a rewarming could clear the victim when its done as its 
contents are unnecessary. So something like this is might be workable.

> Evaluate W-TinyLfu cache
> 
>
> Key: SOLR-8241
> URL: https://issues.apache.org/jira/browse/SOLR-8241
> Project: Solr
>  Issue Type: Wish
>  Components: search
>Reporter: Ben Manes
>Priority: Minor
> Attachments: SOLR-8241.patch
>
>
> SOLR-2906 introduced an LFU cache and in-progress SOLR-3393 makes it O(1). 
> The discussions seem to indicate that the higher hit rate (vs LRU) is offset 
> by the slower performance of the implementation. An original goal appeared to 
> be to introduce ARC, a patented algorithm that uses ghost entries to retain 
> history information.
> My analysis of Window TinyLfu indicates that it may be a better option. It 
> uses a frequency sketch to compactly estimate an entry's popularity. It uses 
> LRU to capture recency and operate in O(1) time. When using available 
> academic traces the policy provides a near optimal hit rate regardless of the 
> workload.
> I'm getting ready to release the policy in Caffeine, which Solr already has a 
> dependency on. But, the code is fairly straightforward and a port into Solr's 
> caches instead is a pragmatic alternative. More interesting is what the 
> impact would be in Solr's workloads and feedback on the policy's design.
> https://github.com/ben-manes/caffeine/wiki/Efficiency



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2016-09-23 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15517065#comment-15517065
 ] 

Shawn Heisey commented on SOLR-8241:


Decay (as implemented) can't be *completely* replaced by expiration, unless 
it's possible to migrate the last access time to the new cache when 
autowarming.  I haven't delved deeply enough into Caffeine to determine whether 
that's possible.

I don't recall anything that could be used to directly implement decay as 
already implemented.


> Evaluate W-TinyLfu cache
> 
>
> Key: SOLR-8241
> URL: https://issues.apache.org/jira/browse/SOLR-8241
> Project: Solr
>  Issue Type: Wish
>  Components: search
>Reporter: Ben Manes
>Priority: Minor
> Attachments: SOLR-8241.patch
>
>
> SOLR-2906 introduced an LFU cache and in-progress SOLR-3393 makes it O(1). 
> The discussions seem to indicate that the higher hit rate (vs LRU) is offset 
> by the slower performance of the implementation. An original goal appeared to 
> be to introduce ARC, a patented algorithm that uses ghost entries to retain 
> history information.
> My analysis of Window TinyLfu indicates that it may be a better option. It 
> uses a frequency sketch to compactly estimate an entry's popularity. It uses 
> LRU to capture recency and operate in O(1) time. When using available 
> academic traces the policy provides a near optimal hit rate regardless of the 
> workload.
> I'm getting ready to release the policy in Caffeine, which Solr already has a 
> dependency on. But, the code is fairly straightforward and a port into Solr's 
> caches instead is a pragmatic alternative. More interesting is what the 
> impact would be in Solr's workloads and feedback on the policy's design.
> https://github.com/ben-manes/caffeine/wiki/Efficiency



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2016-09-23 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15517021#comment-15517021
 ] 

Shawn Heisey commented on SOLR-8241:


After some experience using Caffeine in my own code, I think it's absolutely 
something we should have in Solr.  My time is still limited, but I'd like to 
begin pushing this through as a replacement for the existing LFUCache.

I had a thought.  The decay capability currently present in LFUCache can be 
replaced by the expiration feature in Caffeine.  I did have some questions 
about that:

I'm not sure what the default expiration time for cache entries should be, but 
it should definitely be configurable.  The idea that comes to mind first is 4 
hours.

I'm also not sure what time unit to use for expiration.  Milliseconds would 
probably result in very large numbers.  I can see arguments for seconds, 
minutes, hours, and for some users, days.  Caffeine supports multiple time 
units, so the unit could be configurable, which puts millis back on the table.

I don't see support for a feature that I think would be really quite cool:  A 
"minimum size" for the cache, so when the number of entries in the cache drops 
to that level, entries will no longer be removed because of expiration.  For 
Solr's purposes, this would mean that if the cache has seen some activity, 
there will always be *something* available for autowarming, even if it's been a 
week since Solr got a query.  Perhaps this could be configured with 
"expireAfter" methods when the cache is constructed?


> Evaluate W-TinyLfu cache
> 
>
> Key: SOLR-8241
> URL: https://issues.apache.org/jira/browse/SOLR-8241
> Project: Solr
>  Issue Type: Wish
>  Components: search
>Reporter: Ben Manes
>Priority: Minor
> Attachments: SOLR-8241.patch
>
>
> SOLR-2906 introduced an LFU cache and in-progress SOLR-3393 makes it O(1). 
> The discussions seem to indicate that the higher hit rate (vs LRU) is offset 
> by the slower performance of the implementation. An original goal appeared to 
> be to introduce ARC, a patented algorithm that uses ghost entries to retain 
> history information.
> My analysis of Window TinyLfu indicates that it may be a better option. It 
> uses a frequency sketch to compactly estimate an entry's popularity. It uses 
> LRU to capture recency and operate in O(1) time. When using available 
> academic traces the policy provides a near optimal hit rate regardless of the 
> workload.
> I'm getting ready to release the policy in Caffeine, which Solr already has a 
> dependency on. But, the code is fairly straightforward and a port into Solr's 
> caches instead is a pragmatic alternative. More interesting is what the 
> impact would be in Solr's workloads and feedback on the policy's design.
> https://github.com/ben-manes/caffeine/wiki/Efficiency



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2016-07-16 Thread Ben Manes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15381077#comment-15381077
 ] 

Ben Manes commented on SOLR-8241:
-

Can we try to move this forward again? Thanks!

> Evaluate W-TinyLfu cache
> 
>
> Key: SOLR-8241
> URL: https://issues.apache.org/jira/browse/SOLR-8241
> Project: Solr
>  Issue Type: Wish
>  Components: search
>Reporter: Ben Manes
>Priority: Minor
> Attachments: SOLR-8241.patch
>
>
> SOLR-2906 introduced an LFU cache and in-progress SOLR-3393 makes it O(1). 
> The discussions seem to indicate that the higher hit rate (vs LRU) is offset 
> by the slower performance of the implementation. An original goal appeared to 
> be to introduce ARC, a patented algorithm that uses ghost entries to retain 
> history information.
> My analysis of Window TinyLfu indicates that it may be a better option. It 
> uses a frequency sketch to compactly estimate an entry's popularity. It uses 
> LRU to capture recency and operate in O(1) time. When using available 
> academic traces the policy provides a near optimal hit rate regardless of the 
> workload.
> I'm getting ready to release the policy in Caffeine, which Solr already has a 
> dependency on. But, the code is fairly straightforward and a port into Solr's 
> caches instead is a pragmatic alternative. More interesting is what the 
> impact would be in Solr's workloads and feedback on the policy's design.
> https://github.com/ben-manes/caffeine/wiki/Efficiency



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2016-04-12 Thread Ben Manes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15238184#comment-15238184
 ] 

Ben Manes commented on SOLR-8241:
-

Thanks for the information. I definitely meant that would be a new issue if we 
were happy with the results here. It makes sense that Lucene wouldn't want 
dependencies and a different expert would be needed to review. As those are 
synchronous I can easily port the code over (its the concurrency that's hard). 
We'll revisit that if we have a positive experience here, as I think this is 
the more critical cache for Solr.

Thanks a lot Shawn for pushing this forward and all your help thus far.

> Evaluate W-TinyLfu cache
> 
>
> Key: SOLR-8241
> URL: https://issues.apache.org/jira/browse/SOLR-8241
> Project: Solr
>  Issue Type: Wish
>  Components: search
>Reporter: Ben Manes
>Priority: Minor
> Attachments: SOLR-8241.patch
>
>
> SOLR-2906 introduced an LFU cache and in-progress SOLR-3393 makes it O(1). 
> The discussions seem to indicate that the higher hit rate (vs LRU) is offset 
> by the slower performance of the implementation. An original goal appeared to 
> be to introduce ARC, a patented algorithm that uses ghost entries to retain 
> history information.
> My analysis of Window TinyLfu indicates that it may be a better option. It 
> uses a frequency sketch to compactly estimate an entry's popularity. It uses 
> LRU to capture recency and operate in O(1) time. When using available 
> academic traces the policy provides a near optimal hit rate regardless of the 
> workload.
> I'm getting ready to release the policy in Caffeine, which Solr already has a 
> dependency on. But, the code is fairly straightforward and a port into Solr's 
> caches instead is a pragmatic alternative. More interesting is what the 
> impact would be in Solr's workloads and feedback on the policy's design.
> https://github.com/ben-manes/caffeine/wiki/Efficiency



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2016-04-12 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15237993#comment-15237993
 ] 

Shawn Heisey commented on SOLR-8241:


Those are Lucene caches.  The first one is in Lucene core.  The other two are 
in the Lucene facet module -- which is a completely different implementation 
from Solr's faceting.

Recently I found some instances of entire external source files being copied 
*into* the Lucene core module.  When I asked about it, I was told that this is 
to comply with a strict rule about Lucene core remaining 100 percent 
dependency-free.  I have not confirmed this rule, but it would not surprise me.

I'm all for replacing LFUCache in Solr with your implementation, and I'm 
willing to do the work to make it happen.  We have quite a few external 
dependencies already.

Working on Lucene will require a separate issue, and even though I have write 
access to the code, I'm going to leave that to people who really understand 
Lucene and who've been around the code a lot longer than I have.

> Evaluate W-TinyLfu cache
> 
>
> Key: SOLR-8241
> URL: https://issues.apache.org/jira/browse/SOLR-8241
> Project: Solr
>  Issue Type: Wish
>  Components: search
>Reporter: Ben Manes
>Priority: Minor
> Attachments: SOLR-8241.patch
>
>
> SOLR-2906 introduced an LFU cache and in-progress SOLR-3393 makes it O(1). 
> The discussions seem to indicate that the higher hit rate (vs LRU) is offset 
> by the slower performance of the implementation. An original goal appeared to 
> be to introduce ARC, a patented algorithm that uses ghost entries to retain 
> history information.
> My analysis of Window TinyLfu indicates that it may be a better option. It 
> uses a frequency sketch to compactly estimate an entry's popularity. It uses 
> LRU to capture recency and operate in O(1) time. When using available 
> academic traces the policy provides a near optimal hit rate regardless of the 
> workload.
> I'm getting ready to release the policy in Caffeine, which Solr already has a 
> dependency on. But, the code is fairly straightforward and a port into Solr's 
> caches instead is a pragmatic alternative. More interesting is what the 
> impact would be in Solr's workloads and feedback on the policy's design.
> https://github.com/ben-manes/caffeine/wiki/Efficiency



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2016-04-12 Thread Ben Manes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15237765#comment-15237765
 ] 

Ben Manes commented on SOLR-8241:
-

There are some other caches that might be worth migrating as well (e.g. 
[LRUQueryCache|https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java],
 
[LRUHashMap|https://github.com/apache/lucene-solr/blob/master/lucene/facet/src/java/org/apache/lucene/facet/taxonomy/LRUHashMap.java],
 
[NameIntCacheLRU|https://github.com/apache/lucene-solr/blob/master/lucene/facet/src/java/org/apache/lucene/facet/taxonomy/writercache/NameIntCacheLRU.java]).
 It might be good to follow-up after this patch and see what other caches 
benefit from being migrated.

> Evaluate W-TinyLfu cache
> 
>
> Key: SOLR-8241
> URL: https://issues.apache.org/jira/browse/SOLR-8241
> Project: Solr
>  Issue Type: Wish
>  Components: search
>Reporter: Ben Manes
>Priority: Minor
> Attachments: SOLR-8241.patch
>
>
> SOLR-2906 introduced an LFU cache and in-progress SOLR-3393 makes it O(1). 
> The discussions seem to indicate that the higher hit rate (vs LRU) is offset 
> by the slower performance of the implementation. An original goal appeared to 
> be to introduce ARC, a patented algorithm that uses ghost entries to retain 
> history information.
> My analysis of Window TinyLfu indicates that it may be a better option. It 
> uses a frequency sketch to compactly estimate an entry's popularity. It uses 
> LRU to capture recency and operate in O(1) time. When using available 
> academic traces the policy provides a near optimal hit rate regardless of the 
> workload.
> I'm getting ready to release the policy in Caffeine, which Solr already has a 
> dependency on. But, the code is fairly straightforward and a port into Solr's 
> caches instead is a pragmatic alternative. More interesting is what the 
> impact would be in Solr's workloads and feedback on the policy's design.
> https://github.com/ben-manes/caffeine/wiki/Efficiency



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2016-04-11 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15235935#comment-15235935
 ] 

Jeff Wartes commented on SOLR-8241:
---

Since Solr requires Java 8 as of 6.0, it seems like this patch could be applied 
pretty easily now?

> Evaluate W-TinyLfu cache
> 
>
> Key: SOLR-8241
> URL: https://issues.apache.org/jira/browse/SOLR-8241
> Project: Solr
>  Issue Type: Wish
>  Components: search
>Reporter: Ben Manes
>Priority: Minor
> Attachments: SOLR-8241.patch
>
>
> SOLR-2906 introduced an LFU cache and in-progress SOLR-3393 makes it O(1). 
> The discussions seem to indicate that the higher hit rate (vs LRU) is offset 
> by the slower performance of the implementation. An original goal appeared to 
> be to introduce ARC, a patented algorithm that uses ghost entries to retain 
> history information.
> My analysis of Window TinyLfu indicates that it may be a better option. It 
> uses a frequency sketch to compactly estimate an entry's popularity. It uses 
> LRU to capture recency and operate in O(1) time. When using available 
> academic traces the policy provides a near optimal hit rate regardless of the 
> workload.
> I'm getting ready to release the policy in Caffeine, which Solr already has a 
> dependency on. But, the code is fairly straightforward and a port into Solr's 
> caches instead is a pragmatic alternative. More interesting is what the 
> impact would be in Solr's workloads and feedback on the policy's design.
> https://github.com/ben-manes/caffeine/wiki/Efficiency



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2016-03-03 Thread Ben Manes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15178872#comment-15178872
 ] 

Ben Manes commented on SOLR-8241:
-

Percentile stats are best obtained by the metrics library. The stats provided 
by Caffeine are monotonically increasing over the lifetime of the cache. This 
lets the percentiles over a time window be easily calculated by the metrics 
reporter.

The only native time statistic is the load time (cost of computing the entry on 
a miss) because it adds to the user-facing latency. All cache operations are 
O(1) and designed for concurrency, so broadly tracking time would be 
prohibitively expensive given how slow the native time methods are. From 
benchmarks I think the cache offers enough headroom to not be a bottleneck, so 
tracking the hit rate and minimizing the miss penalty are probably the more 
interesting areas to monitor.

I'm not sure what my next steps are to assist here, so let me know if I can be 
of further help.

> Evaluate W-TinyLfu cache
> 
>
> Key: SOLR-8241
> URL: https://issues.apache.org/jira/browse/SOLR-8241
> Project: Solr
>  Issue Type: Wish
>  Components: search
>Reporter: Ben Manes
>Priority: Minor
> Attachments: SOLR-8241.patch
>
>
> SOLR-2906 introduced an LFU cache and in-progress SOLR-3393 makes it O(1). 
> The discussions seem to indicate that the higher hit rate (vs LRU) is offset 
> by the slower performance of the implementation. An original goal appeared to 
> be to introduce ARC, a patented algorithm that uses ghost entries to retain 
> history information.
> My analysis of Window TinyLfu indicates that it may be a better option. It 
> uses a frequency sketch to compactly estimate an entry's popularity. It uses 
> LRU to capture recency and operate in O(1) time. When using available 
> academic traces the policy provides a near optimal hit rate regardless of the 
> workload.
> I'm getting ready to release the policy in Caffeine, which Solr already has a 
> dependency on. But, the code is fairly straightforward and a port into Solr's 
> caches instead is a pragmatic alternative. More interesting is what the 
> impact would be in Solr's workloads and feedback on the policy's design.
> https://github.com/ben-manes/caffeine/wiki/Efficiency



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2016-03-03 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15178727#comment-15178727
 ] 

Shawn Heisey commented on SOLR-8241:


bq. neither Guava or Caffeine bothered to include a {{put}} statistic

For all the Solr cache implementations currently shipping, the put operation is 
pretty much identical, so stats are not likely to be very interesting when 
comparing implementations.  The situation is similar for lookups.  I am not at 
all worried about seeing time stats on either of those, unless it's easy and 
really fast to obtain.

I think that hit ratio is the most important statistic for cache performance, 
and it's already available.  Eviction performance is important, though.  The 
count of evictions, also present currently, is useful.  The speed of evictions, 
in conjunction with the count, can help decide whether the cache is too slow.

If the implementation itself happens to track stats, that's awesome, but I'm 
after more than a calculation of the average time.  Percentile stats give a 
clearer picture of what's happening than plain averages.  I'd love to have the 
same detail on speed data that we got for QTime with SOLR-1972.


> Evaluate W-TinyLfu cache
> 
>
> Key: SOLR-8241
> URL: https://issues.apache.org/jira/browse/SOLR-8241
> Project: Solr
>  Issue Type: Wish
>  Components: search
>Reporter: Ben Manes
>Priority: Minor
> Attachments: SOLR-8241.patch
>
>
> SOLR-2906 introduced an LFU cache and in-progress SOLR-3393 makes it O(1). 
> The discussions seem to indicate that the higher hit rate (vs LRU) is offset 
> by the slower performance of the implementation. An original goal appeared to 
> be to introduce ARC, a patented algorithm that uses ghost entries to retain 
> history information.
> My analysis of Window TinyLfu indicates that it may be a better option. It 
> uses a frequency sketch to compactly estimate an entry's popularity. It uses 
> LRU to capture recency and operate in O(1) time. When using available 
> academic traces the policy provides a near optimal hit rate regardless of the 
> workload.
> I'm getting ready to release the policy in Caffeine, which Solr already has a 
> dependency on. But, the code is fairly straightforward and a port into Solr's 
> caches instead is a pragmatic alternative. More interesting is what the 
> impact would be in Solr's workloads and feedback on the policy's design.
> https://github.com/ben-manes/caffeine/wiki/Efficiency



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2016-03-03 Thread Ben Manes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15178389#comment-15178389
 ] 

Ben Manes commented on SOLR-8241:
-

Using the metrics library should be really easy. There are two simple 
implementation approaches,

1. Use the same approach as [Guava 
metrics|http://antrix.net/posts/2014/codahale-metrics-guava-cache] that polls 
the cache's stats. Caffeine is the next gen, so it has a nearly identical API.
2. Use a custom 
[StatsCounter|http://static.javadoc.io/com.github.ben-manes.caffeine/caffeine/2.2.2/com/github/benmanes/caffeine/cache/stats/StatsCounter.html]
 and {{Caffeine.recordStats(statsCounter)}} that records directly into the 
metrics. This rejected feature 
[request|https://github.com/google/guava/issues/2209#issuecomment-153290342] 
shows an example of that, though I'd return a {{disabledStatsCounter()}} 
instead of throwing an exception if polled.

The only annoyance is neither Guava or Caffeine bothered to include a {{put}} 
statistic. That was partially an oversight and partially because we really 
wanted everyone to load through the cache (put is often an anti-pattern due to 
races). I forgot to add it in with v2 and due to being an API change semvar 
would require that it be in v3 or maybe we can use a [default 
method|https://blog.idrsolutions.com/2015/01/java-8-default-methods-explained-5-minutes/]
 hack for sneaking it into v2.

> Evaluate W-TinyLfu cache
> 
>
> Key: SOLR-8241
> URL: https://issues.apache.org/jira/browse/SOLR-8241
> Project: Solr
>  Issue Type: Wish
>  Components: search
>Reporter: Ben Manes
>Priority: Minor
> Attachments: SOLR-8241.patch
>
>
> SOLR-2906 introduced an LFU cache and in-progress SOLR-3393 makes it O(1). 
> The discussions seem to indicate that the higher hit rate (vs LRU) is offset 
> by the slower performance of the implementation. An original goal appeared to 
> be to introduce ARC, a patented algorithm that uses ghost entries to retain 
> history information.
> My analysis of Window TinyLfu indicates that it may be a better option. It 
> uses a frequency sketch to compactly estimate an entry's popularity. It uses 
> LRU to capture recency and operate in O(1) time. When using available 
> academic traces the policy provides a near optimal hit rate regardless of the 
> workload.
> I'm getting ready to release the policy in Caffeine, which Solr already has a 
> dependency on. But, the code is fairly straightforward and a port into Solr's 
> caches instead is a pragmatic alternative. More interesting is what the 
> impact would be in Solr's workloads and feedback on the policy's design.
> https://github.com/ben-manes/caffeine/wiki/Efficiency



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2016-03-03 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15178352#comment-15178352
 ] 

Shawn Heisey commented on SOLR-8241:


I'm pretty sure that no matter what benchmarks we run, your implementation will 
be MUCH better than my current implementation.  If we put this in, which I am 
in favor of doing as soon as we can, I believe it should replace LFUCache.

Code simplicity alone probably makes it better than my improved implementation 
that isn't committed (SOLR-3393).

I wonder if it might be possible for Solr's cache implementations (including 
this one) to use the codahale metrics library (already in Solr) to record 
statistics about eviction time.  Evictions are the pain point for a cache 
implementation, and being able to compare results with different cache 
implementations would be awesome.


> Evaluate W-TinyLfu cache
> 
>
> Key: SOLR-8241
> URL: https://issues.apache.org/jira/browse/SOLR-8241
> Project: Solr
>  Issue Type: Wish
>  Components: search
>Reporter: Ben Manes
>Priority: Minor
> Attachments: SOLR-8241.patch
>
>
> SOLR-2906 introduced an LFU cache and in-progress SOLR-3393 makes it O(1). 
> The discussions seem to indicate that the higher hit rate (vs LRU) is offset 
> by the slower performance of the implementation. An original goal appeared to 
> be to introduce ARC, a patented algorithm that uses ghost entries to retain 
> history information.
> My analysis of Window TinyLfu indicates that it may be a better option. It 
> uses a frequency sketch to compactly estimate an entry's popularity. It uses 
> LRU to capture recency and operate in O(1) time. When using available 
> academic traces the policy provides a near optimal hit rate regardless of the 
> workload.
> I'm getting ready to release the policy in Caffeine, which Solr already has a 
> dependency on. But, the code is fairly straightforward and a port into Solr's 
> caches instead is a pragmatic alternative. More interesting is what the 
> impact would be in Solr's workloads and feedback on the policy's design.
> https://github.com/ben-manes/caffeine/wiki/Efficiency



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2016-03-03 Thread Ben Manes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15178106#comment-15178106
 ] 

Ben Manes commented on SOLR-8241:
-

I see that [YCSB|https://github.com/brianfrankcooper/YCSB] includes Solr as a 
backend. It is a popular benchmark, though is oriented for comparing key-value 
queries. Still, that might be an easy way to see the performance and cache 
efficiency impact of this proposal.

> Evaluate W-TinyLfu cache
> 
>
> Key: SOLR-8241
> URL: https://issues.apache.org/jira/browse/SOLR-8241
> Project: Solr
>  Issue Type: Wish
>  Components: search
>Reporter: Ben Manes
>Priority: Minor
> Attachments: SOLR-8241.patch
>
>
> SOLR-2906 introduced an LFU cache and in-progress SOLR-3393 makes it O(1). 
> The discussions seem to indicate that the higher hit rate (vs LRU) is offset 
> by the slower performance of the implementation. An original goal appeared to 
> be to introduce ARC, a patented algorithm that uses ghost entries to retain 
> history information.
> My analysis of Window TinyLfu indicates that it may be a better option. It 
> uses a frequency sketch to compactly estimate an entry's popularity. It uses 
> LRU to capture recency and operate in O(1) time. When using available 
> academic traces the policy provides a near optimal hit rate regardless of the 
> workload.
> I'm getting ready to release the policy in Caffeine, which Solr already has a 
> dependency on. But, the code is fairly straightforward and a port into Solr's 
> caches instead is a pragmatic alternative. More interesting is what the 
> impact would be in Solr's workloads and feedback on the policy's design.
> https://github.com/ben-manes/caffeine/wiki/Efficiency



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2016-02-08 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15137458#comment-15137458
 ] 

David Smiley commented on SOLR-8241:


+1 Cool work Ben!  I like the relative simplicity and low amount of code for 
the implementation.

> Evaluate W-TinyLfu cache
> 
>
> Key: SOLR-8241
> URL: https://issues.apache.org/jira/browse/SOLR-8241
> Project: Solr
>  Issue Type: Wish
>  Components: search
>Reporter: Ben Manes
>Priority: Minor
> Attachments: SOLR-8241.patch
>
>
> SOLR-2906 introduced an LFU cache and in-progress SOLR-3393 makes it O(1). 
> The discussions seem to indicate that the higher hit rate (vs LRU) is offset 
> by the slower performance of the implementation. An original goal appeared to 
> be to introduce ARC, a patented algorithm that uses ghost entries to retain 
> history information.
> My analysis of Window TinyLfu indicates that it may be a better option. It 
> uses a frequency sketch to compactly estimate an entry's popularity. It uses 
> LRU to capture recency and operate in O(1) time. When using available 
> academic traces the policy provides a near optimal hit rate regardless of the 
> workload.
> I'm getting ready to release the policy in Caffeine, which Solr already has a 
> dependency on. But, the code is fairly straightforward and a port into Solr's 
> caches instead is a pragmatic alternative. More interesting is what the 
> impact would be in Solr's workloads and feedback on the policy's design.
> https://github.com/ben-manes/caffeine/wiki/Efficiency



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2016-02-08 Thread Ben Manes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15137321#comment-15137321
 ] 

Ben Manes commented on SOLR-8241:
-

I only used the LruCache as a template and removed much of it, though looking 
at LfuCache it might have been easier to work with, since mine was trimmed to 
look very similar. 

LFU is substantially better than LRU for search engine workloads like Solr's. I 
do not have any Solr specific metrics to offer, but the search engine traces I 
do have are very promising. LFU is superior to LRU, and TinyLFU is a 
substantial further improvement. If the impact was not so significant then I 
would not be advocating this change.

WebSearch1 @ 4M (U. of Mass.)
* Lru: 21.6%
* Lfu: 27.1%
* W-TinyLfu: 41.5%
* Optimal: 57.8%

S3 @ 400K (ARC paper)
* Lru: 12.0%
* Lfu: 29.4%
* W-TinyLfu: 43.0%
* Optimal: 60.0%

Yes, this is Java 8 only. The interface of RemovalListener was changed from 
v1.x to v2.x in order to be friendlier lambda syntax for the builder's type 
inference.

Please read this [short 
article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]
 which describes the policy and concurrency mechanism. That should provide you 
enough context to judge this change without taking a deep dive into the 
library's implementation. The patch to Solr's code is quite small.

> Evaluate W-TinyLfu cache
> 
>
> Key: SOLR-8241
> URL: https://issues.apache.org/jira/browse/SOLR-8241
> Project: Solr
>  Issue Type: Wish
>  Components: search
>Reporter: Ben Manes
>Priority: Minor
> Attachments: SOLR-8241.patch
>
>
> SOLR-2906 introduced an LFU cache and in-progress SOLR-3393 makes it O(1). 
> The discussions seem to indicate that the higher hit rate (vs LRU) is offset 
> by the slower performance of the implementation. An original goal appeared to 
> be to introduce ARC, a patented algorithm that uses ghost entries to retain 
> history information.
> My analysis of Window TinyLfu indicates that it may be a better option. It 
> uses a frequency sketch to compactly estimate an entry's popularity. It uses 
> LRU to capture recency and operate in O(1) time. When using available 
> academic traces the policy provides a near optimal hit rate regardless of the 
> workload.
> I'm getting ready to release the policy in Caffeine, which Solr already has a 
> dependency on. But, the code is fairly straightforward and a port into Solr's 
> caches instead is a pragmatic alternative. More interesting is what the 
> impact would be in Solr's workloads and feedback on the policy's design.
> https://github.com/ben-manes/caffeine/wiki/Efficiency



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2016-02-08 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15137111#comment-15137111
 ] 

Shawn Heisey commented on SOLR-8241:


If this proves to be advantageous, it would be my recommendation to replace 
LFUCache, not add an entirely new implementation.  I would also suggest that we 
build something into the tests that will help us evaluate the performance of 
the various cache implementations, to determine if a later step should be to 
change the example configs to use LFUCache.  In general, I believe LFU to be a 
more viable eviction method for Solr than LRU, inherently more capable of 
reaching a higher hit ratio, but we need an efficient implementation.

It looks like this is a master-only patch.  I can see some code that looks like 
it's for Java 8 only, both before and after the patch.  I do not understand 
this part of the patch, because I do not have any idea how to use the new 
capabilities in Java 8, or what those new capabilities actually do:

{code}
-RemovalListener listener = 
-notification -> releaseLocation(notification.getValue());
+RemovalListener listener =
+(key, value, cause) -> releaseLocation(value);
{code}

The code that is patched above does not exist in branch_5x.  It's not a problem 
to have a master-only patch, just something notable.

As for the actual implementation: That is going to take me longer to digest.  
My free time is in short supply.


> Evaluate W-TinyLfu cache
> 
>
> Key: SOLR-8241
> URL: https://issues.apache.org/jira/browse/SOLR-8241
> Project: Solr
>  Issue Type: Wish
>  Components: search
>Reporter: Ben Manes
>Priority: Minor
> Attachments: SOLR-8241.patch
>
>
> SOLR-2906 introduced an LFU cache and in-progress SOLR-3393 makes it O(1). 
> The discussions seem to indicate that the higher hit rate (vs LRU) is offset 
> by the slower performance of the implementation. An original goal appeared to 
> be to introduce ARC, a patented algorithm that uses ghost entries to retain 
> history information.
> My analysis of Window TinyLfu indicates that it may be a better option. It 
> uses a frequency sketch to compactly estimate an entry's popularity. It uses 
> LRU to capture recency and operate in O(1) time. When using available 
> academic traces the policy provides a near optimal hit rate regardless of the 
> workload.
> I'm getting ready to release the policy in Caffeine, which Solr already has a 
> dependency on. But, the code is fairly straightforward and a port into Solr's 
> caches instead is a pragmatic alternative. More interesting is what the 
> impact would be in Solr's workloads and feedback on the policy's design.
> https://github.com/ben-manes/caffeine/wiki/Efficiency



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2016-02-08 Thread Ben Manes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15136695#comment-15136695
 ] 

Ben Manes commented on SOLR-8241:
-

Attached a patch that includes a new SolrCache implementation based on Caffeine 
(version 2.1.0). This was based on the LruCache, trimmed extensively to match 
the requirements in the SolrCaching wiki page.

This passes the "ant precommit" check, but due to a lack of familiarity with 
Solr I didn't run the server to test it. Due to the simplicity of the change I 
think this should be a relatively good prototype to start from. Hopefully there 
isn't much work required to complete this task and see if the cache is 
beneficial. Based on my limited understanding of Solr's existing caches, I 
expect this new one to be both faster and have a higher hit rate.

Shawn, can you please take a look? Thanks!

> Evaluate W-TinyLfu cache
> 
>
> Key: SOLR-8241
> URL: https://issues.apache.org/jira/browse/SOLR-8241
> Project: Solr
>  Issue Type: Wish
>  Components: search
>Reporter: Ben Manes
>Priority: Minor
> Attachments: SOLR-8241.patch
>
>
> SOLR-2906 introduced an LFU cache and in-progress SOLR-3393 makes it O(1). 
> The discussions seem to indicate that the higher hit rate (vs LRU) is offset 
> by the slower performance of the implementation. An original goal appeared to 
> be to introduce ARC, a patented algorithm that uses ghost entries to retain 
> history information.
> My analysis of Window TinyLfu indicates that it may be a better option. It 
> uses a frequency sketch to compactly estimate an entry's popularity. It uses 
> LRU to capture recency and operate in O(1) time. When using available 
> academic traces the policy provides a near optimal hit rate regardless of the 
> workload.
> I'm getting ready to release the policy in Caffeine, which Solr already has a 
> dependency on. But, the code is fairly straightforward and a port into Solr's 
> caches instead is a pragmatic alternative. More interesting is what the 
> impact would be in Solr's workloads and feedback on the policy's design.
> https://github.com/ben-manes/caffeine/wiki/Efficiency



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2015-12-22 Thread Ben Manes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069335#comment-15069335
 ] 

Ben Manes commented on SOLR-8241:
-

[Benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks] of Caffeine 
shows that the cache is ~33% as fast as an unbounded ConcurrentHashMap. As an 
earlier version is already a dependency, for a proof-of-concept the easiest 
would be to use an adapter into a Solr 
[Cache|https://github.com/apache/lucene-solr/blob/trunk/solr/solrj/src/java/org/apache/solr/common/util/Cache.java].
 If the results are attractive, the next decision can be whether to use 
Caffeine or incorporate its ideas into a Solr cache instead.

LRU and LFU only retain information of the current working set. That turns out 
to be a limitation and by capturing more history a significantly better 
prediction (and hit rate) can be achieved. How that history is stored and used 
is how many newer polices differ (ARC, LIRS, 2Q, etc). Regardless they 
outperform a LRU / LFU by sometimes very wide margins, which makes choosing one 
very attractive. In the case of TinyLFU its very easy to adapt onto an existing 
policy as it works by filtering (admission) rather than organizing the order of 
exiting (eviction).

The [paper|http://arxiv.org/pdf/1512.00727.pdf] is a bit long, but a good read. 
The simulation code is very simple, though Caffeine's version isn't due to 
tackling the concurrency aspect as well.

> Evaluate W-TinyLfu cache
> 
>
> Key: SOLR-8241
> URL: https://issues.apache.org/jira/browse/SOLR-8241
> Project: Solr
>  Issue Type: Wish
>  Components: search
>Reporter: Ben Manes
>Priority: Minor
>
> SOLR-2906 introduced an LFU cache and in-progress SOLR-3393 makes it O(1). 
> The discussions seem to indicate that the higher hit rate (vs LRU) is offset 
> by the slower performance of the implementation. An original goal appeared to 
> be to introduce ARC, a patented algorithm that uses ghost entries to retain 
> history information.
> My analysis of Window TinyLfu indicates that it may be a better option. It 
> uses a frequency sketch to compactly estimate an entry's popularity. It uses 
> LRU to capture recency and operate in O(1) time. When using available 
> academic traces the policy provides a near optimal hit rate regardless of the 
> workload.
> I'm getting ready to release the policy in Caffeine, which Solr already has a 
> dependency on. But, the code is fairly straightforward and a port into Solr's 
> caches instead is a pragmatic alternative. More interesting is what the 
> impact would be in Solr's workloads and feedback on the policy's design.
> https://github.com/ben-manes/caffeine/wiki/Efficiency



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2015-12-22 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069229#comment-15069229
 ] 

Shawn Heisey commented on SOLR-8241:


ARC was a cache type that I had read about when I went looking for something 
better than LRU.  If I had known the idea was patented, I never would have 
created an issue for it and would have went straight to LFU.

If I ever find some time, I will work on SOLR-3393.  I haven't looked at how 
W-TinyLfu works or whether it would be a good alternative.  I think there are 
few things to consider:  How the speed compares to the code I cobbled together 
on SOLR-3393, how difficult it is to incorporate/debug, and whether any 
significant library dependencies are added.  It looks like you've used the 
Apache License, so there's no conflicts there.


> Evaluate W-TinyLfu cache
> 
>
> Key: SOLR-8241
> URL: https://issues.apache.org/jira/browse/SOLR-8241
> Project: Solr
>  Issue Type: Wish
>  Components: search
>Reporter: Ben Manes
>Priority: Minor
>
> SOLR-2906 introduced an LFU cache and in-progress SOLR-3393 makes it O(1). 
> The discussions seem to indicate that the higher hit rate (vs LRU) is offset 
> by the slower performance of the implementation. An original goal appeared to 
> be to introduce ARC, a patented algorithm that uses ghost entries to retain 
> history information.
> My analysis of Window TinyLfu indicates that it may be a better option. It 
> uses a frequency sketch to compactly estimate an entry's popularity. It uses 
> LRU to capture recency and operate in O(1) time. When using available 
> academic traces the policy provides a near optimal hit rate regardless of the 
> workload.
> I'm getting ready to release the policy in Caffeine, which Solr already has a 
> dependency on. But, the code is fairly straightforward and a port into Solr's 
> caches instead is a pragmatic alternative. More interesting is what the 
> impact would be in Solr's workloads and feedback on the policy's design.
> https://github.com/ben-manes/caffeine/wiki/Efficiency



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org