[ 
https://issues.apache.org/jira/browse/ACCUMULO-4626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15985087#comment-15985087
 ] 

Ben Manes commented on ACCUMULO-4626:
-------------------------------------

I think then the hill climbing should be effective. The more access traces we 
have to simulate with, the more robust our algorithms can get. We have 3 traces 
where LRU is optimal: an ORM's cache (small), Gradle distributed build cache 
(medium), and an unknown user-submitted trace (large). In these traces, 
Caffeine's static configuration did slightly worse, but only by a few percent.

The cache is split into two LRU regions (window, main), and the window cache's 
victim is promoted to the main cache if it passes a frequency filter. That 
filter is a compact array of counters used to estimate the popularity of the 
window's candidate vs. main's victim. The hill climbing adjusts the size of 
the two regions by observing the change in hit rate over a sample period (10x 
size), continuing to grow the target region if the hit rate improved or 
reversing direction if it did not (1/16th pivot). This way the window is 
increased for recency-biased workloads and decreased for frequency-biased 
ones.
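
To make the hill climbing concrete, here is a minimal sketch of the feedback loop described above. It is not Caffeine's actual code: the class and method names are hypothetical, and it assumes that "10x size" means a sample period of ten accesses per cache entry and that "1/16th" is the step applied to the window on each adjustment.

```java
/** Hypothetical sketch of a hit-rate hill climber; not Caffeine's implementation. */
final class HillClimberSketch {
    private final int capacity;     // total entries across window + main
    private final int samplePeriod; // accesses per sample (assumed: 10x capacity)
    private final int step;         // entries moved per adjustment (assumed: capacity / 16)
    private int windowSize;         // current target size of the window region
    private int direction = 1;      // +1 grows the window, -1 shrinks it
    private double previousHitRate = 0.0;
    private int hits;
    private int accesses;

    HillClimberSketch(int capacity, int initialWindowSize) {
        this.capacity = capacity;
        this.samplePeriod = 10 * capacity;
        this.step = Math.max(1, capacity / 16);
        this.windowSize = initialWindowSize;
    }

    /** Records one cache access; adjusts the window at the end of each sample period. */
    void record(boolean hit) {
        if (hit) {
            hits++;
        }
        if (++accesses < samplePeriod) {
            return;
        }
        double hitRate = (double) hits / accesses;
        // Keep climbing in the same direction while the hit rate does not drop;
        // otherwise reverse direction.
        if (hitRate < previousHitRate) {
            direction = -direction;
        }
        // Leave at least one entry in each region.
        windowSize = Math.max(1, Math.min(capacity - 1, windowSize + direction * step));
        previousHitRate = hitRate;
        hits = 0;
        accesses = 0;
    }

    int windowSize() {
        return windowSize;
    }
}
```

The main region's target is simply the capacity minus the window size, so a recency-biased trace that rewards a larger window keeps pushing the split in that direction until the hit rate stops improving.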

If frequency is a bad indicator in your workloads then this improvement should 
correct for that.


> improve cache hit rate via weak reference map
> ---------------------------------------------
>
>                 Key: ACCUMULO-4626
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4626
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: tserver
>            Reporter: Adam Fuchs
>              Labels: performance, stability
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> When a single iterator tree references the same RFile blocks in different 
> branches we sometimes get cache misses for one iterator even though the 
> requested block is held in memory by another iterator. This is particularly 
> important when using something like the IntersectingIterator to intersect 
> many deep copies. Instead of evicting completely, moving evicted blocks into 
> a WeakReference value map can avoid re-reading blocks that are currently 
> referenced by another deep-copied source iterator.
> We've seen this in the field for some of Sqrrl's queries against very large 
> tablets. The total memory usage for these queries can be equal to the size of 
> all the iterator block reads times the number of readahead threads times the 
> number of files times the number of IntersectingIterator children when cache 
> miss rates are high. This might work out to something like:
> {code}
> 16 readahead threads * 200 deep copied children * 99% cache miss rate * 20 
> files * 252KB per reader = ~16GB of memory
> {code}
> In most cases, evicting to a weak reference value map changes the cache miss 
> rate from very high to very low and has a dramatic effect on total memory 
> usage.
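
The eviction scheme in the description can be sketched roughly as follows. This is an illustration only, not Accumulo's block cache: the class name is hypothetical, the strong tier is a plain access-ordered LinkedHashMap, and coarse synchronization stands in for whatever concurrency control a real cache would use.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: an LRU cache that demotes evicted values to weak references. */
final class WeakVictimCacheSketch<K, V> {
    // Evicted values stay reachable here only while some other holder
    // (e.g. another deep-copied iterator) still references them.
    private final Map<K, WeakReference<V>> evicted = new HashMap<>();
    private final LinkedHashMap<K, V> strong;

    WeakVictimCacheSketch(int maxEntries) {
        // accessOrder = true makes the eldest entry the least recently used one.
        this.strong = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() <= maxEntries) {
                    return false;
                }
                evicted.put(eldest.getKey(), new WeakReference<>(eldest.getValue()));
                return true;
            }
        };
    }

    synchronized void put(K key, V value) {
        evicted.remove(key);
        strong.put(key, value);
    }

    synchronized V get(K key) {
        V value = strong.get(key);
        if (value != null) {
            return value;
        }
        WeakReference<V> ref = evicted.remove(key);
        if (ref == null) {
            return null; // never cached, or already cleaned up
        }
        value = ref.get(); // null if the garbage collector reclaimed the block
        if (value != null) {
            strong.put(key, value); // still in use elsewhere: promote it back
        }
        return value;
    }
}
```

Cleared references are only dropped here when their key is looked up again; a longer-lived version would likely drain a ReferenceQueue to prune them.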



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
