[ https://issues.apache.org/jira/browse/ACCUMULO-4626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15975909#comment-15975909 ]
Josh Elser commented on ACCUMULO-4626:
--------------------------------------

Thanks for the explanation, Adam.

bq. the sum of sizes of the referenced blocks across all of the concurrently running queries exceeds the 25% or so of the total cache that is reserved for single-use blocks

That seems really aggressive in terms of eviction to me. Are we getting poor cache utilization because of that? Maybe there's some other characteristic which keeps more-often accessed blocks in cache? I'm trying to get a better understanding (admittedly without yet pulling up the code) about how this would affect more real-life workloads.

> improve cache hit rate via weak reference map
> ---------------------------------------------
>
>                 Key: ACCUMULO-4626
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4626
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: tserver
>            Reporter: Adam Fuchs
>              Labels: performance, stability
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> When a single iterator tree references the same RFile blocks in different
> branches we sometimes get cache misses for one iterator even though the
> requested block is held in memory by another iterator. This is particularly
> important when using something like the IntersectingIterator to intersect
> many deep copies. Instead of evicting completely, keeping evicted blocks into
> a WeakReference value map can avoid re-reading blocks that are currently
> referenced by another deep copied source iterator.
> We've seen this in the field for some of Sqrrl's queries against very large
> tablets. The total memory usage for these queries can be equal to the size of
> all the iterator block reads times the number of readahead threads times the
> number of files times the number of IntersectingIterator children when cache
> miss rates are high. This might work out to something like:
> {code}
> 16 readahead threads * 200 deep copied children * 99% cache miss rate * 20
> files * 252KB per reader = ~16GB of memory
> {code}
> In most cases, evicting to a weak reference value map changes the cache miss
> rate from very high to very low and has a dramatic effect on total memory
> usage.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
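
For readers following the thread without the patch in front of them, the idea in the issue description (demote evicted blocks to a weak reference value map instead of dropping them) could be sketched roughly as below. This is only an illustration under stated assumptions, not Accumulo's block cache and not the code attached to ACCUMULO-4626: the class name `WeakFallbackCache`, the entry-count limit, and the plain LRU policy are all hypothetical, and the single-use/multi-use partitioning discussed in the comment above is left out.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: a small LRU cache whose evicted values are kept
 * in a weak-valued map. A block that some reader still references can be
 * found again without re-reading it; a block nobody references is free
 * to be garbage collected.
 */
class WeakFallbackCache<K, V> {
    // Evicted entries, weakly referenced. A production version would also
    // need a ReferenceQueue (or similar) to purge stale keys in bulk.
    private final Map<K, WeakReference<V>> evicted = new HashMap<>();

    // Strongly referenced entries, in access order (LRU).
    private final LinkedHashMap<K, V> strong;

    WeakFallbackCache(final int maxStrongEntries) {
        this.strong = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() > maxStrongEntries) {
                    // Demote to a weak reference instead of dropping outright.
                    evicted.put(eldest.getKey(), new WeakReference<>(eldest.getValue()));
                    return true;
                }
                return false;
            }
        };
    }

    synchronized void put(K key, V block) {
        evicted.remove(key);
        strong.put(key, block);
    }

    /** Returns the cached block, or null if the caller must read it again. */
    synchronized V get(K key) {
        V block = strong.get(key);
        if (block != null) {
            return block;
        }
        WeakReference<V> ref = evicted.remove(key);
        if (ref == null) {
            return null;
        }
        block = ref.get();
        if (block != null) {
            // Still held elsewhere: promote it back to the strong LRU map.
            strong.put(key, block);
        }
        return block;
    }
}
```

In this sketch a lookup only falls through to a real miss when the block is in neither map or its weak referent has already been collected, which is the situation the description hopes to make rare when many deep copies read the same blocks.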