[ https://issues.apache.org/jira/browse/ACCUMULO-4626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15975517#comment-15975517 ]
Adam Fuchs commented on ACCUMULO-4626:
--------------------------------------

Basically, the eviction thread is separate, and the work it has to do to evict a set of blocks is small relative to the work done in the iterators. It is technically a race condition (at least from a performance perspective), and the cache eviction thread wins the race. I believe the core condition needed to trigger this is that the sum of the sizes of the referenced blocks across all of the concurrently running queries exceeds the roughly 25% of the total cache that is reserved for single-use blocks. We were able to work around it in this case by increasing the total block cache size, but that's not always a viable solution.

> improve cache hit rate via weak reference map
> ---------------------------------------------
>
>                 Key: ACCUMULO-4626
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4626
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: tserver
>            Reporter: Adam Fuchs
>              Labels: performance, stability
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> When a single iterator tree references the same RFile blocks in different
> branches, we sometimes get cache misses for one iterator even though the
> requested block is held in memory by another iterator. This is particularly
> important when using something like the IntersectingIterator to intersect
> many deep copies. Instead of evicting completely, keeping evicted blocks in
> a WeakReference value map can avoid re-reading blocks that are currently
> referenced by another deep-copied source iterator.
>
> We've seen this in the field for some of Sqrrl's queries against very large
> tablets. The total memory usage for these queries can be equal to the size of
> all the iterator block reads times the number of readahead threads times the
> number of files times the number of IntersectingIterator children when cache
> miss rates are high.
> This might work out to something like:
> {code}
> 16 readahead threads * 200 deep copied children * 99% cache miss rate * 20
> files * 252KB per reader = ~16GB of memory
> {code}
> In most cases, evicting to a weak reference value map changes the cache miss
> rate from very high to very low and has a dramatic effect on total memory
> usage.
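The weak-reference fallback proposed in the issue can be sketched in Java roughly as follows. This is a minimal, hypothetical illustration, not Accumulo's implementation: `WeakFallbackCache` and its hit counters are invented names, and the cache is bounded by entry count rather than by bytes, with no separate single-use portion like the one the comment describes. Evicted blocks move into a map of `WeakReference`s; a lookup that misses the LRU checks that map and, if the block is still strongly reachable from somewhere else (for example another deep-copied source iterator), promotes it back without a re-read.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not Accumulo code): an entry-count-bounded LRU cache
 * that parks evicted values in a weak-reference map instead of dropping them.
 * Null values are not supported; null is used to signal a miss.
 */
class WeakFallbackCache<K, V> {
  // Weak reference that remembers its key so cleared entries can be purged.
  private static final class KeyedRef<K, V> extends WeakReference<V> {
    final K key;

    KeyedRef(K key, V value, ReferenceQueue<V> queue) {
      super(value, queue);
      this.key = key;
    }
  }

  private final ReferenceQueue<V> queue = new ReferenceQueue<>();
  private final Map<K, KeyedRef<K, V>> evicted = new HashMap<>();
  private final Map<K, V> lru;
  long strongHits, weakHits, misses;

  WeakFallbackCache(int capacity) {
    // Access-ordered LinkedHashMap: the eldest entry is the least recently used.
    this.lru = new LinkedHashMap<K, V>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        if (size() > capacity) {
          // Evict into the weak map instead of forgetting the value entirely.
          evicted.put(eldest.getKey(),
              new KeyedRef<>(eldest.getKey(), eldest.getValue(), queue));
          return true;
        }
        return false;
      }
    };
  }

  synchronized void put(K key, V value) {
    purgeCleared();
    evicted.remove(key);
    lru.put(key, value);
  }

  synchronized V get(K key) {
    purgeCleared();
    V v = lru.get(key);
    if (v != null) {
      strongHits++;
      return v;
    }
    KeyedRef<K, V> ref = evicted.remove(key);
    v = (ref == null) ? null : ref.get();
    if (v != null) {
      // Still strongly held elsewhere: promote back into the LRU, no re-read.
      weakHits++;
      lru.put(key, v);
      return v;
    }
    misses++;
    return null;
  }

  // Drop weak-map entries whose referents the garbage collector has cleared.
  private void purgeCleared() {
    for (Object r; (r = queue.poll()) != null; ) {
      KeyedRef<?, ?> ref = (KeyedRef<?, ?>) r;
      evicted.remove(ref.key, ref); // only if the mapping is still this ref
    }
  }
}
```

A weak hit only succeeds while some other holder keeps the block strongly reachable. Once nothing does, the garbage collector clears the reference and the lookup degrades to an ordinary miss, so the weak map never keeps a block alive on its own; `purgeCleared` drains the `ReferenceQueue` so stale keys do not accumulate.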
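The memory estimate in the `{code}` block is a plain product of the factors listed in the description. The small Java check below reproduces it; `ReadaheadMemoryEstimate` and `estimateBytes` are illustrative names only. Assuming "KB" means 1000 bytes, the product is about 15.97 GB, matching the ~16GB figure; with 1024-byte KB it is about 15.23 GiB, so the order of magnitude does not depend on the unit convention.

```java
/** Illustrative only: reproduces the worst-case estimate quoted in ACCUMULO-4626. */
public class ReadaheadMemoryEstimate {
  // Product of the factors from the issue: readahead threads x deep-copied
  // children x cache miss rate x files x bytes read per reader.
  static double estimateBytes(int readaheadThreads, int deepCopies, double missRate,
                              int files, long bytesPerReader) {
    return readaheadThreads * deepCopies * missRate * files * bytesPerReader;
  }

  public static void main(String[] args) {
    double decimal = estimateBytes(16, 200, 0.99, 20, 252L * 1000); // KB = 1000 bytes
    double binary = estimateBytes(16, 200, 0.99, 20, 252L * 1024);  // KiB = 1024 bytes
    System.out.printf("%.2f GB, %.2f GiB%n", decimal / 1e9, binary / (1L << 30));
  }
}
```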