[ https://issues.apache.org/jira/browse/LUCENE-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-2075:
---------------------------------------

Attachment: LUCENE-2075.patch

First cut at a benchmark. First, download http://concurrentlinkedhashmap.googlecode.com/files/clhm-production.jar and put it into your lib subdir, then run "ant -lib lib/clhm-production.jar compile-core", then run it something like this:

{code}
java -server -Xmx1g -Xms1g -cp build/classes/java:lib/clhm-production.jar org.apache.lucene.util.cache.LRUBench 4 5.0 0.0 1024 1024
{code}

The args are:

* numThreads
* runSec
* sharePct -- what percentage of the terms should be shared between the threads
* cacheSize
* termCountPerThread -- how many terms each thread will cycle through

The benchmark first sets up per-thread arrays of strings, based on termCountPerThread & sharePct. Then each thread steps through its array and, for each entry, tries to get the string; if it's not present, it puts it. It records the hit & miss counts and prints summary stats at the end, doing 3 rounds. To mimic Lucene, each entry is tested twice in a row, i.e., the 2nd time we test the entry it should be a hit, so we expect a hit rate of 50% if sharePct is 0.
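The per-thread loop described above might be reconstructed roughly like this (a hedged sketch, not the attached patch's actual code; the class and method names are mine). Each term is probed twice in a row, so with an unbounded map every first probe misses and every second probe hits, giving exactly a 50% hit rate:

{code}
import java.util.HashMap;
import java.util.Map;

// Hypothetical reconstruction of LRUBench's per-thread inner loop.
public class LRUBenchLoopSketch {

  public static double hitRate(String[] terms, Map<String, String> cache) {
    long hits = 0, misses = 0;
    for (String term : terms) {
      for (int pass = 0; pass < 2; pass++) {  // each entry tested twice in a row
        if (cache.get(term) == null) {
          misses++;
          cache.put(term, term);              // dummy value
        } else {
          hits++;
        }
      }
    }
    return 100.0 * hits / (hits + misses);
  }

  public static void main(String[] args) {
    String[] terms = new String[1024];
    for (int i = 0; i < terms.length; i++) {
      terms[i] = "term" + i;
    }
    // Unbounded map: first probe always misses, second always hits.
    System.out.println(hitRate(terms, new HashMap<String, String>()));  // 50.0
  }
}
{code}

With a bounded cache smaller than termCountPerThread, evictions push the rate below 50%, which is why the ~84% numbers from ConcurrentLRU below look suspicious.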
Here's my output from the above command line, using Java 1.6.0_14 (64 bit) on OpenSolaris:

{code}
numThreads=4 runSec=5.0 sharePct=0.0 cacheSize=1024 termCountPerThread=1024
LRU cache size is 1024; each thread steps through 1024 strings; 0 of which are common

round 0
  sync(LinkedHashMap): Mops/sec=2.472 hitRate=50.734
  DoubleBarreLRU: Mops/sec=20.502 hitRate=50
  ConcurrentLRU: Mops/sec=17.936 hitRate=84.409
  ConcurrentLinkedHashMap: Mops/sec=1.248 hitRate=50.033

round 1
  sync(LinkedHashMap): Mops/sec=2.766 hitRate=50.031
  DoubleBarreLRU: Mops/sec=17.66 hitRate=50
  ConcurrentLRU: Mops/sec=17.82 hitRate=83.726
  ConcurrentLinkedHashMap: Mops/sec=1.266 hitRate=50.331

round 2
  sync(LinkedHashMap): Mops/sec=2.714 hitRate=50.168
  DoubleBarreLRU: Mops/sec=17.912 hitRate=50
  ConcurrentLRU: Mops/sec=17.866 hitRate=84.156
  ConcurrentLinkedHashMap: Mops/sec=1.26 hitRate=50.254
{code}

NOTE: I'm not sure about the correctness of DoubleBarrelLRU -- I just quickly wrote it. Also, the results for ConcurrentLRUCache are invalid (its hit rate is way too high) -- I think this is because its eviction process can take a longish amount of time, which temporarily allows the map to hold far too many entries, and means it's using up a lot more transient RAM than it should. In theory DoubleBarrelLRU should be vulnerable to the same issue, but in practice it seems to affect it much less (I guess because CHM.clear() must be very fast). I'm not sure how to fix the benchmark to work around that... maybe we bring back the cleaning thread (from Solr's version) and give it a high priority?

Another idea: I wonder whether a simple cache-line-like cache would be sufficient, i.e., we hash to a fixed slot and evict whatever is there.
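That cache-line-like idea could be sketched as a direct-mapped cache (this is my own illustrative sketch, not code from the patch): each key hashes to exactly one slot, and a put simply overwrites whatever occupies that slot, so there is no LRU bookkeeping and no eviction pass at all.

{code}
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical direct-mapped cache: hash to a fixed slot, evict whatever
// is there. Size is bounded by construction, so there is no eviction
// backlog to inflate transient RAM.
public class DirectMappedCache<K, V> {

  private static final class Entry<K, V> {
    final K key;
    final V value;
    Entry(K key, V value) { this.key = key; this.value = value; }
  }

  private final AtomicReferenceArray<Entry<K, V>> slots;
  private final int mask;

  public DirectMappedCache(int size) {  // size must be a power of two
    this.slots = new AtomicReferenceArray<Entry<K, V>>(size);
    this.mask = size - 1;
  }

  private int slot(K key) {
    int h = key.hashCode();
    h ^= (h >>> 16);                    // spread the high bits into the mask
    return h & mask;
  }

  public V get(K key) {
    Entry<K, V> e = slots.get(slot(key));
    return (e != null && e.key.equals(key)) ? e.value : null;
  }

  public void put(K key, V value) {
    // Unconditional overwrite: a collision IS the eviction.
    slots.set(slot(key), new Entry<K, V>(key, value));
  }
}
{code}

The trade-off is a lower hit rate than true LRU when hot terms collide in the same slot, in exchange for O(1) eviction and a hard RAM bound.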
> Share the Term -> TermInfo cache across threads
> -----------------------------------------------
>
>                 Key: LUCENE-2075
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2075
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Priority: Minor
>             Fix For: 3.1
>
>         Attachments: ConcurrentLRUCache.java, LUCENE-2075.patch,
> LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch,
> LUCENE-2075.patch
>
>
> Right now each thread creates its own (thread-private) SimpleLRUCache,
> holding up to 1024 terms.
> This is rather wasteful, since if there are a high number of threads
> that come through Lucene, you're multiplying the RAM usage. You're
> also cutting way back on the likelihood of a cache hit (except the known
> multiple times we look up a term within-query, which uses one thread).
> In NRT search we open new SegmentReaders (on tiny segments) often,
> which each thread must then spend CPU/RAM creating & populating.
> Now that we are on 1.5 we can use java.util.concurrent.*, eg
> ConcurrentHashMap. One simple approach could be a double-barrel LRU
> cache, using 2 maps (primary, secondary). You check the cache by
> first checking primary; if that's a miss, you check secondary and if
> you get a hit you promote it to primary. Once primary is full you
> clear secondary and swap them.
> Or... any other suggested approach?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
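[Editor's sketch] The double-barrel scheme quoted in the issue description above could look roughly like this. This is a minimal illustrative sketch, not the DoubleBarrelLRU class attached to the issue; the class and field names are assumptions, and the non-atomic barrel swap is deliberately left simple:

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a double-barrel LRU: check primary first; on a secondary hit,
// promote the entry to primary; once primary is over capacity, clear the
// old secondary and swap the two maps.
public class DoubleBarrelLRUCacheSketch<K, V> {

  private final int maxSize;
  private volatile Map<K, V> primary = new ConcurrentHashMap<K, V>();
  private volatile Map<K, V> secondary = new ConcurrentHashMap<K, V>();

  public DoubleBarrelLRUCacheSketch(int maxSize) {
    this.maxSize = maxSize;
  }

  public V get(K key) {
    V v = primary.get(key);
    if (v == null) {
      v = secondary.get(key);
      if (v != null) {
        primary.put(key, v);  // promote a secondary hit to primary
      }
    }
    return v;
  }

  public void put(K key, V value) {
    primary.put(key, value);
    if (primary.size() > maxSize) {
      // Swap barrels: old primary becomes secondary; the cleared old
      // secondary becomes the new (empty) primary. NOTE: this swap is
      // not atomic, so concurrent threads may briefly miss entries; a
      // production version would need to handle that race.
      Map<K, V> tmp = secondary;
      tmp.clear();
      secondary = primary;
      primary = tmp;
    }
  }
}
{code}

Misses during a swap are only a performance cost, not a correctness bug, since a cache miss just falls through to the underlying TermInfo lookup.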