[jira] Commented: (LUCENE-2075) Share the Term -> TermInfo cache across threads

Yonik Seeley (JIRA) Wed, 18 Nov 2009 08:31:03 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779514#action_12779514
 ]


Yonik Seeley commented on LUCENE-2075:
--------------------------------------

The Solr implementation could be simplified a lot for Lucene: there is no
need to keep some of the statistics, or things like "isLive".

Testing via something like the double-barrel approach will be tricky.  The 
behavior of ConcurrentLRUCache (i.e. the cost of puts) depends on the access 
pattern: in the best case, a single linear scan is all that's needed; in the 
worst case, a subset of the map needs to go into a priority queue.  
It's all in markAndSweep... that's my monster, so let me know if the comments 
don't make sense.

The number of entries that must be removed for a sweep to count as a success 
also affects whether a single linear scan is enough.  If one scan usually 
suffices, other optimizations are possible, such as not collecting the 
entries for further passes:
{code}
          // This entry *could* be in the bottom group.
          // Collect these entries to avoid another full pass... this is wasted
          // effort if enough entries are normally removed in this first pass.
          // An alternate impl could make a full second pass.
{code}
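
To make the two cases concrete, here is a rough sketch of a sweep that tries
a single linear scan first and only falls back to a priority queue when that
scan did not remove enough.  This is not the attached ConcurrentLRUCache: the
class SweepingCacheSketch, its fields, and the lastSweepUsedQueue() accessor
are hypothetical names made up for illustration, and the bounds arithmetic is
an assumption that relies on every access getting a unique, increasing stamp.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified sketch of a sweep; NOT the attached ConcurrentLRUCache.
final class SweepingCacheSketch<K, V> {
    private static final class Entry<K, V> {
        final K key;
        final V value;
        volatile long lastAccessed; // unique, monotonically increasing stamp
        long stampAtSweep;          // snapshot taken by sweep() so queue ordering is stable
        Entry(K key, V value, long stamp) {
            this.key = key;
            this.value = value;
            this.lastAccessed = stamp;
        }
    }

    private final ConcurrentHashMap<K, Entry<K, V>> map = new ConcurrentHashMap<K, Entry<K, V>>();
    private final AtomicLong clock = new AtomicLong(); // first stamp handed out is 1
    private final int upperWaterMark;  // sweep once the map grows past this
    private final int acceptableSize;  // a sweep is a success once size <= this
    private long oldestStamp = 1;      // lower bound on every live stamp
    private boolean lastSweepUsedQueue;

    SweepingCacheSketch(int upperWaterMark, int acceptableSize) {
        this.upperWaterMark = upperWaterMark;
        this.acceptableSize = acceptableSize;
    }

    V get(K key) {
        Entry<K, V> e = map.get(key);
        if (e == null) return null;
        e.lastAccessed = clock.incrementAndGet(); // refresh recency on a hit
        return e.value;
    }

    void put(K key, V value) {
        map.put(key, new Entry<K, V>(key, value, clock.incrementAndGet()));
        if (map.size() > upperWaterMark) sweep();
    }

    int size() { return map.size(); }

    synchronized boolean lastSweepUsedQueue() { return lastSweepUsedQueue; }

    private synchronized void sweep() {
        int wantToRemove = map.size() - acceptableSize;
        if (wantToRemove <= 0) return;
        long newest = clock.get();
        // Stamps are unique, so at most wantToRemove entries can be older than this
        // bound, and at most acceptableSize entries can be newer than keepAbove.
        long removeBelow = oldestStamp + wantToRemove;
        long keepAbove = newest - acceptableSize;
        long newOldest = Long.MAX_VALUE;
        List<Entry<K, V>> undecided = new ArrayList<Entry<K, V>>();

        // Pass 1: a single linear scan over the map.
        for (Entry<K, V> e : map.values()) {
            long stamp = e.lastAccessed;
            if (stamp < removeBelow) {
                if (map.remove(e.key, e)) wantToRemove--; // certainly in the bottom group
            } else {
                newOldest = Math.min(newOldest, stamp);
                if (stamp <= keepAbove) {                 // *could* be in the bottom group
                    e.stampAtSweep = stamp;
                    undecided.add(e);
                }
            }
        }

        // Worst case: the scan was not enough, so order the undecided subset by age.
        lastSweepUsedQueue = wantToRemove > 0;
        if (wantToRemove > 0) {
            PriorityQueue<Entry<K, V>> oldestFirst = new PriorityQueue<Entry<K, V>>(
                Math.max(1, undecided.size()),
                (x, y) -> Long.compare(x.stampAtSweep, y.stampAtSweep));
            oldestFirst.addAll(undecided);
            while (wantToRemove > 0 && !oldestFirst.isEmpty()) {
                Entry<K, V> e = oldestFirst.poll();
                if (map.remove(e.key, e)) wantToRemove--;
            }
        }
        if (newOldest != Long.MAX_VALUE) oldestStamp = newOldest;
    }
}
```

In this sketch, five puts in a row with no intervening gets are resolved by
the scan alone, while a pattern that touches one key repeatedly before the
other puts leaves gaps in the stamps and pushes the undecided entries into
the queue.  Collecting the undecided list is the cost the quoted comment
refers to: it is wasted whenever the first pass already removes enough.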

> Share the Term -> TermInfo cache across threads
> -----------------------------------------------
>
>                 Key: LUCENE-2075
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2075
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Priority: Minor
>             Fix For: 3.1
>
>         Attachments: ConcurrentLRUCache.java
>
>
> Right now each thread creates its own (thread private) SimpleLRUCache,
> holding up to 1024 terms.
> This is rather wasteful, since if there are a high number of threads
> that come through Lucene, you're multiplying the RAM usage.  You're
> also cutting way back on the likelihood of a cache hit (except for the
> known multiple times we look up a term within a query, which uses one thread).
> In NRT search we open new SegmentReaders (on tiny segments) often
> which each thread must then spend CPU/RAM creating & populating.
> Now that we are on 1.5 we can use java.util.concurrent.*, eg
> ConcurrentHashMap.  One simple approach could be a double-barrel LRU
> cache, using 2 maps (primary, secondary).  You check the cache by
> first checking primary; if that's a miss, you check secondary and if
> you get a hit you promote it to primary.  Once primary is full you
> clear secondary and swap them.
> Or... any other suggested approach?
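
For reference, the double-barrel scheme described in the issue text above
could be sketched roughly as follows.  This is only an illustration under
stated assumptions: the class name DoubleBarrelCache and its get/put methods
are hypothetical and not taken from Lucene, and the swap tolerates benign
races (a reader holding a stale map reference may see a miss or refresh an
entry in the wrong map), which seems acceptable for a cache.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the double-barrel idea; not Lucene's actual code.
final class DoubleBarrelCache<K, V> {
    private final int maxSize;
    private volatile Map<K, V> primary = new ConcurrentHashMap<K, V>();
    private volatile Map<K, V> secondary = new ConcurrentHashMap<K, V>();

    DoubleBarrelCache(int maxSize) {
        this.maxSize = maxSize;
    }

    V get(K key) {
        V value = primary.get(key);
        if (value == null) {
            // Miss in primary: check secondary and promote on a hit.
            value = secondary.get(key);
            if (value != null) {
                put(key, value);
            }
        }
        return value;
    }

    void put(K key, V value) {
        Map<K, V> p = primary;
        p.put(key, value);
        if (p.size() >= maxSize) {
            swap(p);
        }
    }

    // Once primary is full: clear secondary, then swap the two maps.
    private synchronized void swap(Map<K, V> full) {
        if (primary != full) {
            return; // another thread already swapped
        }
        Map<K, V> old = secondary;
        old.clear();
        secondary = full;
        primary = old;
    }
}
```

With maxSize = 2, putting "a" and "b" fills primary and triggers a swap; a
later get("a") promotes "a" back into primary, so when the next swap happens
"a" survives while the untouched "b" is dropped.  Entries are therefore
evicted in whole generations rather than in strict LRU order.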



