Yeah, storing these term statistics centrally behind the IndexReader seems like a better idea than keeping them in client classes. My shallow knowledge of the code isn't helping me explain why it's not performing, though.
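For anyone skimming the quoted comment below: the per-thread LRU cache Mike describes is roughly this shape. This is just my sketch (the class names here are illustrative, not Lucene's actual API) of an access-ordered LinkedHashMap used as an LRU cache, held in thread-local storage so each thread gets its own copy:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Rough sketch of an LRU cache along the lines of
// oal.util.cache.SimpleLRUCache (illustrative, not Lucene's code).
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize;

    LruCache(int maxSize) {
        // accessOrder = true makes iteration order least-recently-accessed
        // first, which is what gives us LRU eviction below.
        super(16, 0.75f, true);
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least-recently-used entry once we exceed capacity.
        return size() > maxSize;
    }
}

// Thread-private holder, similar in spirit to how TermInfosReader keeps
// one cache per thread to avoid synchronization on lookups.
class TermStatsCacheHolder {
    private static final ThreadLocal<LruCache<String, Integer>> CACHE =
        ThreadLocal.withInitial(() -> new LruCache<>(1024));

    static LruCache<String, Integer> get() {
        return CACHE.get();
    }
}
```

The per-thread design trades memory (one cache per search thread) for lock-free reads, which presumably mattered for hot term lookups.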
Out of interest, how come it's a per-thread cache? I don't understand all the issues involved but that surprised me.

2009/7/30 Michael McCandless (JIRA) <j...@apache.org>:
>
> [ https://issues.apache.org/jira/browse/LUCENE-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737059#action_12737059 ]
>
> Michael McCandless commented on LUCENE-1690:
> --------------------------------------------
>
> OK now I feel silly -- this cache is in fact very similar to the caching that
> Lucene already does, internally! Sorry I didn't catch this overlap sooner.
>
> In oal.index.TermInfosReader.java there's an LRU cache, default size 1024,
> that holds recently retrieved terms and their TermInfo. It uses
> oal.util.cache.SimpleLRUCache.
>
> There are some important differences from this new cache in MLT. EG, it
> holds the entire TermInfo, not just the docFreq. Plus, it's a central cache
> for any & all term lookups that go through the SegmentReader. Also, it's
> stored in thread-private storage, so each thread has its own cache.
>
> But, now I'm confused: how come you are not already seeing the benefits of
> this cache? You ought to see MLT queries going faster. This core cache was
> first added in 2.4.x; it looks like you were testing against 2.4.1 (from the
> "Affects Version" on this issue).
>
>> Morelikethis queries are very slow compared to other search types
>> -----------------------------------------------------------------
>>
>>                 Key: LUCENE-1690
>>                 URL: https://issues.apache.org/jira/browse/LUCENE-1690
>>             Project: Lucene - Java
>>          Issue Type: Improvement
>>          Components: contrib/*
>>    Affects Versions: 2.4.1
>>            Reporter: Richard Marr
>>            Priority: Minor
>>         Attachments: LruCache.patch, LUCENE-1690.patch, LUCENE-1690.patch
>>
>>   Original Estimate: 2h
>>  Remaining Estimate: 2h
>>
>> The MoreLikeThis object performs term frequency lookups for every query.
>> From my testing that's what seems to take up the majority of time for
>> MoreLikeThis searches.
>> For some (I'd venture many) applications it's not necessary for term
>> statistics to be looked up every time. A fairly naive opt-in caching
>> mechanism tied to the life of the MoreLikeThis object would allow
>> applications to cache term statistics for the duration that suits them.
>> I've got this working in my test code. I'll put together a patch file when I
>> get a minute. From my testing this can improve performance by a factor of
>> around 10.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>

--
Richard Marr
richard.m...@gmail.com
07976 910 515