Yeah, storing these term statistics centrally behind the IndexReader seems like a better idea than keeping them in client classes. My shallow knowledge of the code isn't helping me explain why it's not performing, though.
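For anyone skimming the quoted comment below: the per-thread LRU cache Mike describes is roughly this shape. This is just my sketch (the class names here are illustrative, not Lucene's actual API) of an access-ordered LinkedHashMap used as an LRU cache, held in thread-local storage so each thread gets its own copy:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Rough sketch of an LRU cache along the lines of
// oal.util.cache.SimpleLRUCache (illustrative, not Lucene's code).
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize;

    LruCache(int maxSize) {
        // accessOrder = true makes iteration order least-recently-accessed
        // first, which is what gives us LRU eviction below.
        super(16, 0.75f, true);
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least-recently-used entry once we exceed capacity.
        return size() > maxSize;
    }
}

// Thread-private holder, similar in spirit to how TermInfosReader keeps
// one cache per thread to avoid synchronization on lookups.
class TermStatsCacheHolder {
    private static final ThreadLocal<LruCache<String, Integer>> CACHE =
        ThreadLocal.withInitial(() -> new LruCache<>(1024));

    static LruCache<String, Integer> get() {
        return CACHE.get();
    }
}
```

The per-thread design trades memory (one cache per search thread) for lock-free reads, which presumably mattered for hot term lookups.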
Out of interest, how come it's a per-thread cache? I don't understand all the issues involved but that surprised me.

2009/7/30 Michael McCandless (JIRA) <j...@apache.org>:
>
> [ https://issues.apache.org/jira/browse/LUCENE-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737059#action_12737059 ]
>
> Michael McCandless commented on LUCENE-1690:
> --------------------------------------------
>
> OK now I feel silly -- this cache is in fact very similar to the caching that
> Lucene already does, internally! Sorry I didn't catch this overlap sooner.
>
> In oal.index.TermInfosReader.java there's an LRU cache, default size 1024,
> that holds recently retrieved terms and their TermInfo. It uses
> oal.util.cache.SimpleLRUCache.
>
> There are some important differences from this new cache in MLT. EG, it
> holds the entire TermInfo, not just the docFreq. Plus, it's a central cache
> for any & all term lookups that go through the SegmentReader. Also, it's
> stored in thread-private storage, so each thread has its own cache.
>
> But, now I'm confused: how come you are not already seeing the benefits of
> this cache? You ought to see MLT queries going faster. This core cache was
> first added in 2.4.x; it looks like you were testing against 2.4.1 (from the
> "Affects Version" on this issue).
>
>> Morelikethis queries are very slow compared to other search types
>> -----------------------------------------------------------------
>>
>>                 Key: LUCENE-1690
>>                 URL: https://issues.apache.org/jira/browse/LUCENE-1690
>>             Project: Lucene - Java
>>          Issue Type: Improvement
>>          Components: contrib/*
>>    Affects Versions: 2.4.1
>>            Reporter: Richard Marr
>>            Priority: Minor
>>         Attachments: LruCache.patch, LUCENE-1690.patch, LUCENE-1690.patch
>>
>>   Original Estimate: 2h
>>  Remaining Estimate: 2h
>>
>> The MoreLikeThis object performs term frequency lookups for every query.
>> From my testing that's what seems to take up the majority of time for
>> MoreLikeThis searches.
>> For some (I'd venture many) applications it's not necessary for term
>> statistics to be looked up every time. A fairly naive opt-in caching
>> mechanism tied to the life of the MoreLikeThis object would allow
>> applications to cache term statistics for the duration that suits them.
>> I've got this working in my test code. I'll put together a patch file when I
>> get a minute. From my testing this can improve performance by a factor of
>> around 10.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>

--
Richard Marr
richard.m...@gmail.com
07976 910 515