[ 
https://issues.apache.org/jira/browse/LUCENE-502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-502:
-------------------------------

    Attachment: LUCENE-503.patch

Are we interested in this optimization?

Here is an attempted patch. 

Two open issues:

1. It might be better to choose the scorer (TermScorer or LowFreqTermScorer) 
based on IDF rather than doc freq, so that doc freq doesn't need to be 
accessed twice.

2. I don't know at what doc freq 'level' LowFreqTermScorer should give way to 
TermScorer. Some benchmarking may help.
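
To make issue 1 concrete, here is a rough sketch that is not taken from the 
patch. It assumes idf is computed as log(numDocs / (docFreq + 1)) + 1, which 
is how I understand DefaultSimilarity to do it. ScorerChoice, Kind, and 
LOW_FREQ_CUTOFF are made-up names, and the cutoff of 32 is only a placeholder. 
Because that idf falls as doc freq rises, comparing idf against the idf at the 
cutoff should pick the same scorer as comparing doc freq directly.

```java
public class ScorerChoice {
    // Hypothetical cutoff; the right value is exactly what needs benchmarking.
    static final int LOW_FREQ_CUTOFF = 32;

    enum Kind { TERM_SCORER, LOW_FREQ_TERM_SCORER }

    // Assumed idf formula: log(numDocs / (docFreq + 1)) + 1.
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Direct choice: needs doc freq at the point where the scorer is created.
    static Kind chooseByDocFreq(int docFreq, int cutoff) {
        return docFreq < cutoff ? Kind.LOW_FREQ_TERM_SCORER : Kind.TERM_SCORER;
    }

    // Indirect choice: reuses an idf value that was already computed.
    // A rare term has a high idf, so the comparison is reversed.
    static Kind chooseByIdf(float idf, float idfAtCutoff) {
        return idf > idfAtCutoff ? Kind.LOW_FREQ_TERM_SCORER : Kind.TERM_SCORER;
    }

    public static void main(String[] args) {
        int numDocs = 1000000;
        float idfAtCutoff = idf(LOW_FREQ_CUTOFF, numDocs);
        for (int docFreq : new int[] {3, 31, 32, 50000}) {
            System.out.println(docFreq + ": "
                + chooseByDocFreq(docFreq, LOW_FREQ_CUTOFF) + " / "
                + chooseByIdf(idf(docFreq, numDocs), idfAtCutoff));
        }
    }
}
```

The idf at the cutoff depends on numDocs, so it would have to be derived per 
index rather than hard-coded.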

> TermScorer caches values unnecessarily
> --------------------------------------
>
>                 Key: LUCENE-502
>                 URL: https://issues.apache.org/jira/browse/LUCENE-502
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 1.9
>            Reporter: Steven Tamm
>            Priority: Minor
>         Attachments: LUCENE-503.patch, TermScorer.patch
>
>
> TermScorer aggressively caches the doc and freq of 32 documents at a time for 
> each term scored.  When querying for a lot of terms, this creates a lot of 
> unnecessary garbage.  The SegmentTermDocs from which it retrieves its 
> information doesn't have any optimizations for bulk loading, so the caching 
> buys nothing.
> In addition, it has a SCORE_CACHE that's of limited benefit.  It's caching 
> the result of a sqrt that should be placed in DefaultSimilarity, and if 
> you're only scoring a few documents that contain those terms, there's no need 
> to precalculate the SQRT, especially on modern VMs.
> Enclosed is a patch that replaces TermScorer with a version that does not 
> cache the docs or freqs.  In the case of a lot of queries, that saves 196 
> bytes/term, the unnecessary disk IO, and extra SQRTs, which adds up.
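
For reference, a minimal sketch of the SCORE_CACHE trade-off described above. 
This is not the real TermScorer code. It assumes tf(freq) = sqrt(freq) and a 
32-entry cache, and ScoreCacheSketch and its methods are made-up names. The 
cached path pays 32 sqrt calls up front per term, while the direct path pays 
one sqrt per document actually scored.

```java
public class ScoreCacheSketch {
    // Assumed cache size, matching the 32 mentioned in the issue.
    static final int SCORE_CACHE_SIZE = 32;

    // Assumed term-frequency factor: sqrt(freq).
    static float tf(int freq) {
        return (float) Math.sqrt(freq);
    }

    // Eagerly precomputes tf(freq) * weight for every freq below the cache size.
    static float[] buildCache(float weight) {
        float[] cache = new float[SCORE_CACHE_SIZE];
        for (int freq = 0; freq < cache.length; freq++) {
            cache[freq] = tf(freq) * weight;
        }
        return cache;
    }

    // Cached path: table lookup for small freqs, direct computation otherwise.
    static float scoreCached(float[] cache, int freq, float weight) {
        return freq < cache.length ? cache[freq] : tf(freq) * weight;
    }

    // Uncached path: one sqrt per scored document, nothing allocated per term.
    static float scoreDirect(int freq, float weight) {
        return tf(freq) * weight;
    }

    public static void main(String[] args) {
        float weight = 1.5f;
        float[] cache = buildCache(weight);
        System.out.println(scoreCached(cache, 4, weight));
        System.out.println(scoreDirect(4, weight));
    }
}
```

Both paths return the same score. The only difference is when the sqrt work 
happens and whether a per-term array is allocated.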

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


