[jira] Commented: (LUCENE-1593) Optimizations to TopScoreDocCollector and TopFieldCollector

Michael McCandless (JIRA) Mon, 04 May 2009 14:58:54 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12705776#action_12705776
 ]


Michael McCandless commented on LUCENE-1593:
--------------------------------------------

Shai, your summary of what needs to be done looks right.  But
shouldn't we do the interface -> abstract class migration (of Weight &
Searchable) under a separate issue?  I.e., under that issue no real
functional change to Lucene would happen.  Then in this issue we can
make the optimizations.

I just ran a quick perf test (best of 5 runs, Linux, JDK 1.6) of the
query 1 OR 2 on a large Wikipedia index.  Using BS instead of BS2
gives a 27% speedup (2.2 QPS -> 2.8).  I'd really like for Lucene to
be able to use BS automatically when it can.  In fact, I think we should
move more scorers to out-of-order, if we can get these kinds of
performance gains.

These changes go beyond that, though, and also enable a separate
optimization whereby the Collector knows it doesn't have to break ties
by docID.  TFC would then use that to gain performance for all in-order
scorers.
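
As a rough illustration of the tie-breaking point, here is a minimal
sketch, not Lucene's actual collector code.  The class and method names
(InOrderTopN, Hit, collect, topDocs) are hypothetical.  It assumes docs
arrive in increasing docID order, so a doc whose score merely equals the
weakest queued score can never compete and the hot path needs no docID
comparison:

```java
import java.util.Comparator;
import java.util.PriorityQueue;

/** Hypothetical top-N collector; assumes docs arrive in increasing docID order. */
final class InOrderTopN {
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    // Weakest hit first: lowest score, and among equal scores the largest docID.
    private static final Comparator<Hit> WEAKEST_FIRST = (a, b) ->
        a.score != b.score ? Float.compare(a.score, b.score)
                           : Integer.compare(b.doc, a.doc);

    private final int n;
    private final PriorityQueue<Hit> pq = new PriorityQueue<>(WEAKEST_FIRST);

    InOrderTopN(int n) {
        if (n < 1) throw new IllegalArgumentException("n must be >= 1");
        this.n = n;
    }

    void collect(int doc, float score) {
        if (pq.size() < n) {
            pq.add(new Hit(doc, score));
            return;
        }
        // In-order assumption: doc is larger than every docID already queued,
        // so an equal score loses the tie and one "<=" check is enough.
        if (score <= pq.peek().score) return;
        pq.poll();
        pq.add(new Hit(doc, score));
    }

    /** DocIDs best first (score descending, then docID ascending). */
    int[] topDocs() {
        PriorityQueue<Hit> copy = new PriorityQueue<>(pq); // keeps the same ordering
        int[] docs = new int[copy.size()];
        for (int i = docs.length - 1; i >= 0; i--) docs[i] = copy.poll().doc;
        return docs;
    }
}
```

An out-of-order scorer would break the assumption behind the "<=" check,
since a later-arriving doc with an equal score could have a smaller
docID; that case still needs the explicit docID comparison.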

bq. Some of these changes were discussed elsewhere already, e.g. deprecating 
Weight and Searchable and make them abstract classes for easier such changes in 
the future.

In fact I think most of the work above is for this (and not the
optimizations), and I think migration from interfaces -> abstract
classes is important (for 2.9).
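
To make the motivation concrete, here is a small sketch of why an
abstract class is easier to evolve than an interface (which, on the
JDKs in use here, cannot carry a default implementation).  All names
below (WeightBase, LegacyWeight, OutOfOrderWeight) are hypothetical
stand-ins, not the real Lucene API:

```java
/** Hypothetical stand-in for a Weight-like abstract base class. */
abstract class WeightBase {
    /** Part of the original contract; every subclass implements it. */
    abstract float getValue();

    /**
     * Added in a later release.  Because it ships with a default body,
     * subclasses written against the older version still compile and run.
     */
    boolean scoresDocsOutOfOrder() {
        return false;
    }
}

/** Written before the new method existed; needs no change. */
final class LegacyWeight extends WeightBase {
    @Override
    float getValue() { return 1.0f; }
}

/** Newer code opts in by overriding the default. */
final class OutOfOrderWeight extends WeightBase {
    @Override
    float getValue() { return 2.0f; }

    @Override
    boolean scoresDocsOutOfOrder() { return true; }
}
```

Adding the same method to an interface would instead break every
existing implementor at compile time, which is why the migration is
worth doing before further API changes.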


> Optimizations to TopScoreDocCollector and TopFieldCollector
> -----------------------------------------------------------
>
>                 Key: LUCENE-1593
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1593
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Shai Erera
>             Fix For: 2.9
>
>         Attachments: LUCENE-1593.patch, LUCENE-1593.patch, PerfTest.java
>
>
> This is a spin-off of LUCENE-1575 and proposes to optimize TSDC and TFC code 
> to remove unnecessary checks. The plan is:
> # Ensure that IndexSearcher returns segments in increasing doc Id order, 
> instead of numDocs().
> # Change TSDC and TFC's code to not use the doc id as a tie breaker. New docs 
> will always have larger ids and therefore cannot compete.
> # Pre-populate HitQueue with sentinel values in TSDC (score = 
> Float.NEGATIVE_INFINITY) and remove the check if reusableSD == null.
> # Also move to use "changing top" and then call adjustTop(), in case we 
> update the queue.
> # Some methods in Sort explicitly add SortField.FIELD_DOC as a "tie breaker" 
> for the last SortField. But, doing so should not be necessary (since we 
> already break ties by docID), and is in fact less efficient (once the above 
> optimization is in).
> # Investigate PQ - can we deprecate insert() and have only 
> insertWithOverflow()? Add an addDummyObjects method which will populate the 
> queue without "arranging" it, just store the objects in the array (this can 
> be used to pre-populate sentinel values)?
> I will post a patch as well as some perf measurements as soon as I have them.
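
The sentinel and "change top, then adjust" steps in the plan above can
be sketched roughly as follows.  This is an illustration under stated
assumptions, not the patch itself: SentinelTopN, Slot, weakestDoc and
weakestScore are hypothetical names, the heap is hand-rolled rather
than Lucene's PriorityQueue, and docs are assumed to arrive in
increasing docID order with scores that are never negative infinity:

```java
/** Hypothetical fixed-size top-N heap pre-filled with sentinel slots. */
final class SentinelTopN {
    static final class Slot {
        int doc = Integer.MAX_VALUE;
        float score = Float.NEGATIVE_INFINITY; // sentinel: loses to any real score
    }

    // 1-based binary min-heap; heap[1] is always the weakest slot.
    private final Slot[] heap;

    SentinelTopN(int n) {
        if (n < 1) throw new IllegalArgumentException("n must be >= 1");
        heap = new Slot[n + 1];
        // Identical sentinels already satisfy the heap property, so they can
        // be stored without any "arranging".
        for (int i = 1; i <= n; i++) heap[i] = new Slot();
    }

    private static boolean lessThan(Slot a, Slot b) {
        if (a.score != b.score) return a.score < b.score;
        return a.doc > b.doc; // among equal scores, the larger docID is weaker
    }

    void collect(int doc, float score) {
        Slot top = heap[1];
        // No null check and no size check: the queue is always full.
        if (score <= top.score) return;
        // Overwrite the weakest slot in place (no allocation), then re-heapify.
        top.doc = doc;
        top.score = score;
        downHeap();
    }

    private void downHeap() {
        int size = heap.length - 1;
        int i = 1;
        Slot node = heap[i];
        while (true) {
            int child = 2 * i;
            if (child > size) break;
            if (child + 1 <= size && lessThan(heap[child + 1], heap[child])) child++;
            if (!lessThan(heap[child], node)) break;
            heap[i] = heap[child];
            i = child;
        }
        heap[i] = node;
    }

    int weakestDoc() { return heap[1].doc; }
    float weakestScore() { return heap[1].score; }
}
```

If fewer than n docs are collected, some slots still hold sentinels, so
a caller would need to skip them when reading out results.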

