[jira] Updated: (LUCENE-1593) Optimizations to TopScoreDocCollector and TopFieldCollector

Shai Erera (JIRA) Mon, 27 Apr 2009 07:26:52 -0700

     [ https://issues.apache.org/jira/browse/LUCENE-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Shai Erera updated LUCENE-1593:
-------------------------------

    Attachment: PerfTest.java
                LUCENE-1593.patch

The patch implements all that has been suggested except:
* Pre-populating the queue in TopFieldCollector. As was noted here previously, 
this seems to remove the 'if (queueFull)' check but add another 'if' in 
FieldComparator (which may be executed several times per collect()).
* Moving initCountingSumScorer() to BS2's ctor and add(). If more than one 
Scorer is added we create a DisjunctionSumScorer, which initializes its queue 
by calling next() on the passed-in Scorers. Therefore, if we called 
initCountingSumScorer() for every Scorer added, we would advance that Scorer 
as well as all the previous ones. I chose to discard that optimization, which 
only affects next() and skipTo().
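
To illustrate the second point, here is a minimal sketch in plain Java with no 
Lucene dependency. CountingIterator and EagerDisjunction are hypothetical 
stand-ins for a Scorer and for a disjunction that primes itself on 
construction; they are not the actual BooleanScorer2 or DisjunctionSumScorer 
code. The sketch only shows that rebuilding such a disjunction on every add() 
advances the earlier sub-iterators again.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class EagerInitSketch {

    /** Hypothetical stand-in for a Scorer: counts how often it is advanced. */
    public static final class CountingIterator implements Iterator<Integer> {
        private final Iterator<Integer> docs;
        int advances = 0;

        CountingIterator(List<Integer> docIds) {
            this.docs = docIds.iterator();
        }

        @Override
        public boolean hasNext() {
            return docs.hasNext();
        }

        @Override
        public Integer next() {
            advances++;
            return docs.next();
        }
    }

    /** Hypothetical stand-in for a disjunction that primes its queue when constructed. */
    public static final class EagerDisjunction {
        EagerDisjunction(List<CountingIterator> subs) {
            for (CountingIterator sub : subs) {
                if (sub.hasNext()) {
                    sub.next(); // priming consumes one doc from every sub-iterator
                }
            }
        }
    }

    /**
     * Adds the postings one by one and rebuilds the disjunction after each add
     * (once there is more than one sub-iterator). Returns how many times each
     * sub-iterator was advanced.
     */
    public static int[] addWithEagerInit(List<List<Integer>> postings) {
        List<CountingIterator> subs = new ArrayList<>();
        for (List<Integer> p : postings) {
            subs.add(new CountingIterator(p));
            if (subs.size() > 1) {
                new EagerDisjunction(subs);
            }
        }
        int[] advances = new int[subs.size()];
        for (int i = 0; i < advances.length; i++) {
            advances[i] = subs.get(i).advances;
        }
        return advances;
    }

    public static void main(String[] args) {
        int[] advances = addWithEagerInit(Arrays.asList(
                Arrays.asList(1, 5, 9),
                Arrays.asList(2, 6),
                Arrays.asList(3, 7)));
        System.out.println(Arrays.toString(advances));
    }
}
```

The first two iterators end up advanced twice before any hit is collected, so 
documents would be skipped. Deferring the initialization until the first 
next() or skipTo() call avoids this.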

The patch also includes the fix for TestSort in the 2.4 back_compat branch. I 
only fixed TestSort, and not MultiSearcher and ParallelMultiSearcher.

All tests pass.

I also ran some performance measurements (all on SRV 2003):

|| JRE || sort || best time (trunk) || best time (patch) || diff (%) ||
| SUN 1.6 | int | 1017.59 | 1015.96 | {color:green}~1%{color} |
| SUN 1.6 | doc | 767.49 | 763.20 | {color:green}~1%{color} |
| IBM 1.5 | int | 1018.77 | 1017.39 | {color:green}~1%{color} |
| IBM 1.5 | doc | 768.10 | 764.14 | {color:green}~1%{color} |

As you can see, there is a slight performance improvement (under 1% in every 
configuration), but nothing too dramatic.

You are welcome to review the patch as well as run the PerfTest I attached. It 
accepts two arguments: <indexDir> and [sort]. 'sort' is optional and if not 
defined it sorts by doc. Otherwise, whatever you pass there, it sorts by int.
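
The argument handling described above could look roughly like the sketch 
below. This is not the attached PerfTest.java; the class name, the SortMode 
enum and parseSortMode() are hypothetical and only mirror the rule that any 
second argument selects the int sort.

```java
public class PerfTestArgsSketch {

    /** The two sort modes mentioned in the message. */
    public enum SortMode { DOC, INT }

    /**
     * Expects args = { indexDir [, sort] }. A missing second argument means
     * "sort by doc"; any second argument, whatever its value, means "sort by int".
     */
    public static SortMode parseSortMode(String[] args) {
        if (args.length < 1) {
            throw new IllegalArgumentException("Usage: PerfTest <indexDir> [sort]");
        }
        return args.length > 1 ? SortMode.INT : SortMode.DOC;
    }

    public static void main(String[] args) {
        SortMode mode = parseSortMode(args);
        System.out.println("index dir: " + args[0] + ", sorting by " + mode);
    }
}
```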

> Optimizations to TopScoreDocCollector and TopFieldCollector
> -----------------------------------------------------------
>
>                 Key: LUCENE-1593
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1593
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Shai Erera
>             Fix For: 2.9
>
>         Attachments: LUCENE-1593.patch, PerfTest.java
>
>
> This is a spin-off of LUCENE-1575 and proposes to optimize TSDC and TFC code 
> to remove unnecessary checks. The plan is:
> # Ensure that IndexSearcher returns segments in increasing doc id order, 
> instead of numDocs().
> # Change TSDC and TFC's code to not use the doc id as a tie breaker. New docs 
> will always have larger ids and therefore cannot compete.
> # Pre-populate HitQueue with sentinel values in TSDC (score = 
> Float.NEGATIVE_INFINITY) and remove the check if reusableSD == null.
> # Also move to use "changing top" and then call adjustTop(), in case we 
> update the queue.
> # Some methods in Sort explicitly add SortField.FIELD_DOC as a "tie breaker" 
> for the last SortField. But, doing so should not be necessary (since we 
> already break ties by docID), and is in fact less efficient (once the above 
> optimization is in).
> # Investigate PQ - can we deprecate insert() and have only 
> insertWithOverflow()? Add an addDummyObjects method which will populate the 
> queue without "arranging" it, just store the objects in the array (this can 
> be used to pre-populate sentinel values)?
> I will post a patch as well as some perf measurements as soon as I have them.
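
The sentinel and "change top, then adjustTop()" ideas from the plan above can 
be sketched as follows. This is a simplified, self-contained illustration and 
not the code in the patch: SentinelQueueSketch and its methods are 
hypothetical, and the heap stores scores and doc ids in parallel arrays rather 
than ScoreDoc objects. It assumes that docs are collected in increasing doc id 
order and that no real hit has a score of Float.NEGATIVE_INFINITY.

```java
import java.util.Arrays;

public class SentinelQueueSketch {
    // Min-heap: index 0 always holds the weakest entry currently kept.
    private final float[] scores;
    private final int[] docs;

    public SentinelQueueSketch(int n) {
        scores = new float[n];
        docs = new int[n];
        // All-equal sentinels already form a valid heap, so no arranging is needed.
        Arrays.fill(scores, Float.NEGATIVE_INFINITY);
        Arrays.fill(docs, Integer.MAX_VALUE);
    }

    /** Entry i is weaker than entry j: lower score, or same score and larger doc id. */
    private boolean less(int i, int j) {
        return scores[i] < scores[j] || (scores[i] == scores[j] && docs[i] > docs[j]);
    }

    /** No "queue full" check and no doc id tie-break: a tie with the top always loses. */
    public void collect(int doc, float score) {
        if (score <= scores[0]) {
            return;
        }
        scores[0] = score; // overwrite the weakest entry in place ...
        docs[0] = doc;
        adjustTop();       // ... and restore the heap order
    }

    private void adjustTop() {
        int i = 0;
        int n = scores.length;
        while (true) {
            int left = 2 * i + 1, right = left + 1, weakest = i;
            if (left < n && less(left, weakest)) weakest = left;
            if (right < n && less(right, weakest)) weakest = right;
            if (weakest == i) return;
            float s = scores[i]; scores[i] = scores[weakest]; scores[weakest] = s;
            int d = docs[i]; docs[i] = docs[weakest]; docs[weakest] = d;
            i = weakest;
        }
    }

    /** Doc ids of the real hits, strongest first; leftover sentinels are dropped. */
    public int[] topDocs() {
        Integer[] idx = new Integer[scores.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        Arrays.sort(idx, (a, b) -> less(a, b) ? 1 : (less(b, a) ? -1 : 0));
        return Arrays.stream(idx)
                .filter(i -> scores[i] != Float.NEGATIVE_INFINITY)
                .mapToInt(i -> docs[i])
                .toArray();
    }

    public static void main(String[] args) {
        SentinelQueueSketch queue = new SentinelQueueSketch(3);
        float[] hitScores = {1.0f, 3.0f, 1.0f, 2.0f, 1.0f};
        for (int doc = 0; doc < hitScores.length; doc++) {
            queue.collect(doc, hitScores[doc]);
        }
        System.out.println(Arrays.toString(queue.topDocs()));
    }
}
```

Note that the ordering inside the heap still breaks ties by doc id, so that 
among equal scores the entry with the largest doc id is evicted first. Only 
the comparison in collect() gets cheaper.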

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

