RE: [jira] Issue Comment Edited: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

Uwe Schindler Fri, 23 Oct 2009 09:35:48 -0700

Mark,

when removing may comment (as I now understand the whole
FieldDocSortedHitQueue), I found the following as a optimization of the
whole hq:


All FieldDoc values are Compareables (also the score or docid, if they
appear as SortField in a MultiSearcher or ParallelMultiSearcher). The code
of lessThan seems very ineffective, as it has a big switch statement on the
SortField type, then casts the value to the underlying numeric type Object,
calls Number.xxxValue() & co for it and then compares manually. As
j.l.Number is itself Comparable, I see no reason to do this. Just call
compareTo on the Comparable interface and we are happy. The big deal is that
it prevents casting and the two method calls xxxValue(), as Number.compareTo
works more efficient internally.

The only special cases are String sort, where the Locale may be used and the
score sorting which is backwards. But these are two if statements instead of
the whole switch.

I had not tested it now for performance, but in my opinion it should be
faster for MultiSearchers. All tests still pass (because they should).

Attached patch applies to (current) trunk.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Mark Miller (JIRA) [mailto:j...@apache.org]
> Sent: Friday, October 23, 2009 3:33 PM
> To: java-dev@lucene.apache.org
> Subject: [jira] Issue Comment Edited: (LUCENE-1997) Explore performance of
> multi-PQ vs single-PQ sorting API
> 
> 
>     [ https://issues.apache.org/jira/browse/LUCENE-
> 1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-
> tabpanel&focusedCommentId=12769221#action_12769221 ]
> 
> Mark Miller edited comment on LUCENE-1997 at 10/23/09 1:31 PM:
> ---------------------------------------------------------------
> 
> bq. but how does this fit together.
> 
> Thats what Comparable FieldComparator#value is for - fillFields will grab
> all those and load up FieldDoc fields - so the custom FieldComparator is
> tied into it - it creates Comparable objects that can be compared by the
> native compareTos. (the old API did the same thing)
> 
> {code}
>   /**
>    * Given a queue Entry, creates a corresponding FieldDoc
>    * that contains the values used to sort the given document.
>    * These values are not the raw values out of the index, but the
> internal
>    * representation of them. This is so the given search hit can be
> collated by
>    * a MultiSearcher with other search hits.
>    *
>    * @param entry The Entry used to create a FieldDoc
>    * @return The newly created FieldDoc
>    * @see Searchable#search(Weight,Filter,int,Sort)
>    */
>   FieldDoc fillFields(final Entry entry) {
>     final int n = comparators.length;
>     final Comparable[] fields = new Comparable[n];
>     for (int i = 0; i < n; ++i) {
>       fields[i] = comparators[i].value(entry.slot);
>     }
>     //if (maxscore > 1.0f) doc.score /= maxscore;   // normalize scores
>     return new FieldDoc(entry.docID, entry.score, fields);
>   }
> {code}
> 
>       was (Author: markrmil...@gmail.com):
>     bq. but how does this fit together.
> 
> Thats what Comparable FieldComparator#value is for - fillFields will grab
> all those and load up FieldDoc fields - so the custom FieldComparator is
> tied into it - it creates Comparable objects that can be compared by the
> native compareTos.
> 
> {code}
>   /**
>    * Given a queue Entry, creates a corresponding FieldDoc
>    * that contains the values used to sort the given document.
>    * These values are not the raw values out of the index, but the
> internal
>    * representation of them. This is so the given search hit can be
> collated by
>    * a MultiSearcher with other search hits.
>    *
>    * @param entry The Entry used to create a FieldDoc
>    * @return The newly created FieldDoc
>    * @see Searchable#search(Weight,Filter,int,Sort)
>    */
>   FieldDoc fillFields(final Entry entry) {
>     final int n = comparators.length;
>     final Comparable[] fields = new Comparable[n];
>     for (int i = 0; i < n; ++i) {
>       fields[i] = comparators[i].value(entry.slot);
>     }
>     //if (maxscore > 1.0f) doc.score /= maxscore;   // normalize scores
>     return new FieldDoc(entry.docID, entry.score, fields);
>   }
> {code}
> 
> > Explore performance of multi-PQ vs single-PQ sorting API
> > --------------------------------------------------------
> >
> >                 Key: LUCENE-1997
> >                 URL: https://issues.apache.org/jira/browse/LUCENE-1997
> >             Project: Lucene - Java
> >          Issue Type: Improvement
> >          Components: Search
> >    Affects Versions: 2.9
> >            Reporter: Michael McCandless
> >            Assignee: Michael McCandless
> >         Attachments: LUCENE-1997.patch, LUCENE-1997.patch
> >
> >
> > Spinoff from recent "lucene 2.9 sorting algorithm" thread on java-dev,
> > where a simpler (non-segment-based) comparator API is proposed that
> > gathers results into multiple PQs (one per segment) and then merges
> > them in the end.
> > I started from John's multi-PQ code and worked it into
> > contrib/benchmark so that we could run perf tests.  Then I generified
> > the Python script I use for running search benchmarks (in
> > contrib/benchmark/sortBench.py).
> > The script first creates indexes with 1M docs (based on
> > SortableSingleDocSource, and based on wikipedia, if available).  Then
> > it runs various combinations:
> >   * Index with 20 balanced segments vs index with the "normal" log
> >     segment size
> >   * Queries with different numbers of hits (only for wikipedia index)
> >   * Different top N
> >   * Different sorts (by title, for wikipedia, and by random string,
> >     random int, and country for the random index)
> > For each test, 7 search rounds are run and the best QPS is kept.  The
> > script runs singlePQ then multiPQ, and records the resulting best QPS
> > for each and produces table (in Jira format) as output.
> 
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-dev-h...@lucene.apache.org

FieldDocSortedHitQueue.patch
Description: Binary data

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

RE: [jira] Issue Comment Edited: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

Reply via email to