Re: [jira] Issue Comment Edited: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

Mark Miller Fri, 23 Oct 2009 13:14:44 -0700

No, I was considering how one might be added. Mike's Python script that
drives the benchmark and produces JIRA-formatted output is just so cool,
I'd hate to test any other way ;) The new colors feature makes it even
better. Not sure how best to fit it in, though: it obviously needs a way
to specify multiple indices.


Would love to get that Python into Java too :) I thought Jason had
started an issue for that, but I don't think it went very far. It would be
great if all that were built into the benchmarker more generically somehow.

Uwe Schindler wrote:
> I opened LUCENE-2006.
>
> Is there any MultiSearcher-related task/alg in contrib/benchmark?
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: [email protected]
>
>   
>> -----Original Message-----
>> From: Mark Miller [mailto:[email protected]]
>> Sent: Friday, October 23, 2009 7:53 PM
>> To: [email protected]
>> Subject: Re: [jira] Issue Comment Edited: (LUCENE-1997) Explore
>> performance of multi-PQ vs single-PQ sorting API
>>
>> Nice! I like it. Even if it's not much faster (haven't checked either), I
>> can't see it being much slower, and it's cleaner code.
>>
>> I'd be happy to do some quick perf tests when I get a chance, but I'm +1
>> on it.
>>
>> Uwe Schindler wrote:
>>     
>>> Mark,
>>>
>>> when removing my comment (as I now understand the whole
>>> FieldDocSortedHitQueue), I found the following optimization of the
>>> whole hit queue:
>>>
>>> All FieldDoc values are Comparables (also the score or docid, if they
>>> appear as SortField in a MultiSearcher or ParallelMultiSearcher). The code
>>> of lessThan seems very inefficient, as it has a big switch statement on the
>>> SortField type, then casts the value to the underlying numeric type Object,
>>> calls Number.xxxValue() & co. on it and then compares manually. As the
>>> j.l.Number subclasses used here (Integer, Long, Float, Double) are
>>> themselves Comparable, I see no reason to do this. Just call compareTo on
>>> the Comparable interface and we are happy. The big win is that it avoids
>>> the casting and the two xxxValue() method calls, as compareTo works more
>>> efficiently internally.
>>>
>>> The only special cases are String sort, where the Locale may be used, and
>>> score sorting, which is backwards. But these are two if statements instead
>>> of the whole switch.
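A minimal sketch of the Comparable-based comparison proposed above, written as a plain java.util.Comparator rather than as Lucene's PriorityQueue#lessThan. SimpleFieldDoc, FieldSpec and ComparableMergeOrder are hypothetical stand-ins, not Lucene's FieldDoc/SortField API. It assumes non-null field values and uses java.text.Collator for the locale-aware String case; the score case is simply negated so higher scores come first.

```java
import java.text.Collator;
import java.util.Comparator;
import java.util.Locale;

// Hypothetical stand-in for Lucene's FieldDoc: a doc id, its score, and one
// Comparable sort value per sort field (what fillFields would collect).
final class SimpleFieldDoc {
  final int doc;
  final float score;
  final Comparable<?>[] fields;

  SimpleFieldDoc(int doc, float score, Comparable<?>... fields) {
    this.doc = doc;
    this.score = score;
    this.fields = fields;
  }
}

// Hypothetical per-field sort description (not Lucene's SortField API).
final class FieldSpec {
  enum Kind { SCORE, STRING, OTHER }

  final Kind kind;
  final boolean reverse;
  final Locale locale; // only used for STRING; null means plain String.compareTo

  FieldSpec(Kind kind, boolean reverse, Locale locale) {
    this.kind = kind;
    this.reverse = reverse;
    this.locale = locale;
  }
}

// Orders hits best-first using compareTo instead of a per-type switch.
final class ComparableMergeOrder implements Comparator<SimpleFieldDoc> {
  private final FieldSpec[] specs;
  private final Collator[] collators; // non-null only for locale-aware STRING fields

  ComparableMergeOrder(FieldSpec... specs) {
    this.specs = specs;
    this.collators = new Collator[specs.length];
    for (int i = 0; i < specs.length; i++) {
      if (specs[i].kind == FieldSpec.Kind.STRING && specs[i].locale != null) {
        collators[i] = Collator.getInstance(specs[i].locale);
      }
    }
  }

  @Override
  @SuppressWarnings({"unchecked", "rawtypes"})
  public int compare(SimpleFieldDoc a, SimpleFieldDoc b) {
    for (int i = 0; i < specs.length; i++) {
      Object va = a.fields[i];
      Object vb = b.fields[i];
      int c;
      if (collators[i] != null) {
        // Special case 1: String sort with a Locale goes through a Collator.
        c = collators[i].compare((String) va, (String) vb);
      } else {
        // Integer, Long, Float, Double and String are all Comparable, so no
        // switch, numeric cast or xxxValue() call is needed.
        c = ((Comparable) va).compareTo(vb);
      }
      if (specs[i].kind == FieldSpec.Kind.SCORE) {
        c = -c; // Special case 2: higher scores sort first.
      }
      if (specs[i].reverse) {
        c = -c;
      }
      if (c != 0) {
        return c;
      }
    }
    return Integer.compare(a.doc, b.doc); // tie-break on doc id
  }
}
```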
>>>
>>> I have not yet tested it for performance, but in my opinion it should be
>>> faster for MultiSearchers. All tests still pass (as they should).
>>>
>>> Attached patch applies to (current) trunk.
>>>
>>>
>>>
>>>       
>>>> -----Original Message-----
>>>> From: Mark Miller (JIRA) [mailto:[email protected]]
>>>> Sent: Friday, October 23, 2009 3:33 PM
>>>> To: [email protected]
>>>> Subject: [jira] Issue Comment Edited: (LUCENE-1997) Explore performance
>>>> of multi-PQ vs single-PQ sorting API
>>>>
>>>>
>>>>     [ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769221#action_12769221 ]
>>>>
>>>> Mark Miller edited comment on LUCENE-1997 at 10/23/09 1:31 PM:
>>>> ---------------------------------------------------------------
>>>>
>>>> bq. but how does this fit together.
>>>>
>>>> That's what Comparable FieldComparator#value is for: fillFields grabs
>>>> all those values and loads them into the FieldDoc fields, so the custom
>>>> FieldComparator is tied into it. It creates Comparable objects that can
>>>> be compared by their native compareTo methods. (The old API did the same
>>>> thing.)
>>>>
>>>> {code}
>>>>   /**
>>>>    * Given a queue Entry, creates a corresponding FieldDoc
>>>>    * that contains the values used to sort the given document.
>>>>    * These values are not the raw values out of the index, but the internal
>>>>    * representation of them. This is so the given search hit can be collated by
>>>>    * a MultiSearcher with other search hits.
>>>>    *
>>>>    * @param entry The Entry used to create a FieldDoc
>>>>    * @return The newly created FieldDoc
>>>>    * @see Searchable#search(Weight,Filter,int,Sort)
>>>>    */
>>>>   FieldDoc fillFields(final Entry entry) {
>>>>     final int n = comparators.length;
>>>>     final Comparable[] fields = new Comparable[n];
>>>>     for (int i = 0; i < n; ++i) {
>>>>       fields[i] = comparators[i].value(entry.slot);
>>>>     }
>>>>     //if (maxscore > 1.0f) doc.score /= maxscore;   // normalize scores
>>>>     return new FieldDoc(entry.docID, entry.score, fields);
>>>>   }
>>>> {code}
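The fillFields pattern above can be sketched outside Lucene roughly as follows. SlotValues, IntSlotValues, QueueEntry, SortedHit and FillFieldsSketch are hypothetical names: SlotValues only mirrors the copy/value(slot) part of FieldComparator, and an int[] stands in for a field cache. The point it illustrates is that each comparator hands back a Comparable per queue slot, so hits can later be ordered with plain compareTo.

```java
// Hypothetical, stripped-down analogue of FieldComparator: it remembers one
// sort value per queue slot and can return that value as a Comparable.
interface SlotValues {
  void copy(int slot, int doc);   // store doc's sort value in the given slot
  Comparable<?> value(int slot);  // expose the slot's value for merging
}

// Int sort field backed by an in-memory array standing in for a field cache.
final class IntSlotValues implements SlotValues {
  private final int[] docValues; // sort value per doc id
  private final int[] slots;     // sort value per queue slot

  IntSlotValues(int[] docValues, int numSlots) {
    this.docValues = docValues;
    this.slots = new int[numSlots];
  }

  @Override
  public void copy(int slot, int doc) {
    slots[slot] = docValues[doc];
  }

  @Override
  public Comparable<?> value(int slot) {
    return Integer.valueOf(slots[slot]); // Integer is Comparable<Integer>
  }
}

// Hypothetical queue entry: which slot holds the hit, plus its doc id and score.
final class QueueEntry {
  final int slot;
  final int docID;
  final float score;

  QueueEntry(int slot, int docID, float score) {
    this.slot = slot;
    this.docID = docID;
    this.score = score;
  }
}

// Hypothetical FieldDoc-like result carrying the per-field Comparable values.
final class SortedHit {
  final int doc;
  final float score;
  final Comparable<?>[] fields;

  SortedHit(int doc, float score, Comparable<?>[] fields) {
    this.doc = doc;
    this.score = score;
    this.fields = fields;
  }
}

final class FillFieldsSketch {
  // Same shape as the quoted fillFields: ask every comparator for its slot value.
  static SortedHit fillFields(SlotValues[] comparators, QueueEntry entry) {
    final int n = comparators.length;
    final Comparable<?>[] fields = new Comparable<?>[n];
    for (int i = 0; i < n; ++i) {
      fields[i] = comparators[i].value(entry.slot);
    }
    return new SortedHit(entry.docID, entry.score, fields);
  }
}
```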
>>>>
>>>>
>>>>
>>>>         
>>>>> Explore performance of multi-PQ vs single-PQ sorting API
>>>>> --------------------------------------------------------
>>>>>
>>>>>                 Key: LUCENE-1997
>>>>>                 URL: https://issues.apache.org/jira/browse/LUCENE-1997
>>>>>             Project: Lucene - Java
>>>>>          Issue Type: Improvement
>>>>>          Components: Search
>>>>>    Affects Versions: 2.9
>>>>>            Reporter: Michael McCandless
>>>>>            Assignee: Michael McCandless
>>>>>         Attachments: LUCENE-1997.patch, LUCENE-1997.patch
>>>>>
>>>>>
>>>>> Spinoff from recent "lucene 2.9 sorting algorithm" thread on java-dev,
>>>>> where a simpler (non-segment-based) comparator API is proposed that
>>>>> gathers results into multiple PQs (one per segment) and then merges
>>>>> them in the end.
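To make the single-PQ vs multi-PQ distinction concrete, here is a hedged sketch over plain int sort keys sorted ascending. MultiPQSketch and its methods are hypothetical and are not the benchmark's code: multiPQ keeps a bounded queue per segment and k-way merges the per-segment results at the end, while singlePQ feeds every document through one queue. Concatenating the segments yields the same global doc ids that per-segment docBase offsets would.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class MultiPQSketch {
  // A hit: global doc id plus an int sort key (ascending sort: lower key wins).
  static final class Hit {
    final int doc;
    final int key;

    Hit(int doc, int key) {
      this.doc = doc;
      this.key = key;
    }
  }

  // Best-first order: smaller key first, ties broken by smaller doc id.
  static final Comparator<Hit> BEST_FIRST =
      Comparator.<Hit>comparingInt(h -> h.key).thenComparingInt(h -> h.doc);

  // Top N of one key array via a bounded heap whose head is the worst hit kept.
  static List<Hit> topN(int[] keys, int docBase, int n) {
    PriorityQueue<Hit> pq = new PriorityQueue<>(Math.max(1, n), BEST_FIRST.reversed());
    for (int i = 0; i < keys.length; i++) {
      Hit h = new Hit(docBase + i, keys[i]);
      if (pq.size() < n) {
        pq.add(h);
      } else if (n > 0 && BEST_FIRST.compare(h, pq.peek()) < 0) {
        pq.poll(); // evict the current worst hit
        pq.add(h);
      }
    }
    List<Hit> out = new ArrayList<>(pq);
    out.sort(BEST_FIRST);
    return out;
  }

  // Multi-PQ: one queue per segment, then a k-way merge of the sorted results.
  static List<Hit> multiPQ(int[][] segments, int n) {
    List<List<Hit>> perSegment = new ArrayList<>();
    int docBase = 0;
    for (int[] seg : segments) {
      perSegment.add(topN(seg, docBase, n));
      docBase += seg.length;
    }
    // Each cursor is {segment index, position}; the heap orders cursors by their current hit.
    PriorityQueue<int[]> cursors = new PriorityQueue<>(
        (x, y) -> BEST_FIRST.compare(perSegment.get(x[0]).get(x[1]), perSegment.get(y[0]).get(y[1])));
    for (int s = 0; s < perSegment.size(); s++) {
      if (!perSegment.get(s).isEmpty()) {
        cursors.add(new int[] {s, 0});
      }
    }
    List<Hit> merged = new ArrayList<>();
    while (merged.size() < n && !cursors.isEmpty()) {
      int[] c = cursors.poll();
      List<Hit> seg = perSegment.get(c[0]);
      merged.add(seg.get(c[1]));
      if (c[1] + 1 < seg.size()) {
        cursors.add(new int[] {c[0], c[1] + 1});
      }
    }
    return merged;
  }

  // Single-PQ baseline: every doc of every segment goes through one queue.
  static List<Hit> singlePQ(int[][] segments, int n) {
    int total = 0;
    for (int[] seg : segments) {
      total += seg.length;
    }
    int[] all = new int[total];
    int pos = 0;
    for (int[] seg : segments) {
      System.arraycopy(seg, 0, all, pos, seg.length);
      pos += seg.length;
    }
    return topN(all, 0, n);
  }
}
```

With segments {5, 1, 9}, {3, 0}, {7, 2} and N = 3, both methods return global docs 4, 1, 6.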
>>>>> I started from John's multi-PQ code and worked it into
>>>>> contrib/benchmark so that we could run perf tests.  Then I generified
>>>>> the Python script I use for running search benchmarks (in
>>>>> contrib/benchmark/sortBench.py).
>>>>> The script first creates indexes with 1M docs (based on
>>>>> SortableSingleDocSource, and based on wikipedia, if available).  Then
>>>>> it runs various combinations:
>>>>>   * Index with 20 balanced segments vs index with the "normal" log
>>>>>     segment size
>>>>>   * Queries with different numbers of hits (only for wikipedia index)
>>>>>   * Different top N
>>>>>   * Different sorts (by title, for wikipedia, and by random string,
>>>>>     random int, and country for the random index)
>>>>> For each test, 7 search rounds are run and the best QPS is kept. The
>>>>> script runs singlePQ, then multiPQ, records the resulting best QPS
>>>>> for each, and produces a table (in JIRA format) as output.
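The measurement protocol described above (keep the best QPS out of several rounds, emit one JIRA table row per test) might be approximated in Java roughly as below. This is only a sketch: the real driver is the Python script contrib/benchmark/sortBench.py, and BestOfRounds, bestQps, jiraHeader, jiraRow and the column names are hypothetical. The table uses JIRA wiki markup, where || delimits header cells and | delimits data cells.

```java
import java.util.Locale;

final class BestOfRounds {
  // Times `rounds` rounds of `queriesPerRound` searches and returns the best QPS seen.
  static double bestQps(Runnable search, int queriesPerRound, int rounds) {
    double best = 0.0;
    for (int r = 0; r < rounds; r++) {
      long start = System.nanoTime();
      for (int q = 0; q < queriesPerRound; q++) {
        search.run();
      }
      long elapsedNanos = Math.max(1L, System.nanoTime() - start); // avoid divide-by-zero
      double qps = queriesPerRound / (elapsedNanos / 1e9);
      best = Math.max(best, qps);
    }
    return best;
  }

  // JIRA wiki markup: header cells are delimited by ||, data cells by |.
  static String jiraHeader() {
    return "||Sort||Top N||singlePQ QPS||multiPQ QPS||";
  }

  static String jiraRow(String sort, int topN, double singlePqQps, double multiPqQps) {
    return String.format(Locale.ROOT, "|%s|%d|%.1f|%.1f|", sort, topN, singlePqQps, multiPqQps);
  }
}
```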
>>>>>
>>>>>           
>>>> --
>>>> This message is automatically generated by JIRA.
>>>> -
>>>> You can reply to this email to add a comment to the issue online.
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: [email protected]
>>>> For additional commands, e-mail: [email protected]
>>>>
>>>>         
>>> ------------------------------------------------------------------------
>>>
>>>
>>>       
>> --
>> - Mark
>>
>> http://www.lucidimagination.com
>>
>>
>>
>>
>>     
>
>
>
>
>   





