[jira] Commented: (LUCENE-693) ConjunctionScorer - more tuneup

Yonik Seeley (JIRA) Wed, 25 Oct 2006 20:13:04 -0700

    [ 
http://issues.apache.org/jira/browse/LUCENE-693?page=comments#action_12444770 ] 
            
Yonik Seeley commented on LUCENE-693:
-------------------------------------


conjunction.patch.nosort1 is a slightly more elegant solution that does
not require any initial setup of the scorers (no calling next() in the 
constructor).
It's one of the fastest yet, but still slower in some cases, and I've figured 
out why.

With a conjunction at the top-level only (like my synthetic tests), only next() 
is called, so the sort is only done once.  The existing next() logic is 
simpler, and hence faster.  If a conjunction is nested somewhere, skipTo may be 
called on it, and that's when the current version is much faster since it 
avoids the sort.

nosort1_time / trunk_time for certain tests (relative perf, lower is better)
testConjunctionPerf : 1.22   (slower, only next is called)
testNestedConjunctionPerf: 0.35 (much faster, skipTo() is exercized)
testConjunctionTerms: 1.00 (only next() called, but term scorer dominates time 
anyway)
testNestedConjunctionTerms: 0.97 (slight improvement, masked by term scorer 
time)

This may not be the final version, but I think it's good to have it available 
anyway.

> ConjunctionScorer - more tuneup
> -------------------------------
>
>                 Key: LUCENE-693
>                 URL: http://issues.apache.org/jira/browse/LUCENE-693
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.1
>         Environment: Windows Server 2003 x64, Java 1.6, pretty large index
>            Reporter: Peter Keegan
>         Attachments: conjunction.patch, conjunction.patch, conjunction.patch, 
> conjunction.patch.nosort1
>
>
> (See also: #LUCENE-443)
> I did some profile testing with the new ConjuctionScorer in 2.1 and 
> discovered a new bottleneck in ConjunctionScorer.sortScorers. The 
> java.utils.Arrays.sort method is cloning the Scorers array on every sort, 
> which is quite expensive on large indexes because of the size of the 'norms' 
> array within, and isn't necessary. 
> Here is one possible solution:
>   private void sortScorers() {
> // squeeze the array down for the sort
> //    if (length != scorers.length) {
> //      Scorer[] temps = new Scorer[length];
> //      System.arraycopy(scorers, 0, temps, 0, length);
> //      scorers = temps;
> //    }
>     insertionSort( scorers,length );
>     // note that this comparator is not consistent with equals!
> //    Arrays.sort(scorers, new Comparator() {         // sort the array
> //        public int compare(Object o1, Object o2) {
> //          return ((Scorer)o1).doc() - ((Scorer)o2).doc();
> //        }
> //      });
>   
>     first = 0;
>     last = length - 1;
>   }
>   private void insertionSort( Scorer[] scores, int len)
>   {
>       for (int i=0; i<len; i++) {
>           for (int j=i; j>0 && scores[j-1].doc() > scores[j].doc();j-- ) {
>               swap (scores, j, j-1);
>           }
>       }
>       return;
>   }
>   private void swap(Object[] x, int a, int b) {
>     Object t = x[a];
>     x[a] = x[b];
>     x[b] = t;
>   }
>  
> The squeezing of the array is no longer needed. 
> We also initialized the Scorers array to 8 (instead of 2) to avoid having to 
> grow the array for common queries, although this probably has less 
> performance impact.
> This change added about 3% to query throughput in my testing.
> Peter

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Commented: (LUCENE-693) ConjunctionScorer - more tuneup

Reply via email to