[jira] Updated: (LUCENE-693) ConjunctionScorer - more tuneup

Yonik Seeley (JIRA) Tue, 20 Nov 2007 23:17:06 -0800

     [ 
https://issues.apache.org/jira/browse/LUCENE-693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Yonik Seeley updated LUCENE-693:
--------------------------------

    Attachment: conjunction.patch

Whew... I'd forgotten about this issue.  I brushed up one of the last versions 
I had lying around from a year ago (see lastest conjunction.patch), fixed up my 
synthetic tests a bit, and got some decent results:

1% faster in top level term conjunctions (wheee)
49% faster in a conjunction of nested term conjunctions (no sort per call to 
skipTo)
5% faster in a top level ConstantScoreQuery conjunction
144% faster in a conjunction of nested ConstantScoreQuery conjunctions

A sort is done the first time, and the scorers are ordered so that the highest 
will skip first (the idea being that there may be a little info in the first 
skip about which scorer is most sparse).

Michael Busch recently brought up a related idea... that one could skip on low 
df terms first... but that would of course require some terms in the 
conjunction.

> ConjunctionScorer - more tuneup
> -------------------------------
>
>                 Key: LUCENE-693
>                 URL: https://issues.apache.org/jira/browse/LUCENE-693
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.1
>         Environment: Windows Server 2003 x64, Java 1.6, pretty large index
>            Reporter: Peter Keegan
>         Attachments: conjunction.patch, conjunction.patch, conjunction.patch, 
> conjunction.patch, conjunction.patch.nosort1
>
>
> (See also: #LUCENE-443)
> I did some profile testing with the new ConjuctionScorer in 2.1 and 
> discovered a new bottleneck in ConjunctionScorer.sortScorers. The 
> java.utils.Arrays.sort method is cloning the Scorers array on every sort, 
> which is quite expensive on large indexes because of the size of the 'norms' 
> array within, and isn't necessary. 
> Here is one possible solution:
>   private void sortScorers() {
> // squeeze the array down for the sort
> //    if (length != scorers.length) {
> //      Scorer[] temps = new Scorer[length];
> //      System.arraycopy(scorers, 0, temps, 0, length);
> //      scorers = temps;
> //    }
>     insertionSort( scorers,length );
>     // note that this comparator is not consistent with equals!
> //    Arrays.sort(scorers, new Comparator() {         // sort the array
> //        public int compare(Object o1, Object o2) {
> //          return ((Scorer)o1).doc() - ((Scorer)o2).doc();
> //        }
> //      });
>   
>     first = 0;
>     last = length - 1;
>   }
>   private void insertionSort( Scorer[] scores, int len)
>   {
>       for (int i=0; i<len; i++) {
>           for (int j=i; j>0 && scores[j-1].doc() > scores[j].doc();j-- ) {
>               swap (scores, j, j-1);
>           }
>       }
>       return;
>   }
>   private void swap(Object[] x, int a, int b) {
>     Object t = x[a];
>     x[a] = x[b];
>     x[b] = t;
>   }
>  
> The squeezing of the array is no longer needed. 
> We also initialized the Scorers array to 8 (instead of 2) to avoid having to 
> grow the array for common queries, although this probably has less 
> performance impact.
> This change added about 3% to query throughput in my testing.
> Peter

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Updated: (LUCENE-693) ConjunctionScorer - more tuneup

Reply via email to