I did some profile testing with the new ConjuctionScorer in 2.1 and
discovered a new bottleneck in ConjunctionScorer.sortScorers. The
java.utils.Arrays.sort method is cloning the Scorers array on every sort,
which is quite expensive on large indexes because of the size of the 'norms'
array within, and isn't necessary.  We rewrote this to use simple insertion
sorting as follows:

 private void sortScorers() {
// squeeze the array down for the sort
//    if (length != scorers.length) {
//      Scorer[] temps = new Scorer[length];
//      System.arraycopy(scorers, 0, temps, 0, length);
//      scorers = temps;
//    }
   insertionSort( scorers,length );
   // note that this comparator is not consistent with equals!
//    Arrays.sort(scorers, new Comparator() {         // sort the array
//        public int compare(Object o1, Object o2) {
//          return ((Scorer)o1).doc() - ((Scorer)o2).doc();
//        }
//      });

   first = 0;
   last = length - 1;
 }
 private void insertionSort( Scorer[] scores, int len)
 {
     for (int i=0; i<len; i++) {
         for (int j=i; j>0 && scores[j-1].doc() > scores[j].doc();j-- ) {
             swap (scores, j, j-1);
         }
     }
     return;
 }
 private void swap(Object[] x, int a, int b) {
   Object t = x[a];
   x[a] = x[b];
   x[b] = t;
 }

The squeezing of the array is no longer needed.
We also initialized the Scorers array to 8 (instead of 2) to avoid having to
grow the array for common queries, although this probably has less
performance impact.

This change added about 3% to query throughput.

Peter


On 10/17/06, Yonik Seeley (JIRA) <[EMAIL PROTECTED]> wrote:

     [ http://issues.apache.org/jira/browse/LUCENE-443?page=all ]

Yonik Seeley resolved LUCENE-443.
---------------------------------

    Fix Version/s: 2.1
       Resolution: Fixed
         Assignee: Yonik Seeley

Thanks!  I just committed this.

> ConjunctionScorer tune-up
> -------------------------
>
>                 Key: LUCENE-443
>                 URL: http://issues.apache.org/jira/browse/LUCENE-443
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 1.9
>         Environment: Linux, Java 1.5, Large Index with 4 million items
and some heavily nested boolean queries
>            Reporter: Abdul Chaudhry
>         Assigned To: Yonik Seeley
>             Fix For: 2.1
>
>         Attachments: Conjunction20060921.patch, ConjunctionScorer.java,
ConjunctionScorer.java
>
>
> I just recently ran a load test on the latest code from lucene , which
is using a new BooleanScore and noticed the ConjunctionScorer was crunching
through objects , especially while sorting as part of the skipTo call. It
turns a linked list into an array, sorts the array, then converts the array
back to a linked list for further processing by the scoring engines below.
> 'm not sure if anyone else is experiencing this as I have a very large
index (> 4 million items) and I am issuing some heavily nested queries
> Anyway, I decide to change the link list into an array and use a first
and last marker to "simulate" a linked list.
> This scaled much better during my load test as the java gargbage
collector was less - umm - virulent

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Reply via email to