Sorry if this is somewhat off topic, but it seems at least marginally
related to this...
We are still using Lucene 1.9.1+, and I am wondering whether there
have been any improvements in searching on AND clauses when some of
the terms are very infrequent...
This change seems appropriate. Are there others associated with the
performance gains?
If we were going to back-port some of the later changes, can anyone
give some advice as to the biggest "bang for the buck"? Hopefully
those not involving an index format change.
Thanks.
Robert
On Nov 21, 2007, at 1:16 AM, Yonik Seeley (JIRA) wrote:
[ https://issues.apache.org/jira/browse/LUCENE-693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yonik Seeley updated LUCENE-693:
--------------------------------
Attachment: conjunction.patch
Whew... I'd forgotten about this issue. I brushed up one of the
last versions I had lying around from a year ago (see latest
conjunction.patch), fixed up my synthetic tests a bit, and got some
decent results:
1% faster in top level term conjunctions (wheee)
49% faster in a conjunction of nested term conjunctions (no sort
per call to skipTo)
5% faster in a top level ConstantScoreQuery conjunction
144% faster in a conjunction of nested ConstantScoreQuery conjunctions
A sort is done the first time, and the scorers are ordered so that
the highest will skip first (the idea being that there may be a
little info in the first skip about which scorer is most sparse).
Michael Busch recently brought up a related idea... that one could
skip on low df terms first... but that would of course require some
terms in the conjunction.
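For context, a conjunction scorer evaluates an AND by leapfrogging skipTo calls across its sub-scorers until they all land on the same document, which is why the order in which scorers skip matters. Below is a minimal sketch of that leapfrog pattern; the DocIterator interface and its linear advance() are simplified illustrations, not Lucene's actual Scorer/skipTo API:

```java
import java.util.ArrayList;
import java.util.List;

public class LeapfrogDemo {
    /** Minimal iterator over a sorted doc-ID list; advance() is a naive skipTo. */
    static class DocIterator {
        private final int[] docs;
        private int pos = 0;
        DocIterator(int[] docs) { this.docs = docs; }
        /** Returns the first doc >= target, or Integer.MAX_VALUE when exhausted. */
        int advance(int target) {
            while (pos < docs.length && docs[pos] < target) pos++;
            return pos < docs.length ? docs[pos] : Integer.MAX_VALUE;
        }
    }

    /** Leapfrog intersection: each iterator skips to the current candidate in turn. */
    static List<Integer> intersect(DocIterator[] its) {
        List<Integer> hits = new ArrayList<>();
        int candidate = 0;
        while (candidate != Integer.MAX_VALUE) {
            int matched = 0;
            for (int i = 0; matched < its.length; i = (i + 1) % its.length) {
                int doc = its[i].advance(candidate);
                if (doc == candidate) {
                    matched++;          // this iterator agrees with the candidate
                } else {
                    candidate = doc;    // new, larger candidate; restart the count
                    matched = 1;
                }
                if (candidate == Integer.MAX_VALUE) break;
            }
            if (candidate == Integer.MAX_VALUE) break;
            hits.add(candidate);        // all iterators matched this doc
            candidate++;                // search for the next match past it
        }
        return hits;
    }

    public static void main(String[] args) {
        DocIterator[] its = {
            new DocIterator(new int[]{1, 3, 5, 7, 9, 11}),
            new DocIterator(new int[]{3, 4, 7, 10, 11}),
            new DocIterator(new int[]{0, 3, 7, 11, 12})
        };
        System.out.println(intersect(its)); // prints [3, 7, 11]
    }
}
```

Ordering the iterators so the sparsest one drives the candidate forward means fewer, larger jumps for everyone else, which is the intuition behind both the sort-once approach and the low-df-first idea.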
ConjunctionScorer - more tuneup
-------------------------------
Key: LUCENE-693
URL: https://issues.apache.org/jira/browse/LUCENE-693
Project: Lucene - Java
Issue Type: Bug
Components: Search
Affects Versions: 2.1
Environment: Windows Server 2003 x64, Java 1.6, pretty
large index
Reporter: Peter Keegan
Attachments: conjunction.patch, conjunction.patch,
conjunction.patch, conjunction.patch, conjunction.patch.nosort1
(See also: #LUCENE-443)
I did some profile testing with the new ConjunctionScorer in 2.1
and discovered a new bottleneck in ConjunctionScorer.sortScorers.
The java.util.Arrays.sort method is cloning the Scorers array on
every sort, which is quite expensive on large indexes because of
the size of the 'norms' array within, and isn't necessary.
Here is one possible solution:
private void sortScorers() {
  // squeeze the array down for the sort
  // if (length != scorers.length) {
  //   Scorer[] temps = new Scorer[length];
  //   System.arraycopy(scorers, 0, temps, 0, length);
  //   scorers = temps;
  // }
  insertionSort(scorers, length);
  // note that this comparator is not consistent with equals!
  // Arrays.sort(scorers, new Comparator() {  // sort the array
  //   public int compare(Object o1, Object o2) {
  //     return ((Scorer) o1).doc() - ((Scorer) o2).doc();
  //   }
  // });
  first = 0;
  last = length - 1;
}

private void insertionSort(Scorer[] scores, int len) {
  for (int i = 1; i < len; i++) {
    for (int j = i; j > 0 && scores[j - 1].doc() > scores[j].doc(); j--) {
      swap(scores, j, j - 1);
    }
  }
}

private void swap(Object[] x, int a, int b) {
  Object t = x[a];
  x[a] = x[b];
  x[b] = t;
}
The squeezing of the array is no longer needed.
We also initialized the Scorers array to 8 (instead of 2) to avoid
having to grow the array for common queries, although this
probably has less performance impact.
This change added about 3% to query throughput in my testing.
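The gain comes from two properties of in-place insertion sort: it allocates no temporary array (unlike the comparator form of Arrays.sort, which works on a clone), and because it sorts only the first len entries, the array never needs to be squeezed down. A self-contained sketch of the same pattern, with plain ints standing in for scorer doc IDs:

```java
public class InsertionSortDemo {
    /** Sorts the first len elements of a in place; no temporary array is allocated. */
    static void insertionSort(int[] a, int len) {
        for (int i = 1; i < len; i++) {
            // bubble a[i] backward until the prefix [0, i] is ordered
            for (int j = i; j > 0 && a[j - 1] > a[j]; j--) {
                int t = a[j]; a[j] = a[j - 1]; a[j - 1] = t;
            }
        }
    }

    public static void main(String[] args) {
        // only the first 5 entries are "live"; the tail stands in for unused capacity
        int[] docs = {5, 2, 9, 1, 7, 0, 0};
        insertionSort(docs, 5);
        System.out.println(java.util.Arrays.toString(docs)); // [1, 2, 5, 7, 9, 0, 0]
    }
}
```

Insertion sort is also close to O(n) when the input is already nearly ordered, which fits this use: after the first sort, the scorers tend to stay roughly in doc order between calls.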
Peter
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]