[jira] Updated: (LUCENE-693) ConjunctionScorer - more tuneup

2007-11-20 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated LUCENE-693:


Attachment: conjunction.patch

Whew... I'd forgotten about this issue.  I brushed up one of the last versions 
I had lying around from a year ago (see lastest conjunction.patch), fixed up my 
synthetic tests a bit, and got some decent results:

1% faster in top level term conjunctions (wheee)
49% faster in a conjunction of nested term conjunctions (no sort per call to 
skipTo)
5% faster in a top level ConstantScoreQuery conjunction
144% faster in a conjunction of nested ConstantScoreQuery conjunctions

A sort is done the first time, and the scorers are ordered so that the highest 
will skip first (the idea being that there may be a little info in the first 
skip about which scorer is most sparse).

Michael Busch recently brought up a related idea... that one could skip on low 
df terms first... but that would of course require some terms in the 
conjunction.

 ConjunctionScorer - more tuneup
 ---

 Key: LUCENE-693
 URL: https://issues.apache.org/jira/browse/LUCENE-693
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 2.1
 Environment: Windows Server 2003 x64, Java 1.6, pretty large index
Reporter: Peter Keegan
 Attachments: conjunction.patch, conjunction.patch, conjunction.patch, 
 conjunction.patch, conjunction.patch.nosort1


 (See also: #LUCENE-443)
 I did some profile testing with the new ConjuctionScorer in 2.1 and 
 discovered a new bottleneck in ConjunctionScorer.sortScorers. The 
 java.utils.Arrays.sort method is cloning the Scorers array on every sort, 
 which is quite expensive on large indexes because of the size of the 'norms' 
 array within, and isn't necessary. 
 Here is one possible solution:
   private void sortScorers() {
 // squeeze the array down for the sort
 //if (length != scorers.length) {
 //  Scorer[] temps = new Scorer[length];
 //  System.arraycopy(scorers, 0, temps, 0, length);
 //  scorers = temps;
 //}
 insertionSort( scorers,length );
 // note that this comparator is not consistent with equals!
 //Arrays.sort(scorers, new Comparator() { // sort the array
 //public int compare(Object o1, Object o2) {
 //  return ((Scorer)o1).doc() - ((Scorer)o2).doc();
 //}
 //  });
   
 first = 0;
 last = length - 1;
   }
   private void insertionSort( Scorer[] scores, int len)
   {
   for (int i=0; ilen; i++) {
   for (int j=i; j0  scores[j-1].doc()  scores[j].doc();j-- ) {
   swap (scores, j, j-1);
   }
   }
   return;
   }
   private void swap(Object[] x, int a, int b) {
 Object t = x[a];
 x[a] = x[b];
 x[b] = t;
   }
  
 The squeezing of the array is no longer needed. 
 We also initialized the Scorers array to 8 (instead of 2) to avoid having to 
 grow the array for common queries, although this probably has less 
 performance impact.
 This change added about 3% to query throughput in my testing.
 Peter

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: [jira] Updated: (LUCENE-693) ConjunctionScorer - more tuneup

2007-11-20 Thread robert engels
Sorry if this is somewhat off topic, but it seems at least marginally  
related to this...


We are still using Lucene 1.9.1+, and I am wondering if there has  
been any improvements in searching on AND clauses when some of the  
terms are very infrequent...


This change seems appropriate.  Are there others associated with the  
performance gains?


If you were going to back-port some of the later changes, can anyone  
give some advice as to the biggest bang for the buck.  Hopefully  
those not involving an index format change.


Thanks.
Robert

On Nov 21, 2007, at 1:16 AM, Yonik Seeley (JIRA) wrote:



 [ https://issues.apache.org/jira/browse/LUCENE-693? 
page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Yonik Seeley updated LUCENE-693:


Attachment: conjunction.patch

Whew... I'd forgotten about this issue.  I brushed up one of the  
last versions I had lying around from a year ago (see lastest  
conjunction.patch), fixed up my synthetic tests a bit, and got some  
decent results:


1% faster in top level term conjunctions (wheee)
49% faster in a conjunction of nested term conjunctions (no sort  
per call to skipTo)

5% faster in a top level ConstantScoreQuery conjunction
144% faster in a conjunction of nested ConstantScoreQuery conjunctions

A sort is done the first time, and the scorers are ordered so that  
the highest will skip first (the idea being that there may be a  
little info in the first skip about which scorer is most sparse).


Michael Busch recently brought up a related idea... that one could  
skip on low df terms first... but that would of course require some  
terms in the conjunction.



ConjunctionScorer - more tuneup
---

Key: LUCENE-693
URL: https://issues.apache.org/jira/browse/LUCENE-693
Project: Lucene - Java
 Issue Type: Bug
 Components: Search
   Affects Versions: 2.1
Environment: Windows Server 2003 x64, Java 1.6, pretty  
large index

   Reporter: Peter Keegan
Attachments: conjunction.patch, conjunction.patch,  
conjunction.patch, conjunction.patch, conjunction.patch.nosort1



(See also: #LUCENE-443)
I did some profile testing with the new ConjuctionScorer in 2.1  
and discovered a new bottleneck in ConjunctionScorer.sortScorers.  
The java.utils.Arrays.sort method is cloning the Scorers array on  
every sort, which is quite expensive on large indexes because of  
the size of the 'norms' array within, and isn't necessary.

Here is one possible solution:
  private void sortScorers() {
// squeeze the array down for the sort
//if (length != scorers.length) {
//  Scorer[] temps = new Scorer[length];
//  System.arraycopy(scorers, 0, temps, 0, length);
//  scorers = temps;
//}
insertionSort( scorers,length );
// note that this comparator is not consistent with equals!
//Arrays.sort(scorers, new Comparator() { // sort the  
array

//public int compare(Object o1, Object o2) {
//  return ((Scorer)o1).doc() - ((Scorer)o2).doc();
//}
//  });

first = 0;
last = length - 1;
  }
  private void insertionSort( Scorer[] scores, int len)
  {
  for (int i=0; ilen; i++) {
  for (int j=i; j0  scores[j-1].doc()  scores[j].doc 
();j-- ) {

  swap (scores, j, j-1);
  }
  }
  return;
  }
  private void swap(Object[] x, int a, int b) {
Object t = x[a];
x[a] = x[b];
x[b] = t;
  }

The squeezing of the array is no longer needed.
We also initialized the Scorers array to 8 (instead of 2) to avoid  
having to grow the array for common queries, although this  
probably has less performance impact.

This change added about 3% to query throughput in my testing.
Peter


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: [jira] Updated: (LUCENE-693) ConjunctionScorer - more tuneup

2007-11-20 Thread Michael Busch
robert engels wrote:
 
 We are still using Lucene 1.9.1+, and I am wondering if there has been
 any improvements in searching on AND clauses when some of the terms are
 very infrequent...
 

multi-level skipping should help when an AND query has frequent and
infrequent terms. See LUCENE-866 for some performance numbers.

-Michael

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Updated: (LUCENE-693) ConjunctionScorer - more tuneup

2006-10-25 Thread Yonik Seeley (JIRA)
 [ http://issues.apache.org/jira/browse/LUCENE-693?page=all ]

Yonik Seeley updated LUCENE-693:


Attachment: conjunction.patch.nosort1

 ConjunctionScorer - more tuneup
 ---

 Key: LUCENE-693
 URL: http://issues.apache.org/jira/browse/LUCENE-693
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 2.1
 Environment: Windows Server 2003 x64, Java 1.6, pretty large index
Reporter: Peter Keegan
 Attachments: conjunction.patch, conjunction.patch, conjunction.patch, 
 conjunction.patch.nosort1


 (See also: #LUCENE-443)
 I did some profile testing with the new ConjuctionScorer in 2.1 and 
 discovered a new bottleneck in ConjunctionScorer.sortScorers. The 
 java.utils.Arrays.sort method is cloning the Scorers array on every sort, 
 which is quite expensive on large indexes because of the size of the 'norms' 
 array within, and isn't necessary. 
 Here is one possible solution:
   private void sortScorers() {
 // squeeze the array down for the sort
 //if (length != scorers.length) {
 //  Scorer[] temps = new Scorer[length];
 //  System.arraycopy(scorers, 0, temps, 0, length);
 //  scorers = temps;
 //}
 insertionSort( scorers,length );
 // note that this comparator is not consistent with equals!
 //Arrays.sort(scorers, new Comparator() { // sort the array
 //public int compare(Object o1, Object o2) {
 //  return ((Scorer)o1).doc() - ((Scorer)o2).doc();
 //}
 //  });
   
 first = 0;
 last = length - 1;
   }
   private void insertionSort( Scorer[] scores, int len)
   {
   for (int i=0; ilen; i++) {
   for (int j=i; j0  scores[j-1].doc()  scores[j].doc();j-- ) {
   swap (scores, j, j-1);
   }
   }
   return;
   }
   private void swap(Object[] x, int a, int b) {
 Object t = x[a];
 x[a] = x[b];
 x[b] = t;
   }
  
 The squeezing of the array is no longer needed. 
 We also initialized the Scorers array to 8 (instead of 2) to avoid having to 
 grow the array for common queries, although this probably has less 
 performance impact.
 This change added about 3% to query throughput in my testing.
 Peter

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Updated: (LUCENE-693) ConjunctionScorer - more tuneup

2006-10-24 Thread Yonik Seeley (JIRA)
 [ http://issues.apache.org/jira/browse/LUCENE-693?page=all ]

Yonik Seeley updated LUCENE-693:


Attachment: conjunction.patch

Here is my current patch and test code (which currently seems to be slower with 
this patch).

 ConjunctionScorer - more tuneup
 ---

 Key: LUCENE-693
 URL: http://issues.apache.org/jira/browse/LUCENE-693
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 2.1
 Environment: Windows Server 2003 x64, Java 1.6, pretty large index
Reporter: Peter Keegan
 Attachments: conjunction.patch, conjunction.patch


 (See also: #LUCENE-443)
 I did some profile testing with the new ConjuctionScorer in 2.1 and 
 discovered a new bottleneck in ConjunctionScorer.sortScorers. The 
 java.utils.Arrays.sort method is cloning the Scorers array on every sort, 
 which is quite expensive on large indexes because of the size of the 'norms' 
 array within, and isn't necessary. 
 Here is one possible solution:
   private void sortScorers() {
 // squeeze the array down for the sort
 //if (length != scorers.length) {
 //  Scorer[] temps = new Scorer[length];
 //  System.arraycopy(scorers, 0, temps, 0, length);
 //  scorers = temps;
 //}
 insertionSort( scorers,length );
 // note that this comparator is not consistent with equals!
 //Arrays.sort(scorers, new Comparator() { // sort the array
 //public int compare(Object o1, Object o2) {
 //  return ((Scorer)o1).doc() - ((Scorer)o2).doc();
 //}
 //  });
   
 first = 0;
 last = length - 1;
   }
   private void insertionSort( Scorer[] scores, int len)
   {
   for (int i=0; ilen; i++) {
   for (int j=i; j0  scores[j-1].doc()  scores[j].doc();j-- ) {
   swap (scores, j, j-1);
   }
   }
   return;
   }
   private void swap(Object[] x, int a, int b) {
 Object t = x[a];
 x[a] = x[b];
 x[b] = t;
   }
  
 The squeezing of the array is no longer needed. 
 We also initialized the Scorers array to 8 (instead of 2) to avoid having to 
 grow the array for common queries, although this probably has less 
 performance impact.
 This change added about 3% to query throughput in my testing.
 Peter

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Updated: (LUCENE-693) ConjunctionScorer - more tuneup

2006-10-24 Thread Yonik Seeley (JIRA)
 [ http://issues.apache.org/jira/browse/LUCENE-693?page=all ]

Yonik Seeley updated LUCENE-693:


Attachment: conjunction.patch

This version removes the docs[] array and seems to be slightly faster.
Still slower on the synthetic random ConstantScoreQuery tests though.

If anyone else as real-world benchmarks they can try, I'd appreciate the data.

 ConjunctionScorer - more tuneup
 ---

 Key: LUCENE-693
 URL: http://issues.apache.org/jira/browse/LUCENE-693
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 2.1
 Environment: Windows Server 2003 x64, Java 1.6, pretty large index
Reporter: Peter Keegan
 Attachments: conjunction.patch, conjunction.patch, conjunction.patch


 (See also: #LUCENE-443)
 I did some profile testing with the new ConjuctionScorer in 2.1 and 
 discovered a new bottleneck in ConjunctionScorer.sortScorers. The 
 java.utils.Arrays.sort method is cloning the Scorers array on every sort, 
 which is quite expensive on large indexes because of the size of the 'norms' 
 array within, and isn't necessary. 
 Here is one possible solution:
   private void sortScorers() {
 // squeeze the array down for the sort
 //if (length != scorers.length) {
 //  Scorer[] temps = new Scorer[length];
 //  System.arraycopy(scorers, 0, temps, 0, length);
 //  scorers = temps;
 //}
 insertionSort( scorers,length );
 // note that this comparator is not consistent with equals!
 //Arrays.sort(scorers, new Comparator() { // sort the array
 //public int compare(Object o1, Object o2) {
 //  return ((Scorer)o1).doc() - ((Scorer)o2).doc();
 //}
 //  });
   
 first = 0;
 last = length - 1;
   }
   private void insertionSort( Scorer[] scores, int len)
   {
   for (int i=0; ilen; i++) {
   for (int j=i; j0  scores[j-1].doc()  scores[j].doc();j-- ) {
   swap (scores, j, j-1);
   }
   }
   return;
   }
   private void swap(Object[] x, int a, int b) {
 Object t = x[a];
 x[a] = x[b];
 x[b] = t;
   }
  
 The squeezing of the array is no longer needed. 
 We also initialized the Scorers array to 8 (instead of 2) to avoid having to 
 grow the array for common queries, although this probably has less 
 performance impact.
 This change added about 3% to query throughput in my testing.
 Peter

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Updated: (LUCENE-693) ConjunctionScorer - more tuneup

2006-10-23 Thread Yonik Seeley (JIRA)
 [ http://issues.apache.org/jira/browse/LUCENE-693?page=all ]

Yonik Seeley updated LUCENE-693:


Attachment: conjunction.patch

Here's a patch that:
1) nails things down in the constructor (removes incremental add code)
2) removes sorting
3) always skips to the highest docid seen as opposed to target

It should be faster in almost all cases, but I'll make a performance test to 
verify tomorrow.

Notes: the docs[] array could be removed... I started out with it because you 
can't currently depend on doc() to tell you the position because the javadoc 
says it's undefined at first (as opposed to -1).  So I switched to using a 
docs[] array and initialized that to -1, but then learned that calling skipTo() 
before calling next() doesn't always work.

 ConjunctionScorer - more tuneup
 ---

 Key: LUCENE-693
 URL: http://issues.apache.org/jira/browse/LUCENE-693
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 2.1
 Environment: Windows Server 2003 x64, Java 1.6, pretty large index
Reporter: Peter Keegan
 Attachments: conjunction.patch


 (See also: #LUCENE-443)
 I did some profile testing with the new ConjuctionScorer in 2.1 and 
 discovered a new bottleneck in ConjunctionScorer.sortScorers. The 
 java.utils.Arrays.sort method is cloning the Scorers array on every sort, 
 which is quite expensive on large indexes because of the size of the 'norms' 
 array within, and isn't necessary. 
 Here is one possible solution:
   private void sortScorers() {
 // squeeze the array down for the sort
 //if (length != scorers.length) {
 //  Scorer[] temps = new Scorer[length];
 //  System.arraycopy(scorers, 0, temps, 0, length);
 //  scorers = temps;
 //}
 insertionSort( scorers,length );
 // note that this comparator is not consistent with equals!
 //Arrays.sort(scorers, new Comparator() { // sort the array
 //public int compare(Object o1, Object o2) {
 //  return ((Scorer)o1).doc() - ((Scorer)o2).doc();
 //}
 //  });
   
 first = 0;
 last = length - 1;
   }
   private void insertionSort( Scorer[] scores, int len)
   {
   for (int i=0; ilen; i++) {
   for (int j=i; j0  scores[j-1].doc()  scores[j].doc();j-- ) {
   swap (scores, j, j-1);
   }
   }
   return;
   }
   private void swap(Object[] x, int a, int b) {
 Object t = x[a];
 x[a] = x[b];
 x[b] = t;
   }
  
 The squeezing of the array is no longer needed. 
 We also initialized the Scorers array to 8 (instead of 2) to avoid having to 
 grow the array for common queries, although this probably has less 
 performance impact.
 This change added about 3% to query throughput in my testing.
 Peter

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]