[jira] Created: (LUCENE-695) Improve BufferedIndexInput.readBytes() performance

2006-10-24 Thread Nadav Har'El (JIRA)
Improve BufferedIndexInput.readBytes() performance
--

 Key: LUCENE-695
 URL: http://issues.apache.org/jira/browse/LUCENE-695
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Store
Affects Versions: 2.0.0
Reporter: Nadav Har'El
Priority: Minor


During a profiling session, I discovered that BufferedIndexInput.readBytes(),
the function which reads a bunch of bytes from an index, is very inefficient
in many cases. It is efficient for one or two bytes, and also efficient
for a very large number of bytes (e.g., when the norms are read all at once),
but for anything in between (e.g., 100 bytes), it is a performance disaster.
It can easily be improved, though, and below I include a patch to do that.

The basic problem in the existing code was that if you ask it to read 100
bytes, readBytes() simply calls readByte() 100 times in a loop, which means
we check, byte after byte, whether the buffer has another byte, instead of just
checking once how many bytes we have left and copying them all at once.

My version, attached below, copies these 100 bytes in bulk (using
System.arraycopy) if they are all available in the buffer; if fewer than 100
are available, whatever is available is copied first, and then the rest. (As
before, when a very large number of bytes is requested, it is read directly
into the final buffer.)
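The bulk-copy strategy described above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the attached patch: the class name BulkReadSketch, its in-memory backing array, and the readInternal()/refill() helpers are assumptions made so the example runs on its own. It copies whatever the buffer already holds with one System.arraycopy, then either refills the buffer once (small remainder) or reads straight into the caller's array (remainder of at least a buffer-full).

```java
import java.io.IOException;

/**
 * Hypothetical, simplified stand-in for a buffered index input. It reads from an
 * in-memory byte array instead of a real file and is NOT Lucene's class.
 */
class BulkReadSketch {
    private final byte[] file;      // backing "file" contents (assumption: in memory)
    private final byte[] buffer;    // read-ahead buffer
    private long bufferStart = 0;   // file position of buffer[0]
    private int bufferLength = 0;   // number of valid bytes in buffer
    private int bufferPosition = 0; // index of the next unread byte in buffer

    BulkReadSketch(byte[] file, int bufferSize) {
        this.file = file;
        this.buffer = new byte[bufferSize];
    }

    private long position() { return bufferStart + bufferPosition; }

    /** Stand-in for readInternal(): copies len bytes starting at pos from the "file". */
    private void readInternal(long pos, byte[] b, int offset, int len) throws IOException {
        if (pos + len > file.length) throw new IOException("read past EOF");
        System.arraycopy(file, (int) pos, b, offset, len);
    }

    /** Refills the buffer starting at the current position. */
    private void refill() throws IOException {
        long start = position();
        int len = (int) Math.min(buffer.length, file.length - start);
        if (len <= 0) throw new IOException("read past EOF");
        readInternal(start, buffer, 0, len);
        bufferStart = start;
        bufferLength = len;
        bufferPosition = 0;
    }

    byte readByte() throws IOException {
        if (bufferPosition >= bufferLength) refill();
        return buffer[bufferPosition++];
    }

    /** Bulk read: one availability check instead of one check per byte. */
    void readBytes(byte[] b, int offset, int len) throws IOException {
        int available = bufferLength - bufferPosition;
        if (len <= available) {
            // Fast path: everything requested is already buffered.
            System.arraycopy(buffer, bufferPosition, b, offset, len);
            bufferPosition += len;
            return;
        }
        // Copy whatever is buffered first.
        if (available > 0) {
            System.arraycopy(buffer, bufferPosition, b, offset, available);
            offset += available;
            len -= available;
            bufferPosition += available;
        }
        if (len < buffer.length) {
            // Small remainder: refill the buffer once and copy from it.
            refill();
            if (len > bufferLength) throw new IOException("read past EOF");
            System.arraycopy(buffer, 0, b, offset, len);
            bufferPosition = len;
        } else {
            // Large remainder: read straight into the caller's array, bypassing the buffer.
            long start = position();
            readInternal(start, b, offset, len);
            bufferStart = start + len;
            bufferLength = 0;
            bufferPosition = 0;
        }
    }
}
```

Compared with a loop of readByte() calls, a 100-byte read here costs one availability check and at most two copies into the caller's array.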

In my profiling, this fix caused an amazing performance
improvement: previously, BufferedIndexInput.readBytes() took as much as 25%
of the run time, and after the fix, this was down to 1% of the run time!
However, my scenario is *not* typical Lucene usage, but rather a version of
Lucene with added payloads, and these payloads average 100 bytes, which is
where the original readBytes() did worst. I expect that my fix will have less
of an impact on vanilla Lucene, but it can still have an impact because
readBytes() is used for things like reading fields. (I am not aware of a
standard Lucene benchmark, so I can't provide benchmarks for a more typical case.)

In addition to the change to readBytes(), my attached patch also adds a new
unit test for BufferedIndexInput (which previously did not have a unit test).
This test simulates a file which contains a predictable series of bytes, and
then reads from it with readByte() and readBytes() at various sizes (many
thousands of combinations are tried), checking that exactly the expected bytes
are read. This test is independent of my new readBytes() implementation, and
can be used to check the old implementation as well.
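A test in the spirit described above might look like the sketch below. The byte pattern (position modulo 256), the Reader interface, and the trivial reference reader are hypothetical placeholders; in a real test, the reader under test (for example a BufferedIndexInput subclass over the simulated file) would be plugged in instead. Each seed produces a different random mix of readByte() and readBytes() calls with sizes from 1 to maxChunk, so running many seeds covers thousands of size combinations, which for a buffered reader includes reads that straddle buffer boundaries.

```java
import java.util.Random;

/** Hypothetical test harness in the style described above; not the attached test. */
class PredictableFileTestSketch {
    /** Content of the simulated file: a predictable function of the position (assumption). */
    static byte byteAt(long pos) {
        return (byte) (pos % 256);
    }

    /** Minimal stand-in for the input under test (assumption, not a Lucene type). */
    interface Reader {
        byte readByte();
        void readBytes(byte[] b, int offset, int len);
    }

    /** Trivial unbuffered reader over the simulated file, so the sketch runs on its own. */
    static Reader simulatedFile() {
        return new Reader() {
            private long pos = 0;
            public byte readByte() { return byteAt(pos++); }
            public void readBytes(byte[] b, int offset, int len) {
                for (int i = 0; i < len; i++) b[offset + i] = byteAt(pos++);
            }
        };
    }

    /** Mixes readByte() and readBytes() of random sizes; returns the number of bytes verified. */
    static long check(Reader in, long fileLength, int maxChunk, long seed) {
        Random random = new Random(seed);
        long pos = 0;
        while (pos < fileLength) {
            if (random.nextBoolean()) {
                if (in.readByte() != byteAt(pos)) {
                    throw new AssertionError("readByte mismatch at " + pos);
                }
                pos++;
            } else {
                int len = (int) Math.min(fileLength - pos, 1 + random.nextInt(maxChunk));
                byte[] b = new byte[len + 1];
                in.readBytes(b, 1, len); // use a non-zero offset into the destination
                for (int i = 0; i < len; i++) {
                    if (b[1 + i] != byteAt(pos + i)) {
                        throw new AssertionError("readBytes mismatch at " + (pos + i));
                    }
                }
                pos += len;
            }
        }
        return pos;
    }
}
```

For example, looping seed from 0 to 999 and calling check(simulatedFile(), 10000, 300, seed) exercises a thousand different read patterns.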

By the way, it's interesting that BufferedIndexOutput.writeBytes was already 
efficient, and wasn't simply a loop of writeByte(). Only the reading code was 
inefficient. I wonder why this happened.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Updated: (LUCENE-695) Improve BufferedIndexInput.readBytes() performance

2006-10-24 Thread Nadav Har'El (JIRA)
 [ http://issues.apache.org/jira/browse/LUCENE-695?page=all ]

Nadav Har'El updated LUCENE-695:


Attachment: readbytes.patch

The patch, which includes the change to BufferedIndexInput.readBytes(), and a 
new unit test for that class.

 Improve BufferedIndexInput.readBytes() performance
 --

 Key: LUCENE-695
 URL: http://issues.apache.org/jira/browse/LUCENE-695
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Store
Affects Versions: 2.0.0
Reporter: Nadav Har'El
Priority: Minor
 Attachments: readbytes.patch





[jira] Commented: (LUCENE-551) Make Lucene - Java 1.9.1 Available in Maven2 repository in iBibilio.org

2006-10-24 Thread Marcel Reutegger (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-551?page=comments#action_12444300 ] 

Marcel Reutegger commented on LUCENE-551:
-

Are there any plans to also publish the new release to the Maven 1 repository 
on ibiblio.org? We at Jackrabbit still use Maven 1.0.2 as our build tool.

 Make Lucene - Java 1.9.1 Available in Maven2 repository in iBibilio.org
 ---

 Key: LUCENE-551
 URL: http://issues.apache.org/jira/browse/LUCENE-551
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: 1.9
Reporter: Stephen Duncan Jr

 Please upload 1.9.1 release to iBiblio so that Maven users can easily use the 
 latest release.  Currently 1.4.3 is the most recently available version: 
 http://www.ibiblio.org/maven2/lucene/lucene/
 Please read the following FAQ for more information: 
 http://maven.apache.org/project-faq.html




[jira] Commented: (LUCENE-695) Improve BufferedIndexInput.readBytes() performance

2006-10-24 Thread Yonik Seeley (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-695?page=comments#action_12444316 ] 

Yonik Seeley commented on LUCENE-695:
-

 I wonder why this happened.

readBytes on less than a buffer size probably only happens with binary (or 
compressed) fields, relatively new additions to Lucene, so it probably didn't 
have much of a real-world impact.   I think it is important to fix though, as 
more things may be byte-oriented in the future.

After applying the patch, at least one unit test fails:

[junit] Testcase: testReadPastEOF(org.apache.lucene.index.TestCompoundFile): FAILED
[junit] Block read past end of file
[junit] junit.framework.AssertionFailedError: Block read past end of file
[junit] at org.apache.lucene.index.TestCompoundFile.testReadPastEOF(TestCompoundFile.java:616)


 Improve BufferedIndexInput.readBytes() performance
 --

 Key: LUCENE-695
 URL: http://issues.apache.org/jira/browse/LUCENE-695
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Store
Affects Versions: 2.0.0
Reporter: Nadav Har'El
Priority: Minor
 Attachments: readbytes.patch





[jira] Commented: (LUCENE-693) ConjunctionScorer - more tuneup

2006-10-24 Thread Peter Keegan (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-693?page=comments#action_12444317 ] 

Peter Keegan commented on LUCENE-693:
-

Yonik,

I tried out your patch, but it causes an exception on some boolean queries.
This one occurred on a boolean query with 3 required terms:

java.lang.ArrayIndexOutOfBoundsException: 2147483647
at org.apache.lucene.search.TermScorer.score(TermScorer.java:129)
at org.apache.lucene.search.ConjunctionScorer.score(ConjunctionScorer.java:97)
at org.apache.lucene.search.BooleanScorer2$2.score(BooleanScorer2.java:186)
at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:318)
at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:282)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:132)
at org.apache.lucene.search.Searcher.search(Searcher.java:116)
at org.apache.lucene.search.Searcher.search(Searcher.java:95)

It looks like the doc id has the sentinel value (Integer.MAX_VALUE).
Note: one of the terms had no occurrences in the index.
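For context, a conjunction generally has to stop as soon as any one of its clauses is exhausted; otherwise it may go on to score the sentinel doc id, which would be consistent with the ArrayIndexOutOfBoundsException: 2147483647 above. The sketch below is a generic leapfrog intersection over sorted doc-id arrays, not ConjunctionScorer itself, and it does not claim to identify the bug in the patch; DocIterator and advance() are hypothetical names. An empty posting list (like the misspelled term) makes the very first advance() return the sentinel, so the intersection ends before reaching the point where a real scorer would call score().

```java
import java.util.ArrayList;
import java.util.List;

/** Generic leapfrog intersection; hypothetical names, not Lucene's ConjunctionScorer. */
class ConjunctionSentinelSketch {
    /** "No more documents" sentinel, the same value as in the stack trace above. */
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical iterator over one term's sorted doc ids. */
    static final class DocIterator {
        private final int[] docs;
        private int idx = 0;

        DocIterator(int... docs) { this.docs = docs; }

        /** Moves to the first doc >= target and returns it, or NO_MORE_DOCS if none. */
        int advance(int target) {
            while (idx < docs.length && docs[idx] < target) idx++;
            return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
        }
    }

    /** Returns the doc ids present in every iterator. */
    static List<Integer> intersect(DocIterator... iterators) {
        List<Integer> matches = new ArrayList<>();
        if (iterators.length == 0) return matches;
        int target = 0;
        while (true) {
            boolean allOnTarget = true;
            for (DocIterator it : iterators) {
                int doc = it.advance(target);
                if (doc == NO_MORE_DOCS) {
                    return matches; // one exhausted clause ends the whole conjunction
                }
                if (doc > target) {
                    target = doc;   // leapfrog: the other clauses must catch up to this doc
                    allOnTarget = false;
                }
            }
            if (allOnTarget) {
                matches.add(target); // a real scorer would call score() only here
                target++;
            }
        }
    }
}
```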

Peter



 ConjunctionScorer - more tuneup
 ---

 Key: LUCENE-693
 URL: http://issues.apache.org/jira/browse/LUCENE-693
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 2.1
 Environment: Windows Server 2003 x64, Java 1.6, pretty large index
Reporter: Peter Keegan
 Attachments: conjunction.patch


 (See also: #LUCENE-443)
 I did some profile testing with the new ConjunctionScorer in 2.1 and 
 discovered a new bottleneck in ConjunctionScorer.sortScorers. The 
 java.util.Arrays.sort method is cloning the Scorers array on every sort, 
 which is quite expensive on large indexes because of the size of the 'norms' 
 array within, and isn't necessary. 
 Here is one possible solution:
   private void sortScorers() {
     // squeeze the array down for the sort
     //if (length != scorers.length) {
     //  Scorer[] temps = new Scorer[length];
     //  System.arraycopy(scorers, 0, temps, 0, length);
     //  scorers = temps;
     //}
     insertionSort(scorers, length);
     // note that this comparator is not consistent with equals!
     //Arrays.sort(scorers, new Comparator() { // sort the array
     //    public int compare(Object o1, Object o2) {
     //      return ((Scorer)o1).doc() - ((Scorer)o2).doc();
     //    }
     //  });

     first = 0;
     last = length - 1;
   }

   // Sorts scores[0..len) by doc() in place, without cloning the array.
   private void insertionSort(Scorer[] scores, int len) {
     for (int i = 1; i < len; i++) {
       for (int j = i; j > 0 && scores[j-1].doc() > scores[j].doc(); j--) {
         swap(scores, j, j-1);
       }
     }
   }

   private void swap(Object[] x, int a, int b) {
     Object t = x[a];
     x[a] = x[b];
     x[b] = t;
   }
  
 The squeezing of the array is no longer needed. 
 We also initialized the Scorers array to 8 (instead of 2) to avoid having to 
 grow the array for common queries, although this probably has less 
 performance impact.
 This change added about 3% to query throughput in my testing.
 Peter
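As a standalone illustration of the in-place insertion sort proposed above, here is a sketch using a hypothetical FakeScorer that only carries a doc id. It shifts entries instead of swapping them, which is an equivalent variant. Because the number of clauses is small and the array is usually nearly sorted after each advance, insertion sort does little work and allocates nothing, whereas Arrays.sort with a Comparator copies the array, as described above.

```java
/** Hypothetical illustration; FakeScorer is not a Lucene class. */
class InsertionSortSketch {
    /** Stand-in for a Scorer: only the current doc id matters here. */
    static final class FakeScorer {
        private final int doc;
        FakeScorer(int doc) { this.doc = doc; }
        int doc() { return doc; }
    }

    /** Sorts scorers[0..len) by doc() in place; entries at index >= len are untouched. */
    static void insertionSort(FakeScorer[] scorers, int len) {
        for (int i = 1; i < len; i++) {
            FakeScorer cur = scorers[i];
            int j = i;
            // Shift larger entries one slot right, then drop cur into the gap.
            while (j > 0 && scorers[j - 1].doc() > cur.doc()) {
                scorers[j] = scorers[j - 1];
                j--;
            }
            scorers[j] = cur;
        }
    }
}
```

Comparing with > also avoids a general pitfall of the commented-out comparator: doc() - doc() can overflow when one value is a large sentinel such as Integer.MAX_VALUE and the other is negative.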




[jira] Commented: (LUCENE-693) ConjunctionScorer - more tuneup

2006-10-24 Thread Yonik Seeley (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-693?page=comments#action_12444319 ] 

Yonik Seeley commented on LUCENE-693:
-

Thanks for trying it out Peter.
Odd that it could fail after passing all the Lucene unit tests... I assume this 
was the Lucene trunk you were trying?
So the query was just a boolean query with three required term queries?

 ConjunctionScorer - more tuneup
 ---

 Key: LUCENE-693
 URL: http://issues.apache.org/jira/browse/LUCENE-693
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 2.1
 Environment: Windows Server 2003 x64, Java 1.6, pretty large index
Reporter: Peter Keegan
 Attachments: conjunction.patch





[jira] Commented: (LUCENE-695) Improve BufferedIndexInput.readBytes() performance

2006-10-24 Thread Nadav Har'El (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-695?page=comments#action_12444322 ] 

Nadav Har'El commented on LUCENE-695:
-

Sorry, I didn't notice that my fix broke this unit test. Thanks for catching 
that.

What is happening is interesting: this test 
(TestCompoundFile.testReadPastEOF()) is testing what happens when you read 40 
bytes beyond the end of file, and expects the appropriate exception to be 
thrown. The old code actually did this for 40 bytes, so it passed this test, 
but the interesting thing is that when you asked for more than a buffer-full of 
bytes, say, 10K, the length() checking code was not there! So the old code was 
broken in this respect, just not for 40 bytes which were tested.

I'll fix my patch to add this beyond-end-of-file check, and will post the new 
patch ASAP.
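The missing beyond-end-of-file check amounts to validating each request against the file length before copying anything, on every path, including the direct path used for large reads. A minimal sketch, assuming a hypothetical helper checkRead() that receives the current file pointer and the file length (the real patch may structure this differently):

```java
import java.io.IOException;

/** Hypothetical illustration; not Lucene's BufferedIndexInput. */
class ReadPastEofSketch {
    /**
     * Rejects a request for len bytes starting at filePointer if it would run past
     * length. It is called before any copying, so it also covers large reads that
     * bypass the buffer.
     */
    static void checkRead(long filePointer, int len, long length) throws IOException {
        if (len < 0) {
            throw new IllegalArgumentException("negative length: " + len);
        }
        if (filePointer + len > length) {
            throw new IOException("read past EOF: filePointer=" + filePointer
                    + ", len=" + len + ", length=" + length);
        }
    }

    /** Example direct (unbuffered) read over an in-memory "file" that uses the check. */
    static void readDirect(byte[] file, long filePointer, byte[] b, int offset, int len)
            throws IOException {
        checkRead(filePointer, len, file.length);
        System.arraycopy(file, (int) filePointer, b, offset, len);
    }
}
```

With this check in place, a request for 40 bytes past the end and a request for 10K bytes on a shorter file both fail the same way.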

 Improve BufferedIndexInput.readBytes() performance
 --

 Key: LUCENE-695
 URL: http://issues.apache.org/jira/browse/LUCENE-695
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Store
Affects Versions: 2.0.0
Reporter: Nadav Har'El
Priority: Minor
 Attachments: readbytes.patch





[jira] Commented: (LUCENE-693) ConjunctionScorer - more tuneup

2006-10-24 Thread Yonik Seeley (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-693?page=comments#action_12444334 ] 

Yonik Seeley commented on LUCENE-693:
-

I'm not sure how it's possible, but my version is *slower* in the performance 
test I came up with.
Very odd... I'm not sure why.

 ConjunctionScorer - more tuneup
 ---

 Key: LUCENE-693
 URL: http://issues.apache.org/jira/browse/LUCENE-693
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 2.1
 Environment: Windows Server 2003 x64, Java 1.6, pretty large index
Reporter: Peter Keegan
 Attachments: conjunction.patch





[jira] Updated: (LUCENE-693) ConjunctionScorer - more tuneup

2006-10-24 Thread Yonik Seeley (JIRA)
 [ http://issues.apache.org/jira/browse/LUCENE-693?page=all ]

Yonik Seeley updated LUCENE-693:


Attachment: conjunction.patch

Here is my current patch and test code (which currently seems to be slower with 
this patch).

 ConjunctionScorer - more tuneup
 ---

 Key: LUCENE-693
 URL: http://issues.apache.org/jira/browse/LUCENE-693
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 2.1
 Environment: Windows Server 2003 x64, Java 1.6, pretty large index
Reporter: Peter Keegan
 Attachments: conjunction.patch, conjunction.patch





[jira] Commented: (LUCENE-693) ConjunctionScorer - more tuneup

2006-10-24 Thread Peter Keegan (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-693?page=comments#action_1208 ] 

Peter Keegan commented on LUCENE-693:
-

Well, I'm seeing a good 7% increase over the trunk version. Conjunction
scorer time is mostly in 'skipto' now, which seems reasonable.

Do the test cases try queries with non-existent terms? My failed query
contained 3 required terms, but one of the terms was misspelled and didn't
exist in the index.

Peter



 ConjunctionScorer - more tuneup
 ---

 Key: LUCENE-693
 URL: http://issues.apache.org/jira/browse/LUCENE-693
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 2.1
 Environment: Windows Server 2003 x64, Java 1.6, pretty large index
Reporter: Peter Keegan
 Attachments: conjunction.patch, conjunction.patch





[jira] Updated: (LUCENE-693) ConjunctionScorer - more tuneup

2006-10-24 Thread Yonik Seeley (JIRA)
 [ http://issues.apache.org/jira/browse/LUCENE-693?page=all ]

Yonik Seeley updated LUCENE-693:


Attachment: conjunction.patch

This version removes the docs[] array and seems to be slightly faster.
Still slower on the synthetic random ConstantScoreQuery tests though.

If anyone else has real-world benchmarks they can try, I'd appreciate the data.

 ConjunctionScorer - more tuneup
 ---

 Key: LUCENE-693
 URL: http://issues.apache.org/jira/browse/LUCENE-693
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 2.1
 Environment: Windows Server 2003 x64, Java 1.6, pretty large index
Reporter: Peter Keegan
 Attachments: conjunction.patch, conjunction.patch, conjunction.patch


 (See also: #LUCENE-443)
 I did some profile testing with the new ConjuctionScorer in 2.1 and 
 discovered a new bottleneck in ConjunctionScorer.sortScorers. The 
 java.utils.Arrays.sort method is cloning the Scorers array on every sort, 
 which is quite expensive on large indexes because of the size of the 'norms' 
 array within, and isn't necessary. 
 Here is one possible solution:
   private void sortScorers() {
 // squeeze the array down for the sort
 //if (length != scorers.length) {
 //  Scorer[] temps = new Scorer[length];
 //  System.arraycopy(scorers, 0, temps, 0, length);
 //  scorers = temps;
 //}
 insertionSort( scorers,length );
 // note that this comparator is not consistent with equals!
 //Arrays.sort(scorers, new Comparator() { // sort the array
 //public int compare(Object o1, Object o2) {
 //  return ((Scorer)o1).doc() - ((Scorer)o2).doc();
 //}
 //  });
   
 first = 0;
 last = length - 1;
   }
   private void insertionSort( Scorer[] scores, int len)
   {
   for (int i=0; ilen; i++) {
   for (int j=i; j0  scores[j-1].doc()  scores[j].doc();j-- ) {
   swap (scores, j, j-1);
   }
   }
   return;
   }
   private void swap(Object[] x, int a, int b) {
 Object t = x[a];
 x[a] = x[b];
 x[b] = t;
   }
  
 The squeezing of the array is no longer needed. 
 We also initialized the Scorers array to 8 (instead of 2) to avoid having to 
 grow the array for common queries, although this probably has less 
 performance impact.
 This change added about 3% to query throughput in my testing.
 Peter
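For experimenting outside Lucene, here is a minimal self-contained sketch of the same in-place insertion sort. DocCursor is a hypothetical stand-in for Scorer (only doc() matters here), and the array is given spare capacity to mirror the "initialize to 8" idea; only the first len entries are sorted, so no squeezing or copying is needed.

```java
import java.util.Arrays;

public class InsertionSortSketch {
    // Hypothetical stand-in for a Scorer: only the current doc id matters here.
    static final class DocCursor {
        private final int doc;
        DocCursor(int doc) { this.doc = doc; }
        int doc() { return doc; }
    }

    // Sorts cursors[0..len) by ascending doc() in place, without allocating a
    // temporary array. Entries at index len and beyond are never touched.
    static void insertionSort(DocCursor[] cursors, int len) {
        for (int i = 1; i < len; i++) {
            for (int j = i; j > 0 && cursors[j - 1].doc() > cursors[j].doc(); j--) {
                DocCursor t = cursors[j];
                cursors[j] = cursors[j - 1];
                cursors[j - 1] = t;
            }
        }
    }

    // Extracts the doc ids of the first len cursors, for printing and checking.
    static int[] docs(DocCursor[] cursors, int len) {
        int[] out = new int[len];
        for (int i = 0; i < len; i++) {
            out[i] = cursors[i].doc();
        }
        return out;
    }

    public static void main(String[] args) {
        // Capacity 8 with only 5 slots in use, like a pre-sized scorers array.
        DocCursor[] cursors = new DocCursor[8];
        int[] start = {42, 7, 19, 7, 3};
        for (int i = 0; i < start.length; i++) {
            cursors[i] = new DocCursor(start[i]);
        }
        insertionSort(cursors, start.length);
        System.out.println(Arrays.toString(docs(cursors, start.length)));
        // prints [3, 7, 7, 19, 42]
    }
}
```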

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Commented: (LUCENE-693) ConjunctionScorer - more tuneup

2006-10-24 Thread Peter Keegan (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-693?page=comments#action_1236 ] 

Peter Keegan commented on LUCENE-693:
-

fwiw, my tests were done using 'real world' queries and index. Most queries
have several required clauses. The jvm is 1.6 beta2 with -server. I would be
interested to see results from others, too.

thanks Yonik!

Peter







[jira] Commented: (LUCENE-686) Resources not always reclaimed in scorers after each search

2006-10-24 Thread Hoss Man (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-686?page=comments#action_1238 ] 

Hoss Man commented on LUCENE-686:
-

Quick summary of some discussion from the mailing list...

1) I replied to Paul's comments in the bug, indicating that while there may not 
be any leaks in the core code base, these changes were needed to allow people 
writing custom Directories or custom Scorers to avoid memory leaks.
2) Paul suggested that people writing custom code can work around this by 
subclassing/customizing the Directory, all the Scorers, and the IndexSearcher.
3) I suggested that this made the barrier for new custom code rather high, and made 
a poor comparison that got us on a tangent.
4) I argued that since TermDocs had a close method, Scorers needed to call it, 
which meant they needed a close method that was guaranteed to be called.
5) Paul argued that TermDocs.close in the core right now isn't needed, and we 
might be better off removing it and requiring any more complicated custom 
implementations to rely on GC to clean up any resources they have (presumably 
using a finalize method).
6) steven_parkes then raised the point that the fundamental issue is design 
integrity ... we have to agree on what the point of TermDocs.close is from an API 
standpoint, and callers should not have to know the concrete 
implementation of the callee to know whether close needs to be called.
Better documentation on the purpose of the method can lead to better discussion 
about whether it can be removed, or whether the current behavior is a bug that 
needs fixing.
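Point 4 amounts to a try/finally guarantee at the searcher level. A minimal sketch, assuming hypothetical PostingsResource and ClosableScorer classes (not Lucene's TermDocs or Scorer), of a search loop that always calls close(), even when collection aborts:

```java
import java.util.ArrayList;
import java.util.List;

public class ScorerCloseSketch {
    // Hypothetical stand-in for a TermDocs-like resource that must be released.
    static final class PostingsResource {
        private boolean closed = false;
        void close() { closed = true; }
        boolean isClosed() { return closed; }
    }

    // Hypothetical scorer that owns such a resource and exposes close().
    static final class ClosableScorer {
        final PostingsResource postings = new PostingsResource();
        private final int[] docs;
        private int pos = -1;

        ClosableScorer(int... docs) { this.docs = docs; }

        boolean next() { return ++pos < docs.length; }
        int doc() { return docs[pos]; }
        void close() { postings.close(); }
    }

    // The searcher, not its caller, guarantees close() runs, even when
    // collection stops early with an exception.
    static List<Integer> search(ClosableScorer scorer, int maxHits) {
        List<Integer> hits = new ArrayList<>();
        try {
            while (scorer.next()) {
                if (hits.size() == maxHits) {
                    throw new IllegalStateException("collector aborted");
                }
                hits.add(scorer.doc());
            }
            return hits;
        } finally {
            scorer.close();
        }
    }

    public static void main(String[] args) {
        ClosableScorer s = new ClosableScorer(2, 5, 9);
        System.out.println(search(s, 10) + " closed=" + s.postings.isClosed());
        // prints [2, 5, 9] closed=true

        ClosableScorer t = new ClosableScorer(2, 5, 9);
        try {
            search(t, 1);
        } catch (IllegalStateException e) {
            System.out.println("aborted, closed=" + t.postings.isClosed());
            // prints aborted, closed=true
        }
    }
}
```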

 Resources not always reclaimed in scorers after each search
 ---

 Key: LUCENE-686
 URL: http://issues.apache.org/jira/browse/LUCENE-686
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
 Environment: All
Reporter: Ning Li
 Attachments: ScorerResourceGC.patch


 Resources are not always reclaimed in scorers after each search.
 For example, close() is not always called for term docs in TermScorer.
 A test will be attached to show when resources are not reclaimed.




Re: Scorer.skipTo() valid before next()?

2006-10-24 Thread Chris Hostetter
: I got a bit of a surprise trying to re-implement the ConjunctionScorer.
: It turns out that skipTo(0) does not always return the same thing as
: next() on a newly created scorer.  Some scorers give invalid results
: if skipTo() is called before next().

that sounds like a bug to me...

: The javadoc is unclear on the subject, but the javadoc for both
: score() and skipTo() suggest that calling skipTo() first is valid, and
: that seems to make more sense.

I don't see why you would say the javadoc is unclear; the javadoc for
skipTo seems very clear on the subject.  skipTo(0) should be functionally
equivalent to...

do {
  if (!next())
    return false;
} while (0 > doc());
return true;
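The equivalence above can be checked with a toy iterator. This is a minimal sketch, assuming a hypothetical ArrayDocIterator over a sorted int array rather than any real Lucene scorer; its skipTo() is written with exactly the loop above, so calling it first behaves like next().

```java
public class SkipToSketch {
    // Hypothetical iterator over a sorted array of doc ids; not Lucene's Scorer.
    static final class ArrayDocIterator {
        private final int[] docs;
        private int pos = -1;   // positioned before the first document

        ArrayDocIterator(int... docs) { this.docs = docs; }

        boolean next() {
            if (pos < docs.length) pos++;   // stop advancing once exhausted
            return pos < docs.length;
        }

        // Only valid after next() or skipTo() has returned true.
        int doc() { return docs[pos]; }

        // skipTo written in terms of next(), following the loop above:
        // advance at least once, then keep going while doc() < target.
        boolean skipTo(int target) {
            do {
                if (!next()) return false;
            } while (target > doc());
            return true;
        }
    }

    public static void main(String[] args) {
        ArrayDocIterator a = new ArrayDocIterator(2, 5, 9);
        ArrayDocIterator b = new ArrayDocIterator(2, 5, 9);
        // On a fresh iterator, skipTo(0) and next() land on the same document.
        System.out.println(a.skipTo(0) + " " + a.doc()); // prints true 2
        System.out.println(b.next() + " " + b.doc());    // prints true 2
        System.out.println(a.skipTo(6) + " " + a.doc()); // prints true 9
        System.out.println(a.skipTo(10));                // prints false
    }
}
```

Note that the do/while always advances at least once, so in this sketch skipTo(target) on an iterator already positioned at target moves past it.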



-Hoss





[jira] Commented: (LUCENE-693) ConjunctionScorer - more tuneup

2006-10-24 Thread Paul Elschot (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-693?page=comments#action_1287 ] 

Paul Elschot commented on LUCENE-693:
-

Yonik,

you wrote: 
 but then learned that calling skipTo() before calling next() doesn't always 
 work.

Could you describe a case in which skipTo() before next() does not work?

skipTo() as first call on a scorer should work. ReqExclScorer and 
ReqOptSumScorer depend on that for the excluded and optional scorers.

Regards,
Paul Elschot






[jira] Created: (LUCENE-696) Scorer.skipTo() doesn't always work if called before next()

2006-10-24 Thread Yonik Seeley (JIRA)
Scorer.skipTo() doesn't always work if called before next()
---

 Key: LUCENE-696
 URL: http://issues.apache.org/jira/browse/LUCENE-696
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Yonik Seeley


skipTo() doesn't work for all scorers if called before next().




[jira] Commented: (LUCENE-693) ConjunctionScorer - more tuneup

2006-10-24 Thread Yonik Seeley (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-693?page=comments#action_1296 ] 

Yonik Seeley commented on LUCENE-693:
-

 Could you describe a case in which skipTo() before next() does not work?

I don't recall, but my attempt to speed up ConjunctionScorer flushed them out.
I'll move back to an older version of that to see what failed and put
details here: http://issues.apache.org/jira/browse/LUCENE-696





[jira] Commented: (LUCENE-696) Scorer.skipTo() doesn't always work if called before next()

2006-10-24 Thread Yonik Seeley (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-696?page=comments#action_12444500 ] 

Yonik Seeley commented on LUCENE-696:
-

It would also simplify some scorers if doc() returned -1, instead of being 
undefined, before next() or skipTo() is first called.
Because doc() is currently undefined at that point, callers often have to keep 
extra state about each scorer.
Scorers like TermScorer would just need to change "int doc" to "int doc=-1".

Is there any scorer that this would impose a burden or cost on?
Thoughts?
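A minimal sketch of what the -1 default would buy, assuming a hypothetical PostingsIterator (not Lucene's TermScorer) and a hypothetical atLeast() helper of the kind a conjunction might use: because an unstarted iterator reports -1, the same doc() >= target test handles the fresh case without a separate "started" flag.

```java
public class SentinelDocSketch {
    // Hypothetical TermScorer-like iterator using the proposed "int doc = -1".
    static final class PostingsIterator {
        private final int[] docs;
        private int pos = -1;
        private int doc = -1;   // -1 until next() or skipTo() is first called

        PostingsIterator(int... docs) { this.docs = docs; }

        int doc() { return doc; }

        boolean next() {
            if (pos + 1 >= docs.length) return false;
            doc = docs[++pos];
            return true;
        }

        boolean skipTo(int target) {
            do {
                if (!next()) return false;
            } while (doc < target);
            return true;
        }
    }

    // Hypothetical conjunction-style helper: make sure 'it' is on a doc >= target.
    // An unstarted iterator reports -1, so no "started" flag is needed.
    // Callers are assumed to stop using the iterator once false is returned.
    static boolean atLeast(PostingsIterator it, int target) {
        return it.doc() >= target || it.skipTo(target);
    }

    public static void main(String[] args) {
        PostingsIterator it = new PostingsIterator(3, 8, 12);
        System.out.println(it.doc());                        // prints -1
        System.out.println(atLeast(it, 0) + " " + it.doc()); // prints true 3
        System.out.println(atLeast(it, 3) + " " + it.doc()); // prints true 3 (no move)
        System.out.println(atLeast(it, 9) + " " + it.doc()); // prints true 12
    }
}
```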

 Scorer.skipTo() doesn't always work if called before next()
 ---

 Key: LUCENE-696
 URL: http://issues.apache.org/jira/browse/LUCENE-696
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Yonik Seeley

 skipTo() doesn't work for all scorers if called before next().




[jira] Commented: (LUCENE-696) Scorer.skipTo() doesn't always work if called before next()

2006-10-24 Thread Paul Elschot (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-696?page=comments#action_12444506 ] 

Paul Elschot commented on LUCENE-696:
-

Repeating a comment just posted at LUCENE-693:

skipTo() as first call on a scorer should work. ReqExclScorer and 
ReqOptSumScorer depend on that for the excluded and optional scorers.






Re: [jira] Commented: (LUCENE-696) Scorer.skipTo() doesn't always work if called before next()

2006-10-24 Thread Chris Hostetter

: It would also simplify some scorers if doc() wasn't undefined before
: next() or skipTo() was called, but instead -1.

+1 ... but if we are going to change the API requirements for doc(),
we should clarify the requirements of score() ... with doc(), negative
numbers can easily be used as a marker of invalid, but the same rule
isn't as easy to apply with the score() method ... perhaps the
documentation for doc() and score() should be...

doc():   Returns the current document number matching the query.
         Returns -1 if neither next() nor skipTo() has been called at
         least once; behavior is undefined if the last call to next()
         or skipTo() returned false.
score(): Returns the score of the current document matching the query.
         The value is undefined if doc() returns -1, or if the last
         call to next() or skipTo() returned false.


...we probably want to make the same API changes to Spans, TermEnum,
and TermDocs as well to be consistent.

-Hoss





[jira] Updated: (LUCENE-528) Optimization for IndexWriter.addIndexes()

2006-10-24 Thread Ning Li (JIRA)
 [ http://issues.apache.org/jira/browse/LUCENE-528?page=all ]

Ning Li updated LUCENE-528:
---

Attachment: AddIndexesNoOptimize.patch

This patch implements addIndexesNoOptimize() following the algorithm described
earlier.
  - The patch is based on the latest version from trunk.
  - addIndexesNoOptimize() is implemented. The algorithm description is
    included as a comment, and the code is commented.
  - The patch includes a test called TestAddIndexesNoOptimize which covers all
    the code in addIndexesNoOptimize().
  - maybeMergeSegments() was conservative and checked for more merges only when
    upperBound * mergeFactor <= maxMergeDocs. Change it to check for more merges
    when upperBound < maxMergeDocs.
  - Minor changes in TestIndexWriterMergePolicy to better verify merge
    invariants.
  - The patch passes all unit tests.

One more comment on the implementation:
  - When we copy un-merged segments from S in step 4, ideally we would simply
    copy those segments. However, Directory does not support copying yet. In
    addition, the source may or may not use compound files, and so may the
    target. So we use mergeSegments() to copy each segment, which may cause
    the doc count to change because deleted docs are garbage collected. That
    case is handled properly.
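To make the effect of the changed condition concrete, here is a toy illustration, assuming the merge check visits upper bounds start, start*mergeFactor, start*mergeFactor^2, ... and that the two quoted conditions act as the loop guards. The class and method names and the sample numbers are hypothetical; this is not the code in AddIndexesNoOptimize.patch.

```java
import java.util.ArrayList;
import java.util.List;

public class MergeLevelsSketch {
    // Levels visited under the old guard: upperBound * mergeFactor <= maxMergeDocs.
    static List<Long> oldLevels(long start, long mergeFactor, long maxMergeDocs) {
        List<Long> levels = new ArrayList<>();
        for (long ub = start; ub * mergeFactor <= maxMergeDocs; ub *= mergeFactor) {
            levels.add(ub);
        }
        return levels;
    }

    // Levels visited under the new guard: upperBound < maxMergeDocs.
    static List<Long> newLevels(long start, long mergeFactor, long maxMergeDocs) {
        List<Long> levels = new ArrayList<>();
        for (long ub = start; ub < maxMergeDocs; ub *= mergeFactor) {
            levels.add(ub);
        }
        return levels;
    }

    public static void main(String[] args) {
        // With maxMergeDocs = 500, the old guard never checks the 100-doc level.
        System.out.println(oldLevels(10, 10, 500)); // prints [10]
        System.out.println(newLevels(10, 10, 500)); // prints [10, 100]
    }
}
```

When maxMergeDocs is exactly start times a power of mergeFactor, both guards visit the same levels; otherwise the old guard skips the highest level still below maxMergeDocs.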

 Optimization for IndexWriter.addIndexes()
 -

 Key: LUCENE-528
 URL: http://issues.apache.org/jira/browse/LUCENE-528
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Steven Tamm
 Assigned To: Otis Gospodnetic
Priority: Minor
 Attachments: AddIndexes.patch, AddIndexesNoOptimize.patch


 One big performance problem with IndexWriter.addIndexes() is that it has to 
 optimize the index both before and after adding the segments.  When you have 
 a very large index, to which you are adding batches of small updates, these 
 calls to optimize make using addIndexes() impossible.  It makes parallel 
 updates very frustrating.
 Here is an optimized function that helps out by calling mergeSegments only on 
 the newly added documents.  It will try to avoid calling mergeSegments until 
 the end, unless you're adding a lot of documents at once.
 I also have an extensive unit test that verifies that this function works 
 correctly if people are interested.  I gave it a different name because it 
 has very different performance characteristics which can make querying take 
 longer.




[jira] Updated: (LUCENE-696) Scorer.skipTo() doesn't always work if called before next()

2006-10-24 Thread Yonik Seeley (JIRA)
 [ http://issues.apache.org/jira/browse/LUCENE-696?page=all ]

Yonik Seeley updated LUCENE-696:


Attachment: dismax.patch

DisjunctionMaxScorer turned out to be the only scorer I could see with that 
problem.
Here's the patch w/ tests.

 Scorer.skipTo() doesn't always work if called before next()
 ---

 Key: LUCENE-696
 URL: http://issues.apache.org/jira/browse/LUCENE-696
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Yonik Seeley
 Attachments: dismax.patch


 skipTo() doesn't work for all scorers if called before next().




[jira] Created: (LUCENE-697) Scorer.skipTo affects sloppyPhrase scoring

2006-10-24 Thread Yonik Seeley (JIRA)
Scorer.skipTo affects sloppyPhrase scoring
--

 Key: LUCENE-697
 URL: http://issues.apache.org/jira/browse/LUCENE-697
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 2.0.0
Reporter: Yonik Seeley


If you mix skipTo() and next(), you get different scores than what is returned 
to a hit collector.




[jira] Commented: (LUCENE-697) Scorer.skipTo affects sloppyPhrase scoring

2006-10-24 Thread Yonik Seeley (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-697?page=comments#action_12444565 ] 

Yonik Seeley commented on LUCENE-697:
-

Here's the ant output from test code to be checked in shortly.
The test code calls skipTo(), skipTo(), next(), next(), etc 
while checking that the results match the hitcollector version.

[junit] Testcase: testP6(org.apache.lucene.search.TestSimpleExplanations): Caused an ERROR
[junit] ERROR matching docs:
[junit] scorer.more=true doc=1 score=0.7849069
[junit] hitCollector.doc=1 score=0.67974937
[junit]  Scorer=scorer(weight(field:w3 w2~2))
[junit]  Query=field:w3 w2~2
[junit]  [EMAIL PROTECTED]
[junit] java.lang.RuntimeException: ERROR matching docs:
[junit] scorer.more=true doc=1 score=0.7849069
[junit] hitCollector.doc=1 score=0.67974937
[junit]  Scorer=scorer(weight(field:w3 w2~2))
[junit]  Query=field:w3 w2~2
[junit]  [EMAIL PROTECTED]
[junit] at org.apache.lucene.search.QueryUtils$2.collect(QueryUtils.java:104)
[junit] at org.apache.lucene.search.Scorer.score(Scorer.java:48)
[junit] at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:132)
[junit] at org.apache.lucene.search.Searcher.search(Searcher.java:116)
[junit] at org.apache.lucene.search.Searcher.search(Searcher.java:95)
[junit] at org.apache.lucene.search.QueryUtils.checkSkipTo(QueryUtils.java:97)
[junit] at org.apache.lucene.search.QueryUtils.check(QueryUtils.java:75)
[junit] at org.apache.lucene.search.CheckHits.checkHitCollector(CheckHits.java:91)
[junit] at org.apache.lucene.search.TestExplanations.qtest(TestExplanations.java:90)
[junit] at org.apache.lucene.search.TestExplanations.qtest(TestExplanations.java:86)
[junit] at org.apache.lucene.search.TestSimpleExplanations.testP6(TestSimpleExplanations.java:87)


[junit] Testcase: testP7(org.apache.lucene.search.TestSimpleExplanations): Caused an ERROR
[junit] ERROR matching docs:
[junit] scorer.more=true doc=1 score=0.7849069
[junit] hitCollector.doc=1 score=0.67974937
[junit]  Scorer=scorer(weight(field:w3 w2~3))
[junit]  Query=field:w3 w2~3
[junit]  [EMAIL PROTECTED]
[junit] java.lang.RuntimeException: ERROR matching docs:
[junit] scorer.more=true doc=1 score=0.7849069
[junit] hitCollector.doc=1 score=0.67974937
[junit]  Scorer=scorer(weight(field:w3 w2~3))
[junit]  Query=field:w3 w2~3
[junit]  [EMAIL PROTECTED]
[junit] at org.apache.lucene.search.QueryUtils$2.collect(QueryUtils.java:104)
[junit] at org.apache.lucene.search.Scorer.score(Scorer.java:48)
[junit] at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:132)
[junit] at org.apache.lucene.search.Searcher.search(Searcher.java:116)
[junit] at org.apache.lucene.search.Searcher.search(Searcher.java:95)
[junit] at org.apache.lucene.search.QueryUtils.checkSkipTo(QueryUtils.java:97)
[junit] at org.apache.lucene.search.QueryUtils.check(QueryUtils.java:75)
[junit] at org.apache.lucene.search.CheckHits.checkHitCollector(CheckHits.java:91)
[junit] at org.apache.lucene.search.TestExplanations.qtest(TestExplanations.java:90)
[junit] at org.apache.lucene.search.TestExplanations.qtest(TestExplanations.java:86)
[junit] at org.apache.lucene.search.TestSimpleExplanations.testP7(TestSimpleExplanations.java:90)


[junit] Test org.apache.lucene.search.TestSimpleExplanations FAILED
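The check described above can be sketched independently of Lucene. This is a minimal sketch, assuming a hypothetical DocScorer interface and toy scorers (it is not QueryUtils.checkSkipTo itself): a reference pass uses next() only, as a hit collector would, and a fresh scorer is then driven with skipTo(), skipTo(), next(), next(), ... and compared doc by doc and score by score. DriftingScorer is deliberately broken to show the kind of score mismatch reported above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class SkipToConsistencySketch {
    // Hypothetical minimal scorer contract; this is not Lucene's Scorer class.
    interface DocScorer {
        boolean next();
        boolean skipTo(int target);
        int doc();
        float score();
    }

    // Array-backed scorer whose score depends only on the current document.
    static class ArrayScorer implements DocScorer {
        private final int[] docs;
        private final float[] scores;
        private int pos = -1;

        ArrayScorer(int[] docs, float[] scores) { this.docs = docs; this.scores = scores; }

        public boolean next() {
            if (pos < docs.length) pos++;
            return pos < docs.length;
        }

        public boolean skipTo(int target) {
            do {
                if (!next()) return false;
            } while (doc() < target);
            return true;
        }

        public int doc() { return docs[pos]; }
        public float score() { return scores[pos]; }
    }

    // Deliberately buggy: once skipTo() has been used, scores drift, loosely
    // imitating per-document state that skipTo() fails to reset.
    static class DriftingScorer extends ArrayScorer {
        private boolean skipped = false;

        DriftingScorer(int[] docs, float[] scores) { super(docs, scores); }

        @Override public boolean skipTo(int target) {
            skipped = true;
            return super.skipTo(target);
        }

        @Override public float score() {
            return skipped ? super.score() * 1.25f : super.score();
        }
    }

    // Returns null when consistent, otherwise a description of the first mismatch.
    static String check(Supplier<DocScorer> factory) {
        // Reference pass: next() only, i.e. what a hit collector would be given.
        DocScorer ref = factory.get();
        List<Integer> docs = new ArrayList<>();
        List<Float> scores = new ArrayList<>();
        while (ref.next()) {
            docs.add(ref.doc());
            scores.add(ref.score());
        }

        // Checked pass on a fresh scorer: skipTo(), skipTo(), next(), next(), ...
        DocScorer s = factory.get();
        for (int i = 0; i < docs.size(); i++) {
            int expected = docs.get(i);
            boolean more = (i / 2) % 2 == 0 ? s.skipTo(expected) : s.next();
            if (!more) return "scorer exhausted before doc " + expected;
            if (s.doc() != expected) return "expected doc " + expected + ", got " + s.doc();
            if (Math.abs(s.score() - scores.get(i)) > 1e-6f) {
                return "doc " + expected + ": score " + s.score() + " vs collector " + scores.get(i);
            }
        }
        return s.next() ? "unexpected extra doc " + s.doc() : null;
    }

    public static void main(String[] args) {
        int[] docs = {1, 4, 6};
        float[] scores = {0.5f, 0.8f, 0.3f};
        System.out.println(check(() -> new ArrayScorer(docs, scores)));    // prints null
        System.out.println(check(() -> new DriftingScorer(docs, scores))); // reports doc 1
    }
}
```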

 Scorer.skipTo affects sloppyPhrase scoring
 --

 Key: LUCENE-697
 URL: http://issues.apache.org/jira/browse/LUCENE-697
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 2.0.0
Reporter: Yonik Seeley

 If you mix skipTo() and next(), you get different scores than what is 
 returned to a hit collector.




[jira] Created: (LUCENE-698) FilteredQuery ignores boost

2006-10-24 Thread Yonik Seeley (JIRA)
FilteredQuery ignores boost
---

 Key: LUCENE-698
 URL: http://issues.apache.org/jira/browse/LUCENE-698
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Yonik Seeley


Filtered query ignores its own boost.




[jira] Commented: (LUCENE-698) FilteredQuery ignores boost

2006-10-24 Thread Yonik Seeley (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-698?page=comments#action_12444570 ] 

Yonik Seeley commented on LUCENE-698:
-

I just committed hashCode() and equals() changes to take boost into account so 
that generic tests in QueryUtils.check(query) can pass.

One should arguably be able to set the boost on any query clause, so I'm 
leaving this open since I think scoring probably ignores the boost too.
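A minimal sketch of boost-aware equals() and hashCode(), assuming a hypothetical FilteredQuerySketch class that describes its query and filter as strings (this is not Lucene's FilteredQuery source). Comparing boosts through Float.floatToIntBits keeps equals() consistent with hashCode(). It covers only the equality part; whether scoring honors the boost is the open question above.

```java
import java.util.Objects;

public class BoostAwareEqualitySketch {
    // Hypothetical stand-in for FilteredQuery: an inner query and a filter,
    // described here as strings, plus the query's own boost.
    static final class FilteredQuerySketch {
        final String query;
        final String filter;
        float boost = 1.0f;

        FilteredQuerySketch(String query, String filter) {
            this.query = query;
            this.filter = filter;
        }

        @Override public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof FilteredQuerySketch)) return false;
            FilteredQuerySketch other = (FilteredQuerySketch) o;
            // Compare boosts by bit pattern so equals() agrees with hashCode().
            return query.equals(other.query)
                    && filter.equals(other.filter)
                    && Float.floatToIntBits(boost) == Float.floatToIntBits(other.boost);
        }

        @Override public int hashCode() {
            return Objects.hash(query, filter) ^ Float.floatToIntBits(boost);
        }
    }

    public static void main(String[] args) {
        FilteredQuerySketch a = new FilteredQuerySketch("field:w1", "range[0,10]");
        FilteredQuerySketch b = new FilteredQuerySketch("field:w1", "range[0,10]");
        System.out.println(a.equals(b) && a.hashCode() == b.hashCode()); // prints true
        b.boost = 2.0f;
        System.out.println(a.equals(b));                                 // prints false
    }
}
```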

 FilteredQuery ignores boost
 ---

 Key: LUCENE-698
 URL: http://issues.apache.org/jira/browse/LUCENE-698
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Yonik Seeley

 Filtered query ignores its own boost.




[jira] Resolved: (LUCENE-696) Scorer.skipTo() doesn't always work if called before next()

2006-10-24 Thread Yonik Seeley (JIRA)
 [ http://issues.apache.org/jira/browse/LUCENE-696?page=all ]

Yonik Seeley resolved LUCENE-696.
-

Fix Version/s: 2.0.1
   Resolution: Fixed
 Assignee: Yonik Seeley

Patch committed after further tests were added.

 Scorer.skipTo() doesn't always work if called before next()
 ---

 Key: LUCENE-696
 URL: http://issues.apache.org/jira/browse/LUCENE-696
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Yonik Seeley
 Assigned To: Yonik Seeley
 Fix For: 2.0.1

 Attachments: dismax.patch


 skipTo() doesn't work for all scorers if called before next().




[jira] Commented: (LUCENE-697) Scorer.skipTo affects sloppyPhrase scoring

2006-10-24 Thread Yonik Seeley (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-697?page=comments#action_12444573 ] 

Yonik Seeley commented on LUCENE-697:
-

Comment out line 104 of QueryUtils.java to reproduce this problem:

  scoreDiff=0; // TODO: remove this go get LUCENE-697 failures 


