I did some profile testing with the new ConjuctionScorer in 2.1 and discovered a new bottleneck in ConjunctionScorer.sortScorers. The java.utils.Arrays.sort method is cloning the Scorers array on every sort, which is quite expensive on large indexes because of the size of the 'norms' array within, and isn't necessary. We rewrote this to use simple insertion sorting as follows:
private void sortScorers() { // squeeze the array down for the sort // if (length != scorers.length) { // Scorer[] temps = new Scorer[length]; // System.arraycopy(scorers, 0, temps, 0, length); // scorers = temps; // } insertionSort( scorers,length ); // note that this comparator is not consistent with equals! // Arrays.sort(scorers, new Comparator() { // sort the array // public int compare(Object o1, Object o2) { // return ((Scorer)o1).doc() - ((Scorer)o2).doc(); // } // }); first = 0; last = length - 1; } private void insertionSort( Scorer[] scores, int len) { for (int i=0; i<len; i++) { for (int j=i; j>0 && scores[j-1].doc() > scores[j].doc();j-- ) { swap (scores, j, j-1); } } return; } private void swap(Object[] x, int a, int b) { Object t = x[a]; x[a] = x[b]; x[b] = t; } The squeezing of the array is no longer needed. We also initialized the Scorers array to 8 (instead of 2) to avoid having to grow the array for common queries, although this probably has less performance impact. This change added about 3% to query throughput. Peter On 10/17/06, Yonik Seeley (JIRA) <[EMAIL PROTECTED]> wrote:
[ http://issues.apache.org/jira/browse/LUCENE-443?page=all ] Yonik Seeley resolved LUCENE-443. --------------------------------- Fix Version/s: 2.1 Resolution: Fixed Assignee: Yonik Seeley Thanks! I just committed this. > ConjunctionScorer tune-up > ------------------------- > > Key: LUCENE-443 > URL: http://issues.apache.org/jira/browse/LUCENE-443 > Project: Lucene - Java > Issue Type: Bug > Components: Search > Affects Versions: 1.9 > Environment: Linux, Java 1.5, Large Index with 4 million items and some heavily nested boolean queries > Reporter: Abdul Chaudhry > Assigned To: Yonik Seeley > Fix For: 2.1 > > Attachments: Conjunction20060921.patch, ConjunctionScorer.java, ConjunctionScorer.java > > > I just recently ran a load test on the latest code from lucene , which is using a new BooleanScore and noticed the ConjunctionScorer was crunching through objects , especially while sorting as part of the skipTo call. It turns a linked list into an array, sorts the array, then converts the array back to a linked list for further processing by the scoring engines below. > 'm not sure if anyone else is experiencing this as I have a very large index (> 4 million items) and I am issuing some heavily nested queries > Anyway, I decide to change the link list into an array and use a first and last marker to "simulate" a linked list. > This scaled much better during my load test as the java gargbage collector was less - umm - virulent -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]