[ https://issues.apache.org/jira/browse/LUCENE-693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yonik Seeley updated LUCENE-693: -------------------------------- Attachment: conjunction.patch Whew... I'd forgotten about this issue. I brushed up one of the last versions I had lying around from a year ago (see lastest conjunction.patch), fixed up my synthetic tests a bit, and got some decent results: 1% faster in top level term conjunctions (wheee) 49% faster in a conjunction of nested term conjunctions (no sort per call to skipTo) 5% faster in a top level ConstantScoreQuery conjunction 144% faster in a conjunction of nested ConstantScoreQuery conjunctions A sort is done the first time, and the scorers are ordered so that the highest will skip first (the idea being that there may be a little info in the first skip about which scorer is most sparse). Michael Busch recently brought up a related idea... that one could skip on low df terms first... but that would of course require some terms in the conjunction. > ConjunctionScorer - more tuneup > ------------------------------- > > Key: LUCENE-693 > URL: https://issues.apache.org/jira/browse/LUCENE-693 > Project: Lucene - Java > Issue Type: Bug > Components: Search > Affects Versions: 2.1 > Environment: Windows Server 2003 x64, Java 1.6, pretty large index > Reporter: Peter Keegan > Attachments: conjunction.patch, conjunction.patch, conjunction.patch, > conjunction.patch, conjunction.patch.nosort1 > > > (See also: #LUCENE-443) > I did some profile testing with the new ConjuctionScorer in 2.1 and > discovered a new bottleneck in ConjunctionScorer.sortScorers. The > java.utils.Arrays.sort method is cloning the Scorers array on every sort, > which is quite expensive on large indexes because of the size of the 'norms' > array within, and isn't necessary. > Here is one possible solution: > private void sortScorers() { > // squeeze the array down for the sort > // if (length != scorers.length) { > // Scorer[] temps = new Scorer[length]; > // System.arraycopy(scorers, 0, temps, 0, length); > // scorers = temps; > // } > insertionSort( scorers,length ); > // note that this comparator is not consistent with equals! > // Arrays.sort(scorers, new Comparator() { // sort the array > // public int compare(Object o1, Object o2) { > // return ((Scorer)o1).doc() - ((Scorer)o2).doc(); > // } > // }); > > first = 0; > last = length - 1; > } > private void insertionSort( Scorer[] scores, int len) > { > for (int i=0; i<len; i++) { > for (int j=i; j>0 && scores[j-1].doc() > scores[j].doc();j-- ) { > swap (scores, j, j-1); > } > } > return; > } > private void swap(Object[] x, int a, int b) { > Object t = x[a]; > x[a] = x[b]; > x[b] = t; > } > > The squeezing of the array is no longer needed. > We also initialized the Scorers array to 8 (instead of 2) to avoid having to > grow the array for common queries, although this probably has less > performance impact. > This change added about 3% to query throughput in my testing. > Peter -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]