[ https://issues.apache.org/jira/browse/LUCENE-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shai Erera updated LUCENE-1593:
-------------------------------

    Attachment: PerfTest.java
                LUCENE-1593.patch

The patch implements everything that has been suggested except:

* Pre-populating the queue in TopFieldCollector. As was noted here previously, this seems to remove the 'if (queueFull)' check but add another 'if' in FieldComparator, which may be executed several times per collect().
* Moving initCountingSumScorer() to BS2's ctor and add(). If more than one Scorer is added we create a DisjunctionSumScorer, which initializes its queue by calling next() on the passed-in Scorer. Therefore, if we called initCountingSumScorer() for every Scorer added, we would advance that Scorer as well as all the previous ones. I chose to discard that optimization, which only affects next() and skipTo().

The patch also includes the fix for TestSort in the 2.4 back_compat branch. I only fixed TestSort, not MultiSearcher and ParallelMultiSearcher.

All tests pass. I also ran some performance measurements (all on SRV 2003):

|| JRE || sort || best time (trunk) || best time (patch) || diff (%) ||
| SUN 1.6 | int | 1017.59 | 1015.96 | {color:green}~1%{color} |
| SUN 1.6 | doc | 767.49 | 763.20 | {color:green}~1%{color} |
| IBM 1.5 | int | 1018.77 | 1017.39 | {color:green}~1%{color} |
| IBM 1.5 | doc | 768.10 | 764.14 | {color:green}~1%{color} |

As you can see, there is a slight performance improvement, but nothing too dramatic.

You are welcome to review the patch as well as run the PerfTest I attached. It accepts two arguments: <indexDir> and [sort]. 'sort' is optional; if it is not given, the test sorts by doc. Otherwise, whatever you pass there, it sorts by int.
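To illustrate why calling initCountingSumScorer() on every add() was dropped, here is a minimal sketch in plain Java. ToyScorer, ToyDisjunction and EagerInitDemo are hypothetical stand-ins written for this note, not Lucene classes; the only behaviour taken from the description above is that the disjunction calls next() on each sub-scorer when it is constructed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a Scorer: iterates doc ids and counts next() calls. */
final class ToyScorer {
    private final int[] docs;
    private int pos = -1;
    int nextCalls = 0;

    ToyScorer(int... docs) { this.docs = docs; }

    /** Advances to the next doc; returns false once the doc ids are exhausted. */
    boolean next() {
        nextCalls++;
        return ++pos < docs.length;
    }
}

/** Hypothetical disjunction that primes itself by calling next() on every sub-scorer. */
final class ToyDisjunction {
    ToyDisjunction(List<ToyScorer> subScorers) {
        for (ToyScorer s : subScorers) {
            s.next(); // assumption: mirrors the queue initialization described above
        }
    }
}

final class EagerInitDemo {
    /** Rebuilds the disjunction after every add(), as the discarded optimization would. */
    static int[] eager(ToyScorer... scorers) {
        List<ToyScorer> added = new ArrayList<ToyScorer>();
        for (ToyScorer s : scorers) {
            added.add(s);
            if (added.size() > 1) {
                new ToyDisjunction(added); // advances s and all previously added scorers
            }
        }
        return counts(scorers);
    }

    /** Builds the disjunction once, after all scorers have been added. */
    static int[] lazy(ToyScorer... scorers) {
        List<ToyScorer> added = new ArrayList<ToyScorer>();
        for (ToyScorer s : scorers) {
            added.add(s);
        }
        new ToyDisjunction(added);
        return counts(scorers);
    }

    private static int[] counts(ToyScorer... scorers) {
        int[] out = new int[scorers.length];
        for (int i = 0; i < scorers.length; i++) {
            out[i] = scorers[i].nextCalls;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] e = eager(new ToyScorer(1, 2), new ToyScorer(3), new ToyScorer(4, 5));
        int[] l = lazy(new ToyScorer(1, 2), new ToyScorer(3), new ToyScorer(4, 5));
        System.out.println("eager next() calls: " + java.util.Arrays.toString(e));
        System.out.println("lazy  next() calls: " + java.util.Arrays.toString(l));
    }
}
```

With three scorers, the eager variant has already advanced the first two scorers twice before any search starts, while the lazy variant advances each scorer exactly once.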
> Optimizations to TopScoreDocCollector and TopFieldCollector
> -----------------------------------------------------------
>
>                 Key: LUCENE-1593
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1593
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Shai Erera
>             Fix For: 2.9
>
>         Attachments: LUCENE-1593.patch, PerfTest.java
>
>
> This is a spin-off of LUCENE-1575 and proposes to optimize TSDC and TFC code to remove unnecessary checks. The plan is:
> # Ensure that IndexSearcher returns segments in increasing doc Id order, instead of numDocs().
> # Change TSDC and TFC's code to not use the doc id as a tie breaker. New docs will always have larger ids and therefore cannot compete.
> # Pre-populate HitQueue with sentinel values in TSDC (score = Float.NEG_INF) and remove the check if reusableSD == null.
> # Also move to use "changing top" and then call adjustTop(), in case we update the queue.
> # Some methods in Sort explicitly add SortField.FIELD_DOC as a "tie breaker" for the last SortField. But doing so should not be necessary (since we already break ties by docID), and is in fact less efficient (once the above optimization is in).
> # Investigate PQ - can we deprecate insert() and have only insertWithOverflow()? Add an addDummyObjects method which will populate the queue without "arranging" it, just store the objects in the array (this can be used to pre-populate sentinel values)?
> I will post a patch as well as some perf measurements as soon as I have them.
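The sentinel and "change top, then adjustTop()" items in the plan can be sketched with a small stand-alone heap. This is only an illustration under stated assumptions: SentinelTopN is a hypothetical class written for this note, it tracks scores only (no doc ids), and its siftDown() merely plays the role that adjustTop() plays in the plan; it is not the code in the patch.

```java
import java.util.Arrays;

/** Hypothetical top-N score collector; not Lucene's TopScoreDocCollector. */
final class SentinelTopN {
    // Min-heap stored in an array: heap[0] is always the weakest score currently kept.
    private final float[] heap;

    SentinelTopN(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        heap = new float[n];
        // All slots hold the same sentinel, so the array is a valid heap
        // without any "arranging" step.
        Arrays.fill(heap, Float.NEGATIVE_INFINITY);
    }

    void collect(float score) {
        // No "is the queue full?" branch: a sentinel loses to any real score.
        // Ties are rejected, assuming docs arrive in increasing doc id order,
        // so a later doc with an equal score cannot compete.
        if (score <= heap[0]) {
            return;
        }
        heap[0] = score; // change the top in place
        siftDown();      // restore heap order (the adjustTop() step in the plan)
    }

    private void siftDown() {
        int i = 0;
        float v = heap[0];
        while (true) {
            int left = 2 * i + 1;
            if (left >= heap.length) {
                break;
            }
            int right = left + 1;
            int child = (right < heap.length && heap[right] < heap[left]) ? right : left;
            if (heap[child] >= v) {
                break;
            }
            heap[i] = heap[child];
            i = child;
        }
        heap[i] = v;
    }

    /** Kept scores, best first. Unfilled slots still hold the sentinel value. */
    float[] sortedDescending() {
        float[] out = heap.clone();
        Arrays.sort(out);
        for (int i = 0, j = out.length - 1; i < j; i++, j--) {
            float t = out[i];
            out[i] = out[j];
            out[j] = t;
        }
        return out;
    }

    public static void main(String[] args) {
        SentinelTopN top = new SentinelTopN(2);
        for (float s : new float[] {1.0f, 3.0f, 2.0f, 0.5f}) {
            top.collect(s);
        }
        System.out.println(Arrays.toString(top.sortedDescending()));
    }
}
```

One consequence of this design is that when fewer than N hits are collected, the caller has to strip the remaining sentinel entries before returning results.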