[ https://issues.apache.org/jira/browse/LUCENE-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704323#action_12704323 ]
Michael McCandless commented on LUCENE-1593:
--------------------------------------------

{quote}
I think we should have an issue handling interfaces deprecation in general for 2.9, since just deprecating Weight does not solve it. You'd have to deprecate Searchable.search* methods which accept Weight, but Searchable is an interface, so you might want to deprecate it entirely and create an AbstractSearchable? That I think also deserves its own thread, don't you think?
{quote}

Yes, and this presumably depends on the outcome of the first "how much can change in 3.0" thread.

bq. I thought that perhaps we can make the following change

Once again I'm lacking clarity.... there are many related possible improvements to searching:

* Making this "top" vs "not-top" scorer difference more explicit
* Merging Query/Filter (LUCENE-1518), allowing Filter as a clause to BooleanQuery (LUCENE-1345): it still feels like Query should be a subclass of Filter, since Query "simply" adds scoring to a Filter.
* Pushing random-access filters down to the TermScorers, and pre-multiplying in deletes when possible (LUCENE-1536)
* Similarly pushing "bottomValue" down to TermScorers for field-sorted searching
* Having a single query make a "cheap" and an "expensive" scorer, so that all "cheap" scorers are checked first and only if they pass are the expensive ones checked (LUCENE-1252)
* The possible "Scorer.check" (LUCENE-1614) to test if a doc passes w/o next'ing
* For AND scoring, picking carefully in what order to test the iterators, maybe also choosing when to use "check" instead of "advance" for some.
* "Multiplying out" compound queries. EG +X (A OR B) makes a nested BooleanQuery; multiplying it out and then somehow sharing a single iterator for X's TermScorer should give better performance. Other "structural" optimizations could apply.
* Far-out, and not really affecting APIs, but still related: source code specialization (LUCENE-1594) to get speedups

I'm not yet sure what steps to take now (and how) vs later...

> Optimizations to TopScoreDocCollector and TopFieldCollector
> -----------------------------------------------------------
>
>                 Key: LUCENE-1593
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1593
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Shai Erera
>             Fix For: 2.9
>
>         Attachments: LUCENE-1593.patch, PerfTest.java
>
>
> This is a spin-off of LUCENE-1575 and proposes to optimize TSDC and TFC code
> to remove unnecessary checks. The plan is:
> # Ensure that IndexSearcher returns segments in increasing doc Id order,
> instead of numDocs().
> # Change TSDC and TFC's code to not use the doc id as a tie breaker. New docs
> will always have larger ids and therefore cannot compete.
> # Pre-populate HitQueue with sentinel values in TSDC (score = Float.NEG_INF)
> and remove the check if reusableSD == null.
> # Also move to use "changing top" and then call adjustTop(), in case we
> update the queue.
> # Some methods in Sort explicitly add SortField.FIELD_DOC as a "tie breaker"
> for the last SortField. But doing so should not be necessary (since we
> already break ties by docID), and is in fact less efficient (once the above
> optimization is in).
> # Investigate PQ - can we deprecate insert() and have only
> insertWithOverflow()? Add an addDummyObjects method which will populate the
> queue without "arranging" it, just store the objects in the array (this can
> be used to pre-populate sentinel values)?
> I will post a patch as well as some perf measurements as soon as I have them.
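
To make the sentinel, tie-breaking and "change top, then adjustTop()" steps of the plan in the issue description concrete, here is a minimal stand-alone sketch. It is not the attached patch and does not use Lucene's classes: SentinelTopDocs, Hit, collect and topK are hypothetical names invented for illustration, and the only thing assumed from the issue is that docs arrive in increasing doc id order.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-alone sketch; not Lucene's TopScoreDocCollector or HitQueue. */
final class SentinelTopDocs {

    /** Minimal stand-in for a (doc, score) pair. */
    static final class Hit {
        int doc;
        float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    private final Hit[] heap; // 1-based binary min-heap; heap[1] is the weakest hit
    private final int size;

    SentinelTopDocs(int k) {
        size = k;
        heap = new Hit[k + 1];
        // Pre-populate with sentinels. All entries compare equal, so the array
        // is already a valid heap and no "arranging" is needed.
        for (int i = 1; i <= k; i++) {
            heap[i] = new Hit(Integer.MAX_VALUE, Float.NEGATIVE_INFINITY);
        }
    }

    /** Queue order: lower score is weaker; on equal scores the larger doc id is weaker. */
    private static boolean lessThan(Hit a, Hit b) {
        return a.score == b.score ? a.doc > b.doc : a.score < b.score;
    }

    /** Assumes calls arrive in increasing doc id order. */
    void collect(int doc, float score) {
        Hit top = heap[1];
        // No null check (the queue is always full) and no doc id tie-break:
        // a later doc with an equal score cannot compete.
        if (score <= top.score) return;
        top.doc = doc;      // overwrite the weakest entry in place ...
        top.score = score;
        adjustTop();        // ... then restore the heap property
    }

    /** Sifts the root down to its correct position. */
    private void adjustTop() {
        int i = 1;
        Hit node = heap[i];
        for (int child = 2; child <= size; child = 2 * i) {
            if (child + 1 <= size && lessThan(heap[child + 1], heap[child])) child++;
            if (!lessThan(heap[child], node)) break;
            heap[i] = heap[child];
            i = child;
        }
        heap[i] = node;
    }

    /** Returns the collected doc ids, best first, skipping unused sentinels. */
    int[] results() {
        List<Hit> hits = new ArrayList<>();
        for (int i = 1; i <= size; i++) {
            if (heap[i].score != Float.NEGATIVE_INFINITY) hits.add(heap[i]);
        }
        hits.sort((a, b) -> a.score != b.score
                ? Float.compare(b.score, a.score)
                : Integer.compare(a.doc, b.doc));
        int[] docs = new int[hits.size()];
        for (int i = 0; i < docs.length; i++) docs[i] = hits.get(i).doc;
        return docs;
    }

    /** Convenience driver: the array index plays the role of the doc id. */
    static int[] topK(float[] scores, int k) {
        SentinelTopDocs collector = new SentinelTopDocs(k);
        for (int doc = 0; doc < scores.length; doc++) collector.collect(doc, scores[doc]);
        return collector.results();
    }
}
```

Note that in this sketch the doc id tie-break survives only inside the queue's ordering, where it decides which of two equal-scored entries is evicted first. The per-hit check in collect() is a single float comparison, which is the saving the plan is after.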