[jira] [Commented] (LUCENE-4571) speedup disjunction with minShouldMatch

Stefan Pohl (JIRA) Mon, 11 Mar 2013 16:33:15 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-4571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13599481#comment-13599481
 ]


Stefan Pohl commented on LUCENE-4571:
-------------------------------------

Awesome co-op. Thanks, Robert & Mike, for picking this up.

One comment on 'deferring scoring': I don't know all the current use cases
for these Scorers, but if some of them require only matching, then it is
probably most efficient to give each Scorer separate specializations that
either only match or match and score. Independently of that, separating
matching from scoring within the Scorers themselves appears to be an
orthogonal option, e.g. one that avoids needing such separate specializations.
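To make the "separate matching from scoring within Scorers" option concrete, here is a minimal sketch under stated assumptions. It is not Lucene's Scorer API: postings are plain sorted int[] arrays with a per-posting weight, and the class name DeferredScoringDisjunction and its members are hypothetical. nextDoc() only checks the minShouldMatch condition, and score() does its work only when a caller actually asks for it.

```java
/**
 * Hypothetical sketch (not Lucene's Scorer API): one disjunction class where
 * nextDoc() only decides whether a doc satisfies minShouldMatch, and score()
 * sums clause weights lazily, only when the caller asks for it.
 */
final class DeferredScoringDisjunction {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[][] docs;      // docs[i]: sorted doc IDs of clause i (assumed input format)
    private final float[][] weights; // weights[i][j]: score contribution of docs[i][j]
    private final int[] pos;         // current index into each clause's list
    private final int minShouldMatch;
    private int doc = -1;
    private int scoreCalls = 0;      // how many times score() actually ran

    DeferredScoringDisjunction(int[][] docs, float[][] weights, int minShouldMatch) {
        this.docs = docs;
        this.weights = weights;
        this.pos = new int[docs.length];
        this.minShouldMatch = minShouldMatch;
    }

    /** Matching only: advance to the next doc hit by >= minShouldMatch clauses. */
    int nextDoc() {
        while (true) {
            int candidate = NO_MORE_DOCS;
            for (int i = 0; i < docs.length; i++) {
                // Move clause i past the previously visited doc.
                while (pos[i] < docs[i].length && docs[i][pos[i]] <= doc) pos[i]++;
                if (pos[i] < docs[i].length) candidate = Math.min(candidate, docs[i][pos[i]]);
            }
            doc = candidate;
            if (doc == NO_MORE_DOCS) return doc;
            int matchers = 0;
            for (int i = 0; i < docs.length; i++) {
                if (pos[i] < docs[i].length && docs[i][pos[i]] == doc) matchers++;
            }
            if (matchers >= minShouldMatch) return doc;
        }
    }

    /** Scoring on demand: sum the weights of the clauses positioned on the current doc. */
    float score() {
        scoreCalls++;
        float sum = 0f;
        for (int i = 0; i < docs.length; i++) {
            if (pos[i] < docs[i].length && docs[i][pos[i]] == doc) sum += weights[i][pos[i]];
        }
        return sum;
    }

    int scoreCalls() {
        return scoreCalls;
    }
}
```

A match-only caller (for example, one that only counts hits) simply never calls score(), so no scoring work is done, and no separate match-only specialization of the class is needed.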

If the goal is just to save some cycles for a minor response-time
improvement on some queries, then deferred scoring won't help the optimized
MinShouldMatchScorer as much as it would have helped the previous
implementation. The optimized scorer generates (and scores) far fewer
candidates, and each of them is much more likely to pass the minShouldMatch
constraint, so most candidates will be scored anyway (in use cases where
scoring is required). Is this what you meant by 'this is not helpful to do if
you are scoring'?

It would be awesome to have that cost API for (sub-)Scorers: most Scorers
can be rewritten to benefit from it (wow, you could even demonstrate this for
conjunctive queries), and it also lets some optimizations work on structured
queries, whereas otherwise they would be limited to flat bag-of-TermScorers
queries.
I second rewriting the attached new MinShouldMatchScorer to use the cost API,
that is, to always exclude the same most costly subScorers (minShouldMatch-1
of them) and heap-merge only the remaining ones. This would save quite a few
heap operations and also simplify the implementation. It would probably yield
the desired ~15% response-time improvement for queries with a less
restrictive mm-constraint, so that the new scorer convincingly supersedes the
previous MinShouldMatchScorer implementation.
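The cost-based strategy described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the attached patch: postings are plain sorted int[] arrays, "cost" is approximated by posting-list length, excluded lists are advanced by a linear scan where real postings would skip, and CostBasedMinShouldMatch.matches is a hypothetical name. The key invariant is that a doc matching at least mm lists must appear in at least one of the n-mm+1 cheapest lists, so only those need to be heap-merged to generate candidates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Hypothetical sketch of cost-based minShouldMatch evaluation over sorted
 * int[] postings. "Cost" is approximated by posting-list length (assumption).
 */
final class CostBasedMinShouldMatch {
    /** Returns, in order, every doc ID that appears in at least mm lists (mm >= 1). */
    static List<Integer> matches(int[][] postings, int mm) {
        int n = postings.length;
        List<Integer> result = new ArrayList<>();
        if (mm < 1 || mm > n) return result;

        // Order clauses from most to least costly (longest list first).
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> Integer.compare(postings[b].length, postings[a].length));

        // The mm-1 most costly lists never enter the heap: a doc that matches
        // >= mm lists must appear in at least one of the other n-mm+1 lists.
        int excludedCount = mm - 1;
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int k = excludedCount; k < n; k++) {
            int[] list = postings[order[k]];
            if (list.length > 0) heap.add(new int[] {list[0], order[k], 0}); // {doc, clause, index}
        }
        int[] excludedPos = new int[excludedCount];

        while (!heap.isEmpty()) {
            int candidate = heap.peek()[0];
            int count = 0;
            // Pop every heap entry on the candidate and move it to its next doc.
            while (!heap.isEmpty() && heap.peek()[0] == candidate) {
                int[] top = heap.poll();
                count++;
                int next = top[2] + 1;
                int[] list = postings[top[1]];
                if (next < list.length) heap.add(new int[] {list[next], top[1], next});
            }
            // Probe the excluded (costly) lists only while mm is still reachable.
            for (int e = 0; e < excludedCount && count + (excludedCount - e) >= mm; e++) {
                int[] list = postings[order[e]];
                int p = excludedPos[e];
                while (p < list.length && list[p] < candidate) p++; // real postings would skip ahead
                excludedPos[e] = p;
                if (p < list.length && list[p] == candidate) count++;
            }
            if (count >= mm) result.add(candidate);
        }
        return result;
    }
}
```

For example, matches(new int[][] {{1, 3, 5}, {3, 4, 5}, {5, 7}}, 2) returns [3, 5]. The costly lists are only probed at candidates produced by the cheaper ones, and the probing stops early once mm can no longer be reached. That is where the saved heap operations come from.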

Looking forward to seeing the impact of this optimized MinShouldMatchScorer on
the runtimes of use cases such as:
http://blog.mikemccandless.com/2013/02/drill-sideways-faceting-with-lucene.html
                
> speedup disjunction with minShouldMatch 
> ----------------------------------------
>
>                 Key: LUCENE-4571
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4571
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>    Affects Versions: 4.1
>            Reporter: Mikhail Khludnev
>         Attachments: LUCENE-4571.patch, LUCENE-4571.patch, LUCENE-4571.patch
>
>
> Even when minShouldMatch is supplied to DisjunctionSumScorer, it enumerates
> the whole disjunction and verifies the minShouldMatch condition [on every 
> doc|https://github.com/apache/lucene-solr/blob/trunk/lucene/core/src/java/org/apache/lucene/search/DisjunctionSumScorer.java#L70]:
> {code}
>   public int nextDoc() throws IOException {
>     assert doc != NO_MORE_DOCS;
>     while(true) {
>       while (subScorers[0].docID() == doc) {
>         if (subScorers[0].nextDoc() != NO_MORE_DOCS) {
>           heapAdjust(0);
>         } else {
>           heapRemoveRoot();
>           if (numScorers < minimumNrMatchers) {
>             return doc = NO_MORE_DOCS;
>           }
>         }
>       }
>       afterNext();
>       if (nrMatchers >= minimumNrMatchers) {
>         break;
>       }
>     }
>     
>     return doc;
>   }
> {code}
> [~spo] proposes (as far as I understand it) to pop nrMatchers-1 scorers from
> the heap first, and then push them back, advancing them to that top doc. For
> me, question no. 1 is: is there a performance test for minShouldMatch-
> constrained disjunctions?
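The proposal in the quoted description, as read here, can be sketched like this. First pop the minShouldMatch-1 cursors on the smallest docs. The new heap top is then the mm-th smallest current doc, and no earlier doc can have mm matches. Next, advance the popped cursors to that doc and push them back. This is a hypothetical illustration, not the attached patch: Cursor is a stand-in for a sub-Scorer over a sorted int[] array, its advance() is a linear scan where real postings would skip, and all names are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Hypothetical sketch of "pop minShouldMatch-1, then advance to the new top"
 * over sorted int[] postings. Cursor stands in for a sub-Scorer.
 */
final class PopAndAdvanceDisjunction {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Minimal cursor over a sorted doc-ID array. */
    static final class Cursor {
        private final int[] docs;
        private int pos = 0;
        Cursor(int[] docs) { this.docs = docs; }
        int doc() { return pos < docs.length ? docs[pos] : NO_MORE_DOCS; }
        void nextDoc() { pos++; }
        /** Moves to the first doc >= target (linear scan in this sketch). */
        void advance(int target) { while (pos < docs.length && docs[pos] < target) pos++; }
    }

    private final PriorityQueue<Cursor> heap =
        new PriorityQueue<>((a, b) -> Integer.compare(a.doc(), b.doc()));
    private final int mm;

    PopAndAdvanceDisjunction(int[][] postings, int minShouldMatch) {
        this.mm = minShouldMatch; // assumed >= 1
        for (int[] p : postings) {
            if (p.length > 0) heap.add(new Cursor(p));
        }
    }

    /** Returns the next doc matched by at least mm cursors, or NO_MORE_DOCS. */
    int nextDoc() {
        while (heap.size() >= mm) {
            // 1. Pop the mm-1 cursors positioned on the smallest docs.
            List<Cursor> popped = new ArrayList<>();
            for (int k = 0; k < mm - 1; k++) popped.add(heap.poll());
            // 2. The new top is the mm-th smallest doc: nothing earlier can match.
            int target = heap.peek().doc();
            // 3. Advance the popped cursors to target; push back the live ones.
            for (Cursor c : popped) {
                c.advance(target);
                if (c.doc() != NO_MORE_DOCS) heap.add(c);
            }
            // 4. Collect the cursors on target, then move them past it.
            List<Cursor> onTarget = new ArrayList<>();
            while (!heap.isEmpty() && heap.peek().doc() == target) onTarget.add(heap.poll());
            for (Cursor c : onTarget) {
                c.nextDoc();
                if (c.doc() != NO_MORE_DOCS) heap.add(c);
            }
            if (onTarget.size() >= mm) return target;
        }
        return NO_MORE_DOCS;
    }
}
```

Cursors are always polled before they are moved and re-added afterwards, so the PriorityQueue never holds an entry whose key changed in place. Unlike the quoted nextDoc() loop, the popped cursors jump straight to the new top doc instead of visiting every doc in between.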
