[ 
https://issues.apache.org/jira/browse/LUCENE-4571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13610275#comment-13610275
 ] 

Robert Muir commented on LUCENE-4571:
-------------------------------------

Thanks Stefan! I can confirm its passing tests and I'm seeing similar benchmark 
results (on iteration #9 now)

the benchmark takes some time on my computer: I'll report back what I see when 
its complete.

{quote}
There are still some assertions failing, but for now I'd attribute them to 
problems with the test (are return statements is missing when actual==null?), 
because these assertions only fail within the baseline/expected impl. 
Lucenebench doesn't complain anymore, even when checking scores.
{quote}

Awesome! Thats definitely a test bug in assertNext()... its just very unlikely 
to happen (means that by rare chance, no documents got term "j" or whatever): 
so it hasn't tripped up in Jenkins. I'll fix this, thanks!

{quote}
Instead of re-implementing all of the previous basic implementation, do you 
think it would be possible to rather re-use a normal DisjunctionSumScorer and 
wrap it with a Scorer that discards candidates that don't match the 
mm-constraint? This way we would get correct scoring for free. E.g.
{quote}

We could: I thought about this. However I did this slow one for a few reasons:
* doesn't prevent any refactoring of core/ code (its a standalone thing just 
for testing)
* should make things pretty easy to debug (this isnt yet done though)
* possibility of finding bugs in the current implementation. especially as we 
near a bugfix release :)

The correct scoring is actually easy to add. But I didnt do this yet because I 
didnt want to hold up this issue on "the perfect test". To me the important 
thing here is matching the correct documents according to minNrShouldMatch. If 
this is correct, its pretty intuitive to me that scoring is correct too: 
wherever we increase nrMatchers there should also be a addition to the score.

But I'll improve this today so we all have that warm fuzzy feeling!

As far as I'm concerned this patch is ready to be committed today. I'll follow 
up with my benchmark results in a comment and improve 
the tests a bit this morning though.

Thanks a lot for taking this on... awesome work.
                
> speedup disjunction with minShouldMatch 
> ----------------------------------------
>
>                 Key: LUCENE-4571
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4571
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>    Affects Versions: 4.1
>            Reporter: Mikhail Khludnev
>         Attachments: LUCENE-4571.patch, LUCENE-4571.patch, LUCENE-4571.patch, 
> LUCENE-4571.patch, LUCENE-4571.patch, LUCENE-4571.patch
>
>
> even minShouldMatch is supplied to DisjunctionSumScorer it enumerates whole 
> disjunction, and verifies minShouldMatch condition [on every 
> doc|https://github.com/apache/lucene-solr/blob/trunk/lucene/core/src/java/org/apache/lucene/search/DisjunctionSumScorer.java#L70]:
> {code}
>   public int nextDoc() throws IOException {
>     assert doc != NO_MORE_DOCS;
>     while(true) {
>       while (subScorers[0].docID() == doc) {
>         if (subScorers[0].nextDoc() != NO_MORE_DOCS) {
>           heapAdjust(0);
>         } else {
>           heapRemoveRoot();
>           if (numScorers < minimumNrMatchers) {
>             return doc = NO_MORE_DOCS;
>           }
>         }
>       }
>       afterNext();
>       if (nrMatchers >= minimumNrMatchers) {
>         break;
>       }
>     }
>     
>     return doc;
>   }
> {code}
> [~spo] proposes (as well as I get it) to pop nrMatchers-1 scorers from the 
> heap first, and then push them back advancing behind that top doc. For me the 
> question no.1 is there a performance test for minShouldMatch constrained 
> disjunction. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to