[jira] [Commented] (LUCENE-6919) Change the Scorer API to expose an iterator instead of extending DocIdSetIterator

Adrien Grand (JIRA) Wed, 09 Dec 2015 08:41:42 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-6919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048947#comment-15048947 ]


Adrien Grand commented on LUCENE-6919:
--------------------------------------

Whoops, here is the actual patch I wanted to upload.

> Change the Scorer API to expose an iterator instead of extending DocIdSetIterator
> ---------------------------------------------------------------------------------
>
>                 Key: LUCENE-6919
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6919
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-6919.patch, LUCENE-6919.patch, LUCENE-6919.patch
>
>
> I was working on trying to address the performance regression on LUCENE-6815 
> but this is hard to do without introducing specialization of 
> DisjunctionScorer which I'd like to avoid at all costs.
> I think the performance regression would be easy to address without 
> specialization if Scorers were changed to return an iterator instead of 
> extending DocIdSetIterator. So conceptually the API would move from
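> {code}
> // today: a Scorer *is* the iterator
> abstract class Scorer extends DocIdSetIterator {
> }
> {code}
> to
> {code}
> // proposed: a Scorer *has* an iterator
> abstract class Scorer {
>   abstract DocIdSetIterator iterator();
> }
> {code}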
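To make the proposal concrete, here is a minimal, self-contained Java sketch of the "has an iterator" shape and of the two benefits the issue goes on to describe: TermScorer handing out its postings directly, and DisjunctionScorer deciding once whether it needs a two-phase wrapper. All names here (DocIterator, ArrayDocIterator, UnionIterator, the `confirm` predicate standing in for a two-phase match check, ScorerSketch) are hypothetical simplifications, not Lucene's actual classes; the real DocIdSetIterator carries more methods (for example advance and cost), and real two-phase iteration goes through TwoPhaseIterator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.IntPredicate;

// Hypothetical, simplified stand-in for DocIdSetIterator (no advance/cost).
abstract class DocIterator {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;
  abstract int docID();
  abstract int nextDoc();
}

// Hypothetical stand-in for a PostingsEnum: walks a sorted array of doc IDs.
final class ArrayDocIterator extends DocIterator {
  private final int[] docs;
  private int idx = -1;
  ArrayDocIterator(int... docs) { this.docs = docs; }
  int docID() { return idx < 0 ? -1 : (idx < docs.length ? docs[idx] : NO_MORE_DOCS); }
  int nextDoc() {
    if (idx < docs.length) idx++;
    return docID();
  }
}

// Proposed shape: a Scorer exposes an iterator instead of being one.
abstract class Scorer {
  abstract DocIterator iterator();
}

// The term scorer hands out its postings as-is, so nextDoc() calls go
// straight to the postings with no TermScorer frame in between.
final class TermScorer extends Scorer {
  private final DocIterator postings;
  TermScorer(DocIterator postings) { this.postings = postings; }
  DocIterator iterator() { return postings; }
}

// Union of the sub-iterators: the disjunction's approximation.
final class UnionIterator extends DocIterator {
  private final PriorityQueue<DocIterator> queue =
      new PriorityQueue<>((a, b) -> Integer.compare(a.docID(), b.docID()));
  private int doc = -1;
  UnionIterator(List<DocIterator> subs) {
    for (DocIterator it : subs) { it.nextDoc(); queue.add(it); }
  }
  int docID() { return doc; }
  int nextDoc() {
    if (doc == NO_MORE_DOCS) return doc;
    // Move every sub-iterator still positioned on the current doc forward.
    while (!queue.isEmpty() && queue.peek().docID() <= doc) {
      DocIterator top = queue.poll();
      top.nextDoc();
      queue.add(top);
    }
    doc = queue.isEmpty() ? NO_MORE_DOCS : queue.peek().docID();
    return doc;
  }
}

// `confirm` is a hypothetical stand-in for a two-phase match check; it is
// null when no sub-clause needs one.
final class DisjunctionScorer extends Scorer {
  private final DocIterator approximation;
  private final IntPredicate confirm;
  DisjunctionScorer(List<DocIterator> subs, IntPredicate confirm) {
    this.approximation = new UnionIterator(subs);
    this.confirm = confirm;
  }
  DocIterator iterator() {
    // Decided once here, not on every nextDoc() call.
    if (confirm == null) return approximation;
    return new DocIterator() {
      int docID() { return approximation.docID(); }
      int nextDoc() {
        int d;
        do { d = approximation.nextDoc(); } while (d != NO_MORE_DOCS && !confirm.test(d));
        return d;
      }
    };
  }
}

final class ScorerSketch {
  // Drains an iterator into a list of matching doc IDs.
  static List<Integer> collect(DocIterator it) {
    List<Integer> out = new ArrayList<>();
    for (int d = it.nextDoc(); d != DocIterator.NO_MORE_DOCS; d = it.nextDoc()) out.add(d);
    return out;
  }
}
```

For example, a disjunction over the doc lists {1, 5} and {2, 5, 8} with no two-phase clause yields 1, 2, 5, 8 straight from the union iterator, so the per-document loop never tests for null; passing `d -> d % 2 == 0` as `confirm` yields 2, 8 through the wrapper instead.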
> This would help me because then if none of the sub-clauses support two-phase 
> iteration, DisjunctionScorer could directly return the approximation as an 
> iterator instead of having to check if twoPhase == null at every iteration.
> Such an approach could also help remove some method calls. For instance, 
> TermScorer.nextDoc just delegates to PostingsEnum.nextDoc, but with this change 
> TermScorer.iterator() could return the PostingsEnum and TermScorer would not 
> even appear in stack traces when scoring. I hacked together a patch to see how 
> much that would help and luceneutil seems to like the change:
> {noformat}
>                    Task QPS baseline      StdDev   QPS patch      StdDev                Pct diff
>                   Fuzzy1       88.54     (15.7%)       86.73     (16.6%)   -2.0% ( -29% -   35%)
>               AndHighLow      698.98      (4.1%)      691.11      (5.1%)   -1.1% (  -9% -    8%)
>                   Fuzzy2       26.47     (11.2%)       26.28     (10.3%)   -0.7% ( -19% -   23%)
>              MedSpanNear      141.03      (3.3%)      140.51      (3.2%)   -0.4% (  -6% -    6%)
>               HighPhrase       60.66      (2.6%)       60.48      (3.3%)   -0.3% (  -5% -    5%)
>              LowSpanNear       29.25      (2.4%)       29.21      (2.1%)   -0.1% (  -4% -    4%)
>                MedPhrase       28.32      (1.9%)       28.28      (2.0%)   -0.1% (  -3% -    3%)
>                LowPhrase       17.31      (2.1%)       17.29      (2.6%)   -0.1% (  -4% -    4%)
>         HighSloppyPhrase       10.93      (6.0%)       10.92      (6.0%)   -0.1% ( -11% -   12%)
>          MedSloppyPhrase       72.21      (2.2%)       72.27      (1.8%)    0.1% (  -3% -    4%)
>                  Respell       57.35      (3.2%)       57.41      (3.4%)    0.1% (  -6% -    6%)
>             HighSpanNear       26.71      (3.0%)       26.75      (2.5%)    0.1% (  -5% -    5%)
>             OrNotHighLow      803.46      (3.4%)      807.03      (4.2%)    0.4% (  -6% -    8%)
>          LowSloppyPhrase       88.02      (3.4%)       88.77      (2.5%)    0.8% (  -4% -    7%)
>             OrNotHighMed      200.45      (2.7%)      203.83      (2.5%)    1.7% (  -3% -    7%)
>               OrHighHigh       38.98      (7.9%)       40.30      (6.6%)    3.4% ( -10% -   19%)
>                 HighTerm       92.53      (5.3%)       95.94      (5.8%)    3.7% (  -7% -   15%)
>                OrHighMed       53.80      (7.7%)       55.79      (6.6%)    3.7% (  -9% -   19%)
>               AndHighMed      266.69      (1.7%)      277.15      (2.5%)    3.9% (   0% -    8%)
>                  Prefix3       44.68      (5.4%)       46.60      (7.0%)    4.3% (  -7% -   17%)
>                  MedTerm      261.52      (4.9%)      273.52      (5.4%)    4.6% (  -5% -   15%)
>                 Wildcard       42.39      (6.1%)       44.35      (7.8%)    4.6% (  -8% -   19%)
>                   IntNRQ       10.46      (7.0%)       10.99      (9.5%)    5.0% ( -10% -   23%)
>            OrNotHighHigh       67.15      (4.6%)       70.65      (4.5%)    5.2% (  -3% -   15%)
>            OrHighNotHigh       43.07      (5.1%)       45.36      (5.4%)    5.3% (  -4% -   16%)
>                OrHighLow       64.19      (6.4%)       67.72      (5.5%)    5.5% (  -6% -   18%)
>              AndHighHigh       64.17      (2.3%)       67.87      (2.1%)    5.8% (   1% -   10%)
>                  LowTerm      642.94     (10.9%)      681.48      (8.5%)    6.0% ( -12% -   28%)
>             OrHighNotMed       12.68      (6.9%)       13.51      (6.6%)    6.5% (  -6% -   21%)
>             OrHighNotLow       54.69      (6.8%)       58.25      (7.0%)    6.5% (  -6% -   21%)
> {noformat}
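Assuming the Pct diff column is simply the relative change in mean QPS, (patch - baseline) / baseline * 100, the small sketch below reproduces it for three rows of the luceneutil table. The class and method names are hypothetical, and the bracketed range next to each percentage is luceneutil's own statistic, which this sketch does not try to recompute.

```java
import java.util.Locale;

final class PctDiff {
  // Assumed definition: relative change of patch QPS over baseline QPS, in percent.
  static double pctDiff(double baselineQps, double patchQps) {
    return (patchQps - baselineQps) / baselineQps * 100.0;
  }

  public static void main(String[] args) {
    // Three rows copied from the table: task, QPS baseline, QPS patch.
    String[] tasks = {"Fuzzy1", "AndHighHigh", "OrHighNotLow"};
    double[] baseline = {88.54, 64.17, 54.69};
    double[] patch = {86.73, 67.87, 58.25};
    for (int i = 0; i < tasks.length; i++) {
      // Expected to match the table's Pct diff column: -2.0%, 5.8%, 6.5%.
      System.out.printf(Locale.ROOT, "%s: %.1f%%%n", tasks[i], pctDiff(baseline[i], patch[i]));
    }
  }
}
```

Reading the table with this definition in mind, the gains cluster in term, conjunction and disjunction tasks, while phrase, span and fuzzy tasks stay within a fraction of a percent either way.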




