[jira] [Commented] (LUCENE-8806) WANDScorer should support two-phase iterator

Jim Ferenczi (JIRA) Tue, 25 Jun 2019 01:11:09 -0700


    [ 
https://issues.apache.org/jira/browse/LUCENE-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872142#comment-16872142
 ]


Jim Ferenczi commented on LUCENE-8806:
--------------------------------------

I ran luceneutil with some disjunctions of phrase and term queries:
{noformat}
                    Task  QPS baseline      StdDev   QPS patch      StdDev  Pct diff
      HighPhraseHighTerm          8.47      (1.6%)        4.78      (2.6%)   -43.6% ( -47% -  -40%)
       MedPhraseHighTerm         15.54      (1.2%)        9.41      (2.5%)   -39.5% ( -42% -  -36%)
    HighPhraseHighPhrase          5.99      (1.4%)        3.65      (3.0%)   -39.0% ( -42% -  -35%)
     HighPhraseLowPhrase         15.57      (1.2%)       14.26      (3.6%)    -8.4% ( -13% -   -3%)
      LowPhraseLowPhrase         27.25      (2.0%)       31.75      (4.5%)    16.5% (   9% -   23%)
       HighPhraseLowTerm         26.31      (0.9%)       31.42      (3.4%)    19.4% (  14% -   24%)
       HighPhraseMedTerm         12.95      (1.0%)       15.74      (3.8%)    21.6% (  16% -   26%)
      MedPhraseMedPhrase          9.21      (2.4%)       11.50      (8.3%)    24.9% (  13% -   36%)
        MedPhraseLowTerm         24.85      (1.6%)       31.52      (5.5%)    26.8% (  19% -   34%)
      MedPhraseLowPhrase         11.64      (2.3%)       15.06      (7.1%)    29.3% (  19% -   39%)
     HighPhraseMedPhrase          8.27      (2.0%)       10.77      (7.2%)    30.2% (  20% -   40%)
        MedPhraseMedTerm         14.53      (1.7%)       19.33      (5.6%)    33.0% (  25% -   40%)
{noformat}

While the change speeds up some cases, it also shows a non-negligible
regression (roughly 40%) when both clauses have high or medium frequencies
(HighPhraseHighTerm, MedPhraseHighTerm, HighPhraseHighPhrase).
Currently the phrase scorer doesn't check impacts to compute the max score per
block, so I hacked together a simple patch that merges the impacts of the terms
that appear in the phrase query. The patch keeps the minimum frequency per norm
value in order to compute an upper bound on the score of the phrase query. I
ran luceneutil again with the modified patch and the results are much better:
{noformat}
                    Task  QPS baseline      StdDev   QPS patch      StdDev  Pct diff
      HighPhraseHighTerm          8.22      (3.3%)        8.83      (1.9%)     7.4% (   2% -   12%)
      LowPhraseLowPhrase         26.57      (0.7%)       28.55      (5.5%)     7.4% (   1% -   13%)
     HighPhraseMedPhrase          7.98      (0.8%)        9.01      (5.0%)    12.9% (   7% -   18%)
      MedPhraseMedPhrase          8.95      (1.4%)       10.11      (6.6%)    12.9% (   4% -   21%)
       MedPhraseHighTerm         15.10      (1.1%)       17.69      (4.6%)    17.2% (  11% -   23%)
      MedPhraseLowPhrase         11.17      (1.1%)       13.11      (4.9%)    17.4% (  11% -   23%)
     HighPhraseLowPhrase         15.09      (1.5%)       18.85      (7.4%)    24.9% (  15% -   34%)
    HighPhraseHighPhrase          5.75      (2.3%)        7.26      (4.5%)    26.2% (  18% -   33%)
       HighPhraseLowTerm         25.68      (0.7%)       34.46      (2.4%)    34.2% (  30% -   37%)
        MedPhraseMedTerm         14.23      (0.1%)       20.71      (2.3%)    45.5% (  43% -   47%)
        MedPhraseLowTerm         24.30      (0.6%)       38.47      (2.4%)    58.3% (  55% -   61%)
       HighPhraseMedTerm         12.77      (0.6%)       22.21      (3.1%)    73.9% (  69% -   77%)
{noformat}
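
For illustration only, here is a minimal, self-contained sketch of the merging idea described above. It is not the code from the attached patch. The names (PhraseImpactSketch, Impact, mergeForPhrase) are hypothetical and do not use Lucene's classes. The sketch assumes each term exposes its competitive (freq, norm) pairs for the current block, that a document's term frequency is bounded by the largest freq among the impacts whose norm is not greater than the document's norm, and that norms are non-negative so a plain long comparison can stand in for the unsigned comparison. Since a phrase cannot occur more often than its rarest term, the minimum of those per-term bounds gives an upper bound on the phrase frequency for each norm value:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

final class PhraseImpactSketch {

    /** Hypothetical stand-in for a (freq, norm) impact; not Lucene's own class. */
    static final class Impact {
        final int freq;
        final long norm;

        Impact(int freq, long norm) {
            this.freq = freq;
            this.norm = norm;
        }

        @Override
        public String toString() {
            return "(freq=" + freq + ", norm=" + norm + ")";
        }
    }

    /** Largest freq among impacts with a norm <= the given norm, or 0 if none. */
    static int maxFreqUpToNorm(List<Impact> impacts, long norm) {
        int best = 0;
        for (Impact impact : impacts) {
            if (impact.norm <= norm) {
                best = Math.max(best, impact.freq);
            }
        }
        return best;
    }

    /** Merges per-term impacts, keeping the minimum frequency bound per norm value. */
    static List<Impact> mergeForPhrase(List<List<Impact>> perTermImpacts) {
        // Candidate norms: every norm value that appears for any term.
        TreeSet<Long> norms = new TreeSet<>();
        for (List<Impact> impacts : perTermImpacts) {
            for (Impact impact : impacts) {
                norms.add(impact.norm);
            }
        }

        List<Impact> merged = new ArrayList<>();
        for (long norm : norms) {
            // The phrase frequency cannot exceed the smallest per-term bound.
            int minFreq = Integer.MAX_VALUE;
            for (List<Impact> impacts : perTermImpacts) {
                minFreq = Math.min(minFreq, maxFreqUpToNorm(impacts, norm));
            }
            // Skip entries that allow no match or that an earlier entry already covers.
            boolean improves = merged.isEmpty()
                || minFreq > merged.get(merged.size() - 1).freq;
            if (minFreq > 0 && improves) {
                merged.add(new Impact(minFreq, norm));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Impact> termA = Arrays.asList(new Impact(3, 5), new Impact(10, 20));
        List<Impact> termB = Arrays.asList(
            new Impact(2, 4), new Impact(6, 12), new Impact(8, 30));
        System.out.println(mergeForPhrase(Arrays.asList(termA, termB)));
    }
}
```

The merged list could then be fed to the similarity to obtain a per-block max score for the phrase. The quadratic scan in maxFreqUpToNorm keeps the sketch short; a real implementation would more likely walk the sorted lists in parallel.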

However, simple phrase queries (without disjunctions) seem to be slower with
the merging of impacts:
{noformat}
                    Task  QPS baseline      StdDev   QPS patch      StdDev  Pct diff
              HighPhrase         10.48      (0.0%)        9.74      (0.0%)    -7.1% (  -7% -   -7%)
               MedPhrase         20.92      (0.0%)       20.25      (0.0%)    -3.2% (  -3% -   -3%)
               LowPhrase         24.07      (0.0%)       23.33      (0.0%)    -3.1% (  -3% -   -3%)
{noformat}

I am not yet sure that the merging of impacts is correct, so I'll add some
tests. It's also unrelated to this change (even if it helps performance), so
I'll open a separate issue to discuss the merging of impacts for phrase
queries.
Considering the results of this change alone (two-phase iterator for the
WANDScorer), I will not merge it yet since it doesn't improve queries with lots
of matches, but we can revisit when/if the merging of impacts for phrase
queries is implemented. WDYT?

> WANDScorer should support two-phase iterator
> --------------------------------------------
>
>                 Key: LUCENE-8806
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8806
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Jim Ferenczi
>            Priority: Major
>         Attachments: LUCENE-8806.patch, LUCENE-8806.patch
>
>
> Following https://issues.apache.org/jira/browse/LUCENE-8770 the WANDScorer 
> should leverage two-phase iterators in order to be faster when used in 
> conjunctions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
