[jira] [Commented] (LUCENE-8311) Leverage impacts for phrase queries

Adrien Grand (JIRA) Wed, 03 Jul 2019 08:22:12 -0700


    [ https://issues.apache.org/jira/browse/LUCENE-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16877919#comment-16877919 ]


Adrien Grand commented on LUCENE-8311:
--------------------------------------

I opened https://github.com/apache/lucene-solr/pull/760. Performance is a bit 
better than what we had before:

{noformat}
                   Task QPS baseline      StdDev   QPS patch      StdDev                Pct diff
                HighTerm     1395.12      (5.1%)     1230.78      (4.3%)  -11.8% ( -20% -   -2%)
                 MedTerm     2352.56      (4.7%)     2170.42      (3.9%)   -7.7% ( -15% -    0%)
             LowSpanNear       13.70      (7.0%)       12.67      (4.9%)   -7.5% ( -18% -    4%)
            HighSpanNear        5.69      (5.3%)        5.31      (3.2%)   -6.5% ( -14% -    2%)
             MedSpanNear       23.33      (4.2%)       21.97      (2.4%)   -5.8% ( -11% -    0%)
              AndHighMed      114.70      (2.9%)      109.40      (4.1%)   -4.6% ( -11% -    2%)
             AndHighHigh       35.08      (3.2%)       33.51      (4.1%)   -4.5% ( -11% -    2%)
                 LowTerm     3014.11      (4.7%)     2893.44      (4.7%)   -4.0% ( -12% -    5%)
               OrHighMed       60.26      (2.5%)       57.96      (2.1%)   -3.8% (  -8% -    0%)
              OrHighHigh       15.45      (2.5%)       14.87      (2.3%)   -3.8% (  -8% -    1%)
               LowPhrase       25.81      (3.4%)       24.89      (2.8%)   -3.6% (  -9% -    2%)
        HighSloppyPhrase        7.44      (6.3%)        7.20      (5.7%)   -3.3% ( -14% -    9%)
         MedSloppyPhrase       12.76      (5.1%)       12.51      (4.6%)   -1.9% ( -10% -    8%)
         LowSloppyPhrase       34.24      (4.1%)       33.59      (3.8%)   -1.9% (  -9% -    6%)
       HighTermMonthSort       70.86     (10.9%)       69.98     (10.7%)   -1.2% ( -20% -   22%)
                  Fuzzy1      211.28      (3.5%)      208.86      (2.2%)   -1.1% (  -6% -    4%)
                  Fuzzy2      180.97      (4.4%)      179.47      (2.6%)   -0.8% (  -7% -    6%)
               OrHighLow      467.25      (2.9%)      467.94      (2.0%)    0.1% (  -4% -    5%)
                 Prefix3       91.35      (8.1%)       91.52      (7.2%)    0.2% ( -14% -   16%)
   HighTermDayOfYearSort       62.77      (6.9%)       62.96      (7.5%)    0.3% ( -13% -   15%)
                Wildcard      129.49      (4.3%)      129.99      (2.8%)    0.4% (  -6% -    7%)
                 Respell      210.68      (1.9%)      211.58      (2.4%)    0.4% (  -3% -    4%)
              AndHighLow      541.64      (3.1%)      544.44      (3.2%)    0.5% (  -5% -    7%)
                  IntNRQ      148.56      (8.3%)      149.44     (10.4%)    0.6% ( -16% -   21%)
              HighPhrase       10.86      (9.0%)       13.92     (15.2%)   28.2% (   3% -   57%)
               MedPhrase       62.22      (2.1%)       97.61      (4.6%)   56.9% (  49% -   64%)
{noformat}

But there is a lot of variance across runs because the results depend a lot on 
which queries get picked. For instance, on another run I got:

{noformat}
               LowPhrase       39.39      (1.9%)       51.21      (2.2%)   30.0% (  25% -   34%)
              HighPhrase       13.09      (3.2%)      192.76     (26.8%) 1372.5% (1301% - 1448%)
{noformat}

Even though some queries get slightly slower, I think we should merge this: 
phrases need to expose good impacts if we want to give boolean queries a 
chance to speed up queries that include phrases. Term queries appear to be a 
bit slower. I'm assuming this is because the JVM cannot do as much inlining 
as before, since we are starting to use classes for phrases that were only 
used for term queries before.

> Leverage impacts for phrase queries
> -----------------------------------
>
>                 Key: LUCENE-8311
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8311
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-8311.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Now that we expose raw impacts, we could leverage them for phrase queries.
> For instance for exact phrases, we could take the minimum term frequency for 
> each unique norm value in order to get upper bounds of the score for the 
> phrase.
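
The idea in the issue description above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the pull request: the Impact class and both method names are hypothetical stand-ins rather than Lucene's API, norms are compared as plain signed longs, a smaller norm is assumed to mean a higher score, and each term's impacts are assumed to be sorted by increasing freq and norm.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch; these types and names are assumptions, not Lucene's API.
final class PhraseImpactsSketch {

  /** A (freq, norm) pair standing in for one impact of a term. */
  static final class Impact {
    final int freq;
    final long norm;

    Impact(int freq, long norm) {
      this.freq = freq;
      this.norm = norm;
    }
  }

  /**
   * Largest freq among impacts whose norm is <= the given norm, or 0 if there
   * is none. Assumes the list is sorted by increasing freq and norm.
   */
  static int maxFreqForNorm(List<Impact> impacts, long norm) {
    int freq = 0;
    for (Impact impact : impacts) {
      if (impact.norm > norm) {
        break;
      }
      freq = impact.freq;
    }
    return freq;
  }

  /**
   * For each unique norm value across all terms, bound the phrase frequency by
   * the minimum of the per-term frequency bounds for that norm.
   */
  static List<Impact> mergeForExactPhrase(List<List<Impact>> perTerm) {
    TreeSet<Long> norms = new TreeSet<>();
    for (List<Impact> impacts : perTerm) {
      for (Impact impact : impacts) {
        norms.add(impact.norm);
      }
    }
    List<Impact> merged = new ArrayList<>();
    for (long norm : norms) {
      int minFreq = Integer.MAX_VALUE;
      for (List<Impact> impacts : perTerm) {
        minFreq = Math.min(minFreq, maxFreqForNorm(impacts, norm));
      }
      if (minFreq == 0) {
        continue; // at least one term has no impact at or below this norm
      }
      // Skip entries dominated by the previous one (same freq, larger norm).
      if (merged.isEmpty() || minFreq > merged.get(merged.size() - 1).freq) {
        merged.add(new Impact(minFreq, norm));
      }
    }
    return merged;
  }
}
```

The phrase frequency in a document can never exceed the frequency of its rarest term in that document, so the merged (freq, norm) pairs give an upper bound that a scorer could turn into a maximum score for the phrase.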



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
