[GitHub] [lucene] zacharymorn commented on pull request #972: LUCENE-10480: Use BMM scorer for 2 clauses disjunction

GitBox Mon, 27 Jun 2022 21:11:53 -0700


zacharymorn commented on PR #972:
URL: https://github.com/apache/lucene/pull/972#issuecomment-1168197720


   > > I feel the effect would be similar?
   > 
   > Indeed, sorry I had misread your code!
   > 
   
   No worries, and thanks for the suggestion anyway!
   
   > 
   > No, it shouldn't matter. Bulk scorers sometimes help yield better
   > performance because it's easier for them to amortize computation across docs,
   > but if they don't yield better performance, there's no point in using a bulk
   > scorer instead of a regular scorer.
   
   Ok I see, makes sense.
   
   
   > I agree that it looks like a great speedup, we should get this in! The
   > benchmark only tests performance of top-level disjunctions of term queries that
   > have two clauses. I'd be curious to get performance numbers for queries like
   > the below ones to see if we need to fine-tune a bit more when this new scorer
   > gets used. Note that I don't think we need to get the performance better for
   > all these queries to merge the change, we could start by only using this new
   > scorer for the (common) case of a top-level disjunction of 2 term queries, and
   > later see if this scorer can handle more disjunctions.
   > 
   > ```
   > OrAndHigMedAndHighMed: (+including +looking) (+date +finished) # disjunction of conjunctions, which don't have as good score upper bounds as term queries
   > OrHighPhraseHighPhrase: "united states" "new york" # disjunction of phrase queries, which don't have as good score upper bounds as term queries and are slow to advance
   > AndHighOrMedMed: +be +(mostly interview) # disjunction within conjunction that leads iteration
   > AndMedOrHighHigh: +interview +(at united) # disjunction within conjunction that doesn't lead iteration
   > ```
   
   Sounds good! I ran these queries through the benchmark three times, and the
   results look reasonably consistent across runs:
   
   ```
                     Task  QPS baseline   StdDev  QPS my_modified_version   StdDev                Pct diff  p-value
   OrHighPhraseHighPhrase         28.89   (8.7%)                    24.19   (4.7%)  -16.3% ( -27% -   -3%)    0.000
          AndHighOrMedMed        101.24   (6.6%)                   101.09   (3.0%)   -0.1% (  -9% -   10%)    0.927
         AndMedOrHighHigh         81.44   (6.3%)                    81.62   (3.7%)    0.2% (  -9% -   10%)    0.895
    OrAndHigMedAndHighMed        128.26   (7.0%)                   136.94   (3.7%)    6.8% (  -3% -   18%)    0.000
                 PKLookup        221.47  (11.7%)                   236.93   (9.1%)    7.0% ( -12% -   31%)    0.035
   ```
   ```
                     Task  QPS baseline   StdDev  QPS my_modified_version   StdDev                Pct diff  p-value
   OrHighPhraseHighPhrase         27.73   (9.1%)                    23.73   (4.6%)  -14.4% ( -25% -    0%)    0.000
          AndHighOrMedMed         97.09  (13.1%)                    99.30   (4.3%)    2.3% ( -13% -   22%)    0.462
         AndMedOrHighHigh         75.87  (15.2%)                    80.04   (5.7%)    5.5% ( -13% -   31%)    0.128
                 PKLookup        219.70  (15.7%)                   238.75  (12.4%)    8.7% ( -16% -   43%)    0.053
    OrAndHigMedAndHighMed        121.83  (13.7%)                   134.79   (4.4%)   10.6% (  -6% -   33%)    0.001
   ```
   ```
                     Task  QPS baseline   StdDev  QPS my_modified_version   StdDev                Pct diff  p-value
   OrHighPhraseHighPhrase         27.42  (16.2%)                    23.99   (4.0%)  -12.5% ( -28% -    9%)    0.001
          AndHighOrMedMed         96.61  (15.8%)                   100.09   (3.6%)    3.6% ( -13% -   27%)    0.321
         AndMedOrHighHigh         75.72  (16.8%)                    79.53   (4.9%)    5.0% ( -14% -   32%)    0.200
    OrAndHigMedAndHighMed        122.33  (16.9%)                   136.60   (4.5%)   11.7% (  -8% -   39%)    0.003
                 PKLookup        207.94  (21.6%)                   233.10  (16.5%)   12.1% ( -21% -   63%)    0.046
   ```
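
As a sanity check on the tables above, here is a minimal sketch that recomputes the Pct diff column from the two mean QPS values. It assumes Pct diff is the relative change of the modified QPS against the baseline, which matches the reported numbers. `PctDiffSketch` and its methods are hypothetical names for illustration, not part of the benchmark tooling.

```java
import java.util.Locale;

// Sketch only: recomputes "Pct diff" from the mean QPS columns.
final class PctDiffSketch {
    /** Relative change of the modified QPS against the baseline QPS, in percent. */
    static double pctDiff(double baselineQps, double modifiedQps) {
        return (modifiedQps - baselineQps) / baselineQps * 100.0;
    }

    /** Formats a percentage with one decimal place, as in the tables. */
    static String format(double pct) {
        return String.format(Locale.ROOT, "%.1f%%", pct);
    }

    public static void main(String[] args) {
        // Mean QPS values copied from the first run.
        System.out.println("OrHighPhraseHighPhrase " + format(pctDiff(28.89, 24.19)));
        System.out.println("OrAndHigMedAndHighMed " + format(pctDiff(128.26, 136.94)));
    }
}
```

The range in parentheses and the p-value also depend on the StdDev columns, so they are not reproduced here.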
   
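   `OrHighPhraseHighPhrase` regresses by roughly 12-16% in every run, while the
   other tasks are flat or slightly faster. So it looks like we may need to either
   restrict the scorer to term queries only, or improve it for phrase queries?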
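
A minimal sketch of the first option, assuming the decision can be made from the kinds of the two SHOULD clauses alone. `ClauseKind` and `useNewScorer` are hypothetical stand-ins for illustration. They are not Lucene classes and not the code in this PR.

```java
import java.util.List;

// Sketch only: ClauseKind is a hypothetical stand-in, not a Lucene class.
final class TwoClauseGateSketch {
    enum ClauseKind { TERM, PHRASE, CONJUNCTION }

    /** Returns true only for a disjunction of exactly two term clauses. */
    static boolean useNewScorer(List<ClauseKind> shouldClauses) {
        return shouldClauses.size() == 2
            && shouldClauses.stream().allMatch(kind -> kind == ClauseKind.TERM);
    }

    public static void main(String[] args) {
        // Two term clauses qualify; two phrase clauses keep the existing scorer.
        System.out.println(useNewScorer(List.of(ClauseKind.TERM, ClauseKind.TERM)));
        System.out.println(useNewScorer(List.of(ClauseKind.PHRASE, ClauseKind.PHRASE)));
    }
}
```

With a gate like this, phrase and conjunction clauses would keep the existing scorer until the new one is tuned for them.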
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

