[ https://issues.apache.org/jira/browse/LUCENE-10480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17562919#comment-17562919 ]

Zach Chen edited comment on LUCENE-10480 at 7/6/22 2:15 AM:
------------------------------------------------------------

{quote}Nightly benchmarks picked up the change and top-level disjunctions are 
seeing massive speedups, see 
[OrHighHigh|http://people.apache.org/~mikemccand/lucenebench/OrHighHigh.html] 
or [OrHighMed|http://people.apache.org/~mikemccand/lucenebench/OrHighMed.html]. 
However disjunctions within conjunctions got a slowdown, see 
[AndHighOrMedMed|http://people.apache.org/~mikemccand/lucenebench/AndHighOrMedMed.html]
 or 
[AndMedOrHighHigh|http://people.apache.org/~mikemccand/lucenebench/AndMedOrHighHigh.html].
{quote}
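The results look encouraging and interesting! I copied and pasted the boolean queries from *wikinightly.tasks* into *wikimedium.10M.nostopwords.tasks*, ran the benchmark, and was able to reproduce the slowdown. In the three runs below, AndMedOrHighHigh was slower every time, while AndHighOrMedMed was slower only in the first run: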
{code:java}
                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                 AndHighOrMedMed      108.16      (6.5%)      100.44      (5.4%)   -7.1% ( -17% -    5%) 0.000
                AndMedOrHighHigh       68.37      (4.5%)       63.92      (5.0%)   -6.5% ( -15% -    3%) 0.000
                     AndHighHigh      122.90      (5.5%)      122.77      (5.5%)   -0.1% ( -10% -   11%) 0.952
                      AndHighMed      113.27      (6.4%)      114.63      (6.2%)    1.2% ( -10% -   14%) 0.546
                        PKLookup      228.08     (14.4%)      232.90     (14.7%)    2.1% ( -23% -   36%) 0.646
                      OrHighHigh       26.89      (5.7%)       48.62     (12.2%)   80.8% (  59% -  104%) 0.000
                       OrHighMed       81.18      (5.9%)      187.05     (12.2%)  130.4% ( 105% -  157%) 0.000
{code}
{code:java}
                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                AndMedOrHighHigh       85.67      (5.3%)       73.23      (5.7%)  -14.5% ( -24% -   -3%) 0.000
                        PKLookup      260.08     (13.4%)      253.74     (14.9%)   -2.4% ( -27% -   29%) 0.586
                     AndHighHigh       73.68      (4.7%)       72.70      (4.1%)   -1.3% (  -9% -    7%) 0.339
                      AndHighMed       89.52      (5.1%)       88.55      (4.4%)   -1.1% ( -10% -    8%) 0.470
                 AndHighOrMedMed       63.27      (6.5%)       70.48      (5.7%)   11.4% (   0% -   25%) 0.000
                      OrHighHigh       19.60      (5.3%)       25.62      (7.6%)   30.8% (  16% -   46%) 0.000
                       OrHighMed      121.08      (5.7%)      236.34     (10.2%)   95.2% (  74% -  117%) 0.000
{code}
{code:java}
                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                AndMedOrHighHigh       86.88      (3.4%)       76.60      (3.1%)  -11.8% ( -17% -   -5%) 0.000
                     AndHighHigh       30.49      (3.5%)       30.36      (3.5%)   -0.4% (  -7% -    6%) 0.697
                      AndHighMed      192.76      (3.4%)      193.72      (3.9%)    0.5% (  -6% -    8%) 0.671
                        PKLookup      262.59      (5.5%)      264.52      (7.9%)    0.7% ( -11% -   14%) 0.731
                 AndHighOrMedMed       65.47      (3.8%)       73.43      (3.0%)   12.2% (   5% -   19%) 0.000
                      OrHighHigh       21.47      (4.1%)       36.94      (8.3%)   72.1% (  57% -   88%) 0.000
                       OrHighMed       99.91      (4.3%)      292.05     (12.9%)  192.3% ( 167% -  218%) 0.000
{code}
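The Pct diff column in these tables matches the relative change of the two mean QPS columns, (candidate - baseline) / baseline * 100. That formula is inferred from the numbers, not taken from luceneutil's source. The small Java sketch below recomputes the column for the two disjunction-within-conjunction tasks across the three runs above. The class name PctDiff is hypothetical.

{code:java}
import java.util.Locale;

/**
 * Recomputes the "Pct diff" column from the two mean QPS columns.
 * Assumption: Pct diff = (candidate - baseline) / baseline * 100. This is
 * consistent with the reported numbers but not taken from luceneutil's source.
 */
final class PctDiff {
  static double pctDiff(double baselineQps, double candidateQps) {
    return (candidateQps - baselineQps) / baselineQps * 100.0;
  }

  public static void main(String[] args) {
    // {QPS baseline, QPS my_modified_version} for each of the three runs
    double[][] andMedOrHighHigh = {{68.37, 63.92}, {85.67, 73.23}, {86.88, 76.60}};
    double[][] andHighOrMedMed = {{108.16, 100.44}, {63.27, 70.48}, {65.47, 73.43}};
    for (int run = 0; run < 3; run++) {
      System.out.printf(Locale.ROOT, "run %d: AndMedOrHighHigh %.1f%%, AndHighOrMedMed %.1f%%%n",
          run + 1,
          pctDiff(andMedOrHighHigh[run][0], andMedOrHighHigh[run][1]),
          pctDiff(andHighOrMedMed[run][0], andHighOrMedMed[run][1]));
    }
    // run 1: AndMedOrHighHigh -6.5%, AndHighOrMedMed -7.1%
    // run 2: AndMedOrHighHigh -14.5%, AndHighOrMedMed 11.4%
    // run 3: AndMedOrHighHigh -11.8%, AndHighOrMedMed 12.2%
  }
}
{code}

The recomputed values agree with the tables to one decimal. AndMedOrHighHigh is negative in all three runs, while AndHighOrMedMed is negative only in the first.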
 

However, when I narrowed the tasks down to just the conjunction + disjunction ones (with the default number of search threads), the results actually turned positive and were similar to what I saw earlier in [https://github.com/apache/lucene/pull/972#issuecomment-1166188875]. The variance was much higher, though: StdDev ranged from 22% to 39%, versus under 15% in the runs above.
{code:java}
                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                 AndHighOrMedMed       58.65     (37.3%)       71.63     (28.9%)   22.1% ( -32% -  140%) 0.036
                AndMedOrHighHigh       36.43     (39.3%)       44.61     (30.7%)   22.4% ( -34% -  152%) 0.044
                        PKLookup      163.58     (34.4%)      211.88     (32.7%)   29.5% ( -27% -  147%) 0.005
{code}
{code:java}
                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                        PKLookup      146.51     (22.0%)      188.92     (30.1%)   28.9% ( -18% -  103%) 0.001
                AndMedOrHighHigh       35.59     (27.1%)       49.99     (37.5%)   40.4% ( -18% -  144%) 0.000
                 AndHighOrMedMed       44.47     (26.6%)       63.37     (35.8%)   42.5% ( -15% -  142%) 0.000
{code}
{code:java}
                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                AndMedOrHighHigh       35.29     (25.0%)       52.22     (33.5%)   47.9% (  -8% -  141%) 0.000
                        PKLookup      134.13     (23.6%)      204.43     (25.6%)   52.4% (   2% -  132%) 0.000
                 AndHighOrMedMed       45.96     (25.1%)       74.16     (34.8%)   61.4% (   1% -  161%) 0.000
{code}
 

When I ran one task with one query per benchmark run (there are only 5 queries for AndMedOrHighHigh in the nightly tasks), the results were also positive. The one exception is the gain for +bay +(to but), which is not statistically significant (p-value 0.262):
{code:java}
AndMedOrHighHigh: +mostly +(are last) # freq=89401 freq=1921211 freq=830278

                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                        PKLookup      149.26     (26.2%)      165.25     (31.6%)   10.7% ( -37% -   92%) 0.243
                AndMedOrHighHigh       25.53     (25.7%)       37.18     (42.5%)   45.6% ( -17% -  152%) 0.000
------
AndMedOrHighHigh: +interview +(at united) # freq=94736 freq=2834104 freq=1185528

                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                        PKLookup      241.27     (14.6%)      266.37     (10.6%)   10.4% ( -12% -   41%) 0.010
                AndMedOrHighHigh       27.52     (32.7%)       51.02     (46.2%)   85.4% (   4% -  244%) 0.000
------
AndMedOrHighHigh: +hard +(but year) # freq=92045 freq=1484398 freq=1098425

                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                        PKLookup      152.20     (15.0%)      161.92     (15.4%)    6.4% ( -20% -   43%) 0.185
                AndMedOrHighHigh       26.02     (35.0%)       38.02     (38.5%)   46.1% ( -20% -  184%) 0.000
------
AndMedOrHighHigh: +9 +(name its) # freq=541405 freq=2577591 freq=1160703

                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                        PKLookup      184.54     (32.6%)      208.37     (22.2%)   12.9% ( -31% -  100%) 0.143
                AndMedOrHighHigh       18.05     (31.2%)       24.33     (20.0%)   34.8% ( -12% -  125%) 0.000
------
AndMedOrHighHigh: +bay +(to but) # freq=117167 freq=6105155 freq=1484398

                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                        PKLookup      164.67     (15.7%)      167.94     (22.2%)    2.0% ( -31% -   47%) 0.744
                AndMedOrHighHigh       25.20     (35.3%)       28.75     (43.6%)   14.1% ( -47% -  143%) 0.262
{code}
Since the results change so much depending on which tasks run together, maybe the caching effect is worth looking into as well?
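The freq values above suggest why a disjunction nested inside a conjunction is exercised differently from a top-level one. For +mostly +(are last), the required term matches 89401 docs, while the two optional terms together match up to 1921211 + 830278 = 2751489. Assume the conjunction leads with its cheapest clause. Then the disjunction is driven almost entirely through advance() rather than nextDoc(), and at most once per doc of the lead.

The Java sketch below shows that leapfrog pattern under those assumptions. It is self-contained and leaves scoring out entirely. DocIter, ArrayIter, TwoClauseUnion and Conjunction are hypothetical names, not Lucene classes.

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical, simplified doc-ID iterator; NOT Lucene's DocIdSetIterator. */
abstract class DocIter {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;
  int advanceCalls = 0; // how many times advance() was called on this iterator

  abstract int docID();
  abstract int nextDoc();
  abstract int advance(int target); // moves to the first doc >= target
  abstract long cost();             // upper bound on the number of matching docs
}

/** Sorted doc IDs standing in for one term's postings. */
class ArrayIter extends DocIter {
  private final int[] docs;
  private int i = -1;

  ArrayIter(int... docs) { this.docs = docs; }

  int docID() { return i < 0 ? -1 : (i < docs.length ? docs[i] : NO_MORE_DOCS); }
  int nextDoc() {
    if (i < docs.length) {
      i++;
    }
    return docID();
  }
  int advance(int target) {
    advanceCalls++;
    while (i < docs.length && (i < 0 || docs[i] < target)) i++;
    return docID();
  }
  long cost() { return docs.length; }
}

/** Union of exactly two clauses: no heap, just compare the two current doc IDs. */
class TwoClauseUnion extends DocIter {
  private final DocIter a, b;
  private int doc = -1;

  TwoClauseUnion(DocIter a, DocIter b) { this.a = a; this.b = b; }

  int docID() { return doc; }
  int nextDoc() { return doc == NO_MORE_DOCS ? doc : moveTo(doc + 1); }
  int advance(int target) { advanceCalls++; return moveTo(target); }
  long cost() { return a.cost() + b.cost(); }

  private int moveTo(int target) {
    if (a.docID() < target) a.advance(target);
    if (b.docID() < target) b.advance(target);
    doc = Math.min(a.docID(), b.docID());
    return doc;
  }
}

class Conjunction {
  /** Leapfrog intersection: the cheapest clause leads, the others are only advance()d. */
  static List<Integer> intersect(DocIter... clauses) {
    DocIter[] its = clauses.clone();
    Arrays.sort(its, Comparator.comparingLong(DocIter::cost));
    DocIter lead = its[0];
    List<Integer> hits = new ArrayList<>();
    int doc = lead.nextDoc();
    outer:
    while (doc != DocIter.NO_MORE_DOCS) {
      for (int k = 1; k < its.length; k++) {
        int d = its[k].docID() < doc ? its[k].advance(doc) : its[k].docID();
        if (d > doc) {          // this clause skipped past doc: let the lead catch up
          doc = lead.advance(d);
          continue outer;
        }
      }
      hits.add(doc);            // every clause is positioned on doc
      doc = lead.nextDoc();
    }
    return hits;
  }

  public static void main(String[] args) {
    DocIter required = new ArrayIter(3, 8, 15, 42); // rare term, like +mostly
    DocIter disjunction = new TwoClauseUnion(
        new ArrayIter(1, 2, 3, 5, 8, 13, 21, 34), new ArrayIter(4, 15, 16, 23, 42, 50));
    System.out.println(intersect(required, disjunction)); // [3, 8, 15, 42]
    System.out.println(disjunction.advanceCalls);         // 4
  }
}
{code}

With 14 postings in the disjunction and 4 in the required clause, the disjunction is advanced only 4 times. In this sketch, any per-call work inside the disjunction's advance() is therefore paid once per candidate of the rare clause.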

 
{quote}maybe there are bits from advance() that we could move to matches() so 
that we would hand it over to the other clause before we start doing expensive 
operations like computing scores.
{quote}
Yup, let me give it a try and see if it changes the results.
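The suggestion quoted above maps onto the two-phase iteration idea behind Lucene's TwoPhaseIterator. A cheap approximation positions the iterator, and a separate matches() call does the expensive confirmation only once the other clauses of the conjunction agree on the doc.

The self-contained Java sketch below illustrates the effect under simplifying assumptions:

* The hypothetical ScoringClause either checks score competitiveness inside advance() (eager variant) or defers that check to matches() (two-phase variant).
* A counter records how many scores get computed.
* The class names, the fixed minCompetitiveScore and the per-doc scores are made up for illustration. This is not the code in the pull request.

{code:java}
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of moving expensive work from advance() to matches(). */
final class TwoPhaseSketch {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** A scoring clause over sorted doc IDs; NOT a Lucene class. */
  static final class ScoringClause {
    private final int[] docs;
    private final float[] scores;
    private final float minCompetitiveScore;
    private final boolean scoreInAdvance; // true: eager variant, false: two-phase variant
    private int i = -1;
    int scoreComputations = 0;

    ScoringClause(int[] docs, float[] scores, float minCompetitiveScore, boolean scoreInAdvance) {
      this.docs = docs;
      this.scores = scores;
      this.minCompetitiveScore = minCompetitiveScore;
      this.scoreInAdvance = scoreInAdvance;
    }

    int docID() { return i < 0 ? -1 : (i < docs.length ? docs[i] : NO_MORE_DOCS); }

    /** Stands in for an expensive step such as computing a score. */
    private boolean competitive() {
      scoreComputations++;
      return scores[i] >= minCompetitiveScore;
    }

    /** Eager: also skips non-competitive docs here. Two-phase: only finds the first doc >= target. */
    int advance(int target) {
      do {
        i++;
      } while (i < docs.length && (docs[i] < target || (scoreInAdvance && !competitive())));
      return docID();
    }

    /** Confirms the current doc; only the two-phase variant still has work left to do. */
    boolean matches() { return scoreInAdvance || competitive(); }
  }

  /** The rare required term leads; the scoring clause follows via advance() and matches(). */
  static List<Integer> conjunction(int[] required, ScoringClause clause) {
    List<Integer> hits = new ArrayList<>();
    for (int doc : required) {
      if (clause.docID() < doc) clause.advance(doc);
      // matches() is only reached once both clauses agree on doc
      if (clause.docID() == doc && clause.matches()) hits.add(doc);
    }
    return hits;
  }

  static ScoringClause sampleClause(boolean scoreInAdvance) {
    int[] docs = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    float[] scores = {0.5f, 0.5f, 0.5f, 0.5f, 2f, 0.5f, 0.5f, 2f, 2f, 0.5f};
    return new ScoringClause(docs, scores, 1f, scoreInAdvance);
  }

  public static void main(String[] args) {
    int[] required = {2, 5, 9};
    ScoringClause eager = sampleClause(true);
    ScoringClause twoPhase = sampleClause(false);
    System.out.println(conjunction(required, eager) + " " + eager.scoreComputations);       // [5, 9] 5
    System.out.println(conjunction(required, twoPhase) + " " + twoPhase.scoreComputations); // [5, 9] 3
  }
}
{code}

Both variants return the same hits. The eager one also scores docs 3 and 4, which the required clause never asks about, so it computes 5 scores versus 3.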



> Specialize 2-clauses disjunctions
> ---------------------------------
>
>                 Key: LUCENE-10480
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10480
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Adrien Grand
>            Priority: Minor
>          Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> WANDScorer is nice, but it also has lots of overhead to maintain its 
> invariants: one linked list for the current candidates, one priority queue of 
> scorers that are behind, another one for scorers that are ahead. All this 
> could be simplified in the 2-clauses case, which feels worth specializing for 
> as it's very common that end users enter queries that only have two terms?
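To make the simplification in the description concrete: with exactly two clauses, WANDScorer's candidate list and its two priority queues collapse into comparing two doc IDs. The Java sketch below rests on a few assumptions:

* Postings and TwoClauseDisjunction are hypothetical classes.
* The score threshold is fixed.
* There is no block-max metadata.

It also shows one MAXSCORE-style shortcut that becomes trivial with two clauses. When the clause with the lower max score cannot reach the threshold on its own, only docs of the other clause need to be visited, and the weaker clause is merely advanced to them. This illustrates the idea; it is not the scorer from the pull request.

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical postings list: sorted doc IDs with one score per doc; NOT a Lucene class. */
final class Postings {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;
  private final int[] docs;
  private final float[] scores;
  private int i = -1;

  Postings(int[] docs, float[] scores) { this.docs = docs; this.scores = scores; }

  int docID() { return i < 0 ? -1 : (i < docs.length ? docs[i] : NO_MORE_DOCS); }
  float score() { return scores[i]; }

  float maxScore() {
    float max = 0f;
    for (float s : scores) max = Math.max(max, s);
    return max;
  }

  /** Moves to the first doc >= target (no-op if already there). */
  int advance(int target) {
    while (docID() < target) i++;
    return docID();
  }
}

/** Two-clause disjunction: no linked list, no heaps, just two doc IDs to compare. */
final class TwoClauseDisjunction {
  private final boolean pruning;
  int candidatesVisited = 0;

  TwoClauseDisjunction(boolean pruning) { this.pruning = pruning; }

  /** Returns docs whose summed score is >= minScore (a fixed threshold, for simplicity). */
  List<Integer> search(Postings a, Postings b, float minScore) {
    if (a.maxScore() < b.maxScore()) { Postings tmp = a; a = b; b = tmp; } // b = weaker clause
    // MAXSCORE-style check: if b alone can never reach minScore, every hit must contain a.
    boolean onlyDocsOfA = pruning && b.maxScore() < minScore;
    List<Integer> hits = new ArrayList<>();
    int target = 0;
    while (true) {
      int doc;
      if (onlyDocsOfA) {
        doc = a.advance(target);
        b.advance(doc); // the weaker clause is only advanced to a's docs
      } else {
        doc = Math.min(a.advance(target), b.advance(target));
      }
      if (doc == Postings.NO_MORE_DOCS) break;
      candidatesVisited++;
      float score = 0f;
      if (a.docID() == doc) score += a.score();
      if (b.docID() == doc) score += b.score();
      if (score >= minScore) hits.add(doc);
      target = doc + 1;
    }
    return hits;
  }

  static Postings rareTerm() { return new Postings(new int[] {2, 6, 9}, new float[] {3f, 1f, 2.5f}); }

  static Postings frequentTerm() {
    int[] docs = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    float[] scores = new float[docs.length];
    Arrays.fill(scores, 0.5f);
    return new Postings(docs, scores);
  }

  public static void main(String[] args) {
    for (boolean pruning : new boolean[] {false, true}) {
      TwoClauseDisjunction d = new TwoClauseDisjunction(pruning);
      List<Integer> hits = d.search(frequentTerm(), rareTerm(), 1.2f);
      System.out.println("pruning=" + pruning + " hits=" + hits + " visited=" + d.candidatesVisited);
    }
    // pruning=false hits=[2, 6, 9] visited=10
    // pruning=true hits=[2, 6, 9] visited=3
  }
}
{code}

On the sample data both modes return the same hits. With pruning, only the 3 docs of the rarer, higher-scoring term are visited instead of all 10 docs of the union. Real scorers raise the threshold as the top-k heap fills, so the shortcut would switch on partway through a query rather than up front.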



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
