zacharymorn commented on PR #12194:
URL: https://github.com/apache/lucene/pull/12194#issuecomment-1477253989

   > I have some suggestions for moving this PR forward:
   > 
   > * Enhance CheckIndex to check that peekNextNonMatchingDocID is correct.
   > * Enhance AssertingScorer to check that peekNextNonMatchingDocID is only 
called when the iterator is positioned. Also check return values.
   > * Revert changes to bitsets and doc-value iterators, let's only focus on 
postings and negations to keep this initial PR simple? We'll add support for 
bitsets and doc-value iterators in follow-ups? Maybe we could consider 
conjunctions too for this initial PR, which are far more common than negations 
in my experience.
   > * See if we can leverage skip data to skip over longer ranges of doc IDs 
with postings.
   > * See if we can reduce the slowdown on `OrNotHighHigh` and other negations 
when the optimization does not kick in.
   > * A `quarter` field is a bit extreme, see if we can also observe good 
speedups with something less extreme like the `month` field?
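
To make the idea behind `peekNextNonMatchingDocID` concrete, here is a minimal, self-contained Java sketch of how a negated clause could be skipped in runs. It assumes the method returns the first doc ID after the current run of consecutive matching docs and may only be called on a positioned iterator. `SortedDocs` and `AndNotSketch` are hypothetical stand-ins written for this illustration, not Lucene classes, and the actual signatures in the PR may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified iterator over sorted doc IDs (not Lucene's API). */
final class SortedDocs {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;
  private final int[] docs;
  private int idx = -1; // -1 means "not positioned yet"

  SortedDocs(int... docs) {
    this.docs = docs;
  }

  int docID() {
    if (idx < 0) return -1;
    return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
  }

  /** Moves to the first doc ID >= target and returns it. */
  int advance(int target) {
    if (idx < 0) idx = 0;
    while (idx < docs.length && docs[idx] < target) idx++;
    return docID();
  }

  /**
   * Assumed semantics: every doc ID in [docID(), returned value) matches.
   * Throws when the iterator is not positioned on a doc.
   */
  int peekNextNonMatchingDocID() {
    if (idx < 0 || idx >= docs.length) {
      throw new IllegalStateException("iterator is not positioned on a doc");
    }
    int i = idx;
    while (i + 1 < docs.length && docs[i + 1] == docs[i] + 1) i++;
    return docs[i] + 1;
  }
}

final class AndNotSketch {
  /** Collects docs matching `required` but not `excluded`. */
  static List<Integer> andNot(SortedDocs required, SortedDocs excluded) {
    List<Integer> hits = new ArrayList<>();
    int doc = required.advance(0);
    while (doc != SortedDocs.NO_MORE_DOCS) {
      if (excluded.docID() < doc) excluded.advance(doc);
      if (excluded.docID() == doc) {
        // The whole run of excluded docs is skipped with a single advance.
        doc = required.advance(excluded.peekNextNonMatchingDocID());
      } else {
        hits.add(doc);
        doc = required.advance(doc + 1);
      }
    }
    return hits;
  }
}
```

The benefit grows with the length of the runs in the excluded clause, which is presumably why a dense field such as `quarter` shows larger speedups than a sparser one.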
   
   Thanks @jpountz for the suggestions! The plan makes sense to me. I have 
pushed a commit, 
https://github.com/apache/lucene/pull/12194/commits/b2184995df34cb1e42ff987c37c7c81ecb55aca6, 
that reverts the changes to bitsets, doc values, and `ReqExclScorer` (since it 
leveraged approximation and also negatively impacted `OrNotXY` queries). I 
also benchmarked the implementation against a `month` field. The latest 
benchmark results are below.
   
   Tasks (I had to use `monthPostings` here since `month` was already taken as 
a doc-values field):
   ```
   AndHighNotMonth: +last -monthPostings:jan #  freq=830278
   AndHighNotMonth: +united -monthPostings:feb #  freq=1185528
   AndHighNotMonth: +year -monthPostings:mar #  freq=1098425
   AndHighNotMonth: +its -monthPostings:apr #  freq=1160703
   AndHighNotMonth: +but -monthPostings:may #  freq=1484398
   AndMedNotMonth: +mostly -monthPostings:jun #  freq=89401
   AndMedNotMonth: +interview -monthPostings:jul #  freq=94736
   AndMedNotMonth: +9 -monthPostings:aug #  freq=541405
   AndMedNotMonth: +hard -monthPostings:sep #  freq=92045
   AndMedNotMonth: +bay -monthPostings:oct #  freq=117167
   ```
   
   Results:
   ```
                           Task    QPS baseline      StdDev    QPS my_modified_version      StdDev                Pct diff p-value
             OrHighMedDayTaxoFacets       18.68      (5.2%)       18.11      (5.5%)   -3.1% ( -13% -    8%) 0.069
                LowIntervalsOrdered      216.28      (8.6%)      211.46      (9.9%)   -2.2% ( -19% -   17%) 0.447
                MedIntervalsOrdered       84.99     (11.5%)       83.27     (12.6%)   -2.0% ( -23% -   25%) 0.597
               HighIntervalsOrdered       27.27      (8.5%)       26.73      (9.0%)   -2.0% ( -17% -   16%) 0.468
              HighTermDayOfYearSort      464.57      (3.5%)      455.78      (6.3%)   -1.9% ( -11% -    8%) 0.237
               BrowseDateTaxoFacets       18.29      (3.0%)       18.11      (3.1%)   -1.0% (  -6% -    5%) 0.304
                  HighTermTitleSort      203.48      (2.1%)      201.63      (2.1%)   -0.9% (  -5% -    3%) 0.174
          BrowseDayOfYearSSDVFacets       14.53      (7.4%)       14.40      (6.0%)   -0.9% ( -13% -   13%) 0.674
                            LowTerm     1262.59      (5.6%)     1253.06      (6.1%)   -0.8% ( -11% -   11%) 0.684
               MedTermDayTaxoFacets       29.75      (2.3%)       29.56      (1.4%)   -0.7% (  -4% -    3%) 0.283
                         AndHighMed      366.10      (4.4%)      363.76      (5.3%)   -0.6% (  -9% -    9%) 0.676
                         HighPhrase      228.16      (2.5%)      227.25      (2.8%)   -0.4% (  -5% -    4%) 0.633
                           PKLookup      314.34      (3.1%)      313.17      (3.7%)   -0.4% (  -6% -    6%) 0.730
            AndHighMedDayTaxoFacets       57.70      (2.9%)       57.52      (2.4%)   -0.3% (  -5% -    5%) 0.714
                         TermDTSort      211.54      (3.5%)      211.05      (5.1%)   -0.2% (  -8% -    8%) 0.869
           AndHighHighDayTaxoFacets       26.74      (1.7%)       26.71      (1.5%)   -0.1% (  -3% -    3%) 0.877
                          LowPhrase       20.75      (2.6%)       20.74      (2.7%)   -0.0% (  -5% -    5%) 0.958
                          MedPhrase      117.30      (2.4%)      117.36      (1.9%)    0.0% (  -4% -    4%) 0.946
                          OrHighMed      202.92      (5.4%)      203.12      (4.6%)    0.1% (  -9% -   10%) 0.950
                        AndHighHigh       74.03      (3.9%)       74.10      (4.6%)    0.1% (  -8% -    8%) 0.940
                        LowSpanNear       45.31      (1.3%)       45.45      (0.8%)    0.3% (  -1% -    2%) 0.371
                           Wildcard      190.90      (3.1%)      191.52      (3.3%)    0.3% (  -5% -    6%) 0.748
          BrowseDayOfYearTaxoFacets       14.14      (2.4%)       14.20      (3.6%)    0.4% (  -5% -    6%) 0.687
               BrowseDateSSDVFacets        4.98     (17.3%)        5.00     (17.5%)    0.5% ( -29% -   42%) 0.927
                            Respell       89.97      (2.0%)       90.42      (1.9%)    0.5% (  -3% -    4%) 0.424
                             IntNRQ       85.40     (19.5%)       85.93     (18.0%)    0.6% ( -30% -   47%) 0.916
                         OrHighHigh       45.64      (5.8%)       45.93      (3.7%)    0.6% (  -8% -   10%) 0.679
               HighTermTitleBDVSort       37.81      (1.0%)       38.09      (2.0%)    0.7% (  -2% -    3%) 0.131
                    MedSloppyPhrase       19.19      (3.2%)       19.34      (3.9%)    0.8% (  -6% -    8%) 0.485
                       HighSpanNear       10.28      (1.6%)       10.37      (1.2%)    0.8% (  -1% -    3%) 0.069
                          OrHighLow      914.57      (4.8%)      922.44      (4.1%)    0.9% (  -7% -   10%) 0.543
                   HighSloppyPhrase        8.44      (3.5%)        8.52      (4.1%)    0.9% (  -6% -    8%) 0.452
                       OrNotHighLow     1686.77      (3.6%)     1702.27      (3.7%)    0.9% (  -6% -    8%) 0.425
                    LowSloppyPhrase      240.50      (2.0%)      242.74      (2.9%)    0.9% (  -3% -    5%) 0.239
                             Fuzzy2       24.44      (1.7%)       24.68      (1.1%)    1.0% (  -1% -    3%) 0.030
                  HighTermMonthSort     3940.53      (4.8%)     3979.49      (4.7%)    1.0% (  -8% -   10%) 0.507
                      OrHighNotHigh      510.95      (4.1%)      516.14      (3.5%)    1.0% (  -6% -    8%) 0.398
                             Fuzzy1      127.91      (1.6%)      129.29      (1.3%)    1.1% (  -1% -    4%) 0.022
                       OrNotHighMed      781.43      (3.5%)      790.15      (3.7%)    1.1% (  -5% -    8%) 0.324
                      OrNotHighHigh      525.59      (3.3%)      532.45      (3.1%)    1.3% (  -4% -    8%) 0.200
                            MedTerm     1202.24      (3.7%)     1220.51      (4.8%)    1.5% (  -6% -   10%) 0.263
                       OrHighNotMed      569.55      (4.2%)      578.34      (3.7%)    1.5% (  -6% -    9%) 0.216
                        MedSpanNear      136.46      (2.2%)      138.57      (1.9%)    1.5% (  -2% -    5%) 0.019
                       OrHighNotLow      676.93      (4.2%)      687.46      (4.0%)    1.6% (  -6% -   10%) 0.233
                           HighTerm      946.83      (4.1%)      961.82      (4.9%)    1.6% (  -7% -   10%) 0.266
              BrowseMonthSSDVFacets       20.58     (12.2%)       20.94     (21.8%)    1.7% ( -28% -   40%) 0.756
                         AndHighLow     1538.93      (5.6%)     1569.90      (5.7%)    2.0% (  -8% -   14%) 0.263
              BrowseMonthTaxoFacets       16.67      (3.9%)       17.04      (4.1%)    2.2% (  -5% -   10%) 0.079
                            Prefix3     1668.64      (3.7%)     1709.38      (4.6%)    2.4% (  -5% -   11%) 0.064
                     AndMedNotMonth      879.42      (4.1%)     1142.62      (5.9%)   29.9% (  19% -   41%) 0.000
                    AndHighNotMonth      373.72      (4.9%)      600.07      (7.2%)   60.6% (  46% -   76%) 0.000
   ```
   ```
                           Task    QPS baseline      StdDev    QPS my_modified_version      StdDev                Pct diff p-value
              BrowseMonthSSDVFacets       22.81     (23.8%)       21.14     (15.0%)   -7.3% ( -37% -   41%) 0.244
              BrowseMonthTaxoFacets       17.09      (3.4%)       16.89      (4.2%)   -1.2% (  -8% -    6%) 0.332
              HighTermDayOfYearSort      532.86      (5.1%)      528.05      (6.2%)   -0.9% ( -11% -   10%) 0.614
          BrowseDayOfYearSSDVFacets       14.73      (3.9%)       14.64      (7.2%)   -0.6% ( -11% -   10%) 0.732
                         TermDTSort      226.03      (4.8%)      225.35      (4.8%)   -0.3% (  -9% -    9%) 0.842
                           PKLookup      312.28      (3.6%)      311.52      (2.5%)   -0.2% (  -6% -    6%) 0.805
                       OrHighNotMed      557.57      (5.3%)      556.42      (3.9%)   -0.2% (  -8% -    9%) 0.888
                      OrHighNotHigh      443.79      (4.5%)      443.12      (3.8%)   -0.2% (  -8% -    8%) 0.910
                             IntNRQ      119.18      (2.9%)      119.05      (3.4%)   -0.1% (  -6% -    6%) 0.910
            AndHighMedDayTaxoFacets      154.34      (3.0%)      154.17      (4.1%)   -0.1% (  -6% -    7%) 0.921
                           HighTerm      642.18      (4.7%)      641.61      (3.0%)   -0.1% (  -7% -    8%) 0.944
           AndHighHighDayTaxoFacets        9.79      (3.5%)        9.79      (3.6%)   -0.0% (  -6% -    7%) 0.970
                            MedTerm      927.37      (3.9%)      927.88      (3.5%)    0.1% (  -7% -    7%) 0.963
                             Fuzzy2      135.91      (2.6%)      136.03      (2.7%)    0.1% (  -5% -    5%) 0.911
                          MedPhrase      191.19      (2.5%)      191.39      (2.7%)    0.1% (  -4% -    5%) 0.900
                            Respell       97.25      (2.1%)       97.38      (1.4%)    0.1% (  -3% -    3%) 0.820
                       OrHighNotLow      683.76      (4.4%)      684.68      (4.2%)    0.1% (  -8% -    9%) 0.921
                             Fuzzy1      132.14      (2.6%)      132.37      (2.7%)    0.2% (  -4% -    5%) 0.835
                       OrNotHighMed      634.06      (3.8%)      635.26      (4.9%)    0.2% (  -8% -    9%) 0.892
                           Wildcard      247.85      (3.8%)      248.34      (2.1%)    0.2% (  -5% -    6%) 0.837
          BrowseDayOfYearTaxoFacets       13.96      (1.8%)       13.99      (1.5%)    0.2% (  -3% -    3%) 0.688
                         HighPhrase      203.79      (2.1%)      204.33      (2.5%)    0.3% (  -4% -    4%) 0.713
                          LowPhrase      153.51      (1.9%)      154.15      (2.6%)    0.4% (  -4% -    5%) 0.563
                LowIntervalsOrdered      146.20      (3.2%)      146.81      (3.4%)    0.4% (  -5% -    7%) 0.688
               HighTermTitleBDVSort       32.43      (1.8%)       32.57      (3.1%)    0.4% (  -4% -    5%) 0.587
               MedTermDayTaxoFacets       73.26      (1.3%)       73.62      (1.7%)    0.5% (  -2% -    3%) 0.307
                        MedSpanNear       99.50      (1.8%)      100.01      (1.8%)    0.5% (  -2% -    4%) 0.365
                            LowTerm     1091.52      (5.3%)     1097.52      (4.9%)    0.6% (  -9% -   11%) 0.733
                        LowSpanNear      278.50      (2.0%)      280.09      (2.0%)    0.6% (  -3% -    4%) 0.369
               HighIntervalsOrdered       36.70      (5.9%)       36.92      (6.5%)    0.6% ( -11% -   13%) 0.758
             OrHighMedDayTaxoFacets        6.96      (3.4%)        7.00      (2.9%)    0.6% (  -5% -    7%) 0.518
                    LowSloppyPhrase       29.02      (3.8%)       29.23      (3.8%)    0.7% (  -6% -    8%) 0.543
                          OrHighLow      601.05      (4.1%)      605.60      (5.1%)    0.8% (  -8% -   10%) 0.605
                      OrNotHighHigh      530.94      (4.3%)      535.17      (3.4%)    0.8% (  -6% -    8%) 0.515
                            Prefix3      302.43      (4.2%)      304.98      (5.8%)    0.8% (  -8% -   11%) 0.599
                   HighSloppyPhrase       56.77      (3.1%)       57.25      (3.6%)    0.8% (  -5% -    7%) 0.429
                       HighSpanNear       20.22      (1.2%)       20.39      (1.7%)    0.9% (  -2% -    3%) 0.070
                MedIntervalsOrdered      103.06      (4.5%)      104.16      (5.0%)    1.1% (  -7% -   10%) 0.473
                          OrHighMed      276.47      (2.9%)      279.58      (4.4%)    1.1% (  -5% -    8%) 0.338
                         OrHighHigh       51.21      (3.4%)       51.80      (4.2%)    1.1% (  -6% -    9%) 0.341
                         AndHighMed      230.11      (3.5%)      232.91      (4.9%)    1.2% (  -6% -    9%) 0.365
                         AndHighLow     2054.63      (4.8%)     2080.21      (4.8%)    1.2% (  -7% -   11%) 0.413
                    MedSloppyPhrase       89.55      (4.0%)       90.71      (5.1%)    1.3% (  -7% -   10%) 0.373
                  HighTermTitleSort      193.20      (3.1%)      195.79      (4.2%)    1.3% (  -5% -    8%) 0.256
                        AndHighHigh       67.40      (3.4%)       68.31      (4.4%)    1.3% (  -6% -    9%) 0.276
               BrowseDateTaxoFacets       18.16      (3.0%)       18.42      (3.1%)    1.5% (  -4% -    7%) 0.125
                  HighTermMonthSort     3913.52      (4.4%)     3974.57      (6.4%)    1.6% (  -8% -   12%) 0.369
                       OrNotHighLow     1855.59      (5.1%)     1899.00      (5.3%)    2.3% (  -7% -   13%) 0.155
               BrowseDateSSDVFacets        4.61     (19.0%)        5.32     (25.4%)   15.5% ( -24% -   73%) 0.029
                     AndMedNotMonth      865.67      (2.8%)     1088.95      (4.8%)   25.8% (  17% -   34%) 0.000
                    AndHighNotMonth       75.79      (0.6%)      275.41     (14.5%)  263.4% ( 246% -  280%) 0.000
   ```
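
For readers checking the numbers, the `Pct diff` column appears to be the relative change in mean QPS between the baseline and the modified build. The small Java sketch below works that out under that assumption. `PctDiffSketch` is a hypothetical helper written for this illustration and is not part of the benchmark tooling.

```java
/** Hypothetical helper: relative QPS change in percent, assuming Pct diff = (candidate - baseline) / baseline. */
final class PctDiffSketch {
  static double pctDiff(double baselineQps, double candidateQps) {
    return 100.0 * (candidateQps - baselineQps) / baselineQps;
  }

  public static void main(String[] args) {
    // QPS pairs copied from the AndHighNotMonth and AndMedNotMonth rows above.
    double[][] rows = {
      {373.72, 600.07}, {879.42, 1142.62}, {75.79, 275.41}, {865.67, 1088.95}
    };
    for (double[] row : rows) {
      System.out.printf("%.2f -> %.2f: %.1f%%%n", row[0], row[1], pctDiff(row[0], row[1]));
    }
  }
}
```

Under this assumption the computed values agree with the reported `Pct diff` figures for both `AndHighNotMonth` and `AndMedNotMonth` to within rounding.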
   
   I'll work on the rest in the next few days.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

