[GitHub] [lucene] mikemccand commented on issue #12527: Optimize readInts24 performance for DocIdsWriter

via GitHub Thu, 31 Aug 2023 04:35:40 -0700


mikemccand commented on issue #12527:
URL: https://github.com/apache/lucene/issues/12527#issuecomment-1700871714


I like this idea, since it reduces possible IO overhead. But I tested it with `luceneutil` on `wikimediumall`:
   
```
                            Task    QPS base      StdDev   QPS base2      StdDev                Pct diff p-value
                          IntNRQ       76.78     (10.7%)       62.62      (7.6%)  -18.4% ( -33% -    0%) 0.000
                       OrHighLow      304.44      (4.9%)      298.02      (2.5%)   -2.1% (  -9% -    5%) 0.085
                      OrHighHigh       34.42      (6.2%)       33.75      (4.6%)   -1.9% ( -12% -    9%) 0.265
                       OrHighMed      132.82      (4.7%)      130.46      (2.9%)   -1.8% (  -9% -    6%) 0.153
                        HighTerm      774.84      (5.6%)      764.08      (5.7%)   -1.4% ( -12% -   10%) 0.437
                         MedTerm      903.16      (5.6%)      890.65      (5.7%)   -1.4% ( -12% -   10%) 0.438
                    OrHighNotLow      513.31      (7.3%)      506.82      (5.9%)   -1.3% ( -13% -   12%) 0.546
     BrowseRandomLabelSSDVFacets        9.36      (4.6%)        9.25      (5.0%)   -1.2% ( -10% -    8%) 0.420
                     AndHighHigh       36.69      (3.1%)       36.28      (3.9%)   -1.1% (  -7% -    6%) 0.311
                      AndHighMed      247.06      (2.4%)      244.49      (3.2%)   -1.0% (  -6% -    4%) 0.251
                   OrNotHighHigh      446.63      (4.6%)      442.08      (3.8%)   -1.0% (  -9% -    7%) 0.448
                    OrHighNotMed      506.58      (7.3%)      501.84      (6.4%)   -0.9% ( -13% -   13%) 0.666
            BrowseDateTaxoFacets        8.32      (5.8%)        8.25      (5.0%)   -0.9% ( -10% -   10%) 0.611
       BrowseDayOfYearTaxoFacets        8.29      (5.7%)        8.22      (4.9%)   -0.9% ( -10% -   10%) 0.614
                   OrHighNotHigh      457.35      (5.8%)      453.90      (4.9%)   -0.8% ( -10% -   10%) 0.658
                    OrNotHighMed      412.12      (2.5%)      409.12      (2.0%)   -0.7% (  -5% -    3%) 0.315
                         LowTerm      705.46      (3.6%)      701.45      (3.9%)   -0.6% (  -7% -    7%) 0.632
          OrHighMedDayTaxoFacets        9.38      (3.5%)        9.34      (4.6%)   -0.5% (  -8% -    7%) 0.715
                HighSloppyPhrase       10.43      (3.4%)       10.39      (3.4%)   -0.4% (  -6% -    6%) 0.714
     BrowseRandomLabelTaxoFacets        7.59      (3.8%)        7.56      (3.3%)   -0.4% (  -7% -    6%) 0.731
                     MedSpanNear       78.40      (2.6%)       78.11      (1.8%)   -0.4% (  -4% -    4%) 0.589
            BrowseDateSSDVFacets        2.18      (1.6%)        2.18      (1.1%)   -0.4% (  -3% -    2%) 0.399
            HighTermTitleBDVSort        9.16      (1.7%)        9.13      (1.9%)   -0.3% (  -3% -    3%) 0.551
                        PKLookup      253.13      (0.9%)      252.32      (0.8%)   -0.3% (  -1% -    1%) 0.215
                    OrNotHighLow      428.27      (2.2%)      427.00      (1.6%)   -0.3% (  -4% -    3%) 0.627
                     LowSpanNear       16.45      (2.5%)       16.41      (1.7%)   -0.3% (  -4% -    4%) 0.695
            HighIntervalsOrdered       12.58      (2.9%)       12.55      (3.5%)   -0.3% (  -6% -    6%) 0.797
                    HighSpanNear       19.81      (2.1%)       19.78      (1.7%)   -0.2% (  -3% -    3%) 0.794
                 LowSloppyPhrase       38.11      (1.9%)       38.06      (2.1%)   -0.1% (  -4% -    3%) 0.846
               HighTermTitleSort      263.09      (2.4%)      262.91      (2.2%)   -0.1% (  -4% -    4%) 0.927
                      TermDTSort      482.30      (0.8%)      482.18      (0.7%)   -0.0% (  -1% -    1%) 0.919
               HighTermMonthSort     3449.43      (1.1%)     3448.77      (0.7%)   -0.0% (  -1% -    1%) 0.948
                      AndHighLow      863.72      (0.9%)      863.69      (1.2%)   -0.0% (  -2% -    2%) 0.991
                 MedSloppyPhrase       46.06      (4.3%)       46.06      (4.4%)   -0.0% (  -8% -    9%) 1.000
                       MedPhrase       32.96      (2.6%)       32.96      (2.9%)    0.0% (  -5% -    5%) 1.000
                          Fuzzy2       53.68      (1.2%)       53.68      (1.2%)    0.0% (  -2% -    2%) 0.995
           BrowseMonthTaxoFacets        8.25      (0.2%)        8.25      (0.2%)    0.0% (   0% -    0%) 0.789
           HighTermDayOfYearSort      751.68      (0.8%)      752.01      (0.8%)    0.0% (  -1% -    1%) 0.866
        AndHighHighDayTaxoFacets        7.74      (2.5%)        7.75      (1.7%)    0.1% (  -4% -    4%) 0.935
             LowIntervalsOrdered       26.92      (2.0%)       26.94      (2.2%)    0.1% (  -4% -    4%) 0.900
            MedTermDayTaxoFacets       31.71      (2.9%)       31.74      (2.3%)    0.1% (  -4% -    5%) 0.912
                        Wildcard       38.55      (2.2%)       38.60      (1.9%)    0.1% (  -3% -    4%) 0.843
             MedIntervalsOrdered        4.74      (2.2%)        4.74      (2.2%)    0.2% (  -4% -    4%) 0.817
           BrowseMonthSSDVFacets       14.04      (9.1%)       14.07     (11.7%)    0.2% ( -18% -   23%) 0.948
                         Respell       39.73      (1.2%)       39.81      (1.3%)    0.2% (  -2% -    2%) 0.568
                          Fuzzy1       62.49      (1.3%)       62.67      (1.4%)    0.3% (  -2% -    2%) 0.484
         AndHighMedDayTaxoFacets       42.88      (1.4%)       43.02      (1.1%)    0.3% (  -2% -    2%) 0.431
                       LowPhrase       20.89      (3.3%)       20.95      (3.8%)    0.3% (  -6% -    7%) 0.775
                      HighPhrase       66.24      (3.4%)       66.51      (3.6%)    0.4% (  -6% -    7%) 0.713
                         Prefix3      198.81      (2.2%)      199.70      (1.8%)    0.4% (  -3% -    4%) 0.487
       BrowseDayOfYearSSDVFacets       12.50      (4.8%)       12.73     (10.6%)    1.8% ( -12% -   18%) 0.487
```
   
It looks like `IntNRQ` (which I think is the only task using the BKD tree / points for range filtering) is hurt by it with high confidence (`p=0.000`): an 18.4% QPS drop. I'm surprised the impact was THAT large.
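For readers new to `luceneutil` output: the `Pct diff` column appears to be the relative change in mean QPS from the first competitor (`base`) to the second (`base2`, presumably the build with the change). The sketch below recomputes it for three rows under that assumption; `PctDiff` and `pctDiff` are hypothetical names for illustration, not part of `luceneutil`.

```java
// Minimal sketch (hypothetical class, not part of luceneutil): recompute the
// "Pct diff" column, ASSUMING it is the relative change in mean QPS from the
// first competitor (base) to the second (base2).
public class PctDiff {

    // Relative change of the candidate's QPS versus the baseline, in percent.
    static double pctDiff(double qpsBase, double qpsCandidate) {
        return 100.0 * (qpsCandidate - qpsBase) / qpsBase;
    }

    public static void main(String[] args) {
        // Mean QPS values copied from the benchmark table: {base, base2}.
        String[] tasks = {"IntNRQ", "PKLookup", "BrowseDayOfYearSSDVFacets"};
        double[][] qps = {{76.78, 62.62}, {253.13, 252.32}, {12.50, 12.73}};
        for (int i = 0; i < tasks.length; i++) {
            // One decimal place, like the table's Pct diff column.
            System.out.printf("%s: %.1f%%%n", tasks[i], pctDiff(qps[i][0], qps[i][1]));
        }
    }
}
```

With this definition the three rows come out as -18.4%, -0.3% and 1.8%, matching the table. The bracketed range and the p-value also depend on each run's StdDev, so this sketch does not try to reproduce them.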


