[GitHub] [lucene] tang-hi commented on pull request #12417: forutil add vectorized and scalar code

via GitHub Tue, 11 Jul 2023 08:19:46 -0700


tang-hi commented on PR #12417:
URL: https://github.com/apache/lucene/pull/12417#issuecomment-1631024688


   Hi, everyone. I tried the lazy compute idea that I mentioned before. First, 
I changed the code on the main branch to use lazy compute, but the benchmark 
results didn't show much difference. Then I applied the lazy compute algorithm 
to the vectorized code, and the benchmark results showed improved performance. 
However, I was surprised to see that the results for Prefix3 were not good. 
After that, I tested the scalar code, and the benchmark results showed a 
decrease in performance. The results of these two benchmarks are below.
   
   ## vectorized
   
                               Task    QPS baseline      StdDev    QPS my_modified_version      StdDev                Pct diff p-value
                            Prefix3       90.71      (2.4%)       80.20      (4.7%)  -11.6% ( -18% -   -4%) 0.000
          BrowseDayOfYearSSDVFacets        6.89     (13.1%)        6.65     (10.2%)   -3.5% ( -23% -   22%) 0.348
                           Wildcard       56.40      (3.0%)       54.70      (2.9%)   -3.0% (  -8% -    3%) 0.001
               BrowseDateTaxoFacets        5.72     (11.1%)        5.58      (7.9%)   -2.5% ( -19% -   18%) 0.415
        BrowseRandomLabelTaxoFacets        5.54      (8.6%)        5.42      (7.4%)   -2.3% ( -16% -   14%) 0.369
          BrowseDayOfYearTaxoFacets        5.73     (10.3%)        5.61      (7.5%)   -2.1% ( -18% -   17%) 0.454
              BrowseMonthSSDVFacets        7.01     (11.3%)        6.86     (11.9%)   -2.1% ( -22% -   23%) 0.567
                             IntNRQ       42.16      (4.9%)       41.72      (2.7%)   -1.1% (  -8% -    6%) 0.397
                             Fuzzy2       81.88      (1.8%)       82.32      (2.3%)    0.5% (  -3% -    4%) 0.411
                MedIntervalsOrdered       10.00      (5.2%)       10.05      (4.0%)    0.5% (  -8% -   10%) 0.711
                             Fuzzy1      114.12      (1.8%)      114.78      (2.4%)    0.6% (  -3% -    4%) 0.385
        BrowseRandomLabelSSDVFacets        5.31      (5.5%)        5.35      (8.0%)    0.7% ( -12% -   14%) 0.746
                         TermDTSort      159.43      (5.3%)      160.63      (4.0%)    0.8% (  -8% -   10%) 0.612
               MedTermDayTaxoFacets       11.07      (3.0%)       11.16      (2.8%)    0.8% (  -4% -    6%) 0.381
                  HighTermTitleSort      100.01      (3.8%)      101.02      (4.6%)    1.0% (  -7% -    9%) 0.447
                        MedSpanNear       45.12      (2.5%)       45.61      (2.3%)    1.1% (  -3% -    6%) 0.157
               HighIntervalsOrdered       14.06      (4.6%)       14.22      (4.3%)    1.2% (  -7% -   10%) 0.410
               HighTermTitleBDVSort       12.39      (1.7%)       12.54      (2.9%)    1.2% (  -3% -    5%) 0.104
                       HighSpanNear       17.67      (2.2%)       17.90      (2.0%)    1.3% (  -2% -    5%) 0.047
                            Respell       68.69      (1.5%)       69.62      (1.5%)    1.4% (  -1% -    4%) 0.005
               BrowseDateSSDVFacets        1.78     (13.0%)        1.81     (11.8%)    1.6% ( -20% -   30%) 0.676
           AndHighHighDayTaxoFacets        3.79      (3.4%)        3.86      (3.4%)    2.0% (  -4% -    9%) 0.067
            AndHighMedDayTaxoFacets       21.71      (2.5%)       22.16      (2.4%)    2.1% (  -2% -    7%) 0.007
             OrHighMedDayTaxoFacets        8.33      (5.1%)        8.52      (3.5%)    2.3% (  -6% -   11%) 0.103
                           PKLookup      266.99      (2.7%)      273.32      (2.4%)    2.4% (  -2% -    7%) 0.003
                       OrHighNotLow      536.68      (3.5%)      551.31      (4.7%)    2.7% (  -5% -   11%) 0.037
                           HighTerm      772.12      (2.2%)      795.02      (3.7%)    3.0% (  -2% -    9%) 0.002
                            LowTerm      777.24      (2.1%)      803.48      (4.0%)    3.4% (  -2% -    9%) 0.001
                            MedTerm      625.99      (3.1%)      647.55      (5.4%)    3.4% (  -4% -   12%) 0.013
                    MedSloppyPhrase       18.33      (1.8%)       18.98      (2.8%)    3.5% (   0% -    8%) 0.000
                       OrHighNotMed      461.93      (3.6%)      478.49      (4.2%)    3.6% (  -4% -   11%) 0.004
                      OrHighNotHigh      526.32      (2.9%)      546.51      (3.7%)    3.8% (  -2% -   10%) 0.000
                       OrNotHighMed      417.97      (2.6%)      434.74      (3.1%)    4.0% (  -1% -    9%) 0.000
                      OrNotHighHigh      514.95      (2.9%)      535.68      (2.7%)    4.0% (  -1% -    9%) 0.000
                         OrHighHigh       37.60      (3.5%)       39.18      (4.5%)    4.2% (  -3% -   12%) 0.001
                   HighSloppyPhrase        2.28      (2.3%)        2.38      (2.4%)    4.3% (   0% -    9%) 0.000
                        AndHighHigh       38.76      (2.1%)       40.63      (2.8%)    4.8% (   0% -    9%) 0.000
              HighTermDayOfYearSort      343.83      (2.4%)      360.72      (4.7%)    4.9% (  -2% -   12%) 0.000
                    LowSloppyPhrase       60.08      (1.6%)       63.04      (2.2%)    4.9% (   1% -    8%) 0.000
                       OrNotHighLow      647.85      (1.7%)      680.90      (2.5%)    5.1% (   0% -    9%) 0.000
                LowIntervalsOrdered        5.71      (3.6%)        6.00      (2.8%)    5.1% (  -1% -   11%) 0.000
                         HighPhrase       28.70      (1.7%)       30.18      (1.7%)    5.2% (   1% -    8%) 0.000
                        LowSpanNear        7.41      (2.1%)        7.80      (1.9%)    5.3% (   1% -    9%) 0.000
                          MedPhrase       11.08      (1.5%)       11.71      (2.1%)    5.8% (   2% -    9%) 0.000
                          OrHighLow      520.52      (1.6%)      552.94      (3.5%)    6.2% (   1% -   11%) 0.000
                         AndHighLow     1225.95      (2.7%)     1308.18      (5.8%)    6.7% (  -1% -   15%) 0.000
                  HighTermMonthSort     3123.59      (2.9%)     3381.95      (5.3%)    8.3% (   0% -   16%) 0.000
                          LowPhrase       27.73      (1.3%)       30.05      (2.2%)    8.4% (   4% -   12%) 0.000
                         AndHighMed      109.92      (1.8%)      119.78      (2.7%)    9.0% (   4% -   13%) 0.000
                          OrHighMed      111.14      (2.2%)      121.78      (3.3%)    9.6% (   3% -   15%) 0.000
              BrowseMonthTaxoFacets       16.77     (28.5%)       18.90      (1.4%)   12.7% ( -13% -   59%) 0.046
   
   ## scalar
   
                               Task    QPS baseline      StdDev    QPS my_modified_version      StdDev                Pct diff p-value
                    LowSloppyPhrase        9.77      (4.4%)        5.73      (2.1%)  -41.4% ( -45% -  -36%) 0.000
                        MedSpanNear       11.23      (4.6%)        6.95      (2.3%)  -38.2% ( -43% -  -32%) 0.000
                         AndHighMed      123.37      (8.0%)       76.65      (2.0%)  -37.9% ( -44% -  -30%) 0.000
            AndHighMedDayTaxoFacets       26.15      (2.0%)       16.34      (1.6%)  -37.5% ( -40% -  -34%) 0.000
                       HighSpanNear       13.97      (3.2%)        9.09      (1.8%)  -34.9% ( -38% -  -30%) 0.000
                MedIntervalsOrdered       21.95      (3.5%)       14.81      (2.7%)  -32.5% ( -37% -  -27%) 0.000
                         AndHighLow     1070.91      (3.9%)      732.57      (2.7%)  -31.6% ( -36% -  -25%) 0.000
                          LowPhrase       20.41      (2.9%)       14.15      (2.3%)  -30.7% ( -34% -  -26%) 0.000
                    MedSloppyPhrase       28.12      (2.8%)       19.93      (1.2%)  -29.1% ( -32% -  -25%) 0.000
                LowIntervalsOrdered        3.36      (4.6%)        2.39      (3.3%)  -28.8% ( -35% -  -21%) 0.000
                       OrNotHighLow      640.26      (2.8%)      488.06      (1.8%)  -23.8% ( -27% -  -19%) 0.000
           AndHighHighDayTaxoFacets       10.74      (2.1%)        8.19      (2.1%)  -23.7% ( -27% -  -19%) 0.000
                        LowSpanNear      277.31      (1.8%)      212.48      (1.9%)  -23.4% ( -26% -  -20%) 0.000
                        AndHighHigh       25.31      (5.6%)       19.62      (2.7%)  -22.5% ( -29% -  -15%) 0.000
                       OrNotHighMed      528.99      (2.2%)      414.93      (2.7%)  -21.6% ( -25% -  -17%) 0.000
                          OrHighLow      408.04      (3.1%)      322.73      (2.7%)  -20.9% ( -25% -  -15%) 0.000
                   HighSloppyPhrase       13.08      (3.5%)       10.65      (2.0%)  -18.6% ( -23% -  -13%) 0.000
                         OrHighHigh       23.91      (5.3%)       19.49      (2.5%)  -18.5% ( -24% -  -11%) 0.000
                          MedPhrase      187.62      (2.3%)      153.52      (1.7%)  -18.2% ( -21% -  -14%) 0.000
                          OrHighMed       31.30      (4.8%)       26.68      (3.4%)  -14.8% ( -21% -   -6%) 0.000
               HighIntervalsOrdered        0.74      (5.2%)        0.64      (4.7%)  -14.1% ( -22% -   -4%) 0.000
              HighTermDayOfYearSort      327.16      (2.4%)      288.14      (3.1%)  -11.9% ( -16% -   -6%) 0.000
                  HighTermTitleSort      105.28      (6.8%)       93.57      (5.1%)  -11.1% ( -21% -    0%) 0.000
                         HighPhrase      173.78      (2.5%)      156.10      (1.4%)  -10.2% ( -13% -   -6%) 0.000
                         TermDTSort      166.57      (5.5%)      151.19      (4.4%)   -9.2% ( -18% -    0%) 0.000
                            MedTerm      641.21      (3.4%)      583.88      (3.2%)   -8.9% ( -15% -   -2%) 0.000
                      OrNotHighHigh      533.02      (2.8%)      486.18      (2.0%)   -8.8% ( -13% -   -4%) 0.000
                       OrHighNotLow      516.66      (2.9%)      472.41      (4.7%)   -8.6% ( -15% -    0%) 0.000
             OrHighMedDayTaxoFacets        8.58      (3.7%)        7.85      (3.8%)   -8.4% ( -15% -    0%) 0.000
                       OrHighNotMed      496.57      (3.1%)      457.70      (2.7%)   -7.8% ( -13% -   -2%) 0.000
                           HighTerm      587.86      (4.2%)      542.76      (3.0%)   -7.7% ( -14% -    0%) 0.000
                      OrHighNotHigh      770.22      (2.5%)      723.29      (2.9%)   -6.1% ( -11% -    0%) 0.000
                             Fuzzy2       31.75      (2.2%)       29.98      (5.6%)   -5.6% ( -13% -    2%) 0.000
               MedTermDayTaxoFacets       38.41      (1.8%)       36.45      (1.4%)   -5.1% (  -8% -   -1%) 0.000
                            LowTerm      900.74      (2.3%)      866.32      (3.2%)   -3.8% (  -9% -    1%) 0.000
               HighTermTitleBDVSort        7.78      (2.9%)        7.61      (3.6%)   -2.2% (  -8% -    4%) 0.034
              BrowseMonthSSDVFacets        9.22      (6.5%)        9.08      (7.8%)   -1.4% ( -14% -   13%) 0.532
                           PKLookup      260.37      (3.8%)      258.96      (3.6%)   -0.5% (  -7% -    7%) 0.645
                            Respell       54.02      (2.2%)       53.76      (2.2%)   -0.5% (  -4% -    4%) 0.492
                             Fuzzy1       81.60      (2.4%)       81.29      (1.7%)   -0.4% (  -4% -    3%) 0.557
        BrowseRandomLabelSSDVFacets        6.07      (8.1%)        6.05      (8.3%)   -0.3% ( -15% -   17%) 0.894
          BrowseDayOfYearSSDVFacets        8.14      (6.9%)        8.12      (7.3%)   -0.3% ( -13% -   14%) 0.885
                  HighTermMonthSort     2956.82      (4.0%)     2967.31      (4.3%)    0.4% (  -7% -    9%) 0.787
                           Wildcard       73.82      (3.0%)       75.25      (3.5%)    1.9% (  -4% -    8%) 0.061
                            Prefix3      190.39      (4.0%)      195.04      (3.9%)    2.4% (  -5% -   10%) 0.050
        BrowseRandomLabelTaxoFacets        5.77     (11.3%)        5.92     (10.1%)    2.6% ( -16% -   27%) 0.441
               BrowseDateTaxoFacets        6.26     (15.9%)        6.43     (15.7%)    2.7% ( -24% -   40%) 0.588
          BrowseDayOfYearTaxoFacets        6.28     (15.9%)        6.47     (16.0%)    2.9% ( -25% -   41%) 0.565
              BrowseMonthTaxoFacets       10.24     (52.4%)       10.54     (54.2%)    2.9% ( -68% -  230%) 0.862
                             IntNRQ       43.93     (15.3%)       45.33      (8.8%)    3.2% ( -18% -   32%) 0.419
               BrowseDateSSDVFacets        1.57     (11.2%)        1.64     (13.6%)    4.6% ( -18% -   33%) 0.244
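
For anyone reading the tables, here is a minimal sketch of how the Pct diff column can be sanity-checked from the two QPS columns. It assumes Pct diff is simply the relative change of the mean QPS against the baseline; the class and method names are hypothetical and are not part of luceneutil. The inputs are the Prefix3 rows from the two runs above.

```java
import java.util.Locale;

public class PctDiffCheck {
    // Relative change of the modified QPS against the baseline QPS, in percent.
    // Assumption: this matches how the "Pct diff" column is derived.
    static double pctDiff(double baselineQps, double modifiedQps) {
        return (modifiedQps - baselineQps) / baselineQps * 100.0;
    }

    public static void main(String[] args) {
        // Prefix3 in the vectorized run: baseline 90.71, modified 80.20
        System.out.println(String.format(Locale.ROOT,
                "vectorized Prefix3: %.1f%%", pctDiff(90.71, 80.20)));
        // Prefix3 in the scalar run: baseline 190.39, modified 195.04
        System.out.println(String.format(Locale.ROOT,
                "scalar Prefix3: %.1f%%", pctDiff(190.39, 195.04)));
    }
}
```

Under that assumption the two values round to the -11.6% and 2.4% reported for Prefix3, so the regression in the vectorized run is in the measured QPS itself rather than an artifact of how the table is printed.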
   
   
   My current conclusions are:
   1. Lazy compute works well when combined with vectors.
   2. Perhaps we can improve the performance of the scalar code. The current 
version may still have room for improvement, but I don't have any ideas at the 
moment. If you have any suggestions, please feel free to propose them or 
commit directly to this branch.
   3. The benchmark results for Prefix3 seem odd: Prefix3 regresses only in the 
vectorized version, while in the scalar version, where most other tasks get 
slower, its performance doesn't decrease. Any clues?
   
   I would love to hear your opinions and welcome any ideas.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

