Re: [PR] Speedup PriorityQueue a little [lucene]

via GitHub Sun, 20 Oct 2024 15:20:58 -0700


original-brownbear commented on code in PR #13936:
URL: https://github.com/apache/lucene/pull/13936#discussion_r1807955670



##########
lucene/core/src/java/org/apache/lucene/util/PriorityQueue.java:
##########
@@ -270,7 +280,7 @@ public final boolean remove(T element) {
     return false;
   }
 
-  private final boolean upHeap(int origPos) {
+  private boolean upHeap(int origPos, T[] heap) {

Review Comment:
   Whether or not it's worth doing this kind of optimization for the observed 
gain is a tricky question. From the perspective of a user of a large (and 
read-heavy) ES, Opensearch or similar deployment, an O(1%) gain might translate 
into a lot of dollars saved and this kind of thing is well worth the effort.
   Personally, an extra 10 lines of code for the observed speedups seems like a 
reasonable deal, but that's admittedly quite subjective. Maybe a stronger 
argument would be: optimizing this kind of thing in core hot code removes 
potential bottlenecks from the system, enabling other optimizations. If the 
core logic puts massive pressure on e.g. the CPU cache then optimizations (or 
regressions!) in higher-level code are masked on CPUs with smaller caches. So 
doing a 1% optimization and living with slightly more complicated code makes 
more sense here than a 1% gain would in more "peripheral" code. Also, you could 
use that same angle and argue that this code hardly ever gets touched, so the 
maintenance burden added matters less than it would elsewhere.
   
   That said :) as far as the technical details go I don't think it can hoist 
out those reads and it's not an exclusively C2/hotstpot specific thing either. 
Since Java allows using reflection to update final field values (except for 
fields that are either static, on a record or on a hidden classs) the compiler 
can't hoist the field access out of the loop I think (maybe in some happy cases 
escape analysis helps here).
   You can make the JIT hoist these things via `-XX:+TrustFinalNonStaticFields` 
which gives me a result like (main vs main with that flag set).
   
   <details>
   <summary>results</summary>
   <pre>
                               TaskQPS baseline      StdDevQPS 
my_modified_version      StdDev                Pct diff p-value
               HighTermTitleBDVSort       25.72      (7.0%)       25.77      
(6.5%)    0.2% ( -12% -   14%) 0.897
        BrowseRandomLabelSSDVFacets        3.37      (6.0%)        3.40      
(4.2%)    1.0% (  -8% -   11%) 0.409
                          OrHighMed      214.06      (3.7%)      216.37      
(3.8%)    1.1% (  -6% -    8%) 0.206
                      OrHighNotHigh      353.55      (8.4%)      358.52      
(8.9%)    1.4% ( -14% -   20%) 0.475
                        AndHighHigh      111.32      (4.9%)      113.08      
(5.5%)    1.6% (  -8% -   12%) 0.179
                      OrNotHighHigh      567.88      (4.8%)      577.94      
(4.9%)    1.8% (  -7% -   12%) 0.108
                           PKLookup      241.21      (2.1%)      245.53      
(2.1%)    1.8% (  -2% -    6%) 0.000
                           HighTerm      455.94      (6.6%)      464.35      
(7.5%)    1.8% ( -11% -   17%) 0.250
                            MedTerm      590.06      (6.5%)      601.24      
(6.0%)    1.9% (  -9% -   15%) 0.182
                         AndHighMed      156.22      (3.1%)      159.19      
(2.9%)    1.9% (  -3% -    8%) 0.005
                            LowTerm      750.87      (4.6%)      765.45      
(4.2%)    1.9% (  -6% -   11%) 0.052
        BrowseRandomLabelTaxoFacets        4.48      (8.6%)        4.57      
(3.9%)    2.0% (  -9% -   15%) 0.182
                       OrNotHighMed      479.29      (4.6%)      489.00      
(5.4%)    2.0% (  -7% -   12%) 0.074
                  HighTermMonthSort     1515.68      (6.4%)     1546.97      
(7.0%)    2.1% ( -10% -   16%) 0.171
                         OrHighHigh       85.48      (4.6%)       87.32      
(5.3%)    2.2% (  -7% -   12%) 0.055
               MedTermDayTaxoFacets       19.13      (3.0%)       19.55      
(4.1%)    2.2% (  -4% -    9%) 0.007
                MedIntervalsOrdered       28.59      (6.3%)       29.23      
(4.7%)    2.2% (  -8% -   14%) 0.079
                          OrHighLow      610.70      (5.0%)      624.94      
(5.0%)    2.3% (  -7% -   13%) 0.040
                       OrHighNotMed      474.52      (5.5%)      485.78      
(5.7%)    2.4% (  -8% -   14%) 0.061
                             Fuzzy2       66.51      (3.2%)       68.09      
(3.0%)    2.4% (  -3% -    8%) 0.001
               BrowseDateSSDVFacets        1.24      (7.7%)        1.27      
(8.1%)    2.4% ( -12% -   19%) 0.181
                        MedSpanNear      119.05      (4.4%)      121.94      
(4.4%)    2.4% (  -6% -   11%) 0.016
                  HighTermTitleSort       76.83      (4.8%)       78.72      
(3.7%)    2.5% (  -5% -   11%) 0.011
           AndHighHighDayTaxoFacets       14.60      (3.8%)       14.96      
(3.5%)    2.5% (  -4% -   10%) 0.003
              BrowseMonthTaxoFacets       11.04     (38.5%)       11.32     
(40.3%)    2.5% ( -55% -  132%) 0.778
                       OrNotHighLow     1089.24      (4.0%)     1117.30      
(4.0%)    2.6% (  -5% -   10%) 0.004
                         TermDTSort      188.79      (4.6%)      193.74      
(4.9%)    2.6% (  -6% -   12%) 0.015
                           Wildcard      426.59      (4.2%)      437.79      
(4.2%)    2.6% (  -5% -   11%) 0.006
                          MedPhrase       78.10      (3.4%)       80.38      
(3.2%)    2.9% (  -3% -    9%) 0.000
                            Prefix3     1068.70      (7.7%)     1100.07      
(7.7%)    2.9% ( -11% -   19%) 0.094
                         AndHighLow     1546.10      (5.3%)     1591.97      
(6.0%)    3.0% (  -7% -   15%) 0.020
                LowIntervalsOrdered      134.11      (6.2%)      138.10      
(5.0%)    3.0% (  -7% -   15%) 0.019
                    MedSloppyPhrase       47.07      (4.5%)       48.49      
(3.7%)    3.0% (  -5% -   11%) 0.001
            AndHighMedDayTaxoFacets       65.36      (2.3%)       67.38      
(2.3%)    3.1% (  -1% -    7%) 0.000
                        LowSpanNear      175.93      (3.7%)      181.36      
(4.7%)    3.1% (  -5% -   11%) 0.001
                         HighPhrase      131.54      (7.2%)      135.70      
(5.8%)    3.2% (  -9% -   17%) 0.033
                             Fuzzy1      108.08      (3.4%)      111.62      
(2.1%)    3.3% (  -2% -    9%) 0.000
          BrowseDayOfYearSSDVFacets        4.52      (7.7%)        4.67      
(7.9%)    3.4% ( -11% -   20%) 0.056
                       OrHighNotLow      550.21      (7.0%)      569.01      
(7.9%)    3.4% ( -10% -   19%) 0.043
              HighTermDayOfYearSort      380.03      (7.6%)      393.27      
(6.6%)    3.5% (  -9% -   19%) 0.030
                       HighSpanNear       11.37      (4.5%)       11.77      
(6.0%)    3.5% (  -6% -   14%) 0.004
                            Respell       54.77      (1.6%)       56.69      
(1.7%)    3.5% (   0% -    6%) 0.000
                   HighSloppyPhrase       30.28      (5.2%)       31.40      
(4.8%)    3.7% (  -5% -   14%) 0.001
                          LowPhrase       76.63      (5.6%)       79.65      
(5.6%)    3.9% (  -6% -   16%) 0.002
             OrHighMedDayTaxoFacets        6.78      (6.2%)        7.05      
(6.9%)    4.0% (  -8% -   18%) 0.007
                             IntNRQ       78.26      (6.5%)       81.38      
(7.2%)    4.0% (  -9% -   18%) 0.010
                    LowSloppyPhrase       65.45      (6.6%)       68.14      
(6.0%)    4.1% (  -7% -   17%) 0.004
               HighIntervalsOrdered        9.16      (6.5%)        9.59      
(5.9%)    4.6% (  -7% -   18%) 0.001
              BrowseMonthSSDVFacets        4.48     (10.4%)        4.70     
(12.4%)    4.8% ( -16% -   30%) 0.062
               BrowseDateTaxoFacets        5.38     (10.8%)        5.67     
(12.7%)    5.4% ( -16% -   32%) 0.043
          BrowseDayOfYearTaxoFacets        5.44     (10.6%)        5.74     
(12.7%)    5.5% ( -16% -   32%) 0.039
   
   </pre>
   </details>
   
   So to me it feels like manually hoisting field access is a generally valid 
optimization in a world that has reflective writes to final fields. To me, 
reducing field access is not in the same category as e.g. extracting cold paths 
artifically to make a method inline or other such tricks that are specific to 
C2 and hardware. This is just giving the compiler input that it cannot 
practically work out with the constraints imposed by the language and the JIT's 
runtime cost needing 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Speedup PriorityQueue a little [lucene]

Reply via email to