[jira] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues

Feng Guo (Jira) Wed, 29 Dec 2021 23:31:06 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-10334 ]


    Feng Guo deleted comment on LUCENE-10334:
    -----------------------------------

was (Author: gf2121):
Thanks [~rcmuir] for the suggestion! I tried some optimizations on this patch:

1. At first I replaced {{DirectWriter#unsignedBitsRequired}} with 
{{PackedInts#unsignedBitsRequired}}: since ForUtil supports every bpv, 
this reduces index size somewhat. I have since rolled that change back, because 
decoding 1, 2, 4, 8, 12, 16... bits per value can also be a bit faster in ForUtil.

2. {{ForUtil#decode}} runs a {{switch}} on every call. This can be avoided 
the same way {{DirectReader}} does it: choose an implementation of 
an interface once, up front. I applied this change in ForUtil.
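
As a rough illustration of point 2, the idea can be sketched as below. This is not the code from the patch: {{DecoderSelection}}, {{BlockDecoder}} and {{forBitsPerValue}} are hypothetical names, the packing layout (little-endian within each long) is an assumption, and only three widths are covered. It only shows the {{switch}} moving from the per-value path to a one-time selection step.

```java
final class DecoderSelection {

    /** Hypothetical interface: reads the value at position index from packed longs. */
    interface BlockDecoder {
        long get(long[] packed, int index);
    }

    /**
     * Resolves the implementation once; the switch runs here and nowhere else.
     * Assumed layout: values are packed little-endian inside each long.
     */
    static BlockDecoder forBitsPerValue(int bitsPerValue) {
        switch (bitsPerValue) {
            case 8:
                // 8 values per long: long index = i / 8, shift = (i % 8) * 8
                return (packed, i) -> (packed[i >>> 3] >>> ((i & 7) << 3)) & 0xFFL;
            case 16:
                // 4 values per long: long index = i / 4, shift = (i % 4) * 16
                return (packed, i) -> (packed[i >>> 2] >>> ((i & 3) << 4)) & 0xFFFFL;
            case 32:
                // 2 values per long: long index = i / 2, shift = (i % 2) * 32
                return (packed, i) -> (packed[i >>> 1] >>> ((i & 1) << 5)) & 0xFFFFFFFFL;
            default:
                throw new IllegalArgumentException(
                    "bitsPerValue not covered by this sketch: " + bitsPerValue);
        }
    }

    public static void main(String[] args) {
        long[] packed = {0x0807060504030201L};     // eight 8-bit values: 1..8
        BlockDecoder decoder = forBitsPerValue(8); // selection happens once
        long sum = 0;
        for (int i = 0; i < 8; i++) {
            sum += decoder.get(packed, i);         // no switch on the hot path
        }
        System.out.println(sum); // prints 36
    }
}
```

Whether this is faster than a per-call {{switch}} depends on how well the JIT inlines the selected implementation, so it still needs to be measured.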

I'm not sure which of the two changes matters more, but the benchmark report looks better now:
{code:java}
                            Task    QPS baseline      StdDev    QPS my_modified_version      StdDev                Pct diff p-value
         AndHighMedDayTaxoFacets       71.49      (2.1%)       64.72      (2.0%)   -9.5% ( -13% -   -5%) 0.000
            MedTermDayTaxoFacets       25.79      (2.6%)       24.00      (1.8%)   -6.9% ( -11% -   -2%) 0.000
        AndHighHighDayTaxoFacets       13.13      (3.4%)       12.63      (3.1%)   -3.9% ( -10% -    2%) 0.000
          OrHighMedDayTaxoFacets       13.71      (4.1%)       13.41      (4.7%)   -2.2% ( -10% -    6%) 0.118
                        PKLookup      204.87      (3.9%)      203.03      (3.6%)   -0.9% (  -8% -    6%) 0.450
                         Prefix3      113.85      (3.6%)      113.32      (4.6%)   -0.5% (  -8% -    8%) 0.724
                    HighSpanNear       25.34      (2.5%)       25.26      (3.1%)   -0.3% (  -5% -    5%) 0.714
                     LowSpanNear       55.96      (2.0%)       55.80      (2.1%)   -0.3% (  -4% -    3%) 0.658
                     MedSpanNear       56.84      (2.4%)       56.90      (2.2%)    0.1% (  -4% -    4%) 0.895
                 MedSloppyPhrase       26.57      (1.8%)       26.60      (1.9%)    0.1% (  -3% -    3%) 0.831
                HighSloppyPhrase       30.20      (3.7%)       30.24      (3.6%)    0.2% (  -6% -    7%) 0.890
                       OrHighMed       49.96      (2.1%)       50.06      (1.7%)    0.2% (  -3% -    4%) 0.742
                      AndHighMed       96.70      (2.9%)       96.95      (2.6%)    0.3% (  -5% -    5%) 0.772
             LowIntervalsOrdered       23.32      (4.6%)       23.38      (4.5%)    0.3% (  -8% -    9%) 0.856
                      OrHighHigh       38.09      (1.9%)       38.20      (1.8%)    0.3% (  -3% -    4%) 0.643
                      TermDTSort      128.55     (14.7%)      128.94     (11.6%)    0.3% ( -22% -   31%) 0.942
                          Fuzzy1       99.54      (7.1%)       99.86      (8.0%)    0.3% ( -13% -   16%) 0.893
            HighIntervalsOrdered       15.58      (2.6%)       15.65      (2.6%)    0.4% (  -4% -    5%) 0.636
                         Respell       63.96      (1.9%)       64.22      (2.3%)    0.4% (  -3% -    4%) 0.542
                   OrHighNotHigh      611.12      (5.8%)      613.85      (6.2%)    0.4% ( -10% -   13%) 0.814
             MedIntervalsOrdered       59.48      (5.2%)       59.75      (5.1%)    0.5% (  -9% -   11%) 0.780
                     AndHighHigh       58.76      (3.0%)       59.16      (3.0%)    0.7% (  -5% -    6%) 0.478
                   OrNotHighHigh      619.53      (6.0%)      623.79      (7.1%)    0.7% ( -11% -   14%) 0.740
                      HighPhrase       31.00      (2.5%)       31.26      (2.7%)    0.8% (  -4% -    6%) 0.307
                      AndHighLow      828.41      (5.9%)      835.65      (7.1%)    0.9% ( -11% -   14%) 0.672
                    OrNotHighLow      986.46      (6.8%)      995.13     (10.5%)    0.9% ( -15% -   19%) 0.752
            HighTermTitleBDVSort      110.39     (12.3%)      111.38     (11.1%)    0.9% ( -20% -   27%) 0.807
                          IntNRQ      151.29      (2.6%)      152.96      (3.5%)    1.1% (  -4% -    7%) 0.262
                         LowTerm     1876.18      (7.8%)     1897.19      (8.3%)    1.1% ( -13% -   18%) 0.660
           HighTermDayOfYearSort      108.34     (18.9%)      109.87     (17.4%)    1.4% ( -29% -   46%) 0.805
               HighTermMonthSort       65.84     (11.0%)       66.78     (11.7%)    1.4% ( -19% -   27%) 0.689
                    OrHighNotMed      770.05      (5.3%)      782.54      (8.8%)    1.6% ( -11% -   16%) 0.480
                        Wildcard      182.10      (5.5%)      185.24      (7.2%)    1.7% ( -10% -   15%) 0.394
                 LowSloppyPhrase       33.75      (6.6%)       34.35      (8.8%)    1.8% ( -12% -   18%) 0.478
                       MedPhrase      161.57      (3.8%)      164.62      (6.1%)    1.9% (  -7% -   12%) 0.242
                    OrHighNotLow      679.46      (7.2%)      693.59      (7.6%)    2.1% ( -11% -   18%) 0.374
                    OrNotHighMed      690.91      (7.4%)      706.15      (8.8%)    2.2% ( -13% -   19%) 0.390
                        HighTerm     1388.14      (6.3%)     1420.26      (7.8%)    2.3% ( -11% -   17%) 0.302
                       LowPhrase      410.16      (5.0%)      420.38      (5.0%)    2.5% (  -7% -   13%) 0.114
                       OrHighLow      479.96      (5.1%)      492.39      (5.7%)    2.6% (  -7% -   14%) 0.128
                         MedTerm     1575.41      (5.9%)     1618.88      (8.2%)    2.8% ( -10% -   17%) 0.221
                          Fuzzy2       64.75      (8.3%)       66.76      (8.3%)    3.1% ( -12% -   21%) 0.237
           BrowseMonthTaxoFacets       14.39     (12.1%)       18.58     (17.3%)   29.1% (   0% -   66%) 0.000
     BrowseRandomLabelTaxoFacets       12.01      (8.5%)       17.01     (18.2%)   41.6% (  13% -   74%) 0.000
            BrowseDateTaxoFacets       13.72     (11.2%)       19.83     (26.5%)   44.5% (   6% -   92%) 0.000
       BrowseDayOfYearTaxoFacets       13.84     (11.5%)       20.03     (27.4%)   44.8% (   5% -   94%) 0.000
     BrowseRandomLabelSSDVFacets       10.31      (2.6%)       17.72      (4.2%)   71.9% (  63% -   80%) 0.000
           BrowseMonthSSDVFacets       15.56      (3.3%)       34.58     (12.3%)  122.3% ( 103% -  142%) 0.000
       BrowseDayOfYearSSDVFacets       14.17      (2.9%)       32.91     (11.6%)  132.3% ( 114% -  151%) 0.000
{code}
So now the major problem is no longer "SSDV faster, Taxo slower", but "dense 
faster, sparse slower" (as expected). I wonder whether this is an acceptable trade-off?

> Introduce a BlockReader based on ForUtil and use it for NumericDocValues
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-10334
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10334
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>            Reporter: Feng Guo
>            Priority: Major
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Previous discussion is here: [https://github.com/apache/lucene/pull/557]
> This tries to add a new BlockReader based on ForUtil to replace the 
> DirectReader we currently use for NumericDocValues.
> -*Benchmark based on wiki10m*- (The previous benchmark results were wrong, so I 
> deleted them to avoid misleading readers; see the benchmarks in the comments.)



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

