[ 
https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430507#comment-13430507
 ] 

Michael McCandless commented on LUCENE-3892:
--------------------------------------------

I tried smaller block sizes than 128.  Here's 128 (base) vs 64:
{noformat}
                Task    QPS base  StdDev base  QPS block64  StdDev block64      Pct diff
         AndHighHigh       23.91         0.57        22.28            0.27   -10% -  -3%
          AndHighMed       60.63         1.02        56.96            1.13    -9% -  -2%
     MedSloppyPhrase        7.69         0.01         7.30            0.13    -6% -  -3%
    HighSloppyPhrase        1.93         0.02         1.83            0.04    -8% -  -1%
     LowSloppyPhrase        6.84         0.03         6.57            0.11    -6% -  -1%
              Fuzzy1       65.49         0.85        63.50            1.68    -6% -   0%
          HighPhrase        1.57         0.04         1.53            0.04    -7% -   3%
           OrHighLow       22.89         0.98        22.38            0.61    -8% -   4%
           OrHighMed       17.65         0.70        17.27            0.43    -8% -   4%
              IntNRQ        9.50         0.48         9.33            0.36   -10% -   7%
          OrHighHigh        8.98         0.36         8.84            0.19    -7% -   4%
            HighTerm       29.60         2.64        29.16            1.44   -13% -  13%
              Fuzzy2       65.54         0.86        64.63            2.13    -5% -   3%
            Wildcard       45.27         1.27        44.78            0.48    -4% -   2%
             MedTerm      150.40        12.65       148.99            6.63   -12% -  12%
             Prefix3       72.55         2.55        72.31            1.02    -5% -   4%
             LowTerm      421.62        38.27       422.40            9.47   -10% -  12%
         LowSpanNear        7.55         0.34         7.62            0.22    -6% -   8%
        HighSpanNear        1.34         0.09         1.35            0.06    -9% -  12%
           MedPhrase       12.45         0.24        12.66            0.13    -1% -   4%
             Respell       59.54         1.80        60.95            1.86    -3% -   8%
         MedSpanNear        3.70         0.24         3.80            0.15    -7% -  14%
            PKLookup      154.56         2.45       158.96            1.89     0% -   5%
           LowPhrase       20.21         0.33        20.95            0.15     1% -   6%
          AndHighLow      577.81        12.46       637.96           29.80     3% -  18%
{noformat}

And 128 (base) vs 32:
{noformat}
                Task    QPS base  StdDev base  QPS block64  StdDev block64      Pct diff
         AndHighHigh       23.86         0.52        20.68            0.59   -17% -  -8%
              IntNRQ        9.48         0.38         8.84            0.46   -15% -   2%
    HighSloppyPhrase        1.87         0.04         1.76            0.06   -11% -   0%
             Prefix3       72.65         2.18        68.24            2.96   -12% -   1%
            HighTerm       29.91         1.40        28.28            2.94   -19% -   9%
            Wildcard       44.74         0.83        42.43            1.49   -10% -   0%
        HighSpanNear        1.37         0.08         1.30            0.07   -15% -   6%
             MedTerm      152.73         5.28       145.45           14.69   -17% -   8%
     MedSloppyPhrase        7.46         0.12         7.12            0.25    -9% -   0%
          HighPhrase        1.57         0.03         1.50            0.01    -7% -  -1%
           OrHighLow       22.94         0.70        22.00            1.10   -11% -   3%
          AndHighMed       58.72         1.79        56.60            1.95    -9% -   2%
     LowSloppyPhrase        6.67         0.10         6.44            0.20    -7% -   1%
           OrHighMed       17.52         0.56        17.00            0.82   -10% -   5%
         LowSpanNear        7.53         0.35         7.34            0.39   -11% -   7%
          OrHighHigh        8.84         0.31         8.62            0.43   -10% -   6%
         MedSpanNear        3.79         0.20         3.71            0.21   -12% -   9%
            PKLookup      153.34         3.22       150.19            4.91    -7% -   3%
              Fuzzy1       62.93         1.77        62.28            2.23    -7% -   5%
             LowTerm      410.23        21.57       410.83           35.19   -13% -  14%
           MedPhrase       12.55         0.14        12.65            0.08     0% -   2%
           LowPhrase       20.42         0.17        20.77            0.21     0% -   3%
              Fuzzy2       61.44         3.12        64.13            1.97    -3% -  13%
             Respell       56.65         3.29        60.21            1.39    -1% -  15%
          AndHighLow      588.05        12.37       720.63           19.33    16% -  28%
{noformat}

It looks like there's some speedup to AndHighLow and LowPhrase ... but
slowdowns in other (harder) queries... so I think net/net we should
leave block size at 128.
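
For anyone reading the "Pct diff" column: the ranges in both tables look consistent with comparing mean QPS plus or minus one standard deviation on each side and truncating to whole percents. Here is a minimal Python sketch of that reading. The function name and the truncation step are my assumptions, inferred from the numbers above, not taken from the benchmark tool's source.

```python
def pct_diff_range(qps_base, std_base, qps_comp, std_comp):
    """Return (worst, best) percent change of comp vs. base.

    Assumption: each side is taken at mean +/- one standard deviation,
    and the result is truncated toward zero to a whole percent.
    """
    base_best = qps_base + std_base
    base_worst = qps_base - std_base
    comp_best = qps_comp + std_comp
    comp_worst = qps_comp - std_comp
    # Worst case: slowest plausible competitor vs. fastest plausible base.
    worst = int(100.0 * (comp_worst - base_best) / base_best)
    # Best case: fastest plausible competitor vs. slowest plausible base.
    best = int(100.0 * (comp_best - base_worst) / base_worst)
    return worst, best


if __name__ == "__main__":
    # AndHighHigh, 128 vs 64
    print(pct_diff_range(23.91, 0.57, 22.28, 0.27))     # (-10, -3)
    # AndHighLow, 128 vs 32
    print(pct_diff_range(588.05, 12.37, 720.63, 19.33)) # (16, 28)
```

Under this reading, a range that straddles 0% is within run-to-run noise, so only rows whose two ends share a sign (for example AndHighHigh and AndHighLow) show a clear change. Because the tables print values rounded to two decimals, a recomputed bound can occasionally be off by one percent from the printed one.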

                
> Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, 
> Simple9/16/64, etc.)
> -------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3892
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3892
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>              Labels: gsoc2012, lucene-gsoc-12
>             Fix For: 4.1
>
>         Attachments: LUCENE-3892-BlockTermScorer.patch, 
> LUCENE-3892-blockFor&hardcode(base).patch, 
> LUCENE-3892-blockFor&packedecoder(comp).patch, 
> LUCENE-3892-blockFor-with-packedints-decoder.patch, 
> LUCENE-3892-blockFor-with-packedints-decoder.patch, 
> LUCENE-3892-blockFor-with-packedints.patch, 
> LUCENE-3892-direct-IntBuffer.patch, LUCENE-3892-for&pfor-with-javadoc.patch, 
> LUCENE-3892-handle_open_files.patch, 
> LUCENE-3892-pfor-compress-iterate-numbits.patch, 
> LUCENE-3892-pfor-compress-slow-estimate.patch, LUCENE-3892_for_byte[].patch, 
> LUCENE-3892_for_int[].patch, LUCENE-3892_for_unfold_method.patch, 
> LUCENE-3892_pfor_unfold_method.patch, LUCENE-3892_pulsing_support.patch, 
> LUCENE-3892_settings.patch, LUCENE-3892_settings.patch
>
>
> On the flex branch we explored a number of possible intblock
> encodings, but for whatever reason never brought them to completion.
> There are still a number of issues opened with patches in different
> states.
> Initial results (based on prototype) were excellent (see
> http://blog.mikemccandless.com/2010/08/lucene-performance-with-pfordelta-codec.html
> ).
> I think this would make a good GSoC project.

