64, etc.)

Adrien Grand (JIRA) Wed, 08 Aug 2012 12:38:24 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431324#comment-13431324 ]


Adrien Grand edited comment on LUCENE-3892 at 8/8/12 7:38 PM:
--------------------------------------------------------------

I made some changes to the {{BlockPacked}} codec:
 - encoding and decoding using int[] instead of long[]
 - selection of the format based on a configurable overhead ratio.
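
A rough illustration of both changes follows. It is a minimal sketch under stated assumptions, not the {{BlockPacked}} code itself. The class {{BlockPackedSketch}} and its methods {{requiredBits}}, {{chooseBitsPerValue}}, {{pack}} and {{unpack}} are hypothetical, values are assumed non-negative, and the selection rule (round the bit width up to 8, 16 or 32 when the extra bits per value stay within {{acceptableOverheadRatio * bitsPerValue}}) only approximates the idea behind the {{PackedInts}} overhead ratio rather than reproducing Lucene's actual format selection.

```java
import java.util.Arrays;

// Hypothetical sketch, not Lucene's BlockPacked implementation: bit-packs a
// block of non-negative ints into an int[] and chooses bitsPerValue from an
// acceptable overhead ratio (an approximation of the PackedInts idea).
class BlockPackedSketch {

  /** Minimum number of bits needed to store every value in the block. */
  static int requiredBits(int[] values) {
    int or = 0;
    for (int v : values) {
      if (v < 0) throw new IllegalArgumentException("negative value: " + v);
      or |= v;
    }
    return Math.max(1, 32 - Integer.numberOfLeadingZeros(or));
  }

  /**
   * Rounds bitsPerValue up to a byte-aligned width (8, 16 or 32) when the
   * extra bits per value stay within acceptableOverheadRatio * bitsPerValue;
   * otherwise keeps the compact width. Assumed rule, for illustration only.
   */
  static int chooseBitsPerValue(int bitsPerValue, float acceptableOverheadRatio) {
    int maxBits = bitsPerValue + (int) (acceptableOverheadRatio * bitsPerValue);
    for (int aligned : new int[] {8, 16, 32}) {
      if (aligned >= bitsPerValue && aligned <= maxBits) return aligned;
    }
    return bitsPerValue;
  }

  /** Packs values low bits first; a value may straddle two 32-bit blocks. */
  static int[] pack(int[] values, int bitsPerValue) {
    int[] blocks = new int[(int) (((long) values.length * bitsPerValue + 31) / 32)];
    long bitPos = 0;
    for (int v : values) {
      if (bitsPerValue < 32 && (v >>> bitsPerValue) != 0) {
        throw new IllegalArgumentException(v + " does not fit in " + bitsPerValue + " bits");
      }
      int idx = (int) (bitPos >>> 5);   // which int block
      int shift = (int) (bitPos & 31);  // bit offset inside that block
      blocks[idx] |= v << shift;        // low part (bits past 32 fall off)
      if (shift + bitsPerValue > 32) {
        blocks[idx + 1] |= v >>> (32 - shift); // spill-over into the next block
      }
      bitPos += bitsPerValue;
    }
    return blocks;
  }

  /** Reverses pack(): reads count values of bitsPerValue bits each. */
  static int[] unpack(int[] blocks, int bitsPerValue, int count) {
    int mask = bitsPerValue == 32 ? -1 : (1 << bitsPerValue) - 1;
    int[] values = new int[count];
    long bitPos = 0;
    for (int i = 0; i < count; i++) {
      int idx = (int) (bitPos >>> 5);
      int shift = (int) (bitPos & 31);
      int v = blocks[idx] >>> shift;
      if (shift + bitsPerValue > 32) {
        v |= blocks[idx + 1] << (32 - shift); // high part from the next block
      }
      values[i] = v & mask;
      bitPos += bitsPerValue;
    }
    return values;
  }

  public static void main(String[] args) {
    int[] block = {3, 7, 100, 42, 0, 127, 64, 5};
    int bits = requiredBits(block);              // 7: the largest value is 127
    int chosen = chooseBitsPerValue(bits, 0.2f); // up to 7 + (int) 1.4 = 8 allowed, so 8
    int[] packed = pack(block, chosen);
    int[] back = unpack(packed, chosen, block.length);
    // prints "7 -> 8 bits, 2 ints, ok=true"
    System.out.println(bits + " -> " + chosen + " bits, " + packed.length
        + " ints, ok=" + Arrays.equals(block, back));
  }
}
```

With a ratio of 0.2 (the 20% quoted for {{PackedInts.DEFAULT}}), a block whose largest value needs 7 bits is stored with 8 bits per value, while a 13-bit block stays at 13 bits because 16 would exceed the 2 spare bits allowed. Packing into int[] means a value can straddle two 32-bit blocks, which {{pack}} and {{unpack}} handle explicitly.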

The results are encouraging (using acceptableOverheadRatio = PackedInts.DEFAULT = 20%):
{noformat}
                Task    QPS 3892  StdDev 3892  QPS 3892-packed  StdDev 3892-packed    Pct diff
            PKLookup      256.93         8.89           256.85                7.47   -6% -  6%
           OrHighLow      145.14         9.86           145.14                9.35  -12% - 14%
             Respell      110.26         1.84           110.27                2.01   -3% -  3%
         AndHighHigh      112.97         0.81           113.19                2.17   -2% -  2%
              Fuzzy1      102.15         1.47           102.86                3.13   -3% -  5%
          OrHighHigh       94.56         6.56            95.43                6.35  -11% - 15%
              Fuzzy2       42.49         0.77            42.89                1.43   -4% -  6%
           OrHighMed      175.30        11.34           177.42               10.83  -10% - 14%
          AndHighLow     1925.02        23.92          1952.57               48.68   -2% -  5%
          HighPhrase        8.96         0.41             9.11                0.46   -7% - 11%
            Wildcard      189.79         2.13           193.12                1.57    0% -  3%
        HighSpanNear        6.47         0.15             6.59                0.25   -4% -  8%
             Prefix3      256.67         2.58           262.40                2.84    0% -  4%
             LowTerm     1746.52        52.80          1789.54               54.30   -3% -  8%
            HighTerm      238.70        13.46           245.63               16.60   -9% - 16%
             MedTerm      923.64        38.19           951.18               46.85   -5% - 12%
          AndHighMed      364.46         3.65           377.09               10.03    0% -  7%
              IntNRQ       56.58         1.02            58.84                0.80    0% -  7%
    HighSloppyPhrase       11.73         0.30            12.40                0.62   -2% - 13%
         LowSpanNear       29.64         0.96            32.44                0.98    2% - 16%
         MedSpanNear       22.96         0.72            25.16                0.85    2% - 16%
           MedPhrase       40.99         1.25            45.09                1.24    3% - 16%
     LowSloppyPhrase       37.88         0.99            41.98                1.49    4% - 17%
           LowPhrase       64.40         2.04            71.84                1.41    5% - 17%
     MedSloppyPhrase       42.29         1.16            47.32                1.54    5% - 18%
{noformat}
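
When reading the table, the point estimate behind each row is the relative change of the mean QPS. The helper below ({{QpsDiff.pctChange}}, a hypothetical name, not part of luceneutil) computes only that estimate. The Pct diff column above is a range that appears to also take the StdDev columns into account, and this sketch does not attempt to reproduce how luceneutil derives it.

```java
// Hypothetical helper, not part of luceneutil: relative change of mean QPS
// from the baseline ("3892") to the candidate ("3892-packed") columns.
class QpsDiff {

  /** Percentage change of the candidate mean relative to the baseline mean. */
  static double pctChange(double baseQps, double candidateQps) {
    return 100.0 * (candidateQps - baseQps) / baseQps;
  }

  public static void main(String[] args) {
    // Mean QPS pairs copied from three rows of the benchmark table.
    String[] tasks = {"PKLookup", "LowPhrase", "MedSloppyPhrase"};
    double[][] qps = {{256.93, 256.85}, {64.40, 71.84}, {42.29, 47.32}};
    for (int i = 0; i < tasks.length; i++) {
      System.out.printf("%-16s %+.1f%%%n", tasks[i], pctChange(qps[i][0], qps[i][1]));
    }
  }
}
```

For LowPhrase this gives about +11.6% (64.40 to 71.84 QPS) and for MedSloppyPhrase about +11.9%, both near the middle of their reported ranges, while PKLookup is essentially unchanged.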

I hope this will be confirmed on your computers this time. :-)
                
> Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)
> -------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3892
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3892
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>              Labels: gsoc2012, lucene-gsoc-12
>             Fix For: 4.1
>
>         Attachments: LUCENE-3892-BlockTermScorer.patch, 
> LUCENE-3892-blockFor&hardcode(base).patch, 
> LUCENE-3892-blockFor&packedecoder(comp).patch, 
> LUCENE-3892-blockFor-with-packedints-decoder.patch, 
> LUCENE-3892-blockFor-with-packedints-decoder.patch, 
> LUCENE-3892-blockFor-with-packedints.patch, LUCENE-3892-bulkVInt.patch, 
> LUCENE-3892-direct-IntBuffer.patch, LUCENE-3892-for&pfor-with-javadoc.patch, 
> LUCENE-3892-handle_open_files.patch, 
> LUCENE-3892-pfor-compress-iterate-numbits.patch, 
> LUCENE-3892-pfor-compress-slow-estimate.patch, LUCENE-3892_for_byte[].patch, 
> LUCENE-3892_for_int[].patch, LUCENE-3892_for_unfold_method.patch, 
> LUCENE-3892_pfor_unfold_method.patch, LUCENE-3892_pulsing_support.patch, 
> LUCENE-3892_settings.patch, LUCENE-3892_settings.patch
>
>
> On the flex branch we explored a number of possible intblock
> encodings, but for whatever reason never brought them to completion.
> There are still a number of open issues with patches in various
> states.
> Initial results (based on a prototype) were excellent (see
> http://blog.mikemccandless.com/2010/08/lucene-performance-with-pfordelta-codec.html).
> I think this would make a good GSoC project.
