[ 
https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430373#comment-13430373
 ] 

Adrien Grand commented on LUCENE-3892:
--------------------------------------

I backported Mike's changes to the {{BlockPacked}} codec and tried to 
understand why it was slower than {{Block}}...

The use of {{java.nio.*Buffer}} seemed to be the bottleneck of the decoding 
step ({{ByteBuffer.asLongBuffer}} and {{ByteBuffer.getLong}} especially are 
_very_ slow), so I switched back to decoding from {{long[]}} instead of 
{{LongBuffer}}, and added direct decoding from {{byte[]}} to avoid having to 
convert the bytes to longs before decoding.
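
To make the difference concrete, here is a minimal sketch of the two decoding paths. It is not the code from the patch: the class and method names ({{DirectDecodeSketch}}, {{encode}}, {{toLongs}}, {{decodeFromLongs}}, {{decodeFromBytes}}) are made up for illustration, the bit layout (MSB-first, big-endian) is an assumption, and the loops work one bit at a time for readability, whereas a real decoder would be specialized per number of bits per value. The point is only that the {{byte[]}} path skips the intermediate bytes-to-longs conversion.

```java
import java.nio.ByteBuffer;

// Illustrative sketch only; names and bit layout are assumptions, not Lucene's API.
final class DirectDecodeSketch {

    // Pack values (each assumed < 2^bitsPerValue) MSB-first into a byte[]
    // whose length is padded to a multiple of 8 bytes.
    static byte[] encode(int[] values, int bitsPerValue) {
        int totalBits = values.length * bitsPerValue;
        byte[] out = new byte[((totalBits + 63) / 64) * 8];
        int bitPos = 0;
        for (int v : values) {
            for (int b = bitsPerValue - 1; b >= 0; b--, bitPos++) {
                if (((v >>> b) & 1) != 0) {
                    out[bitPos >>> 3] |= (byte) (1 << (7 - (bitPos & 7)));
                }
            }
        }
        return out;
    }

    // Extra conversion step needed by the long[] path (big-endian, the ByteBuffer default).
    static long[] toLongs(byte[] blocks) {
        ByteBuffer buf = ByteBuffer.wrap(blocks);
        long[] longs = new long[blocks.length / 8];
        for (int i = 0; i < longs.length; i++) {
            longs[i] = buf.getLong();
        }
        return longs;
    }

    // Decode from long[] blocks, reading bits from the most significant end.
    static int[] decodeFromLongs(long[] blocks, int bitsPerValue, int count) {
        int[] values = new int[count];
        int bitPos = 0;
        for (int i = 0; i < count; i++) {
            int v = 0;
            for (int b = 0; b < bitsPerValue; b++, bitPos++) {
                int shift = 63 - (bitPos & 63);
                v = (v << 1) | (int) ((blocks[bitPos >>> 6] >>> shift) & 1L);
            }
            values[i] = v;
        }
        return values;
    }

    // Decode straight from the byte[]; no intermediate long[] is built.
    static int[] decodeFromBytes(byte[] blocks, int bitsPerValue, int count) {
        int[] values = new int[count];
        int bitPos = 0;
        for (int i = 0; i < count; i++) {
            int v = 0;
            for (int b = 0; b < bitsPerValue; b++, bitPos++) {
                int shift = 7 - (bitPos & 7);
                v = (v << 1) | ((blocks[bitPos >>> 3] >>> shift) & 1);
            }
            values[i] = v;
        }
        return values;
    }

    public static void main(String[] args) {
        int[] values = {3, 0, 7, 5, 1, 6, 2, 4};
        byte[] packed = encode(values, 3);
        int[] viaLongs = decodeFromLongs(toLongs(packed), 3, values.length);
        int[] viaBytes = decodeFromBytes(packed, 3, values.length);
        System.out.println(java.util.Arrays.equals(viaLongs, viaBytes));
    }
}
```

Both paths should return the same values for the same packed input; the {{byte[]}} variant simply avoids the {{toLongs}} pass over the data.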

Tests passed with -Dtests.postingsformat=BlockPacked. Here are the results of 
the benchmark (unfortunately, it started before Mike committed r1370179):

{noformat}
                Task  QPS 3892  StdDev 3892  QPS 3892-packed  StdDev 3892-packed    Pct diff
            PKLookup    259.41         9.06           255.77                8.89   -8% -  5%
          AndHighLow   1656.30        50.44          1653.85               55.05   -6% -  6%
         AndHighHigh     82.90         1.82            83.47                2.52   -4% -  6%
          AndHighMed    274.76        11.11           278.51               13.42   -7% - 10%
             Prefix3    285.41         4.82           289.60                6.31   -2% -  5%
            HighTerm    230.78        14.33           235.16               20.61  -12% - 18%
              IntNRQ     55.91         1.03            57.13                2.73   -4% -  9%
             LowTerm   1720.10        47.06          1759.16               55.47   -3% -  8%
            Wildcard    290.54         3.82           297.39                5.42    0% -  5%
             MedTerm    733.01        35.38           750.46               50.37   -8% - 14%
        HighSpanNear      6.93         0.23             7.12                0.39   -6% - 11%
          HighPhrase      6.46         0.22             6.65                0.46   -7% - 14%
             Respell     96.11         2.84            99.00                3.98   -3% - 10%
          OrHighHigh     38.07         2.53            39.23                3.06  -10% - 19%
              Fuzzy2     50.29         1.70            51.87                2.25   -4% - 11%
           MedPhrase     26.20         0.94            27.03                1.07   -4% - 11%
           OrHighMed    138.83         7.76           143.54                9.79   -8% - 16%
              Fuzzy1    100.58         2.15           104.21                3.99   -2% -  9%
    HighSloppyPhrase      5.26         0.11             5.45                0.24   -3% - 10%
           OrHighLow     78.43         5.55            81.80                6.89  -10% - 21%
         MedSpanNear     32.75         1.13            34.28                1.73   -3% - 13%
           LowPhrase     90.27         3.20            95.06                3.58   -2% - 13%
         LowSpanNear     46.40         1.95            48.89                2.40   -3% - 15%
     MedSloppyPhrase     36.29         1.00            38.59                1.46    0% - 13%
     LowSloppyPhrase     37.41         1.11            40.48                1.39    1% - 15%
{noformat}

Mike, Billy, could you check that {{BlockPacked}} is at least as fast as 
{{Block}} on your computers too?
                
> Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, 
> Simple9/16/64, etc.)
> -------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3892
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3892
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>              Labels: gsoc2012, lucene-gsoc-12
>             Fix For: 4.1
>
>         Attachments: LUCENE-3892-BlockTermScorer.patch, 
> LUCENE-3892-blockFor&hardcode(base).patch, 
> LUCENE-3892-blockFor&packedecoder(comp).patch, 
> LUCENE-3892-blockFor-with-packedints-decoder.patch, 
> LUCENE-3892-blockFor-with-packedints-decoder.patch, 
> LUCENE-3892-blockFor-with-packedints.patch, 
> LUCENE-3892-direct-IntBuffer.patch, LUCENE-3892-for&pfor-with-javadoc.patch, 
> LUCENE-3892-handle_open_files.patch, 
> LUCENE-3892-pfor-compress-iterate-numbits.patch, 
> LUCENE-3892-pfor-compress-slow-estimate.patch, LUCENE-3892_for_byte[].patch, 
> LUCENE-3892_for_int[].patch, LUCENE-3892_for_unfold_method.patch, 
> LUCENE-3892_pfor_unfold_method.patch, LUCENE-3892_pulsing_support.patch, 
> LUCENE-3892_settings.patch, LUCENE-3892_settings.patch
>
>
> On the flex branch we explored a number of possible intblock
> encodings, but for whatever reason never brought them to completion.
> There are still a number of issues opened with patches in different
> states.
> Initial results (based on prototype) were excellent (see
> http://blog.mikemccandless.com/2010/08/lucene-performance-with-pfordelta-codec.html
> ).
> I think this would make a good GSoC project.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira