[ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Han Jiang updated LUCENE-3892:
------------------------------

    Attachment: LUCENE-3892_for.patch
                LUCENE-3892_pfor.patch

Compared with the version from two weeks ago, the new "3892_pfor" patch fixes
some "SuppressingCodec" issues. The "3892_for" patch is a quick implementation
of a "For" postings format that reuses the current code. The two patches are
kept separate for now, so that sharing code through method overriding does not
reduce performance.
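For reference, here is a minimal Java sketch of what a "For" (frame of reference) block does, assuming a block of sorted docIDs stored as gaps: choose one bit width that fits the largest gap in the block, then pack every gap with that width. The class and method names (`ForBlockSketch`, `toDeltas`, `maxBits`, `pack`, `unpack`) are hypothetical; this is not the code in LUCENE-3892_for.patch.

```java
/**
 * Hypothetical sketch of "For" (frame of reference) coding for one block of
 * sorted docIDs: store gaps, choose a single bit width for the whole block,
 * and pack every gap with that width. Not the code from the patch.
 */
final class ForBlockSketch {

  /** Bits needed to represent a non-negative value (0 needs 0 bits). */
  static int bitsRequired(int v) {
    return 32 - Integer.numberOfLeadingZeros(v);
  }

  /** Turns sorted docIDs into gaps; the first gap is relative to prevDoc. */
  static int[] toDeltas(int[] docs, int prevDoc) {
    int[] deltas = new int[docs.length];
    int prev = prevDoc;
    for (int i = 0; i < docs.length; i++) {
      deltas[i] = docs[i] - prev;
      prev = docs[i];
    }
    return deltas;
  }

  /** The block-wide bit width: enough bits for the largest value. */
  static int maxBits(int[] values) {
    int bits = 0;
    for (int v : values) {
      bits = Math.max(bits, bitsRequired(v));
    }
    return bits;
  }

  /** Packs non-negative values using numBits bits each, low bits first. */
  static long[] pack(int[] values, int numBits) {
    long[] out = new long[(values.length * numBits + 63) / 64];
    if (numBits == 0) {
      return out; // all values are 0: nothing to store
    }
    for (int i = 0; i < values.length; i++) {
      long bitPos = (long) i * numBits;
      int word = (int) (bitPos >>> 6);
      int shift = (int) (bitPos & 63);
      out[word] |= ((long) values[i]) << shift;
      if (shift + numBits > 64) {
        // the value straddles two longs: spill its high bits into the next one
        out[word + 1] |= ((long) values[i]) >>> (64 - shift);
      }
    }
    return out;
  }

  /** Reverses pack(): reads count values of numBits bits each. */
  static int[] unpack(long[] packed, int count, int numBits) {
    int[] out = new int[count];
    if (numBits == 0) {
      return out;
    }
    long mask = (1L << numBits) - 1;
    for (int i = 0; i < count; i++) {
      long bitPos = (long) i * numBits;
      int word = (int) (bitPos >>> 6);
      int shift = (int) (bitPos & 63);
      long v = packed[word] >>> shift;
      if (shift + numBits > 64) {
        // pull the remaining high bits from the next long
        v |= packed[word + 1] << (64 - shift);
      }
      out[i] = (int) (v & mask);
    }
    return out;
  }
}
```

For example, the docIDs {3, 7, 8, 20, 21, 50} give the gaps {3, 4, 1, 12, 1, 29}; the largest gap, 29, needs 5 bits, so all six gaps pack into 30 bits (one long). Because every value in the block uses the same width, a single large gap widens every slot in the block, which is the cost PFor's exceptions are meant to avoid.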

Block sizes from 32 to 128 have been tested with both patches. However, for
skipping-intensive queries, a smaller block size gives no significant
performance gain.
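One rough way to think about the block-size trade-off for skipping: with one skip entry per block, a seek reads skip entries until it reaches a block that can contain the target, and then decodes that whole block. Smaller blocks decode fewer values per seek but read more skip entries. The sketch below assumes a flat, single-level list of per-block last docIDs (Lucene's real skip data is multi-level), starts every seek from the first block unlike a real iterator, and uses hypothetical names; it only counts work and does not model decoding speed.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of block-level skipping: postings are split into
 * fixed-size blocks and each block has one skip entry (its last docID).
 * Blocks are stored already decoded; the counters only measure work.
 */
final class BlockSkipSketch {
  private final int[][] blocks;
  private final int[] lastDocOfBlock; // flat, single-level skip data
  long skipEntriesRead = 0;
  long valuesDecoded = 0;

  BlockSkipSketch(int[] sortedDocs, int blockSize) {
    int numBlocks = (sortedDocs.length + blockSize - 1) / blockSize;
    blocks = new int[numBlocks][];
    lastDocOfBlock = new int[numBlocks];
    for (int b = 0; b < numBlocks; b++) {
      int from = b * blockSize;
      int to = Math.min(from + blockSize, sortedDocs.length);
      blocks[b] = Arrays.copyOfRange(sortedDocs, from, to);
      lastDocOfBlock[b] = sortedDocs[to - 1];
    }
  }

  /** Returns the first docID >= target, or -1 if there is none. */
  int seek(int target) {
    for (int b = 0; b < blocks.length; b++) {
      skipEntriesRead++;
      if (lastDocOfBlock[b] >= target) {
        // only this block can hold the answer, but all of it gets decoded
        valuesDecoded += blocks[b].length;
        for (int doc : blocks[b]) {
          if (doc >= target) {
            return doc;
          }
        }
      }
    }
    return -1;
  }
}
```

For example, with the even docIDs 0, 2, ..., 1022, seeking to 700 reads 3 skip entries and decodes 128 values with blockSize=128, but reads 11 skip entries and decodes 32 values with blockSize=32.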

Here is an earlier PFor result with blockSize=64; the Pct diff range for
blockSize=128 is shown in brackets:
{noformat}
                Task    QPS Base StdDev Base    QPS PFor StdDev PFor      Pct diff
              Phrase        4.93        0.36        3.10        0.33  -47% -  -25%  (-47% -  -25%)
          AndHighMed       27.92        2.26       19.16        1.72  -42% -  -18%  (-37% -  -15%)
            SpanNear        2.73        0.16        1.96        0.24  -40% -  -14%  (-36% -  -13%)
        SloppyPhrase        4.19        0.21        3.20        0.30  -34% -  -12%  (-30% -   -6%)
            Wildcard       19.44        0.87       17.11        0.94  -20% -   -2%  (-17% -    3%)
         AndHighHigh        7.50        0.38        6.61        0.59  -23% -    1%  (-19% -    6%)
              IntNRQ        4.06        0.52        3.88        0.35  -22% -   19%  (-16% -   24%)
             Prefix3       31.00        1.69       30.45        2.29  -13% -   11%  ( -6% -   20%)
          OrHighHigh        4.16        0.47        4.11        0.34  -18% -   20%  (-14% -   27%)
           OrHighMed        4.98        0.59        4.94        0.41  -18% -   22%  (-14% -   27%)
             Respell       40.29        2.11       40.11        2.13  -10% -   10%  (-15% -    2%)
        TermBGroup1M       20.50        0.32       20.52        0.80   -5% -    5%  (  1% -   10%)
         TermGroup1M       13.51        0.43       13.61        0.40   -5% -    7%  (  1% -    9%)
              Fuzzy1       43.20        1.83       44.02        1.95   -6% -   11%  (-11% -    1%)
            PKLookup       87.16        1.78       89.52        0.94    0% -    5%  ( -2% -    7%)
              Fuzzy2       16.09        0.80       16.54        0.77   -6% -   13%  (-11% -    6%)
                Term       43.56        1.53       45.26        3.84   -8% -   16%  (  2% -   26%)
      TermBGroup1M1P       21.33        0.64       22.24        1.23   -4% -   13%  (  0% -   14%)
{noformat}

The For postings format also performs about the same as PFor. Since For does
no exception patching at all, I suppose the bottleneck is not in
PForUtil.patchException. Here is an example with blockSize=64:
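PFor differs from For by treating the few values that need more bits than the chosen width as exceptions and patching them in after the regular decode. Below is a minimal sketch of such a patch step, assuming a variant that keeps each value's low `numBits` bits in its regular slot and stores the positions and high bits of exceptions on the side. `PForSketch` and its fields are hypothetical and are not Lucene's `PForUtil` or its `patchException` method; the regular slots are kept in a plain `int[]` instead of being bit-packed, and `numBits` is passed in rather than chosen per block.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical PFor-style sketch (not Lucene's PForUtil): every value keeps
 * its low numBits bits in the regular slot; values that do not fit are
 * exceptions whose positions and high bits are stored separately and
 * patched back in after the regular decode. Assumes 0 <= numBits <= 31.
 */
final class PForSketch {
  final int numBits;
  final int[] lows;          // low numBits bits of every value (unpacked here)
  final int[] exceptionPos;  // positions of values that did not fit
  final int[] exceptionHigh; // their bits above numBits

  PForSketch(int[] values, int numBits) {
    this.numBits = numBits;
    int mask = (1 << numBits) - 1;
    lows = new int[values.length];
    List<Integer> pos = new ArrayList<>();
    List<Integer> high = new ArrayList<>();
    for (int i = 0; i < values.length; i++) {
      lows[i] = values[i] & mask;
      int rest = values[i] >>> numBits;
      if (rest != 0) { // value needs more than numBits bits: record an exception
        pos.add(i);
        high.add(rest);
      }
    }
    exceptionPos = pos.stream().mapToInt(Integer::intValue).toArray();
    exceptionHigh = high.stream().mapToInt(Integer::intValue).toArray();
  }

  /** Regular decode (copy the slots), then patch only the exceptions. */
  int[] decode() {
    int[] out = lows.clone();
    for (int e = 0; e < exceptionPos.length; e++) {
      out[exceptionPos[e]] |= exceptionHigh[e] << numBits;
    }
    return out;
  }
}
```

In this sketch the patch loop runs once per exception rather than once per value, so when a block has few exceptions it adds little work on top of the regular decode.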
{noformat}
                Task    QPS Base StdDev Base     QPS For  StdDev For      Pct diff
              Phrase        5.03        0.45        3.30        0.43  -47% -  -18%
          AndHighMed       28.05        2.33       18.83        1.77  -43% -  -19%
            SpanNear        2.69        0.18        1.94        0.25  -40% -  -12%
        SloppyPhrase        4.19        0.20        3.22        0.35  -34% -  -10%
         AndHighHigh        7.61        0.46        6.41        0.54  -27% -   -2%
             Respell       41.36        1.65       37.94        2.42  -17% -    1%
            Wildcard       19.20        0.77       17.89        0.99  -15% -    2%
          OrHighHigh        4.22        0.37        3.94        0.32  -21% -   10%
           OrHighMed        5.06        0.46        4.73        0.39  -21% -   11%
              Fuzzy1       44.15        1.31       42.38        1.74  -10% -    2%
              Fuzzy2       16.48        0.59       15.84        0.76  -11% -    4%
         TermGroup1M       13.32        0.35       13.44        0.53   -5% -    7%
            PKLookup       87.70        1.81       88.62        1.22   -2% -    4%
        TermBGroup1M       20.14        0.47       20.40        0.59   -3% -    6%
             Prefix3       30.31        1.49       31.08        2.26   -9% -   15%
      TermBGroup1M1P       21.13        0.46       21.79        1.42   -5% -   12%
              IntNRQ        3.96        0.45        4.14        0.46  -16% -   31%
                Term       43.07        1.51       46.06        4.50   -6% -   21%
{noformat}
                
> Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)
> -------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3892
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3892
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>              Labels: gsoc2012, lucene-gsoc-12
>             Fix For: 4.1
>
>         Attachments: LUCENE-3892_for.patch, LUCENE-3892_pfor.patch, 
> LUCENE-3892_pfor.patch, LUCENE-3892_pfor.patch, LUCENE-3892_settings.patch, 
> LUCENE-3892_settings.patch
>
>
> On the flex branch we explored a number of possible intblock
> encodings, but for whatever reason never brought them to completion.
> There are still a number of issues opened with patches in different
> states.
> Initial results (based on prototype) were excellent (see
> http://blog.mikemccandless.com/2010/08/lucene-performance-with-pfordelta-codec.html
> ).
> I think this would make a good GSoC project.

