[
https://issues.apache.org/jira/browse/LUCENE-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless updated LUCENE-4609:
---------------------------------------
Attachment: LUCENE-4609.patch
New prototype collector, this time using simple int[] instead of PackedInts.
Trunk (base) vs prototype collector (comp):
{noformat}
                Task    QPS base  StdDev    QPS comp  StdDev      Pct diff
              IntNRQ      114.81  (6.2%)      112.35  (8.4%)   -2.1% ( -15% - 13%)
             Prefix3      176.77  (4.7%)      173.10  (7.4%)   -2.1% ( -13% - 10%)
            Wildcard      254.90  (3.2%)      250.81  (3.3%)   -1.6% (  -7% -  5%)
          AndHighLow      371.35  (2.6%)      366.23  (2.3%)   -1.4% (  -6% -  3%)
            PKLookup      302.90  (1.7%)      299.45  (1.7%)   -1.1% (  -4% -  2%)
             Respell      143.44  (3.1%)      143.18  (3.4%)   -0.2% (  -6% -  6%)
              Fuzzy2       86.16  (2.0%)       88.32  (3.1%)    2.5% (  -2% -  7%)
     LowSloppyPhrase       67.41  (1.8%)       69.45  (2.9%)    3.0% (  -1% -  7%)
         LowSpanNear       37.85  (2.6%)       39.38  (3.0%)    4.0% (  -1% -  9%)
        HighSpanNear       10.19  (2.6%)       10.62  (3.2%)    4.2% (  -1% - 10%)
             MedTerm      111.19  (1.4%)      117.18  (1.6%)    5.4% (   2% -  8%)
              Fuzzy1       83.60  (2.5%)       88.65  (2.8%)    6.0% (   0% - 11%)
          AndHighMed      171.63  (1.4%)      182.81  (2.0%)    6.5% (   3% - 10%)
         MedSpanNear       64.59  (2.0%)       69.13  (2.1%)    7.0% (   2% - 11%)
           LowPhrase       57.89  (5.3%)       63.54  (4.5%)    9.8% (   0% - 20%)
          HighPhrase       37.97 (11.0%)       41.79  (8.3%)   10.1% (  -8% - 32%)
     MedSloppyPhrase       63.51  (2.0%)       70.31  (3.2%)   10.7% (   5% - 16%)
             LowTerm      145.85  (1.5%)      169.28  (1.6%)   16.1% (  12% - 19%)
    HighSloppyPhrase        2.97  (8.4%)        3.47 (12.4%)   16.6% (  -3% - 40%)
         AndHighHigh       46.49  (1.0%)       54.30  (1.2%)   16.8% (  14% - 19%)
           MedPhrase      101.99  (4.1%)      128.31  (4.7%)   25.8% (  16% - 36%)
           OrHighMed       24.97  (1.7%)       35.04  (3.6%)   40.3% (  34% - 46%)
            HighTerm       26.22  (1.2%)       37.55  (3.6%)   43.2% (  38% - 48%)
           OrHighLow       24.31  (1.5%)       34.89  (3.8%)   43.5% (  37% - 49%)
          OrHighHigh       17.72  (1.4%)       26.44  (4.5%)   49.3% (  42% - 55%)
{noformat}
So this is at least good news ... it means if we can speed up decode there are
gains to be had ... but RAM usage is now 105231 KB (hmm, not THAT much larger
than 63880 KB ... interesting).
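For reference, the int[] approach the prototype takes could look roughly like this sketch (class and method names here are made up for illustration, not taken from the attached patch): one counter slot per category ordinal, incremented per matching doc.

```java
// Hypothetical sketch (not the attached patch): counting facet ordinals
// into a plain int[] rather than a packed structure, trading RAM for
// simple, fast increments.
public class IntArrayFacetCounts {
  private final int[] counts;

  public IntArrayFacetCounts(int taxonomySize) {
    // One slot per category ordinal in the taxonomy.
    this.counts = new int[taxonomySize];
  }

  // Called once per matching document with its decoded category ordinals.
  public void collect(int[] ordinals, int length) {
    for (int i = 0; i < length; i++) {
      counts[ordinals[i]]++;
    }
  }

  public int getCount(int ordinal) {
    return counts[ordinal];
  }
}
```

The RAM cost above (one int per ordinal, regardless of how many bits are actually needed) is what PackedInts was avoiding, which is why the 105231 KB vs 63880 KB comparison matters.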
> Write a PackedIntsEncoder/Decoder for facets
> --------------------------------------------
>
> Key: LUCENE-4609
> URL: https://issues.apache.org/jira/browse/LUCENE-4609
> Project: Lucene - Core
> Issue Type: New Feature
> Components: modules/facet
> Reporter: Shai Erera
> Priority: Minor
> Attachments: LUCENE-4609.patch, LUCENE-4609.patch, LUCENE-4609.patch,
> LUCENE-4609.patch
>
>
> Today the facets API lets you write IntEncoder/Decoder to encode/decode the
> category ordinals. We have several such encoders, including VInt (default)
> and block encoders.
> It would be interesting to implement and benchmark a
> PackedIntsEncoder/Decoder, with potentially two variants: (1) one that
> receives bitsPerValue up front, e.g. when you know you have a small taxonomy
> and the max value you can see, and (2) one that decides on the optimal
> bitsPerValue for each doc and writes it as a header in the byte[] or
> something.
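Variant (2) could be sketched roughly as follows (class and method names are hypothetical, for illustration only): pick the minimal bitsPerValue from the doc's largest ordinal, write it as a one-byte header, then pack the values LSB-first.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch of variant (2): per-document packing where the first
// byte is a header holding bitsPerValue, chosen from the doc's max ordinal.
public class PerDocPackedInts {

  public static byte[] encode(int[] values) {
    // Minimal bits needed for the largest value in this doc (at least 1).
    int bitsPerValue = 1;
    for (int v : values) {
      int bits = 32 - Integer.numberOfLeadingZeros(Math.max(v, 1));
      if (bits > bitsPerValue) bitsPerValue = bits;
    }
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    out.write(bitsPerValue); // one-byte header
    long buffer = 0;
    int bufferedBits = 0;
    for (int v : values) {
      buffer |= ((long) v) << bufferedBits; // pack LSB-first
      bufferedBits += bitsPerValue;
      while (bufferedBits >= 8) {
        out.write((int) (buffer & 0xFF));
        buffer >>>= 8;
        bufferedBits -= 8;
      }
    }
    if (bufferedBits > 0) {
      out.write((int) (buffer & 0xFF)); // flush trailing bits
    }
    return out.toByteArray();
  }

  public static int[] decode(byte[] data, int count) {
    int bitsPerValue = data[0]; // read the header back
    long mask = (1L << bitsPerValue) - 1;
    int[] values = new int[count];
    long buffer = 0;
    int bufferedBits = 0;
    int pos = 1;
    for (int i = 0; i < count; i++) {
      while (bufferedBits < bitsPerValue) {
        buffer |= ((long) (data[pos++] & 0xFF)) << bufferedBits;
        bufferedBits += 8;
      }
      values[i] = (int) (buffer & mask);
      buffer >>>= bitsPerValue;
      bufferedBits -= bitsPerValue;
    }
    return values;
  }
}
```

Variant (1) would be the same minus the per-doc header: bitsPerValue is fixed at construction from the known taxonomy size, saving a byte per doc and a branch on decode.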
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]