[jira] [Commented] (LUCENE-4764) Faster but more RAM/Disk consuming DocValuesFormat for facets

Michael McCandless (JIRA) Sun, 10 Feb 2013 15:05:13 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13575570#comment-13575570 ]


Michael McCandless commented on LUCENE-4764:
--------------------------------------------

bq. I think that it would actually be interesting to test only VInt, without 
dgap. Because the ords seem to be arbitrary, I'm not even sure what they buy 
us. Mike, can you try that? 

No dgap compression, 1M docs, 7 dims per doc.  Looks like we lost a bit:

{noformat}
                    Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
                 MedTerm      258.50      (1.5%)      252.69      (1.6%)   -2.2% (  -5% -    0%)
               OrHighLow       55.96      (2.4%)       54.73      (2.0%)   -2.2% (  -6% -    2%)
               OrHighMed       57.47      (2.4%)       56.33      (2.1%)   -2.0% (  -6% -    2%)
              HighPhrase       44.47     (10.9%)       43.63     (10.7%)   -1.9% ( -21% -   22%)
              OrHighHigh       38.53      (2.6%)       37.88      (2.3%)   -1.7% (  -6% -    3%)
                HighTerm       65.49      (1.2%)       64.70      (1.9%)   -1.2% (  -4% -    1%)
                 Prefix3       46.82      (1.5%)       46.30      (1.2%)   -1.1% (  -3% -    1%)
               MedPhrase      149.78      (5.5%)      148.17      (5.3%)   -1.1% ( -11% -   10%)
             AndHighHigh       93.50      (1.0%)       92.73      (0.8%)   -0.8% (  -2% -    1%)
        HighSloppyPhrase        3.26      (6.8%)        3.24      (8.0%)   -0.8% ( -14% -   15%)
            HighSpanNear       11.60      (1.7%)       11.51      (1.9%)   -0.8% (  -4% -    2%)
               LowPhrase       73.57      (5.6%)       73.00      (5.0%)   -0.8% ( -10% -   10%)
             LowSpanNear       43.68      (2.0%)       43.35      (2.3%)   -0.8% (  -4% -    3%)
             MedSpanNear       90.77      (1.5%)       90.10      (1.4%)   -0.7% (  -3% -    2%)
         LowSloppyPhrase       82.66      (1.9%)       82.13      (1.7%)   -0.6% (  -4% -    2%)
         MedSloppyPhrase       92.12      (2.2%)       91.65      (2.2%)   -0.5% (  -4% -    3%)
                 LowTerm      466.62      (1.4%)      464.83      (1.9%)   -0.4% (  -3% -    2%)
              AndHighMed      347.12      (1.7%)      348.61      (1.1%)    0.4% (  -2% -    3%)
                Wildcard      120.82      (1.2%)      121.50      (1.6%)    0.6% (  -2% -    3%)
                  IntNRQ       23.40      (1.6%)       23.76      (1.4%)    1.5% (  -1% -    4%)
                  Fuzzy1       80.87      (2.4%)       82.38      (2.6%)    1.9% (  -3% -    7%)
                 Respell       71.83      (3.0%)       73.46      (3.2%)    2.3% (  -3% -    8%)
              AndHighLow     1159.47      (3.8%)     1189.72      (2.4%)    2.6% (  -3% -    9%)
                  Fuzzy2       88.04      (3.0%)       91.48      (3.7%)    3.9% (  -2% -   10%)
{noformat}

The DV facet field was 9219009 bytes on trunk (with dgap) and
10163419 bytes with no dgap (~10% larger).  So net/net dgap seems to help!
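
For anyone following along, here is a minimal standalone sketch of why dgap can shrink VInt-encoded ords. It is not the code from the patch or from Lucene's facet module: the class name DGapVIntSketch, the encode/decode helpers and the sample ords are all made up for illustration, and it assumes the ords for a doc are sorted ascending so that every gap is non-negative.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical illustration only; not the encoder used by the patch.
class DGapVIntSketch {

    // Append a non-negative int as a VInt: 7 payload bits per byte, low bits
    // first, with the high bit set on every byte except the last.
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Encode ascending ords as VInts, either directly (dgap == false) or as
    // the difference from the previous ord (dgap == true).
    static byte[] encode(int[] sortedOrds, boolean dgap) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int ord : sortedOrds) {
            writeVInt(out, dgap ? ord - prev : ord);
            prev = ord;
        }
        return out.toByteArray();
    }

    // Decode count VInts, re-accumulating the gaps when dgap is true.
    static int[] decode(byte[] bytes, int count, boolean dgap) {
        int[] ords = new int[count];
        int pos = 0;
        int prev = 0;
        for (int i = 0; i < count; i++) {
            int value = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                value |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            ords[i] = dgap ? prev + value : value;
            prev = ords[i];
        }
        return ords;
    }

    public static void main(String[] args) {
        // Made-up ords for one doc, sorted ascending.
        int[] ords = {3, 130, 200, 20000, 20100, 2000000};
        byte[] plain = encode(ords, false);
        byte[] gapped = encode(ords, true);
        System.out.println("plain VInt bytes: " + plain.length);
        System.out.println("dgap VInt bytes:  " + gapped.length);
        System.out.println("round trip ok: "
            + Arrays.equals(ords, decode(gapped, ords.length, true)));
    }
}
```

The saving comes only from gaps that fit in fewer 7-bit groups than the raw ord does, so how much it helps on a real index depends on how the ords happen to be distributed within each doc.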

                
> Faster but more RAM/Disk consuming DocValuesFormat for facets
> -------------------------------------------------------------
>
>                 Key: LUCENE-4764
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4764
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.2, 5.0
>
>         Attachments: LUCENE-4764.patch
>
>
> The new default DV format for binary fields has much more
> RAM-efficient encoding of the address for each document ... but it's
> also a bit slower at decode time, which affects facets because we
> decode for every collected docID.

