[jira] [Commented] (LUCENE-4764) Faster but more RAM/Disk consuming DocValuesFormat for facets

Michael McCandless (JIRA) Tue, 12 Feb 2013 02:49:17 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13576538#comment-13576538 ]


Michael McCandless commented on LUCENE-4764:
--------------------------------------------

I decided to test whether the specialization (checking if DV format is
FacetDVFormat and "directly" accessing its address/bytes) helps:

Base = new DV format; comp = new DV format + specialization, 9 dims:

{noformat}
                    Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
             LowSpanNear        7.15      (2.3%)        7.14      (2.0%)   -0.1% (  -4% -    4%)
                 Respell       45.60      (3.4%)       45.64      (3.3%)    0.1% (  -6% -    7%)
                  Fuzzy1       24.79      (1.4%)       24.85      (1.3%)    0.3% (  -2% -    2%)
             MedSpanNear       18.07      (1.3%)       18.12      (1.6%)    0.3% (  -2% -    3%)
              AndHighMed       40.34      (0.8%)       40.47      (0.9%)    0.3% (  -1% -    2%)
               MedPhrase       42.25      (2.9%)       42.40      (2.7%)    0.4% (  -5% -    6%)
                 LowTerm       35.62      (1.1%)       35.76      (1.3%)    0.4% (  -2% -    2%)
              AndHighLow       64.53      (1.7%)       64.78      (1.2%)    0.4% (  -2% -    3%)
                  Fuzzy2       29.06      (1.6%)       29.19      (1.7%)    0.4% (  -2% -    3%)
         MedSloppyPhrase       16.88      (1.1%)       16.97      (1.5%)    0.5% (  -2% -    3%)
               LowPhrase       15.01      (4.7%)       15.09      (4.8%)    0.5% (  -8% -   10%)
            HighSpanNear        2.92      (1.9%)        2.94      (1.7%)    0.7% (  -2% -    4%)
         LowSloppyPhrase       15.48      (1.6%)       15.60      (2.1%)    0.7% (  -2% -    4%)
              HighPhrase       13.50      (8.8%)       13.60      (8.6%)    0.7% ( -15% -   19%)
                 MedTerm       22.64      (1.1%)       22.91      (1.2%)    1.2% (  -1% -    3%)
                Wildcard       14.29      (0.9%)       14.47      (1.4%)    1.3% (   0% -    3%)
             AndHighHigh       12.40      (0.9%)       12.56      (1.2%)    1.3% (   0% -    3%)
        HighSloppyPhrase        0.82      (4.3%)        0.83      (5.2%)    1.9% (  -7% -   11%)
               OrHighMed        7.74      (1.3%)        7.90      (1.4%)    2.0% (   0% -    4%)
               OrHighLow        7.82      (1.4%)        7.98      (1.7%)    2.0% (   0% -    5%)
                HighTerm        8.35      (1.1%)        8.52      (1.5%)    2.1% (   0% -    4%)
                 Prefix3        6.48      (1.1%)        6.62      (1.1%)    2.3% (   0% -    4%)
              OrHighHigh        4.58      (1.6%)        4.69      (1.5%)    2.3% (   0% -    5%)
                  IntNRQ        2.41      (1.6%)        2.48      (1.5%)    2.7% (   0% -    5%)
{noformat}

Same, but w/ 7 dims:

{noformat}
                    Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
             MedSpanNear       28.73      (1.9%)       28.33      (2.7%)   -1.4% (  -5% -    3%)
                 Respell       45.08      (4.7%)       44.73      (4.0%)   -0.8% (  -9% -    8%)
             LowSpanNear        8.38      (2.6%)        8.33      (2.5%)   -0.6% (  -5% -    4%)
                  Fuzzy2       52.13      (3.5%)       51.85      (3.5%)   -0.5% (  -7% -    6%)
            HighSpanNear        3.53      (1.7%)        3.51      (1.9%)   -0.5% (  -3% -    3%)
                  Fuzzy1       46.42      (2.5%)       46.29      (2.3%)   -0.3% (  -4% -    4%)
               MedPhrase      109.24      (5.5%)      109.16      (5.9%)   -0.1% ( -10% -   11%)
              HighPhrase       17.28     (10.4%)       17.28     (10.6%)    0.0% ( -19% -   23%)
        HighSloppyPhrase        0.92      (8.0%)        0.92      (5.9%)    0.0% ( -12% -   15%)
             AndHighHigh       23.28      (1.2%)       23.29      (0.8%)    0.0% (  -1% -    2%)
               LowPhrase       21.08      (6.1%)       21.10      (6.6%)    0.1% ( -11% -   13%)
              AndHighLow      586.97      (2.5%)      587.46      (2.3%)    0.1% (  -4% -    5%)
         LowSloppyPhrase       20.38      (3.1%)       20.41      (2.6%)    0.1% (  -5% -    6%)
                 LowTerm      110.38      (2.0%)      110.52      (1.4%)    0.1% (  -3% -    3%)
              AndHighMed      105.08      (1.0%)      105.31      (0.9%)    0.2% (  -1% -    2%)
                Wildcard       27.23      (2.5%)       27.30      (1.8%)    0.3% (  -3% -    4%)
         MedSloppyPhrase       25.94      (3.2%)       26.04      (2.1%)    0.4% (  -4% -    5%)
                  IntNRQ        3.52      (3.6%)        3.54      (2.6%)    0.6% (  -5% -    7%)
                HighTerm       19.05      (3.3%)       19.18      (2.7%)    0.6% (  -5% -    6%)
                 Prefix3       12.89      (3.3%)       12.97      (2.3%)    0.7% (  -4% -    6%)
                 MedTerm       46.70      (3.0%)       47.06      (2.6%)    0.8% (  -4% -    6%)
               OrHighLow       17.06      (4.2%)       17.22      (3.5%)    1.0% (  -6% -    9%)
               OrHighMed       16.54      (4.2%)       16.71      (3.6%)    1.0% (  -6% -    9%)
              OrHighHigh        8.72      (4.4%)        8.83      (3.7%)    1.2% (  -6% -    9%)
{noformat}

So net/net the specialization doesn't help much here: the best case is
+2.7% (IntNRQ, 9 dims), and with 7 dims every task's range straddles zero.
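
For anyone reading the tables: the Pct diff column looks like the plain
relative QPS change of comp over base, and the range in parentheses looks
like the worst and best case when each side is shifted by one standard
deviation, truncated to a whole percent. That reading is inferred from the
printed numbers only, not taken from the benchmark tool's source, and the
class and method names below are made up for illustration. A minimal sketch:

```java
// Sketch only: formulas inferred from the printed tables, not from the benchmark tool.
final class PctDiffSketch {

    /** Relative change of comp over base, in percent. */
    static double pctDiff(double qpsBase, double qpsComp) {
        return 100.0 * (qpsComp - qpsBase) / qpsBase;
    }

    /** Worst case: base one stddev faster, comp one stddev slower (stddevs given in percent). */
    static double worstCase(double qpsBase, double sdBasePct, double qpsComp, double sdCompPct) {
        double baseHigh = qpsBase * (1 + sdBasePct / 100.0);
        double compLow = qpsComp * (1 - sdCompPct / 100.0);
        return 100.0 * (compLow - baseHigh) / baseHigh;
    }

    /** Best case: base one stddev slower, comp one stddev faster (stddevs given in percent). */
    static double bestCase(double qpsBase, double sdBasePct, double qpsComp, double sdCompPct) {
        double baseLow = qpsBase * (1 - sdBasePct / 100.0);
        double compHigh = qpsComp * (1 + sdCompPct / 100.0);
        return 100.0 * (compHigh - baseLow) / baseLow;
    }

    public static void main(String[] args) {
        // MedTerm row of the 9-dims table: 22.64 (1.1%) vs 22.91 (1.2%).
        double diff = pctDiff(22.64, 22.91);
        double worst = worstCase(22.64, 1.1, 22.91, 1.2);
        double best = bestCase(22.64, 1.1, 22.91, 1.2);
        // The casts truncate toward zero, which is what the printed ranges suggest.
        System.out.printf("%.1f%% (%d%% - %d%%)%n", diff, (int) worst, (int) best);
    }
}
```

Because the QPS values in the tables are already rounded to two decimals,
recomputing a row this way can differ from the printed Pct diff by about 0.1.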

                
> Faster but more RAM/Disk consuming DocValuesFormat for facets
> -------------------------------------------------------------
>
>                 Key: LUCENE-4764
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4764
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.2, 5.0
>
>         Attachments: LUCENE-4764.patch
>
>
> The new default DV format for binary fields has much more
> RAM-efficient encoding of the address for each document ... but it's
> also a bit slower at decode time, which affects facets because we
> decode for every collected docID.
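
To picture the specialization tested in the comment above (check for the
facet DV format, then read its address/bytes "directly"), here is a rough,
self-contained sketch. None of these types are Lucene classes and this is
not the code from the attached patch: BinaryValuesSketch,
DirectFacetValuesSketch and sumBytes are hypothetical stand-ins, and the
plain int[]/byte[] layout is an assumption made for illustration.

```java
import java.util.Arrays;

// Hypothetical stand-in for a generic per-document binary values reader.
interface BinaryValuesSketch {
    /** Generic path: returns a copy of the bytes stored for docID. */
    byte[] get(int docID);
}

// Hypothetical stand-in for a format that keeps addresses and bytes in plain arrays.
final class DirectFacetValuesSketch implements BinaryValuesSketch {
    final int[] addresses; // addresses[doc] .. addresses[doc + 1] delimit doc's bytes
    final byte[] bytes;

    DirectFacetValuesSketch(int[] addresses, byte[] bytes) {
        this.addresses = addresses;
        this.bytes = bytes;
    }

    @Override
    public byte[] get(int docID) {
        return Arrays.copyOfRange(bytes, addresses[docID], addresses[docID + 1]);
    }
}

final class FacetSpecializationSketch {

    /** Sums every stored byte of the collected docs, taking the direct path when possible. */
    static long sumBytes(BinaryValuesSketch dv, int[] collectedDocIDs) {
        long sum = 0;
        if (dv instanceof DirectFacetValuesSketch) {
            // Specialized path: read the backing arrays in place, no per-doc call or copy.
            DirectFacetValuesSketch direct = (DirectFacetValuesSketch) dv;
            for (int doc : collectedDocIDs) {
                int end = direct.addresses[doc + 1];
                for (int i = direct.addresses[doc]; i < end; i++) {
                    sum += direct.bytes[i];
                }
            }
        } else {
            // Generic path: one get() per collected docID.
            for (int doc : collectedDocIDs) {
                for (byte b : dv.get(doc)) {
                    sum += b;
                }
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        DirectFacetValuesSketch direct =
            new DirectFacetValuesSketch(new int[] {0, 2, 3}, new byte[] {1, 2, 3});
        BinaryValuesSketch generic = direct::get; // same data, but forces the generic path
        int[] docs = {0, 1};
        System.out.println(sumBytes(direct, docs) + " " + sumBytes(generic, docs));
    }
}
```

Both paths return the same result; the only difference is how the bytes are
reached, which is the part the benchmark above was measuring.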

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
