[jira] [Commented] (LUCENE-5819) Add block tree postings format that supports term ords

Michael McCandless (JIRA) Tue, 15 Jul 2014 06:03:22 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-5819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062020#comment-14062020 ]


Michael McCandless commented on LUCENE-5819:
--------------------------------------------

I ran a quick perf test of Lucene41 vs OrdsLucene41, on wikimediumall:
{noformat}
Report after iter 19:
                    Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
                PKLookup      153.33      (8.7%)      131.17      (8.5%)  -14.4% ( -29% -    3%)
                 Respell       35.40      (5.4%)       31.41      (7.9%)  -11.3% ( -23% -    2%)
              AndHighLow      241.05      (3.3%)      224.00     (14.7%)   -7.1% ( -24% -   11%)
                  Fuzzy2       69.73      (6.3%)       65.30      (5.5%)   -6.3% ( -17% -    5%)
                  Fuzzy1       44.32      (9.4%)       41.90     (11.8%)   -5.5% ( -24% -   17%)
                 LowTerm      313.68      (2.4%)      296.93     (10.8%)   -5.3% ( -18% -    8%)
                Wildcard       39.40      (5.7%)       37.35      (9.7%)   -5.2% ( -19% -   10%)
                  IntNRQ        3.57      (9.3%)        3.41     (14.5%)   -4.6% ( -26% -   21%)
         MedSloppyPhrase        4.98      (3.3%)        4.76     (12.7%)   -4.4% ( -19% -   12%)
               MedPhrase        6.18      (3.8%)        5.95     (13.1%)   -3.7% ( -19% -   13%)
                HighTerm       27.78      (5.8%)       26.75     (10.1%)   -3.7% ( -18% -   12%)
             AndHighHigh       13.51      (2.0%)       13.02      (9.9%)   -3.6% ( -15% -    8%)
         LowSloppyPhrase      134.71      (3.3%)      130.50     (12.1%)   -3.1% ( -17% -   12%)
                 Prefix3        8.88      (9.7%)        8.65     (15.6%)   -2.7% ( -25% -   25%)
               LowPhrase       49.67      (3.1%)       48.38     (11.4%)   -2.6% ( -16% -   12%)
                 MedTerm      117.97      (4.5%)      115.01      (6.9%)   -2.5% ( -13% -    9%)
              HighPhrase        7.87      (6.0%)        7.73     (13.3%)   -1.8% ( -19% -   18%)
            HighSpanNear        4.68      (6.6%)        4.61     (14.7%)   -1.4% ( -21% -   21%)
              AndHighMed       49.48      (1.6%)       48.95      (5.0%)   -1.1% (  -7% -    5%)
             LowSpanNear       23.70      (4.6%)       23.55     (10.4%)   -0.7% ( -14% -   15%)
        HighSloppyPhrase        5.90      (4.4%)        5.87     (11.2%)   -0.5% ( -15% -   15%)
            OrNotHighLow       36.90     (12.3%)       37.07     (12.9%)    0.5% ( -22% -   29%)
              OrHighHigh        4.16     (15.2%)        4.19     (16.7%)    0.8% ( -27% -   38%)
           OrHighNotHigh       11.86     (13.8%)       11.98     (18.4%)    0.9% ( -27% -   38%)
             MedSpanNear        4.32      (5.3%)        4.39     (10.7%)    1.5% ( -13% -   18%)
            OrHighNotMed       26.10     (14.7%)       26.60     (12.8%)    1.9% ( -22% -   34%)
            OrHighNotLow       19.61     (15.8%)       20.08     (13.9%)    2.4% ( -23% -   38%)
            OrNotHighMed       13.84     (15.9%)       14.19     (16.7%)    2.6% ( -25% -   41%)
               OrHighMed       27.09     (18.5%)       27.87     (19.4%)    2.9% ( -29% -   50%)
               OrHighLow       36.24     (15.4%)       37.42     (15.3%)    3.2% ( -23% -   40%)
           OrNotHighHigh        9.70     (16.6%)       10.11     (15.5%)    4.2% ( -23% -   43%)
{noformat}

Net/net, the terms-dict-heavy operations (PKLookup, Respell, Fuzzy1,
Fuzzy2, maybe IntNRQ) take some hit, since there is added cost to
decode ordinals from the FST; I think the other changes are likely
noise, since they fall well within the reported StdDev.

Also, the net terms index (total size of the FSTs that are loaded into
RAM, *.tip/*.tipo) grew from 31M to 46M (~48% larger)...
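
For anyone re-deriving the numbers above, here is a minimal sketch of the
percent-difference arithmetic. The class and method names are made up for
illustration and are not part of the benchmark tooling. Because the inputs
are the rounded values from the report, the last digit can differ slightly
from the Pct diff column.

```java
public class PctDiffSketch {
    /** Percent change from base to comp; negative means comp is lower than base. */
    static double pctDiff(double base, double comp) {
        return 100.0 * (comp - base) / base;
    }

    public static void main(String[] args) {
        // PKLookup QPS as printed in the report: base (Lucene41) vs comp (OrdsLucene41).
        System.out.printf("PKLookup QPS change: %.2f%%%n", pctDiff(153.33, 131.17));
        // Terms index size in MB, as quoted in the comment (31M -> 46M).
        System.out.printf("Terms index growth:  %.2f%%%n", pctDiff(31.0, 46.0));
    }
}
```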


> Add block tree postings format that supports term ords
> ------------------------------------------------------
>
>                 Key: LUCENE-5819
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5819
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/other
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 5.0, 4.10
>
>         Attachments: LUCENE-5819.patch
>
>
> BlockTree is our default terms dictionary today, but it doesn't
> support term ords, which is an optional API in the postings format to
> retrieve the ordinal for the currently seek'd term, and also later
> seek by that ordinal, e.g. to look up the term.
> This can possibly be useful for e.g. faceting, and maybe at some point
> we can share the postings terms dict with the one used by sorted/set
> DV for cases when app wants to invert and facet on a given field.
> The older (3.x) block terms dict can easily support ords, and we have
> a Lucene41OrdsPF in test-framework, but it's not as fast / compact as
> block-tree, and doesn't (can't easily) implement an optimized
> intersect. Still, it could be that for fields we'd want to facet on,
> these tradeoffs don't matter.  It's nice to have options...
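
To make the "term ords" idea in the description concrete, here is a toy
sketch of the two operations it mentions: getting the ordinal of a term,
and seeking back to a term by its ordinal. This is only a conceptual
illustration over a sorted in-memory array; the class and method names are
hypothetical, and it says nothing about how BlockTree or the FST-based
index would actually store or decode ordinals.

```java
import java.util.Arrays;

/** Toy terms dictionary: terms are kept sorted, so a term's ordinal is its index. */
public class ToyTermOrds {
    private final String[] sortedTerms;

    /** Assumes the given terms are unique; they are copied and sorted. */
    ToyTermOrds(String... terms) {
        sortedTerms = terms.clone();
        Arrays.sort(sortedTerms);
    }

    /** Returns the ordinal of the term, or -1 if the term is absent. */
    long ordOf(String term) {
        int idx = Arrays.binarySearch(sortedTerms, term);
        return idx >= 0 ? idx : -1;
    }

    /** Seeks by ordinal and returns the term stored at that position. */
    String termAt(long ord) {
        if (ord < 0 || ord >= sortedTerms.length) {
            throw new IndexOutOfBoundsException("ord out of range: " + ord);
        }
        return sortedTerms[(int) ord];
    }

    public static void main(String[] args) {
        ToyTermOrds dict = new ToyTermOrds("lucene", "block", "tree", "ords");
        long ord = dict.ordOf("lucene");   // look up the ordinal for a term
        System.out.println(ord + " -> " + dict.termAt(ord)); // seek back by ordinal
    }
}
```

A faceting-style consumer could store the compact ordinal per document and
resolve it back to the term only when results are displayed, which is the
kind of use the description hints at.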



--
This message was sent by Atlassian JIRA
(v6.2#6252)
