[ https://issues.apache.org/jira/browse/LUCENE-5819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062020#comment-14062020 ]
Michael McCandless commented on LUCENE-5819:
--------------------------------------------
I ran a quick perf test of Lucene41 (base) vs OrdsLucene41 (comp) on wikimediumall:
{noformat}
Report after iter 19:
Task              QPS base   StdDev  QPS comp   StdDev  Pct diff
PKLookup            153.33   (8.7%)    131.17   (8.5%)  -14.4% (-29% -  3%)
Respell              35.40   (5.4%)     31.41   (7.9%)  -11.3% (-23% -  2%)
AndHighLow          241.05   (3.3%)    224.00  (14.7%)   -7.1% (-24% - 11%)
Fuzzy2               69.73   (6.3%)     65.30   (5.5%)   -6.3% (-17% -  5%)
Fuzzy1               44.32   (9.4%)     41.90  (11.8%)   -5.5% (-24% - 17%)
LowTerm             313.68   (2.4%)    296.93  (10.8%)   -5.3% (-18% -  8%)
Wildcard             39.40   (5.7%)     37.35   (9.7%)   -5.2% (-19% - 10%)
IntNRQ                3.57   (9.3%)      3.41  (14.5%)   -4.6% (-26% - 21%)
MedSloppyPhrase       4.98   (3.3%)      4.76  (12.7%)   -4.4% (-19% - 12%)
MedPhrase             6.18   (3.8%)      5.95  (13.1%)   -3.7% (-19% - 13%)
HighTerm             27.78   (5.8%)     26.75  (10.1%)   -3.7% (-18% - 12%)
AndHighHigh          13.51   (2.0%)     13.02   (9.9%)   -3.6% (-15% -  8%)
LowSloppyPhrase     134.71   (3.3%)    130.50  (12.1%)   -3.1% (-17% - 12%)
Prefix3               8.88   (9.7%)      8.65  (15.6%)   -2.7% (-25% - 25%)
LowPhrase            49.67   (3.1%)     48.38  (11.4%)   -2.6% (-16% - 12%)
MedTerm             117.97   (4.5%)    115.01   (6.9%)   -2.5% (-13% -  9%)
HighPhrase            7.87   (6.0%)      7.73  (13.3%)   -1.8% (-19% - 18%)
HighSpanNear          4.68   (6.6%)      4.61  (14.7%)   -1.4% (-21% - 21%)
AndHighMed           49.48   (1.6%)     48.95   (5.0%)   -1.1% ( -7% -  5%)
LowSpanNear          23.70   (4.6%)     23.55  (10.4%)   -0.7% (-14% - 15%)
HighSloppyPhrase      5.90   (4.4%)      5.87  (11.2%)   -0.5% (-15% - 15%)
OrNotHighLow         36.90  (12.3%)     37.07  (12.9%)    0.5% (-22% - 29%)
OrHighHigh            4.16  (15.2%)      4.19  (16.7%)    0.8% (-27% - 38%)
OrHighNotHigh        11.86  (13.8%)     11.98  (18.4%)    0.9% (-27% - 38%)
MedSpanNear           4.32   (5.3%)      4.39  (10.7%)    1.5% (-13% - 18%)
OrHighNotMed         26.10  (14.7%)     26.60  (12.8%)    1.9% (-22% - 34%)
OrHighNotLow         19.61  (15.8%)     20.08  (13.9%)    2.4% (-23% - 38%)
OrNotHighMed         13.84  (15.9%)     14.19  (16.7%)    2.6% (-25% - 41%)
OrHighMed            27.09  (18.5%)     27.87  (19.4%)    2.9% (-29% - 50%)
OrHighLow            36.24  (15.4%)     37.42  (15.3%)    3.2% (-23% - 40%)
OrNotHighHigh         9.70  (16.6%)     10.11  (15.5%)    4.2% (-23% - 43%)
{noformat}
Net/net, the terms-dict-heavy operations (PKLookup -14.4%, Respell
-11.3%, Fuzzy1/Fuzzy2 roughly -6%, and maybe IntNRQ) take a hit, since
decoding ordinals from the FST adds cost; I think the other changes are
likely noise.
Also, the total terms index (the FSTs loaded into RAM, *.tip/*.tipo)
grew from 31 MB to 46 MB (~48% larger).
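As a quick check of the size arithmetic, here is a minimal Java sketch that recomputes the relative growth from the two terms-index sizes reported above (31 vs. 46). The class and method names are illustrative only, not part of Lucene.

```java
public class TermsIndexGrowth {
    /** Percentage growth going from oldSize to newSize (same units for both). */
    static double pctGrowth(double oldSize, double newSize) {
        return (newSize - oldSize) / oldSize * 100.0;
    }

    public static void main(String[] args) {
        // Terms index sizes as reported in the comment: Lucene41 vs OrdsLucene41.
        double lucene41 = 31.0;
        double ordsLucene41 = 46.0;
        // 15 / 31 = 0.4839..., printed with one decimal place.
        System.out.printf("terms index grew %.1f%%%n", pctGrowth(lucene41, ordsLucene41));
    }
}
```

Running `main` prints `terms index grew 48.4%`, which matches the "~48% larger" figure.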
> Add block tree postings format that supports term ords
> ------------------------------------------------------
>
> Key: LUCENE-5819
> URL: https://issues.apache.org/jira/browse/LUCENE-5819
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/other
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: 5.0, 4.10
>
> Attachments: LUCENE-5819.patch
>
>
> BlockTree is our default terms dictionary today, but it doesn't
> support term ords, an optional API in the postings format for
> retrieving the ordinal of the currently seek'd term and later
> seeking by that ordinal, e.g. to look up the term.
> This could be useful for e.g. faceting, and maybe at some point
> we can share the postings terms dict with the one used by
> sorted/sorted-set DV when the app wants to both invert and facet on
> a given field.
> The older (3.x) block terms dict can easily support ords, and we have
> a Lucene41OrdsPF in test-framework, but it's not as fast/compact as
> block-tree, and doesn't (can't easily) implement an optimized
> intersect. But it could be that for fields we'd want to facet on,
> these tradeoffs don't matter. It's nice to have options...
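The term-ord contract described in the issue (term to ordinal, and later ordinal back to term) can be illustrated with a small, self-contained Java sketch. It is not BlockTree, Lucene41OrdsPF, or OrdsLucene41: it assumes the whole dictionary fits in a sorted in-memory array, so a term's ord is simply its index. `OrdTermsDict`, `ord`, `lookupOrd` and `countByOrd` are hypothetical names; in Lucene itself the corresponding optional methods live on `TermsEnum` as `ord()` and `seekExact(long ord)`.

```java
import java.util.Arrays;

/** Hypothetical in-memory stand-in for an ord-capable terms dictionary. */
public class OrdTermsDict {
    // Sorted, de-duplicated terms: a term's ordinal is its array index.
    // (Java String order is used here for simplicity; Lucene orders terms by their UTF-8 bytes.)
    private final String[] sortedTerms;

    OrdTermsDict(String... terms) {
        sortedTerms = Arrays.stream(terms).distinct().sorted().toArray(String[]::new);
    }

    /** Term -> ordinal via binary search, or -1 if the term is absent. */
    long ord(String term) {
        int idx = Arrays.binarySearch(sortedTerms, term);
        return idx >= 0 ? idx : -1;
    }

    /** Ordinal -> term ("seek by ord"), e.g. to turn a facet ord back into its label. */
    String lookupOrd(long ord) {
        if (ord < 0 || ord >= sortedTerms.length) {
            throw new IllegalArgumentException("ord out of range: " + ord);
        }
        return sortedTerms[(int) ord];
    }

    /** Facet-style counting: tally hits into a dense array indexed by ord. */
    long[] countByOrd(long[] hitOrds) {
        long[] counts = new long[sortedTerms.length];
        for (long ord : hitOrds) {
            if (ord >= 0) {
                counts[(int) ord]++; // skip -1, i.e. terms missing from the dictionary
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        OrdTermsDict dict = new OrdTermsDict("lucene", "apache", "solr", "fst");
        long ord = dict.ord("lucene"); // sorted order: apache, fst, lucene, solr
        System.out.println(ord + " -> " + dict.lookupOrd(ord));

        long[] hitOrds = {dict.ord("solr"), dict.ord("lucene"), dict.ord("solr")};
        long[] counts = dict.countByOrd(hitOrds);
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] > 0) {
                // Only buckets that were actually hit are resolved back to terms.
                System.out.println(dict.lookupOrd(i) + ": " + counts[i]);
            }
        }
    }
}
```

Running `main` prints `2 -> lucene`, then `lucene: 1` and `solr: 2`. The counting loop hints at why ords appeal for faceting. Per-hit work is an array increment keyed by ord, not a comparison or hash of term bytes. Only the buckets you report need an ord-to-term lookup.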
--
This message was sent by Atlassian JIRA
(v6.2#6252)