[ https://issues.apache.org/jira/browse/LUCENE-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13576094#comment-13576094 ]
Michael McCandless commented on LUCENE-4764:
--------------------------------------------
I re-tested trunk vs. this new DV format, with all 9 dims on the full 6.6M
wikibig index (the 2 added dims, username and categories, have a very large
number of unique values):
{noformat}
            Task  QPS base  StdDev  QPS comp  StdDev               Pct diff
      HighPhrase     13.68  (8.1%)     13.64  (8.4%)   -0.3% ( -15% -  17%)
       LowPhrase     15.05  (4.4%)     15.08  (4.4%)    0.1% (  -8% -   9%)
     LowSpanNear      7.12  (2.5%)      7.17  (2.3%)    0.6% (  -4% -   5%)
      AndHighLow     64.03  (1.3%)     64.55  (1.3%)    0.8% (  -1% -   3%)
HighSloppyPhrase      0.82  (5.7%)      0.83  (4.8%)    1.1% (  -8% -  12%)
         Respell     44.90  (4.0%)     45.43  (4.3%)    1.2% (  -6% -   9%)
 LowSloppyPhrase     15.37  (2.1%)     15.57  (1.8%)    1.3% (  -2% -   5%)
    HighSpanNear      2.91  (1.8%)      2.95  (1.9%)    1.3% (  -2% -   5%)
          Fuzzy2     28.55  (2.0%)     29.02  (2.1%)    1.7% (  -2% -   5%)
 MedSloppyPhrase     16.56  (1.2%)     16.94  (1.2%)    2.3% (   0% -   4%)
      AndHighMed     39.47  (0.8%)     40.40  (1.0%)    2.4% (   0% -   4%)
          Fuzzy1     24.08  (1.3%)     24.73  (1.4%)    2.7% (   0% -   5%)
     MedSpanNear     17.70  (1.6%)     18.19  (1.6%)    2.8% (   0% -   6%)
       MedPhrase     41.06  (2.2%)     42.46  (2.6%)    3.4% (  -1% -   8%)
         LowTerm     34.19  (0.9%)     35.69  (1.0%)    4.4% (   2% -   6%)
     AndHighHigh     11.92  (1.2%)     12.50  (1.1%)    4.9% (   2% -   7%)
        Wildcard     13.13  (1.8%)     14.43  (1.5%)    9.9% (   6% -  13%)
       OrHighMed      7.09  (2.7%)      7.85  (1.6%)   10.8% (   6% -  15%)
       OrHighLow      7.16  (2.3%)      7.93  (1.6%)   10.8% (   6% -  15%)
        HighTerm      7.59  (2.3%)      8.47  (1.6%)   11.5% (   7% -  15%)
         MedTerm     20.14  (1.9%)     22.82  (1.1%)   13.3% (  10% -  16%)
         Prefix3      5.78  (2.2%)      6.56  (1.5%)   13.4% (   9% -  17%)
      OrHighHigh      4.03  (2.3%)      4.65  (2.0%)   15.4% (  10% -  20%)
          IntNRQ      1.92  (2.2%)      2.45  (1.9%)   27.5% (  22% -  32%)
{noformat}
Size: 145.3 MB for the new DV format vs 129.0 MB for trunk, i.e. ~12.6% bigger.
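
As a sanity check on the size figure, the relative overhead can be recomputed from the two sizes. This is only a minimal Python sketch; the helper name `pct_increase` is hypothetical and is not part of Lucene or the benchmark tooling. Applying the same formula to the rounded QPS columns may differ from the table's Pct diff in the last digit, presumably because the table was computed from unrounded values.

```python
def pct_increase(base: float, new: float) -> float:
    """Relative change from base to new, in percent."""
    return (new - base) / base * 100.0


# Sizes quoted in this comment, in MB.
TRUNK_MB = 129.0
NEW_DV_MB = 145.3

size_overhead = pct_increase(TRUNK_MB, NEW_DV_MB)
print(f"new DV format is {size_overhead:.1f}% bigger than trunk")

# Same formula applied to the rounded QPS values of one row (IntNRQ).
# This is recomputed from rounded inputs, so it need not match the
# table's Pct diff column exactly.
intnrq_gain = pct_increase(1.92, 2.45)
print(f"IntNRQ: {intnrq_gain:.1f}% faster (from rounded QPS)")
```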
> Faster but more RAM/Disk consuming DocValuesFormat for facets
> -------------------------------------------------------------
>
> Key: LUCENE-4764
> URL: https://issues.apache.org/jira/browse/LUCENE-4764
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: 4.2, 5.0
>
> Attachments: LUCENE-4764.patch
>
>
> The new default DV format for binary fields has much more
> RAM-efficient encoding of the address for each document ... but it's
> also a bit slower at decode time, which affects facets because we
> decode for every collected docID.
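
To illustrate the trade-off described in the issue, here is a toy Python sketch contrasting two ways of storing per-document addresses. It is not Lucene's actual encoding: the block layout, `BLOCK_SIZE`, and all function names are assumptions made up for illustration. The direct form needs one array read per docID but stores full-width integers; the block form stores only small deltas from a per-block linear estimate, at the cost of extra arithmetic on every lookup.

```python
from typing import List, Tuple

Block = Tuple[int, float, List[int]]  # (base, average step, per-entry deltas)
BLOCK_SIZE = 4  # tiny on purpose; purely illustrative


def direct_lookup(addresses: List[int], doc_id: int) -> int:
    """Fast path: one array read per document, full-width integers."""
    return addresses[doc_id]


def encode_blocks(addresses: List[int], block_size: int = BLOCK_SIZE) -> List[Block]:
    """Compact path: per block keep a base, an average step and small deltas."""
    blocks: List[Block] = []
    for start in range(0, len(addresses), block_size):
        chunk = addresses[start:start + block_size]
        base = chunk[0]
        avg = (chunk[-1] - base) / (len(chunk) - 1) if len(chunk) > 1 else 0.0
        # Delta between the real address and the linear estimate base + avg * i.
        deltas = [a - (base + round(avg * i)) for i, a in enumerate(chunk)]
        blocks.append((base, avg, deltas))
    return blocks


def block_lookup(blocks: List[Block], doc_id: int, block_size: int = BLOCK_SIZE) -> int:
    """Decode one address: find the block, rebuild the estimate, add the delta."""
    base, avg, deltas = blocks[doc_id // block_size]
    i = doc_id % block_size
    return base + round(avg * i) + deltas[i]


# End offsets of each document's binary value in one concatenated blob.
addresses = [3, 8, 12, 16, 22, 25, 30, 34, 38]
blocks = encode_blocks(addresses)

# Both representations must return the same address for every docID.
for doc_id in range(len(addresses)):
    assert block_lookup(blocks, doc_id) == direct_lookup(addresses, doc_id)

widest_delta = max(abs(d) for _, _, deltas in blocks for d in deltas)
print("largest address:", max(addresses), "largest stored delta:", widest_delta)
```

Because the stored deltas are much smaller than the raw addresses, they need far fewer bits each, which is where the space saving would come from; the extra division, multiply and add in `block_lookup` is the kind of per-document decode cost that matters when it runs for every collected docID.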