[ https://issues.apache.org/jira/browse/LUCENE-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560129#comment-13560129 ]
Michael McCandless commented on LUCENE-4609:
--------------------------------------------
Ugh! My DV total byte counts were too high: luceneutil also indexes the
title field as DV. So ignore the earlier byte sizes. Here are the (correct,
I hope!) byte sizes for the NO_PARENTS case on the full 6.6M English
Wikipedia index: DV (index): 151208 KB; int[] (in RAM): 305889 KB.

NO_PARENTS perf (base = trunk, comp = int[] collector):
{noformat}
Task              QPS base   StdDev  QPS comp   StdDev  Pct diff
Wildcard             74.70   (3.3%)     74.32   (1.9%)   -0.5% ( -5% -   4%)
PKLookup            245.87   (1.8%)    244.80   (2.0%)   -0.4% ( -4% -   3%)
HighPhrase           15.68   (5.7%)     15.72   (6.4%)    0.2% (-11% -  12%)
Respell             111.09   (3.5%)    111.33   (3.7%)    0.2% ( -6% -   7%)
AndHighLow           97.90   (1.6%)     98.16   (1.4%)    0.3% ( -2% -   3%)
LowSpanNear           7.62   (3.8%)      7.67   (3.5%)    0.7% ( -6% -   8%)
Prefix3              45.94   (5.6%)     46.34   (2.7%)    0.9% ( -6% -   9%)
IntNRQ               18.04   (8.2%)     18.20   (4.6%)    0.9% (-11% -  14%)
LowSloppyPhrase      17.77   (2.9%)     17.94   (4.8%)    1.0% ( -6% -   8%)
Fuzzy2               41.36   (2.4%)     42.68   (2.3%)    3.2% ( -1% -   8%)
LowPhrase            16.94   (2.4%)     17.65   (3.5%)    4.1% ( -1% -  10%)
HighSpanNear          2.98   (2.8%)      3.14   (2.1%)    5.3% (  0% -  10%)
AndHighMed           49.18   (1.0%)     51.97   (0.7%)    5.7% (  3% -   7%)
HighSloppyPhrase      0.90   (6.7%)      0.97  (12.6%)    6.8% (-11% -  27%)
MedSloppyPhrase      18.54   (1.8%)     19.91   (3.0%)    7.4% (  2% -  12%)
MedSpanNear          19.86   (1.6%)     21.36   (2.0%)    7.5% (  3% -  11%)
MedPhrase            55.57   (2.2%)     60.31   (2.3%)    8.5% (  3% -  13%)
Fuzzy1               33.38   (1.4%)     37.19   (1.9%)   11.4% (  8% -  14%)
AndHighHigh          12.58   (1.2%)     14.66   (0.9%)   16.6% ( 14% -  18%)
LowTerm              40.41   (1.2%)     47.14   (1.4%)   16.6% ( 13% -  19%)
MedTerm              23.00   (1.4%)     27.14   (3.0%)   18.0% ( 13% -  22%)
OrHighMed             7.50   (2.2%)     10.16   (2.3%)   35.6% ( 30% -  40%)
OrHighLow             7.55   (2.0%)     10.30   (2.8%)   36.3% ( 30% -  41%)
HighTerm              7.92   (1.9%)     10.98   (2.8%)   38.6% ( 33% -  44%)
OrHighHigh            4.30   (2.7%)      6.39   (3.0%)   48.6% ( 41% -  55%)
{noformat}
> Write a PackedIntsEncoder/Decoder for facets
> --------------------------------------------
>
> Key: LUCENE-4609
> URL: https://issues.apache.org/jira/browse/LUCENE-4609
> Project: Lucene - Core
> Issue Type: New Feature
> Components: modules/facet
> Reporter: Shai Erera
> Priority: Minor
> Attachments: LUCENE-4609.patch, LUCENE-4609.patch, LUCENE-4609.patch,
> LUCENE-4609.patch
>
>
> Today the facets API lets you write IntEncoder/Decoder to encode/decode the
> category ordinals. We have several such encoders, including VInt (default),
> and block encoders.
> It would be interesting to implement and benchmark a
> PackedIntsEncoder/Decoder, with potentially two variants: (1) one that
> receives bitsPerValue up front, e.g. when you know that you have a small
> taxonomy and therefore the max value you can see, and (2) one that decides
> for each doc on the optimal bitsPerValue and writes it as a header in the
> byte[] or something.
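
To make the two variants in the issue description concrete, here is a minimal plain-Java sketch of bit-packing category ordinals. It is not the attached patch and does not use Lucene's PackedInts or the facets IntEncoder/IntDecoder API; the class name, method names, and the one-byte header layout (bitsPerValue plus the number of padding bits, so the decoder can recover the value count) are all assumptions made up for illustration.

```java
final class PackedOrdinalsSketch {

    /** Bits needed for values in [0, maxValue]; at least 1. */
    public static int bitsRequired(int maxValue) {
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(maxValue));
    }

    /** Variant 1: bitsPerValue is supplied up front (e.g. small taxonomy). */
    public static byte[] encodeFixed(int[] ordinals, int bitsPerValue) {
        byte[] out = new byte[(ordinals.length * bitsPerValue + 7) / 8];
        writeBits(ordinals, bitsPerValue, out, 0);
        return out;
    }

    public static int[] decodeFixed(byte[] in, int count, int bitsPerValue) {
        return readBits(in, 0, count, bitsPerValue);
    }

    /** Variant 2: pick bitsPerValue per document and record it in a header byte. */
    public static byte[] encodeWithHeader(int[] ordinals) {
        int max = 0;
        for (int ord : ordinals) {
            max = Math.max(max, ord);
        }
        int bits = bitsRequired(max);
        int totalBits = ordinals.length * bits;
        int payloadBytes = (totalBits + 7) / 8;
        int paddingBits = payloadBytes * 8 - totalBits; // always 0..7
        byte[] out = new byte[1 + payloadBytes];
        // Hypothetical header layout: high 5 bits = bits - 1, low 3 bits = padding.
        out[0] = (byte) (((bits - 1) << 3) | paddingBits);
        writeBits(ordinals, bits, out, 1);
        return out;
    }

    public static int[] decodeWithHeader(byte[] in) {
        int header = in[0] & 0xFF;
        int bits = (header >>> 3) + 1;
        int paddingBits = header & 7;
        int count = ((in.length - 1) * 8 - paddingBits) / bits;
        return readBits(in, 1, count, bits);
    }

    /** Writes each value MSB-first using exactly {@code bits} bits. */
    private static void writeBits(int[] values, int bits, byte[] out, int byteOffset) {
        long bitPos = (long) byteOffset * 8;
        for (int v : values) {
            for (int b = bits - 1; b >= 0; b--) {
                if (((v >>> b) & 1) != 0) {
                    out[(int) (bitPos >>> 3)] |= (byte) (0x80 >>> (int) (bitPos & 7));
                }
                bitPos++;
            }
        }
    }

    /** Reads {@code count} values of {@code bits} bits each, MSB-first. */
    private static int[] readBits(byte[] in, int byteOffset, int count, int bits) {
        int[] values = new int[count];
        long bitPos = (long) byteOffset * 8;
        for (int i = 0; i < count; i++) {
            int v = 0;
            for (int b = 0; b < bits; b++) {
                int bit = (in[(int) (bitPos >>> 3)] >>> (7 - (int) (bitPos & 7))) & 1;
                v = (v << 1) | bit;
                bitPos++;
            }
            values[i] = v;
        }
        return values;
    }

    public static void main(String[] args) {
        int[] ords = {3, 7, 1, 5};
        byte[] packed = encodeWithHeader(ords);
        System.out.println(java.util.Arrays.toString(decodeWithHeader(packed)));
    }
}
```

In this sketch the fixed variant needs the value count from the caller, while the header variant spends one extra byte per document to be self-describing. Whether either beats VInt in practice is exactly what the issue proposes to benchmark.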