[
https://issues.apache.org/jira/browse/LUCENE-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560048#comment-13560048
]
Michael McCandless commented on LUCENE-4609:
--------------------------------------------
The above results were on the 1M-doc index; here are the results on the full
English Wikipedia index (6.6M docs):
{noformat}
Task              QPS base   StdDev  QPS comp   StdDev  Pct diff
HighSpanNear          2.91   (2.1%)      2.90   (2.4%)   -0.6% ( -5% -  4%)
Prefix3              46.35   (4.0%)     46.07   (3.9%)   -0.6% ( -8% -  7%)
PKLookup            240.11   (1.4%)    238.95   (1.9%)   -0.5% ( -3% -  2%)
Wildcard             73.79   (2.2%)     73.48   (2.3%)   -0.4% ( -4% -  4%)
IntNRQ               18.05   (6.1%)     18.01   (5.9%)   -0.2% (-11% - 12%)
Respell              96.78   (3.1%)     98.09   (3.3%)    1.3% ( -4% -  7%)
LowSloppyPhrase      17.63   (4.4%)     17.91   (3.8%)    1.6% ( -6% - 10%)
AndHighLow          108.80   (2.8%)    110.58   (4.2%)    1.6% ( -5% -  8%)
LowSpanNear           7.53   (4.8%)      7.67   (5.6%)    1.8% ( -8% - 12%)
HighSloppyPhrase      0.87  (10.1%)      0.90   (9.6%)    3.2% (-14% - 25%)
Fuzzy2               42.22   (2.5%)     43.90   (2.7%)    4.0% ( -1% -  9%)
HighPhrase           15.32   (7.5%)     15.93   (5.4%)    4.0% ( -8% - 18%)
LowPhrase            17.09   (4.3%)     18.10   (2.9%)    5.9% ( -1% - 13%)
AndHighMed           52.60   (1.4%)     55.90   (2.1%)    6.3% (  2% -  9%)
MedSpanNear          20.09   (2.0%)     21.44   (1.8%)    6.7% (  2% - 10%)
MedSloppyPhrase      18.69   (3.0%)     20.00   (2.7%)    7.0% (  1% - 13%)
Fuzzy1               33.68   (2.0%)     37.26   (2.2%)   10.6% (  6% - 15%)
MedPhrase            57.00   (2.9%)     63.56   (3.3%)   11.5% (  5% - 18%)
MedTerm              19.22   (1.2%)     21.70   (1.1%)   12.9% ( 10% - 15%)
LowTerm              41.98   (1.2%)     48.26   (1.8%)   15.0% ( 11% - 18%)
AndHighHigh          12.09   (1.0%)     13.98   (1.2%)   15.7% ( 13% - 18%)
HighTerm              7.11   (2.1%)      9.11   (2.0%)   28.1% ( 23% - 32%)
OrHighMed             6.67   (2.4%)      8.55   (2.1%)   28.2% ( 23% - 33%)
OrHighLow             6.76   (2.1%)      8.70   (2.3%)   28.6% ( 23% - 33%)
OrHighHigh            3.84   (2.5%)      5.33   (2.7%)   38.7% ( 32% - 45%)
{noformat}
The on-disk size of the _dv* files is 464768 KB, while the in-memory int[] takes 669428 KB (44% more).
Next I'll try NO_PARENTS ord policy...
> Write a PackedIntsEncoder/Decoder for facets
> --------------------------------------------
>
> Key: LUCENE-4609
> URL: https://issues.apache.org/jira/browse/LUCENE-4609
> Project: Lucene - Core
> Issue Type: New Feature
> Components: modules/facet
> Reporter: Shai Erera
> Priority: Minor
> Attachments: LUCENE-4609.patch, LUCENE-4609.patch, LUCENE-4609.patch,
> LUCENE-4609.patch
>
>
> Today the facets API lets you write IntEncoder/Decoder to encode/decode the
> category ordinals. We have several such encoders, including VInt (default),
> and block encoders.
> It would be interesting to implement and benchmark a
> PackedIntsEncoder/Decoder, with potentially two variants: (1) receives
> bitsPerValue up front, when you e.g. know that you have a small taxonomy and
> the max value you can see and (2) one that decides for each doc on the
> optimal bitsPerValue, writes it as a header in the byte[] or something.
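
For readers unfamiliar with the idea, variant (2) from the issue description can be sketched roughly as follows. This is only an illustration in plain Java and not the code in the attached patches: the class name PackedOrdinalsSketch, the one-byte header layout, and passing the ordinal count to decode() are all assumptions made for the example, and it deliberately avoids the real facet IntEncoder/IntDecoder and PackedInts APIs.

```java
import java.util.Arrays;

// Hypothetical sketch: per-document bit packing with a 1-byte bitsPerValue header.
public class PackedOrdinalsSketch {

    /** Bits needed to represent max (at least 1, so that 0 is still encodable). */
    static int bitsRequired(int max) {
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(max));
    }

    /** Picks the smallest bitsPerValue for this document and writes it as byte 0. */
    static byte[] encode(int[] ordinals) {
        int max = 0;
        for (int ord : ordinals) {
            if (ord < 0) {
                throw new IllegalArgumentException("ordinals must be non-negative");
            }
            max = Math.max(max, ord);
        }
        int bpv = bitsRequired(max);
        int payloadBytes = (ordinals.length * bpv + 7) / 8;
        byte[] out = new byte[1 + payloadBytes];
        out[0] = (byte) bpv;          // header: bits per value
        int bitPos = 8;               // payload starts right after the header byte
        for (int ord : ordinals) {
            for (int b = bpv - 1; b >= 0; b--) {   // most significant bit first
                if (((ord >>> b) & 1) != 0) {
                    out[bitPos >>> 3] |= (byte) (0x80 >>> (bitPos & 7));
                }
                bitPos++;
            }
        }
        return out;
    }

    /** Reads the header, then unpacks count values (count is supplied by the caller). */
    static int[] decode(byte[] buf, int count) {
        int bpv = buf[0];
        int[] ordinals = new int[count];
        int bitPos = 8;
        for (int i = 0; i < count; i++) {
            int value = 0;
            for (int b = 0; b < bpv; b++) {
                int bit = (buf[bitPos >>> 3] >>> (7 - (bitPos & 7))) & 1;
                value = (value << 1) | bit;
                bitPos++;
            }
            ordinals[i] = value;
        }
        return ordinals;
    }

    public static void main(String[] args) {
        int[] ords = {3, 17, 42, 5};
        byte[] enc = encode(ords);
        // Max ordinal 42 needs 6 bits, so 4 values fit in 3 payload bytes + 1 header byte.
        System.out.println("bitsPerValue=" + enc[0] + ", bytes=" + enc.length);
        System.out.println(Arrays.equals(ords, decode(enc, ords.length)));
    }
}
```

Variant (1) would look the same minus the header byte, with bitsPerValue passed in by the caller. The bit-at-a-time loops are chosen for clarity, not speed; a serious implementation would pack whole words at a time.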