[
https://issues.apache.org/jira/browse/LUCENE-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless updated LUCENE-4609:
---------------------------------------
Attachment: LUCENE-4609.patch
New prototype collector, this time using simple int[] instead of PackedInts.
Trunk (base) vs prototype collector (comp):
{noformat}
                Task    QPS base  StdDev    QPS comp  StdDev      Pct diff
              IntNRQ      114.81  (6.2%)      112.35  (8.4%)   -2.1% ( -15% - 13%)
             Prefix3      176.77  (4.7%)      173.10  (7.4%)   -2.1% ( -13% - 10%)
            Wildcard      254.90  (3.2%)      250.81  (3.3%)   -1.6% (  -7% -  5%)
          AndHighLow      371.35  (2.6%)      366.23  (2.3%)   -1.4% (  -6% -  3%)
            PKLookup      302.90  (1.7%)      299.45  (1.7%)   -1.1% (  -4% -  2%)
             Respell      143.44  (3.1%)      143.18  (3.4%)   -0.2% (  -6% -  6%)
              Fuzzy2       86.16  (2.0%)       88.32  (3.1%)    2.5% (  -2% -  7%)
     LowSloppyPhrase       67.41  (1.8%)       69.45  (2.9%)    3.0% (  -1% -  7%)
         LowSpanNear       37.85  (2.6%)       39.38  (3.0%)    4.0% (  -1% -  9%)
        HighSpanNear       10.19  (2.6%)       10.62  (3.2%)    4.2% (  -1% - 10%)
             MedTerm      111.19  (1.4%)      117.18  (1.6%)    5.4% (   2% -  8%)
              Fuzzy1       83.60  (2.5%)       88.65  (2.8%)    6.0% (   0% - 11%)
          AndHighMed      171.63  (1.4%)      182.81  (2.0%)    6.5% (   3% - 10%)
         MedSpanNear       64.59  (2.0%)       69.13  (2.1%)    7.0% (   2% - 11%)
           LowPhrase       57.89  (5.3%)       63.54  (4.5%)    9.8% (   0% - 20%)
          HighPhrase       37.97 (11.0%)       41.79  (8.3%)   10.1% (  -8% - 32%)
     MedSloppyPhrase       63.51  (2.0%)       70.31  (3.2%)   10.7% (   5% - 16%)
             LowTerm      145.85  (1.5%)      169.28  (1.6%)   16.1% (  12% - 19%)
    HighSloppyPhrase        2.97  (8.4%)        3.47 (12.4%)   16.6% (  -3% - 40%)
         AndHighHigh       46.49  (1.0%)       54.30  (1.2%)   16.8% (  14% - 19%)
           MedPhrase      101.99  (4.1%)      128.31  (4.7%)   25.8% (  16% - 36%)
           OrHighMed       24.97  (1.7%)       35.04  (3.6%)   40.3% (  34% - 46%)
            HighTerm       26.22  (1.2%)       37.55  (3.6%)   43.2% (  38% - 48%)
           OrHighLow       24.31  (1.5%)       34.89  (3.8%)   43.5% (  37% - 49%)
          OrHighHigh       17.72  (1.4%)       26.44  (4.5%)   49.3% (  42% - 55%)
{noformat}
So this is at least good news ... it means if we can speed up decode there are
gains to be had ... but RAM usage is now 105231 KB (hmm, not THAT much larger
than 63880 KB ... interesting).
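For reference, the int[] approach the prototype takes could look roughly like this sketch (class and method names here are made up for illustration, not taken from the attached patch): one counter slot per category ordinal, incremented per matching doc.

```java
// Hypothetical sketch (not the attached patch): counting facet ordinals
// into a plain int[] rather than a packed structure, trading RAM for
// simple, fast increments.
public class IntArrayFacetCounts {
  private final int[] counts;

  public IntArrayFacetCounts(int taxonomySize) {
    // One slot per category ordinal in the taxonomy.
    this.counts = new int[taxonomySize];
  }

  // Called once per matching document with its decoded category ordinals.
  public void collect(int[] ordinals, int length) {
    for (int i = 0; i < length; i++) {
      counts[ordinals[i]]++;
    }
  }

  public int getCount(int ordinal) {
    return counts[ordinal];
  }
}
```

The RAM cost above (one int per ordinal, regardless of how many bits are actually needed) is what PackedInts was avoiding, which is why the 105231 KB vs 63880 KB comparison matters.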
> Write a PackedIntsEncoder/Decoder for facets
> --------------------------------------------
>
> Key: LUCENE-4609
> URL: https://issues.apache.org/jira/browse/LUCENE-4609
> Project: Lucene - Core
> Issue Type: New Feature
> Components: modules/facet
> Reporter: Shai Erera
> Priority: Minor
> Attachments: LUCENE-4609.patch, LUCENE-4609.patch, LUCENE-4609.patch,
> LUCENE-4609.patch
>
>
> Today the facets API lets you write IntEncoder/Decoder to encode/decode the
> category ordinals. We have several such encoders, including VInt (default)
> and block encoders.
> It would be interesting to implement and benchmark a
> PackedIntsEncoder/Decoder, with potentially two variants: (1) one that
> receives bitsPerValue up front, e.g. when you know you have a small taxonomy
> and the max value you can see, and (2) one that decides on the optimal
> bitsPerValue for each doc and writes it as a header in the byte[] or
> something.
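Variant (2) could be sketched roughly as follows (class and method names are hypothetical, for illustration only): pick the minimal bitsPerValue from the doc's largest ordinal, write it as a one-byte header, then pack the values LSB-first.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch of variant (2): per-document packing where the first
// byte is a header holding bitsPerValue, chosen from the doc's max ordinal.
public class PerDocPackedInts {

  public static byte[] encode(int[] values) {
    // Minimal bits needed for the largest value in this doc (at least 1).
    int bitsPerValue = 1;
    for (int v : values) {
      int bits = 32 - Integer.numberOfLeadingZeros(Math.max(v, 1));
      if (bits > bitsPerValue) bitsPerValue = bits;
    }
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    out.write(bitsPerValue); // one-byte header
    long buffer = 0;
    int bufferedBits = 0;
    for (int v : values) {
      buffer |= ((long) v) << bufferedBits; // pack LSB-first
      bufferedBits += bitsPerValue;
      while (bufferedBits >= 8) {
        out.write((int) (buffer & 0xFF));
        buffer >>>= 8;
        bufferedBits -= 8;
      }
    }
    if (bufferedBits > 0) {
      out.write((int) (buffer & 0xFF)); // flush trailing bits
    }
    return out.toByteArray();
  }

  public static int[] decode(byte[] data, int count) {
    int bitsPerValue = data[0]; // read the header back
    long mask = (1L << bitsPerValue) - 1;
    int[] values = new int[count];
    long buffer = 0;
    int bufferedBits = 0;
    int pos = 1;
    for (int i = 0; i < count; i++) {
      while (bufferedBits < bitsPerValue) {
        buffer |= ((long) (data[pos++] & 0xFF)) << bufferedBits;
        bufferedBits += 8;
      }
      values[i] = (int) (buffer & mask);
      buffer >>>= bitsPerValue;
      bufferedBits -= bitsPerValue;
    }
    return values;
  }
}
```

Variant (1) would be the same minus the per-doc header: bitsPerValue is fixed at construction from the known taxonomy size, saving a byte per doc and a branch on decode.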
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]