[ https://issues.apache.org/jira/browse/LUCENE-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-4609:
---------------------------------------
Attachment: LUCENE-4609.patch
Here's another attempt (totally prototype / not committable) at using
PackedInts to hold the ords ...
It's hacked up: it visits all byte[] from DocValues in the index and converts
to in-RAM PackedInts arrays, and then does all facet counting from those arrays.
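Roughly, the in-RAM structure I mean is something like this (a hedged sketch, not the patch itself; the PackedOrds name and the flat ords + offsets layout are mine, and the per-doc ordinals are assumed to already be decoded out of the DocValues byte[] into int[]s):
{code:java}
import org.apache.lucene.util.packed.PackedInts;

// Hypothetical per-segment holder: all ords flattened into one PackedInts array,
// plus a per-doc offsets index (offsets[maxDoc] holds the total ord count).
class PackedOrds {
  final PackedInts.Reader ords;
  final PackedInts.Reader offsets;

  PackedOrds(int[][] ordsPerDoc, int maxOrd) {
    final int maxDoc = ordsPerDoc.length;
    long total = 0;
    for (int[] docOrds : ordsPerDoc) {
      total += docOrds.length;
    }
    final PackedInts.Mutable offs = PackedInts.getMutable(
        maxDoc + 1, PackedInts.bitsRequired(total), PackedInts.COMPACT);
    final PackedInts.Mutable all = PackedInts.getMutable(
        (int) total, PackedInts.bitsRequired(maxOrd), PackedInts.COMPACT);
    int upto = 0;
    for (int doc = 0; doc < maxDoc; doc++) {
      offs.set(doc, upto);
      for (int ord : ordsPerDoc[doc]) {
        all.set(upto++, ord);
      }
    }
    offs.set(maxDoc, upto);
    ords = all;
    offsets = offs;
  }

  // Count the facets of one matching doc: one PackedInts get() per ord.
  void count(int doc, int[] counts) {
    final int end = (int) offsets.get(doc + 1);
    for (int i = (int) offsets.get(doc); i < end; i++) {
      counts[(int) ords.get(i)]++;
    }
  }
}
{code}
PackedInts.COMPACT keeps the footprint minimal; passing PackedInts.FAST or FASTEST instead lets PackedInts round bitsPerValue up to a faster representation, which may matter for the per-ord get() in the hot loop.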
But the performance is sort of 'meh':
{noformat}
                Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
             MedTerm      109.40      (1.5%)      102.06      (1.5%)   -6.7% ( -9% - -3%)
          AndHighLow      374.95      (3.0%)      361.19      (2.6%)   -3.7% ( -8% - 1%)
          AndHighMed      172.57      (1.5%)      169.35      (1.1%)   -1.9% ( -4% - 0%)
             Prefix3      177.54      (6.2%)      174.26      (8.0%)   -1.8% ( -15% - 13%)
              IntNRQ      116.07      (7.5%)      113.97      (9.3%)   -1.8% ( -17% - 16%)
              Fuzzy2       86.19      (2.4%)       85.16      (2.8%)   -1.2% ( -6% - 4%)
         AndHighHigh       46.76      (1.4%)       46.36      (1.1%)   -0.8% ( -3% - 1%)
             LowTerm      146.56      (1.8%)      145.58      (1.4%)   -0.7% ( -3% - 2%)
            HighTerm       26.35      (2.0%)       26.20      (2.1%)   -0.6% ( -4% - 3%)
         MedSpanNear       64.98      (2.3%)       64.62      (2.8%)   -0.5% ( -5% - 4%)
     LowSloppyPhrase       67.07      (2.3%)       66.80      (3.6%)   -0.4% ( -6% - 5%)
           OrHighMed       25.18      (1.6%)       25.10      (2.1%)   -0.3% ( -3% - 3%)
            Wildcard      256.33      (3.1%)      255.56      (3.5%)   -0.3% ( -6% - 6%)
            PKLookup      305.42      (2.3%)      304.72      (2.1%)   -0.2% ( -4% - 4%)
           OrHighLow       24.59      (1.3%)       24.54      (2.2%)   -0.2% ( -3% - 3%)
              Fuzzy1       81.38      (3.0%)       81.60      (2.7%)    0.3% ( -5% - 6%)
             Respell      141.17      (3.8%)      141.87      (3.9%)    0.5% ( -6% - 8%)
         LowSpanNear       38.34      (3.2%)       38.78      (3.0%)    1.1% ( -4% - 7%)
     MedSloppyPhrase       63.80      (2.1%)       64.53      (3.5%)    1.1% ( -4% - 6%)
        HighSpanNear       10.20      (2.8%)       10.32      (3.1%)    1.2% ( -4% - 7%)
           MedPhrase      103.16      (4.5%)      104.72      (2.1%)    1.5% ( -4% - 8%)
          OrHighHigh       17.81      (1.5%)       18.18      (2.7%)    2.1% ( -2% - 6%)
           LowPhrase       58.77      (5.5%)       60.49      (3.0%)    2.9% ( -5% - 12%)
          HighPhrase       38.68     (10.0%)       40.46      (5.6%)    4.6% ( -10% - 22%)
    HighSloppyPhrase        2.97      (7.9%)        3.22     (12.6%)    8.3% ( -11% - 31%)
{noformat}
Maybe if I used the bulk read PackedInts APIs instead it would be better...
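Concretely, the counting loop would become something like this (again only a sketch, assuming the flat ords + offsets layout from the PackedOrds sketch above; the buffer size is arbitrary):
{code:java}
// Bulk variant of PackedOrds.count(): one PackedInts.Reader.get(index, buf, off, len)
// call per chunk instead of one get(index) per ord. The bulk get may return fewer
// values than requested, hence the loop.
void countBulk(PackedInts.Reader ords, PackedInts.Reader offsets,
               int doc, int[] counts, long[] buffer) {
  int start = (int) offsets.get(doc);
  final int end = (int) offsets.get(doc + 1);
  while (start < end) {
    final int read = ords.get(start, buffer, 0, Math.min(end - start, buffer.length));
    for (int i = 0; i < read; i++) {
      counts[(int) buffer[i]]++;
    }
    start += read;
  }
}
{code}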
> Write a PackedIntsEncoder/Decoder for facets
> --------------------------------------------
>
> Key: LUCENE-4609
> URL: https://issues.apache.org/jira/browse/LUCENE-4609
> Project: Lucene - Core
> Issue Type: New Feature
> Components: modules/facet
> Reporter: Shai Erera
> Priority: Minor
> Attachments: LUCENE-4609.patch, LUCENE-4609.patch, LUCENE-4609.patch
>
>
> Today the facets API lets you write an IntEncoder/Decoder to encode/decode the
> category ordinals. We have several such encoders, including VInt (the default)
> and block encoders.
> It would be interesting to implement and benchmark a PackedIntsEncoder/Decoder,
> with potentially two variants: (1) one that receives bitsPerValue up front, e.g.
> when you know you have a small taxonomy and therefore the maximum ordinal you
> can see, and (2) one that decides the optimal bitsPerValue per document and
> writes it as a header in the byte[].
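For what it's worth, a bare-bones illustration of the per-doc-header idea in variant (2) (hypothetical layout and plain bit packing, not the facet module's actual IntEncoder/IntDecoder API; the one-byte count field assumes fewer than 256 ordinals per document):
{code:java}
// Hypothetical per-doc layout: byte 0 = bitsPerValue, byte 1 = number of ords,
// then the ords packed MSB-first at bitsPerValue bits each.
class PerDocPackedCodec {

  static byte[] encode(int[] ords) {
    int max = 1;
    for (int ord : ords) {
      max = Math.max(max, ord);
    }
    final int bits = 32 - Integer.numberOfLeadingZeros(max); // optimal width for this doc
    final byte[] out = new byte[2 + (ords.length * bits + 7) / 8];
    out[0] = (byte) bits;
    out[1] = (byte) ords.length; // simplification: < 256 ords per doc
    int bitPos = 16;
    for (int ord : ords) {
      for (int b = bits - 1; b >= 0; b--, bitPos++) {
        if (((ord >>> b) & 1) != 0) {
          out[bitPos >>> 3] |= 1 << (7 - (bitPos & 7));
        }
      }
    }
    return out;
  }

  static int[] decode(byte[] buf) {
    final int bits = buf[0];
    final int count = buf[1] & 0xFF;
    final int[] ords = new int[count];
    int bitPos = 16;
    for (int i = 0; i < count; i++) {
      int v = 0;
      for (int b = 0; b < bits; b++, bitPos++) {
        v = (v << 1) | ((buf[bitPos >>> 3] >>> (7 - (bitPos & 7))) & 1);
      }
      ords[i] = v;
    }
    return ords;
  }
}
{code}
Variant (1) would be the same minus the per-doc header, with a fixed bitsPerValue derived once from the taxonomy's maximum ordinal.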
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]