[
https://issues.apache.org/jira/browse/LUCENE-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13574481#comment-13574481
]
Michael McCandless commented on LUCENE-4609:
--------------------------------------------
OK, the new format doesn't do very well: QPS drops on every task, by up to
~40% on the term queries. This is on all of Wikipedia (6.6M "big" docs), with
7 facet dims:
{noformat}
Task              QPS base   StdDev  QPS comp   StdDev   Pct diff
MedTerm              46.85   (2.4%)     28.22   (0.7%)   -39.8% (-41% - -37%)
HighTerm             19.09   (2.5%)     12.27   (0.9%)   -35.7% (-38% - -33%)
OrHighLow            16.83   (2.8%)     11.21   (1.0%)   -33.4% (-36% - -30%)
OrHighMed            16.35   (2.8%)     11.00   (1.0%)   -32.7% (-35% - -29%)
Prefix3              12.87   (2.8%)      8.81   (0.9%)   -31.5% (-34% - -28%)
Wildcard             27.22   (2.2%)     18.68   (0.7%)   -31.4% (-33% - -29%)
LowTerm             110.58   (1.8%)     79.25   (0.6%)   -28.3% (-30% - -26%)
OrHighHigh            8.61   (2.9%)      6.19   (1.3%)   -28.1% (-31% - -24%)
IntNRQ                3.54   (2.9%)      2.55   (1.2%)   -27.9% (-31% - -24%)
AndHighHigh          23.19   (1.4%)     17.67   (0.7%)   -23.8% (-25% - -22%)
Fuzzy1               46.94   (1.7%)     40.34   (1.6%)   -14.1% (-17% - -10%)
MedPhrase           110.00   (5.6%)     98.08   (4.2%)   -10.8% (-19% -  -1%)
MedSloppyPhrase      25.93   (2.5%)     23.37   (1.6%)    -9.9% (-13% -  -5%)
MedSpanNear          28.43   (2.5%)     25.68   (1.2%)    -9.7% (-13% -  -6%)
AndHighMed          105.06   (0.9%)     95.74   (1.0%)    -8.9% (-10% -  -7%)
LowPhrase            21.26   (6.2%)     19.86   (5.3%)    -6.6% (-16% -   5%)
HighSpanNear          3.53   (2.0%)      3.30   (1.2%)    -6.5% ( -9% -  -3%)
Fuzzy2               52.61   (2.6%)     49.64   (2.5%)    -5.6% (-10% -   0%)
HighPhrase           17.44  (10.2%)     16.66   (9.5%)    -4.5% (-21% -  16%)
HighSloppyPhrase      0.92   (7.3%)      0.88   (5.7%)    -4.5% (-16% -   9%)
LowSloppyPhrase      20.28   (3.1%)     19.59   (2.0%)    -3.4% ( -8% -   1%)
Respell              46.30   (3.2%)     45.27   (3.4%)    -2.2% ( -8% -   4%)
LowSpanNear           8.36   (2.8%)      8.20   (1.9%)    -1.9% ( -6% -   2%)
AndHighLow          578.66   (3.0%)    569.71   (3.1%)    -1.5% ( -7% -   4%)
{noformat}
It also consumes quite a bit more RAM / disk: 306 MB of .dvm/.dvd files on
disk vs 178 MB for trunk, about 72% more (and remember that part of this is
the title SortedDV field).
> Write a PackedIntsEncoder/Decoder for facets
> --------------------------------------------
>
> Key: LUCENE-4609
> URL: https://issues.apache.org/jira/browse/LUCENE-4609
> Project: Lucene - Core
> Issue Type: New Feature
> Components: modules/facet
> Reporter: Shai Erera
> Priority: Minor
> Attachments: LUCENE-4609.patch, LUCENE-4609.patch, LUCENE-4609.patch,
> LUCENE-4609.patch, LUCENE-4609.patch, SemiPackedEncoder.patch
>
>
> Today the facets API lets you write IntEncoder/Decoder to encode/decode the
> category ordinals. We have several such encoders, including VInt (default),
> and block encoders.
> It would be interesting to implement and benchmark a
> PackedIntsEncoder/Decoder, with potentially two variants: (1) receives
> bitsPerValue up front, when you e.g. know that you have a small taxonomy and
> the max value you can see and (2) one that decides for each doc on the
> optimal bitsPerValue, writes it as a header in the byte[] or something.
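
To make variant (2) from the description above concrete, here is a minimal,
self-contained sketch of a per-document packed encoding. It is not the code
from any of the attached patches and does not use Lucene's IntEncoder or
PackedInts APIs; the class name PackedOrdinalsSketch, the method names, and
the 5-byte header layout (1 byte of bitsPerValue plus a 4-byte value count)
are all assumptions made for illustration.

```java
import java.util.Arrays;

// Hypothetical illustration only: not Lucene's IntEncoder/PackedInts API.
final class PackedOrdinalsSketch {

    /** Packs non-negative ordinals using the fewest bits that fit the largest one. */
    static byte[] encode(int[] ordinals) {
        int max = 0;
        for (int ord : ordinals) {
            if (ord < 0) {
                throw new IllegalArgumentException("ordinals must be non-negative");
            }
            max = Math.max(max, ord);
        }
        // Use at least 1 bit so that an all-zero list still round-trips.
        int bitsPerValue = Math.max(1, 32 - Integer.numberOfLeadingZeros(max));
        int payloadBytes = (ordinals.length * bitsPerValue + 7) / 8;

        // Assumed header: 1 byte bitsPerValue, then the value count (big-endian int).
        byte[] out = new byte[5 + payloadBytes];
        out[0] = (byte) bitsPerValue;
        out[1] = (byte) (ordinals.length >>> 24);
        out[2] = (byte) (ordinals.length >>> 16);
        out[3] = (byte) (ordinals.length >>> 8);
        out[4] = (byte) ordinals.length;

        // Write each value most-significant bit first into the payload.
        int bitPos = 0;
        for (int ord : ordinals) {
            for (int b = bitsPerValue - 1; b >= 0; b--) {
                if (((ord >>> b) & 1) != 0) {
                    out[5 + (bitPos >>> 3)] |= (byte) (0x80 >>> (bitPos & 7));
                }
                bitPos++;
            }
        }
        return out;
    }

    /** Reverses encode(): reads the header, then unpacks the stored values. */
    static int[] decode(byte[] in) {
        int bitsPerValue = in[0];
        int count = ((in[1] & 0xFF) << 24) | ((in[2] & 0xFF) << 16)
                  | ((in[3] & 0xFF) << 8) | (in[4] & 0xFF);
        int[] ordinals = new int[count];
        int bitPos = 0;
        for (int i = 0; i < count; i++) {
            int value = 0;
            for (int b = 0; b < bitsPerValue; b++) {
                int bit = (in[5 + (bitPos >>> 3)] >>> (7 - (bitPos & 7))) & 1;
                value = (value << 1) | bit;
                bitPos++;
            }
            ordinals[i] = value;
        }
        return ordinals;
    }

    public static void main(String[] args) {
        int[] ordinals = {3, 17, 42, 5};
        byte[] packed = encode(ordinals);
        System.out.println("bitsPerValue=" + packed[0] + ", bytes=" + packed.length);
        System.out.println("round trip ok: " + Arrays.equals(ordinals, decode(packed)));
    }
}
```

Variant (1), where bitsPerValue is known up front, could drop the first header
byte. Note that in this sketch the fixed header is paid on every document, so
short ordinal lists may well come out larger than a VInt encoding would.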