[ https://issues.apache.org/jira/browse/LUCENE-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13574481#comment-13574481 ]
Michael McCandless commented on LUCENE-4609:
--------------------------------------------

OK the new format doesn't do very well. This is all wikipedia (6.6M "big" docs), 7 facet dims:

{noformat}
                Task    QPS base  StdDev    QPS comp  StdDev    Pct diff
             MedTerm       46.85  (2.4%)       28.22  (0.7%)    -39.8% ( -41% -  -37%)
            HighTerm       19.09  (2.5%)       12.27  (0.9%)    -35.7% ( -38% -  -33%)
           OrHighLow       16.83  (2.8%)       11.21  (1.0%)    -33.4% ( -36% -  -30%)
           OrHighMed       16.35  (2.8%)       11.00  (1.0%)    -32.7% ( -35% -  -29%)
             Prefix3       12.87  (2.8%)        8.81  (0.9%)    -31.5% ( -34% -  -28%)
            Wildcard       27.22  (2.2%)       18.68  (0.7%)    -31.4% ( -33% -  -29%)
             LowTerm      110.58  (1.8%)       79.25  (0.6%)    -28.3% ( -30% -  -26%)
          OrHighHigh        8.61  (2.9%)        6.19  (1.3%)    -28.1% ( -31% -  -24%)
              IntNRQ        3.54  (2.9%)        2.55  (1.2%)    -27.9% ( -31% -  -24%)
         AndHighHigh       23.19  (1.4%)       17.67  (0.7%)    -23.8% ( -25% -  -22%)
              Fuzzy1       46.94  (1.7%)       40.34  (1.6%)    -14.1% ( -17% -  -10%)
           MedPhrase      110.00  (5.6%)       98.08  (4.2%)    -10.8% ( -19% -   -1%)
     MedSloppyPhrase       25.93  (2.5%)       23.37  (1.6%)     -9.9% ( -13% -   -5%)
         MedSpanNear       28.43  (2.5%)       25.68  (1.2%)     -9.7% ( -13% -   -6%)
          AndHighMed      105.06  (0.9%)       95.74  (1.0%)     -8.9% ( -10% -   -7%)
           LowPhrase       21.26  (6.2%)       19.86  (5.3%)     -6.6% ( -16% -    5%)
        HighSpanNear        3.53  (2.0%)        3.30  (1.2%)     -6.5% (  -9% -   -3%)
              Fuzzy2       52.61  (2.6%)       49.64  (2.5%)     -5.6% ( -10% -    0%)
          HighPhrase       17.44 (10.2%)       16.66  (9.5%)     -4.5% ( -21% -   16%)
    HighSloppyPhrase        0.92  (7.3%)        0.88  (5.7%)     -4.5% ( -16% -    9%)
     LowSloppyPhrase       20.28  (3.1%)       19.59  (2.0%)     -3.4% (  -8% -    1%)
             Respell       46.30  (3.2%)       45.27  (3.4%)     -2.2% (  -8% -    4%)
         LowSpanNear        8.36  (2.8%)        8.20  (1.9%)     -1.9% (  -6% -    2%)
          AndHighLow      578.66  (3.0%)      569.71  (3.1%)     -1.5% (  -7% -    4%)
{noformat}

Also it's quite a bit more RAM / disk consuming: 306 MB of .dvm/d files on disk vs 178 MB for trunk (and remember that part of this is the title SortedDV field).
> Write a PackedIntsEncoder/Decoder for facets
> --------------------------------------------
>
>                 Key: LUCENE-4609
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4609
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/facet
>            Reporter: Shai Erera
>            Priority: Minor
>         Attachments: LUCENE-4609.patch, LUCENE-4609.patch, LUCENE-4609.patch,
>                      LUCENE-4609.patch, LUCENE-4609.patch, SemiPackedEncoder.patch
>
>
> Today the facets API lets you write IntEncoder/Decoder to encode/decode the
> category ordinals. We have several such encoders, including VInt (default),
> and block encoders.
> It would be interesting to implement and benchmark a
> PackedIntsEncoder/Decoder, with potentially two variants: (1) receives
> bitsPerValue up front, when you e.g. know that you have a small taxonomy and
> the max value you can see and (2) one that decides for each doc on the
> optimal bitsPerValue, writes it as a header in the byte[] or something.
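
For readers unfamiliar with the idea, here is a minimal sketch of variant (2) from the issue description: pick the smallest bitsPerValue that fits one document's ordinals, write it as a one-byte header, then bit-pack the values. This is plain Java for illustration only. It is not the code in the attached patches, and it does not use Lucene's PackedInts or IntEncoder APIs. The class name `PackedOrdinalsSketch`, the one-byte header layout, and passing the value count to the decoder are all assumptions made for the example.

```java
import java.util.Arrays;

/** Hypothetical illustration of per-document bit packing; not Lucene code. */
public class PackedOrdinalsSketch {

    /** Smallest bit width (at least 1) that can hold every value in the array. */
    static int bitsRequired(int[] values) {
        int seen = 0;
        for (int v : values) {
            if (v < 0) {
                throw new IllegalArgumentException("ordinals must be non-negative");
            }
            seen |= v; // the OR has the same highest set bit as the maximum value
        }
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(seen));
    }

    /** Assumed layout: [bitsPerValue: 1 byte][values packed MSB-first, zero-padded]. */
    static byte[] encode(int[] values) {
        int bpv = bitsRequired(values);
        long totalBits = (long) values.length * bpv;
        byte[] out = new byte[1 + (int) ((totalBits + 7) / 8)];
        out[0] = (byte) bpv;
        int bitPos = 8; // skip the header byte
        for (int v : values) {
            for (int b = bpv - 1; b >= 0; b--, bitPos++) {
                if (((v >>> b) & 1) != 0) {
                    out[bitPos >>> 3] |= (byte) (0x80 >>> (bitPos & 7));
                }
            }
        }
        return out;
    }

    /** Reads bitsPerValue from the header, then unpacks exactly count values. */
    static int[] decode(byte[] buf, int count) {
        int bpv = buf[0] & 0xFF;
        int[] values = new int[count];
        int bitPos = 8;
        for (int i = 0; i < count; i++) {
            int v = 0;
            for (int b = 0; b < bpv; b++, bitPos++) {
                int bit = (buf[bitPos >>> 3] >>> (7 - (bitPos & 7))) & 1;
                v = (v << 1) | bit;
            }
            values[i] = v;
        }
        return values;
    }

    public static void main(String[] args) {
        int[] ordinals = {1, 2, 3};
        byte[] packed = encode(ordinals);
        System.out.println(packed.length + " bytes, round trip: "
                + Arrays.toString(decode(packed, ordinals.length)));
    }
}
```

Variant (1) would be the same packing loop with bitsPerValue supplied by the caller instead of computed and stored per document. The sketch passes the value count to `decode` explicitly because the zero padding in the last byte makes the count ambiguous from the buffer length alone; a real encoder would need some way to recover it.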