[ https://issues.apache.org/jira/browse/LUCENE-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13575570#comment-13575570 ]
Michael McCandless commented on LUCENE-4764:
--------------------------------------------

bq. I think that it would actually be interesting to test only VInt, without dgap. Because the ords seem to be arbitrary, I'm not even sure what they buy us. Mike, can you try that?

No dgap compression, 1M docs, 7 dims per doc. Looks like we lost a bit:

{noformat}
                Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
             MedTerm      258.50      (1.5%)      252.69      (1.6%)   -2.2% (  -5% -    0%)
           OrHighLow       55.96      (2.4%)       54.73      (2.0%)   -2.2% (  -6% -    2%)
           OrHighMed       57.47      (2.4%)       56.33      (2.1%)   -2.0% (  -6% -    2%)
          HighPhrase       44.47     (10.9%)       43.63     (10.7%)   -1.9% ( -21% -   22%)
          OrHighHigh       38.53      (2.6%)       37.88      (2.3%)   -1.7% (  -6% -    3%)
            HighTerm       65.49      (1.2%)       64.70      (1.9%)   -1.2% (  -4% -    1%)
             Prefix3       46.82      (1.5%)       46.30      (1.2%)   -1.1% (  -3% -    1%)
           MedPhrase      149.78      (5.5%)      148.17      (5.3%)   -1.1% ( -11% -   10%)
         AndHighHigh       93.50      (1.0%)       92.73      (0.8%)   -0.8% (  -2% -    1%)
    HighSloppyPhrase        3.26      (6.8%)        3.24      (8.0%)   -0.8% ( -14% -   15%)
        HighSpanNear       11.60      (1.7%)       11.51      (1.9%)   -0.8% (  -4% -    2%)
           LowPhrase       73.57      (5.6%)       73.00      (5.0%)   -0.8% ( -10% -   10%)
         LowSpanNear       43.68      (2.0%)       43.35      (2.3%)   -0.8% (  -4% -    3%)
         MedSpanNear       90.77      (1.5%)       90.10      (1.4%)   -0.7% (  -3% -    2%)
     LowSloppyPhrase       82.66      (1.9%)       82.13      (1.7%)   -0.6% (  -4% -    2%)
     MedSloppyPhrase       92.12      (2.2%)       91.65      (2.2%)   -0.5% (  -4% -    3%)
             LowTerm      466.62      (1.4%)      464.83      (1.9%)   -0.4% (  -3% -    2%)
          AndHighMed      347.12      (1.7%)      348.61      (1.1%)    0.4% (  -2% -    3%)
            Wildcard      120.82      (1.2%)      121.50      (1.6%)    0.6% (  -2% -    3%)
              IntNRQ       23.40      (1.6%)       23.76      (1.4%)    1.5% (  -1% -    4%)
              Fuzzy1       80.87      (2.4%)       82.38      (2.6%)    1.9% (  -3% -    7%)
             Respell       71.83      (3.0%)       73.46      (3.2%)    2.3% (  -3% -    8%)
          AndHighLow     1159.47      (3.8%)     1189.72      (2.4%)    2.6% (  -3% -    9%)
              Fuzzy2       88.04      (3.0%)       91.48      (3.7%)    3.9% (  -2% -   10%)
{noformat}

On trunk the DV facet field took 9219009 bytes, and with no dgap it took 10163419 bytes (~10% larger).

So net/net, dgap seems to help!
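To make the quoted question concrete (what does dgap buy if we already VInt-encode?), here is a minimal sketch, assuming a document's facet ordinals are written as VInts either as-is or sorted and delta-coded. The class `DGapVIntDemo`, its methods, and the sample ordinals are hypothetical, chosen for illustration; this is not Lucene's facet encoder. It shows that when a document's ords sit close together, the gaps fit in fewer VInt bytes than the absolute ords, at the price of a running sum at decode time.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical illustration: VInt-encode one document's facet ordinals
// either as-is ("no dgap") or sorted and delta-coded ("dgap").
// Not Lucene's facet encoder, just the idea behind it.
class DGapVIntDemo {

    // VInt: 7 payload bits per byte, high bit set on every byte except the last.
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // With dgap, ords are sorted first so every gap is non-negative.
    static byte[] encode(int[] ords, boolean dgap) {
        int[] values = ords.clone();
        if (dgap) {
            Arrays.sort(values);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int ord : values) {
            writeVInt(out, dgap ? ord - prev : ord);
            prev = ord;
        }
        return out.toByteArray();
    }

    // Reads count VInts back; with dgap, a running sum restores absolute ords
    // (in sorted order, since the encoder sorted them).
    static int[] decode(byte[] bytes, int count, boolean dgap) {
        int[] ords = new int[count];
        int pos = 0;
        int prev = 0;
        for (int i = 0; i < count; i++) {
            int value = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                value |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            ords[i] = dgap ? prev + value : value;
            prev = ords[i];
        }
        return ords;
    }

    public static void main(String[] args) {
        // Hypothetical ords for one doc with 7 facet dims, all above 2^14,
        // so each needs 3 bytes as a plain VInt.
        int[] ords = {16400, 16410, 16405, 16450, 16420, 16415, 16401};
        byte[] plain = encode(ords, false);
        byte[] gaps = encode(ords, true);
        System.out.println("no dgap: " + plain.length + " bytes"); // 21 bytes
        System.out.println("dgap:    " + gaps.length + " bytes");  // 9 bytes
        System.out.println(Arrays.toString(decode(gaps, ords.length, true)));
    }
}
```

The toy saving here is much larger than the ~10% measured above; how much dgap helps depends on how tightly each document's ords cluster, which is exactly what the benchmark measures on real data.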
> Faster but more RAM/Disk consuming DocValuesFormat for facets
> -------------------------------------------------------------
>
>                 Key: LUCENE-4764
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4764
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.2, 5.0
>
>         Attachments: LUCENE-4764.patch
>
>
> The new default DV format for binary fields has much more
> RAM-efficient encoding of the address for each document ... but it's
> also a bit slower at decode time, which affects facets because we
> decode for every collected docID.
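The issue description quoted above describes a trade-off between how compactly each document's address is stored and how fast it can be decoded. A minimal sketch of that trade-off, assuming a simplified, hypothetical layout (a base address per block of 16 docs plus bit-packed deltas) rather than Lucene's actual DocValues format: a plain `long[]` costs 8 bytes per doc and one array read, while the packed form is smaller but each lookup pays for a division, shifts and masks. The class `PackedAddresses`, the block size, and the sample value lengths are all illustrative assumptions.

```java
// Hypothetical sketch, not Lucene's real DocValues layout: per-document start
// addresses of binary values stored as a base per block of docs plus
// bit-packed deltas from that base.
class PackedAddresses {
    static final int BLOCK = 16; // docs per block (assumed)

    private final long[] bases;         // first (smallest) address in each block
    private final int[] bitsPerBlock;   // bits needed for the block's largest delta
    private final long[] blockBitStart; // bit offset where each block's deltas begin
    private final long[] packed;        // all deltas, bit-packed

    // addresses must be non-decreasing, like start offsets of consecutive values.
    PackedAddresses(long[] addresses) {
        int numBlocks = (addresses.length + BLOCK - 1) / BLOCK;
        bases = new long[numBlocks];
        bitsPerBlock = new int[numBlocks];
        blockBitStart = new long[numBlocks];
        long totalBits = 0;
        for (int b = 0; b < numBlocks; b++) {
            int start = b * BLOCK;
            int end = Math.min(start + BLOCK, addresses.length);
            bases[b] = addresses[start];
            long maxDelta = addresses[end - 1] - bases[b];
            bitsPerBlock[b] = Math.max(1, 64 - Long.numberOfLeadingZeros(maxDelta));
            blockBitStart[b] = totalBits;
            totalBits += (long) bitsPerBlock[b] * (end - start);
        }
        packed = new long[(int) ((totalBits + 63) / 64)];
        for (int doc = 0; doc < addresses.length; doc++) {
            int b = doc / BLOCK;
            int bits = bitsPerBlock[b];
            write(blockBitStart[b] + (long) (doc % BLOCK) * bits, bits, addresses[doc] - bases[b]);
        }
    }

    // ORs a bits-wide value into packed at bitPos; may spill into the next long.
    private void write(long bitPos, int bits, long value) {
        int word = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        packed[word] |= value << shift;
        if (shift + bits > 64) {
            packed[word + 1] |= value >>> (64 - shift);
        }
    }

    // Extracts a bits-wide value starting at bitPos.
    private long read(long bitPos, int bits) {
        int word = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = packed[word] >>> shift;
        if (shift + bits > 64) {
            value |= packed[word + 1] << (64 - shift);
        }
        return value & ((1L << bits) - 1);
    }

    // The per-doc decode work that a plain long[] lookup avoids.
    long get(int doc) {
        int b = doc / BLOCK;
        int bits = bitsPerBlock[b];
        return bases[b] + read(blockBitStart[b] + (long) (doc % BLOCK) * bits, bits);
    }

    public static void main(String[] args) {
        // Hypothetical binary values whose lengths vary between 5 and 27 bytes.
        int numDocs = 1000;
        long[] addresses = new long[numDocs];
        long pos = 0;
        for (int doc = 0; doc < numDocs; doc++) {
            addresses[doc] = pos;
            pos += 5 + (doc * 7) % 23;
        }
        PackedAddresses compact = new PackedAddresses(addresses);
        boolean allMatch = true;
        for (int doc = 0; doc < numDocs; doc++) {
            allMatch &= compact.get(doc) == addresses[doc];
        }
        System.out.println("all lookups match: " + allMatch);
        System.out.println("plain long[]: " + 8L * numDocs + " bytes");
        System.out.println("packed deltas: " + 8L * compact.packed.length + " bytes (+ per-block headers)");
    }
}
```

Since facet counting calls something like `get(doc)` for every collected docID, the extra arithmetic per lookup is paid once per hit, which is why a faster but larger format can be worth offering for facets.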