[ https://issues.apache.org/jira/browse/LUCENE-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558401#comment-13558401 ]
Adrien Grand commented on LUCENE-4599:
--------------------------------------

OK, I think I understood: I had forgotten to turn debug off, and although documents in this collection are rather big, queries tend to favor small docs, whose chunks contain more documents (up to 30). I ran the benchmark again with a very small chunk size (128) so that chunks would likely contain a single doc and results got better:

{noformat}
          Fuzzy2   94.39  (7.8%)   88.33  (7.5%)   -6.4% ( -20% -   9%)
         MedTerm  292.09  (2.7%)  279.01  (2.6%)   -4.5% (  -9% -   0%)
      OrHighHigh   76.84  (7.4%)   73.58  (5.8%)   -4.2% ( -16% -   9%)
          Fuzzy1   93.07  (4.8%)   89.59  (4.4%)   -3.7% ( -12% -   5%)
       OrHighMed   69.23  (6.4%)   67.17  (4.9%)   -3.0% ( -13% -   8%)
      HighPhrase    8.54  (9.4%)    8.36 (11.6%)   -2.1% ( -21% -  20%)
       LowPhrase  125.02  (2.5%)  122.91  (3.4%)   -1.7% (  -7% -   4%)
       MedPhrase   39.97  (5.3%)   39.58  (7.6%)   -1.0% ( -13% -  12%)
        HighTerm  177.70  (2.4%)  176.21  (2.2%)   -0.8% (  -5% -   3%)
         LowTerm  370.26  (3.7%)  367.36  (2.8%)   -0.8% (  -7% -   5%)
       OrHighLow  106.08  (5.2%)  105.41  (4.7%)   -0.6% ( -10% -   9%)
 LowSloppyPhrase   71.29  (5.2%)   70.95  (5.3%)   -0.5% ( -10% -  10%)
HighSloppyPhrase   30.52  (5.6%)   30.39  (5.2%)   -0.4% ( -10% -  10%)
        PKLookup  339.12  (3.0%)  338.09  (3.1%)   -0.3% (  -6% -   5%)
 MedSloppyPhrase   71.13  (4.2%)   70.95  (4.4%)   -0.3% (  -8% -   8%)
      AndHighLow  259.19  (3.8%)  258.54  (5.1%)   -0.2% (  -8% -   8%)
         Respell   69.04  (3.7%)   68.92  (3.2%)   -0.2% (  -6% -   6%)
     AndHighHigh   74.49  (1.5%)   74.47  (1.8%)   -0.0% (  -3% -   3%)
        Wildcard  157.16  (2.0%)  157.21  (1.9%)    0.0% (  -3% -   3%)
      AndHighMed   79.81  (2.1%)   80.16  (1.6%)    0.4% (  -3% -   4%)
     MedSpanNear   14.09  (3.6%)   14.16  (4.4%)    0.5% (  -7% -   8%)
         Prefix3  281.17  (2.7%)  282.85  (2.5%)    0.6% (  -4% -   5%)
    HighSpanNear    7.73  (3.9%)    7.79  (2.8%)    0.8% (  -5% -   7%)
          IntNRQ  143.14  (3.0%)  144.45  (3.2%)    0.9% (  -5% -   7%)
     LowSpanNear   23.85  (6.6%)   24.36  (6.0%)    2.2% (  -9% -  15%)
{noformat}

(Decreasing the chunk size from 16KB to 128 made the compression ratio increase from 66% to 68%.)
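
To illustrate why the chunk size changes how many documents share a chunk, here is a toy model in Python. It is only a sketch under assumptions, not the patch's actual code: it assumes documents are buffered in order and a chunk is flushed as soon as the buffered size reaches the chunk size. The function name `pack_into_chunks` and the document sizes are hypothetical.

```python
def pack_into_chunks(doc_sizes, chunk_size):
    """Group consecutive documents into chunks.

    Simplified model (an assumption, not Lucene's actual code): documents are
    buffered in order and the buffer is flushed as one chunk as soon as its
    total size reaches chunk_size bytes.
    """
    chunks, current, buffered = [], [], 0
    for size in doc_sizes:
        current.append(size)
        buffered += size
        if buffered >= chunk_size:
            chunks.append(current)
            current, buffered = [], 0
    if current:  # flush the trailing, partially filled chunk
        chunks.append(current)
    return chunks


# Hypothetical term-vector sizes in bytes: many small docs, then two big ones.
doc_sizes = [600] * 30 + [20000, 18000]

large = pack_into_chunks(doc_sizes, 16 * 1024)  # 16KB chunks
small = pack_into_chunks(doc_sizes, 128)        # very small chunks

print("16KB chunks, docs in fullest chunk:", max(len(c) for c in large))
print("128 chunks,  docs in fullest chunk:", max(len(c) for c in small))
```

In this model, reading one small document with 16KB chunks means decompressing a chunk shared with many neighbours, while with a chunk size of 128 every document ends up alone in its chunk.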
> Compressed term vectors
> -----------------------
>
>                 Key: LUCENE-4599
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4599
>             Project: Lucene - Core
>          Issue Type: Task
>          Components: core/codecs, core/termvectors
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>             Fix For: 4.2
>
>         Attachments: 4599-dataimport-fail.log, 4599-zookeer-fail.log,
> CompressingTVF_ingest_rate.png, highlightNoStop.tasks,
> Lucene40TVF_ingest_rate.png, LUCENE-4599.patch, LUCENE-4599.patch,
> LUCENE-4599.patch, solr.patch
>
>
> We should have codec-compressed term vectors similarly to what we have with
> stored fields.