[ https://issues.apache.org/jira/browse/LUCENE-4498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481768#comment-13481768 ]
Michael McCandless commented on LUCENE-4498:
--------------------------------------------

Looks good:

{noformat}
Task              QPS base   StdDev  QPS comp   StdDev  Pct diff
Respell              86.70   (3.0%)     84.04   (2.6%)   -3.1% ( -8% -   2%)
OrHighMed            41.52   (5.8%)     40.44   (6.1%)   -2.6% (-13% -   9%)
OrHighLow            25.43   (6.0%)     24.77   (6.4%)   -2.6% (-14% -  10%)
OrHighHigh            9.38   (5.9%)      9.15   (6.4%)   -2.5% (-14% -  10%)
Wildcard             93.94   (4.1%)     92.36   (2.0%)   -1.7% ( -7% -   4%)
MedTerm             211.10  (12.3%)    208.78  (13.4%)   -1.1% (-23% -  27%)
IntNRQ               10.74  (11.3%)     10.62   (7.8%)   -1.1% (-18% -  20%)
HighTerm             25.59  (14.0%)     25.35  (15.0%)   -1.0% (-26% -  32%)
MedSpanNear          13.77   (2.3%)     13.68   (1.6%)   -0.7% ( -4% -   3%)
HighSloppyPhrase      4.09   (5.4%)      4.07   (5.2%)   -0.5% (-10% -  10%)
HighSpanNear          6.84   (2.9%)      6.81   (2.1%)   -0.4% ( -5% -   4%)
Prefix3              17.81   (5.7%)     17.74   (1.5%)   -0.4% ( -7% -   7%)
Fuzzy1               77.54   (2.5%)     77.25   (2.7%)   -0.4% ( -5% -   4%)
AndHighLow          719.17   (2.7%)    716.49   (2.3%)   -0.4% ( -5% -   4%)
Fuzzy2               68.94   (2.4%)     68.69   (2.8%)   -0.4% ( -5% -   5%)
LowSpanNear          12.89   (1.8%)     12.85   (1.3%)   -0.3% ( -3% -   2%)
MedSloppyPhrase      29.92   (3.4%)     29.85   (3.4%)   -0.2% ( -6% -   6%)
LowTerm             500.58   (5.9%)    500.52   (7.0%)   -0.0% (-12% -  13%)
LowSloppyPhrase       9.57   (4.4%)      9.60   (4.3%)    0.4% ( -7% -   9%)
LowPhrase             9.64   (2.8%)      9.70   (3.0%)    0.7% ( -4% -   6%)
AndHighMed           86.68   (1.2%)     87.26   (1.2%)    0.7% ( -1% -   3%)
MedPhrase             7.07   (4.3%)      7.15   (4.6%)    1.1% ( -7% -  10%)
HighPhrase            4.79   (4.8%)      4.84   (5.6%)    1.1% ( -8% -  12%)
AndHighHigh          25.81   (1.7%)     26.20   (1.2%)    1.5% ( -1% -   4%)
PKLookup            193.31   (2.1%)    204.74   (1.6%)    5.9% (  2% -   9%)
{noformat}

> pulse docfreq=1 DOCS_ONLY for 4.1 codec
> ---------------------------------------
>
>                 Key: LUCENE-4498
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4498
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>            Reporter: Robert Muir
>         Attachments: LUCENE-4498_lazy.patch, LUCENE-4498.patch,
>                      LUCENE-4498.patch
>
>
> We have a pulsing codec, but currently this has some downsides:
> * it's very general, wrapping an arbitrary
>   postingsformat and pulsing everything in the postings for an arbitrary
>   docfreq/totalTermFreq cutoff
> * reuse is hairy: because it specializes its enums based on these cutoffs,
>   when walking through terms (e.g. when merging) there is a lot of
>   sophisticated logic to avoid the worst cases, where we clone indexinputs
>   for tons of terms.
> On the other hand, the way the 4.1 codec encodes "primary key" fields is
> pretty silly: we write the docStartFP vlong in the term dictionary metadata,
> which tells us where to seek in the .doc file to read our one lonely vint.
> I think it's worth investigating whether, in the DOCS_ONLY docfreq=1 case,
> we can just write the lone doc delta where we would write docStartFP.
> We can avoid the hairy reuse problem too, by just supporting this in
> refillDocs() in BlockDocsEnum instead of specializing.
> This would remove the additional seek for "primary key" fields without really
> any of the downsides of pulsing today.
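
For readers who want to see the idea from the issue description in code, here is a minimal, hypothetical sketch of it. It is not the attached patch and does not use Lucene's real classes: `SingletonDocSketch`, `TermMeta`, `encode`, and `decode` are made-up names, and the byte layout (a vlong docFreq followed by one more vlong) is an assumption chosen only to keep the example self-contained. It also assumes that, for a term with a single posting, the "lone doc delta" is simply the doc ID. The point it illustrates is that when docFreq is 1 the slot that would hold docStartFP can hold the doc itself, so the reader never has to seek into the .doc file.

```java
import java.io.ByteArrayOutputStream;

// Illustrative sketch only: hypothetical types, not Lucene's codec classes.
final class SingletonDocSketch {

    /** Decoded per-term metadata; exactly one of the last two fields is meaningful. */
    static final class TermMeta {
        final int docFreq;
        final long docStartFP;    // pointer into the .doc file, or -1 when the doc is inlined
        final int singletonDocID; // the lone doc when docFreq == 1, otherwise -1

        TermMeta(int docFreq, long docStartFP, int singletonDocID) {
            this.docFreq = docFreq;
            this.docStartFP = docStartFP;
            this.singletonDocID = singletonDocID;
        }
    }

    /** Writes a non-negative long as a variable-length integer, 7 bits per byte. */
    static void writeVLong(ByteArrayOutputStream out, long v) {
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80)); // high bit set: more bytes follow
            v >>>= 7;
        }
        out.write((int) v);
    }

    /** Reads a variable-length integer; pos[0] is advanced past the bytes consumed. */
    static long readVLong(byte[] buf, int[] pos) {
        long v = 0;
        int shift = 0;
        while (true) {
            byte b = buf[pos[0]++];
            v |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return v;
            }
            shift += 7;
        }
    }

    /** Assumed layout: vlong docFreq, then either the lone doc ID or docStartFP. */
    static void encode(ByteArrayOutputStream out, int docFreq, long docStartFP, int loneDocID) {
        writeVLong(out, docFreq);
        if (docFreq == 1) {
            writeVLong(out, loneDocID);  // inline the single doc: no .doc seek needed later
        } else {
            writeVLong(out, docStartFP); // usual case: pointer to the postings
        }
    }

    static TermMeta decode(byte[] buf) {
        int[] pos = {0};
        int docFreq = (int) readVLong(buf, pos);
        long slot = readVLong(buf, pos);
        return docFreq == 1
            ? new TermMeta(docFreq, -1L, (int) slot)
            : new TermMeta(docFreq, slot, -1);
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        encode(out, 1, 0L, 42);
        TermMeta meta = decode(out.toByteArray());
        System.out.println("docFreq=" + meta.docFreq + " docID=" + meta.singletonDocID);
    }
}
```

In this toy layout the reader branches on docFreq once, at decode time, which is loosely analogous to handling the case inside refillDocs() rather than specializing the enum; how the real codec would do it is for the patch to decide.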