[https://issues.apache.org/jira/browse/LUCENE-4498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481768#comment-13481768]
Michael McCandless commented on LUCENE-4498:
--------------------------------------------
Looks good:
{noformat}
            Task    QPS base      StdDev    QPS comp      StdDev  Pct diff
         Respell       86.70      (3.0%)       84.04      (2.6%)     -3.1% (  -8% -   2%)
       OrHighMed       41.52      (5.8%)       40.44      (6.1%)     -2.6% ( -13% -   9%)
       OrHighLow       25.43      (6.0%)       24.77      (6.4%)     -2.6% ( -14% -  10%)
      OrHighHigh        9.38      (5.9%)        9.15      (6.4%)     -2.5% ( -14% -  10%)
        Wildcard       93.94      (4.1%)       92.36      (2.0%)     -1.7% (  -7% -   4%)
         MedTerm      211.10     (12.3%)      208.78     (13.4%)     -1.1% ( -23% -  27%)
          IntNRQ       10.74     (11.3%)       10.62      (7.8%)     -1.1% ( -18% -  20%)
        HighTerm       25.59     (14.0%)       25.35     (15.0%)     -1.0% ( -26% -  32%)
     MedSpanNear       13.77      (2.3%)       13.68      (1.6%)     -0.7% (  -4% -   3%)
HighSloppyPhrase        4.09      (5.4%)        4.07      (5.2%)     -0.5% ( -10% -  10%)
    HighSpanNear        6.84      (2.9%)        6.81      (2.1%)     -0.4% (  -5% -   4%)
         Prefix3       17.81      (5.7%)       17.74      (1.5%)     -0.4% (  -7% -   7%)
          Fuzzy1       77.54      (2.5%)       77.25      (2.7%)     -0.4% (  -5% -   4%)
      AndHighLow      719.17      (2.7%)      716.49      (2.3%)     -0.4% (  -5% -   4%)
          Fuzzy2       68.94      (2.4%)       68.69      (2.8%)     -0.4% (  -5% -   5%)
     LowSpanNear       12.89      (1.8%)       12.85      (1.3%)     -0.3% (  -3% -   2%)
 MedSloppyPhrase       29.92      (3.4%)       29.85      (3.4%)     -0.2% (  -6% -   6%)
         LowTerm      500.58      (5.9%)      500.52      (7.0%)     -0.0% ( -12% -  13%)
 LowSloppyPhrase        9.57      (4.4%)        9.60      (4.3%)      0.4% (  -7% -   9%)
       LowPhrase        9.64      (2.8%)        9.70      (3.0%)      0.7% (  -4% -   6%)
      AndHighMed       86.68      (1.2%)       87.26      (1.2%)      0.7% (  -1% -   3%)
       MedPhrase        7.07      (4.3%)        7.15      (4.6%)      1.1% (  -7% -  10%)
      HighPhrase        4.79      (4.8%)        4.84      (5.6%)      1.1% (  -8% -  12%)
     AndHighHigh       25.81      (1.7%)       26.20      (1.2%)      1.5% (  -1% -   4%)
        PKLookup      193.31      (2.1%)      204.74      (1.6%)      5.9% (   2% -   9%)
{noformat}
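The "Pct diff" column appears to be the relative change in QPS of the patched run (QPS comp) against the baseline (QPS base). A minimal Java sketch that recomputes it for three of the rows, assuming that definition; the class and method names are made up for illustration and are not part of Lucene or its benchmark tooling:

```java
import java.util.Locale;

// Hypothetical helper: recomputes "Pct diff" as the relative change of
// comp QPS versus base QPS, expressed in percent.
public class PctDiff {

    public static double pctDiff(double baseQps, double compQps) {
        return (compQps - baseQps) / baseQps * 100.0;
    }

    public static void main(String[] args) {
        String[] tasks = {"Respell", "AndHighHigh", "PKLookup"};
        double[] base = {86.70, 25.81, 193.31};
        double[] comp = {84.04, 26.20, 204.74};
        for (int i = 0; i < tasks.length; i++) {
            // Locale.ROOT keeps '.' as the decimal separator regardless of the JVM default locale.
            System.out.println(String.format(Locale.ROOT, "%-12s %5.1f%%",
                    tasks[i], pctDiff(base[i], comp[i])));
        }
        // Rounded to one decimal these give -3.1%, 1.5% and 5.9%, matching the table.
    }
}
```

The parenthesized ranges are presumably derived from the StdDev columns; this sketch does not attempt to reproduce them.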
> pulse docfreq=1 DOCS_ONLY for 4.1 codec
> ---------------------------------------
>
> Key: LUCENE-4498
> URL: https://issues.apache.org/jira/browse/LUCENE-4498
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/codecs
> Reporter: Robert Muir
> Attachments: LUCENE-4498_lazy.patch, LUCENE-4498.patch, LUCENE-4498.patch
>
>
> We have the pulsing codec, but currently it has some downsides:
> * it's very general, wrapping an arbitrary postings format and pulsing
> everything in the postings for an arbitrary docFreq/totalTermFreq cutoff
> * reuse is hairy: because it specializes its enums based on these cutoffs,
> when walking through terms (e.g. during merging) there is a lot of
> sophisticated code to avoid the worst cases, where we clone IndexInputs for
> tons of terms.
> On the other hand, the way the 4.1 codec encodes "primary key" fields is
> pretty silly: we write the docStartFP vlong in the term dictionary metadata,
> which tells us where to seek in the .doc file to read our one lonely vint.
> I think it's worth investigating whether, in the DOCS_ONLY docfreq=1 case,
> we can just write the lone doc delta where we would otherwise write docStartFP.
> We can avoid the hairy reuse problem too, by just supporting this in
> refillDocs() in BlockDocsEnum instead of specializing.
> This would remove the additional seek for "primary key" fields with
> essentially none of the downsides of pulsing today.
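A minimal sketch of the proposed encoding, assuming a simplified term-metadata layout in which each term stores its docFreq followed by either docStartFP or, for a DOCS_ONLY term with docFreq=1, the lone doc delta (which for a single posting is taken here to be the doc ID itself). All class, record and method names are hypothetical; this is not Lucene's actual 4.1 postings or term dictionary format, which among other things delta-codes docStartFP across terms. Requires Java 16+ for the record.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical sketch: per-term metadata that "pulses" the lone doc delta of a
// DOCS_ONLY, docFreq=1 term into the slot normally used for docStartFP, so a
// lookup of such a term needs no extra seek into the .doc file.
public class SingletonPulsingSketch {

    // Variable-length encoding: 7 bits per byte, high bit set means more bytes follow.
    static void writeVLong(DataOutputStream out, long v) throws IOException {
        while ((v & ~0x7FL) != 0) {
            out.writeByte((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.writeByte((int) v);
    }

    static long readVLong(DataInputStream in) throws IOException {
        long v = 0;
        int shift = 0;
        while (true) {
            int b = in.readUnsignedByte();
            v |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return v;
            }
            shift += 7;
        }
    }

    // docStartFP: where the term's postings start in .doc (unused when pulsed).
    // singletonDoc: the lone doc delta (used only when pulsed).
    public static void writeTermMeta(DataOutputStream out, boolean docsOnly, int docFreq,
                                     long docStartFP, int singletonDoc) throws IOException {
        writeVLong(out, docFreq);
        if (docsOnly && docFreq == 1) {
            writeVLong(out, singletonDoc); // pulsed: nothing to seek to in .doc
        } else {
            writeVLong(out, docStartFP);   // reader must seek here in .doc
        }
    }

    // -1 marks whichever value was not stored for this term.
    public record TermMeta(int docFreq, long docStartFP, int singletonDoc) {}

    // docsOnly comes from the field's index options, which the reader already knows.
    public static TermMeta readTermMeta(DataInputStream in, boolean docsOnly) throws IOException {
        int docFreq = (int) readVLong(in);
        if (docsOnly && docFreq == 1) {
            return new TermMeta(docFreq, -1L, (int) readVLong(in));
        }
        return new TermMeta(docFreq, readVLong(in), -1);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        writeTermMeta(out, true, 1, 0L, 42);       // "primary key" term: exactly one doc
        writeTermMeta(out, true, 5, 123_456L, -1); // ordinary term: still needs the seek
        out.flush();

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        System.out.println(readTermMeta(in, true)); // TermMeta[docFreq=1, docStartFP=-1, singletonDoc=42]
        System.out.println(readTermMeta(in, true)); // TermMeta[docFreq=5, docStartFP=123456, singletonDoc=-1]
    }
}
```

The branch depends only on docFreq and the field's index options, both known before the .doc file is touched, so a reader can decide per term without specialized enum classes.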