Robert Muir created LUCENE-4498:
-----------------------------------
Summary: pulse docfreq=1 DOCS_ONLY for 4.1 codec
Key: LUCENE-4498
URL: https://issues.apache.org/jira/browse/LUCENE-4498
Project: Lucene - Core
Issue Type: Improvement
Components: core/codecs
Reporter: Robert Muir
We have the pulsing codec, but it currently has some downsides:
* it's very general, wrapping an arbitrary postingsformat and pulsing everything
in the postings for an arbitrary docfreq/totalTermFreq cutoff
* reuse is hairy: because it specializes its enums based on these cutoffs, when
walking through terms (e.g. while merging) there is a lot of sophisticated stuff
to avoid the worst cases where we clone indexinputs for tons of terms.
On the other hand, the way the 4.1 codec encodes "primary key" fields is pretty
silly: we write the docStartFP vlong in the term dictionary metadata, which
tells us where to seek in the .doc to read our one lonely vint.
I think it's worth investigating whether, in the DOCS_ONLY docfreq=1 case, we
can just write the lone doc delta where we would write docStartFP.
We can avoid the hairy reuse problem too, by just supporting this in
refillDocs() in BlockDocsEnum instead of specializing.
This would remove the additional seek for "primary key" fields without really
any of the downsides of pulsing today.
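
A rough sketch of the encoding idea, to make it concrete. This is not the
actual 4.1 codec code: the class, the TermMeta fields, and the helper names
are all hypothetical, and the byte layout is simplified to just docFreq plus
one vlong. With docfreq=1 the lone doc delta is simply the doc ID, so the
sketch stores that in the slot that would otherwise hold docStartFP.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

// Hypothetical sketch; none of these names are Lucene APIs.
final class SingletonDocSketch {

    /** Decoded per-term metadata (simplified, assumed layout). */
    static final class TermMeta {
        int docFreq;
        long docStartFP = -1;    // offset into the .doc data; unused when docFreq == 1
        int singletonDocID = -1; // the lone doc; set only when docFreq == 1
    }

    // Variable-length long: 7 payload bits per byte, high bit means "more bytes follow".
    static void writeVLong(ByteArrayOutputStream out, long v) {
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7FL) | 0x80L));
            v >>>= 7;
        }
        out.write((int) v);
    }

    static long readVLong(ByteArrayInputStream in) {
        long v = 0;
        int shift = 0;
        int b;
        do {
            b = in.read();
            if (b < 0) {
                throw new IllegalStateException("truncated vlong");
            }
            v |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return v;
    }

    // Write term metadata: for docFreq == 1 the doc ID takes the place of docStartFP.
    static void encodeTerm(ByteArrayOutputStream out, int docFreq,
                           long docStartFP, int singletonDocID) {
        writeVLong(out, docFreq);
        if (docFreq == 1) {
            writeVLong(out, singletonDocID);
        } else {
            writeVLong(out, docStartFP);
        }
    }

    // Read term metadata back, interpreting the second vlong based on docFreq.
    static TermMeta decodeTerm(ByteArrayInputStream in) {
        TermMeta meta = new TermMeta();
        meta.docFreq = (int) readVLong(in);
        if (meta.docFreq == 1) {
            meta.singletonDocID = (int) readVLong(in);
        } else {
            meta.docStartFP = readVLong(in);
        }
        return meta;
    }

    // A refill step would only have to touch the .doc data when this returns true.
    static boolean needsDocFileSeek(TermMeta meta) {
        return meta.docFreq != 1;
    }
}
```

The point of checking docFreq at refill time, rather than specializing the
enum, is that the same enum instance can be reused across terms whether or not
they are singletons.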