[ https://issues.apache.org/jira/browse/LUCENE-4498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated LUCENE-4498:
--------------------------------
Attachment: LUCENE-4498.patch
Duh, I forgot to actually skip the seek in the previous patch; here's the
updated patch.
> pulse docfreq=1 DOCS_ONLY for 4.1 codec
> ---------------------------------------
>
> Key: LUCENE-4498
> URL: https://issues.apache.org/jira/browse/LUCENE-4498
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/codecs
> Reporter: Robert Muir
> Attachments: LUCENE-4498.patch, LUCENE-4498.patch
>
>
> We have the pulsing codec, but currently it has some downsides:
> * it's very general, wrapping an arbitrary postingsformat and pulsing
> everything in the postings for an arbitrary docfreq/totalTermFreq cutoff
> * reuse is hairy: because it specializes its enums based on these cutoffs,
> when walking through terms (e.g. when merging) there is a lot of
> sophisticated stuff to avoid the worst cases where we clone indexinputs
> for tons of terms.
> On the other hand, the way the 4.1 codec encodes "primary key" fields is
> pretty silly: we write the docStartFP vlong in the term dictionary metadata,
> which tells us where to seek in the .doc to read our one lonely vint.
> I think it's worth investigating whether, in the DOCS_ONLY docfreq=1 case,
> we can just write the lone doc delta where we would write docStartFP.
> We can avoid the hairy reuse problem too, by just supporting this in
> refillDocs() in BlockDocsEnum instead of specializing.
> This would remove the additional seek for "primary key" fields without really
> any of the downsides of pulsing today.
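
For readers following along, here is a minimal sketch of the idea in the description, not the attached patch. It assumes a simplified term-metadata record holding a single vlong: in the DOCS_ONLY docfreq=1 case that slot holds the lone doc id, otherwise it holds docStartFP. The class SingletonDocSketch and its methods are hypothetical names made up for illustration; they are not Lucene APIs, and the real change would presumably live in the 4.1 postings writer and in refillDocs() as described above.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch only; not Lucene's classes or on-disk format. */
public class SingletonDocSketch {

    /** Writes a variable-length long: 7 bits per byte, low bits first. */
    public static void writeVLong(ByteArrayOutputStream out, long v) {
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80)); // high bit set: more bytes follow
            v >>>= 7;
        }
        out.write((int) v);
    }

    /** Reads one variable-length long from the start of buf. */
    public static long readVLong(byte[] buf) {
        long v = 0;
        int shift = 0;
        for (byte b : buf) {
            v |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break; // last byte of this value
            }
            shift += 7;
        }
        return v;
    }

    /**
     * Encodes the per-term metadata slot. For a DOCS_ONLY field with
     * docFreq == 1 the slot carries the doc id itself; otherwise it
     * carries the file pointer into the (imaginary) .doc file.
     */
    public static byte[] encodeTermMeta(boolean docsOnly, int docFreq,
                                        int singletonDocID, long docStartFP) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (docsOnly && docFreq == 1) {
            writeVLong(out, singletonDocID);
        } else {
            writeVLong(out, docStartFP);
        }
        return out.toByteArray();
    }

    /** True when the reader can answer from metadata alone, with no seek. */
    public static boolean isInlined(boolean docsOnly, int docFreq) {
        return docsOnly && docFreq == 1;
    }

    public static void main(String[] args) {
        // "Primary key" term: one document, so the doc id is inlined.
        byte[] pk = encodeTermMeta(true, 1, 42, 9999L);
        System.out.println("inlined=" + isInlined(true, 1)
                + " value=" + readVLong(pk));

        // Ordinary term: the slot still holds a file pointer to seek to.
        byte[] normal = encodeTermMeta(true, 3, 42, 300L);
        System.out.println("inlined=" + isInlined(true, 3)
                + " value=" + readVLong(normal));
    }
}
```

The reader interprets the slot using the docfreq it already knows from the term dictionary, so in this sketch no extra flag byte is needed and the enum does not have to be specialized per term.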