[ https://issues.apache.org/jira/browse/LUCENE-4498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated LUCENE-4498:
--------------------------------

    Attachment: LUCENE-4498.patch

Duh, I forgot to actually remove the seek in the previous patch. Here's the updated patch.

> pulse docfreq=1 DOCS_ONLY for 4.1 codec
> ---------------------------------------
>
>                 Key: LUCENE-4498
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4498
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>            Reporter: Robert Muir
>         Attachments: LUCENE-4498.patch, LUCENE-4498.patch
>
>
> We have the pulsing codec, but it currently has some downsides:
> * It's very general: it wraps an arbitrary PostingsFormat and pulses
> everything in the postings below an arbitrary docFreq/totalTermFreq cutoff.
> * Reuse is hairy: because it specializes its enums based on these cutoffs,
> walking through terms (e.g. when merging) needs a lot of sophisticated code
> to avoid the worst cases, where we clone IndexInputs for tons of terms.
>
> On the other hand, the way the 4.1 codec encodes "primary key" fields is
> pretty silly: we write the docStartFP vlong in the term dictionary metadata,
> which tells us where to seek in the .doc file to read our one lonely vint.
>
> I think it's worth investigating this: in the DOCS_ONLY docFreq=1 case, we
> just write the lone doc delta where we would write docStartFP.
> We can avoid the hairy reuse problem too, by supporting this in
> refillDocs() in BlockDocsEnum instead of specializing.
>
> This would remove the additional seek for "primary key" fields with
> essentially none of the downsides of pulsing today.
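The proposal can be illustrated with a small Java sketch. This is a hypothetical illustration, not the attached patch or Lucene's actual postings writer/reader: the names PulsedTermMeta, Writer.addTerm, readAll and DocsEnumSketch are invented here, the .doc file is simulated by an int array of raw doc deltas, docFreq is stored inline in the metadata for simplicity, and block compression is omitted. It shows the two ideas from the description: a docFreq == 1 term writes its lone doc delta (its docID, since deltas start at 0) in the slot where the docStartFP delta would go, and a single reusable enum handles that case inside refillDocs() instead of using a specialized enum class.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Lucene's code: DOCS_ONLY term metadata where a
// docFreq == 1 term stores its lone doc delta in place of the docStartFP delta.
public class PulsedTermMeta {
  // 7 bits per byte; a set high bit means "more bytes follow".
  static void writeVLong(ByteArrayOutputStream out, long v) {
    while ((v & ~0x7FL) != 0) {
      out.write((int) ((v & 0x7F) | 0x80));
      v >>>= 7;
    }
    out.write((int) v);
  }

  static long readVLong(byte[] buf, int[] pos) {
    long v = 0;
    for (int shift = 0; ; shift += 7) {
      byte b = buf[pos[0]++];
      v |= (long) (b & 0x7F) << shift;
      if ((b & 0x80) == 0) return v;
    }
  }

  static final class TermState {
    final int docFreq, singletonDocID; // singletonDocID is -1 unless docFreq == 1
    final long docStartFP;             // -1 when the term is pulsed
    TermState(int docFreq, long docStartFP, int singletonDocID) {
      this.docFreq = docFreq; this.docStartFP = docStartFP; this.singletonDocID = singletonDocID;
    }
  }

  static final class Writer {
    final ByteArrayOutputStream meta = new ByteArrayOutputStream();
    private long lastDocStartFP = 0;

    void addTerm(int docFreq, long docStartFP, int singletonDocID) {
      writeVLong(meta, docFreq);
      if (docFreq == 1) {
        // Pulsed: the doc goes where the file pointer delta would go, and
        // lastDocStartFP is NOT advanced, so the reader must skip it too.
        writeVLong(meta, singletonDocID);
      } else {
        writeVLong(meta, docStartFP - lastDocStartFP);
        lastDocStartFP = docStartFP;
      }
    }
  }

  static List<TermState> readAll(byte[] meta) {
    List<TermState> terms = new ArrayList<>();
    int[] pos = {0};
    long lastDocStartFP = 0;
    while (pos[0] < meta.length) {
      int docFreq = (int) readVLong(meta, pos);
      if (docFreq == 1) {
        terms.add(new TermState(1, -1, (int) readVLong(meta, pos)));
      } else {
        lastDocStartFP += readVLong(meta, pos);
        terms.add(new TermState(docFreq, lastDocStartFP, -1));
      }
    }
    return terms;
  }

  // One reusable enum for every term; the singleton case lives in refillDocs().
  static final class DocsEnumSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    static final int BLOCK_SIZE = 4;
    private final int[] docFile; // stand-in for the .doc file: raw doc deltas
    private final int[] buffer = new int[BLOCK_SIZE];
    private TermState term;
    private int fp, bufferUpto, bufferLen, docsLeft, accum;
    int docFileReads;            // number of block reads from the .doc stand-in

    DocsEnumSketch(int[] docFile) { this.docFile = docFile; }

    DocsEnumSketch reset(TermState t) {
      term = t;
      fp = (int) t.docStartFP;
      docsLeft = t.docFreq;
      bufferUpto = bufferLen = accum = 0;
      return this;
    }

    private void refillDocs() {
      if (term.docFreq == 1) {
        buffer[0] = term.singletonDocID; // no seek: the doc came from the metadata
        bufferLen = 1;
      } else {
        docFileReads++;
        bufferLen = Math.min(BLOCK_SIZE, docsLeft);
        for (int i = 0; i < bufferLen; i++) buffer[i] = docFile[fp++];
      }
      bufferUpto = 0;
    }

    int nextDoc() {
      if (docsLeft == 0) return NO_MORE_DOCS;
      if (bufferUpto == bufferLen) refillDocs();
      docsLeft--;
      accum += buffer[bufferUpto++];
      return accum;
    }
  }
}
```

Because the pulsed term leaves lastDocStartFP untouched on both the write and read sides, the file pointer deltas of the surrounding non-pulsed terms still line up. And since one enum class serves both cases, a caller walking many terms can keep calling reset() on the same instance rather than choosing between specialized enums per term.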