[ https://issues.apache.org/jira/browse/LUCENE-3584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13169656#comment-13169656 ]
Yonik Seeley commented on LUCENE-3584:
--------------------------------------

I'm traveling this week and don't have access to that box, but I should be able to get to it next week sometime.

> bulk postings should be codec private
> -------------------------------------
>
>                 Key: LUCENE-3584
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3584
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>             Fix For: 4.0
>
>         Attachments: LUCENE-3584.patch
>
>
> In LUCENE-2723, a lot of work was done to speed up Lucene's bulk postings read API.
> There were some upsides:
> * you could specify things like 'I don't care about frequency data' up front.
>   This made things like multitermquery->filter and other consumers that don't
>   care about freqs faster. But this is unrelated to 'bulkness' and we have a
>   separate patch now for this on LUCENE-2929.
> * the buffer size for standardcodec was increased to 128, increasing performance
>   for TermQueries, but this was unrelated too.
> But there were serious downsides/nocommits:
> * the API was hairy because it tried to be 'one-size-fits-all'. This made
>   consumer code crazy.
> * the API could not really be specialized to your codec: e.g. it could never
>   take advantage of the fact that docs and freqs are aligned.
> * the API forced codecs to implement delta encoding for things like documents
>   and positions. But it is totally up to the codec how it wants to encode!
>   Some codecs might not use delta encoding.
> * using such an API for positions was only theoretical; it would have been
>   super complicated and I doubt it would ever have been performant or maintainable.
> * there was a regression with advance(), probably because the API forced you
>   to do both a linear scan through the remaining buffer, then refill...
> I think a cleaner approach is to let codecs do whatever they want to implement the DISI contract.
> This lets codecs have the freedom to implement whatever compression/buffering they want
> for the best performance, and keeps consumers simple. If a codec uses delta
> encoding, or if it wants to defer this to the last possible minute or do it at
> decode time, that's its own business. Maybe a codec doesn't want to do any
> buffering at all.
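
To make the proposal in the description concrete, here is a minimal sketch of what "codec private" could look like. It does not use Lucene's classes: `DocIterator` is a hypothetical stand-in for the DISI (DocIdSetIterator) contract, and `DeltaBufferedIterator` is a hypothetical codec-side implementation that happens to use delta encoding and a 128-entry buffer. The point is only that the consumer sees nextDoc()/advance() and nothing about buffering or encoding; the naive linear advance() is an assumption for brevity, and a real codec would be free to do something smarter.

```java
// Hypothetical stand-in for the DISI contract; not Lucene's actual class.
interface DocIterator {
    int NO_MORE_DOCS = Integer.MAX_VALUE;
    int docID();
    int nextDoc();
    int advance(int target);
}

// Hypothetical codec-side iterator. Delta encoding and the buffer size are
// private details that no consumer can observe or depend on.
final class DeltaBufferedIterator implements DocIterator {
    private static final int BUFFER_SIZE = 128;
    private final int[] deltas;                         // stored form: gaps between doc IDs
    private final int[] buffer = new int[BUFFER_SIZE];  // decoded absolute doc IDs
    private int readPos = 0;      // next delta to decode
    private int bufferLen = 0;    // number of valid entries in buffer
    private int bufferPos = 0;    // next buffer entry to return
    private int lastDecoded = 0;  // running sum used to undo the deltas
    private int doc = -1;

    DeltaBufferedIterator(int[] deltas) {
        this.deltas = deltas;
    }

    // Encodes ascending doc IDs as gaps (first gap is relative to 0).
    static int[] encode(int[] sortedDocs) {
        int[] out = new int[sortedDocs.length];
        int prev = 0;
        for (int i = 0; i < sortedDocs.length; i++) {
            out[i] = sortedDocs[i] - prev;
            prev = sortedDocs[i];
        }
        return out;
    }

    // Decodes the next block of deltas into the buffer; false when exhausted.
    private boolean refill() {
        bufferLen = Math.min(BUFFER_SIZE, deltas.length - readPos);
        for (int i = 0; i < bufferLen; i++) {
            lastDecoded += deltas[readPos++];
            buffer[i] = lastDecoded;
        }
        bufferPos = 0;
        return bufferLen > 0;
    }

    @Override
    public int docID() {
        return doc;
    }

    @Override
    public int nextDoc() {
        if (bufferPos == bufferLen && !refill()) {
            return doc = NO_MORE_DOCS;
        }
        return doc = buffer[bufferPos++];
    }

    @Override
    public int advance(int target) {
        // Naive linear scan; the codec could replace this without touching consumers.
        int d;
        do {
            d = nextDoc();
        } while (d < target);
        return d;
    }
}

// Consumer code: written purely against the iterator contract.
final class CodecPrivateDemo {
    static int count(DocIterator it) {
        int n = 0;
        while (it.nextDoc() != DocIterator.NO_MORE_DOCS) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        int[] docs = {3, 7, 8, 200, 1000};
        DocIterator it = new DeltaBufferedIterator(DeltaBufferedIterator.encode(docs));
        System.out.println("advance(8) -> " + it.advance(8));
        System.out.println("remaining  -> " + count(it));
    }
}
```

A codec that does no buffering or no delta encoding at all would just be another implementation of the same interface, and `count` would not change.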