[
https://issues.apache.org/jira/browse/LUCENE-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427667#comment-13427667
]
Michael McCandless commented on LUCENE-4283:
--------------------------------------------
I think we shouldn't have to buffer up the skip points within one block
ourselves?
Can't we call skipWriter.bufferSkip every skipInterval docs (and pass it
lastDocID, etc.)? Then it can write the skip point immediately.
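
A minimal sketch of that suggestion, not the patch itself. The class
RecordingSkipWriter and its two-argument bufferSkip(lastDocID, docCount) are
hypothetical stand-ins (the real skip writer takes different arguments); the
point is only that the postings writer hands over a skip point every
skipInterval docs and keeps no per-block buffer of its own.

```java
import java.util.ArrayList;
import java.util.List;

public class SkipEverySketch {
    /** Hypothetical stand-in for the skip writer: records each skip point as soon as it is handed over. */
    static final class RecordingSkipWriter {
        final List<int[]> points = new ArrayList<>();

        void bufferSkip(int lastDocID, int docCount) {
            points.add(new int[] {lastDocID, docCount});
        }
    }

    /** Walks the postings of one term and emits a skip point every skipInterval docs. */
    static RecordingSkipWriter writePostings(int[] docIDs, int skipInterval) {
        RecordingSkipWriter skipWriter = new RecordingSkipWriter();
        int docCount = 0;
        for (int docID : docIDs) {
            docCount++;
            // No buffering on the caller's side: the skip point is passed on immediately.
            if (docCount % skipInterval == 0) {
                skipWriter.bufferSkip(docID, docCount);
            }
        }
        return skipWriter;
    }
}
```

With blockSize=128 and skipInterval=32, this yields four skip points per full
block instead of one.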
Also, in BlockPostingsReader, why do we need a separate docBufferOffset? Can't
we just set docBufferUpto to wherever (32, 64, 96) we had skipped to within the
block?
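
Roughly what I mean on the reader side, as a sketch only. BlockCursorSketch
and skipWithinBlock are made-up names, and the sketch assumes the block has
already been decoded into docBuffer; only docBufferUpto comes from the real
reader. The single cursor is moved to the skipped-to position, so no second
offset field is needed.

```java
public class BlockCursorSketch {
    final int[] docBuffer;   // decoded doc IDs of the current block (assumed already decoded)
    int docBufferUpto;       // next position to read within docBuffer

    BlockCursorSketch(int[] decodedBlock) {
        this.docBuffer = decodedBlock;
        this.docBufferUpto = 0;
    }

    /** Moves the cursor to a skip point inside the block, e.g. 32, 64 or 96. */
    void skipWithinBlock(int positionInBlock) {
        if (positionInBlock < 0 || positionInBlock >= docBuffer.length) {
            throw new IllegalArgumentException("position outside block: " + positionInBlock);
        }
        docBufferUpto = positionInBlock;
    }

    /** Returns the doc ID at the cursor and advances by one. */
    int nextDoc() {
        return docBuffer[docBufferUpto++];
    }
}
```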
> Support more frequent skip with Block Postings Format
> -----------------------------------------------------
>
> Key: LUCENE-4283
> URL: https://issues.apache.org/jira/browse/LUCENE-4283
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Han Jiang
> Priority: Minor
> Attachments: LUCENE-4283-buggy.patch, LUCENE-4283-buggy.patch
>
>
> This change works on the new bulk branch.
> Currently, our BlockPostingsFormat only supports skipInterval==blockSize.
> Every time the skipper reaches the last level 0 skip point, we have to
> decode a whole block to read doc/freq data. Also, a higher-level skip list
> is created only for terms with df>blockSize^k, which means that for most
> terms, skipping is just a linear scan. If we increase the current blockSize
> for better bulk I/O performance, the current skip setting will become a
> bottleneck.
> For ForPF, the encoded block can easily be split if we set
> skipInterval=32*k.