[https://issues.apache.org/jira/browse/LUCENE-5084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698319#comment-13698319]
Paul Elschot commented on LUCENE-5084:
--------------------------------------
bq. maybe we should have a static utility method to check that so that
consumers of this API can opt for a FixedBitSet if their doc set is going to be
dense?
We could, but in which class? For example, in CachingWrapperFilter it might be
good to save memory, so it could be there.
Also, would the expected size be the only thing to check for? When decoding
speed is also important, other DocIdSets might be preferable.
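To make the size question concrete, here is a rough comparison. It assumes an Elias-Fano encoding with floor(log2(upperBound/numValues)) low bits per value and a FixedBitSet of upperBound bits. The class and method names are hypothetical, not an existing Lucene API. The estimate ignores object and array overhead, and it says nothing about decoding speed.

```java
// Hypothetical helper, not an existing Lucene API: estimates whether an Elias-Fano
// encoding would use fewer bits than a FixedBitSet for the same doc set.
final class DensityCheck {
  /** Approximate number of bits for an Elias-Fano encoding of numValues values in [0, upperBound). */
  static long eliasFanoBits(long numValues, long upperBound) {
    if (numValues == 0) {
      return 0;
    }
    long ratio = upperBound / numValues;
    // floor(log2(ratio)) low bits per value; 0 for very dense sets.
    int numLowBits = ratio > 0 ? 63 - Long.numberOfLeadingZeros(ratio) : 0;
    long lowBits = numValues * numLowBits;
    // Unary-coded high bits: one set bit per value plus the clear bits separating high-part buckets.
    long highBits = numValues + (upperBound >>> numLowBits) + 1;
    return lowBits + highBits;
  }

  /** A FixedBitSet uses one bit per document, i.e. upperBound bits. */
  static long fixedBitSetBits(long upperBound) {
    return upperBound;
  }

  static boolean eliasFanoIsSmaller(long numValues, long upperBound) {
    return eliasFanoBits(numValues, upperBound) < fixedBitSetBits(upperBound);
  }
}
```

With these formulas, 100 docs out of 1,000,000 take about 1,523 bits, while 500,000 docs out of 1,000,000 take about 1,500,001 bits, which is more than the 1,000,000 bits of a FixedBitSet.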
bq. the ceil of the log in base 2 is computed through a loop
Using numberOfLeadingZeros is indeed better than a loop. We need the Long
variant, Long.numberOfLeadingZeros, here.
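A minimal sketch of the loop-free computation, assuming the argument is at least 1 (the class name is hypothetical). For x > 1 the highest set bit of x - 1 is at position floor(log2(x - 1)), so 64 - Long.numberOfLeadingZeros(x - 1) equals ceil(log2(x)).

```java
// Hypothetical helper class for illustration.
final class Log2 {
  /** Returns ceil(log2(x)) for x >= 1 without a loop. */
  static int ceilLog2(long x) {
    if (x < 1) {
      throw new IllegalArgumentException("x must be >= 1, got " + x);
    }
    // For x > 1: 64 - nlz(x - 1) = floor(log2(x - 1)) + 1 = ceil(log2(x)).
    // For x == 1: nlz(0) == 64, so the result is 0.
    return 64 - Long.numberOfLeadingZeros(x - 1);
  }
}
```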
bq. use PackedInts.getMutable to store the low-order bits instead of a raw
long[]
Can PackedInts.getMutable also be used in a codec? Longs are needed for the
high bits (see below), and with a raw long[] the high and low bits can
conveniently be stored next to each other in an index.
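For comparison with PackedInts.getMutable, this is roughly what storing fixed-width low-order bits in a raw long[] looks like. A value may straddle two longs, so both the writer and the reader handle a second word. The class is a hypothetical illustration, not code from the patch.

```java
// Hypothetical illustration: fixed-width low-order bits packed into a raw long[].
final class LowBitsArray {
  /** Packs the numLowBits low-order bits of each value, least significant bits first. */
  static long[] pack(long[] values, int numLowBits) {
    if (numLowBits == 0) {
      return new long[0]; // nothing to store
    }
    long totalBits = (long) values.length * numLowBits;
    long[] packed = new long[(int) ((totalBits + 63) >>> 6)];
    long mask = numLowBits == 64 ? -1L : (1L << numLowBits) - 1;
    for (int i = 0; i < values.length; i++) {
      long bits = values[i] & mask;
      long bitIndex = (long) i * numLowBits;
      int word = (int) (bitIndex >>> 6);
      int shift = (int) (bitIndex & 63);
      packed[word] |= bits << shift;
      if (shift + numLowBits > 64) {
        // The value straddles a word boundary: its upper part goes into the next long.
        packed[word + 1] |= bits >>> (64 - shift);
      }
    }
    return packed;
  }

  /** Reads back the low-order bits of the value at index. */
  static long get(long[] packed, int index, int numLowBits) {
    if (numLowBits == 0) {
      return 0;
    }
    long mask = numLowBits == 64 ? -1L : (1L << numLowBits) - 1;
    long bitIndex = (long) index * numLowBits;
    int word = (int) (bitIndex >>> 6);
    int shift = (int) (bitIndex & 63);
    long result = packed[word] >>> shift;
    if (shift + numLowBits > 64) {
      result |= packed[word + 1] << (64 - shift);
    }
    return result & mask;
  }
}
```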
bq. shouldn't the iterator's getCost method return efDecoder.numValues instead
of efEncoder.numValues?
Yes.
bq. Maybe we could just support the encoding of monotonically increasing
sequences of ints to make things simpler?
I considered a decoder that returns ints, but that would require a lot more
casting in the decoder.
Decoding the unary-encoded high bits is best done on longs, so mixing longs and
ints in the encoder is not really an option.
We could pass the actual NO_MORE_VALUES value to the decoder as an argument;
would that help?
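A minimal sketch of an Elias-Fano encoder with a decoder that takes its end-of-sequence value as an argument, so a DocIdSet could pass DocIdSetIterator.NO_MORE_DOCS (Integer.MAX_VALUE) while other users pass something else. The class and method names are hypothetical and do not correspond to the encoder and decoder in the patch; for brevity the low bits are kept unpacked, one long per value. The decoder finds the next unary-coded high bit with Long.numberOfTrailingZeros on a whole 64-bit word.

```java
// Hypothetical sketch, not the patch's classes: low bits unpacked, high bits unary-coded in a long[].
final class EliasFanoSketch {
  final int numValues;
  final int numLowBits;
  final long[] lowBits;  // one entry per value, unpacked for readability
  final long[] highBits; // bit (v >>> numLowBits) + i is set for the i-th value v

  EliasFanoSketch(long[] sortedValues, long upperBound) {
    numValues = sortedValues.length;
    long ratio = numValues == 0 ? 0 : upperBound / numValues;
    numLowBits = ratio > 0 ? 63 - Long.numberOfLeadingZeros(ratio) : 0;
    lowBits = new long[numValues];
    long numHighBits = numValues + (upperBound >>> numLowBits) + 1;
    highBits = new long[(int) ((numHighBits + 63) >>> 6)];
    long lowMask = (1L << numLowBits) - 1;
    for (int i = 0; i < numValues; i++) {
      long v = sortedValues[i];
      lowBits[i] = v & lowMask;
      long pos = (v >>> numLowBits) + i; // high part plus number of earlier values
      highBits[(int) (pos >>> 6)] |= 1L << (pos & 63);
    }
  }

  /** Returns a decoder that reports noMoreValues after the last value. */
  Decoder decoder(long noMoreValues) {
    return new Decoder(noMoreValues);
  }

  final class Decoder {
    private final long noMoreValues;
    private int index = -1;
    private long bitPos = -1; // position of the last set bit consumed in highBits

    private Decoder(long noMoreValues) {
      this.noMoreValues = noMoreValues;
    }

    long nextValue() {
      if (++index >= numValues) {
        return noMoreValues;
      }
      // Find the next set bit after bitPos, one 64-bit word at a time.
      long from = bitPos + 1;
      int word = (int) (from >>> 6);
      long bits = highBits[word] & (-1L << (from & 63));
      while (bits == 0) {
        bits = highBits[++word];
      }
      bitPos = ((long) word << 6) + Long.numberOfTrailingZeros(bits);
      long high = bitPos - index; // remove the set bits of the earlier values
      return (high << numLowBits) | lowBits[index];
    }
  }
}
```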
As to why decoding the unary-encoded high bits is best done on longs, see
Algorithm 2 in "Broadword Implementation of Rank/Select Queries", Sebastiano
Vigna, January 30, 2012, http://vigna.di.unimi.it/ftp/papers/Broadword.pdf .
I also have an initial Java implementation of that, but it is not used here
yet; the code here only has a few comments noting where it might be used. I'll
open another issue for broadword bit selection later.
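Skipping ahead needs select on the high bits: in the usual Elias-Fano layout, the high part of the i-th value is the position of the i-th set bit minus i. A simple sketch, assuming the high bits are stored in a long[]: whole words are skipped with Long.bitCount, then the bit is located within one long by clearing the lowest set bits one at a time. That in-word loop is a plain substitute, not the broadword select of Algorithm 2 in Vigna's paper, and the class name is hypothetical.

```java
// Hypothetical illustration of select over a long[] bitmap (not Vigna's broadword Algorithm 2).
final class LongSelect {
  /** Returns the position of the k-th set bit (k counted from 0), or -1 if there are fewer set bits. */
  static long select(long[] bits, long k) {
    for (int word = 0; word < bits.length; word++) {
      long w = bits[word];
      int count = Long.bitCount(w); // skip a whole word at once when possible
      if (k < count) {
        for (int i = 0; i < k; i++) {
          w &= w - 1; // clear the lowest set bit
        }
        return ((long) word << 6) + Long.numberOfTrailingZeros(w);
      }
      k -= count;
    }
    return -1;
  }
}
```

For example, with set bits at positions 0, 1, 3 and 69, select(bits, 3) returns 69.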
> EliasFanoDocIdSet
> -----------------
>
> Key: LUCENE-5084
> URL: https://issues.apache.org/jira/browse/LUCENE-5084
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Paul Elschot
> Assignee: Adrien Grand
> Priority: Minor
> Fix For: 5.0
>
> Attachments: LUCENE-5084.patch
>
>
> DocIdSet in Elias-Fano encoding
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira