[ https://issues.apache.org/jira/browse/LUCENE-5084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13696352#comment-13696352 ]
Paul Elschot commented on LUCENE-5084:
--------------------------------------
For the patch of 30 June 2013:
This consists of the class EliasFanoDocIdSet and the two classes that it uses,
EliasFanoEncoder and EliasFanoDecoder.
The latter two operate on long values; EliasFanoDocIdSet does the casting to
and from int for DocIdSet.
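As a rough illustration of the encoding itself (a minimal sketch, not the code
from the patch): each value of a sorted sequence is split into low bits that
are stored verbatim and high bits that are stored in unary in a bit array. The
class name EliasFanoSketch, its fields and methods, and the choice of
floor(log2(upperBound / numValues)) low bits are assumptions made for this
example.

```java
/** Illustrative Elias-Fano sketch; hypothetical class, not the patch's encoder. */
final class EliasFanoSketch {
  final int numValues;
  final int numLowBits;   // L: number of low bits stored verbatim per value
  final long[] lowBits;   // numValues * L bits, packed back to back
  final long[] highBits;  // bit (high + i) is set for the i-th value

  /** Encodes a non-decreasing sequence of values in [0, upperBound]. */
  EliasFanoSketch(long[] sorted, long upperBound) {
    numValues = sorted.length;
    long ratio = numValues == 0 ? 0 : upperBound / numValues;
    // Assumed choice: L = floor(log2(upperBound / numValues)), 0 for dense sets.
    numLowBits = ratio <= 1 ? 0 : 63 - Long.numberOfLeadingZeros(ratio);
    long numHighBits = numValues + (upperBound >>> numLowBits);
    lowBits = new long[(int) (((long) numValues * numLowBits + 63) >>> 6)];
    highBits = new long[(int) ((numHighBits + 63) >>> 6)];
    long lowMask = (1L << numLowBits) - 1;
    for (int i = 0; i < numValues; i++) {
      writeLow(i, sorted[i] & lowMask);
      long pos = (sorted[i] >>> numLowBits) + i;
      highBits[(int) (pos >>> 6)] |= 1L << (int) (pos & 63);
    }
  }

  private void writeLow(int i, long low) {
    if (numLowBits == 0) return;
    long bitPos = (long) i * numLowBits;
    int word = (int) (bitPos >>> 6);
    int shift = (int) (bitPos & 63);
    lowBits[word] |= low << shift;
    if (shift + numLowBits > 64) {  // value straddles two longs
      lowBits[word + 1] |= low >>> (64 - shift);
    }
  }

  private long readLow(int i) {
    if (numLowBits == 0) return 0;
    long bitPos = (long) i * numLowBits;
    int word = (int) (bitPos >>> 6);
    int shift = (int) (bitPos & 63);
    long v = lowBits[word] >>> shift;
    if (shift + numLowBits > 64) {
      v |= lowBits[word + 1] << (64 - shift);
    }
    return v & ((1L << numLowBits) - 1);
  }

  /** Decodes all values with a linear scan over the high bits (no index). */
  long[] decodeAll() {
    long[] out = new long[numValues];
    int i = 0;
    for (int w = 0; w < highBits.length && i < numValues; w++) {
      long word = highBits[w];
      while (word != 0 && i < numValues) {
        long pos = ((long) w << 6) + Long.numberOfTrailingZeros(word);
        out[i] = ((pos - i) << numLowBits) | readLow(i);
        i++;
        word &= word - 1;  // clear the lowest set bit
      }
    }
    return out;
  }
}
```

A DocIdSet wrapper along these lines would pass maxDoc-style int values in as
long and cast the decoded values back with (int).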
There are various ways in which the decoding speed could still be improved:
* Adding an index on the high bits, see the Vigna paper.
* Using broadword bit searching, which is actually better done in another
  issue. The patch currently uses the Long methods bitCount and
  numberOfTrailingZeros.
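To show how these two Long methods can drive an unindexed search in a bit
array, here is a minimal sketch of a linear "select" (find the k-th set bit).
The class and method names are hypothetical and this is not the decoder from
the patch; it only uses Long.bitCount to skip whole words and
Long.numberOfTrailingZeros to locate a bit inside a word.

```java
/** Hypothetical helper: linear select over a long[] bit array. */
final class HighBitsSelectSketch {
  /**
   * Returns the position of the k-th (0-based) set bit in bits,
   * or -1 if fewer than k + 1 bits are set.
   */
  static long select(long[] bits, long k) {
    long remaining = k;
    for (int w = 0; w < bits.length; w++) {
      long word = bits[w];
      int count = Long.bitCount(word);
      if (remaining >= count) {
        remaining -= count;  // the wanted bit is not in this word, skip it
        continue;
      }
      for (long j = 0; j < remaining; j++) {
        word &= word - 1;    // clear the lowest set bit
      }
      return ((long) w << 6) + Long.numberOfTrailingZeros(word);
    }
    return -1;
  }
}
```

An index on the high bits would let such a search start near the target word
instead of at word 0.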
I have not yet profiled for performance bottlenecks.
The decoder is not yet complete: it has an advanceToIndex method but no
backToIndex method.
Nevertheless this is usable now, because the compression works and the linear
searches that are done (because of the lack of indexing) will access no more
than roughly 3N bits, where N is the number of doc ids in the set. By
comparison, FixedBitSet.nextDoc() can (theoretically) access a number of bits
equal to the number of docs in a segment.
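The 3N figure can be checked with a little arithmetic, assuming the
conventional choice of L = floor(log2(upperBound / N)) low bits per value (the
patch may choose L differently). The high bits array then holds N set bits
plus upperBound >>> L zero bits, and upperBound >>> L is below 2N. The helper
below is a hypothetical sketch of that computation.

```java
/** Hypothetical helper: size of the high bits array under the assumed L. */
final class HighBitsBoundSketch {
  /** Number of high bits for numValues values in [0, upperBound]. */
  static long numHighBits(long numValues, long upperBound) {
    if (numValues <= 0) {
      throw new IllegalArgumentException("numValues must be positive");
    }
    long ratio = upperBound / numValues;
    // Assumed choice: L = floor(log2(upperBound / numValues)), 0 for dense sets.
    int numLowBits = ratio <= 1 ? 0 : 63 - Long.numberOfLeadingZeros(ratio);
    return numValues + (upperBound >>> numLowBits);
  }
}
```

For example, 1000 doc ids in a segment of 1,000,000 docs give L = 9 and
1000 + 1953 = 2953 high bits, against 1,000,000 bits for a FixedBitSet over
the same segment.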
TestEliasFanoSequence tests EliasFanoEncoder and EliasFanoDecoder.
TestEliasFanoDocIdSet tests EliasFanoDocIdSet.
I have used package o.a.l.util.eliasfano; this could be changed to
o.a.l.util.packed, for example.
There is a NOCOMMIT for a static longHex method that dumps a long in
fixed-width hex format. Is there a better place for this method?
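For reference, a fixed-width hex dump of a long can be sketched as below. This
is an assumption about what such a method might look like, not the longHex
from the patch; the "0x" prefix and the 16-digit width are choices made for
this example.

```java
/** Hypothetical sketch of a fixed-width hex dump for a long. */
final class LongHexSketch {
  /** Returns x as "0x" followed by exactly 16 lowercase hex digits. */
  static String longHex(long x) {
    String hex = Long.toHexString(x);  // unsigned, no leading zeros
    StringBuilder sb = new StringBuilder("0x");
    for (int i = hex.length(); i < 16; i++) {
      sb.append('0');                  // left-pad to 16 digits
    }
    return sb.append(hex).toString();
  }
}
```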
> EliasFanoDocIdSet
> -----------------
>
> Key: LUCENE-5084
> URL: https://issues.apache.org/jira/browse/LUCENE-5084
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Paul Elschot
> Assignee: Adrien Grand
> Priority: Minor
> Attachments: LUCENE-5084.patch
>
>
> DocIdSet in Elias-Fano encoding