[
https://issues.apache.org/jira/browse/LUCENE-5084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13696415#comment-13696415
]
Paul Elschot commented on LUCENE-5084:
--------------------------------------
That was fast :)
The results posted above seem to be in line with the formula below, and that is
quite nice to see.
The upper bound on the size of an EliasFanoDocIdSet, in bits per encoded
number, is:
{noformat}2 + ceil(log2(upperBound/numValues)){noformat}
A few constant-size objects have to be added on top of that.
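As a rough illustration of the formula above, here is a minimal Java sketch that evaluates the bound. The class and method names (EliasFanoSizeEstimate, bitsPerValueUpperBound, totalBytesUpperBound) are hypothetical and not part of the patch. The sketch assumes upperBound >= 1 and numValues >= 1, rounds the ratio up before taking the logarithm, and leaves out the constant size objects.

```java
// Hypothetical helper, not part of LUCENE-5084.patch: evaluates
// 2 + ceil(log2(upperBound / numValues)) with integer arithmetic.
final class EliasFanoSizeEstimate {

    /** Returns ceil(log2(x)) for x >= 1. */
    static int ceilLog2(long x) {
        if (x < 1) {
            throw new IllegalArgumentException("x must be >= 1");
        }
        // For x >= 1, the bit length of (x - 1) equals ceil(log2(x)).
        return 64 - Long.numberOfLeadingZeros(x - 1);
    }

    /** Upper bound in bits per encoded number, excluding constant overhead. */
    static int bitsPerValueUpperBound(long upperBound, long numValues) {
        if (upperBound < 1 || numValues < 1) {
            throw new IllegalArgumentException("both arguments must be >= 1");
        }
        // 2^k >= upperBound/numValues holds exactly when 2^k >= the ratio
        // rounded up, so rounding up first does not change the result.
        long ratioRoundedUp = (upperBound + numValues - 1) / numValues;
        return 2 + ceilLog2(ratioRoundedUp);
    }

    /** Upper bound in bytes for all values, excluding constant overhead. */
    static long totalBytesUpperBound(long upperBound, long numValues) {
        long bits = Math.multiplyExact(numValues,
                (long) bitsPerValueUpperBound(upperBound, numValues));
        return (bits + 7) / 8;
    }

    public static void main(String[] args) {
        // Made-up example sizes: one value in ten is present.
        long upperBound = 10_000_000L;
        long numValues = 1_000_000L;
        System.out.println("bits per value: "
                + bitsPerValueUpperBound(upperBound, numValues));
        System.out.println("total bytes:    "
                + totalBytesUpperBound(upperBound, numValues));
    }
}
```

With one value in ten present, the bound works out to 2 + ceil(log2(10)) = 6 bits per encoded number.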
Please note that there is no index yet. The index will be relatively small: for
the 10% case above, at 641 kB, I expect an index size of about 12 kB, adding
about 2% to the size.
This index will consist of N/256 entries, each a single number with a maximum
value of 3N, i.e. ceil(log2(3N)) bits per index entry.
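To make the index estimate concrete, the arithmetic can be sketched as follows. The class and method names are hypothetical, N is taken as used above, the value of N in the example is made up for illustration, and rounding N/256 down is an assumption of this sketch.

```java
// Hypothetical helper, not part of LUCENE-5084.patch: estimates the size of
// an index with N/256 entries of ceil(log2(3N)) bits each.
final class EliasFanoIndexEstimate {

    /** Returns ceil(log2(x)) for x >= 1. */
    static int ceilLog2(long x) {
        if (x < 1) {
            throw new IllegalArgumentException("x must be >= 1");
        }
        return 64 - Long.numberOfLeadingZeros(x - 1);
    }

    /** Number of index entries; integer division rounds down (assumption). */
    static long indexEntries(long n) {
        return n / 256;
    }

    /** Bits needed for one entry whose maximum value is 3N. */
    static int bitsPerEntry(long n) {
        return ceilLog2(Math.multiplyExact(3L, n));
    }

    /** Estimated index size in bytes, rounded up to a whole byte. */
    static long indexBytes(long n) {
        long bits = Math.multiplyExact(indexEntries(n), (long) bitsPerEntry(n));
        return (bits + 7) / 8;
    }

    public static void main(String[] args) {
        long n = 1_000_000L; // made-up value of N, for illustration only
        System.out.println("entries:        " + indexEntries(n));
        System.out.println("bits per entry: " + bitsPerEntry(n));
        System.out.println("index bytes:    " + indexBytes(n));
    }
}
```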
The code posted here is still young. So even though it has some test cases, I'd
like to be reassured that the code that produced the posted results includes
at least a basic test verifying that all input docs are available after
compression. Is that the case?
> EliasFanoDocIdSet
> -----------------
>
> Key: LUCENE-5084
> URL: https://issues.apache.org/jira/browse/LUCENE-5084
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Paul Elschot
> Assignee: Adrien Grand
> Priority: Minor
> Attachments: LUCENE-5084.patch
>
>
> DocIdSet in Elias-Fano encoding