[ 
https://issues.apache.org/jira/browse/LUCENE-5084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13696415#comment-13696415
 ] 

Paul Elschot commented on LUCENE-5084:
--------------------------------------

That was fast :)
The results posted above seem to be in line with the formula below, and that is 
quite nice to see.

The upper bound for the size in bits per encoded number in an EliasFanoDocIdSet 
is:
{noformat}2 + ceil(log2(upperBound/numValues)){noformat}
On top of that come a few constant-size objects.
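
As a quick way to check posted sizes against this bound, here is a minimal 
sketch in Java. It is not taken from the patch: the class and method names are 
made up for illustration, it assumes upperBound >= numValues >= 1, and it 
leaves out the constant-size objects.

```java
final class EliasFanoSizeBound {
    /** Returns ceil(log2(x)) for x >= 1, using integer arithmetic only. */
    static int ceilLog2(long x) {
        if (x < 1) {
            throw new IllegalArgumentException("x must be >= 1");
        }
        // For x == 1 this is 64 - 64 = 0; otherwise it is the bit length of x - 1.
        return 64 - Long.numberOfLeadingZeros(x - 1);
    }

    /** Upper bound in bits per encoded number: 2 + ceil(log2(upperBound/numValues)). */
    static int bitsPerValueBound(long upperBound, long numValues) {
        if (numValues < 1 || upperBound < numValues) {
            throw new IllegalArgumentException("need upperBound >= numValues >= 1");
        }
        // 2^k >= upperBound/numValues holds exactly when 2^k >= ceil(upperBound/numValues),
        // so rounding the ratio up first keeps the result exact without floating point.
        long ratioCeil = (upperBound + numValues - 1) / numValues;
        return 2 + ceilLog2(ratioCeil);
    }

    public static void main(String[] args) {
        long upperBound = 10_000_000L; // hypothetical example values, 10% density
        long numValues = 1_000_000L;
        int bits = bitsPerValueBound(upperBound, numValues);
        System.out.println(bits + " bits per value, about "
            + (bits * numValues / 8) + " bytes for all values");
    }
}
```

For a set that contains 10% of the docs, the ratio is 10, ceil(log2(10)) is 4, 
and the bound comes out at 6 bits per encoded number.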



Please note that there is no index yet. The index will be relatively small: for 
example, for the 10% case above with 641 kB I expect an index size of about 12 
kB, adding about 2% to the size.
This index will consist of N/256 entries, each a single number with max value 
3N, i.e. ceil(log2(3N)) bits per index entry.
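
The index estimate can be sketched the same way. Again this is only an 
illustration and not code from the patch: the names are hypothetical, N is 
assumed to be the number of encoded values, and the example input is not the 
one behind the figures quoted above.

```java
final class EliasFanoIndexEstimate {
    /** Returns ceil(log2(x)) for x >= 1, using integer arithmetic only. */
    static int ceilLog2(long x) {
        if (x < 1) {
            throw new IllegalArgumentException("x must be >= 1");
        }
        return 64 - Long.numberOfLeadingZeros(x - 1);
    }

    /** Estimated index size in bits: N/256 entries of ceil(log2(3N)) bits each. */
    static long indexSizeBits(long numValues) {
        if (numValues < 1) {
            throw new IllegalArgumentException("numValues must be >= 1");
        }
        long entries = numValues / 256;             // one index entry per 256 values
        int bitsPerEntry = ceilLog2(3 * numValues); // each entry holds a number up to 3N
        return entries * bitsPerEntry;
    }

    public static void main(String[] args) {
        long numValues = 1_000_000L; // hypothetical N
        long bits = indexSizeBits(numValues);
        System.out.println(bits + " bits, about " + (bits / 8) + " bytes of index");
    }
}
```

For a hypothetical N of 1,000,000 this gives 3906 entries of 22 bits each, so 
85932 bits in total, which is small next to the encoded values themselves.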


The code posted here is still young. So even though it has some test cases, I'd 
like to be reassured that the code that produced the posted results includes at 
least a basic test verifying that all input docs are available after 
compression. Is that the case?

                
> EliasFanoDocIdSet
> -----------------
>
>                 Key: LUCENE-5084
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5084
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Paul Elschot
>            Assignee: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-5084.patch
>
>
> DocIdSet in Elias-Fano encoding

