[ https://issues.apache.org/jira/browse/LUCENE-5084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13696404#comment-13696404 ]
Adrien Grand commented on LUCENE-5084:
--------------------------------------
I have not dug much through the code but I tested it against various
randomly-generated sets with numDocs=10M, and the compression looks great:
||Load||FixedBitSet||WAH8DocIdSet (LUCENE-5081)||EliasFanoDocIdSet (this issue)||PForDeltaDocIdSet (from kamikaze, LUCENE-2750)||
|0.001% |1.2 MB |424 bytes |344 bytes |9 KB |
|0.01% |1.2 MB |3.4 KB |2 KB |10.6 KB |
|0.1% |1.2 MB |28.4 KB |14.7 KB |25.1 KB |
|1% |1.2 MB |223.2 KB |104.6 KB |132.3 KB |
|10% |1.2 MB |1 MB |641 KB |860.5 KB |
|30% |1.2 MB |1.2 MB |1.3 MB |1.9 MB |
|50% |1.2 MB |1.2 MB |1.8 MB |2.7 MB |
|70% |1.2 MB |1.2 MB |2 MB |3 MB |
|90% |1.2 MB |1.2 MB |2.3 MB |3.1 MB |
I especially like the fact that it saves almost half the memory even for pretty
large sets that contain 1/10th of all doc IDs.
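As a rough cross-check of the EliasFanoDocIdSet column, here is a minimal sketch of the textbook Elias-Fano size estimate. It is not taken from the patch: the class and method names are hypothetical, and the actual encoder may add rounding and index overhead. It assumes each of n values below an upper bound u stores floor(log2(u/n)) low bits, plus a unary-coded high part of about n + u / 2^lowBits bits. For 1M doc IDs out of 10M this gives 5,250,000 bits, which is about 641 KB if KB is read as 1024 bytes.

```java
// Hypothetical helper, not part of the LUCENE-5084 patch.
final class EliasFanoSizeEstimate {

    /** Low bits stored per value: floor(log2(upperBound / numValues)), or 0 when the ratio is below 2. */
    static int numLowBits(long numValues, long upperBound) {
        if (numValues <= 0) {
            throw new IllegalArgumentException("numValues must be positive");
        }
        long ratio = upperBound / numValues;
        return ratio <= 1 ? 0 : 63 - Long.numberOfLeadingZeros(ratio);
    }

    /** Textbook estimate: n * lowBits for the low part, n + (u >>> lowBits) for the unary high part. */
    static long estimateBits(long numValues, long upperBound) {
        int lowBits = numLowBits(numValues, upperBound);
        long lowerPart = numValues * lowBits;
        long upperPart = numValues + (upperBound >>> lowBits);
        return lowerPart + upperPart;
    }

    public static void main(String[] args) {
        long numDocs = 10_000_000L;
        // Set sizes corresponding to loads of 0.001% up to 10% of numDocs.
        long[] setSizes = {100L, 1_000L, 10_000L, 100_000L, 1_000_000L};
        for (long n : setSizes) {
            long bits = estimateBits(n, numDocs);
            System.out.printf("n=%d: %d bits, %d bytes%n", n, bits, (bits + 7) / 8);
        }
    }
}
```

Once the set is dense enough that u/n drops below 2, no low bits are stored and the estimate becomes n + u bits, which is more than the u bits of a FixedBitSet. That is consistent with the crossover between 10% and 30% load in the table.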
bq. I have used package o.a.l.util.eliasfano, this could be changed to o.a.l.util.packed for example.
Indeed, maybe we don't need a dedicated package for this DocIdSet.
o.a.l.util.packed would be fine, I think.
bq. There is a NOCOMMIT for a static longHex method that dumps a long in fixed width hex format, is there a better place for this method?
I think it is OK to leave it here.
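For readers without the patch at hand, a helper of this kind could look roughly like the sketch below. This is an assumption about what such a method does, not the patch's actual longHex: the class name is hypothetical, and the body simply uses the standard `String.format` zero-padding flag to get 16 hex digits.

```java
// Hypothetical stand-in, not the longHex method from the LUCENE-5084 patch.
final class HexDump {

    /** Formats a long as exactly 16 lowercase hex digits, zero-padded on the left. */
    static String longHex(long value) {
        // %x renders negative longs in two's complement form, so the width stays 16.
        return String.format("%016x", value);
    }

    public static void main(String[] args) {
        System.out.println(longHex(255L));
        System.out.println(longHex(-1L));
    }
}
```

A fixed width keeps consecutive dumped words aligned, which makes bit patterns easier to compare by eye when debugging an encoder.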
I'll try to dig more thoroughly into the patch in the next few days...
> EliasFanoDocIdSet
> -----------------
>
> Key: LUCENE-5084
> URL: https://issues.apache.org/jira/browse/LUCENE-5084
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Paul Elschot
> Assignee: Adrien Grand
> Priority: Minor
> Attachments: LUCENE-5084.patch
>
>
> DocIdSet in Elias-Fano encoding