[jira] [Comment Edited] (LUCENE-5081) Compress doc ID sets

Paul Elschot (JIRA) Sun, 30 Jun 2013 04:13:26 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13696317#comment-13696317
 ]


Paul Elschot edited comment on LUCENE-5081 at 6/30/13 11:12 AM:
----------------------------------------------------------------

I'm (slowly) working on an implementation of Elias-Fano compression, basically as 
described in sections 3 and 4 of this article:

Sebastiano Vigna, "Quasi Succinct Indices", June 19, 2012, 
http://arxiv.org/pdf/1206.4300
The article is quite interesting, not least because it compares MG4J 
directly to Lucene 3.6.

The implementation I am working on also does backward iteration, but it has no 
index yet, so advance() to a distant target is still somewhat slow. 
Since there is interest in compression, I'll open an issue for this soon.
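
To make the idea concrete, here is a minimal sketch of Elias-Fano encoding 
in plain Java. It is not the implementation mentioned above and not Lucene 
code: the class name EliasFanoSketch and its methods are made up for 
illustration, and the bits are kept in java.util.BitSet for brevity rather 
than in packed long arrays. Each value is split into L low bits stored 
verbatim and a high part stored in unary, and advance() has no index, so it 
scans linearly from the start.

```java
import java.util.BitSet;

/** Illustrative Elias-Fano encoder for increasing non-negative ints; not Lucene code. */
public class EliasFanoSketch {
    private final int numValues;
    private final int numLowBits;                  // L low bits kept verbatim per value
    private final BitSet lowBits = new BitSet();   // value i occupies bits [i*L, (i+1)*L)
    private final BitSet highBits = new BitSet();  // bit (high_i + i) is set for value i

    public EliasFanoSketch(int[] sorted, int upperBound) {
        numValues = sorted.length;
        int ratio = numValues == 0 ? 0 : upperBound / numValues;
        // L = floor(log2(upperBound / numValues)), or 0 for dense sets.
        numLowBits = ratio <= 1 ? 0 : 31 - Integer.numberOfLeadingZeros(ratio);
        for (int i = 0; i < numValues; i++) {
            for (int b = 0; b < numLowBits; b++) {
                if (((sorted[i] >>> b) & 1) != 0) {
                    lowBits.set(i * numLowBits + b);
                }
            }
            // The high part is coded in unary: one set bit per value, offset by its index.
            highBits.set((sorted[i] >>> numLowBits) + i);
        }
    }

    /** Rebuilds value number index from the position of its set bit in highBits. */
    private int decode(int index, int highPos) {
        int low = 0;
        for (int b = 0; b < numLowBits; b++) {
            if (lowBits.get(index * numLowBits + b)) {
                low |= 1 << b;
            }
        }
        return ((highPos - index) << numLowBits) | low;
    }

    /** Forward iteration: all values in increasing order. */
    public int[] decodeForward() {
        int[] out = new int[numValues];
        int pos = -1;
        for (int i = 0; i < numValues; i++) {
            pos = highBits.nextSetBit(pos + 1);
            out[i] = decode(i, pos);
        }
        return out;
    }

    /** Index-free advance: first value >= target, or -1 if none. Scans from the start. */
    public int advance(int target) {
        int pos = -1;
        for (int i = 0; i < numValues; i++) {
            pos = highBits.nextSetBit(pos + 1);
            int value = decode(i, pos);
            if (value >= target) {
                return value;
            }
        }
        return -1;
    }
}
```

An index over the high bits would let advance() skip most of that scan; the 
sketch leaves it out to show where the cost for distant targets comes from.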

For backward iteration I think it would be good to extend DocIdSetIterator with 
backward-iterating method(s), possibly in a subclass, and use an 
implementation of that in the block joins.
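
As a rough illustration of what such an extension could look like, here is 
a self-contained sketch. Everything in it is hypothetical: the names 
BidirectionalIterator, previousDoc(), backTo() and NO_PREVIOUS_DOC do not 
exist in Lucene, and ForwardIterator only loosely mirrors the docID(), 
nextDoc() and advance() contract of DocIdSetIterator. The backing store is a 
plain sorted int[] to keep the example short.

```java
/** Hypothetical API sketch for backward iteration; none of these names are Lucene's. */
public class BackwardIterationSketch {
    public static final int NO_MORE_DOCS = Integer.MAX_VALUE; // same value Lucene uses
    public static final int NO_PREVIOUS_DOC = -1;             // assumed sentinel

    /** Forward-only contract, loosely modelled on DocIdSetIterator. */
    public abstract static class ForwardIterator {
        public abstract int docID();
        public abstract int nextDoc();
        public abstract int advance(int target);
    }

    /** The proposed subclass: adds backward-iterating methods. */
    public abstract static class BidirectionalIterator extends ForwardIterator {
        /** Steps to the previous doc, or NO_PREVIOUS_DOC before the first one. */
        public abstract int previousDoc();
        /** Steps back to the closest earlier doc that is <= target. */
        public abstract int backTo(int target);
    }

    /** Toy implementation over a sorted array of doc IDs. */
    public static final class SortedArrayIterator extends BidirectionalIterator {
        private final int[] docs;
        private int index = -1; // -1: before the first doc, docs.length: after the last

        public SortedArrayIterator(int[] docs) {
            this.docs = docs;
        }

        @Override
        public int docID() {
            if (index < 0) {
                return -1;
            }
            return index >= docs.length ? NO_MORE_DOCS : docs[index];
        }

        @Override
        public int nextDoc() {
            if (index < docs.length) {
                index++;
            }
            return docID();
        }

        @Override
        public int advance(int target) {
            int doc;
            do {
                doc = nextDoc();
            } while (doc < target);
            return doc;
        }

        @Override
        public int previousDoc() {
            if (index >= 0) {
                index--;
            }
            return index < 0 ? NO_PREVIOUS_DOC : docs[index];
        }

        @Override
        public int backTo(int target) {
            int doc;
            do {
                doc = previousDoc();
            } while (doc > target && doc != NO_PREVIOUS_DOC);
            return doc;
        }
    }
}
```

In a block join, something like backTo() could locate the parent document 
that precedes a given child, which is the kind of lookup currently done 
against an uncompressed bit set.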


                
> Compress doc ID sets
> --------------------
>
>                 Key: LUCENE-5081
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5081
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-5081.patch
>
>
> Our filters use bit sets a lot to store document IDs. However, it is likely 
> that most of them are sparse hence easily compressible. Having efficient 
> compressed sets would allow for caching more data.
