[ 
https://issues.apache.org/jira/browse/LUCENE-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003321#comment-14003321
 ] 

Robert Muir commented on LUCENE-5688:
-------------------------------------

Otherwise you load hardly anything into RAM, so it's extremely trappy to do 
this.

As I mentioned, the obvious approach is O(log N), like Android's SparseArray. 
Array 1 holds the increasing doc IDs of the documents that have a value (it 
can be a monotonicblockreader). You can binary search that to find the 
position of your value in the "real values".
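
A minimal sketch of that lookup, assuming plain in-memory arrays instead of a 
packed reader. The class and field names are hypothetical and this is not 
Lucene code; it only illustrates the two-array layout and the O(log N) search, 
with 0 standing in for the default value of documents that have none.

```java
import java.util.Arrays;

/** Hypothetical illustration of a SparseArray-style layout; not Lucene code. */
final class SparseNumericSketch {
    // "Array 1": strictly increasing doc IDs of the documents that have a value.
    private final int[] docsWithValue;
    // The "real values": values[i] belongs to docsWithValue[i].
    private final long[] values;

    SparseNumericSketch(int[] docsWithValue, long[] values) {
        if (docsWithValue.length != values.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        this.docsWithValue = docsWithValue;
        this.values = values;
    }

    /** Returns the value for docId, or 0 (assumed default) if it has none. */
    long get(int docId) {
        // O(log N) in the number of documents that actually have a value.
        int idx = Arrays.binarySearch(docsWithValue, docId);
        return idx >= 0 ? values[idx] : 0L;
    }
}
```

Storage then grows with the number of documents that have a value rather than 
with the total document count, at the cost of a search on every access.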

You have to decide how 'missing' should be represented. Currently it will be 1 
bit per document as well. If it stays that way, you can check that first (which 
is the typical case) before binary searching.
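
A rough sketch of that ordering, assuming the 'missing' information is kept as 
one bit per document in a java.util.BitSet. All names here are hypothetical 
and this is not Lucene code. It reads "the typical case" as a lookup on a 
document that has no value, which the bit check answers without searching.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical illustration: consult a per-document bit before searching. */
final class SparseWithMissingBitsSketch {
    private final BitSet docsWithField;  // 1 bit per document: set if it has a value
    private final int[] docsWithValue;   // strictly increasing doc IDs that have a value
    private final long[] values;         // values[i] belongs to docsWithValue[i]

    SparseWithMissingBitsSketch(int[] docsWithValue, long[] values) {
        if (docsWithValue.length != values.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        this.docsWithValue = docsWithValue;
        this.values = values;
        this.docsWithField = new BitSet();
        for (int doc : docsWithValue) {
            docsWithField.set(doc);
        }
    }

    boolean hasValue(int docId) {
        return docsWithField.get(docId);
    }

    /** Returns the value for docId, or 0 (assumed default) if it is missing. */
    long get(int docId) {
        // Cheap O(1) check first; most documents in a sparse field stop here.
        if (!docsWithField.get(docId)) {
            return 0L;
        }
        // Only documents known to have a value pay for the O(log N) search.
        int idx = Arrays.binarySearch(docsWithValue, docId);
        return values[idx];
    }
}
```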

In all cases this has performance implications (slower access), and isn't 
specific to numerics (all dv fields could be sparse). So I think it's best to 
start outside of the default codec rather than trying to do it automatically. 
Not everyone will want the space-time tradeoff.

> NumericDocValues fields with sparse data can be compressed better 
> ------------------------------------------------------------------
>
>                 Key: LUCENE-5688
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5688
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Varun Thacker
>            Priority: Minor
>         Attachments: LUCENE-5688.patch
>
>
> I ran into this problem where I had a dynamic field in Solr and indexed data 
> into lots of fields. For each field only a few documents had actual values 
> and for the remaining documents the default value (0) got indexed. Now when I 
> merge segments, the index size jumps up.
> For example, I have 10 segments, each with 1 DV field. When I merge them 
> into 1, that segment will contain all 10 DV fields with lots of 0s. 
> This was the motivation behind trying to come up with a compression for a use 
> case like this.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

