[jira] [Commented] (LUCENE-5688) NumericDocValues fields with sparse data can be compressed better

Shai Erera (JIRA) Sun, 01 Jun 2014 00:28:25 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14014922#comment-14014922 ]


Shai Erera commented on LUCENE-5688:
------------------------------------

Ahh, I see now that you only wrote a DVFormat, not a Codec. In that case I 
agree: apps should plug it in per-field, and it doesn't need to wrap 
another format. Can you perhaps make the Consumer/Producer package-private? I 
think only the Format needs to be public.

About the Binary field: indeed it doesn't write the data if a BytesRef is 
missing, but it does write all the meta information, e.g. the missing bitset 
and the addresses (in case the BytesRefs aren't of equal length). So I think a 
sparse format should be really sparse, i.e. avoid that per-document metadata 
as well. But I'm fine if you leave that out for now - we first need to make 
sure the numeric field performs and that there are any gains (even if only 
during indexing).
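
To make the metadata point concrete, here is a rough back-of-the-envelope sketch in plain Java. It is not Lucene code: the class name, the 8-byte uncompressed address and the 4-byte doc ID are all assumptions chosen for illustration (real formats compress addresses, so the dense figure is an upper bound). It only shows that per-document metadata grows with maxDoc, while a sparse layout would grow with the number of documents that actually have a value.

```java
/** Hypothetical illustration only; sizes are simplifying assumptions, not Lucene's. */
public class SparseBinaryOverhead {

    /**
     * Dense layout (assumed): one presence bit per document plus one
     * uncompressed 8-byte address per document.
     */
    static long denseMetaBytes(int maxDoc) {
        long bitsetBytes = (maxDoc + 7L) / 8;   // missing bitset, rounded up to whole bytes
        long addressBytes = 8L * maxDoc;        // one address per document
        return bitsetBytes + addressBytes;
    }

    /**
     * Sparse layout (assumed): a 4-byte doc ID and an 8-byte address,
     * stored only for documents that have a value.
     */
    static long sparseMetaBytes(int docsWithValue) {
        return 12L * docsWithValue;
    }

    public static void main(String[] args) {
        int maxDoc = 1_000_000;
        int docsWithValue = 1_000;
        System.out.println("dense metadata bytes:  " + denseMetaBytes(maxDoc));
        System.out.println("sparse metadata bytes: " + sparseMetaBytes(docsWithValue));
    }
}
```

Under these assumptions, the dense metadata for a million-document segment is dominated by the address table regardless of how few documents carry a value.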

> NumericDocValues fields with sparse data can be compressed better 
> ------------------------------------------------------------------
>
>                 Key: LUCENE-5688
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5688
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Varun Thacker
>            Priority: Minor
>         Attachments: LUCENE-5688.patch, LUCENE-5688.patch
>
>
> I ran into this problem where I had a dynamic field in Solr and indexed data 
> into lots of fields. For each field only a few documents had actual values, 
> and for the remaining documents the default value (0) got indexed. Now when I 
> merge segments, the index size jumps up.
> For example, I have 10 segments, each with 1 DV field. When I merge them 
> into 1 segment, that segment will contain all 10 DV fields with lots of 0s. 
> This was the motivation behind trying to come up with a compression scheme 
> for a use case like this.
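
The sparse case described in the issue can be illustrated with a minimal plain-Java sketch. This is not the approach taken in the attached patch, which is not shown here; the class and method names are hypothetical. It assumes the default value is 0 and keeps only (doc ID, value) pairs for documents with a non-default value, so an explicit 0 and a missing value are indistinguishable.

```java
import java.util.Arrays;

/** Hypothetical sparse holder for per-document long values; not Lucene code. */
public class SparseNumericSketch {
    private final int[] docIds;   // sorted doc IDs that have a non-zero value
    private final long[] values;  // values[i] belongs to docIds[i]

    /** Builds the sparse form from a dense array indexed by doc ID. */
    public SparseNumericSketch(long[] dense) {
        int count = 0;
        for (long v : dense) {
            if (v != 0) {
                count++;
            }
        }
        docIds = new int[count];
        values = new long[count];
        int i = 0;
        for (int doc = 0; doc < dense.length; doc++) {
            if (dense[doc] != 0) {
                docIds[i] = doc;
                values[i] = dense[doc];
                i++;
            }
        }
    }

    /** Returns the stored value, or the assumed default 0 if the doc has none. */
    public long get(int docId) {
        int idx = Arrays.binarySearch(docIds, docId);
        return idx >= 0 ? values[idx] : 0L;
    }

    /** Number of (doc ID, value) pairs actually stored. */
    public int storedEntries() {
        return docIds.length;
    }
}
```

With a 1000-document array holding two non-zero values, only two entries are stored. The trade-off is that lookups become a binary search rather than a direct array read.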



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
