[ https://issues.apache.org/jira/browse/LUCENE-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14014922#comment-14014922 ]
Shai Erera commented on LUCENE-5688:
------------------------------------
Ahh, I see now that you only wrote a DVFormat, not a Codec. In that case I
agree: apps should plug it in per-field, and it doesn't need to wrap another
format. Can you perhaps make the Consumer/Producer package-private? I think
only the Format needs to be public.
About the Binary field: indeed it doesn't write the data if a BytesRef is
missing, but it does write all the meta information, e.g. the missing bitset
and the addresses (in case the BytesRefs aren't of equal length). So I think a
sparse format should be really sparse there too. But I'm fine if you leave
that out for now; we first need to make sure the numeric field performs and
that there are actual gains (even if only during indexing).
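
To make "really sparse" concrete, here is a minimal sketch of the general idea
for the numeric case: keep only the documents that actually have a value, as a
sorted doc ID array plus a parallel value array, and answer lookups with a
binary search. This is not the code in the attached patch and it uses no Lucene
classes. The class name SparseNumericSketch and its methods are hypothetical,
and treating 0 as "missing" is a simplifying assumption (it cannot tell an
explicit 0 apart from an absent value).

```java
import java.util.Arrays;

// Hypothetical illustration only; not the LUCENE-5688 patch and not a Lucene API.
class SparseNumericSketch {
    private final int[] docIds;      // sorted IDs of the docs that have a value
    private final long[] values;     // values[i] belongs to docIds[i]
    private final long missingValue; // returned for docs without an entry

    SparseNumericSketch(int[] docIds, long[] values, long missingValue) {
        if (docIds.length != values.length) {
            throw new IllegalArgumentException("docIds and values must have equal length");
        }
        this.docIds = docIds;
        this.values = values;
        this.missingValue = missingValue;
    }

    // Builds the sparse form from a dense per-document array,
    // dropping every slot that holds the missing value.
    static SparseNumericSketch fromDense(long[] dense, long missingValue) {
        int count = 0;
        for (long v : dense) {
            if (v != missingValue) {
                count++;
            }
        }
        int[] ids = new int[count];
        long[] vals = new long[count];
        int i = 0;
        for (int doc = 0; doc < dense.length; doc++) {
            if (dense[doc] != missingValue) {
                ids[i] = doc;
                vals[i] = dense[doc];
                i++;
            }
        }
        return new SparseNumericSketch(ids, vals, missingValue);
    }

    // Looks up a document's value with a binary search over the sorted doc IDs.
    long get(int docId) {
        int idx = Arrays.binarySearch(docIds, docId);
        return idx >= 0 ? values[idx] : missingValue;
    }

    // Number of (docId, value) pairs actually stored.
    int storedEntries() {
        return docIds.length;
    }

    public static void main(String[] args) {
        long[] dense = new long[1000]; // 1000 docs, all 0 by default
        dense[3] = 42;
        dense[500] = 7;
        dense[999] = -1;
        SparseNumericSketch sparse = fromDense(dense, 0L);
        System.out.println(sparse.storedEntries() + " stored entries for "
                + dense.length + " docs");
        System.out.println("doc 500 -> " + sparse.get(500));
        System.out.println("doc 4 -> " + sparse.get(4));
    }
}
```

The trade-off is that a lookup costs O(log n) in the number of documents with
values instead of O(1), which is why the read-side performance needs to be
measured before extending the approach.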
> NumericDocValues fields with sparse data can be compressed better
> ------------------------------------------------------------------
>
> Key: LUCENE-5688
> URL: https://issues.apache.org/jira/browse/LUCENE-5688
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Varun Thacker
> Priority: Minor
> Attachments: LUCENE-5688.patch, LUCENE-5688.patch
>
>
> I ran into this problem where I had a dynamic field in Solr and indexed data
> into lots of fields. For each field only a few documents had actual values,
> and for the remaining documents the default value (0) got indexed. Now when I
> merge segments, the index size jumps up.
> For example, I have 10 segments, each with 1 DV field. When I merge them
> into 1 segment, that segment will contain all 10 DV fields with lots of 0s.
> This was the motivation behind trying to come up with a compression for a use
> case like this.