[ https://issues.apache.org/jira/browse/LUCENE-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Varun Thacker updated LUCENE-5688:
----------------------------------

    Attachment: LUCENE-5688.patch

Here is a quick patch. I wanted to get some feedback on the approach.

When I run the showIndexBloat method without the SPARSE_COMPRESSED changes, this is the size of the docValues data:
{noformat}
-rw-r--r--  1 varun  wheel   9.9M May 20 18:28 _a_Lucene45_0.dvd
-rw-r--r--  1 varun  wheel   312B May 20 18:28 _a_Lucene45_0.dvm
{noformat}

With the SPARSE_COMPRESSED changes:
{noformat}
-rw-r--r--  1 varun  wheel   2.7M May 20 18:51 _a_Lucene45_0.dvd
-rw-r--r--  1 varun  wheel   352B May 20 18:51 _a_Lucene45_0.dvm
{noformat}

> NumericDocValues fields with sparse data can be compressed better
> ------------------------------------------------------------------
>
>                 Key: LUCENE-5688
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5688
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Varun Thacker
>            Priority: Minor
>         Attachments: LUCENE-5688.patch
>
>
> I ran into this problem where I had a dynamic field in Solr and indexed data
> into lots of fields. For each field, only a few documents had actual values,
> and for the remaining documents the default value (0) got indexed. Now when I
> merge segments, the index size jumps up.
> For example, I have 10 segments, each with 1 DV field. When I merge them
> into 1 segment, that segment will contain all 10 DV fields with lots of 0s.
> This was the motivation behind trying to come up with a compression for a use
> case like this.
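
To make the sparse-field problem from the issue description concrete, here is a minimal sketch of one way to avoid storing the default value for every document: keep only the doc IDs that carry a non-default value, plus their values, and fall back to the default on lookup. This is an illustration only and is not the code in LUCENE-5688.patch. The class name SparseNumericSketch and its methods are hypothetical, and the sketch uses plain arrays, not Lucene's packed or block-compressed structures.

```java
import java.util.Arrays;

/** Hypothetical illustration of sparse numeric storage; not taken from the patch. */
final class SparseNumericSketch {
    private final int[] docIds;       // sorted doc IDs that have a non-default value
    private final long[] values;      // values[i] is the value for docIds[i]
    private final long defaultValue;  // value returned for every other document

    private SparseNumericSketch(int[] docIds, long[] values, long defaultValue) {
        this.docIds = docIds;
        this.values = values;
        this.defaultValue = defaultValue;
    }

    /** Builds the sparse form from a dense per-document array. */
    static SparseNumericSketch fromDense(long[] dense, long defaultValue) {
        // First pass: count the documents that actually need an entry.
        int count = 0;
        for (long v : dense) {
            if (v != defaultValue) {
                count++;
            }
        }
        // Second pass: copy (docId, value) pairs in increasing docId order.
        int[] ids = new int[count];
        long[] vals = new long[count];
        int next = 0;
        for (int doc = 0; doc < dense.length; doc++) {
            if (dense[doc] != defaultValue) {
                ids[next] = doc;
                vals[next] = dense[doc];
                next++;
            }
        }
        return new SparseNumericSketch(ids, vals, defaultValue);
    }

    /** Looks up a document's value by binary search over the stored doc IDs. */
    long get(int docId) {
        int idx = Arrays.binarySearch(docIds, docId);
        return idx >= 0 ? values[idx] : defaultValue;
    }

    /** Number of (docId, value) pairs actually stored. */
    int storedEntries() {
        return docIds.length;
    }
}
```

With this layout, a field where only a handful of documents have values stores only those entries, at the cost of a binary search per lookup where a dense array needs a direct index. Whether that trade-off pays off depends on how sparse the field really is.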