Robert Muir created LUCENE-7081:
-----------------------------------

             Summary: Docvalues terms dict should sometimes prefix-compress 
fixed-length data.
                 Key: LUCENE-7081
                 URL: https://issues.apache.org/jira/browse/LUCENE-7081
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Robert Muir


For Sorted/SortedSet types, we encode ordinals and a term dictionary (similar 
to the old Lucene 3 term dictionary).

Originally we had no prefix compression, so we "save space" in the fixed-width 
case by avoiding addressing and just using multiplication to find each term: 
https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/codecs/lucene54/Lucene54DocValuesConsumer.java#L423-L425
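
As a rough illustration of the "just use multiplication" idea (a minimal 
sketch, not the actual Lucene54DocValuesConsumer code; the class and method 
names here are hypothetical), fixed-width terms can be stored back to back 
and located by ord * termLength, with no per-term address table:

```java
import java.util.Arrays;

/** Hypothetical sketch: fixed-width term storage addressed by multiplication. */
final class FixedWidthTerms {

    /** Concatenates sorted, equal-length terms into one flat byte array. */
    static byte[] pack(byte[][] sortedTerms, int termLength) {
        byte[] data = new byte[sortedTerms.length * termLength];
        for (int ord = 0; ord < sortedTerms.length; ord++) {
            if (sortedTerms[ord].length != termLength) {
                throw new IllegalArgumentException("term " + ord + " is not " + termLength + " bytes");
            }
            System.arraycopy(sortedTerms[ord], 0, data, ord * termLength, termLength);
        }
        return data;
    }

    /** Returns the term for an ordinal; the offset is computed, not looked up. */
    static byte[] lookup(byte[] data, int termLength, int ord) {
        int start = ord * termLength; // no address table needed
        return Arrays.copyOfRange(data, start, start + termLength);
    }
}
```

The stored size is always termCount * termLength, however similar adjacent 
terms are.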
 

But it means no compression whatsoever of the actual bytes, even if the values 
are enormous. I don't think it's necessarily a good tradeoff. The lack of 
prefix compression can become much more costly now that we have fixed-width 
128-bit point types in the sandbox...
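
To get a feel for the possible savings, here is a small sketch that compares 
raw fixed-width storage with a simple prefix-compressed size for sorted 
16-byte values. It is only an estimate under stated assumptions: the class 
name, the one-byte shared-prefix length, and the sequential test values are 
all made up for illustration, and a real terms dict would also need some 
block addressing for random access, which is not counted here.

```java
/** Hypothetical sketch: estimate prefix-compression savings on sorted terms. */
final class PrefixCompressionEstimate {

    /** Number of leading bytes that a and b have in common. */
    static int sharedPrefix(byte[] a, byte[] b) {
        int limit = Math.min(a.length, b.length);
        int i = 0;
        while (i < limit && a[i] == b[i]) {
            i++;
        }
        return i;
    }

    /** Bytes used when all terms are stored whole, back to back. */
    static long rawSize(byte[][] terms) {
        long size = 0;
        for (byte[] term : terms) {
            size += term.length;
        }
        return size;
    }

    /**
     * Bytes used when each term stores one byte for the prefix length shared
     * with the previous term, followed by the remaining suffix bytes.
     * Assumes terms are sorted and shorter than 256 bytes.
     */
    static long prefixCompressedSize(byte[][] sortedTerms) {
        long size = 0;
        byte[] previous = new byte[0];
        for (byte[] term : sortedTerms) {
            int shared = sharedPrefix(previous, term);
            size += 1 + (term.length - shared);
            previous = term;
        }
        return size;
    }

    /** Made-up test data: the values 0..count-1 as big-endian 128-bit terms. */
    static byte[][] sequential128BitValues(int count) {
        byte[][] terms = new byte[count][16];
        for (int i = 0; i < count; i++) {
            terms[i][12] = (byte) (i >>> 24);
            terms[i][13] = (byte) (i >>> 16);
            terms[i][14] = (byte) (i >>> 8);
            terms[i][15] = (byte) i;
        }
        return terms;
    }
}
```

For 256 sequential values, rawSize gives 4096 bytes and prefixCompressedSize 
gives 527 bytes (17 for the first term, then 2 for each of the other 255). 
Values with little shared prefix would save far less, which is why the 
summary says "sometimes".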


