[
https://issues.apache.org/jira/browse/LUCENE-4643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546051#comment-13546051
]
Adrien Grand commented on LUCENE-4643:
--------------------------------------
bq. just because of the silliness in termvectors
Actually, the ability to block-encode negative values can be useful for other
use-cases, such as encoding the difference from an expected value (for
instance, you can compute an expected offset from the position and the average
number of chars per term).
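
To make the idea concrete, here is a minimal standalone sketch of encoding
offsets as zig-zag deltas from an expected value. The class name, the method
names and the expected-offset model (position * average chars per term) are
assumptions made for illustration only; this is not the code from the patch.

```java
// Illustrative sketch only: names and the expected-offset model are assumptions.
final class ExpectedOffsetDelta {

    /** Zig-zag maps signed to unsigned so small magnitudes stay small: 0, -1, 1, -2 -> 0, 1, 2, 3. */
    static long zigZagEncode(long v) {
        return (v << 1) ^ (v >> 63);
    }

    /** Inverse of zigZagEncode. */
    static long zigZagDecode(long v) {
        return (v >>> 1) ^ -(v & 1);
    }

    /** Hypothetical model: every term is assumed to take avgCharsPerTerm chars. */
    static int expectedOffset(int position, float avgCharsPerTerm) {
        return (int) (position * avgCharsPerTerm);
    }

    /** Encodes each start offset as the zig-zag of (actual - expected). */
    static long[] encode(int[] positions, int[] startOffsets, float avgCharsPerTerm) {
        long[] encoded = new long[positions.length];
        for (int i = 0; i < positions.length; i++) {
            int delta = startOffsets[i] - expectedOffset(positions[i], avgCharsPerTerm);
            encoded[i] = zigZagEncode(delta);
        }
        return encoded;
    }

    /** Rebuilds the start offsets from the positions and the encoded deltas. */
    static int[] decode(int[] positions, long[] encoded, float avgCharsPerTerm) {
        int[] startOffsets = new int[positions.length];
        for (int i = 0; i < positions.length; i++) {
            int delta = (int) zigZagDecode(encoded[i]);
            startOffsets[i] = expectedOffset(positions[i], avgCharsPerTerm) + delta;
        }
        return startOffsets;
    }
}
```

The deltas can be negative whenever a term starts earlier than predicted, which
is why the block encoder needs to handle negative values. After zig-zag
encoding they are small non-negative numbers that pack into few bits.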
Another thing to know is that if all values are positive, minValue is likely
to be 0. For example, let's say the actual min is 200 and the max is 2000.
Given that encoding the [0-2000] range requires as many bits per value as
encoding the [200-2000] range, I set minValue=0. This will require only one bit
in the token instead of two bytes (a VInt >= 2^7) for the minimum. So in the
end, even if one bit is wasted for the minimum value because of zig-zag
encoding, this is not too bad.
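
The arithmetic behind this can be checked with a small standalone sketch. The
helper names below are made up for the example, and the VInt length assumes the
usual layout of 7 payload bits per byte; it is not taken from the patch.

```java
// Illustrative sketch only: helper names are assumptions, not the patch's API.
final class MinValueChoice {

    /** Bits needed to represent any value in [0, maxValue], for maxValue >= 0. */
    static int bitsRequired(long maxValue) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    /** Bytes taken by a non-negative variable-length int with 7 payload bits per byte. */
    static int vIntBytes(int value) {
        int bytes = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            bytes++;
        }
        return bytes;
    }

    public static void main(String[] args) {
        // Both ranges from the example above need the same number of bits per value.
        System.out.println("bits for [0-2000]:   " + bitsRequired(2000));
        System.out.println("bits for [200-2000]: " + bitsRequired(2000 - 200));
        // Storing min=200 explicitly takes more than one byte as a VInt.
        System.out.println("VInt bytes for 200:  " + vIntBytes(200));
    }
}
```

Since both ranges need the same number of bits per value, subtracting 200 from
every value saves nothing in the packed data, and it costs a multi-byte
minimum in the block header.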
> PackedInts: convenience classes to write blocks of packed ints
> --------------------------------------------------------------
>
> Key: LUCENE-4643
> URL: https://issues.apache.org/jira/browse/LUCENE-4643
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Adrien Grand
> Assignee: Adrien Grand
> Priority: Minor
> Attachments: LUCENE-4643.patch, LUCENE-4643.patch
>
>
> It is often useful to divide a packed stream into fixed blocks which are all
> compressed independently:
> * if your sequence of ints is very large, you won't have to buffer
> everything into memory to compute the required number of bits per value,
> * the compression ratio will be better in case of rare extreme values.
> The only drawback compared to the original PackedInts API is that the stream
> cannot be directly used to deserialize a random-access PackedInts.Reader (but
> for sequential access, this is just fine).
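
The effect of rare extreme values on the compression ratio can be illustrated
with a rough standalone sketch. It only counts payload bits, ignores any
per-block header, assumes each block is stored relative to its own minimum and
that max - min fits in a long. The class and method names are hypothetical.

```java
// Illustrative sketch only: counts payload bits, ignoring per-block headers.
final class BlockBits {

    /** Bits needed for a non-negative range; 0 when all values in the block are equal. */
    static int bitsForRange(long range) {
        return 64 - Long.numberOfLeadingZeros(range);
    }

    /** Total payload bits when every block of blockSize values picks its own bits per value. */
    static long packedBits(long[] values, int blockSize) {
        long total = 0;
        for (int start = 0; start < values.length; start += blockSize) {
            int end = Math.min(values.length, start + blockSize);
            long min = Long.MAX_VALUE;
            long max = Long.MIN_VALUE;
            for (int i = start; i < end; i++) {
                min = Math.min(min, values[i]);
                max = Math.max(max, values[i]);
            }
            // Each value in the block is stored as (value - min).
            total += (long) (end - start) * bitsForRange(max - min);
        }
        return total;
    }
}
```

With a single outlier in a long run of small values, only the block containing
the outlier pays for the wide bit width. A stream packed as one unit would pay
for it on every value.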