[ https://issues.apache.org/jira/browse/LUCENE-4643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546051#comment-13546051 ]

Adrien Grand commented on LUCENE-4643:
--------------------------------------

bq. just because of the silliness in termvectors

Actually, the ability to block-encode negative values can be useful for other
use-cases, such as encoding the difference from an expected value (for
example, you can compute an expected offset from the position and the average
number of chars per term).
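A minimal Java sketch of the expected-value idea, assuming a simple linear model (expected start offset = position * average chars per term). The class, the method names, the offsets and the 5.8 average are all hypothetical illustrations, not Lucene's term vectors code. Residuals can be negative, so they are zig-zag encoded with a hand-rolled helper before their bit width is measured.

```java
import java.util.Arrays;

public class ExpectedValueResiduals {

    // Zig-zag maps signed to unsigned: 0->0, -1->1, 1->2, -2->3, 2->4, ...
    static long zigZagEncode(long v) {
        return (v << 1) ^ (v >> 63);
    }

    static long zigZagDecode(long z) {
        return (z >>> 1) ^ -(z & 1);
    }

    // Number of bits needed to store a non-negative value (0 for 0).
    static int bitsRequired(long v) {
        return 64 - Long.numberOfLeadingZeros(v);
    }

    // Hypothetical model: expected offset = round(position * avgCharsPerTerm).
    // Each offset is stored as the zig-zagged difference from that estimate.
    static long[] encodeResiduals(int[] startOffsets, double avgCharsPerTerm) {
        long[] encoded = new long[startOffsets.length];
        for (int pos = 0; pos < startOffsets.length; pos++) {
            long expected = Math.round(pos * avgCharsPerTerm);
            encoded[pos] = zigZagEncode(startOffsets[pos] - expected); // residual may be < 0
        }
        return encoded;
    }

    // Inverse: recompute the estimate and add back the decoded residual.
    static int[] decodeResiduals(long[] encoded, double avgCharsPerTerm) {
        int[] offsets = new int[encoded.length];
        for (int pos = 0; pos < encoded.length; pos++) {
            long expected = Math.round(pos * avgCharsPerTerm);
            offsets[pos] = (int) (expected + zigZagDecode(encoded[pos]));
        }
        return offsets;
    }

    public static void main(String[] args) {
        int[] startOffsets = {0, 6, 11, 19, 24, 29}; // hypothetical start offsets
        long[] encoded = encodeResiduals(startOffsets, 5.8);
        long max = Arrays.stream(encoded).max().orElse(0);
        System.out.println(Arrays.toString(encoded));
        System.out.println("bits per value (residuals):   " + bitsRequired(max));
        System.out.println("bits per value (raw offsets): " + bitsRequired(29));
    }
}
```

With these made-up offsets the residuals encode to [0, 0, 1, 4, 2, 0], which fit in 3 bits per value instead of the 5 bits the raw offsets would need; how much this helps on real data depends on how good the estimate is.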

Another thing to know is that if all values are non-negative, minValue is
likely to be 0. For example, let's say the actual min is 200 and the max is
2000. Given that encoding the [0-2000] range requires as many bits per value as
encoding the [200-2000] range (11 bits in both cases), I set minValue=0. This
requires only one bit in the token instead of two bytes (a VInt >= 2^7) for the
minimum. So in the end, even if one bit is wasted on the minimum value because
of zig-zag encoding, this is not too bad.
                
> PackedInts: convenience classes to write blocks of packed ints
> --------------------------------------------------------------
>
>                 Key: LUCENE-4643
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4643
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-4643.patch, LUCENE-4643.patch
>
>
> It is often useful to divide a packed stream into fixed blocks which are all 
> compressed independently:
>  * if your sequence of ints is very large, you won't have to buffer 
> everything into memory to compute the required number of bits per value,
>  * the compression ratio will be better in case of rare extreme values.
> The only drawback compared to the original PackedInts API is that the stream 
> cannot be directly used to deserialize a random-access PackedInts.Reader (but 
> for sequential access, this is just fine).
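The block-wise scheme described in the issue can be illustrated with a simplified Java sketch: values are cut into fixed-size blocks and each block stores its own minimum and bits per value. The class and method names are hypothetical (this is not Lucene's BlockPackedWriter API), bits are packed one at a time for readability rather than speed, and it assumes max - min never overflows a long.

```java
import java.util.ArrayList;
import java.util.List;

public class BlockPackedSketch {

    // One independently compressed block: its own minimum and bit width.
    static final class Block {
        final long min;
        final int bitsPerValue;
        final int count;
        final long[] packed;

        Block(long min, int bitsPerValue, int count, long[] packed) {
            this.min = min;
            this.bitsPerValue = bitsPerValue;
            this.count = count;
            this.packed = packed;
        }
    }

    static int bitsRequired(long v) {
        return 64 - Long.numberOfLeadingZeros(v);
    }

    // Each block is sized from its own values only, so a streaming writer
    // would need to buffer just one block. Assumes max - min fits in a long.
    static List<Block> encode(long[] values, int blockSize) {
        List<Block> blocks = new ArrayList<>();
        for (int start = 0; start < values.length; start += blockSize) {
            int end = Math.min(values.length, start + blockSize);
            long min = Long.MAX_VALUE, max = Long.MIN_VALUE;
            for (int i = start; i < end; i++) {
                min = Math.min(min, values[i]);
                max = Math.max(max, values[i]);
            }
            int bpv = bitsRequired(max - min);
            long[] packed = new long[(int) (((long) (end - start) * bpv + 63) / 64)];
            long bitPos = 0;
            for (int i = start; i < end; i++) {
                long delta = values[i] - min;
                for (int b = 0; b < bpv; b++, bitPos++) { // bit by bit, for clarity
                    if (((delta >>> b) & 1L) != 0) {
                        packed[(int) (bitPos >>> 6)] |= 1L << (bitPos & 63);
                    }
                }
            }
            blocks.add(new Block(min, bpv, end - start, packed));
        }
        return blocks;
    }

    // Sequential decoding: blocks are read in order, no random access.
    static long[] decode(List<Block> blocks) {
        int total = 0;
        for (Block blk : blocks) total += blk.count;
        long[] out = new long[total];
        int o = 0;
        for (Block blk : blocks) {
            long bitPos = 0;
            for (int i = 0; i < blk.count; i++) {
                long delta = 0;
                for (int b = 0; b < blk.bitsPerValue; b++, bitPos++) {
                    delta |= ((blk.packed[(int) (bitPos >>> 6)] >>> (bitPos & 63)) & 1L) << b;
                }
                out[o++] = blk.min + delta;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        long[] values = new long[1024];
        for (int i = 0; i < values.length; i++) values[i] = i % 16;
        values[500] = 1_000_000; // one rare extreme value
        List<Block> blocks = encode(values, 128);
        for (Block blk : blocks) System.out.print(blk.bitsPerValue + " ");
        System.out.println();
        System.out.println(java.util.Arrays.equals(values, decode(blocks)));
    }
}
```

In this example only the block containing the outlier needs 20 bits per value; the other seven blocks use 4 bits, whereas a single PackedInts stream over all 1024 values would need 20 bits for every value.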
