[
https://issues.apache.org/jira/browse/LUCENE-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gilad Barkai updated LUCENE-4609:
---------------------------------
Attachment: LUCENE-4609.patch
Attached a PackedEncoder, which is based on {{PackedInts}}. Currently only the
approach of a 'per-document' bits-per-value is implemented.
I'm not convinced the header can be dropped: at the very least, the number of
padding bits to ignore at the end of the stream has to be written. E.g., with
2 bits per value and 17 values, 34 bits are needed, but everything is written
in (at least) whole bytes, so 5 bytes (40 bits) are written and the trailing
6 bits must be ignored.
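To make the arithmetic concrete, here is a minimal sketch of the padding
calculation. The class and method names are made up for illustration and are
not part of the attached patch. It also shows why the decoder cannot guess the
value count from the byte length alone: at 2 bits per value, 5 bytes could hold
17, 18, 19 or 20 values.

```java
public class PaddingBits {
    /** Bytes needed to hold `count` values of `bitsPerValue` bits each. */
    public static int bytesNeeded(int bitsPerValue, int count) {
        long bits = (long) bitsPerValue * count;
        return (int) ((bits + 7) / 8); // round up to whole bytes
    }

    /** Unused bits at the end of the last byte. */
    public static int paddingBits(int bitsPerValue, int count) {
        long bits = (long) bitsPerValue * count;
        return (int) ((8 - (bits % 8)) % 8);
    }

    public static void main(String[] args) {
        // The example from the comment: 2 bits per value, 17 values.
        System.out.println(bytesNeeded(2, 17)); // 5
        System.out.println(paddingBits(2, 17)); // 6
        // 20 values fill the same 5 bytes exactly, so the byte length
        // alone does not tell the decoder where to stop.
        System.out.println(bytesNeeded(2, 20)); // 5
        System.out.println(paddingBits(2, 20)); // 0
    }
}
```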
Updated EncodingTest and EncodingSpeed, and found that the compression factor
is not that good, probably due to large numbers, which bump the number of
required bits per value.
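A rough sketch of that effect, under assumptions: the ordinals below are
invented for illustration, the helper names are hypothetical, and the header
is not counted. With a single bits-per-value for the whole document, one large
ordinal sets the width for every value, whereas a VInt only spends extra bytes
on the large value itself.

```java
public class BitsPerValueCost {
    /** Bits needed to represent maxValue (at least 1). */
    public static int bitsRequired(int maxValue) {
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(maxValue));
    }

    /** Bytes a non-negative value takes as a VInt (7 payload bits per byte). */
    public static int vintBytes(int value) {
        int n = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            n++;
        }
        return n;
    }

    /** Packed size in bytes when all values share the width of the maximum. */
    public static int packedBytes(int[] values) {
        int max = 0;
        for (int v : values) {
            max = Math.max(max, v);
        }
        long bits = (long) bitsRequired(max) * values.length;
        return (int) ((bits + 7) / 8);
    }

    /** Total VInt size in bytes. */
    public static int vintTotalBytes(int[] values) {
        int total = 0;
        for (int v : values) {
            total += vintBytes(v);
        }
        return total;
    }

    public static void main(String[] args) {
        int[] ordinals = {3, 5, 1, 100000}; // illustrative values only
        System.out.println(bitsRequired(100000));     // 17
        System.out.println(packedBytes(ordinals));    // 9  (4 * 17 = 68 bits)
        System.out.println(vintTotalBytes(ordinals)); // 6  (1 + 1 + 1 + 3)
    }
}
```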
Started to look into a semi-packed encoder, which could encode most values in a
packed manner, but could also add large values as, e.g., vints.
Example: for 6 bits per value, all values 0-62 are packed, while a packed value
of 63 (all 1s) is a marker that the next value is written in a non-packed
manner (say vint, Elias delta, or a whole 32 bits).
This should improve the compression factor when most ints are small, and only a
few are large.
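The marker idea could be sketched roughly as follows. This is not the attached
patch and does not use the {{PackedInts}} API: the bit writer/reader and all
names are hypothetical, the escape is written as a whole 32 bits (one of the
options listed above), bitsPerValue is assumed to be in 1..31, and the value
count is passed to the decoder directly instead of being stored in a header.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

public class SemiPackedSketch {
    /** Hypothetical MSB-first bit writer. */
    static final class BitWriter {
        private final ByteArrayOutputStream out = new ByteArrayOutputStream();
        private int current; // bits collected for the pending byte
        private int used;    // how many bits of `current` are filled

        void write(int value, int numBits) {
            for (int i = numBits - 1; i >= 0; i--) {
                current = (current << 1) | ((value >>> i) & 1);
                if (++used == 8) {
                    out.write(current);
                    current = 0;
                    used = 0;
                }
            }
        }

        byte[] finish() {
            if (used > 0) {
                out.write(current << (8 - used)); // zero-pad the last byte
            }
            return out.toByteArray();
        }
    }

    /** Hypothetical MSB-first bit reader. */
    static final class BitReader {
        private final byte[] data;
        private int pos; // absolute bit position

        BitReader(byte[] data) {
            this.data = data;
        }

        int read(int numBits) {
            int value = 0;
            for (int i = 0; i < numBits; i++) {
                int bit = (data[pos >>> 3] >>> (7 - (pos & 7))) & 1;
                value = (value << 1) | bit;
                pos++;
            }
            return value;
        }
    }

    /** Values below the all-ones marker are packed; others are escaped. */
    public static byte[] encode(int[] values, int bitsPerValue) {
        int marker = (1 << bitsPerValue) - 1; // e.g. 63 for 6 bits
        BitWriter w = new BitWriter();
        for (int v : values) {
            if (v >= 0 && v < marker) {
                w.write(v, bitsPerValue);
            } else {
                w.write(marker, bitsPerValue); // escape marker
                w.write(v, 32);                // value in a whole 32 bits
            }
        }
        return w.finish();
    }

    public static int[] decode(byte[] data, int count, int bitsPerValue) {
        int marker = (1 << bitsPerValue) - 1;
        BitReader r = new BitReader(data);
        int[] values = new int[count];
        for (int i = 0; i < count; i++) {
            int v = r.read(bitsPerValue);
            values[i] = (v == marker) ? r.read(32) : v;
        }
        return values;
    }

    public static void main(String[] args) {
        int[] ordinals = {3, 62, 63, 100000, 0}; // illustrative values only
        byte[] bytes = encode(ordinals, 6);
        int[] back = decode(bytes, ordinals.length, 6);
        System.out.println(Arrays.equals(ordinals, back)); // true
    }
}
```

Note that 63 itself has to be escaped here, since the all-ones pattern is
reserved for the marker. Writing the escaped value as a vint or Elias delta
instead of 32 bits would change only the two escape branches.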
Impact on encoding/decoding speed remains to be seen.
> Write a PackedIntsEncoder/Decoder for facets
> --------------------------------------------
>
> Key: LUCENE-4609
> URL: https://issues.apache.org/jira/browse/LUCENE-4609
> Project: Lucene - Core
> Issue Type: New Feature
> Components: modules/facet
> Reporter: Shai Erera
> Priority: Minor
> Attachments: LUCENE-4609.patch
>
>
> Today the facets API lets you write IntEncoder/Decoder to encode/decode the
> category ordinals. We have several such encoders, including VInt (default),
> and block encoders.
> It would be interesting to implement and benchmark a
> PackedIntsEncoder/Decoder, with potentially two variants: (1) receives
> bitsPerValue up front, when you e.g. know that you have a small taxonomy and
> the max value you can see and (2) one that decides for each doc on the
> optimal bitsPerValue, writes it as a header in the byte[] or something.