Hi devs,

As I was working on https://github.com/apache/lucene/issues/12513 I needed to 
compress positive integers which are used to locate postings etc.

To put it concretely, I need to pack a few values per term contiguously, 
and those values can have different bit widths. For example, suppose we 
need to encode docFreq and postingsStartOffset per term, where docFreq takes 
4 bits and postingsStartOffset takes 6 bits. We would expect to write the 
following for two terms.

```
Term1                                     | Term2
docFreq(4bit) | postingsStartOffset(6bit) | docFreq(4bit) | postingsStartOffset(6bit)
```

On the read path, I expect to first locate the bit offset for a term and then 
read the two values, each with its own bit width.
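
For illustration, here is a minimal Java sketch of the layout described above. 
It is not based on any existing Lucene class: the name MixedWidthPacker and its 
writeBits/readBits methods are hypothetical, and it assumes a plain zero-filled 
byte[] as the backing store, with values written least-significant bit first.

```java
/** Hypothetical helper, not a Lucene API: packs values of arbitrary bit width into a byte[]. */
final class MixedWidthPacker {
    private MixedWidthPacker() {}

    /**
     * Writes the low bitWidth bits of value starting at absolute bit position bitPos.
     * Bits are only OR-ed in, so the target region of buf must still be zero.
     */
    static void writeBits(byte[] buf, long bitPos, long value, int bitWidth) {
        for (int i = 0; i < bitWidth; i++) {
            long pos = bitPos + i;
            if (((value >>> i) & 1L) != 0) {
                buf[(int) (pos >>> 3)] |= (byte) (1 << (pos & 7));
            }
        }
    }

    /** Reads bitWidth bits starting at absolute bit position bitPos. */
    static long readBits(byte[] buf, long bitPos, int bitWidth) {
        long value = 0;
        for (int i = 0; i < bitWidth; i++) {
            long pos = bitPos + i;
            long bit = (buf[(int) (pos >>> 3)] >>> (pos & 7)) & 1L;
            value |= bit << i;
        }
        return value;
    }
}
```

With the 4-bit docFreq and 6-bit postingsStartOffset from the example, each term 
occupies 10 bits, so the record for term ordinal `ord` would start at bit 
`ord * 10`: docFreq is read with `readBits(buf, ord * 10, 4)` and 
postingsStartOffset with `readBits(buf, ord * 10 + 4, 6)`. The bit-at-a-time loop 
is chosen for clarity only; a real implementation would likely read whole words 
and mask.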

In the spirit of not reinventing the wheel, I explored the existing 
PackedInts util classes, and I believe there is no support for this at the 
moment. The biggest gap I found is that the existing classes expect all 
values written/read to have the same bit width.

I'm writing to get feedback from y'all and to check whether I missed anything.

Cheers,
Tony X
