On Fri, Feb 6, 2009 at 6:18 PM, Uwe Schindler <u...@thetaphi.de> wrote:
>> Encoding a slice per character makes the code simpler, but increases
>> the size of the index... but perhaps not enough to worry about in
>> practice?
>
> This is correct. For 2bit and 4bit there is a lot of overhead by this, but
> there is no way round (any ideas how to fix this?). But 8bit is the most
> compact one. There needs to be more testing and benchmarking.

Separate the bit slicing from the String encoding; they are independent.
If a,b,c,d are prefix codes designating precision, and w,x,y,z are
each 2 bits of the number, then the indexed terms are:

ax
bxy
cxyz
dxyzw

Everything after the prefix can be encoded in a single character in each case.
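
A rough sketch of that idea in Python, not Lucene code. It assumes an 8-bit value split into four 2-bit slices (x is the most significant), and the prefix letters, the function names, and the use of chr() for the payload are all made up for illustration. A real encoder would need to pick payload characters that are safe for the index's term encoding.

```python
PREFIXES = "abcd"   # hypothetical precision markers, one per level
SLICE_BITS = 2      # bits added at each precision level
TOTAL_BITS = 8      # width of the example value (four slices)

def packed_terms(value):
    """One term per precision level: prefix + a single payload character."""
    terms = []
    for level, prefix in enumerate(PREFIXES, start=1):
        kept = level * SLICE_BITS
        payload = value >> (TOTAL_BITS - kept)  # keep only the top `kept` bits
        terms.append(prefix + chr(payload))     # whole payload in one character
    return terms

def per_slice_terms(value):
    """For comparison: one character per 2-bit slice after the prefix."""
    terms = []
    for level, prefix in enumerate(PREFIXES, start=1):
        slices = [
            (value >> (TOTAL_BITS - (i + 1) * SLICE_BITS)) & 0b11
            for i in range(level)
        ]
        # Each slice becomes one of the digit characters '0'..'3'.
        terms.append(prefix + "".join(chr(ord("0") + s) for s in slices))
    return terms
```

With per-slice encoding the terms grow by one character per level, while the packed form stays at two characters (prefix plus payload) at every level. The packed terms at a given level still sort in numeric order, because the payload character's code point is the truncated value itself.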

Lucene's prefix encoding of the index will remove some of the
redundancy... but only for numbers that are packed closely together.
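
To make the "packed closely together" point concrete, here is a simplified sketch of front coding over a sorted term list: each term is stored as the length of the prefix it shares with the previous term plus the remaining suffix. This only illustrates the general technique; it is not Lucene's actual on-disk format, and the function names and sample terms are made up.

```python
def front_code(sorted_terms):
    """Store each term as (shared prefix length with previous term, suffix)."""
    coded = []
    prev = ""
    for term in sorted_terms:
        shared = 0
        limit = min(len(prev), len(term))
        while shared < limit and prev[shared] == term[shared]:
            shared += 1
        coded.append((shared, term[shared:]))
        prev = term
    return coded

def suffix_chars(coded):
    """Total number of characters that still have to be written out."""
    return sum(len(suffix) for _, suffix in coded)

# Hypothetical full-precision terms using one digit character per slice.
close = ["d3120", "d3121", "d3122"]    # neighbouring values
spread = ["d0000", "d1111", "d2222"]   # values far apart
```

For the neighbouring values only the last character of each later term has to be written, so most of the per-slice overhead disappears. For the spread-out values only the precision prefix is shared and nearly the whole term is written each time.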

-Yonik
