On Fri, Feb 6, 2009 at 6:18 PM, Uwe Schindler <u...@thetaphi.de> wrote:
>> Encoding a slice per character makes the code simpler, but increases
>> the size of the index... but perhaps not enough to worry about in
>> practice?
>
> This is correct. For 2bit and 4bit there is a lot of overhead by this, but
> there is no way round (any ideas how to fix this?). But 8bit is the most
> compact one. There needs to be more testing and benchmarking.

Separate bit slicing and String encoding... they are independent.

If a,b,c,d are prefix codes designating precision, and w,x,y,z are
each 2 bits of the number, then

  ax bxy cxyz dxyzw

Everything after the prefix can be encoded in a single character in
each case.

Lucene's prefix encoding of the index will remove some of the
redundancy... but only for numbers that are packed very closely together.

-Yonik
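
A minimal sketch of the idea above, not code from any patch: it assumes an
8-bit value cut into four 2-bit slices, and it assumes each precision level
keeps the most significant slices. The class and method names
(PrefixSliceSketch, term, terms) are hypothetical. Each term is one prefix
character followed by one character holding all retained bits, so the term
length stays at 2 no matter how many slices are kept. Real index terms would
also have to respect Lucene's character restrictions, which this ignores.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: slicing and String encoding kept separate.
final class PrefixSliceSketch {
    static final int SLICE_BITS = 2; // bits per slice (assumption)
    static final int SLICES = 4;     // 4 slices => 8-bit values (assumption)

    // Build the term for one precision level (1 = coarsest, SLICES = full).
    static String term(int value, int level) {
        if (level < 1 || level > SLICES) {
            throw new IllegalArgumentException("level out of range: " + level);
        }
        if (value < 0 || value >= (1 << (SLICE_BITS * SLICES))) {
            throw new IllegalArgumentException("value out of range: " + value);
        }
        // Drop the low-order slices that this precision level ignores.
        int shift = SLICE_BITS * (SLICES - level);
        char prefix = (char) ('a' + level - 1); // 'a', 'b', 'c', 'd'
        char payload = (char) (value >>> shift); // all kept bits in one char
        return new String(new char[] {prefix, payload});
    }

    // All terms that would be indexed for one value, coarsest first.
    static List<String> terms(int value) {
        List<String> out = new ArrayList<>();
        for (int level = 1; level <= SLICES; level++) {
            out.add(term(value, level));
        }
        return out;
    }

    public static void main(String[] args) {
        for (String t : terms(0xE4)) {
            // Print the prefix and the numeric payload of each term.
            System.out.println(t.charAt(0) + " " + (int) t.charAt(1));
        }
    }
}
```

With one character per slice, the same four terms would have lengths 2, 3, 4
and 5; here every term has length 2, at the cost of a payload character that
is no longer human-readable.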