On Sat, Feb 7, 2009 at 12:26 PM, Uwe Schindler <u...@thetaphi.de> wrote:
>> To optimize index space, one would want to "right justify" the encoded
>> number for any bit range to minimize variation on the left; this
>> plays into Lucene's prefix compression.
The prototype code I just posted in JIRA does this. For example, if we
are encoding the bits 0xffffffffffffffff with a precision of only 8
bits, and using 7 bits per char, then it stores 0x01 0x7f
(right-justified) instead of 0x7f 0x40 (left-justified). This means
that a whole sequence of these values would take up closer to 1 byte
of data instead of 2 in the index.

> I am not sure if this is the right way. Lucene's prefix compression is
> also good for seeking quickly to a term. If thousands of terms that
> vary only in the last bits (because all the preceding bits are zero)
> must be scanned to get to the right one, it would be less performant.

Every 128th term is stored in full in memory, and a binary search is
used to find the closest lower term. A linear scan is done from there.
If anything, it should be slightly faster to iterate over a more
compact index.

-Yonik
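A minimal Python sketch of the two layouts, assuming the value is a 64-bit unsigned integer, that "precision" means the number of high-order bits kept, and that each char carries 7 payload bits. `encode` and `suffix_bytes` are hypothetical helpers for illustration, not the JIRA prototype. `suffix_bytes` is a simplified model of prefix compression that counts only the bytes left after the prefix a term shares with its predecessor; the real term dictionary also writes lengths.

```python
BITS_PER_CHAR = 7                  # assumption: 7 payload bits per char
MASK = (1 << BITS_PER_CHAR) - 1


def encode(value, precision, right_justify=True):
    """Encode the top `precision` bits of a 64-bit value into 7-bit chunks.

    Hypothetical helper; right_justify=True packs the kept bits against
    the right (low) end of the last chunk, False packs them to the left.
    """
    if not 0 < precision <= 64:
        raise ValueError("precision must be in 1..64")
    kept = (value & 0xFFFFFFFFFFFFFFFF) >> (64 - precision)  # top bits only
    nchars = -(-precision // BITS_PER_CHAR)                 # ceil(p / 7)
    if not right_justify:
        # Left-justify: zero-pad the unused low bits of the last chunk.
        kept <<= nchars * BITS_PER_CHAR - precision
    # Emit chunks most significant first so byte order matches numeric order.
    return bytes((kept >> (i * BITS_PER_CHAR)) & MASK
                 for i in range(nchars - 1, -1, -1))


def suffix_bytes(terms):
    """Total bytes left after dropping each term's prefix shared with the previous term."""
    total, prev = 0, b""
    for term in terms:
        shared = 0
        while (shared < min(len(prev), len(term))
               and prev[shared] == term[shared]):
            shared += 1
        total += len(term) - shared
        prev = term
    return total


if __name__ == "__main__":
    v = 0xFFFFFFFFFFFFFFFF
    print(encode(v, 8).hex(" "))                       # 01 7f
    print(encode(v, 8, right_justify=False).hex(" "))  # 7f 40
    values = [b << 56 for b in range(256)]             # every 8-bit prefix, sorted
    print(suffix_bytes([encode(x, 8) for x in values]),
          suffix_bytes([encode(x, 8, False) for x in values]))  # 258 384
```

For all 256 values at precision 8 in sorted order, the right-justified run costs 258 suffix bytes in this model versus 384 for the left-justified one, because consecutive right-justified terms almost always share their first char and differ only in the second.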
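The seek path described in the reply can be modeled in a few lines. This is a simplified Python sketch, not Lucene's actual term dictionary code: `INDEX_INTERVAL`, `build_index`, and `seek` are hypothetical names, terms are plain sorted `bytes`, and the in-memory index keeps every 128th term as stated in the thread.

```python
import bisect

INDEX_INTERVAL = 128  # every 128th term held in memory, per the thread


def build_index(terms):
    """Sample every INDEX_INTERVAL-th term from a sorted term list."""
    return terms[::INDEX_INTERVAL]


def seek(terms, index, target):
    """Return (position of first term >= target, number of terms scanned)."""
    # Binary search for the last indexed term <= target.
    slot = bisect.bisect_right(index, target) - 1
    pos = max(slot, 0) * INDEX_INTERVAL
    scanned = 0
    # Linear scan forward; it stops at the next indexed term at the latest.
    while pos < len(terms) and terms[pos] < target:
        pos += 1
        scanned += 1
    return pos, scanned


if __name__ == "__main__":
    terms = [i.to_bytes(2, "big") for i in range(1000)]
    index = build_index(terms)
    print(seek(terms, index, (500).to_bytes(2, "big")))  # (500, 116)
```

Because the scan starts at the closest lower indexed term, at most 128 terms are examined per seek regardless of how many terms share a prefix; a more compact encoding only reduces the bytes read for each examined term.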