On Mar 18, 2005, at 11:21 AM, John Patterson wrote:

Because Lucene deals with Strings, ordered lexicographically.

I thought lexicographical ordering simply used the Unicode value of the chars and
so would also work with non-alphanumeric strings.

Lucene's index works with any String. But for range queries to work on numbers and dates, they need to be formatted so that their lexicographic order matches their natural order.


If you index the numbers 1 - 10, you have to zero-pad them (01, 02, ... 10); otherwise they sort as 1, 10, 2, 3, ... and that will throw off sensible range queries.
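
To make the ordering problem concrete, here is a minimal sketch in plain Java (no Lucene calls) that sorts the numbers 1 - 10 as strings, first unpadded and then zero-padded. The class name PaddedSortDemo and the helpers pad and sortedTerms are made up for this illustration; they are not part of Lucene.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class PaddedSortDemo {
    // Hypothetical helper: left-pad a non-negative number with zeros to a fixed width.
    public static String pad(long n, int width) {
        return String.format(Locale.ROOT, "%0" + width + "d", n);
    }

    // Build the terms for from..to (padded when width > 0) and sort them
    // the way plain Strings sort: character by character.
    public static List<String> sortedTerms(int from, int to, int width) {
        List<String> terms = new ArrayList<>();
        for (int i = from; i <= to; i++) {
            terms.add(width > 0 ? pad(i, width) : Integer.toString(i));
        }
        Collections.sort(terms);
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(sortedTerms(1, 10, 0)); // unpadded: "10" lands right after "1"
        System.out.println(sortedTerms(1, 10, 2)); // padded: numeric and string order agree
    }
}
```

The pad width has to be chosen up front to cover the largest value you expect, and the same padding has to be applied to the bounds of a range query as to the indexed terms.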

Is there an issue you're encountering?

No issue - I will soon need to add a lot of unstored numerical data to my index
and I am worried that the size may increase a lot.

Prefix compression is used on term values, so you could pad numbers with lots of leading zeros and not incur much additional size: 000000000001, for example.
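
As a rough illustration of why shared leading zeros cost little, the sketch below models prefix compression in its simplest form: each term in a sorted list stores only the characters that differ from the previous term. This is a simplified model written for this message, not Lucene's actual on-disk format or code, and the names PrefixSharingDemo, sharedPrefixLength and suffixCharsStored are hypothetical.

```java
public class PrefixSharingDemo {
    // Number of leading characters two terms have in common.
    public static int sharedPrefixLength(String a, String b) {
        int limit = Math.min(a.length(), b.length());
        int i = 0;
        while (i < limit && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return i;
    }

    // Characters that would be stored if every term kept only the suffix
    // that differs from the term before it (assumes the input is sorted).
    public static int suffixCharsStored(String[] sortedTerms) {
        int total = 0;
        String prev = "";
        for (String term : sortedTerms) {
            total += term.length() - sharedPrefixLength(prev, term);
            prev = term;
        }
        return total;
    }

    public static void main(String[] args) {
        String[] terms = {"000000000001", "000000000002", "000000000003"};
        System.out.println(suffixCharsStored(terms));
    }
}
```

In this model the first term is stored in full and each following term contributes only its final character, so the eleven shared zeros are paid for once rather than once per term. A real index also stores a prefix length per term, which this sketch leaves out.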


        Erik

