Uwe and I talked a little bit about this at the ApacheCon. We figured that this will probably only improve a very small amount of ranges, so as Uwe recommended, this is probably not worth the effort and complexity.

Never mind, just an idea :)

-Michael

On 3/24/09 12:40 AM, Michael Busch wrote:
Let me give an example to explain my idea - I'm using dates in my
example, because it's easier to imagine :)

Let's say we have the following posting lists. There are 20 docs in the
index and an X means that a doc contains the corresponding term:

Jan                X   X
Feb X    X      X
Mar          X
Apr        X        X
May    X
Jun
Jul           XX
Aug       X      X
Sep   X
Oct               X
Nov  X      X
Dec     X             X

Then we index another term 'ALL'. It gets added for any document that has a numeric value in this bucket:

All XXXXXXXXXXXXXXXXX XX

If the query is [Jun TO Jul], then we process the query normally (ORing terms Jun and Jul). If the query is [Feb TO Nov], then we basically translate that into All AND NOT (Jan OR Dec).

Since you only evaluate the complement of the terms, you can (almost) double the worst case performance.

Downsides:
- you have to have another BitSet in memory to perform the AND NOT operation, so it needs more memory - this complement approach is only this simple for numeric fields where one document has only a single value; similar things are doable for multi-valued numeric fields, however more complex and possibly with less performance gain - you need to index an additional term per bucket, so the index size increases slightly

Does this make sense? Maybe this has even been discussed already?

-Michael


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

Reply via email to