Let me give an example to explain my idea - I'm using dates in my
example, because it's easier to imagine :)
Let's say we have the following posting lists. There are 20 docs in the
index and an X means that a doc contains the corresponding term:
Jan X X
Feb X X X
Mar X
Apr X X
May X
Jun
Jul XX
Aug X X
Sep X
Oct X
Nov X X
Dec X X
Then we index another term 'ALL'. It gets added for any document that
has a numeric value in this bucket:
All XXXXXXXXXXXXXXXXX XX
If the query is [Jun TO Jul], then we process the query normally (ORing
terms Jun and Jul). If the query is [Feb TO Nov], which covers more than
half of the terms, then we translate it into All AND NOT (Jan OR Dec).
Since you always evaluate whichever is smaller, the range itself or its
complement, you never have to OR more than about half of the terms, so
the worst case gets (almost) twice as fast.
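
To make the translation concrete, here is a minimal sketch using plain
java.util.BitSet, not actual Lucene code. The class and method names
(ComplementRange, orTerms, rangeQuery) are made up for illustration, each
term's posting list is assumed to be available as a BitSet over doc ids,
and the "switch once the range covers more than half of the terms" rule
is my assumption about when the complement pays off.

```java
import java.util.BitSet;

final class ComplementRange {

    /** ORs the posting bitsets of terms[from..to], both ends inclusive. */
    static BitSet orTerms(BitSet[] terms, int from, int to) {
        BitSet result = new BitSet();
        for (int i = from; i <= to; i++) {
            result.or(terms[i]);
        }
        return result;
    }

    /**
     * Evaluates the range [from TO to] over single-valued terms.
     * 'all' holds every doc that has a value in this bucket.
     */
    static BitSet rangeQuery(BitSet[] terms, BitSet all, int from, int to) {
        int inside = to - from + 1;
        int outside = terms.length - inside;
        if (inside <= outside) {
            // Narrow range: OR the terms inside the range as usual.
            return orTerms(terms, from, to);
        }
        // Wide range: OR only the terms outside the range ...
        BitSet excluded = orTerms(terms, 0, from - 1);
        excluded.or(orTerms(terms, to + 1, terms.length - 1));
        // ... and compute All AND NOT (excluded).
        BitSet result = (BitSet) all.clone();
        result.andNot(excluded);
        return result;
    }
}
```

With the twelve month terms at indexes 0..11, rangeQuery(terms, all, 1, 10)
corresponds to [Feb TO Nov] and only touches the Jan and Dec bitsets plus
the All bitset.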
Downsides:
- you have to have another BitSet in memory to perform the AND NOT
operation, so it needs more memory
- this complement approach is only this simple for numeric fields where
one document has only a single value; similar things are doable for
multi-valued numeric fields, but they are more complex and possibly
give less performance gain
- you need to index an additional term per bucket, so the index size
increases slightly
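
To illustrate the single-value restriction from the list above, here is a
small sketch of how the naive complement goes wrong once a document has
two values. The class name, helper, and doc ids are hypothetical and
chosen only for this example: doc 0 has both Jan and Mar, doc 1 has Mar,
doc 2 has Dec.

```java
import java.util.BitSet;

final class MultiValuedPitfall {

    /** Builds a bitset with the given doc ids set. */
    static BitSet bits(int... docs) {
        BitSet b = new BitSet();
        for (int d : docs) {
            b.set(d);
        }
        return b;
    }

    // Hypothetical posting lists; doc 0 is multi-valued (Jan and Mar).
    static final BitSet JAN = bits(0);
    static final BitSet MAR = bits(0, 1);
    static final BitSet DEC = bits(2);
    static final BitSet ALL = bits(0, 1, 2);

    /** [Feb TO Nov] evaluated directly: only Mar has postings here. */
    static BitSet direct() {
        return (BitSet) MAR.clone();
    }

    /** [Feb TO Nov] evaluated as All AND NOT (Jan OR Dec). */
    static BitSet complement() {
        BitSet excluded = (BitSet) JAN.clone();
        excluded.or(DEC);
        BitSet result = (BitSet) ALL.clone();
        result.andNot(excluded);
        return result;
    }
}
```

The direct evaluation matches docs 0 and 1, while the complement matches
only doc 1: doc 0 is dropped because it also carries Jan, even though its
Mar value lies inside the range.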
Does this make sense? Maybe this has even been discussed already?
-Michael