Uwe and I talked a little bit about this at the ApacheCon. We figured
that this will probably only improve a very small amount of ranges, so
as Uwe recommended, this is probably not worth the effort and complexity.
Never mind, just an idea :)
-Michael
On 3/24/09 12:40 AM, Michael Busch wrote:
Let me give an example to explain my idea - I'm using dates in my
example, because it's easier to imagine :)
Let's say we have the following posting lists. There are 20 docs in the
index and an X means that a doc contains the corresponding term:
Jan X X
Feb X X X
Mar X
Apr X X
May X
Jun
Jul XX
Aug X X
Sep X
Oct X
Nov X X
Dec X X
Then we index another term 'ALL'. It gets added for any document that
has a numeric value in this bucket:
All XXXXXXXXXXXXXXXXX XX
If the query is [Jun TO Jul], then we process the query normally
(ORing terms Jun and Jul). If the query is [Feb TO Nov], then we
basically translate that into All AND NOT (Jan OR Dec).
Since you only evaluate the complement of the terms, you can (almost)
double the worst case performance.
Downsides:
- you have to have another BitSet in memory to perform the AND NOT
operation, so it needs more memory
- this complement approach is only this simple for numeric fields
where one document has only a single value; similar things are doable
for multi-valued numeric fields, however more complex and possibly
with less performance gain
- you need to index an additional term per bucket, so the index size
increases slightly
Does this make sense? Maybe this has even been discussed already?
-Michael
---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org