Richard S. Hall wrote:
On Jan 25, 2007, at 5:30 PM, Alex Karasulu wrote:
BJ Hargrave wrote:
Another solution to this is to cache resolve state results across
invocations. You will only need to invalidate the cache if the set of
installed bundles changes or some other event triggers the resolver
to modify the state.
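A minimal sketch of that caching idea, assuming a hypothetical `ResolveCache` wrapper and a stand-in `slow_resolve` function (neither is an actual framework API): results are reused until something changes the set of installed bundles, at which point the whole cache is dropped.

```python
class ResolveCache:
    """Caches resolve results until the installed bundle set changes (hypothetical)."""

    def __init__(self, resolve_fn):
        self._resolve_fn = resolve_fn  # the expensive resolve operation
        self._results = {}             # requirement -> cached result
        self.misses = 0                # how many times resolve_fn actually ran

    def resolve(self, requirement):
        if requirement not in self._results:
            self.misses += 1
            self._results[requirement] = self._resolve_fn(requirement)
        return self._results[requirement]

    def invalidate(self):
        """Call when a bundle is installed, updated, or uninstalled."""
        self._results.clear()


# Illustrative data only; bundle names and the "package" property are assumptions.
installed = {"A": {"package": "org.foo"}, "B": {"package": "org.bar"}}


def slow_resolve(package):
    # Stand-in for the real resolver: a linear scan over all installed bundles.
    return sorted(n for n, props in installed.items() if props["package"] == package)


cache = ResolveCache(slow_resolve)
first = cache.resolve("org.foo")
second = cache.resolve("org.foo")        # served from the cache, no rescan
installed["C"] = {"package": "org.foo"}  # the bundle set changes ...
cache.invalidate()                       # ... so the cached state is dropped
third = cache.resolve("org.foo")
```

Dropping everything on any change is the simplest policy; a finer-grained invalidation would be possible but is not shown here.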
IMO this is the only option. Indices have value if the set of
properties you have is relatively constant. Let me explain:
If bundles represent entries with properties which you need to test
for filter evaluation then you need some kind of master table with an
id to properties mapping. Then you need to build indices mapping into
that master table where the key of the index is a property value, and
the value is the id into the master. For example you may have
something like this.
Master
======
1 | Bundle A Properties
2 | Bundle B Properties
3 | Bundle C Properties
4 | Bundle D Properties
5 | Bundle E Properties
ObjectClass Property Index
=================
Foo | 3
Bar | 4
XYZ Property Index
=================
... etc
Then search would use these indices to evaluate the filter.
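The layout above could be sketched roughly as follows. The `build_index` and `lookup` helpers and the placeholder property values are assumptions made for illustration; only the Foo -> 3 and Bar -> 4 entries come from the tables.

```python
# Master table: id -> bundle properties (placeholder values, not real bundles).
master = {
    1: {"name": "Bundle A"},
    2: {"name": "Bundle B"},
    3: {"name": "Bundle C", "ObjectClass": "Foo"},
    4: {"name": "Bundle D", "ObjectClass": "Bar"},
    5: {"name": "Bundle E"},
}


def build_index(master, prop):
    """Map each value of `prop` to the set of master-table ids that carry it."""
    index = {}
    for bundle_id, props in master.items():
        if prop in props:
            index.setdefault(props[prop], set()).add(bundle_id)
    return index


def lookup(index, value):
    """Return the ids whose indexed property equals `value`."""
    return sorted(index.get(value, ()))


object_class_index = build_index(master, "ObjectClass")
foo_ids = lookup(object_class_index, "Foo")
foo_props = [master[i] for i in foo_ids]  # follow the ids back into the master
```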
The problem is that the properties you'll encounter are unbounded (I may
be wrong), so the number of indices you'll need will be as large as the
number of unique properties used.
So I think BJ your idea may be the best option.
No, it is not for the other reasons I explained in response to BJ's
message.
I think you hit the nail on the head: we don't need to index all of the
properties, we only need to index the important ones, like package name
and bundle symbolic name, for starters. The set of important properties
should be pretty easy to figure out and could possibly be configurable
for those who have different use cases. For non-indexed properties, you
can fall back to the slow method.
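As a rough sketch of that split, assuming a hypothetical `BundleStore` class and made-up property names ("package", "symbolic-name", "vendor"): the indexed set is a constructor argument so it can be configured, and anything outside it falls back to a scan of the master table.

```python
class BundleStore:
    """Indexes only a configurable set of properties (hypothetical sketch)."""

    def __init__(self, indexed_properties=("package", "symbolic-name")):
        self.master = {}  # id -> properties
        self.indices = {prop: {} for prop in indexed_properties}

    def add(self, bundle_id, props):
        self.master[bundle_id] = props
        for prop, index in self.indices.items():
            if prop in props:
                index.setdefault(props[prop], set()).add(bundle_id)

    def find(self, prop, value):
        index = self.indices.get(prop)
        if index is not None:
            # Fast path: the property is one of the indexed ones.
            return sorted(index.get(value, ()))
        # Slow path: scan every entry in the master table.
        return sorted(i for i, p in self.master.items() if p.get(prop) == value)


store = BundleStore()
store.add(1, {"package": "org.foo", "vendor": "acme"})
store.add(2, {"package": "org.bar", "vendor": "acme"})
store.add(3, {"package": "org.foo", "vendor": "other"})

by_package = store.find("package", "org.foo")  # uses the index
by_vendor = store.find("vendor", "acme")       # not indexed, full scan
```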
OK, then that's not so bad, because ApacheDS has code we can use to
optimize the search: it leverages whatever indices exist to reduce the
search set, even when some properties in the filter have no index.
To illustrate this with an example take the following search:
(& (abc=foo) (xyz=bar) )
An optimizer will take a scan count of the indices if present and
annotate the leaf nodes of the AST representation of this filter which
looks like this:
          & [10]
          |
         / \
        /   \
       /     \
      /       \
  abc=foo   xyz=bar
   [20]      [10]
If both indices are present then we get scan counts on both. With AND
operations, the parent node inherits the scan count of the lesser of the
two child nodes. The search engine will loop through all the index
entries with xyz=bar using a cursor on index xyz and do lookups on the
abc index to see if each candidate also has abc=foo. This way 10 loop
iterations with 10 lookups is cheaper than 20 loop iterations with 20
lookups. If one index is missing:
          & [20]
          |
         / \
        /   \
       /     \
      /       \
  abc=foo   xyz=bar
   [20]    [INT_MAX]
In this case the engine will position a cursor on the abc index and
iterate through all the index entries where the key = foo. Each value,
which is an id into the master table, is then used to look up the
properties of the bundle. The xyz property is pulled out and compared
to see if we have a match. So with a missing index we have a partial
scan of the master table.
With both indices missing we have a full scan of the master table.
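The three cases above can be sketched as follows. This is not the ApacheDS code, only an illustration under stated assumptions: the helper names (`scan_count`, `matches`, `evaluate_and`) and the data are made up, and the filter is taken to be a flat AND of at least one equality assertion.

```python
import sys

INT_MAX = sys.maxsize  # scan count used when a property has no index


def build_index(master, prop):
    index = {}
    for bundle_id, props in master.items():
        if prop in props:
            index.setdefault(props[prop], set()).add(bundle_id)
    return index


def scan_count(indices, prop, value):
    """Number of index entries for prop=value, or INT_MAX without an index."""
    index = indices.get(prop)
    return INT_MAX if index is None else len(index.get(value, ()))


def matches(master, indices, bundle_id, prop, value):
    """Check one assertion: index lookup if possible, else the master table."""
    index = indices.get(prop)
    if index is not None:
        return bundle_id in index.get(value, ())
    return master[bundle_id].get(prop) == value


def evaluate_and(master, indices, assertions):
    """Evaluate (& (p1=v1) (p2=v2) ...) given as a list of (prop, value)."""
    counts = [scan_count(indices, p, v) for p, v in assertions]
    best = min(range(len(assertions)), key=lambda i: counts[i])
    if counts[best] == INT_MAX:
        candidates, rest = list(master), assertions  # full master scan
    else:
        prop, value = assertions[best]
        candidates = indices[prop].get(value, ())    # cursor on cheapest index
        rest = assertions[:best] + assertions[best + 1:]
    return sorted(
        i for i in candidates
        if all(matches(master, indices, i, p, v) for p, v in rest)
    )


# 20 entries with abc=foo, 10 with xyz=bar, 6 with both (ids 15..20).
master = {i: {} for i in range(1, 31)}
for i in range(1, 21):
    master[i]["abc"] = "foo"
for i in range(15, 25):
    master[i]["xyz"] = "bar"

query = [("abc", "foo"), ("xyz", "bar")]
all_indices = {"abc": build_index(master, "abc"), "xyz": build_index(master, "xyz")}
only_abc = {"abc": all_indices["abc"]}

both = evaluate_and(master, all_indices, query)    # iterates 10 xyz entries
one_missing = evaluate_and(master, only_abc, query)  # partial scan via abc
none = evaluate_and(master, {}, query)             # full scan of the master
```

All three calls return the same ids; only the number of candidates visited differs.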
I think a partial scan is just fine ... I was just afraid of having too
many indices without bounds which would make this solution useless.
I might tinker a bit with this.
Alex