Richard S. Hall wrote:
On Jan 25, 2007, at 5:30 PM, Alex Karasulu wrote:

BJ Hargrave wrote:
Another solution to this is to cache resolve state results across invocations. You will only need to invalidate the cache if the set of installed bundles changes or some other event triggers the resolver to modify the state.

IMO this is the only option. Indices have value if the set of properties you have are relatively constant. Let me explain:

If bundles represent entries with properties which you need to test for filter evaluation then you need some kind of master table with an id to properties mapping. Then you need to build indices mapping into that master table where the key of the index is a property value, and the value is the id into the master. For example you may have something like this.

Master
======
1 | Bundle A Properties
2 | Bundle B Properties
3 | Bundle C Properties
4 | Bundle D Properties
5 | Bundle E Properties

ObjectClass Property Index
=================
Foo | 3
Bar | 4

XYZ Property Index
=================
... etc

Then search would use these indices to evaluate the filter.

The problem is the properties you'll encounter are unbounded (I maybe wrong) so the number indices you'll need will be as large as the number of unique properties used.

So I think BJ your idea may be the best option.

No, it is not for the other reasons I explained in response to BJ's message.

I think you hit the nail on the head, we don't need to index all of the properties, we only need to index the important ones, like package name and bundle symbolic name, for starters. The set of important properties should be pretty easy to figure out and could possibly be configurable for those who have different use cases. For non-indexed properties, then you can resort to the slow method.

Ok then that's not so bad. Because essentially ApacheDS has code we can use to optimize the search by leveraging indices to reduce the search set even when we do not have an index.

To illustrate this with an example take the following search:

(& (abc=foo) (xyz=bar) )

An optimizer will take a scan count of the indices if present and annotate the leaf nodes of the AST representation of this filter which looks like this:

       & [10]
       |
      / \
     /   \
    /     \
   /       \
abc=foo  xyz=bar
 [20]      [10]

If both indices are present then we get scan counts on both. The parent node inherits the scan count of the lesser of the two child nodes with AND operations. The search engine will loop through all the index entries with xyz=bar using a cursor on index xyz and do lookups on the abc index to see if each candidate also has xyz=bar. This way 10 loop iterations with 10 lookups is cheaper than 20 loop iterations with 20 lookups. If one index is missing:

       & [20]
       |
      / \
     /   \
    /     \
   /       \
abc=foo  xyz=bar
 [20]      [INT_MAX]

In this case the engine will position a cursor on the abc index and iterate through all the index entries where the key = foo. Then each value corresponding to an id into the master table is used to lookup the properties of the bundle. The xyz property is pulled out and compared to see if we have a match. So with a missing index we have a partial scan of the master table.

With both indices missing we have a full scan of the master table.

I think a partial scan is just fine ... I was just afraid of having too many indices without bounds which would make this solution useless.

I might tinker a bit with this.

Alex

Reply via email to