Re: Needed: LDAP expression evaluation optimization

Alex Karasulu Thu, 25 Jan 2007 17:50:23 -0800

Richard S. Hall wrote:

On Jan 25, 2007, at 5:30 PM, Alex Karasulu wrote:
BJ Hargrave wrote:
Another solution to this is to cache resolve state results across invocations. You will only need to invalidate the cache if the set of installed bundles changes or some other event triggers the resolver to modify the state.
IMO this is the only option. Indices have value if the set of properties you have are relatively constant. Let me explain:
If bundles represent entries with properties which you need to test for filter evaluation then you need some kind of master table with an id to properties mapping. Then you need to build indices mapping into that master table where the key of the index is a property value, and the value is the id into the master. For example you may have something like this.
Master
======
1 | Bundle A Properties
2 | Bundle B Properties
3 | Bundle C Properties
4 | Bundle D Properties
5 | Bundle E Properties

ObjectClass Property Index
=================
Foo | 3
Bar | 4

XYZ Property Index
=================
... etc

Then search would use these indices to evaluate the filter.
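The master table and a per-property index described above can be sketched roughly as follows. This is only an illustration in Python, not ApacheDS or Felix code: the names `master`, `build_index` and `search_equals`, and the property values assigned to each bundle, are assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical master table: id -> bundle properties.
# The property values are invented so that the ObjectClass index
# comes out as in the tables above (Foo -> 3, Bar -> 4).
master = {
    1: {"name": "Bundle A"},
    2: {"name": "Bundle B"},
    3: {"name": "Bundle C", "objectClass": "Foo"},
    4: {"name": "Bundle D", "objectClass": "Bar"},
    5: {"name": "Bundle E"},
}

def build_index(master, prop):
    """Map each value of `prop` to the set of master ids that carry it."""
    index = defaultdict(set)
    for entry_id, props in master.items():
        if prop in props:
            index[props[prop]].add(entry_id)
    return index

def search_equals(index, value):
    """Evaluate (prop=value) using only the index, returning sorted ids."""
    return sorted(index.get(value, ()))

object_class_index = build_index(master, "objectClass")
foo_ids = search_equals(object_class_index, "Foo")
bar_ids = search_equals(object_class_index, "Bar")
```

An equality filter on an indexed property then costs one dictionary lookup instead of a pass over every bundle.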
The problem is the properties you'll encounter are unbounded (I may be wrong) so the number of indices you'll need will be as large as the number of unique properties used.
So I think BJ your idea may be the best option.
No, it is not for the other reasons I explained in response to BJ's message.
I think you hit the nail on the head: we don't need to index all of the properties, we only need to index the important ones, like package name and bundle symbolic name, for starters. The set of important properties should be pretty easy to figure out and could possibly be configurable for those who have different use cases. For non-indexed properties, then you can resort to the slow method.

Ok then that's not so bad. Because essentially ApacheDS has code we can use to optimize the search by leveraging indices to reduce the search set even when we do not have an index.


To illustrate this with an example take the following search:

(& (abc=foo) (xyz=bar) )

An optimizer will take a scan count of the indices if present and annotate the leaf nodes of the AST representation of this filter which looks like this:


       & [10]
       |
      / \
     /   \
    /     \
   /       \
abc=foo  xyz=bar
 [20]      [10]

If both indices are present then we get scan counts on both. The parent node inherits the scan count of the lesser of the two child nodes with AND operations. The search engine will loop through all the index entries with xyz=bar using a cursor on index xyz and do lookups on the abc index to see if each candidate also has abc=foo. This way 10 loop iterations with 10 lookups is cheaper than 20 loop iterations with 20 lookups. If one index is missing:


       & [20]
       |
      / \
     /   \
    /     \
   /       \
abc=foo  xyz=bar
 [20]      [INT_MAX]

In this case the engine will position a cursor on the abc index and iterate through all the index entries where the key = foo. Then each value corresponding to an id into the master table is used to look up the properties of the bundle. The xyz property is pulled out and compared to see if we have a match. So with a missing index we have a partial scan of the master table.


With both indices missing we have a full scan of the master table.

I think a partial scan is just fine ... I was just afraid of having too many indices without bounds which would make this solution useless.


I might tinker a bit with this.

Alex
