Re: Update about subtree problems

Alex Karasulu Tue, 20 Jul 2010 01:53:51 -0700

On Tue, Jul 20, 2010 at 11:15 AM, Emmanuel Lecharny <elecha...@gmail.com> wrote:


>  Hi Howard,
>
> On 7/20/10 9:29 AM, Howard Chu wrote:
>
>>> Some side note:
>>>
>>> after having done some perf tests on the evaluator, and applied some
>>> improvements, I can tell that depending on the number of subentries an
>>> entry depends on, the cost of this evaluation can go up to 50% of
>>> the cost of the search itself (not counting the network layer). For instance,
>>> evaluating a subtreeSpecification with a min and a max, no chop, will be
>>> done up to 1 000 000 times per second on a 3 level DN (this is all
>>> dependent on the DN size)
>>>
>>
>> IMO, the considerations here are the same as for the O(1) rename. I.e.,
>> when you remove the entryDN from the entry in the DB, you have to calculate
>> the DN on the fly, and it certainly is a frequently referenced datum. You
>> make this cheap by caching the entryDN in memory, and it's very clear when a
>> cached DN must be invalidated - most of the time the cached value will not
>> change.
>>
> The DN cache is most certainly needed for faster operations. Building a DN
> on the fly for every entry is one of the most costly operations, so if we can
> speed it up with a cache, it's a net gain. Having the DN in the entry OTOH
> is not necessarily a big gain: you still have to deserialize it if it's not
> in cache, and this is also costly.
>
> Obviously, all those considerations fall into a big dark hole if you have a
> decent entry cache, as the entries in memory already store the full DN...
> Any modification like a rename or a move will of course invalidate the
> entries in this cache.
>
> All in all, in most cases, you don't have to do all those
> computations...
>
> Regarding the subtree handling, it's different, as you can't skip the
> entry evaluation if the entries don't contain a reference to the subentries
> they depend upon. This evaluation can be costly, to the point where it's more
> expensive than fetching the entry itself.
>
> The rationale behind the choice I made 3 years ago (and which was reverted)
> to put the DN into the entry was just to speed up any search by avoiding
> costly computation, at the price of costly but infrequent operations like Move or
> Rename (MODDN).
>
> If you have to move data in an LDAP base, User, then you have to pay the
> price!
>
>
Well yes, but even renames cost the same as moves if the DN is in the entry.
Someone changing an ou=People to ou=Users containing 100 million entries
should not expect to wait hours before it completes. Plus the atomicity
issue is seriously nasty. The DN embedded into the Entry was definitely not
the way to go. In fact Kiran and Seelmann's new RDN index to replace the DN
index saved us big time, making these operations atomic, faster, and safer.
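
To make the rename argument concrete, here is a minimal sketch of the idea
behind an RDN-keyed index. It is not the actual ApacheDS implementation: the
class name RdnIndexSketch, its methods, and the use of long ids are all
assumptions made for illustration. Each entry stores only its own RDN and its
parent's id, so a rename updates a single record and the DNs of all
descendants are rebuilt on the fly.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not ApacheDS code: each entry stores only its own RDN
// and its parent's id, so the full DN is rebuilt by walking up the tree.
class RdnIndexSketch {
    private static final class Node {
        String rdn;
        final Long parentId; // null for the suffix (root) entry

        Node(String rdn, Long parentId) {
            this.rdn = rdn;
            this.parentId = parentId;
        }
    }

    private final Map<Long, Node> index = new HashMap<>();

    void add(long id, String rdn, Long parentId) {
        index.put(id, new Node(rdn, parentId));
    }

    // A rename touches exactly one index record, however many descendants
    // the entry has; no descendant entry is rewritten.
    void rename(long id, String newRdn) {
        index.get(id).rdn = newRdn;
    }

    // Builds the DN on the fly; the cost is proportional to the entry's depth.
    String dnOf(long id) {
        StringBuilder dn = new StringBuilder();
        Long current = id;
        while (current != null) {
            Node node = index.get(current);
            if (dn.length() > 0) {
                dn.append(',');
            }
            dn.append(node.rdn);
            current = node.parentId;
        }
        return dn.toString();
    }

    public static void main(String[] args) {
        RdnIndexSketch idx = new RdnIndexSketch();
        idx.add(1L, "dc=example", null);
        idx.add(2L, "ou=People", 1L);
        idx.add(3L, "uid=jdoe", 2L);
        System.out.println(idx.dnOf(3L)); // DN before the rename
        idx.rename(2L, "ou=Users");       // single-record update
        System.out.println(idx.dnOf(3L)); // DN after the rename
    }
}
```

In this sketch the price of a cheap rename is the walk in dnOf() on every DN
lookup, which is where an in-memory DN cache, invalidated on rename or move,
would come in.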

-- 
Alex Karasulu
My Blog :: http://www.jroller.com/akarasulu/
Apache Directory Server :: http://directory.apache.org
Apache MINA :: http://mina.apache.org
To set up a meeting with me: http://tungle.me/AlexKarasulu
