Re: Update about subtree problems
Emmanuel Lecharny wrote: On 7/20/10 7:46 PM, Howard Chu wrote: Emmanuel Lecharny wrote: Thinking about my previous mail, I may have an idea : what if we don't store the subentry's DN into the selected entries operational attributes, but the subentry's entryUUID ? This value *never* changes, we have an index on it, we can also easily build a cache for it. It will solve the problems I mentionned with the move and rename operations, I think. thoughts ? Sounds good. Pretty sure that AD uses UUID references everywhere too, for similar reasons. I've suggested a few times in the past that LDAP needs a UUIDAndProvisionalName syntax. The UUID would always be present; the DN may be present but may or may not be correct/up to date. (Of course, it may seem senseless to carry a DN around if it isn't always going to be correct, but the point is to reserve a space to put it for when you do know the correct name, that's all.) Funny enough, I was also thinking about a similat solution, but I would have named it something like UUIDOrProvisionalName :) So that you can store either an UUID or a DN. The problem now is that the existing AT subschemaSubentry and collectiveAttributeSubentries have a DN syntax... So I had to create two new AT to store an UUID. May be the RFC 3671/2 need an update ? I think the general idea, at least for AD, is that it's only an internal design feature. From the user perspective, all you see is the DN. Internally, you carry the entry UUID and you only look it up in your index when you need to return the DN to a client. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
Re: Update about subtree problems
On 7/20/10 7:46 PM, Howard Chu wrote: Emmanuel Lecharny wrote: Thinking about my previous mail, I may have an idea : what if we don't store the subentry's DN into the selected entries operational attributes, but the subentry's entryUUID ? This value *never* changes, we have an index on it, we can also easily build a cache for it. It will solve the problems I mentionned with the move and rename operations, I think. thoughts ? Sounds good. Pretty sure that AD uses UUID references everywhere too, for similar reasons. I've suggested a few times in the past that LDAP needs a UUIDAndProvisionalName syntax. The UUID would always be present; the DN may be present but may or may not be correct/up to date. (Of course, it may seem senseless to carry a DN around if it isn't always going to be correct, but the point is to reserve a space to put it for when you do know the correct name, that's all.) Funny enough, I was also thinking about a similat solution, but I would have named it something like UUIDOrProvisionalName :) So that you can store either an UUID or a DN. The problem now is that the existing AT subschemaSubentry and collectiveAttributeSubentries have a DN syntax... So I had to create two new AT to store an UUID. May be the RFC 3671/2 need an update ? -- Regards, Cordialement, Emmanuel Lécharny www.iktek.com
Re: Update about subtree problems
Emmanuel Lecharny wrote: Thinking about my previous mail, I may have an idea : what if we don't store the subentry's DN into the selected entries operational attributes, but the subentry's entryUUID ? This value *never* changes, we have an index on it, we can also easily build a cache for it. It will solve the problems I mentionned with the move and rename operations, I think. thoughts ? Sounds good. Pretty sure that AD uses UUID references everywhere too, for similar reasons. I've suggested a few times in the past that LDAP needs a UUIDAndProvisionalName syntax. The UUID would always be present; the DN may be present but may or may not be correct/up to date. (Of course, it may seem senseless to carry a DN around if it isn't always going to be correct, but the point is to reserve a space to put it for when you do know the correct name, that's all.) -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
Re: Update about subtree problems
Ok, let's think more in details about the last idea (using entryUUID instead of DN to refer to subentries). We may also need an index to map ( so that we can quickly find the impacted entries if we modify a subentry. We may need two caches : - one for subentries : - one for APs : The first cache is used when we need to quickly get a subentry The second cache is used when we move an entry, to know if it move from one AP to another one. Add operation : 1) we are adding a normal entry if it's under some APs, we evaluate it against all the associated subentries, and when it evaluates to true, we store the subentries entryUUID in the entry. We also update the index. 2) we are adding a subentry for each entries under the associated AP, see if they are part of the SS, and if so, inject the subentryUUID into the entry. Update the E> index. Delete operation : 1) we delete a normal entry nothing special to do 2) we delete a subentry we have to modify all the referencing entries and remove the subentry UUID from them. We use the index to get this list of entries. Then we can delete the subentry and update the index. One slight problem though : we should not be able to process any other modification on a subentry while it's being deleted. Rename operation : 1) We rename a normal entry this entry may have children, and some of those children may be AP or subentries. However, as we are using entryUUID to reference a subentry, this operation has no impact on the entries content. 2) We rename a subentry assuming that the subentry will remain in the same AP, we will just have to update the caches. Nothing else is needed. Move operation 1) We move a normal entry this entry might have APs in some of its children. They won't be impacted. However, as we may move the subtree from one AP area to another one, we will have to check two things : - first we have to list all the subentries the moved entries are referring to (we use the AP and subentry cache for that). this can be done once for the moved entry, as all the children will share the same subentries. - second, we have to find the new subentries the children will depend upon once moved. Now, we can remove the subentry references from the first set not present in the second in each selected entry (using the index), add the new subentry references into the selected entries (and we have to evaluate all the entries one by one), updating the index at the same time, and last, delete all the references from the old subentries, updating the index. 2) We move a subentry Moving the subentry is a simpler operation : we update all the entries containg a reference to the moved subentry (using the index to find them), and add a new reference to the moved subentry into all the selected entries. It's a delete and add operation, in fact, applied to the subentry. 3) We move an AP Moving an AP is like moving a normal entry : if we have APs higher in the tree (ie closer to the rootDSE than the moved AP), and if we don't inherit from the same AP, we then have to udpate the children entries by removing the reference to the old subentries and add references to the new subentries. Last operation, not the simplest Modify operation 1) We modify a normal entry No impact if we don't try to modify the references to the subentries (something only an admin can do...) 2) We modify a normal entry to make it a subentry It's like a subentry creation. 3) We modify a subentry Depending on what we modify in the subentry, the consequences will be different : - if we modify the cn, it has no impact - if we modify the OC, for instance by removing the "subentry' element, then it's a deletion operation. Anything else has no impact - if we modify the subtreeSpecifiction, we can consider that it's like a deletion followed by an addition, as we have to reevaluate and update all the selected entries 4) We modify a subentry to make it a normal entry It's a deletion operation (see case (3), second modification 5) We modify an AP to make it a normal entry It should be forbidden if the AP has some attached subentries, otherwise it has no impact. One more thing : how do we deal with aliases ? -- Regards, Cordialement, Emmanuel Lécharny www.iktek.com
Re: Update about subtree problems
Thinking about my previous mail, I may have an idea : what if we don't store the subentry's DN into the selected entries operational attributes, but the subentry's entryUUID ? This value *never* changes, we have an index on it, we can also easily build a cache for it. It will solve the problems I mentionned with the move and rename operations, I think. thoughts ? -- Regards, Cordialement, Emmanuel Lécharny www.iktek.com
Re: Update about subtree problems
Some more thoughts... Let's assume we keep the current implementation. What does it implies for a move or a rename operation ? If we are trying to move or rename a simple entry having children, we may have a hug problem. Let's draft a sample, where uper case letters are AP and lowercase are normal entries. I will describe the hierarchy using a file system notation rootDSE/ A/ a1 a2 B/ b1 b2/ C/ c1 c2 Now, let's try to rename b2... It will impact the subentries under C, which will be renamed. So it will also impact the selected entries under this renamed subentry. In which order should be proceed with the modifications ? Currently, the move is done by simply changing one element in the RDN index. But renaming a subentry implies that we change all the associated references. We know have to check in all the b2 children if there is an AP and some subentries, and for each of them, modify the operational attributes. Until this is done, we have pending references pointing to nowhere, because the pointed subentries have been renamed Problem ! It's exactly the same for a move operation... Not exactly a nice situation :/ -- Regards, Cordialement, Emmanuel Lécharny www.iktek.com
Re: Update about subtree problems
On 7/20/10 10:53 AM, Alex Karasulu wrote: On Tue, Jul 20, 2010 at 11:15 AM, Emmanuel Lecharnywrote: If you have to move data in a Ldap base, User, then you have to pay the price ! Well yes but even renames cost the same as moves if the DN is in the entry. Someone changing an ou=People to ou=Users containing 100 Million entries should not expect to wait hours before it completes. Plus the atomicity issue is seriously nasty. The DN embedded into the Entry was definitely not the way to go. In fact Kiran and Seelmann's new RDN index to replace the DN index saved us big time making these operations atomic, faster and safer. Faster, I don't know, but IMO, it's irrelevant. Safer, that's absolutely certain. But then you have the exact same problem with the subtree implemented approach : no atomicity, slow rename/move operations, even for normal entries, and no atomicity... This is what bugs me : why should we apply a strategy for DN and the opposite one for subentries... -- Regards, Cordialement, Emmanuel Lécharny www.iktek.com
Re: Update about subtree problems
On 7/20/10 10:47 AM, Alex Karasulu wrote: Hi Emmanuel, On Tue, Jul 20, 2010 at 2:34 AM, Emmanuel Lecharnywrote: Hi guys, just an update about the current situation. We discussed on this ML about options regarding how to handle subtrees in the most efficient way in the server. We currently have 3 possibilities : - the first one, which is partially implemented in trunk, consist on static references to subentries in all the entries selected by the subtreeSpecifications defined in each subentry. I'm supposing you're referring to static data references in partition Entries which reference the subentries affecting the Entry using the proper administrative operational attributes, rather than using static Java references in the Entry class correct? I'm reffering to the subentry DN which are stored in the entries, as operational attributes, right. - the second one suppose that we compute when needed if an entry is a selected entry, each time we access this entry - the third solution, which has not been discussed already, would be to have an index which stores the list of all the selected entries associated with each subentry. Let's first think about the third solution (this is the one implemented by OpenDS, btw). It's a costly one when you create a new subentry, or when you move one, as you have to update the index with all the impacted entries. Plus it also has a cost as you have to hit the index to get the list of subentries an entry is part of. Of course, we can use a cache, but I'm not sure that it will save a lot of time. In fact, AFAICT, you pay the price twice : when creating the subentry, and each time you access any entry. Ok, so option 3 does not seems to be reasonable. To be frank, we discussed about t with Pierre-Arnaud but not really considered this as a decent solution. We may be wrong though. There might be one other option. How about adding the references to the affecting subentries to the RDN index which replaced our old DN index? Essentially we add an additional field to store these attributes in the RDN value which we already have turned into a structure to hold the parent ID of entries. uh oh, sounds smart :) But sadly, the inference is the same : you will have to inject the references in *all* the RDN of *ll* the selected entries. It does not change the overall cost, which is still O(n). I think this is more valuable than a separate index because first we don't need fully bidirectional lookups. We will not be looking up entries by the subentry DN to get a list. We will be checking to see if each entry before returning it or modifying it is impacted by subentries. Second this RDN index will most likely be in memory. It's so central we should have it all cached anyway. Sort of like what AD does with it's global catalog. Yes, we need a cache for the RDN. This will be one of the major improvement we can make to the server once we get 2.0 out. So carrying this additional information in the RDN index hence the DN, will allow us to rapidly determine which subentries impact an Entry. see above : it will take longer to determinate if an entry is impacted, as you will have to check the RDN, when you already have the information in the op attr. Ok, only slightly longer. Now in terms of handling administrative changes we still have the same issue: a potentially massive numbers of updates will be unavoidable on the RDN index. However maybe the heavy caching here might help speed things up. No, because you still have to flush the data on disk. The problem is the same : O(n) operation, whatever we do. Don't if this is the case but Emmanuel your tests on JDBM performance with writes and caches etc might help here. Maybe this is a horrible solution but I don't know just a brain dump. I don't know how to transform a O(n) problem on O(1) either ;) Regarding option 1 and 2, none is perfect. Let's see the pros and cons for each of them Solution 1 (update when manipulating the subentry) -- pros : - *very* efficient operations on normal entries, as they already have a reference to all the subentries that select them. - no extra cost when doing a rename, a modify, an add or a delete of a normal entry. - all the subentries are stored in a cache loaded at startup, no need to fetch an index to get a subentry - evaluating if an entry is selected is done when the subentry is added or moved - move operations applied on normal entries are rare - any operation applied on a subentry are considered as admin operation, thus unfrequent and scheduled. Downtime is under control. cons : - *very* costly add/delete/move/rename operations applied on a subentry, as every selected entries will have to be modified - *very* costly move operation of a normal entry if it moves from one AP to another one, and when it has many children, as we have to update all the children twice : first to reove the old subentries' reference, the second one to add the
Re: Update about subtree problems
On Tue, Jul 20, 2010 at 11:15 AM, Emmanuel Lecharny wrote: > Hi Howard, > > On 7/20/10 9:29 AM, Howard Chu wrote:Some side note : > > after having done some perf tests on the evaluator, and applied some >>> improvement, I can tell that depending on the number of subentries an >>> entry is depending on, the cost of this evaluation can goes up to 50% of >>> the search itself cost - not counting the network layer -. For instance, >>> evaluating a subtreeSpecification with a min and a max, no chop, will be >>> done up to 1 000 000 times per second on a 3 level DN (this is all >>> dependent on the DN size) >>> >> >> IMO, the considerations here are the same as for the O(1) rename. I.e., >> when you remove the entryDN from the entry in the DB, you have to calculate >> the DN on the fly, and it certainly is a frequently referenced datum. You >> make this cheap by caching the entryDN in memory, and it's very clear when a >> cached DN must be invalidated - most of the time the cached value will not >> change. >> > The DN cache is most certainly needed for faster operations. Building a DN > on the fly for every entry is one of the most costly operation, so if we can > speed it up with a cache, it's a net gain. Having the DN in the entry OTOH > is not necessary a big gain : you still have to deserialize it if it's not > in cache, and this is also costly. > > Obviously, all those considerations fell in a big dark hole if you have a > decent entry cache, as the entries in memory already store the full DN... > Any modification like a rename or a move will of course invalidate the > entries in this cache. > > All in all, most of the case, you don't have to do all those > computations... > > Regarding the subtree handling, it's different, as you can't spare the > entry evaluation if the entries don't contain the reference to the subentry > they depend upon. This evaluation can be costly, up to a point it's more > expensive than fetching the entry itself. > > The rational being the choice I made 3 years ago (and which was reverted) > to put the DN into the entry was just to speed up any search by avoiding > costly computation at a price of costly unfrequent operations like Move or > Rename (MODDN). > > If you have to move data in a Ldap base, User, then you have to pay the > price ! > > Well yes but even renames cost the same as moves if the DN is in the entry. Someone changing an ou=People to ou=Users containing 100 Million entries should not expect to wait hours before it completes. Plus the atomicity issue is seriously nasty. The DN embedded into the Entry was definitely not the way to go. In fact Kiran and Seelmann's new RDN index to replace the DN index saved us big time making these operations atomic, faster and safer. -- Alex Karasulu My Blog :: http://www.jroller.com/akarasulu/ Apache Directory Server :: http://directory.apache.org Apache MINA :: http://mina.apache.org To set up a meeting with me: http://tungle.me/AlexKarasulu
Re: Update about subtree problems
Hi Emmanuel, On Tue, Jul 20, 2010 at 2:34 AM, Emmanuel Lecharny wrote: > Hi guys, > > just an update about the current situation. > > We discussed on this ML about options regarding how to handle subtrees in > the most efficient way in the server. We currently have 3 possibilities : > > - the first one, which is partially implemented in trunk, consist on static > references to subentries in all the entries selected by the > subtreeSpecifications defined in each subentry. > > I'm supposing you're referring to static data references in partition Entries which reference the subentries affecting the Entry using the proper administrative operational attributes, rather than using static Java references in the Entry class correct? > - the second one suppose that we compute when needed if an entry is a > selected entry, each time we access this entry > > - the third solution, which has not been discussed already, would be to > have an index which stores the list of all the selected entries associated > with each subentry. > > Let's first think about the third solution (this is the one implemented by > OpenDS, btw). It's a costly one when you create a new subentry, or when you > move one, as you have to update the index with all the impacted entries. > Plus it also has a cost as you have to hit the index to get the list of > subentries an entry is part of. Of course, we can use a cache, but I'm not > sure that it will save a lot of time. In fact, AFAICT, you pay the price > twice : when creating the subentry, and each time you access any entry. > > Ok, so option 3 does not seems to be reasonable. To be frank, we discussed > about t with Pierre-Arnaud but not really considered this as a decent > solution. We may be wrong though. > > There might be one other option. How about adding the references to the affecting subentries to the RDN index which replaced our old DN index? Essentially we add an additional field to store these attributes in the RDN value which we already have turned into a structure to hold the parent ID of entries. I think this is more valuable than a separate index because first we don't need fully bidirectional lookups. We will not be looking up entries by the subentry DN to get a list. We will be checking to see if each entry before returning it or modifying it is impacted by subentries. Second this RDN index will most likely be in memory. It's so central we should have it all cached anyway. Sort of like what AD does with it's global catalog. So carrying this additional information in the RDN index hence the DN, will allow us to rapidly determine which subentries impact an Entry. Now in terms of handling administrative changes we still have the same issue: a potentially massive numbers of updates will be unavoidable on the RDN index. However maybe the heavy caching here might help speed things up. Don't if this is the case but Emmanuel your tests on JDBM performance with writes and caches etc might help here. Maybe this is a horrible solution but I don't know just a brain dump. > Regarding option 1 and 2, none is perfect. > > Let's see the pros and cons for each of them > > Solution 1 (update when manipulating the subentry) > -- > pros : > - *very* efficient operations on normal entries, as they already have a > reference to all the subentries that select them. > - no extra cost when doing a rename, a modify, an add or a delete of a > normal entry. > - all the subentries are stored in a cache loaded at startup, no need to > fetch an index to get a subentry > - evaluating if an entry is selected is done when the subentry is added or > moved > - move operations applied on normal entries are rare > - any operation applied on a subentry are considered as admin operation, > thus unfrequent and scheduled. Downtime is under control. > > cons : > - *very* costly add/delete/move/rename operations applied on a subentry, as > every selected entries will have to be modified > - *very* costly move operation of a normal entry if it moves from one AP to > another one, and when it has many children, as we have to update all the > children twice : first to reove the old subentries' reference, the second > one to add the new references > - Adding or removing a subentry is a non atomic operation, as we have first > to update the selected entries one by one, and then to add/delete the > subentry > > Right we have to have some kind of atomicity control here. I wish we had MVCC to assist us here. > Solution 2 (evaluating when needed ) > > pros : > - adding/removing/moving/renaming a subentry is a O(1) operation > - moving normal entries selected by some subentries does not cost more than > a standard move > - easy to implement > > cons : > - each time we access an entry (search, compare), we will have to evaluate > the subtreeSpecifciation for all the subentries this entry is dependent on > - we will need another cache to
Re: Update about subtree problems
Hi Howard, On 7/20/10 9:29 AM, Howard Chu wrote:Some side note : after having done some perf tests on the evaluator, and applied some improvement, I can tell that depending on the number of subentries an entry is depending on, the cost of this evaluation can goes up to 50% of the search itself cost - not counting the network layer -. For instance, evaluating a subtreeSpecification with a min and a max, no chop, will be done up to 1 000 000 times per second on a 3 level DN (this is all dependent on the DN size) IMO, the considerations here are the same as for the O(1) rename. I.e., when you remove the entryDN from the entry in the DB, you have to calculate the DN on the fly, and it certainly is a frequently referenced datum. You make this cheap by caching the entryDN in memory, and it's very clear when a cached DN must be invalidated - most of the time the cached value will not change. The DN cache is most certainly needed for faster operations. Building a DN on the fly for every entry is one of the most costly operation, so if we can speed it up with a cache, it's a net gain. Having the DN in the entry OTOH is not necessary a big gain : you still have to deserialize it if it's not in cache, and this is also costly. Obviously, all those considerations fell in a big dark hole if you have a decent entry cache, as the entries in memory already store the full DN... Any modification like a rename or a move will of course invalidate the entries in this cache. All in all, most of the case, you don't have to do all those computations... Regarding the subtree handling, it's different, as you can't spare the entry evaluation if the entries don't contain the reference to the subentry they depend upon. This evaluation can be costly, up to a point it's more expensive than fetching the entry itself. The rational being the choice I made 3 years ago (and which was reverted) to put the DN into the entry was just to speed up any search by avoiding costly computation at a price of costly unfrequent operations like Move or Rename (MODDN). If you have to move data in a Ldap base, User, then you have to pay the price ! -- Regards, Cordialement, Emmanuel Lécharny www.iktek.com
Re: Update about subtree problems
Emmanuel Lecharny wrote: Basically, sol.1 and sol.2 are both painful, but there is no perfect solution anyway. I thought about what is the best implementation all day long, and if yesterday I was convinced that the solution 2 was better. But the evaluation cost done on each entry every time a client does a search is quite high (let's say up to 20% of the search cost, for simple SS). That will not slow down the server that much, as most of the time is consummed in the network layer right now, but this won't last. Now I think that the current solution is may be a not so bad compromise, assuming we don't do a lot of move operation cross APs on normal entries. What bugs me here is that last year, for the opposite reason, the DN was removed from the entries stored in the master table, just t be able to do O(1) move operations. This is quite paradoxal ! I still think that move operations are very rare, and that storing the DN into the entries is a net gain for most of the operations, except for a move... From another perspective, at least in the OpenLDAP case, O(1) rename operations were only one of the benefits of the latter approach. The other was the huge improvement in scalability of the DN index, which was quite bloated otherwise. When you see the opportunity to get more than one benefit from a particular approach, then it becomes a more compelling choice... Some side note : after having done some perf tests on the evaluator, and applied some improvement, I can tell that depending on the number of subentries an entry is depending on, the cost of this evaluation can goes up to 50% of the search itself cost - not counting the network layer -. For instance, evaluating a subtreeSpecification with a min and a max, no chop, will be done up to 1 000 000 times per second on a 3 level DN (this is all dependent on the DN size) IMO, the considerations here are the same as for the O(1) rename. I.e., when you remove the entryDN from the entry in the DB, you have to calculate the DN on the fly, and it certainly is a frequently referenced datum. You make this cheap by caching the entryDN in memory, and it's very clear when a cached DN must be invalidated - most of the time the cached value will not change. You have potentially increased the cost of a read operation, but in practice the cost is zero, while the savings on the write side is significant. Last, not least : the current implementation is really incomplete. The Move, Rename and MoveAndRename operations are not correctly handled, with many entries not being updated. I'm going to fix them. I have also created a branch to play with subtree without breaking the trunk. I'm not sure I will continue to work on this branch if a decision is made to keep the first solution. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
Re: Update about subtree problems
Hi Emmanuel, As we've seen together, that's not an easy subject and there's no real solution, just relative trade-offs. I really believe that a good discussion on the ML is necessary in order to be sure about what type of operation is to be preferred in those kind of situations. A community-wide consensus is really important and will help a lot simplifying the decisions in the future. Regards, Pierre-Arnaud On 20 juil. 2010, at 01:34, Emmanuel Lecharny wrote: > Hi guys, > > just an update about the current situation. > > We discussed on this ML about options regarding how to handle subtrees in the > most efficient way in the server. We currently have 3 possibilities : > > - the first one, which is partially implemented in trunk, consist on static > references to subentries in all the entries selected by the > subtreeSpecifications defined in each subentry. > > - the second one suppose that we compute when needed if an entry is a > selected entry, each time we access this entry > > - the third solution, which has not been discussed already, would be to have > an index which stores the list of all the selected entries associated with > each subentry. > > Let's first think about the third solution (this is the one implemented by > OpenDS, btw). It's a costly one when you create a new subentry, or when you > move one, as you have to update the index with all the impacted entries. Plus > it also has a cost as you have to hit the index to get the list of subentries > an entry is part of. Of course, we can use a cache, but I'm not sure that it > will save a lot of time. In fact, AFAICT, you pay the price twice : when > creating the subentry, and each time you access any entry. > > Ok, so option 3 does not seems to be reasonable. To be frank, we discussed > about t with Pierre-Arnaud but not really considered this as a decent > solution. We may be wrong though. > > Regarding option 1 and 2, none is perfect. > > Let's see the pros and cons for each of them > > Solution 1 (update when manipulating the subentry) > -- > pros : > - *very* efficient operations on normal entries, as they already have a > reference to all the subentries that select them. > - no extra cost when doing a rename, a modify, an add or a delete of a normal > entry. > - all the subentries are stored in a cache loaded at startup, no need to > fetch an index to get a subentry > - evaluating if an entry is selected is done when the subentry is added or > moved > - move operations applied on normal entries are rare > - any operation applied on a subentry are considered as admin operation, thus > unfrequent and scheduled. Downtime is under control. > > cons : > - *very* costly add/delete/move/rename operations applied on a subentry, as > every selected entries will have to be modified > - *very* costly move operation of a normal entry if it moves from one AP to > another one, and when it has many children, as we have to update all the > children twice : first to reove the old subentries' reference, the second one > to add the new references > - Adding or removing a subentry is a non atomic operation, as we have first > to update the selected entries one by one, and then to add/delete the subentry > > Solution 2 (evaluating when needed ) > > pros : > - adding/removing/moving/renaming a subentry is a O(1) operation > - moving normal entries selected by some subentries does not cost more than a > standard move > - easy to implement > > cons : > - each time we access an entry (search, compare), we will have to evaluate > the subtreeSpecifciation for all the subentries this entry is dependent on > - we will need another cache to lookup for the entry's related subentries > (this cache will work pretty much like the one we use for the NamingContexts) > > > Basically, sol.1 and sol.2 are both painful, but there is no perfect solution > anyway. > > I thought about what is the best implementation all day long, and if > yesterday I was convinced that the solution 2 was better. But the evaluation > cost done on each entry every time a client does a search is quite high > (let's say up to 20% of the search cost, for simple SS). That will not slow > down the server that much, as most of the time is consummed in the network > layer right now, but this won't last. > > Now I think that the current solution is may be a not so bad compromise, > assuming we don't do a lot of move operation cross APs on normal entries. > > What bugs me here is that last year, for the opposite reason, the DN was > removed from the entries stored in the master table, just t be able to do > O(1) move operations. This is quite paradoxal ! I still think that move > operations are very rare, and that storing the DN into the entries is a net > gain for most of the operations, except for a move... > > > Some side note : > after having done some