On 08/04/2017 02:08 PM, Ilias Stamatis wrote:
Okay, now that I have read and understood dbscan's code, I have a few more questions.

2017-08-03 10:10 GMT+03:00 Ludwig Krispenz <lkris...@redhat.com>:

    Hi, now that I know the context, here are some more comments.
    If the purpose is to create a useful LDIF file, which could
    eventually be used for import, then formatting an entry correctly
    is not enough. The order of entries matters: parents need to come
    before children. We already handle this in db2ldif or replication
    total update.
    That said, whenever you write an entry you have always seen the
    parent already, so you could stack the DN with the parentid and
    create the DN without using the entryrdn index.
    You do not even need to keep track of all the entry RDNs/DNs; only
    the ones with children will be needed later, and the presence of
    "numsubordinates" identifies a parent.


Is it guaranteed that parents are going to appear before children in id2entry.db?
No. That's what I said before: it is possible that parentid > entryid. It happens if an entry is moved by modrdn to another subtree.

If so, here's what could probably work:

- Start reading entries from id2entry sequentially.
- For each entry, if it has a numSubordinates attribute, it is a parent of other entries, so we can store its ID - DN pair in a hash map.
- For entries that have a parentid, where we need to figure out the parent's DN, we just look up hashmap[parentid].
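
The steps above could be sketched roughly as follows. This is only an illustration in Python, not dbscan code: the entry layout (dicts with "id", "rdn", "parentid" and "numsubordinates" keys) and the function name build_dns are assumptions made for the example, and it assumes every parent is read before its children, which is not guaranteed, as noted above.

```python
def build_dns(entries):
    """Yield (entry id, full DN) for entries read in id2entry order.

    Hypothetical entry layout: a dict with "id" and "rdn", plus an
    optional "parentid" and an optional "numsubordinates".
    Assumes each parent has been read before any of its children.
    """
    parent_dns = {}  # entry id -> DN, kept only for entries that have children
    for entry in entries:
        parent_id = entry.get("parentid")
        if parent_id is None:
            # No parentid: treat the entry as a suffix; its RDN is the whole DN.
            dn = entry["rdn"]
        else:
            # Raises KeyError if the parent has not been seen yet.
            dn = entry["rdn"] + "," + parent_dns[parent_id]
        if "numsubordinates" in entry:
            # Only entries with children need to be remembered.
            parent_dns[entry["id"]] = dn
        yield entry["id"], dn
```

A lookup of a parent that has not been read yet fails with a KeyError here, which is exactly the case that would need separate handling.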

To make it even more efficient (only if really needed, though, because it will make things more complicated), we could somehow store the value of numSubordinates with each parent in the map as well. Every time a parentid is looked up in the map, we decrease that parent's numSubordinates value by 1. When it reaches 0, there are no more children of this ID, so we can safely remove it from the map.
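
A rough sketch of this countdown idea, again in Python and only as an illustration: the entry layout (dicts with "id", "rdn", "parentid", "numsubordinates") and the name build_dns_pruned are made up for the example. It assumes that parents are read before their children and that numSubordinates equals the number of direct children that actually show up later in the scan; if that count is off, a parent would be dropped too early or never.

```python
def build_dns_pruned(entries):
    """Return (list of (id, DN), leftover parent map).

    Hypothetical entry layout: a dict with "id" and "rdn", plus an
    optional "parentid" and an optional "numsubordinates".
    Assumes each parent has been read before any of its children.
    """
    parents = {}  # entry id -> [DN, number of children still expected]
    dns = []
    for entry in entries:
        parent_id = entry.get("parentid")
        if parent_id is None:
            dn = entry["rdn"]
        else:
            slot = parents[parent_id]
            dn = entry["rdn"] + "," + slot[0]
            slot[1] -= 1
            if slot[1] == 0:
                # Last expected child seen: the parent's DN is no longer needed.
                del parents[parent_id]
        remaining = int(entry.get("numsubordinates", 0))
        if remaining > 0:
            parents[entry["id"]] = [dn, remaining]
        dns.append((entry["id"], dn))
    return dns, parents
```

With consistent counts the map is empty at the end of the scan, and its peak size is bounded by the number of parents whose children have not all been read yet, rather than by the total number of parents.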

However, I don't know if we would really need this last thing. In a 100 million entry db how many parents would we expect to have approximately?

Also, do we have a hash map implemented somewhere?

If parents are not guaranteed to appear before children in id2entry.db, then we would have to alter the above strategy.

Thanks!



_______________________________________________
389-devel mailing list -- 389-devel@lists.fedoraproject.org
To unsubscribe send an email to 389-devel-le...@lists.fedoraproject.org

--
Red Hat GmbH, http://www.de.redhat.com/, Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Michael Cunningham, Michael O'Neill, Eric 
Shander
