On Tue, Jun 13, 2017 at 2:48 PM, Emmanuel Lécharny <[email protected]> wrote:
> Hi guys ! > > > 3 months without touching this beast, I was quite busy enjoying the new > little me, vacations and API/ApacheDS release (which took me one full > week) ! So I have been able to put some thought on mental paper last weeks. > > > I came to realize that teh way we process updates is not completly > correct, and need to be amended. Here are two items that need to be > refactored : > > > BtreeOfBtrees > > ------------- > > This B-tree is supposed to hold all the existing revisions for all the > existing B-trees (until a B-tree/revision is not anymore in use). There > are two problems with this approach : > > o although it's necessary to be able to retrieve an old revision for a > given b-tree (because a read transaction is still alive and needs access > to it), we don't need to store that information on disk, because if the > system is stopped, then all those old revisions are simply useless - we > just need the latest revision for each B-tree -. That means we most > certainly don't need to keep a B-tree for that purpose, we can simply > have a Map<B-tree, revision> in memory that is serialiazed when a write > transaction is committed. The serialized data would just be teh offset > +1 > of the latest B-tree rootPage, so 8 bytes. That will be a very dense way > to store this information, and it means we will write way less pages on > disk, compared to a standard B-tree update (we are talking about 1 page > being written (for a 512 bytes page, which can hold 63 B-tree offsets), > compared to a B-tree header plus as many nodes and leaves written for an > updated B-tree. > > We would also save time at startup, as we won't have anymore to remove > all the dead versions. Not that it's critical, but if we can avoid doing > so, great. OTOH, we will have to fetch the info from the disk for each > B-tree we haven't yet accessed. Not such a hassle though (either we do > it for all teh B-trees at the beginning, or we wait until a B-tree is > accessed teh first time to do so). > > > o The more important problem is that we currently do updates on a B-tree > instance, like : > > btree.insert( writeTxn, key, value ); > > This is not good, because in a multi-threaded environment the btree may > be used by more than one thread. Of course, we will have some locks > pending for an update, but if the btree is used by readers, we can't > conveniently make it using one specific version, unless we make the > B-tree stateless and immutable (that means the btree instance doesn't > know anything about its current revision, which is handled by the > transaction). > > Another option would be to move the functionality to the RecordManager : > > recordManager.insert( writeTxn, "test", key, value ); > > this model is identical to what many other popular KV stores do and should definitely be the way to go IMO. Exposing the lower-layer (i.e BTree here) makes implementation unnecessarily complex. > Here, we tell the RM to fetch the b-tree by its name, and to add the <K,V>. > > This second flavor is easier to implement, because we have one single > access point. > > > OTOH, we can make teh first flavor working, but it's more complex, and > it requires some substancial modifications on teh way the B-tree is > implemented. > > > I still have to think more about it. > > > More to come later ! > > > > > -- > Emmanuel Lecharny > > Symas.com > directory.apache.org > > Kiran
