Re: [Mavibot] redux...

Kiran Ayyagari Thu, 15 Jun 2017 02:55:05 -0700

On Tue, Jun 13, 2017 at 2:48 PM, Emmanuel Lécharny <[email protected]>
wrote:


> Hi guys !
>
>
> 3 months without touching this beast, I was quite busy enjoying the new
> little me, vacations and API/ApacheDS release (which took me one full
> week) ! So I have been able to put some thought on mental paper last weeks.
>
>
> I came to realize that teh way we process updates is not completly
> correct, and need to be amended. Here are two items that need to be
> refactored :
>
>
> BtreeOfBtrees
>
> -------------
>
> This B-tree is supposed to hold all the existing revisions for all the
> existing B-trees (until a B-tree/revision is not anymore in use). There
> are two problems with this approach :
>
> o although it's necessary to be able to retrieve an old revision for a
> given b-tree (because a read transaction is still alive and needs access
> to it), we don't need to store that information on disk, because if the
> system is stopped, then all those old revisions are simply useless - we
> just need the latest revision for each B-tree -. That means we most
> certainly don't need to keep a B-tree for that purpose, we can simply
> have a Map<B-tree, revision> in memory that is serialiazed when a write
> transaction is committed. The serialized data would just be teh offset
>
+1

> of the latest B-tree rootPage, so 8 bytes. That will be a very dense way
> to store this information, and it means we will write way less pages on
> disk, compared to a standard B-tree update (we are talking about 1 page
> being written (for a 512 bytes page, which can hold 63 B-tree offsets),
> compared to a B-tree header plus as many nodes and leaves written for an
> updated B-tree.
>
> We would also save time at startup, as we won't have anymore to remove
> all the dead versions. Not that it's critical, but if we can avoid doing
> so, great. OTOH, we will have to fetch the info from the disk for each
> B-tree we haven't yet accessed. Not such a hassle though (either we do
> it for all teh B-trees at the beginning, or we wait until a B-tree is
> accessed teh first time to do so).
>
>
> o The more important problem is that we currently do updates on a B-tree
> instance, like :
>
>     btree.insert( writeTxn, key, value );
>
> This is not good, because in a multi-threaded environment the btree may
> be used by more than one thread. Of course, we will have some locks
> pending for an update, but if the btree is used by readers, we can't
> conveniently make it using one specific version, unless we make the
> B-tree stateless and immutable (that means the btree instance doesn't
> know anything about its current revision, which is handled by the
> transaction).
>
> Another option would be to move the functionality to the RecordManager :
>
>     recordManager.insert( writeTxn, "test", key, value );
>
> this model is identical to what many other popular KV stores do and should
definitely be the way to go IMO. Exposing the lower-layer (i.e BTree here)
makes
implementation unnecessarily complex.

> Here, we tell the RM to fetch the b-tree by its name, and to add the <K,V>.
>
> This second flavor is easier to implement, because we have one single
> access point.
>
>
> OTOH, we can make teh first flavor working, but it's more complex, and
> it requires some substancial modifications on teh way the B-tree is
> implemented.
>
>
> I still have to think more about it.
>
>
> More to come later !
>
>
>
>
> --
> Emmanuel Lecharny
>
> Symas.com
> directory.apache.org
>
> Kiran

Re: [Mavibot] redux...

Reply via email to