RE: JdbmPartition repair

2016-01-25 Thread Zheng, Kai
>> If we are to do that one day, we would rather use LMDB, which is way faster 
>> than SQLite, proven, and small.

Agreed. Looking at the benchmark results at http://symas.com/mdb/microbench/, LMDB 
seems pretty good, as does LevelDB. One question: is its license (the OpenLDAP 
Public License) compatible with the Apache License 2.0?

Regards,
Kai

-Original Message-
From: Emmanuel Lécharny [mailto:elecha...@gmail.com] 
Sent: Monday, January 25, 2016 11:58 PM
To: Apache Directory Developers List <dev@directory.apache.org>
Subject: Re: JdbmPartition repair

On 25/01/16 15:27, Zheng, Kai wrote:
> Thanks a lot for the detailed and insightful explanation. I can't absorb it 
> all yet because I'm not familiar with the code, but it will serve as very 
> good material when I need to look into the LDAP internals someday. The 
> details convince me that we need a strong, mature, industry-proven backend 
> for the LDAP server, because LDAP is already complex enough. We can't mix 
> the LDAP logic with the storage engine; they need to be separated, and 
> developed and tested separately. Mavibot looks like it is going in this 
> direction, which sounds good to me. What concerns me is that, since we lack 
> the resources to develop it, it may still take some time to become mature 
> and robust. 
The Mavibot code base is small: 17,947 SLOC.


> But if we leverage an existing engine, then we can focus on the LDAP 
> side, work on advanced features, move a little faster, and ship 
> releases like 2.x, 3.x and so on. SQLite is C, yes, but it's 
> supported on many platforms and Java can use it via JNI;
That would be a real pain. Linking some JNI lib and making it a package is 
really something we would like to avoid like the plague.

If we are to do that one day, we would rather use LMDB, which is way faster 
than SQLite, proven, and small.

> it's a library that can be embedded in an application. You may dislike 
> JNI, but only a few APIs would need to be wrapped, and 
> there are already good wrappers for Java. As with 
> snappy-java, the JNI layer and the native library can be bundled within 
> a jar file and distributed as a regular Maven module. One thing 
> I'm not sure about is how well LDAP entries fit the SQL table 
> model,
Bottom line: very badly. Actually, using a SQL backend to store LDAP elements 
is probably the worst possible solution, simply because LDAP supports 
multi-valued attributes, something SQL databases don't support natively.

> but I guess this direction deserves some investigation. The 
> benefits would be saving a lot of development and debugging time, robustness 
> and high performance, transaction support, and easy querying. Just some 
> thoughts in case they help. Thanks.

Thanks. We have been evaluating all those options for more than a decade now :-) 
OpenLDAP has gone down the exact same path, for the exact same reasons.




RE: JdbmPartition repair

2016-01-25 Thread Zheng, Kai
Thanks a lot for the detailed and insightful explanation. I can't absorb it all 
yet because I'm not familiar with the code, but it will serve as very good 
material when I need to look into the LDAP internals someday. The details 
convince me that we need a strong, mature, industry-proven backend for the LDAP 
server, because LDAP is already complex enough. We can't mix the LDAP logic 
with the storage engine; they need to be separated, and developed and tested 
separately. Mavibot looks like it is going in this direction, which sounds good 
to me. What concerns me is that, since we lack the resources to develop it, it 
may still take some time to become mature and robust. But if we leverage an 
existing engine, then we can focus on the LDAP side, work on advanced features, 
move a little faster, and ship releases like 2.x, 3.x and so on. SQLite is C, 
yes, but it's supported on many platforms and Java can use it via JNI; it's a 
library that can be embedded in an application. You may dislike JNI, but only a 
few APIs would need to be wrapped, and there are already good wrappers for 
Java. As with snappy-java, the JNI layer and the native library can be bundled 
within a jar file and distributed as a regular Maven module. One thing I'm not 
sure about is how well LDAP entries fit the SQL table model, but I guess this 
direction deserves some investigation. The benefits would be saving a lot of 
development and debugging time, robustness and high performance, transaction 
support, and easy querying. Just some thoughts in case they help. Thanks.
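
For what it's worth, here is a minimal sketch of what this could look like with 
the xerial sqlite-jdbc driver, which bundles the JNI layer and the native SQLite 
library inside a single jar, snappy-java style, and is used through plain JDBC. 
The schema and names below are purely illustrative, not a proposal for an actual 
layout:

import java.sql.*;
import java.util.UUID;

public class SqliteBackendSketch {
    public static void main(String[] args) throws Exception {
        // With org.xerial:sqlite-jdbc on the classpath, the driver and its native
        // library come from a single Maven dependency; no manual JNI wiring needed.
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite:entries.db");
             Statement stmt = conn.createStatement()) {

            // purely illustrative schema: one blob per serialized entry
            stmt.executeUpdate(
                "CREATE TABLE IF NOT EXISTS master (id TEXT PRIMARY KEY, entry BLOB)");

            try (PreparedStatement ps =
                     conn.prepareStatement("INSERT OR REPLACE INTO master VALUES (?, ?)")) {
                ps.setString(1, UUID.randomUUID().toString());
                ps.setBytes(2, new byte[0]);   // a serialized entry would go here
                ps.executeUpdate();
            }
        }
    }
}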

Regards,
Kai

-Original Message-
From: Emmanuel Lécharny [mailto:elecha...@gmail.com] 
Sent: Monday, January 25, 2016 1:32 AM
To: Apache Directory Developers List <dev@directory.apache.org>
Subject: Re: JdbmPartition repair

On 24/01/16 16:47, Zheng, Kai wrote:
> Thanks Emmanuel for the update and for sharing. The approach looks pretty good, 
> and is often seen in mature products. Is the repair process triggered when 
> corruption is found while the server is running, or when restarting with a 
> specific option, or both? If the repair logic is not easy to 
> integrate, maybe a standalone repair tool like the one Kiran worked on would 
> be good to use? Or the server could check the data/indexes at startup and, if 
> something is wrong, launch the tool as a separate process to do the fixing? 
> Just some thoughts, in case they are useful.

The corruption happens in some rare cases, and it's mostly due to concurrent 
updates. Let me explain what happens in detail; sorry if it's a bit lengthy, but 
it has to be.

We store entries in what we call the MasterTable. Entries are serialized, and 
each of them has an associated ID (actually a random UUID). So the master table 
contains tuples of <UUID, Entry>.
Each index refers to this MasterTable using the entry UUID. Typically, let's 
say an entry has the ObjectClass 'person'; then the ObjectClass index will have 
a tuple <ObjectClass, Set> where the set contains the UUIDs of all the entries 
that have the 'person' ObjectClass.
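
For readers less familiar with the code, here is a minimal sketch of that layout 
using plain in-memory Java maps; the class and method names are purely 
illustrative, not the actual JDBM-backed implementation:

import java.util.*;

public class BackendLayoutSketch {
    // master table: <UUID, Entry> tuples, with the entry kept serialized
    private final Map<UUID, byte[]> masterTable = new HashMap<>();

    // ObjectClass index: <value, set of entry UUIDs>
    private final Map<String, Set<UUID>> objectClassIndex = new HashMap<>();

    public UUID add(byte[] serializedEntry, Collection<String> objectClasses) {
        UUID id = UUID.randomUUID();              // random UUID used as the entry ID
        masterTable.put(id, serializedEntry);
        for (String oc : objectClasses) {
            objectClassIndex.computeIfAbsent(oc, k -> new HashSet<>()).add(id);
        }
        return id;
    }

    // a search on (objectClass=person) reads the index first, then the master table
    public List<byte[]> lookupByObjectClass(String oc) {
        List<byte[]> entries = new ArrayList<>();
        for (UUID id : objectClassIndex.getOrDefault(oc, Collections.emptySet())) {
            entries.add(masterTable.get(id));
        }
        return entries;
    }
}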

We also have one special index, the Rdn index. This one is more complex, 
because it is used for two things: referring to an entry from an RDN, and also 
keeping track of the hierarchy. If we have an entry whose DN is 
ou=users,dc=example,dc=com, where dc=example,dc=com is the partition's root, 
then the RDN index will contain two tuples for the full DN: one for the entry 
itself, and one for the suffix. Actually, we don't store tuples like <Rdn, ID>, 
but a more complex structure, the ParentIdAndRdn.
The reason is that we may have many entries using the same RDN. For instance:

entry 1 : cn=jSmith,ou=users,dc=example,dc=com
entry 2 : cn=jSmith,ou=administrators,dc=example,dc=com

Whether this jSmith is one person or two is irrelevant. The point is that we 
can't associate the RDN cn=jSmith with one single entry, so what we store is a 
tuple <entryId1, 
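Here is a rough sketch of the ParentIdAndRdn idea described above: the RDN index 
key pairs the parent entry's ID with the RDN, so two entries sharing the same 
RDN under different parents get distinct keys. The class and field names are 
illustrative, not the actual ApacheDS implementation:

import java.util.*;

// Illustrative key for the RDN index: a (parentId, rdn) pair instead of the bare RDN.
final class ParentIdAndRdnKey {
    final UUID parentId;   // ID of the parent entry (or of the suffix for first-level entries)
    final String rdn;      // e.g. "cn=jSmith"

    ParentIdAndRdnKey(UUID parentId, String rdn) {
        this.parentId = parentId;
        this.rdn = rdn;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ParentIdAndRdnKey)) {
            return false;
        }
        ParentIdAndRdnKey other = (ParentIdAndRdnKey) o;
        return parentId.equals(other.parentId) && rdn.equalsIgnoreCase(other.rdn);
    }

    @Override
    public int hashCode() {
        return Objects.hash(parentId, rdn.toLowerCase());
    }
}

// cn=jSmith,ou=users,... and cn=jSmith,ou=administrators,... produce two different
// keys because their parent IDs differ, even though the RDN itself is identical.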

Re: JdbmPartition repair

2016-01-25 Thread Emmanuel Lécharny
On 25/01/16 15:27, Zheng, Kai wrote:
> Thanks a lot for the detailed and insightful explanation. I can't absorb it 
> all yet because I'm not familiar with the code, but it will serve as very 
> good material when I need to look into the LDAP internals someday. The 
> details convince me that we need a strong, mature, industry-proven backend 
> for the LDAP server, because LDAP is already complex enough. We can't mix 
> the LDAP logic with the storage engine; they need to be separated, and 
> developed and tested separately. Mavibot looks like it is going in this 
> direction, which sounds good to me. What concerns me is that, since we lack 
> the resources to develop it, it may still take some time to become mature 
> and robust. 
The Mavibot code base is small: 17,947 SLOC.


> But if we leverage an existing engine, then we can focus on the LDAP 
> side, work on advanced features, move a little faster, and ship releases 
> like 2.x, 3.x and so on. SQLite is C, yes, but it's supported on many 
> platforms and Java can use it via JNI; 
That would be a real pain. Linking some JNI lib and making it a package is
really something we would like to avoid like the plague.

If we are to do that one day, we would rather use LMDB, which is way
faster than SQLite, proven, and small.

> it's a library that can be embedded in an application. You may dislike JNI, 
> but only a few APIs would need to be wrapped, and there are already good 
> wrappers for Java. As with snappy-java, the JNI layer and the native library 
> can be bundled within a jar file and distributed as a regular Maven module. 
> One thing I'm not sure about is how well LDAP entries fit the SQL table 
> model, 
Bottom line: very badly. Actually, using a SQL backend to store LDAP
elements is probably the worst possible solution, simply because LDAP
supports multi-valued attributes, something SQL databases don't support
natively.
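
As a small, purely illustrative plain-Java sketch of the mismatch: an LDAP entry 
carries multi-valued attributes directly, whereas a flat relational row cannot, 
so a SQL mapping needs an extra (entry, attribute, value) table and a join for 
every such attribute.

import java.util.*;

public class MultiValuedAttributeSketch {
    public static void main(String[] args) {
        // An LDAP entry is naturally a map of attribute type -> set of values.
        Map<String, Set<String>> entry = new LinkedHashMap<>();
        entry.put("objectClass", new LinkedHashSet<>(Arrays.asList("top", "person")));
        entry.put("cn", new LinkedHashSet<>(Collections.singleton("jSmith")));
        entry.put("telephoneNumber",
                new LinkedHashSet<>(Arrays.asList("+1 555 0100", "+1 555 0101")));

        // A single SQL row offers one column per attribute, so the two telephone
        // numbers don't fit; a relational mapping would push them into a separate
        // (entry_id, attribute, value) table and join it back on every read.
        entry.forEach((type, values) -> System.out.println(type + ": " + values));
    }
}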

> but I guess this direction deserves some investigation. The benefits would 
> be saving a lot of development and debugging time, robustness and high 
> performance, transaction support, and easy querying. Just some thoughts in 
> case they help. Thanks.

Thanks. We have been evaluating all those options for more than a decade
now :-) OpenLDAP has gone down the exact same path, for the exact same reasons.




Re: JdbmPartition repair

2016-01-24 Thread Shawn McKinney
Emmanuel, thanks for keeping us informed.  I agree that corruption of data is a 
show stopper in terms of a product’s viability.  Can we recreate this issue or 
is it intermittent?  How can we help?

Shawn

> On Jan 24, 2016, at 7:39 AM, Emmanuel Lécharny  wrote:
> 
> Hi guys,
> 
> we have many users complaining about a corrupted JDBM database. As of
> today, we don't have another solution than telling them to reload their
> data, which is anything but comfortable. First because it might take ages
> (reloading data is very slow) and also because they might not have a backup.
> 
> Although this is not a frequent scenario, when it happens, it really
> damages whatever credibility ApacheDS has.
> 
> Here, we all know that Mavibot will be the solution, but until it's
> available with transaction support, we have to propose a tool that
> restores - if possible - the database.
> 
> Fortunately, Kiran has worked on a tool that does that: the
> partition-plumber. The idea is to integrate this tool into ApacheDS in
> order to allow users to restore their database in a simple way. Here is
> what I propose:
> 
> - first, a way to start ApacheDS in a repair mode. That will drop all
> the indexes and recreate them based on the master table. It might take
> some time, but it will be way better than any other solution, and in any
> case will be faster than a full reload, considering that we will bypass
> many checks. I suggest an option: apacheds -repair. When the server is
> started with that option, the server will restart after having cleaned
> up the database
> - the way to implement it is to add a method to the Partition interface:
> repair(). Not all partitions will need it, so only JdbmPartition
> will actually implement it.
> - that method will simply delete (or copy, if we want a backup) all the
> existing indexes (system and user). We will then recreate the indexes
> based on the master table content. There is still a remote risk that the
> master table itself is corrupted, but that is unlikely, or at least very rare.
> Actually, the Rdn index is the one that gets corrupted most of the time,
> because it gets updated many times for each add, move, rename or
> delete operation.
> 
> I'm currently working on that, and it should be done fast enough (say,
> in less than a week, or even quicker if I have enough time this Sunday
> and in the evenings).
> 
> The next step, and I'm also working on that, is to finish Mavibot. The
> problem is that it's a complex piece of code, and it's hard to work on
> it when I only have a couple of hours in the evening or during the weekend.
> I'm sorry for that. But we will eventually get it ready!
> 
> 
> Thanks !
> 



Re: JdbmPartition repair

2016-01-24 Thread Emmanuel Lécharny
On 24/01/16 16:47, Zheng, Kai wrote:
> Thanks Emmanuel for the update and for sharing. The approach looks pretty good, 
> and is often seen in mature products. Is the repair process triggered when 
> corruption is found while the server is running, or when restarting with a 
> specific option, or both? If the repair logic is not easy to 
> integrate, maybe a standalone repair tool like the one Kiran worked on would 
> be good to use? Or the server could check the data/indexes at startup and, if 
> something is wrong, launch the tool as a separate process to do the fixing? 
> Just some thoughts, in case they are useful.

The corruption happens in some rare cases, and it's mostly due to
concurrent updates. Let me explain what happens in detail; sorry if
it's a bit lengthy, but it has to be.

We store entries in what we call the MasterTable. Entries are
serialized, and each of them has an associated ID (actually a
random UUID). So the master table contains tuples of <UUID, Entry>.
Each index refers to this MasterTable using the entry UUID. Typically,
let's say an entry has the ObjectClass 'person'; then the ObjectClass index
will have a tuple <ObjectClass, Set> where the set contains the UUIDs of all
the entries that have the 'person' ObjectClass.

We also have one special index, the Rdn index. This one is more complex,
because it is used for two things: referring to an entry from an RDN, and
also keeping track of the hierarchy. If we have an entry whose DN is
ou=users,dc=example,dc=com, where dc=example,dc=com is the partition's
root, then the RDN index will contain two tuples for the full DN: one
for the entry itself, and one for the suffix. Actually, we don't store
tuples like <Rdn, ID>, but a more complex structure, the ParentIdAndRdn.
The reason is that we may have many entries using the same RDN. For
instance:

entry 1 : cn=jSmith,ou=users,dc=example,dc=com
entry 2 : cn=jSmith,ou=administrators,dc=example,dc=com

Whether this jSmith is one person or two is irrelevant. The point is that
we can't associate the RDN cn=jSmith with one single entry, so what we
store is a tuple <entryId1, 

RE: JdbmPartition repair

2016-01-24 Thread Zheng, Kai
Thanks Emmanuel for the update and for sharing. The approach looks pretty good, 
and is often seen in mature products. Is the repair process triggered when 
corruption is found while the server is running, or when restarting with a 
specific option, or both? If the repair logic is not easy to integrate, maybe a 
standalone repair tool like the one Kiran worked on would be good to use? Or the 
server could check the data/indexes at startup and, if something is wrong, 
launch the tool as a separate process to do the fixing? Just some thoughts, in 
case they are useful.

I'm not very sure about rewriting JDBM, though I know there are plenty of 
reasons to do so, as with most software rewrites. But if we are starting fresh 
and implementing something like a B+ tree that needs transaction support, I'm 
wondering if we could do it by leveraging an already industry-proven backend, 
because developing such a backend may take a long time and a lot of resources. 
I'm wondering whether SQLite could serve the purpose well, and how it could be 
wrapped or adapted for use here. Again, just a quick thought, in case it is 
somewhat useful.

Regards,
Kai

-Original Message-
From: Emmanuel Lécharny [mailto:elecha...@gmail.com] 
Sent: Sunday, January 24, 2016 9:40 PM
To: Apache Directory Developers List 
Subject: JdbmPartition repair

Hi guys,

we have many users complaining about a corrupted JDBM database. As of today, we 
don't have another solution than telling them to reload their data, which is 
anything but comfortable. First because it might take ages (reloading data is 
very slow) and also because they might not have a backup.

Although this is not a frequent scenario, when it happens, it really damages 
whatever credibility ApacheDS has.

Here, we all know that Mavibot will be the solution, but until it's available 
with transaction support, we have to propose a tool that restores - if possible 
- the database.

Fortunately, Kiran has worked on a tool that does that: the partition-plumber. 
The idea is to integrate this tool into ApacheDS in order to allow users to 
restore their database in a simple way. Here is what I propose:

- first, a way to start ApacheDS in a repair mode. That will drop all the 
indexes and recreate them based on the master table. It might take some time, 
but it will be way better than any other solution and, in any case, will be 
faster than a full reload, considering that we will bypass many checks. I 
suggest an option: apacheds -repair. When the server is started with that 
option, the server will restart after having cleaned up the database
- the way to implement it is to add a method to the Partition interface: 
repair(). Not all partitions will need it, so only JdbmPartition will actually 
implement it (see the sketch below)
- that method will simply delete (or copy, if we want a backup) all the 
existing indexes (system and user). We will then recreate the indexes based on 
the master table content. There is still a remote risk that the master table 
itself is corrupted, but that is unlikely, or at least very rare.
Actually, the Rdn index is the one that gets corrupted most of the time, 
because it gets updated many times for each add, move, rename or delete 
operation.
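
Here is a rough sketch of what such a repair() could look like; the Partition 
and Index types below are simplified placeholders, not the real ApacheDS 
interfaces:

import java.util.*;

// Simplified placeholders standing in for the real ApacheDS types.
interface IndexSketch {
    void drop();                          // delete (or archive) the index content
    void add(String value, UUID entryId); // re-add one <value, entryId> tuple
}

interface PartitionSketch {
    // Rebuild every index from the master table; a no-op for partitions that don't need it.
    default void repair() { }
}

class JdbmPartitionRepairSketch implements PartitionSketch {
    // master table: entry ID -> (attribute type -> values), kept deserialized for brevity
    private final Map<UUID, Map<String, List<String>>> masterTable = new HashMap<>();
    // system and user indexes, keyed by attribute type
    private final Map<String, IndexSketch> indexes = new HashMap<>();

    @Override
    public void repair() {
        // 1) drop (or back up) all the existing indexes
        indexes.values().forEach(IndexSketch::drop);

        // 2) walk the master table and re-insert every indexed attribute value
        for (Map.Entry<UUID, Map<String, List<String>>> entry : masterTable.entrySet()) {
            UUID entryId = entry.getKey();
            for (Map.Entry<String, List<String>> attribute : entry.getValue().entrySet()) {
                IndexSketch index = indexes.get(attribute.getKey());
                if (index != null) {
                    attribute.getValue().forEach(value -> index.add(value, entryId));
                }
            }
        }
    }
}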

I'm currently working on that, and it should be done fast enough (say, in less 
than a week, or even quicker if I have enough time this Sunday and in the evenings).

The next step, and I'm also working on that, is to finish Mavibot. The problem 
is that it's a complex piece of code, and it's hard to work on it when I only 
have a couple of hours in the evening or during the weekend.
I'm sorry for that. But we will eventually get it ready!


Thanks !



Re: JdbmPartition repair

2016-01-24 Thread Emmanuel Lécharny
On 24/01/16 15:07, Shawn McKinney wrote:
> Emmanuel, thanks for keeping us informed.  I agree that corruption of data is 
> a show stopper in terms of a product’s viability.  Can we recreate this issue 
> or is it intermittent?  How can we help?

It's hard to reproduce the issue, as it really depends on many
random conditions...