RE: Official Apache Directory Project Proposal Submission

Alex Karasulu Thu, 11 Sep 2003 19:07:49 -0700

Wow, you guys are getting real deep. Perhaps we should take these technical
conversations over to the ldapd dev list for now. This place is for
incubator stuff, not tech stuff, so we'll continue at ldapd-devel after this
bridge email.

BTW, I think BerkeleyDB was faster than the JDBC-based implementation, but not
jump-out-at-you faster like we thought. The backend design using BDB is
pretty much the same as with JDBM. The difference was the JNI performance
degradation: yes, all that copying back and forth across the Java/C barrier
slowed us down. BDB is great for C, but for Java a pure Java implementation
like JDBM is best.

Once we tried JDBM we were very happy with the results. Performance went
through the roof. And BTW, an RDBMS is a hog as the backing store for an LDAP
server; all the SQL overhead makes it so. I have to agree with the OpenLDAP
folks: they know what they're talking about.

Alex

-----Original Message-----
From: Robb Penoyer [mailto:[EMAIL PROTECTED] 
Sent: Thursday, September 11, 2003 8:57 PM
To: [EMAIL PROTECTED]
Subject: Re: Official Apache Directory Project Proposal Submission

Hi Jim,

The original pre-release versions of LDAPd were implemented with a
BerkeleyDB backend, with custom index management etc., much like the
OpenLDAP article you reference. Those early designs did have a contracted
backend store interface defined (thank you, Mr. Magic, Alex), and indeed
there is a basic SQL backend implementation in alpha 0.7. As for your
second reference, the IBM one: we measured the performance of a very
strongly tuned Oracle database against the BerkeleyDB implementation and
found virtually no performance difference. Granted, these were not formal
tests, but the conditions were exactly the same (hardware, test cases,
etc.).

We moved to a default backend based upon the JDBM implementation Alex
referenced earlier. The performance improvement was staggering, to say the
least. Because LDAP is geared toward high-performance read operations, a pure
indexing mechanism turned out to be ideal. With JDBM, everything amounts to
an index in relational terms. This would absolutely become a problem for a
standard transactional application, primarily because the biggest struggle
for our JDBM implementation is handling duplicate entries, which costs us
performance.
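A rough illustration of the "everything is an index" layout, and of where duplicates hurt: the sketch below keeps one sorted map per attribute, from attribute value to the set of entry IDs holding that value. `java.util.TreeMap` stands in for a JDBM B+Tree, and the names (`AttributeIndex`, `add`, `lookup`, `greaterOrEqual`) are hypothetical, not LDAPd's or JDBM's API. Values are assumed to be normalized already. A value shared by many entries (think `objectClass=person`) drags a whole ID set along with its key, which is the kind of duplicate handling that costs performance.

```java
import java.util.Collections;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Hypothetical attribute index: normalized attribute value -> sorted set of entry IDs.
 * TreeMap stands in for an on-disk B+Tree such as JDBM's; this is not JDBM's API.
 */
class AttributeIndex {
    // Sorted keys support both equality and ordering (>=) lookups.
    private final TreeMap<String, TreeSet<Long>> forward = new TreeMap<>();

    /** Duplicates share one key, so the key's ID set grows with every matching entry. */
    void add(String value, long entryId) {
        forward.computeIfAbsent(value, k -> new TreeSet<>()).add(entryId);
    }

    /** Removes one entry ID and drops the key once no entry holds that value. */
    void remove(String value, long entryId) {
        TreeSet<Long> ids = forward.get(value);
        if (ids != null && ids.remove(entryId) && ids.isEmpty()) {
            forward.remove(value);
        }
    }

    /** Equality match such as (cn=alice): the IDs stored under that exact value. */
    NavigableSet<Long> lookup(String value) {
        TreeSet<Long> ids = forward.get(value);
        if (ids == null) {
            return Collections.emptyNavigableSet();
        }
        return Collections.unmodifiableNavigableSet(ids);
    }

    /** Ordering match such as (cn>=b): union of the ID sets for every key at or above the bound. */
    NavigableSet<Long> greaterOrEqual(String bound) {
        TreeSet<Long> result = new TreeSet<>();
        for (TreeSet<Long> ids : forward.tailMap(bound, true).values()) {
            result.addAll(ids);
        }
        return result;
    }
}
```

Equality filters become a single key lookup and ordering filters a range scan over sorted keys; the price is maintaining those ID sets on every add and delete.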

Here is where I see it fall out:
   RDBMS: hard to beat for pure transactional power. If you have to store,
retrieve, update and delete, all the operations will generally work within
the same performance envelope.

   OODBMS: hard to beat for pure synergy between the logic layer and storage.
I admit a personal weakness when it comes to these types of databases.

   Hierarchical databases: great for maintaining a complete picture of
complex models (many-to-many parent-child relationships). This is likely
where stored procedures prove important in RDBMS technology (as you pointed
out).
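For the many-to-many parent-child case, one common relational layout (close to the "lots of small RDB records" Jim mentions) is a narrow link table of (parent, child) pairs. The sketch keeps those records in two in-memory maps, one per direction, and walks them breadth-first. `LinkTable` and its methods are hypothetical names for illustration, the graph is assumed to be acyclic, and an RDBMS version would issue one query per level or push the walk into a stored procedure.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical many-to-many hierarchy stored as (parent, child) link records,
 * as one might keep it in a single narrow RDBMS table. A child may have several parents.
 */
class LinkTable {
    private final Map<String, Set<String>> children = new HashMap<>();
    private final Map<String, Set<String>> parents = new HashMap<>();

    /** Inserts one link record; the same child may be linked under many parents. */
    void link(String parent, String child) {
        children.computeIfAbsent(parent, k -> new HashSet<>()).add(child);
        parents.computeIfAbsent(child, k -> new HashSet<>()).add(parent);
    }

    /** Deletes one link record only; the child keeps its other parents. */
    void unlink(String parent, String child) {
        Set<String> c = children.get(parent);
        if (c != null) c.remove(child);
        Set<String> p = parents.get(child);
        if (p != null) p.remove(parent);
    }

    /** All nodes reachable by following child links downward. */
    Set<String> descendants(String root) {
        return walk(root, children);
    }

    /** All nodes reachable by following parent links upward. */
    Set<String> ancestors(String node) {
        return walk(node, parents);
    }

    // Breadth-first walk; the seen set stops a node reachable through two
    // parents from being reported or expanded twice.
    private static Set<String> walk(String start, Map<String, Set<String>> edges) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(start);
        while (!queue.isEmpty()) {
            String node = queue.poll();
            for (String next : edges.getOrDefault(node, Collections.emptySet())) {
                if (seen.add(next)) queue.add(next);
            }
        }
        return seen;
    }
}
```

Adding or removing a link touches a single record, which is what keeps edits cheap even in deep hierarchies; the cost moves to traversal, one hop per level.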

I'll say it again: LDAP is NOT a database, it just needs one. That's what
IBM was saying. The OpenLDAP folks retrofitted BerkeleyDB; we went a
different way. The only real distinction on this front is that we
recognized very early that the nature of the backend store will dictate
the performance of your LDAP server "in the context it is designed for".
That is, an RDBMS backend behind LDAP will likely give better overall
performance for a heavily modified directory information tree, but it will
never outperform the raw search capabilities of a purely indexed backend.
So we leave the choice up to the implementors. At one time we spoke about
actually measuring all this stuff and providing guidelines; it's something
we would love to get work going on (hint).
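Leaving the choice to implementors only works if the server core talks to storage through a contract. The sketch below shows the general shape under stated assumptions: `BackingStore`, `InMemoryStore`, and `BackendRegistry` are hypothetical names, not the actual LDAPd backend interface, and entries are flattened to single-valued string attributes for brevity. A JDBM-, BerkeleyDB-, or SQL-backed class would implement the same methods and be registered under its own name.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical backend contract: the server core codes against this, never a concrete store. */
interface BackingStore {
    void add(String dn, Map<String, String> attributes);
    Optional<Map<String, String>> lookup(String dn);
    boolean delete(String dn);
}

/** Trivial in-memory implementation standing in for a JDBM, BerkeleyDB, or SQL backend. */
class InMemoryStore implements BackingStore {
    private final Map<String, Map<String, String>> entries = new HashMap<>();

    public void add(String dn, Map<String, String> attributes) {
        entries.put(dn, new HashMap<>(attributes)); // defensive copy
    }

    public Optional<Map<String, String>> lookup(String dn) {
        Map<String, String> entry = entries.get(dn);
        if (entry == null) {
            return Optional.empty();
        }
        return Optional.of(Collections.unmodifiableMap(entry));
    }

    public boolean delete(String dn) {
        return entries.remove(dn) != null;
    }
}

/** Hypothetical registry letting a deployment pick its backend by name from configuration. */
final class BackendRegistry {
    private final Map<String, Supplier<BackingStore>> factories = new HashMap<>();

    void register(String name, Supplier<BackingStore> factory) {
        factories.put(name, factory);
    }

    BackingStore create(String name) {
        Supplier<BackingStore> factory = factories.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("unknown backend: " + name);
        }
        return factory.get();
    }
}
```

A deployment would read a backend name from its configuration and pass it to `create`, so swapping a search-heavy indexed store for an update-heavy RDBMS one changes configuration rather than the server core.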

Turning to how LDAP could impact database technologies, let's move it up
a layer above the data store. A solid, protocol-compliant LDAP
implementation requires some truly industrial-strength mechanisms beyond
data storage: schema management, access controls, protocol
encoding/decoding, search optimizers, providers, and on and on. If this
sounds similar to a database implementation, stop wondering: the core of an
LDAP server performs the same basic functions as a database management
system.

What if you added the missing pieces, for example a SQL parser, a JDBC
driver, a transaction manager and stored procs, but kept the LDAP protocol
requirements of forwarding and authoritative areas? You are now in a
scenario where one LDAP server represents a database. What kind of
database? The one you chose: Berkeley, Oracle, SQL Server (yuck), DB2, JDBM.

Now add more LDAP protocol stuff: replication and referrals. You can now
have a set of LDAP servers acting in concert as one database. What storage
mechanism? How about one RDBMS, one JDBM, and one OODBMS, each configured
with a schema designed specifically to take advantage of its performance
characteristics. You now have the best of all worlds behind one interface.
Pretty cool, huh?
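One way to picture "one interface" over several differently stored servers is a router keyed on naming contexts: each DN suffix is owned by one store, and a request goes to the most specific suffix matching its target DN. This is an assumption-laden illustration, not LDAPd code: `SuffixRouter`, `mount`, and `route` are hypothetical, DNs are split naively on commas (no escape handling) and compared case-insensitively, and an unmatched DN simply returns empty where a real server might answer with a referral.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

/**
 * Hypothetical router: each naming context (DN suffix) is served by a named store,
 * and a DN is routed to the store with the longest matching suffix.
 * Simplification: real DN normalization is schema-driven and handles escaped commas.
 */
final class SuffixRouter {
    private static final class Context {
        final List<String> rdns;
        final String store;

        Context(List<String> rdns, String store) {
            this.rdns = rdns;
            this.store = store;
        }
    }

    private final List<Context> contexts = new ArrayList<>();

    /** Declares that storeName is authoritative for everything under suffix. */
    void mount(String suffix, String storeName) {
        contexts.add(new Context(rdns(suffix), storeName));
    }

    /** Returns the owning store, or empty when no mounted suffix covers the DN. */
    Optional<String> route(String dn) {
        List<String> target = rdns(dn);
        Context best = null;
        for (Context c : contexts) {
            if (endsWith(target, c.rdns) && (best == null || c.rdns.size() > best.rdns.size())) {
                best = c;
            }
        }
        return Optional.ofNullable(best).map(c -> c.store);
    }

    // Splits a DN into trimmed, lower-cased RDN strings.
    private static List<String> rdns(String dn) {
        List<String> out = new ArrayList<>();
        for (String part : dn.split(",")) {
            out.add(part.trim().toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // Compares whole RDNs, so ou=XOrders does not match a suffix starting with ou=Orders.
    private static boolean endsWith(List<String> dn, List<String> suffix) {
        int offset = dn.size() - suffix.size();
        return offset >= 0 && dn.subList(offset, dn.size()).equals(suffix);
    }
}
```

For example, mounting `dc=example,dc=com` on a JDBM store and `ou=Orders,dc=example,dc=com` on an RDBMS store sends `cn=order42,ou=Orders,dc=example,dc=com` to the RDBMS and everything else under `dc=example,dc=com` to JDBM.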

Robb

At 01:49 AM 9/12/2003 +0200, you wrote:
>Alex and Brian,
>
>Regarding the relationship between RDBMS and LDAP...
>
>I believe this document says why RDBMS is wrong for LDAP:
>
>http://www.openldap.org/faq/data/cache/378.html
>
>On the other hand IBM have implemented LDAP in DB2. See:
>
>http://www.research.ibm.com/journal/sj/392/shi.html
>
>Since reading that I have got quite carried away designing
>and implementing hierarchical structures in RDBMSs.
>Mainly designing actually, but the demo referenced from
>my signature below implements hierarchical categories
>of contacts using the same principles.
>
>I understand this project is not just about the protocol but
>about the directory. It seems to me that it is very valuable
>to have a single DBMS that supports both relational and
>hierarchical structures as efficiently as possible. (In fact
>I would suggest not just heirarchies but directed graphs.
>I.e. a child can have one or more parents.)
>
>If the IBM way (that I adapted) turns out to be one of the
>best (following project design) then one thing that is important
>is that you can efficiently add, remove and traverse nodes
>in a tree represented by lots of small RDB records.
>This becomes important for deep hierarchies. I guess
>stored procedures might help in a standard RDBMS.
>
>I might be interested in getting involved.
>
>Regards,
>
>Jim Wright
>
>--
>Recently completed - Child Brain Injury Trust Admin System
>http://cbitdemo.paneris.org/
>
>Urgently seeking paid work
>Java, Linux, XML and much more.
>http://be.webz.cz/
>
>
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: [EMAIL PROTECTED]
>For additional commands, e-mail: [EMAIL PROTECTED]
