Howard Chu wrote:
A couple new results for back-mdb as of today.

                        first           second          slapd size
back-hdb, 10K cache     3m6.906s        1m39.835s       7.3GB
back-hdb, 5M cache      3m12.596s       0m10.984s       46.8GB
back-mdb                0m19.420s       0m16.625s       7.0GB
back-mdb                0m15.041s       0m12.356s       7.8GB

Next, the time to execute multiple instances of this search was measured,
using 2, 4, 8, and 16 ldapsearch instances running concurrently.
                                average result time
                2               4               8               16
back-hdb, 5M    0m14.147s       0m17.384s       0m45.665s       17m15.114s
back-mdb        0m16.701s       0m16.688s       0m16.621s       0m16.955s
back-mdb        0m12.009s       0m11.978s       0m12.048s       0m12.506s

This result for back-hdb just didn't make any sense. Going back, I discovered that I'd made a newbie mistake - my slapd was using the libdb-4.7.so that Debian bundled, instead of the one I had built in /usr/local/lib. Apparently my LD_LIBRARY_PATH setting that I usually have in my .profile was commented out when I was working on some other stuff.

While loading a 5 million entry DB for SLAMD testing, I went back and rechecked these results and got much more reasonable numbers for hdb. Most likely the main difference is that Debian builds BDB with its default configuration for mutexes, which is a hybrid that begins with a spinlock and eventually falls back to a pthread mutex. Spinlocks are nice and fast, but only for a small number of processors. Since they use atomic instructions that are meant to lock the memory bus, the coherency traffic they generate is quite heavy, and it increases geometrically with the number of processors involved.
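
To make that hybrid behavior concrete, here's a rough sketch of the "spin, then block" pattern in C. It is purely illustrative and not BDB's actual code; the SPIN_LIMIT constant and the spin_then_block() name are mine.

    /* Illustrative sketch of a hybrid "spin, then block" acquire.
     * Not BDB's implementation; just the shape of the idea. */
    #include <pthread.h>

    #define SPIN_LIMIT 50   /* BDB's default is 50 * number of CPUs */

    static void spin_then_block(pthread_mutex_t *m)
    {
        /* Phase 1: busy-wait. Each trylock is an atomic read-modify-write
         * on a shared cache line, so many CPUs spinning here generate
         * heavy cache-coherency traffic. */
        for (int i = 0; i < SPIN_LIMIT; i++)
            if (pthread_mutex_trylock(m) == 0)
                return;

        /* Phase 2: give up and block; on Linux this ends up in a futex wait. */
        pthread_mutex_lock(m);
    }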

I always build BDB with an explicit --with-mutex=POSIX/pthreads to avoid the spinlock code. Linux futexes are decently fast, and scale much better as the number of processors goes up.
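
For reference, that build looks something like this (the version and install prefix are just the ones I'd expect; adjust to taste):

    cd db-4.7.25/build_unix
    ../dist/configure --with-mutex=POSIX/pthreads --prefix=/usr/local
    make && make install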

With slapd linked against my build of BDB 4.7, and using the 5 million entry database instead of the 3.2M entry database I used before, the numbers make much more sense.

slapadd -q times
back-hdb        real 66m09.831s  user 115m52.374s  sys 5m15.860s
back-mdb        real 29m33.212s  user  22m21.264s  sys 7m11.851s

ldapsearch scanning the entire DB
                first           second          slapd size      DB size
back-hdb, 5M    4m15.395s       0m16.204s       26GB            15.6GB
back-mdb        0m14.725s       0m10.807s       10GB            12.8GB

multiple concurrent scans
                                average result time
                2               4               8               16
back-hdb, 5M    0m24.617s       0m32.171s       1m04.817s       3m04.464s
back-mdb        0m10.789s       0m10.842s       0m10.931s       0m12.023s

You can see that, up to 4 concurrent searches, the BDB spinlock would probably have been faster. Above that, you need to get rid of the spinlocks. If I had realized I was using the default BDB build I could of course have configured the BDB environment with set_tas_spins in the DB_CONFIG file. We used to always set this to 1, overriding the BDB default of (50 * number of CPUs), before we decided to just omit spinlocks entirely at configure time.
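
For the record, that DB_CONFIG tweak is a single line dropped into the BDB environment directory (newer BDB releases spell the same knob mutex_set_tas_spins):

    # DB_CONFIG in the BDB environment directory
    set_tas_spins 1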

But I think this also illustrates another facet of MDB - reducing config complexity, so there's a much smaller range of mistakes that can be made.
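
As a rough illustration of that point (the suffix, credentials, and sizes below are placeholders): a complete back-mdb database section needs little more than a directory and a maximum map size, while back-hdb tuning typically also involves entry/DN/IDL cache directives plus a DB_CONFIG file for the BDB environment cache.

    # minimal back-mdb database definition in slapd.conf (values are placeholders)
    database    mdb
    suffix      "dc=example,dc=com"
    rootdn      "cn=admin,dc=example,dc=com"
    rootpw      secret
    directory   /var/lib/ldap
    maxsize     10737418240    # upper bound on the map size; no cache tuning needed
    index       objectClass eq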

re: the slapd process size - in my original test I configured a 32GB BDB environment cache. This particular LDIF only needed an 8GB cache, so that's the size I used this time around. The 47GB size reported earlier was the Virtual size of the process, not the Resident size. That was also a mistake; all of the other numbers are Resident sizes.

When contemplating the design of MDB I had originally estimated that we could save somewhere around 2-3x the RAM compared to BDB. With slapd running 2.7x larger with BDB than with MDB on this test DB, that estimate has proven correct.

The MVCC approach has also proved its value, with no bottlenecks for readers and response time essentially flat as the number of CPUs scales up.
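
For anyone curious what that looks like at the API level, here's a minimal read-only scan using the MDB C API as it ships today (the lmdb.h header name and the "./testdb" path are assumptions of this sketch, not anything from the tests above). The point is that a read txn just pins a snapshot: readers never block writers, writers never block readers, and the txn is simply discarded when done.

    /* Minimal MDB read-only scan; illustrative sketch only. */
    #include <stdio.h>
    #include <lmdb.h>

    int main(void)
    {
        MDB_env *env;
        MDB_txn *txn;
        MDB_dbi dbi;
        MDB_cursor *cur;
        MDB_val key, data;
        int rc;

        mdb_env_create(&env);
        rc = mdb_env_open(env, "./testdb", MDB_RDONLY, 0664);
        if (rc) { fprintf(stderr, "mdb_env_open: %s\n", mdb_strerror(rc)); return 1; }

        /* A read-only txn is a snapshot of the DB at this instant. */
        mdb_txn_begin(env, NULL, MDB_RDONLY, &txn);
        mdb_dbi_open(txn, NULL, 0, &dbi);
        mdb_cursor_open(txn, dbi, &cur);

        while ((rc = mdb_cursor_get(cur, &key, &data, MDB_NEXT)) == 0)
            printf("%.*s\n", (int)key.mv_size, (char *)key.mv_data);

        mdb_cursor_close(cur);
        mdb_txn_abort(txn);   /* read txns are simply discarded */
        mdb_env_close(env);
        return 0;
    }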

There are still problem areas that need work, but it looks like we're on the right track, and what started out as possibly-a-good-idea is delivering on the promise.

--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
