[Bacula-users] Bacula Status

Kern Sibbald Sun, 04 Feb 2007 01:29:59 -0800

Hello,

Sorry, this is much longer than I had planned.  If you have large FileSets, 
please read at least the last 3 or 4 paragraphs.


I would like to say that I am relatively happy with release 2.0.x.  As with 
every major release, there are always "teething" problems in the first few 
versions, but my feeling is that this release has had fewer and far less 
serious bugs than previous versions.

In case you are not aware, version 2.0.2 is the most recent release.  I now 
have a few more bug fixes in the CVS, but I don't consider them serious 
enough to make an additional release just for them.

As you probably know, my two personal topics for the next release are:
1. Bacula GUI, now going by the user-preferred name "bat".
2. Performance improvements.

1. GUI
bat is progressing nicely, but a bit slower than I expected due to item #2 
(see below).  It now communicates with the Director, and the GUI interface is 
all defined using designer (GUI layout program) forms.  Though my 
implementation is not very elegant, I've also managed to implement the 
console and a dummy restore "page" with designer in separate subdirectories.
The restore page does nothing, but it does put up an interface that Eric sent 
me, which is the same as the brestore GUI. 

Having the pages implemented with designer and in separate subdirectories will 
vastly simplify adding new pages (i.e. functionality) and allowing multiple 
developers to work at the same time.

I'm currently adding code to obtain the default values from Bacula (jobs, 
pools, storage, ...) so that it can display dialogs similar to the gnome 
console, where the dialog "knows" which jobs can be selected, ...

2. Performance
I have planned on implementing the following things:
 - Immediate disconnect of the FD from the SD after sending all
   files to the SD.  This will permit Laptops to send the data, then 
   disconnect, even if there is spooling or attribute insertion to be
   done.
   
   This is now implemented in the CVS.

- Database performance improvements in several different areas,
  the most important being faster insertion of attributes.  Eric and
  Marc are working on this, and have submitted a working patch
  that is not yet integrated, but that gives significant speed improvements
  especially for PostgreSQL.

  Another area is faster pruning, for which I have a patch, but it is
  not yet tested.

- Transmitting attributes to the Director in a separate thread while
   spooling/despooling the data.  This remains to be started.

- Improving the performance of building the in-memory restore tree.
  You probably don't know that in 2005 (yes, two years ago), I wrote
   a red-black binary tree class for Bacula with the intention of using it
   for the in-memory restore tree file lists.  I completed the code, but 
   never integrated it into the tree routines (if I remember correctly, this
   was because the tree traversal routines were hand crafted linked lists).
   Since then, I have converted the tree lists to be Bacula dlist classes,
   (doubly linked list) with a "fake" binary sort, which improved performance
   significantly.   The red-black binary trees remained unused awaiting 
   integration.

  An amazing thing recently happened.  Rudolf Cejka, being hit by directories
  with lots of files, implemented AA binary tree routines that he calls tlist
  to replace the Bacula dlist routines in the restore code.  It turns out that
  AA trees are a simplified form of red-black trees that give similar
  performance, though not quite as consistently as red-black trees (AA trees
  remain better balanced than RB trees, which costs a bit more but
  speeds up searches).

  Rudolf needed to make only trivial changes
  to something like 5 lines of code to integrate his tlist routines.   I would 
  like to integrate his tlist code since AA tree handling is much simpler than
  RB trees.  However, while we are working out the licensing issues, I
  corrected one bug that Rudolf found in my RB tree routines, made one other
  minor design change, and integrated both into the CVS HEAD.

  Last night I did some performance testing with the new RB binary tree 
  code in restore.  I was hoping for a 10 times improvement.  My test was
  rather stupid -- I created a directory containing two subdirectories, one
  has 419,549 files (with simple names like a.0 - a.9999, ab.xx ...) and the
  second directory has the same number of files with the same names.
  This is not really representative of what one would have on most systems
  (though it may simulate certain mailbox directories).  So the two  
  subdirectories have approx 840,000 files.

  I then backed up the directory containing the two subdirectories (SQLite2
  DB) with a full save and did a restore, but stopped the process once all
  data was loaded into the in-memory tree.  I.e.
  restore
  5  (current)
  1  (select client)
  quit  (quit after in-memory tree is built)

  Now the amazing part:

  For Bacula 2.0.2, which uses the dlist routines, it took 58 minutes to load
  the in-memory tree (including the time for SQLite to look up the records).

  For Bacula 2.1.2, which uses my rblist routines, it took 10.05 seconds to
  load the in-memory tree.

  A speedup of 513 times.  I certainly expected an improvement, but not that
  much.  
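
For anyone not familiar with the AA trees mentioned above, here is a
minimal sketch in Python of AA tree insertion using the usual "skew" and
"split" rebalancing steps.  It is only an illustration of the general
technique: it is not Rudolf's tlist code and not the Bacula rblist code,
and every name in it (Node, skew, split, insert, contains, in_order,
height) is an assumption made up for this example.

```python
class Node:
    """One AA tree node; 'level' plays the role of the red-black colour."""
    __slots__ = ("key", "level", "left", "right")

    def __init__(self, key):
        self.key = key
        self.level = 1
        self.left = None
        self.right = None


def skew(n):
    """Rotate right when the left child sits on the same level as n."""
    if n is not None and n.left is not None and n.left.level == n.level:
        left = n.left
        n.left = left.right
        left.right = n
        return left
    return n


def split(n):
    """Rotate left and promote the middle node when two consecutive
    right links stay on the same level."""
    if (n is not None and n.right is not None
            and n.right.right is not None
            and n.right.right.level == n.level):
        right = n.right
        n.right = right.left
        right.left = n
        right.level += 1
        return right
    return n


def insert(n, key):
    """Insert key below n and return the (possibly new) subtree root."""
    if n is None:
        return Node(key)
    if key < n.key:
        n.left = insert(n.left, key)
    elif key > n.key:
        n.right = insert(n.right, key)
    else:
        return n  # key already present, nothing to do
    return split(skew(n))


def contains(n, key):
    """Ordinary binary search; no rebalancing is needed for lookups."""
    while n is not None:
        if key < n.key:
            n = n.left
        elif key > n.key:
            n = n.right
        else:
            return True
    return False


def in_order(n):
    """Yield the keys in sorted order."""
    if n is not None:
        yield from in_order(n.left)
        yield n.key
        yield from in_order(n.right)


def height(n):
    """Number of nodes on the longest root-to-leaf path."""
    return 0 if n is None else 1 + max(height(n.left), height(n.right))


if __name__ == "__main__":
    root = None
    # Names arriving in sorted order are the worst case for an
    # unbalanced tree; the AA tree keeps its height logarithmic.
    for i in range(10000):
        root = insert(root, "a.%05d" % i)
    print("nodes: 10000, height:", height(root))
```

The point of the sketch is that the whole rebalancing logic fits in the two
small functions skew and split, which is why AA tree handling is simpler
than the case analysis needed for red-black trees.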

At this point, I would appreciate it if some of you could pull down the CVS 
code (if you have problems with that let me know, and I will post a .tar.gz 
file to my web site) and test it with some large number of files to restore 
using real data.  Several of you have FileSets of over 1,000,000 files, and I 
will be very interested to see what this code produces.  I look forward to 
your feedback.

If the code proves stable (it passes all my file-based regression scripts), I 
will probably release it in 2.0.3 or 2.0.4.

Many thanks to Rudolf for showing me how simple it was to integrate the RB 
tree code.

Best regards,

Kern
