On Wednesday 06 June 2007 11:28, Andre Noll wrote:
> On 10:40, Kern Sibbald wrote:
>
> > > So I think it would be _really_ nice to store information about
> > > deleted files and directories in the database which would make it
> > > possible to get rid of all deleted files and directories automatically
> > > during restore.
> > >
> > > The dar backup tool for example has this feature. Are there any
> > > plans to include such a feature also in bacula?
> >
> > Yes, but no one is currently working on it. There have been a number of
> > emails on this subject on the bacula-users' list recently.
>
> In February you said Robert will be working on this project. Do you
> have any pointers to this work? I would be interested to look at the
> strategy for implementing this and at the work that has been done so
> far, if any.

Robert quit the project, so currently there is no one assigned to it.

I would be *extremely* happy to see someone interested in this project.
If your offer is for algorithm help, please see Algorithms below.

If your offer above includes programming (i.e. you are a C or C++
programmer), and you are interested in working on it, please let me know
(either off list or, if you wish, copying the bacula-devel list) and we can
discuss the project. I recommend starting by reading the Developer notes
in the Developer's Guide that is on the web site. It will give you a broad
overview of developing for the Bacula project.

This project and the project to store only one copy of a file (Base
project) are closely related because they both require *much* more
communication between the Dir and the FD -- essentially the Dir must send
the current state as known in the catalog to the client, which can then
determine which files to back up.

Algorithms:
This requires potentially sending a *lot* of data (i.e. millions of
filenames and attribute data), which will require hash coding the names
for performance reasons. If we want to handle up to 20 million filenames,
as we are starting to see on some systems, we will probably at some point
need a good file paging algorithm.

Some years ago, I wrote hashing routines specifically for this, but they
have never been used, and so I am now looking at bringing them up to
date -- in particular adding a Bloom filter to improve performance (I am
currently researching Bloom filters).

Where I could use a bit of advice is:

Now:
- Reviewing my hash table code (particularly the hash function)
    src/lib/htable.h
    src/lib/htable.c
- Proposing how to size a Bloom filter (n bits) and number of hash
  functions.
- Proposing what hash functions to use for the Bloom filter.

Later:
- Review overall strategy.
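
To make the Bloom filter sizing question above concrete, here is a minimal
sketch in C. It is not code from src/lib and not a proposal for the actual
implementation: every name in it (bloom_t, bloom_bits, bloom_new, bloom_add,
bloom_maybe_contains, fnv1a) is hypothetical. It assumes the standard
sizing result that a filter with k probes at its optimal fill has a
false-positive rate of about 2^-k and needs m = n * k / ln 2 bits for n
names. It also assumes 64-bit FNV-1a as the base hash and derives the k
probe positions from two hash values (double hashing) instead of using k
independent hash functions. The seeding of FNV-1a shown here is an
illustrative choice, not part of FNV itself.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical Bloom filter; none of these names exist in Bacula. */
typedef struct {
   uint8_t *bits;   /* bit array of m bits, rounded up to whole bytes */
   uint64_t m;      /* number of bits in the filter */
   unsigned k;      /* number of probes per name */
} bloom_t;

/* Bits needed for n names with k probes: m = n * k / ln 2 (rounded up). */
uint64_t bloom_bits(uint64_t n, unsigned k)
{
   assert(n > 0 && k > 0);
   return (uint64_t)((double)n * (double)k * 1.4426950408889634) + 1;
}

/* 64-bit FNV-1a; the seed is XORed into the offset basis (assumption). */
uint64_t fnv1a(const char *s, uint64_t seed)
{
   uint64_t h = 14695981039346656037ULL ^ seed;
   for (; *s; s++) {
      h ^= (uint8_t)*s;
      h *= 1099511628211ULL;
   }
   return h;
}

/* Allocate an empty filter sized for n names and k probes. */
bloom_t *bloom_new(uint64_t n, unsigned k)
{
   bloom_t *b = malloc(sizeof(*b));
   if (!b) {
      return NULL;
   }
   b->m = bloom_bits(n, k);
   b->k = k;
   b->bits = calloc((size_t)((b->m + 7) / 8), 1);
   if (!b->bits) {
      free(b);
      return NULL;
   }
   return b;
}

void bloom_free(bloom_t *b)
{
   if (b) {
      free(b->bits);
      free(b);
   }
}

/* Set the k bits for name; probe i is (h1 + i * h2) mod m. */
void bloom_add(bloom_t *b, const char *name)
{
   uint64_t h1 = fnv1a(name, 0);
   uint64_t h2 = fnv1a(name, 0x9e3779b97f4a7c15ULL) | 1;  /* never zero */
   for (unsigned i = 0; i < b->k; i++) {
      uint64_t bit = (h1 + i * h2) % b->m;
      b->bits[bit >> 3] |= (uint8_t)(1u << (bit & 7));
   }
}

/* Return 0 if name was definitely never added, 1 if it may have been. */
int bloom_maybe_contains(const bloom_t *b, const char *name)
{
   uint64_t h1 = fnv1a(name, 0);
   uint64_t h2 = fnv1a(name, 0x9e3779b97f4a7c15ULL) | 1;
   for (unsigned i = 0; i < b->k; i++) {
      uint64_t bit = (h1 + i * h2) % b->m;
      if (!(b->bits[bit >> 3] & (1u << (bit & 7)))) {
         return 0;
      }
   }
   return 1;
}
```

Under these assumptions, 20 million filenames with k = 7 work out to about
10.1 bits per name, i.e. roughly 25 MB of filter for a false-positive rate
a little under 1%. A "no" answer is exact, so a lookup in the full hash
table (or on disk) would only be needed when the filter answers "maybe".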

Since these two projects (de-duplication of files, tracking new and deleted
files) are quite hot topics lately, over the next week, I will write up a
sort of proposal for implementation outlining my general ideas for how to
implement them within the existing Bacula framework (i.e. without too many
modifications to the database, ...).

Thanks for your interest in this.

Best regards,

Kern

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users