IMHO, BackupPC could use a combination of index data in a database and files on the filesystem. The database could be anything: sqlite3, mysql, even something ODBC-compliant. That would speed up some checks, I believe.

2006/3/8, David Brown <[EMAIL PROTECTED]>:
On Tue, Mar 07, 2006 at 09:23:36AM -0600, Carl Wilhelm Soderstrom wrote:

> I'm experimenting with an external firewire drive enclosure, and I formatted
> it with 3 different filesystems, then used bonnie++ to generate 10GB of
> sequential data, and 1,024,000 small files between 100 and 1000 bytes in
> size.
>
> I tried it with xfs, reiserfs, and ext3; and contrary to a lot of hype out
> there, ext3 seems to have won the race for random file reads and deletes
> (which is what BackupPC seems to be really heavy on).

Unfortunately, the resultant filesystem has very little resemblance to the
file tree that backuppc writes.  I'm not sure if there is any utility that
creates this kind of tree, and I would argue that backuppc shouldn't be
either, since it is so hard on the filesystem.

Basically, you need to first create a deep tree (like a filesystem), and
then hardlink all of those files into something like a pool, in a very
different order than they were put into the tree.

Then, create another tree, except some of the files should be fresh, and
some should be hardlinks back to the pool (or to the first tree).  Then the
new files should be linked into the pool.
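
The steps above could be sketched roughly as follows, for anyone who wants a benchmark tree closer to what backuppc produces. This is only an illustration: the function name make_tree, its parameters, the flat pool directory, and the use of MD5 digests as pool filenames are all assumptions made for the example, not how backuppc itself lays out its pool.

```python
import hashlib
import os
import random
import tempfile


def make_tree(root, pool, n_dirs=4, files_per_dir=5, reuse_ratio=0.0, seed=0):
    """Create one backup-like tree under `root`, then link its new files into `pool`.

    `reuse_ratio` is the approximate fraction of files that are hardlinked
    from the pool instead of being written fresh. Returns the list of
    freshly written file paths.
    """
    rng = random.Random(seed)
    os.makedirs(pool, exist_ok=True)
    fresh = []
    for d in range(n_dirs):
        dirpath = os.path.join(root, f"d{d}", f"sub{d}")
        os.makedirs(dirpath, exist_ok=True)
        for f in range(files_per_dir):
            path = os.path.join(dirpath, f"f{f}")
            pooled = sorted(os.listdir(pool))
            if pooled and rng.random() < reuse_ratio:
                # "Unchanged" file: hardlink back to an existing pool entry.
                os.link(os.path.join(pool, rng.choice(pooled)), path)
            else:
                # Fresh file with 100 to 1000 bytes of random content.
                size = rng.randint(100, 1000)
                with open(path, "wb") as fh:
                    fh.write(bytes(rng.getrandbits(8) for _ in range(size)))
                fresh.append(path)
    # Link the fresh files into the pool in a different order than they
    # were created in the tree.
    rng.shuffle(fresh)
    for path in fresh:
        with open(path, "rb") as fh:
            digest = hashlib.md5(fh.read()).hexdigest()
        target = os.path.join(pool, digest)
        if not os.path.exists(target):
            os.link(path, target)
    return fresh


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    pool_dir = os.path.join(base, "pool")
    make_tree(os.path.join(base, "backup0"), pool_dir, seed=1)
    make_tree(os.path.join(base, "backup1"), pool_dir, reuse_ratio=0.7, seed=2)
    print("pool entries:", len(os.listdir(pool_dir)))
```

To get anything meaningful out of it you would scale n_dirs and files_per_dir up by several orders of magnitude, repeat the second call many times, and then time reads and deletes on the result.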

Programs like backuppc are the only things I know of that create these, and
the performance of a given filesystem on this tree isn't really going to
correlate much with that filesystem's performance on any other task.  Most
filesystems optimize assuming that files will tend to be in the directory
that they were created in.  Creating this massive pool of links to files in
diverse places completely breaks these optimizations.

Honestly, you probably won't ever find a filesystem that handles the
backuppc pool very well.  I think the solution is to change backuppc to not
create multiple trees, but to store the filesystem tree in some kind of
database, and just store the files themselves in the pool.  Database
engines are optimized to be able to handle multiple indexing into the data,
whereas filesystems are not (and aren't likely to be, either).

As far as implementation of this pool-only storage goes, it is important to
create the file in the proper directory first, which means the hash must be
known before the file can be written.  Of course, if there is a database,
there is no reason to derive the filenames from the hash; they could just be
sequential integers, taken from a unique key in the database table.
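
As a rough sketch of what such a layout might look like, here is a minimal version using sqlite3. Everything in it is hypothetical: the table names pool_file and tree_entry, the helper functions, and the bucketing of pool files by integer id are assumptions made for illustration, not an existing backuppc design.

```python
import sqlite3

SCHEMA = """
CREATE TABLE pool_file (
    id     INTEGER PRIMARY KEY,   -- sequential integer, also names the file on disk
    digest TEXT NOT NULL UNIQUE,  -- content hash, used only to detect duplicates
    size   INTEGER NOT NULL
);
CREATE TABLE tree_entry (
    backup_id INTEGER NOT NULL,
    path      TEXT NOT NULL,
    file_id   INTEGER NOT NULL REFERENCES pool_file(id),
    PRIMARY KEY (backup_id, path)
);
CREATE INDEX tree_entry_by_file ON tree_entry(file_id);
"""


def open_index(db_path=":memory:"):
    """Open (or create) the index database and install the schema."""
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    return conn


def record_file(conn, backup_id, path, digest, size):
    """Record `path` in a backup. Returns (file_id, is_new).

    When is_new is True the caller still has to write the file contents
    to pool_path(file_id); otherwise the content is already in the pool.
    """
    row = conn.execute(
        "SELECT id FROM pool_file WHERE digest = ?", (digest,)
    ).fetchone()
    if row is None:
        cur = conn.execute(
            "INSERT INTO pool_file (digest, size) VALUES (?, ?)", (digest, size)
        )
        file_id, is_new = cur.lastrowid, True
    else:
        file_id, is_new = row[0], False
    conn.execute(
        "INSERT INTO tree_entry (backup_id, path, file_id) VALUES (?, ?, ?)",
        (backup_id, path, file_id),
    )
    return file_id, is_new


def pool_path(file_id, fanout=1000):
    """Map a sequential id to a pool location, keeping directories small."""
    return f"{file_id // fanout:06d}/{file_id}"
```

With this arrangement each distinct file is written once, in id order, so files end up in the directory they were created in, and the per-backup trees exist only as rows in tree_entry.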

Dave





--
Alexey Parshin,
http://www.sptk.net
