On 8/30/09, Jeffrey J. Kosowsky <backu...@kosowsky.org> wrote:
> Les Mikesell wrote at about 14:26:47 -0500 on Friday, August 28, 2009:
>  > Jim Wilcoxson wrote:
>  > > Michael - I have a new Linux/FreeBSD backup program, HashBackup, in
>  > > beta that I believe will handle a large BackupPC server.  In tests, it
>  > > will back up a single directory with 15M (empty) files/hardlinks, with
>  > > 32000 hard links to each file, and can do the initial and incremental
>  > > backups on this directory in about 45 minutes on a 2005 AMD box with
>  > > 1GB of memory.
>  > >
>  > > HashBackup can also send backups offsite via FTP, ssh accounts, or to
>  > > Amazon S3.  I'd be very interested in feedback if anyone would like to
>  > > try it on their BackupPC server.
>  > >
>  > > The beta site is:
>  > >
>  > > http://sites.google.com/site/hashbackup
>  > >
>  > > Of course, you're welcome to contact me via email with questions.
>  >
>  > What kind of speed would you expect from this on real files?  I let it
>  > run about 20 hours and it had only made it halfway through a pool of
>  > around 600 gigs (where an image copy of the partition takes a bit over 2
>  > hours).   Should incrementals be faster if it ever makes it through the
>  > first run?
>  >
> I would be interested in knowing more about how hashbackup works.
> Earlier Holger and I (and perhaps others) had a thread about how to
> use the special structure of the Backuppc pool and pc directories to
> speed backups. In particular we know that all the relevant inodes
> (other than zero length files and directory entries) occur exactly
> once in the pool tree. Similarly, every non-zero length file in the pc
> directory (other than the log files and info files at the top level)
> corresponds to exactly one entry in the pool directory. Also, for
> incrementals, we know that inodes in general don't change except for
> the limited case of chain renumbering which itself could be
> potentially tracked.
> If hashbackup doesn't use this special structure to its advantage then
> I would indeed expect it to be substantially slower than a simple
> low-level filesystem copy. On the other hand, if the structure is used
> to advantage then a copy could conceivably be done with limited
> overhead and roughly speaking with O(n) or at most O(n log n) scaling.
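
To make the structure Jeffrey describes concrete, here is a minimal
sketch in Python. It is not how HashBackup works, and it is not
BackupPC code; it only illustrates the idea that one pass over the pool
can build an inode index, after which every file in the pc tree can be
matched to its pool entry with a single lookup. The function names and
the directory arguments are made up for the illustration.

```python
import os
import stat


def index_pool_by_inode(pool_root):
    """Walk the pool once and map (device, inode) -> pool path."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(pool_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if stat.S_ISREG(st.st_mode):
                index[(st.st_dev, st.st_ino)] = path
    return index


def map_pc_to_pool(pc_root, index):
    """Yield (pc_path, pool_path) pairs; pool_path is None when the
    file has no pool entry (e.g. log files)."""
    for dirpath, _dirnames, filenames in os.walk(pc_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if not stat.S_ISREG(st.st_mode):
                continue
            yield path, index.get((st.st_dev, st.st_ino))
```

Both passes touch each directory entry once and each lookup is a
dictionary access, so the work grows roughly linearly with the number
of files, at the cost of holding the inode index in memory.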

I did some reading today about BackupPC's storage layout and design.
I haven't finished yet, but one thing stuck out:

"BackupPC_link reads the NewFileList written by BackupPC_dump and
inspects each new file in the backup."

To speed up incrementals, HashBackup could make use of the NewFileList.
Right now, HB is a general-purpose backup program: it doesn't use any
application-specific knowledge or data.  In the version 1 series, I
plan to make use of application-specific knowledge to back up
databases, virtual machines, mail servers, and so on.  Reading the
NewFileList might be a way to speed up an incremental backup of the
BackupPC pool, though incremental scans are fairly quick already.
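
As a rough sketch of what reading such a list might look like (this is
not something HB does today), the Python below parses lines into
(digest, size, path) tuples. The line layout, a digest, a byte size and
a path separated by single spaces, is an assumption on my part and
should be checked against the BackupPC version in use; the function
name is made up.

```python
def parse_new_file_list(lines):
    """Parse lines assumed to look like '<digest> <size> <path>'.

    Lines that do not fit that assumed layout are skipped.
    """
    entries = []
    for raw in lines:
        line = raw.rstrip("\n")
        # Split at most twice so paths containing spaces stay intact.
        parts = line.split(" ", 2)
        if len(parts) != 3 or not parts[1].isdigit():
            continue
        digest, size, path = parts
        entries.append((digest, int(size), path))
    return entries
```

An incremental run could then stat only the listed paths instead of
scanning the whole tree, assuming the list is complete for that backup.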

Another thing about BackupPC is that, by my reading, new files are
first written to the PC area, and the pool links are created afterward
by BackupPC_link.  This suggests that backing up the pool last might
improve performance, because the pool is likely to be more fragmented.
Right now, HB backs up cpool first, then pc, then pool.  It might be
better to back up pc first, then cpool and pool.  I'm not sure how much
of a difference it would make, if any, because it's hard to predict
disk layouts in any filesystem.
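
For anyone who wants to experiment with the ordering, a small Python
sketch of the idea follows. It just sorts top-level directories by a
preferred order (pc, then cpool, then pool) and leaves anything else at
the end in its original order. The function name and the example paths
are hypothetical, and whether this ordering helps at all is untested.

```python
import os


def order_backup_roots(roots, priority=("pc", "cpool", "pool")):
    """Return roots sorted so that directories named in `priority`
    come first, in that order; other paths keep their relative order."""
    rank = {name: i for i, name in enumerate(priority)}

    def key(path):
        base = os.path.basename(os.path.normpath(path))
        return rank.get(base, len(priority))

    # sorted() is stable, so ties keep their input order.
    return sorted(roots, key=key)
```

For example, passing a list of the cpool, pc and pool directories under
one top-level BackupPC directory would return them with pc first.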


BackupPC-users mailing list
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/