charlesboyo <backuppc-fo...@backupcentral.com> wrote on 08/31/2011 
05:53:43 AM:

> I'm using BackupPC to take daily backups of a maildir totaling 250 
> GB with average file sizes of 500 MB (text mailboxes; these files 
> change every day).
> Currently, my setup takes full backups once a week and incremental 
> backups every day between the fulls. The servers are directly
> connected with a crossover cable, allowing 100 Mbps.

I have a very similar setup with several servers.  They are often 
connected at 100Mb/s simply because the clients haven't upgraded to Gb 
switches.  Also, they back up IBM Lotus Domino servers.  In Domino, each 
mail user has their own mail database, which is typically gigabytes in 
size (except with this thing called DAOS, but even then they're still 
hundreds of MB).  This is pretty comparable to your environment, though 
my *total* size is not usually 250GB of just mail data...  I have file 
servers that are bigger, but not mail servers.

(I have some servers that back up Microsoft Exchange servers.  This is 
even worse:  one monolithic file for the *ENTIRE* mailstore.  U G L Y... 
And incrementals *ARE* fulls!  :) )

> However, these backups take about 8 hours to complete, averaging 8 
> Mbps, and the BackupPC server is CPU-bound throughout the entire 
> process.

Fulls or incrementals or both?  If truly 90% of your files are changing 
daily, I'm going to assume both.  There will be *very* little difference 
between a full backup and an incremental.

> Thus I have reason to suspect the rsync overhead as being guilty.
> Note that I have disabled hard links, implemented checksum caching, 
> increased the block size to 512 KB and enable --whole-file to no avail.

I have done zero tuning of the rsync command:  I use 100% stock BackupPC 
command line for it.

> 1. since over 90% of the files change every day and "incremental" 
> backups involve transferring the whole file to the BackupPC server, 
> won't it make better sense to just run a full backup everyday?

An incremental backup still ends up writing a whole new file on the 
server, but rsync does not get there by transferring the whole file:  the 
rsync protocol sends just the changed parts.  HOWEVER, the whole file is 
read on *BOTH* ends of the connection, so it doesn't save you a *BIT* of 
disk I/O:  it only saves you NETWORK I/O.  Seeing as you have only 
100Mb/s between the machines, that will improve performance, but not 
dramatically, and as you have found it exacts a CPU hit to do it.
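The read-everything-on-both-ends point can be seen in a toy sketch of the 
delta idea (my illustration, *not* the real protocol -- real rsync uses a 
rolling weak checksum plus a strong hash so matches can start at any 
offset, and it streams the result):

```python
import hashlib

BLOCK = 4  # toy block size; rsync's is far larger (hundreds of bytes up)

def block_sums(old: bytes):
    """Receiver side: checksum every block of the file it already has."""
    return {hashlib.md5(old[i:i + BLOCK]).digest(): i
            for i in range(0, len(old), BLOCK)}

def delta(new: bytes, sums):
    """Sender side: walk the ENTIRE new file (that's the disk I/O and the
    CPU hit), emitting ('copy', offset) for blocks the receiver already
    has and ('literal', byte) for everything else (the network savings)."""
    out, i = [], 0
    while i < len(new):
        d = hashlib.md5(new[i:i + BLOCK]).digest()
        if d in sums:
            out.append(('copy', sums[d]))
            i += BLOCK
        else:
            out.append(('literal', new[i:i + 1]))
            i += 1
    return out

old = b"Hello, mail spool!!!"
new = b"Hello, mail XXXXX!!!"
ops = delta(new, block_sums(old))
# The unchanged front of the file becomes cheap 'copy' ops; only the
# changed tail region ships as literal bytes.
```

Note that both sides hash every byte whether or not anything matched: 
that is exactly why the savings are network-only and the cost is CPU.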

You may find that trading CPU for network is not a good trade in your 
case.  Having said that, I run BackupPC on about the slowest systems you 
can actually buy new today:  VIA EPIA EN 1500 system boards with 512MB 
RAM.  Terrible performance, but they meet my BackupPC needs just *fine*.

Hard numbers on the nearest Domino server to me:  60GB total backed up for 
full, 18GB for incremental (this is a DAOS server).  Fulls take about 150 
minutes, incrementals take about 40.  1/4 the data, 1/4 the time.  And 
that's on the miserable hardware I described.

Scaling that up to your sizes, that would take about 600 minutes, or 10 
hours.  So, the 8 hours that you're seeing sounds reasonable.
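The scaling is just rate arithmetic (my throughput numbers, your data 
size -- rounding to taste):

```python
# My Domino box: a 60 GB full takes ~150 minutes => ~0.4 GB/min.
gb_per_min = 60 / 150

# Scale that same rate up to a 250 GB maildir:
minutes = 250 / gb_per_min
print(f"{minutes:.0f} minutes, ~{minutes / 60:.1f} hours")
# Comes out a bit over 10 hours -- so an 8-hour window is in the
# right ballpark, possibly even slightly better than my hardware.
```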

The number one question I have is:  is this really a problem?  If you have 
a backup window that allows this, I would not worry about it.  If you do 
*not*, then rsync might not be for you.

To address a couple of things said in other replies:

1) Trying to avoid building a file list is pointless.  It takes my 
servers just a couple of minutes.  It certainly uses RAM, but that is 
only an issue if you have millions of files -- and in that case, simply 
add more RAM.  I'm a glutton for punishment running with 512MB of RAM 
(and actually, I use 2GB in new servers now:  I just like to twist Les' 
tail!  :) ).
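To put a rough number on the file-list RAM (the ~100-bytes-per-entry 
figure is the commonly quoted rule of thumb for rsync; the exact cost 
varies by version and path length):

```python
# Rough rule of thumb: on the order of 100 bytes of RAM per entry in
# rsync's file list.  Treat the constant as an estimate, not a spec.
bytes_per_entry = 100

for nfiles in (10_000, 1_000_000, 10_000_000):
    mb = nfiles * bytes_per_entry / 1024**2
    print(f"{nfiles:>10,} files -> ~{mb:,.0f} MB of file-list RAM")
# A few hundred large mailbox files is noise; it only bites once you
# are into millions of files, i.e. the one-file-per-e-mail layout.
```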

2) Les' point about the format of the files (one monolithic file per 
mailbox vs. one file per e-mail) is dead on.  One file per e-mail allows 
99% of the files to remain untouched once they're backed up *once*, which 
will *vastly* reduce the backup times.  (DAOS does a similar thing for 
Domino by breaking attachments out into individual files and hashing and 
pooling them in a manner very similar to a BackupPC pool, BTW.  Before 
DAOS, my fulls and incrementals were indistinguishable; now they're 4:1 
size-wise, plus a 50% reduction in total disk usage.  But I digress.)
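That hash-and-pool idea (BackupPC's pool, and roughly what DAOS does with 
attachments) is just content-addressed storage.  A minimal sketch, with a 
made-up store() helper and an in-memory dict standing in for the pool 
directory tree:

```python
import hashlib

# Hypothetical pool: digest -> contents.  BackupPC actually uses a
# hash-keyed directory tree of (compressed) files plus hard links.
pool = {}

def store(data: bytes) -> str:
    """File a blob under its content hash: identical data is kept once."""
    digest = hashlib.sha1(data).hexdigest()
    pool.setdefault(digest, data)  # duplicate store is a no-op: it pools
    return digest

a = store(b"big attachment sent to 50 users")
b = store(b"big attachment sent to 50 users")  # same bytes, same digest
assert a == b and len(pool) == 1  # 50 copies on disk, one in the pool
```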

However, be aware that you then trade the "my backups take a long time 
and don't pool" problem for a "now I have to manage several *MILLION* 
files!" problem.  fsck can become a major issue in that case--with 250GB 
of e-mail split into individual messages, even ls can be a major issue! 
Both formats have advantages and disadvantages; just be aware that it's 
not a clear win either way.

And you might not have a choice, making the argument moot.


Now, for tar.  Take my information with a grain of salt:  I have *never* 
run tar with BackupPC...

> 2. from Pavel's questions, he observed that BackupPC is unable to 
> recover from interrupted tar transfer. Such interruptions simply 
> cannot happen in my case. Should I switch to tar?

Is that a trick question?  "This cannot happen.  Should I do this?"  Umm. 
No -- GIVEN the conditions you yourself set.  :)

http://en.wikipedia.org/wiki/Tautology_%28logic%29

> And in the 
> unlikely event that the transfer does get interrupted, what 
> mechanisms do I need to implement to resume/recover from the failure?

To repeat another response:  restart the backup...

> 3. What is the recommended process for switching from rsync to tar -
> since the format/attributes are reportedly incompatible? I would 
> like to preserve existing compressed backups as much as possible.

Your old backups should be 100% safe:  they will remain in the pool just 
fine, etc.  I do not believe that files transferred by rsync will pool 
with files transferred by tar (due to the attribute issue you mention); 
however, for you that's a moot point:  90% of your files don't pool 
anyway.

As an aside, BackupPC (well, the pooling) buys you virtually *nothing* in 
your application.  With a fast enough network connection, rsync buys 
everyone almost *nothing*, too.  You are using two tools that have very 
distinct advantages, but you're using them in an environment that largely 
ignores their advantages.

This is not a *bad* thing.  Every single one of my backup servers is based 
on BackupPC, and all but maybe 2 shares are backed up using rsync.  (The 
only exceptions I can think of are where I'm backing up data on a NAS, and 
I can't or won't run rsyncd on the NAS so I have to use SMB).  Whether 
it's an advantage or disadvantage, that's the setup I use.  I vastly 
prefer consistency over performance.  But I can live with 8 hour backup 
windows.

If you can't, then you may have to make different decisions.  That's the 
fun of being the Administrator! :)

Timothy J. Massey
 
Out of the Box Solutions, Inc. 
Creative IT Solutions Made Simple!
http://www.OutOfTheBoxSolutions.com
tmas...@obscorp.com 
 
22108 Harper Ave.
St. Clair Shores, MI 48080
Office: (800)750-4OBS (4627)
Cell: (586)945-8796 
_______________________________________________
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/