Mark Coetser <[email protected]> wrote on 09/17/2012 03:08:49 AM:
> Hi
>
> backuppc 3.1.0-9.1
> rsync 3.0.7-2
>
> OK, I have a fairly well-specced backup server with 2 gigabit e1000 NICs
> bonded together and running in bond mode 0, all working 100%. If I run
> plain rsync between the backup server and a backup client, both connected
> on a gigabit LAN, I can get sync speeds of +/- 300mbit/s, but using BackupPC
> and rsync the max speed I get is 20mbit and the backup is taking
> forever. Currently I have a full backup that's been running for 3461:23
> minutes, whereas plain rsync would have taken a few hours to
> complete.
>
> The data is users' maildirs and it's about 2.6TB. I am not using rsync
> over ssh; I have the rsync daemon running on the client and have set up
> the .pl as follows.
I have several very similar configurations. Here's an example:
Atom D510 (1.66GHz x 2 Cores)
4GB RAM
CentOS 6 64-bit
4 x 2TB Seagate SATA drives in RAID-6 configuration
I get almost 200 MB/s transfer rate from this array...
2 x Intel e1000 NICs in bonded mode.
In the past, the biggest server I backed up was around 1TB. Personally, I
prefer to keep each server image under 1TB if I can help it. Everything
is easier that way: not just file-level backups with BackupPC but image-level
backups as well, and there's less downtime (or less time with noticeable
slowdown if it is up) when having to take such images.
With servers <1TB, rsync-based BackupPC full backups are slow, but get
done in a reasonable amount of time: 8-12 hours, and I can live with
that. It is usually kind of beneficial: if I start a backup in the
middle of the day it does not hammer the client I'm backing up noticeably.
(Lemons, lemonade... :) )
However, I have recently inherited a server that is >3TB big, and 97%
full, too! Backups of that system take 3.5 *days* to complete. I *can't*
live with that. I need better performance.
I was going to write a very similar e-mail to what you wrote as well! So
maybe we can work on this together.
All of your configuration looks pretty straightforward to me (except the
mounts: I'm not sure why you have them if you're using rsyncd). Mine are
quite similar.
No matter the size of the system, I seem to top out at about 50GB/hour for
full backups. Here is a perfectly typical example:
Full Backup: 769.3 minutes for 675677.3MB of data. That works out to be
878MB/min, or about 15MB/s, on a system with an array that can move
200MB/s and a network that can move at least 70MB/s.
Now, let's look at the "big" server:
Full backup: 5502.8 minutes for 2434613.6MB of data. That's even worse:
442MB/min. And 5502.8 minutes is more than three and a half *DAYS*.
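To make the comparison explicit, here is a small Python sketch that turns the two BackupPC summaries above into MB/s, GB/hour and days. It assumes 1 GB = 1000 MB; the function name is just for illustration.

```python
def backup_rates(minutes, megabytes):
    """Convert a BackupPC "N minutes for M MB of data" summary into rates."""
    mb_per_min = megabytes / minutes
    return {
        "MB/min": mb_per_min,
        "MB/s": mb_per_min / 60,
        "GB/hour": mb_per_min * 60 / 1000,  # assumes 1 GB = 1000 MB
        "days": minutes / (60 * 24),
    }


if __name__ == "__main__":
    summaries = {
        "typical": (769.3, 675677.3),  # the "perfectly typical" full backup
        "big": (5502.8, 2434613.6),    # the >3TB server
    }
    for name, (minutes, megabytes) in summaries.items():
        r = backup_rates(minutes, megabytes)
        print(f"{name:8s} {r['MB/min']:6.0f} MB/min {r['MB/s']:5.1f} MB/s "
              f"{r['GB/hour']:5.1f} GB/h {r['days']:5.2f} days")
```

For the big server this works out to roughly 7.4MB/s, about half the rate of the typical host, sustained over about 3.8 days.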
First, a quick look at the client shows that we can eliminate it
completely. I have checked the performance of several clients while a
backup is running, and none of them is CPU, I/O or memory bound
whatsoever. Here is a typical example: a Windows Server 2008 machine. Task
Manager shows minimal load across the board: between 0% and 20% CPU usage
(with most time below 5%), and more than 1GB of 2GB RAM free (with 1300MB
of cached memory). Network utilization is absolutely flatlined! A quick
sanity check of the client's physical drive lights shows that drive activity
comes in brief fits and starts. This system is *clearly* not being taxed.
By the way, this contrasts with the beginning of the backup, when rsync is
building the file list: the rsync daemon's CPU usage bounces around with
peaks over 70%, and the drives blink constantly during this process, so the
client is perfectly capable of doing something when it's asked to!
The server side, though, shows something completely different. Here are a
few lines from dstat:
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|   in   out|  int   csw
 33   2  64   1   0   0|  22M   47k|    0     0|    0     0| 1711   402
 43   3  49   6   0   0|  40M  188k|  35k 1504B|    0     0| 2253   632
 45   4  49   1   0   1|  50M   36k|  38k 1056B|    0     0| 2660   909
 46   4  50   0   0   0|  46M     0|  55k 1754B|    0     0| 2540   622
 45   4  50   1   0   0|  45M   12k| 120B  314B|    0     0| 2494   708
 43   3  50   3   0   0|  42M     0|  77k 1584B|    0     0| 2613   958
 41   4  47   8   0   0|  50M  268k| 449B  356B|    0     0| 2333   704
 46   3  50   1   0   0|  42M   36k|  26k 1122B|    0     0| 2583   771
 45   4  50   1   0   0|  40M     0|  30k  726B|    0     0| 2499   681
It looks like everything is underutilized. For example, I'm getting a
measly 40-50MB/s of read performance from my array of four drives, and
*nothing* is going out over the network. My physical drive and network
lights echo this: they are *not* busy. My interrupts are certainly
manageable and context switches are very low. Even my CPU numbers look
tremendous: nearly no time in wait, and about 50% CPU idle!
Ah, but there's a problem with that. This is a dual-core system. Any
time you see a dual-core system that is stuck at 50% CPU utilization, you
can bet big that you have a single process that is using 100% of the CPU
of a single core, and the other core is sitting there idle. That's
exactly what's happening here.
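One way to check this without eyeballing top is to sample the per-CPU counters in /proc/stat twice and compare them. The Python sketch below does that (Linux only; the function names are hypothetical). In top itself, pressing 1 toggles a per-CPU breakdown. Keep in mind that the overall idle figure is averaged over all *logical* CPUs, so on a machine with N logical CPUs a single pegged process shows up as roughly 100/N percent of the total.

```python
import time


def parse_cpu_times(lines):
    """Return {"cpuN": (busy, total)} jiffy counters from /proc/stat lines."""
    times = {}
    for line in lines:
        fields = line.split()
        # Skip the aggregate "cpu" line and non-CPU lines; keep "cpu0", "cpu1", ...
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        values = [int(v) for v in fields[1:]]
        # Columns: user nice system idle iowait irq softirq steal [guest guest_nice].
        # Guest time is already included in user/nice, so only the first 8 are summed.
        total = sum(values[:8])
        idle = values[3] + (values[4] if len(values) > 4 else 0)  # idle + iowait
        times[fields[0]] = (total - idle, total)
    return times


def usage_between(before, after):
    """Percent busy per CPU between two parse_cpu_times() snapshots."""
    usage = {}
    for cpu, (busy0, total0) in before.items():
        if cpu not in after:  # CPU went offline between samples
            continue
        busy1, total1 = after[cpu]
        elapsed = total1 - total0
        usage[cpu] = 100.0 * (busy1 - busy0) / elapsed if elapsed else 0.0
    return usage


def sample_per_cpu(interval=2.0, path="/proc/stat"):
    with open(path) as f:
        before = parse_cpu_times(f)
    time.sleep(interval)
    with open(path) as f:
        after = parse_cpu_times(f)
    return usage_between(before, after)


if __name__ == "__main__":
    for cpu, pct in sorted(sample_per_cpu().items(), key=lambda kv: int(kv[0][3:])):
        print(f"{cpu}: {pct:5.1f}% busy")
```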
Notice what top shows us:
top - 13:21:27 up 49 min,  1 user,  load average: 2.07, 1.85, 1.67
Tasks: 167 total,   2 running, 165 sleeping,   0 stopped,   0 zombie
Cpu(s): 43.7%us,  3.6%sy,  0.0%ni, 50.5%id,  2.1%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:   3924444k total,  3774644k used,   149800k free,     9640k buffers
Swap:        0k total,        0k used,        0k free,  3239600k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1731 backuppc  20   0  357m 209m 1192 R 95.1  5.5  35:58.08 BackupPC_dump
 1679 backuppc  20   0  360m 211m 1596 D 92.1  5.5  32:54.18 BackupPC_dump
My load average is 2, and you can see those two processes: two instances
of BackupPC_dump. *Each* of them is using nearly 100% of the CPU given to
it, but they're both using the *same* CPU (core), which is why I have 50%
idle!
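To confirm which core each BackupPC_dump is actually running on, Linux exposes the CPU a task last ran on as field 39 ("processor") of /proc/&lt;pid&gt;/stat, and Python's os.sched_getaffinity() reports which CPUs a process is allowed to use. The sketch below reads both; it is worth sampling a few times, since the scheduler can move processes between cores. The script name and the PIDs in the usage comment are illustrative (the PIDs are the ones from the top output above).

```python
import os
import sys


def last_cpu(stat_line):
    """Return field 39 ("processor", CPU last run on) of a /proc/<pid>/stat line."""
    # Field 2 (comm) is in parentheses and may contain spaces, so split after the last ')'.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field 39 is rest[36].
    return int(rest[36])


def describe(pid):
    with open(f"/proc/{pid}/stat") as f:
        cpu = last_cpu(f.read())
    allowed = sorted(os.sched_getaffinity(pid))  # Linux-only call
    return f"pid {pid}: last ran on CPU {cpu}, allowed CPUs {allowed}"


if __name__ == "__main__":
    # Hypothetical usage: python3 whichcpu.py 1731 1679
    for arg in sys.argv[1:]:
        print(describe(int(arg)))
```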
Mark Coetser, can you see what top shows for the CPU utilization for your
system while doing a backup? Don't just look at the single "idle" or
"user" numbers: look at each BackupPC process as well, and let us know
what they are--and how many physical (and hyper-threaded) cores you have.
Additional info can be found in /proc/cpuinfo if you don't know the
answers.
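When reading /proc/cpuinfo, the number of "processor" entries is the count of logical CPUs, while distinct ("physical id", "core id") pairs give the physical cores; if the two numbers differ, Hyper-Threading is on. A small Python sketch, assuming the x86 field names (other architectures may not report them, in which case it falls back to the logical count):

```python
def count_cpus(cpuinfo_text):
    """Return (physical_cores, logical_cpus) from the text of /proc/cpuinfo."""
    logical = 0
    cores = set()
    for stanza in cpuinfo_text.strip().split("\n\n"):  # one stanza per logical CPU
        fields = {}
        for line in stanza.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue
        logical += 1
        if "core id" in fields:
            cores.add((fields.get("physical id", "0"), fields["core id"]))
    # Fall back to the logical count when core topology is not reported.
    return (len(cores) if cores else logical), logical


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        physical, logical = count_cpus(f.read())
    print(f"{physical} physical cores, {logical} logical CPUs")
```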
To everyone: is there a way to get Perl to allow each of these items to
run on *different* processors (cores)? From a quick Google search, it
seems that the processes must be forked using Perl modules designed for
this purpose. At the moment, this is beyond my capability. Am I missing an
easier way to do this?
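As a general illustration only (in Python rather than Perl, and not a change to BackupPC itself): CPU-bound work spreads across cores when it is split among separate OS processes, which the kernel is free to schedule on different CPUs unless their affinity restricts them. In Perl the usual building blocks are fork() or the CPAN module Parallel::ForkManager. The chunked zlib workload below is hypothetical.

```python
import zlib
from multiprocessing import Pool


def compress_chunk(chunk):
    # CPU-bound work; with a Pool, each call runs in one of the worker processes.
    return zlib.compress(chunk, 3)


def parallel_compress(chunks, workers=2):
    """Compress chunks in `workers` separate processes; results keep input order."""
    with Pool(processes=workers) as pool:
        return pool.map(compress_chunk, chunks)


if __name__ == "__main__":
    # Hypothetical data: eight 4MB chunks of a repeating byte pattern.
    chunks = [bytes(range(256)) * 16384 for _ in range(8)]
    compressed = parallel_compress(chunks)
    print(sum(len(c) for c in compressed), "compressed bytes")
```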
And one more request: for those of you out there using rsync, can you
give me some examples where you are getting faster numbers? Let's say,
full backups of 100GB hosts in roughly 30-35 minutes, or 500GB hosts in
two or three hours? That's about four times faster than what I'm seeing,
and would work out to be 50-60MB/s, which seems like a much more realistic
speed. If you are seeing such speeds, can you give us an idea of your
hardware configuration, as well as an idea of the CPU utilization you're
seeing during the backups? Also, are you using compression or checksum
caching? If you need help collecting this info, I'd be happy to help you.
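The target numbers are easy to check. This sketch assumes 1 GB = 1000 MB and compares against the roughly 14.6MB/s of the typical full backup above (675677.3MB in 769.3 minutes).

```python
def required_rate_mb_s(size_gb, minutes):
    """MB/s needed to move size_gb gigabytes in the given minutes (1 GB = 1000 MB)."""
    return size_gb * 1000 / (minutes * 60)


CURRENT_MB_S = 675677.3 / 769.3 / 60  # typical full backup, about 14.6 MB/s

if __name__ == "__main__":
    for size_gb, minutes in [(100, 30), (100, 35), (500, 120), (500, 180)]:
        rate = required_rate_mb_s(size_gb, minutes)
        print(f"{size_gb} GB in {minutes} min: {rate:5.1f} MB/s "
              f"({rate / CURRENT_MB_S:.1f}x current)")
```

The results range from about 46 to 69MB/s, roughly 3 to 5 times the current rate.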
To cover a couple of other frequently suggested items, here's what I've
examined to improve this:
Yes, I have noatime. From fstab:
UUID=<snipped> /data ext4 defaults,noatime 1 2
Noatime only makes a difference when you are I/O bound--which ideally a
BackupPC server would be. In my case, it made very little difference. I'm
not I/O bound.
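Since an fstab change only takes effect when the filesystem is (re)mounted, it can be worth confirming that noatime is actually active. A small sketch that reads /proc/mounts (Linux only; /data is the mount point from the fstab line above, and the function name is hypothetical):

```python
def mount_options(mountpoint, mounts_text):
    """Return the option list for mountpoint from /proc/mounts text, or None."""
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Format: device mountpoint fstype options dump pass
        if len(fields) >= 4 and fields[1] == mountpoint:
            options = fields[3].split(",")  # keep the last match (top-most mount)
    return options


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        opts = mount_options("/data", f.read())
    print("noatime active" if opts and "noatime" in opts else f"options: {opts}")
```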
I am using EXT4. I have gotten very similar performance with EXT3. I have
not tried XFS or JFS, but would *really* prefer to keep my backups on the
extremely well-known and well-supported EXT series.
I am using compression on this BackupPC server. Obviously, this may
contribute to the CPU consumption. My old servers did not have
compression, but had terrible VIA C3 single-core processors. And their
backup performance was quite similar. I figured with the Atom D510 I'd be
OK with compression. But maybe not. I'll try to see if I can do some
testing with some smaller hosts without compression and see what happens.
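One quick way to estimate the compression cost is to measure how fast a single core can run zlib. This assumes BackupPC's compression is zlib-based (it uses Perl's Compress::Zlib) and that the compress level is the recommended value of 3; Python's zlib binding stands in for the Perl one, and the mail-like sample data is synthetic.

```python
import time
import zlib


def zlib_mb_per_s(data, level=3, repeats=3):
    """Best-of-N single-core zlib compression speed in MB/s (1 MB = 10**6 bytes)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        zlib.compress(data, level)
        best = min(best, time.perf_counter() - start)
    return len(data) / max(best, 1e-9) / 1e6


if __name__ == "__main__":
    # Synthetic, maildir-like sample of a few MB; real mail will compress differently.
    sample = b"".join(
        b"Message-ID: <%d@example.invalid>\nSubject: report %d\n\n" % (i, i)
        + b"Lorem ipsum dolor sit amet. " * 8 + b"\n"
        for i in range(20000)
    )
    for level in (1, 3, 6):
        print(f"zlib level {level}: {zlib_mb_per_s(sample, level):6.1f} MB/s")
```

If the single-core figure lands in the same range as the observed backup rate, compression (and, on rsync fulls, likely the matching decompression of pool files) deserves a closer look.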
As for checksum caching: as I mentioned, I think there is real value in
leaving it off. But I look forward to seeing the performance others are
getting and how it compares, to see what performance cost this protection
comes at.
Thank you very much for your help!
Timothy J. Massey
Out of the Box Solutions, Inc.
Creative IT Solutions Made Simple!
http://www.OutOfTheBoxSolutions.com
[email protected]
22108 Harper Ave.
St. Clair Shores, MI 48080
Office: (800)750-4OBS (4627)
Cell: (586)945-8796
_______________________________________________
BackupPC-users mailing list
[email protected]
List: https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki: http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/