Hi,
Please excuse me if I am using this wrong. In all my years in IT, this seems
to be the first time I have used a mailing list for support. (I'm usually
pretty good at the whole RTFM thing.)
We have a backup box (FC6) that is running backups from a lot of Windows
servers using rSync.
We are not running the latest version of BackupPC. I am reluctant to update
this unless I have a good idea that it's going to help since we have a lot of
automated scripts that manage the BackupPC config files and we will need to
review them all. If this is the route we need to go then no problem but as I
understand it, my problem is specifically related to rSync. (please correct me
if I am wrong.)
We have been running this for well over a year now (maybe a few years, my
memory fails me) and iirc this problem has only started showing up over the
past 6-12 months. We do not have any automatic updates on the BackupPC box so
nothing there should have really changed.
Our backup box resides on what we call our CORE network. The servers it backs
up are all on remote networks, which are connected to the CORE network by
VPNs. (The VPNs run over DSL.) Backups do take a long time to run (~10
hours or so) due to the amount of data.
The problem we are seeing is that backups are randomly failing.
The log file on BackupPC shows something like this:
Connected to xxx.xxx.xxx.xxx:873, remote version 29
Negotiated protocol version 26
Connected to module kale-susl
Sending args: --server --sender --numeric-ids --perms --owner --group -D
--links --times --block-size=2048 --recursive --ignore-times . .
Xfer PIDs are now 5220
[ skipped 971 lines ]
Read EOF: Connection reset by peer
Tried again: got 0 bytes
finish: removing in-process file Data/Apps/GoldMine/GMBase/ScriptsW.MDX
Child is aborting
Parent read EOF from child: fatal error!
Done: 923 files, 3041217353 bytes
Got fatal error during xfer (Child exited prematurely)
Backup aborted (Child exited prematurely)
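
To compare how far each failed run got before it died, the summary lines of the BackupPC log can be pulled out with a short script. This is only a minimal sketch: it assumes the log text has already been extracted to plain text and that the "Done:" and "Got fatal error" lines always look like the ones above. The function name summarize_xfer_log is made up for illustration and is not part of BackupPC.

```python
import re

# Patterns modelled on the two summary lines in the excerpt above (assumed stable).
DONE_RE = re.compile(r"^Done: (\d+) files, (\d+) bytes")
FATAL_RE = re.compile(r"^Got fatal error during xfer \((.*)\)")


def summarize_xfer_log(text):
    """Return file count, byte count and fatal-error reason found in a log excerpt."""
    summary = {"files": None, "bytes": None, "fatal": None}
    for raw in text.splitlines():
        line = raw.strip()
        done = DONE_RE.match(line)
        if done:
            summary["files"] = int(done.group(1))
            summary["bytes"] = int(done.group(2))
            continue
        fatal = FATAL_RE.match(line)
        if fatal:
            summary["fatal"] = fatal.group(1)
    return summary


# Tail of the log excerpt quoted above.
SAMPLE = """Read EOF: Connection reset by peer
Tried again: got 0 bytes
Child is aborting
Parent read EOF from child: fatal error!
Done: 923 files, 3041217353 bytes
Got fatal error during xfer (Child exited prematurely)
Backup aborted (Child exited prematurely)
"""

if __name__ == "__main__":
    print(summarize_xfer_log(SAMPLE))
```

Running this over the logs of several failed backups would show whether the failures cluster around a particular file count or byte count, or are spread out.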
The log on the Windows server is:
2008/11/18 17:46:05 [3252] connect from UNKNOWN (xxx.xxx.xxx.xxx)
2008/11/18 17:46:05 [3252] rsync on . from [EMAIL PROTECTED] (xxx.xxx.xxx.xxx)
2008/11/18 17:46:05 [3252] building file list
2008/11/18 18:03:14 [3252] rsync: writefd_unbuffered failed to write 4092 bytes
[sender]: Connection reset by peer (104)
2008/11/18 18:03:14 [3252] rsync error: error in rsync protocol data stream
(code 12) at /home/lapo/packaging/tmp/rsync-2.6.9/io.c(1122) [sender=2.6.9]
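
The time between a connection starting and the error appearing can be worked out from the timestamps in this rsync daemon log. The following is a rough sketch only, assuming every real log entry starts with "YYYY/MM/DD HH:MM:SS [pid]" as in the excerpt above and that wrapped continuation lines carry no timestamp. The function name seconds_to_failure is hypothetical.

```python
import re
from datetime import datetime

# Matches entries such as "2008/11/18 17:46:05 [3252] building file list".
LINE_RE = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[(\d+)\] (.*)$")


def seconds_to_failure(log_text):
    """Map each daemon PID to the seconds between its first entry and its first 'rsync error'."""
    first_seen = {}
    failures = {}
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # continuation of a wrapped entry, no timestamp to read
        stamp = datetime.strptime(match.group(1), "%Y/%m/%d %H:%M:%S")
        pid = int(match.group(2))
        first_seen.setdefault(pid, stamp)
        if match.group(3).startswith("rsync error") and pid not in failures:
            failures[pid] = (stamp - first_seen[pid]).total_seconds()
    return failures


# The excerpt quoted above, including its wrapped lines.
SAMPLE = """2008/11/18 17:46:05 [3252] connect from UNKNOWN (xxx.xxx.xxx.xxx)
2008/11/18 17:46:05 [3252] rsync on . from [EMAIL PROTECTED] (xxx.xxx.xxx.xxx)
2008/11/18 17:46:05 [3252] building file list
2008/11/18 18:03:14 [3252] rsync: writefd_unbuffered failed to write 4092 bytes
[sender]: Connection reset by peer (104)
2008/11/18 18:03:14 [3252] rsync error: error in rsync protocol data stream
(code 12) at /home/lapo/packaging/tmp/rsync-2.6.9/io.c(1122) [sender=2.6.9]
"""

if __name__ == "__main__":
    for pid, seconds in seconds_to_failure(SAMPLE).items():
        print(pid, seconds)
```

For the excerpt above this gives 1029 seconds (just over 17 minutes) for PID 3252. Collecting the same figure across many failures might show whether they cluster around a fixed interval, which would hint at a timeout somewhere, or are genuinely random.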
I have been trying to work this out for a month or two now.
The problems seem to be random, but more common on specific servers.
There is nothing special about these specific servers - the failures seem
random but persistent.
Originally, we were running the recommended rSync package for BackupPC.
After looking into the problem over the past month, I have seen a lot of posts
suggesting that this was a common problem with a particular build of rSync.
I have updated rSync on the BackupPC box; "rpm -q rsync" currently replies...
rsync-2.6.9-5.fc8 (yes, the only updated rpm I could find was an fc8 one)
On a few select servers (including the one that generated the above logs) I
set up Cygwin directly and added rSync to it with the installer wizard.
I selected rSync 2.6.9 rather than 3.x.x as I assumed this would be required
for compatibility.
These seem to be the only recommendations I can find for fixing this problem.
(updating rSync)
Sadly, it has not helped me so far.
The connection between the BackupPC server and the example server used for the
above logs is a VPN, like the rest of the servers, but this server is local and
the VPN operates over local Ethernet links (i.e. stable links).
I have tried and tried to verify as much as I can that there are no network/VPN
dropouts at the times that this is failing, and I'm pretty sure there are not.
It sometimes fails within 3 minutes of the job starting, other times after
hours. I know I have had a remote desktop session open to the server and been
actively using it at the time it failed, and I noticed absolutely no
disturbance in my RD session. (You would expect at least a short pause there
if there was a brief disruption to the connection.)
I am at a loss and I am really hoping that someone will be able to show me a
way to further my research into what is causing this problem so that I can
hopefully isolate the issue and resolve it.
I have said above that this seems to be most prominent on specific servers
(about 7 of them) but it IS happening occasionally on all of our servers.
We do get drops on our VPNs from time to time, but we have a monitoring system
in place that alerts us immediately about this.
These drops probably average about one every 3 months, per VPN. The rSync
failures are daily, and usually several per day.
Any help would be very much appreciated.
Kind Regards,
James Sefton
Phase 5 Communications Ltd. (UK)
_______________________________________________
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List: https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki: http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/