Hi,

Please excuse me if I am using this wrong. In all my years in IT, this seems to 
be the first time I have used a mailing list for support.  (I'm usually pretty 
good at the whole RTFM thing.)

We have a backup box (FC6) that runs backups of a lot of Windows servers 
using rsync.
We are not running the latest version of BackupPC.  I am reluctant to update 
it unless I have a good idea that it's going to help, since we have a lot of 
automated scripts that manage the BackupPC config files and we would need to 
review them all.  If this is the route we need to go then no problem, but as I 
understand it, my problem is specifically related to rsync.  (Please correct me 
if I am wrong.)

We have been running this for well over a year now (maybe a few years, my 
memory fails me) and iirc this problem has only started showing up over the 
past 6-12 months.  We do not have any automatic updates on the BackupPC box so 
nothing there should have really changed.

Our backup box resides on what we call our CORE network.  The servers it backs 
up are all on remote networks, which are connected to the CORE network with 
VPNs.  (The VPNs run over DSL.)  Backups do take a long time to run (~10 
hours or so) due to the amount of data.

The problem we are seeing is that backups are randomly failing.
The log file on BackupPC shows something like this:

Connected to xxx.xxx.xxx.xxx:873, remote version 29
Negotiated protocol version 26
Connected to module kale-susl
Sending args: --server --sender --numeric-ids --perms --owner --group -D 
--links --times --block-size=2048 --recursive --ignore-times . .
Xfer PIDs are now 5220
[ skipped 971 lines ]
Read EOF: Connection reset by peer
Tried again: got 0 bytes
finish: removing in-process file Data/Apps/GoldMine/GMBase/ScriptsW.MDX
Child is aborting
Parent read EOF from child: fatal error!
Done: 923 files, 3041217353 bytes
Got fatal error during xfer (Child exited prematurely)
Backup aborted (Child exited prematurely)

The log on the Windows server is:

2008/11/18 17:46:05 [3252] connect from UNKNOWN (xxx.xxx.xxx.xxx)
2008/11/18 17:46:05 [3252] rsync on . from [EMAIL PROTECTED] (xxx.xxx.xxx.xxx)
2008/11/18 17:46:05 [3252] building file list
2008/11/18 18:03:14 [3252] rsync: writefd_unbuffered failed to write 4092 bytes 
[sender]: Connection reset by peer (104)
2008/11/18 18:03:14 [3252] rsync error: error in rsync protocol data stream 
(code 12) at /home/lapo/packaging/tmp/rsync-2.6.9/io.c(1122) [sender=2.6.9]
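To put a number on how long a session survives before the reset, the timestamps 
in the rsync daemon log can be compared per process ID.  Here is a minimal 
Python sketch, assuming the log format shown above.  The function name 
session_durations is hypothetical and is not part of rsync or BackupPC.

```python
import re
from datetime import datetime

# Matches rsync daemon log lines such as:
#   2008/11/18 17:46:05 [3252] connect from UNKNOWN (xxx.xxx.xxx.xxx)
LINE_RE = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[(\d+)\] (.*)$")


def session_durations(lines):
    """Return a list of (pid, seconds) for sessions that ended in 'rsync error'.

    The duration is measured from the 'connect from' line to the first
    'rsync error' line logged by the same process ID.
    """
    started = {}   # pid -> datetime of the most recent 'connect from'
    results = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # blank line or the wrapped tail of a long log line
        stamp = datetime.strptime(match.group(1), "%Y/%m/%d %H:%M:%S")
        pid, message = match.group(2), match.group(3)
        if message.startswith("connect from"):
            started[pid] = stamp
        elif message.startswith("rsync error") and pid in started:
            # pop() so a later reuse of the same pid starts a fresh session
            begin = started.pop(pid)
            results.append((pid, int((stamp - begin).total_seconds())))
    return results


# The excerpt from this post, including the wrapped continuation lines.
SAMPLE = [
    "2008/11/18 17:46:05 [3252] connect from UNKNOWN (xxx.xxx.xxx.xxx)",
    "2008/11/18 17:46:05 [3252] building file list",
    "2008/11/18 18:03:14 [3252] rsync: writefd_unbuffered failed to write 4092 bytes",
    "[sender]: Connection reset by peer (104)",
    "2008/11/18 18:03:14 [3252] rsync error: error in rsync protocol data stream",
    "(code 12) at /home/lapo/packaging/tmp/rsync-2.6.9/io.c(1122) [sender=2.6.9]",
]

if __name__ == "__main__":
    for pid, seconds in session_durations(SAMPLE):
        print(f"pid {pid}: failed after {seconds} s")
```

For the excerpt above this gives 1029 seconds, i.e. the session was reset 17 
minutes 9 seconds after it connected.  Running it over a full log would show 
whether the failures cluster around a particular duration.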


I have been trying to work this out for a month or two now.
The problems seem to be random, but more common on specific servers.
There is nothing special about these specific servers; the failures seem 
random but persistent.

Originally, we were running the recommended rsync package for BackupPC.

After looking into the problem over the past month, I have seen a lot of posts 
suggesting that this was a common problem with a particular build of rsync.

I have updated rsync on the BackupPC box; "rpm -q rsync" currently replies:

rsync-2.6.9-5.fc8     (yes, the only updated rpm I could find was an fc8 one)

On a few select servers (including the one that generated the above logs) I 
set up Cygwin directly and added rsync to it with the installer wizard.
I selected rsync 2.6.9 rather than 3.x.x as I assumed this would be required 
for compatibility.

These seem to be the only recommendations I can find for fixing this problem 
(updating rsync).
Sadly, it has not helped me so far.

The connection between the BackupPC server and the example server used for the 
above logs is over a VPN like the rest of the servers, but this server is local 
and the VPN operates over local Ethernet links (i.e. stable links).

I have tried and tried to verify as much as I can that there are no network/VPN 
dropouts at the times that this is failing, and I'm pretty sure there are not.  
It sometimes fails within 3 minutes of the job starting, other times after 
hours.  I know I have had a remote desktop session open to the server and been 
actively using it at the time it failed, and I noticed absolutely no disturbance 
in my RD session.  (You would expect at least a short pause there if there was 
a brief disruption to the connection.)
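A more systematic check than watching a remote desktop session would be to log 
a timestamped TCP probe to the rsync daemon port (873, as in the BackupPC log 
above) and line the results up against the failure times.  Below is a minimal 
Python sketch of that idea.  The names probe and watch, the 30 second interval 
and the 5 second timeout are all assumptions chosen for illustration.  Note 
that it only tests whether a new connection can be opened; it says nothing 
about whether a long-lived connection is being reset somewhere along the path.

```python
import socket
import time


def probe(host, port=873, timeout=5.0):
    """Attempt one TCP connect and return (ok, seconds_taken)."""
    start = time.monotonic()
    try:
        # create_connection raises OSError on refusal, timeout or no route
        with socket.create_connection((host, port), timeout=timeout):
            ok = True
    except OSError:
        ok = False
    return ok, time.monotonic() - start


def watch(host, port=873, interval=30.0, rounds=None):
    """Print one timestamped line per probe; loop forever if rounds is None."""
    done = 0
    while rounds is None or done < rounds:
        ok, took = probe(host, port)
        status = "ok" if ok else "FAILED"
        print(time.strftime("%Y/%m/%d %H:%M:%S"), host, port, status, f"{took:.3f}s")
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)


if __name__ == "__main__":
    # Placeholder address; replace with the server being backed up.
    watch("127.0.0.1", rounds=1)
```

The timestamps use the same layout as the rsync daemon log, so the two can be 
compared side by side.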

I am at a loss and I am really hoping that someone will be able to show me a 
way to further my research into what is causing this problem so that I can 
hopefully isolate the issue and resolve it.

I have said above that this seems to be most prominent on specific servers 
(about 7 of them) but it IS happening occasionally on all of our servers.
We do get drops on our VPNs from time to time but we have a monitoring system 
in place that alerts us immediately about this.
These drops probably average about one every 3 months per VPN.  The rsync 
failures are daily, and usually several per day.

Any help would be very much appreciated.

Kind Regards,

James Sefton
Phase 5 Communications Ltd. (UK)