I let the most recent backup 'finish' on its own. It shows up as a partial
backup on the host backup summary page, with the following errors:
Read EOF:
Tried again: got 0 bytes
finish: removing in-process file path/to/filename.ext
Can't write 4 bytes to socket
Child is aborting
Done: 229002 files, 82767774899 bytes
Got fatal error during xfer (aborted by signal=PIPE)
Backup aborted by user signal
Note that this backup ran well past the default ClientTimeout of 72000
seconds (20 hours), and the in-process file it reports removing is only
114MB, so I don't think it was hanging on a large file.
Type     Filled  Level  Start Date   Duration/mins  Age/days
...
full     yes     0      12/22 20:00          205.4      49.9
full     yes     0      12/29 21:00          136.2      42.8
full     yes     0      1/5 21:00            336.4      35.8
incr     no      1      1/10 21:00             0.1      30.8
incr     no      1      1/11 22:01             0.1      29.8
partial  yes     0      1/28 02:00         17136.1      13.6
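For scale, the durations in the summary above can be set against the timeout. This is a minimal Python sketch, assuming the default ClientTimeout of 72000 seconds mentioned in this message. The rows are copied by hand from the table, and the helper name is made up for illustration. It only compares wall-clock duration with the configured value; it does not model how or when BackupPC actually arms or resets the timer.

```python
# Default ClientTimeout quoted in this message, in seconds.
CLIENT_TIMEOUT_SECS = 72000

# (type, start, duration in minutes), copied from the summary table above.
BACKUPS = [
    ("full", "12/22 20:00", 205.4),
    ("full", "12/29 21:00", 136.2),
    ("full", "1/5 21:00", 336.4),
    ("incr", "1/10 21:00", 0.1),
    ("incr", "1/11 22:01", 0.1),
    ("partial", "1/28 02:00", 17136.1),
]


def timeout_ratio(duration_mins, timeout_secs=CLIENT_TIMEOUT_SECS):
    """Return how many multiples of the timeout a backup ran for."""
    return duration_mins * 60.0 / timeout_secs


for kind, start, mins in BACKUPS:
    ratio = timeout_ratio(mins)
    flag = "OVER" if ratio > 1.0 else "ok"
    print(f"{kind:8s} {start:12s} {mins / 60.0:8.1f} h  {ratio:6.2f}x timeout  {flag}")
```

Only the partial backup exceeds the timeout: 17136.1 minutes is about 285.6 hours, roughly 14 times the 20-hour value.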
Looking a little further in the past, the results of the other node's
partial backup are a little bit different:
Remote[1]: rsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at rsync.c(543) [sender=3.0.7]
Can't write 32780 bytes to socket
Read EOF: Connection reset by peer
Tried again: got 0 bytes
finish: removing in-process file path/to/filename.ext
Child is aborting
Done: 32547 files, 30060082211 bytes
Got fatal error during xfer (aborted by signal=PIPE)
Backup aborted by user signal
The file it choked on was only 25MB.
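To compare the two aborted runs side by side, the totals and the abort signal can be pulled out of the log text. This is a small Python sketch. The function and constant names are only for illustration, and the regular expressions are fitted to the lines quoted in this message rather than to any documented BackupPC log format.

```python
import re

# Patterns fitted to the log lines quoted in this message (an assumption,
# not a documented format).
DONE_RE = re.compile(r"Done: (\d+) files, (\d+) bytes")
SIGNAL_RE = re.compile(r"aborted by signal=(\w+)")


def summarize(log_text):
    """Extract file count, byte count and abort signal from a log excerpt."""
    done = DONE_RE.search(log_text)
    sig = SIGNAL_RE.search(log_text)
    return {
        "files": int(done.group(1)) if done else None,
        "bytes": int(done.group(2)) if done else None,
        "signal": sig.group(1) if sig else None,
    }


# Excerpts copied from the two logs above.
FIRST_NODE = """Child is aborting
Done: 229002 files, 82767774899 bytes
Got fatal error during xfer (aborted by signal=PIPE)
"""
SECOND_NODE = """Child is aborting
Done: 32547 files, 30060082211 bytes
Got fatal error during xfer (aborted by signal=PIPE)
"""

for name, log in (("first node", FIRST_NODE), ("second node", SECOND_NODE)):
    s = summarize(log)
    gib = s["bytes"] / 2**30
    print(f"{name}: {s['files']} files, {gib:.1f} GiB, signal={s['signal']}")
```

Both runs end with the same signal (PIPE) but at very different points, roughly 77 GiB and 28 GiB into the transfer.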
Has anybody else had a server whose remote backups never finish? The odd
thing to me is that if I bring the remote backup server local and take a
full backup of the server, subsequent remote backups of that server
succeed. (Other servers run remote backups without these issues.) Any
help here is appreciated. Maybe I'm just overlooking something simple,
but I haven't made any progress on this issue for some time, and I've
searched the mailing list without finding a solution.
Could this be an issue with an older version (we're running BackupPC
3.0.0)? Could it be related to TCP segmentation offload (set to 'on' on
both the backup client and the backup server)? Could it be a
compatibility issue between rsync versions? The backup servers are
running rsync 2.6.9 (protocol version 29) and both clients are running
3.0.7 (protocol version 30). AFAIK the newer version should be backwards
compatible, no? Is this setup confusing? Have I explained the issue well
enough?
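On the segmentation offload question, one way to confirm the current setting is to read it from `ethtool -k`. This is a rough Python sketch. The interface name eth0 is an assumption. The parser accepts both the "tcp segmentation offload" and "tcp-segmentation-offload" spellings, since the output format differs between ethtool versions.

```python
import subprocess


def parse_tso(ethtool_output):
    """Return True/False for TSO on/off, or None if no matching line is found."""
    for line in ethtool_output.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        # Normalise "tcp-segmentation-offload" and "tcp segmentation offload".
        if key.strip().lower().replace("-", " ") == "tcp segmentation offload":
            return value.strip().lower().startswith("on")
    return None


def tso_enabled(interface="eth0"):
    """Run `ethtool -k <interface>` and report the TSO setting.

    The default interface name is a placeholder; substitute the real one.
    """
    result = subprocess.run(
        ["ethtool", "-k", interface],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_tso(result.stdout)
```

To test whether offload is involved, it can be switched off temporarily on one side with `ethtool -K eth0 tso off` (again assuming eth0) before retrying a remote backup.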
Scott
On 2/7/2011 2:46 PM, Scott Saunders wrote:
I've got a couple of servers running in a two-node master/slave cluster
using Pacemaker (Corosync)/DRBD. Like our other servers, they are
configured to back up to a local BackupPC server as well as a remote (VPN
over T1) BackupPC server (rsync over ssh for both). However, with the
cluster, only the master node has the partition mounted that is to be
backed up, so the backups for the slave node will always fail. This is
OK, but maybe there is a better way to do this? Anyway, to get the
backups started I brought the remote backup server local to take a full
backup (because it is ~300GB). After a failover, the slave becomes the
new master, gets the partition mounted and thus has something to back
up. The local backups work without a problem on the new master. The
remote backups act as if they are working on the new master, but never
actually finish. I've let them run for more than a week, which is well
past the default client timeout; that timeout has never taken effect on
these two boxes. This behavior persists after failing back over to the
original master. The only way I can get the remote backups going again
is to bring the remote server local for a full backup. Subsequent remote
backups then work until the next failover. Remote backups for other
servers have run without these issues. Any ideas as to why the remote
backup has problems in this setup? And what might I try to get the
backups running again on the master node after a failover without having
to bring the remote server local every time?
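On the question of a better way to handle the node that does not have the partition mounted, one option is a pre-backup check that fails fast when the backup source is not a mount point. This is a minimal Python sketch meant to run on the client. The function names are hypothetical. Wiring it into BackupPC, for example through $Conf{DumpPreUserCmd} together with $Conf{UserCmdCheckStatus}, is an assumption that should be checked against the documentation for the installed version.

```python
import os


def backup_source_ready(path):
    """Return True only if path exists and is itself a mount point."""
    return os.path.ismount(path)


def exit_status(path):
    """Map the check to a shell-style status: 0 if ready, 1 otherwise."""
    return 0 if backup_source_ready(path) else 1


# Example wiring for a command-line wrapper (the path is a placeholder):
#   import sys
#   sys.exit(exit_status("/path/to/drbd/mount"))
```

With a check like this, the slave node's backup could be skipped cleanly instead of failing partway. It does not address why the remote backups hang after a failover.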
------------------------------------------------------------------------------
The ultimate all-in-one performance toolkit: Intel(R) Parallel Studio XE:
Pinpoint memory and threading errors before they happen.
Find and fix more than 250 security defects in the development cycle.
Locate bottlenecks in serial and parallel code that limit performance.
http://p.sf.net/sfu/intel-dev2devfeb
_______________________________________________
BackupPC-users mailing list
[email protected]
List: https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki: http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/