This is a resend; the original apparently went missing.
On Thu, Oct 25, 2007 at 02:10:12PM -0500, Les Mikesell wrote:
> John Rouillard wrote:
> > 2007-10-25 10:54:00 Aborting backup up after signal PIPE
> > 2007-10-25 10:54:01 Got fatal error during xfer (aborted by signal=PIPE)
>
> This means the problem is on the other end of the link - or at least
> that the ssh driving it exited.
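Here is a small illustration of why signal=PIPE points at the far end. On Unix, writing to a pipe after the reading process has exited fails with EPIPE, or delivers SIGPIPE if the signal isn't ignored. The sketch below is in Python, which ignores SIGPIPE by default, so the same condition shows up as BrokenPipeError rather than as a signal. The helper name is hypothetical, and `true` only stands in for an ssh/rsync process that has already died.

```python
import subprocess

def write_after_reader_exits(payload=b"x" * 65536):
    """Hypothetical demo: write to the stdin of a child that has already exited."""
    # "true" exits immediately without reading stdin, standing in for a
    # remote ssh/rsync process that has gone away.
    proc = subprocess.Popen(["true"], stdin=subprocess.PIPE)
    proc.wait()  # the only read end of the pipe is now closed
    try:
        proc.stdin.write(payload)
        proc.stdin.flush()
        return "write succeeded"
    except BrokenPipeError:
        return "EPIPE: reader is gone"
    finally:
        try:
            proc.stdin.close()
        except BrokenPipeError:
            pass  # close() retries the flush and can hit EPIPE again
```

Judging by the log message, BackupPC (in Perl) sees the actual signal instead, which is where "aborted by signal=PIPE" comes from.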
Hmm, ok, what happens if I add -v to the remote rsync args? Will the
extra output in the rsync stream screw things up? Maybe I can use:
rsync .... -v ... 2> /tmp/rsync.log
to get rsync-level debugging output without sending it to
BackupPC.
I'll also try adding -o ServerAliveInterval=30 and -vvv to the ssh command,
to see whether that improves the reliability of the ssh session and to get
some diagnostic output. ssh's -v sends its debugging output to stderr, so I
can grab it with:
ssh -v concord 2> /tmp/log
Does BackupPC use the stderr of the remote command for anything?
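Putting those two ideas together, here is a minimal sketch of the command line I have in mind, written as a Python helper. build_ssh_argv and its defaults are hypothetical. It drops -q, because that option suppresses most of ssh's warnings and diagnostics. It then adds ServerAliveInterval and -vvv and appends the stderr redirect to the remote command string. Because ssh hands that string to the remote shell, the `2>` applies to rsync on the far side, while stdout, which carries the rsync protocol, is left alone. The redirect is performed by the remote login shell (the backup user), not by sudo, so that user needs write access to the log path. I believe the BackupPC 3.x equivalent is an edit to $Conf{RsyncClientCmd}, but I haven't verified that.

```python
import shlex

def build_ssh_argv(host, user, rsync_args, remote_log="/tmp/rsync.log",
                   alive_interval=30, ssh_verbosity=3):
    """Hypothetical helper: argv for running the rsync sender over ssh with debugging."""
    argv = ["/usr/bin/ssh", "-x", "-l", user,
            "-o", f"ServerAliveInterval={alive_interval}"]
    # Each -v raises ssh's log level; ssh writes this output to its own
    # (local) stderr, which never mixes with the rsync data on stdout.
    argv += ["-v"] * ssh_verbosity
    # ssh passes this single string to the remote shell, so "2> file" is
    # applied on the remote host: rsync's stderr goes to the file while
    # its stdout (the protocol stream) still flows back over ssh.
    remote_cmd = " ".join(["sudo", "/usr/bin/rsync"] +
                          [shlex.quote(arg) for arg in rsync_args])
    remote_cmd += " 2> " + shlex.quote(remote_log)
    argv += [host, remote_cmd]
    return argv
```

To keep ssh's own -vvv chatter out of the data stream on the local side, the caller would send its stderr to a file. For example: subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=open("/tmp/log", "w")).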
> > lastlog got digests fdb1c560d9ba822ab4ffa635d4b5f67f vs
> > fdb1c560d9ba822ab4ffa635d4b5f67f
> > create 400 0/0 65700 lastlog
> > Can't write 33932 bytes to socket
> > Sending csums, cnt = 16, phase = 1
> > Read EOF: Connection reset by peer
>
> The process on the remote side is gone at this point.
I'll buy that, but I would expect some kind of death message. A dying
gasp, if you will.
> >If I am reading this right, the last file handled before the signal is
> >/var/log/lastlog, which is << 2GB (approx. 65K). When the signal occurs,
> >I guess /var/log/ldap is the file in progress.
> >
> >The ldap file is 22GB in size:
> >
> > [EMAIL PROTECTED] log]$ ls -l ldap
> > -rw------- 1 root root 22978928497 Oct 25 18:46 ldap
> >
> >Could the size be the issue?
>
> Yes, it sounds very likely that whatever is sending the file on the
> remote side can't handle files larger than 2 gigs.
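Here is a quick way to check the size theory, sketched in Python; the helper names are hypothetical. A signed 32-bit file offset tops out at 2**31 - 1 = 2147483647 bytes, so anything larger is a candidate for this kind of failure. The walker below lists such files under a directory.

```python
import os

# Largest offset a signed 32-bit off_t can represent.
LIMIT_32BIT = 2**31 - 1

def exceeds_32bit(size):
    """True if a file of this many bytes would overflow a signed 32-bit offset."""
    return size > LIMIT_32BIT

def large_files(top):
    """Yield (path, size) for regular files under top that exceed the limit."""
    for dirpath, _dirs, files in os.walk(top):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: don't follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if exceeds_32bit(st.st_size):
                yield path, st.st_size
```

By that arithmetic, the 22978928497-byte ldap file is roughly 10.7 times the limit. Every file in the nagios backup quoted further down is under the limit, though; the largest, tmp/service-perfdata, is 1427879157 bytes.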
I just did a "sudo rsync -e ssh ops02.mht1:/var/log/ldap ." and it
completed without a problem. All 22 GB of the file transferred fine
8-(. However, I now have the same SIGPIPE issue on another host that
had been backing up fine (3 full and 3 incremental backups) until now:
incr backup started back to 2007-10-25 17:28:40 (backup #6) for
directory /var/spool/nagios
Running: /usr/bin/ssh -q -x -l backup ops01.mht1.renesys.com sudo
/usr/bin/rsync --server --sender --numeric-ids --perms --owner --group
-D --links --hard-links --times --block-size=2048 --recursive
--one-file-system --checksum-seed=32761 . /var/spool/nagios/
Xfer PIDs are now 24197
Rsync command pid is 24197
Got remote protocol 28
Negotiated protocol version 28
Checksum caching enabled (checksumSeed = 32761)
Got checksumSeed 0x7ff9
Got file list: 11 entries
Child PID is 24213
Xfer PIDs are now 24197,24213
Sending csums, cnt = 11, phase = 0
create d2775 306/200 4096 .
create p 660 306/521 0 nagios.cmd
create d2775 306/200 4096 tmp
tmp/host-perfdata got digests 46a0099d178d1b97aa39e454ae083d3f vs
46a0099d178d1b97aa39e454ae083d3f
Skipping tmp/service-perfdata.0000.bz2 (same attr)
Skipping tmp/service-perfdata.0001.gz (same attr)
Skipping tmp/service-perfdata.4.gz (same attr)
Skipping tmp/service-perfdata.5.gz (same attr)
Sending csums, cnt = 0, phase = 1
create 664 306/200 916165956 tmp/host-perfdata
tmp/nagios_daemon_pids got digests 7bfc0cffe0f114dd6eea7514c44422cd vs
7bfc0cffe0f114dd6eea7514c44422cd
create 664 306/200 6 tmp/nagios_daemon_pids
tmp/old_list got digests 0e258a7527fe053eea032e6d58f1de7c vs
0e258a7527fe053eea032e6d58f1de7c
create 664 306/200 48 tmp/old_list
Read EOF:
Tried again: got 0 bytes
Can't write 4 bytes to socket
finish: removing in-process file tmp/service-perfdata
delete 644 306/200 343155581 tmp/service-perfdata.0001.gz
delete 664 306/200 343250131 tmp/service-perfdata.5.gz
delete 644 306/200 186949772 tmp/service-perfdata.0000.bz2
delete 664 306/200 341890997 tmp/service-perfdata.4.gz
delete 644 306/200 1427879157 tmp/service-perfdata
Child is aborting
Done: 4 files, 916199608 bytes
Got fatal error during xfer (aborted by signal=PIPE)
Backup aborted by user signal
Is there anything I can do to get better diagnostics? If rsync
--server --sender exits with an error, how well does the File::RsyncP
module capture its stderr (or stdout, which it would see as a protocol
violation) and pass it back to the xfer log?
Is there a flag/option I can set in File::RsyncP?
(Time to perldoc File::RsyncP, I guess.)
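For reference, this is the general pattern I'd hope for. It is sketched in Python rather than Perl and is not taken from File::RsyncP; I haven't checked how that module actually handles the child's stderr. The idea is to read the data stream from stdout and drain stderr on a separate thread, so a chatty child can't block on a full pipe. Whatever arrived on stderr, plus the exit status, then gets copied into the log. run_with_stderr_capture is a hypothetical name.

```python
import subprocess
import threading

def run_with_stderr_capture(argv, log):
    """Run argv, treat stdout as the data stream, and log stderr and the exit status."""
    err_lines = []
    with subprocess.Popen(argv, stdin=subprocess.DEVNULL,
                          stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE) as proc:
        def drain_stderr():
            for raw in proc.stderr:
                err_lines.append(raw.decode(errors="replace").rstrip("\n"))

        reader = threading.Thread(target=drain_stderr)
        reader.start()
        data = proc.stdout.read()  # stand-in for speaking the rsync protocol
        reader.join()
        rc = proc.wait()
    for line in err_lines:
        log.append("remote stderr: " + line)
    if rc != 0:
        log.append(f"remote command exited with status {rc}")
    return data, rc
```

Anything the remote side prints on stdout would still land in the data stream, so this only helps if diagnostics go to stderr.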
> >Also, is there a way to tail the xfer logs in real time while the daemon
> >is controlling the backup, so I don't have to wait for the backup to
> >finish?
>
> You aren't going to see a problem in the log file - the other end is
> crashing.
Well, I have two backups still running (3+ hours later), and I am trying
to find out which file they are stuck on. Nothing I can see should make
rsync hang this long, compared with when I run rsync
directly.
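Here is one way to see what a long-running backup is stuck on, independent of BackupPC's logs. On Linux, /proc/<pid>/fd holds a symlink for each open descriptor. Looking at the remote rsync sender's descriptors therefore shows which file it has open; ls -l /proc/<pid>/fd does the same by hand. Below is a Linux-only Python sketch with a hypothetical helper name. It would need to run as root (or as the process owner) on the client to read the sudo'd rsync's descriptors.

```python
import os

def open_files(pid):
    """Map descriptor numbers to the filesystem paths a Linux process has open."""
    fd_dir = f"/proc/{pid}/fd"
    paths = {}
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor was closed between listdir() and readlink()
        # Pipes and sockets appear as "pipe:[...]" or "socket:[...]"; keep real paths.
        if target.startswith("/"):
            paths[int(name)] = target
    return paths
```

Running it twice a minute apart and comparing the results (or the file offsets in /proc/<pid>/fdinfo) would show whether rsync is still making progress on that file.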
--
-- rouilj
John Rouillard
System Administrator
Renesys Corporation
603-643-9300 x 111
_______________________________________________
BackupPC-users mailing list
[email protected]
List: https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki: http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/