On Thu, Oct 25, 2007 at 02:10:12PM -0500, Les Mikesell wrote:
> John Rouillard wrote:
> >I have:
> >
> >   $Conf{ClientTimeout} = 72000;
> >
> >which is 20 hours and the sigpipe is occurring before then.
> 
> You'd see sigalarm instead of sigpipe if you had a timeout.

Something like this, I assume:

  full backup started for directory /usr/local
  Running: /usr/bin/ssh -q -x -l backup vpn01.psm1.renesys.com sudo /usr/bin/rsync --server --sender --numeric-ids --perms --owner --group -D --links --hard-links --times --block-size=2048 --recursive --one-file-system --checksum-seed=32761 --ignore-times . /usr/local/
  Xfer PIDs are now 7909
  Got remote protocol 28
  Negotiated protocol version 28
  Checksum caching enabled (checksumSeed = 32761)
  Xfer PIDs are now 7909,7924
    create d 755       0/0        4096 .
    create d 755       0/0        4096 bin
    pool   l 777       0/0          15 bin/envdir -> /command/envdir
    pool   l 777       0/0          18 bin/envuidgid -> /command/envuidgid
    pool   l 777       0/0          15 bin/fghack -> /command/fghack
    pool   l 777       0/0          17 bin/multilog -> /command/multilog
    pool   l 777       0/0          17 bin/pgrphack -> /command/pgrphack
    pool   l 777       0/0          22 bin/readproctitle -> /command/readproctitle
    pool   l 777       0/0          16 bin/setlock -> /command/setlock
    pool   l 777       0/0          18 bin/setuidgid -> /command/setuidgid
    pool   l 777       0/0          18 bin/softlimit -> /command/softlimit
    pool   l 777       0/0          18 bin/supervise -> /command/supervise
    pool   l 777       0/0          12 bin/svc -> /command/svc
    pool   l 777       0/0          13 bin/svok -> /command/svok
    pool   l 777       0/0          15 bin/svscan -> /command/svscan
    pool   l 777       0/0          19 bin/svscanboot -> /command/svscanboot
    pool   l 777       0/0          15 bin/svstat -> /command/svstat
    pool   l 777       0/0          15 bin/tai64n -> /command/tai64n
    pool   l 777       0/0          20 bin/tai64nlocal -> /command/tai64nlocal
    create d 755       0/0        4096 etc
    create d 755       0/0        4096 games
    create d 755       0/0        4096 include
    create d 755       0/0        4096 lib
    create d 755       0/0        4096 libexec
    create d 755       0/0        4096 man
    create d 755       0/0        4096 man/man1
    create d 755       0/0        4096 sbin
    create d 755       0/0        4096 share
    create d 755       0/0        4096 share/info
    create d 755       0/0        4096 share/man
    create d 755       0/0        4096 share/man/man1
    create d 755       0/0        4096 share/man/man2
    create d 755       0/0        4096 share/man/man3
    create d 755       0/0        4096 share/man/man4
    create d 755       0/0        4096 share/man/man5
    create d 755       0/0        4096 share/man/man6
    create d 755       0/0        4096 share/man/man7
    create d 755       0/0        4096 share/man/man8
    create d 755       0/0        4096 share/man/man9
    create d 755       0/0        4096 share/man/mann
    create d 755       0/0        4096 src
    create d 755       0/1       12288 src/fastforward-0.51
  finish: removing in-process file .
  Child is aborting
  Done: 17 files, 283 bytes
  Got fatal error during xfer (aborted by signal=ALRM)
  Backup aborted by user signal
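To illustrate Les's point about the two signals, here is a minimal Python sketch, assuming a POSIX system. It is not BackupPC code, and the function names are hypothetical. An alarm-based watchdog (roughly what ClientTimeout appears to be, judging by the "signal=ALRM" line above) interrupts a stalled read with SIGALRM. Writing into a pipe whose reader has gone away fails with EPIPE, which a process with default signal handling receives as SIGPIPE.

```python
import os
import signal


class AlarmFired(Exception):
    """Raised from the SIGALRM handler to break out of a blocked read."""


def _on_alarm(signum, frame):
    raise AlarmFired(signal.Signals(signum).name)


def stalled_read_with_timeout(seconds: int) -> str:
    """Block on a pipe nobody writes to; a watchdog alarm ends the wait."""
    read_fd, write_fd = os.pipe()
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(seconds)  # hypothetical stand-in for a client timeout
    try:
        os.read(read_fd, 1)  # blocks: the write end stays open but silent
        return "data"
    except AlarmFired as exc:
        return str(exc)  # name of the signal that ended the wait
    finally:
        signal.alarm(0)
        signal.signal(signal.SIGALRM, previous)
        os.close(read_fd)
        os.close(write_fd)


def write_to_closed_pipe() -> str:
    """Write into a pipe whose read end is already closed."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # the peer has gone away
    try:
        os.write(write_fd, b"x")
        return "written"
    except BrokenPipeError:
        # CPython ignores SIGPIPE at startup, so the kernel's EPIPE surfaces
        # as BrokenPipeError; with default handling the process would be
        # killed by SIGPIPE instead.
        return "EPIPE"
    finally:
        os.close(write_fd)


if __name__ == "__main__":
    print(stalled_read_with_timeout(1))
    print(write_to_closed_pipe())
```

So a timeout shows up as ALRM, while SIGPIPE points at the other end of the pipe having gone away first.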
 
Also, I straced the rsync process on the remote system while it was hung
(I assume on whatever came after the src/fastforward-0.51 directory)
and got:

  [EMAIL PROTECTED] ~]$ ps -ef | grep 6909
  root      6909  6908  0 Oct25 ?        00:00:00 /usr/bin/rsync --server --sender --numeric-ids --perms --owner --group -D --links --hard-links --times --block-size=2048 --recursive --one-file-system --checksum-seed=32761 --ignore-times . /usr/local/
  rouilj   10603 10349  0 05:36 pts/0    00:00:00 grep 6909
  [EMAIL PROTECTED] ~]$ strace -p 6909
  attach: ptrace(PTRACE_ATTACH, ...): Operation not permitted
  [EMAIL PROTECTED] ~]$ sudo strace -p 6909
  Process 6909 attached - interrupt to quit
  select(1, [0], [], NULL, {42, 756000})  = 0 (Timeout)
  select(1, [0], [], NULL, {60, 0})       = 0 (Timeout)
  select(1, [0], [], NULL, {60, 0})       = 0 (Timeout)
  select(1, [0], [], NULL, {60, 0} <unfinished ...>
  Process 6909 detached

I got similar results for the server-side process. Maybe a deadlock
somewhere? The ssh pipe appeared to be open: I set it up to forward
traffic and was able to pass traffic from the server to the client.
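For reading the strace output: select(1, [0], [], NULL, {60, 0}) waits up to 60 seconds for file descriptor 0 (rsync's stdin, presumably the ssh channel) to become readable, and a return value of 0 means nothing arrived. The repeated Timeout lines therefore mean that side was idle, waiting for input; if the other side shows the same pattern, each may be waiting on the other. Below is a minimal Python sketch of that pattern, assuming a POSIX system. The socketpair is a hypothetical stand-in for the ssh channel and the function names are made up for illustration.

```python
import select
import socket


def wait_for_input(sock: socket.socket, timeout: float, rounds: int) -> list:
    """Mimic the traced loop: repeatedly select() on one readable descriptor."""
    results = []
    for _ in range(rounds):
        readable, _, _ = select.select([sock], [], [], timeout)
        results.append("ready" if readable else "timeout")
        if readable:
            break
    return results


def demo_mutual_wait(timeout: float = 0.1, rounds: int = 3):
    # Hypothetical stand-ins for the sender and receiver ends of the channel.
    sender_end, receiver_end = socket.socketpair()
    try:
        # Neither side writes, so each side sees only timeouts, the same
        # pattern strace showed on both the client and the server.
        sender_view = wait_for_input(sender_end, timeout, rounds)
        receiver_view = wait_for_input(receiver_end, timeout, rounds)
        # Once one side actually sends something, the other side wakes up.
        sender_end.sendall(b"x")
        after_send = wait_for_input(receiver_end, timeout, rounds)
        return sender_view, receiver_view, after_send
    finally:
        sender_end.close()
        receiver_end.close()


if __name__ == "__main__":
    print(demo_mutual_wait())
```

The two ends are polled one after the other here for simplicity; in the real setup they are separate processes on separate hosts, which is why an open ssh pipe alone does not rule out both sides waiting on each other.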

-- 
                                -- rouilj

John Rouillard
System Administrator
Renesys Corporation
603-643-9300 x 111

_______________________________________________
BackupPC-users mailing list
[email protected]
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/

Reply via email to