On Fri, May 25, 2001 at 12:14:17PM -0500, Phil Howard wrote:
> I switched to 2.4.6 a while back, but have only been making heavy
> use of rsync the past couple of months, and have been running into
> a few problems that may be bugs.  I looked at the bug tracker, but
> it was too cumbersome to use effectively.  I don't know if these
> are real bugs or just configuration mistakes.  Maybe you can tell
> me.
> 
> The host OS is Linux 2.4.X (X is 0 on some and 2 on others) and
> Slackware 7.1 in all cases.
> 
> Here are the things I'm running into that I did not have in 2.4.4:
> 
> 1 =============================================================================
> Write failed: Cannot allocate memory
> unexpected EOF in read_timeout
> unexpected EOF in read_timeout
> 
> I've seen this happen when there was over 256 meg available space
> between ram and swap, so it should not have failed as a result of
> not being able to get a reasonable amount from the system.  This
> also occurs randomly; if I wipe the files back out and run it all
> again, it often does not happen the next time, or if it does, it
> is not at the same point.  Also, the number of files being copied
> is smallish (not more than 100), and it happens even when no file
> is larger than about 4 meg and the total transfer is no larger than
> 20 meg (so even if it pre-loaded every file into ram, there should
> be enough space).  The happens even if the target directory starts
> empty.
> 
> This is done through ssh and I do not recall it happening when using
> an rsync daemon.


This must be coming from ssh, not rsync.  There is no string "Write failed"
in the rsync source code, but there is in both ssh 1.2.27 and openssh 2.9p1.



> 2 =============================================================================
> When syncronizing a very large number of files, all files in a large
> partition, rsync frequently hangs.  It's about 50% of the time, but
> seems to be a function of how much work there was to be done.  That
> is, if I run it soon after it just ran, it tends to not hang, but if
> I run it after quite some time (and lots of stuff to syncronize) it
> tends to hang.  It appears to have completed all the files, but I
> don't get any stats.  There are 3 rsync processes sitting idle with
> no files open in the source or target trees.
> 
> At last count there were 368827 files and 8083 symlinks in 21749
> directories.
> 
> df shows:
> /dev/hda4             42188460  38303916   3884544  91% /home
> /dev/hdb4             42188460  38301972   3886488  91% /mnt/hdb/home
> 
> df -i shows:
> /dev/hda4            2662400  398419 2263981   15% /home
> /dev/hdb4            2662400  398462 2263938   15% /mnt/hdb/home
> 
> The df numbers are not exact because change is constantly happening
> on this active server.  Drives hda and hdb are identical and are
> partitioned alike.
> 
> The command line is echoed from the script that runs it:
> 
> rsync -axv --stats --delete /home/. /mnt/hdb/home/.  
>1>'/home/root/backup-hda-to-hdb/home.log' 2>&1


Use the -W option to disable the rsync algorithm.  We really ought to make
that the default when both the source and destination are local.


> The log file shows a file list gone all the way to the last file
> and lsof done after the hang shows:
> 
> rsync     26651    root  cwd    DIR        3,2     4096         24 /root
> rsync     26651    root  rtd    DIR        3,2     4096          2 /
> rsync     26651    root  txt    REG       3,10   187443       8758 
>/usr/local/bin/rsync
> rsync     26651    root  mem    REG        3,2    79276       4239 /lib/ld-2.1.3.so
> rsync     26651    root  mem    REG        3,2  1013224       4249 /lib/libc-2.1.3.so
> rsync     26651    root  mem    REG        3,2    40360       4274 
>/lib/libnss_compat-2.1.3.so
> rsync     26651    root  mem    REG        3,2    75500       4272 
>/lib/libnsl-2.1.3.so
> rsync     26651    root    0u   CHR     136,14                  16 /dev/pts/14
> rsync     26651    root    1w   REG        3,4    56898    1778435 
>/home/root/backup-hda-to-hdb/home.log
> rsync     26651    root    2w   REG        3,4    56898    1778435 
>/home/root/backup-hda-to-hdb/home.log
> rsync     26651    root    4u  unix 0xcb0f4040           135813770 socket
> rsync     26651    root    5u  unix 0xcb0f4cc0           135813771 socket
> rsync     26652    root  cwd    DIR       3,68     4096          2 /mnt/hdb/home
> rsync     26652    root  rtd    DIR        3,2     4096          2 /
> rsync     26652    root  txt    REG       3,10   187443       8758 
>/usr/local/bin/rsync
> rsync     26652    root  mem    REG        3,2    79276       4239 /lib/ld-2.1.3.so
> rsync     26652    root  mem    REG        3,2  1013224       4249 /lib/libc-2.1.3.so
> rsync     26652    root  mem    REG        3,2    40360       4274 
>/lib/libnss_compat-2.1.3.so
> rsync     26652    root  mem    REG        3,2    75500       4272 
>/lib/libnsl-2.1.3.so
> rsync     26652    root    1u  unix 0xcb93b9a0           135813772 socket
> rsync     26652    root    2w   REG        3,4    56898    1778435 
>/home/root/backup-hda-to-hdb/home.log
> rsync     26652    root    3u  unix 0xca8edcc0           135814161 socket
> rsync     26652    root    5u  unix 0xcc9969a0           135814163 socket
> rsync     26704    root  cwd    DIR       3,68     4096          2 /mnt/hdb/home
> rsync     26704    root  rtd    DIR        3,2     4096          2 /
> rsync     26704    root  txt    REG       3,10   187443       8758 
>/usr/local/bin/rsync
> rsync     26704    root  mem    REG        3,2    79276       4239 /lib/ld-2.1.3.so
> rsync     26704    root  mem    REG        3,2  1013224       4249 /lib/libc-2.1.3.so
> rsync     26704    root  mem    REG        3,2    40360       4274 
>/lib/libnss_compat-2.1.3.so
> rsync     26704    root  mem    REG        3,2    75500       4272 
>/lib/libnsl-2.1.3.so
> rsync     26704    root    0u  unix 0xce9cd9a0           135813769 socket
> rsync     26704    root    2w   REG        3,4    56898    1778435 
>/home/root/backup-hda-to-hdb/home.log
> rsync     26704    root    6u  unix 0xcecaccc0           135814164 socket
> 
> If I kill 26704, nothing particularly happens.  If I kill 26652,
> then I get "unexpected EOF in read_timeout" (not surprising), which
> tells me 26651 was waiting for something from 26652 which was not
> happening.  Maybe 26652 was waiting for something from 26651 first.
> A deadly embrace?  It seems possible.


No, the receiving side of an rsync transaction splits itself into two
processes for the sake of pipelining: one to generate checksums and one to
accept updates.  When you're sending and receiving to the same machine then
you've got one sender and 2 receivers.


> I'm also curious why 26704 has no fd 1.

I don't know.  When I tried it all 3 processes had an fd 1.


> 3 =============================================================================
> @ERROR: max connections (16) reached - try again later
> 
> This occurs after just one connection is active.  It behaves as if
> I had specified "max connections = 1".  On another server I set it
> to 40, and it showed:
> 
> @ERROR: max connections (40) reached - try again later
> 
> so it obvious is parsing and keeping the value I configure, but it
> isn't using it correctly.
> 
> Also, if I ^C the client, then I get this error every time until I
> restart the daemon (running in standalone daemon mode, not inetd).
> So it seems like it counts clients wrong.  But I can't get more
> that 1 right after restarting the server, so it's a little more
> than that somewhere.

I don't know, I never used max connections.  Could indeed be a bug.
The code looks pretty tricky.  It's trying to lock pieces of the file
/var/run/rsyncd.lock in order for independent processes to coordinate. 
Are you running as root (the lsof above suggests you are)?  If not, you
probably need to specify another file that your daemon has access to in the
"lock file" option.  Otherwise it would probably help for you to run some
straces.

- Dave Dykstra

Reply via email to