Brian Tao wrote:
> I have a Solaris 2.6 server that rsyncs between NFS-mounted
> filesystems and local disk (no rsh or ssh involved, no remote rsyncd
> server... the NFS servers are Netapps).  2.4.3 will consistently hang
> at seemingly random points through a sync, but 2.3.2 works dandily.

First of all: when rsyncing over NFS, the rsync algorithm actually makes
performance worse, because it not only reads the entire "from" and "to"
files to calculate checksums but also sends the checksums and the entire
"to" file.  Use the -W (--whole-file) option to disable the rsync algorithm.
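
As a rough sketch of what I mean (the function name sync_whole_file and
the paths in the usage line are made up for illustration, and -a is only
an assumption about which other options you want):

```shell
# Copy a tree with rsync's delta algorithm switched off.
# $1 = source directory, $2 = destination directory (both hypothetical).
sync_whole_file() {
    # -a: archive mode (recurse, keep permissions, times, etc.)
    # -W: send whole files instead of computing rolling checksums
    rsync -a -W "$1" "$2"
}
```

Usage would be something like "sync_whole_file /mnt/netapp/data/ /export/data/".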

It is very strange that 2.4.3 behaves differently from 2.3.2 here.  The hack
that was taken out in 2.4.x was only supposed to affect ssh, and I don't
believe blocking I/O could make any difference: I think it only affects the
child process that rsync starts to invoke the other side (such as rsh or
ssh), and that child wouldn't be used for local copies.


>     I can provide output from truss, snoop, netstat, etc. and rebuild
> the binaries with symbol information and provide core dumps if someone
> thinks that will help.

Andrew Tridgell, the author of rsync, said this when he released 2.4.3:

    Finally a plea. When reporting a freeze _please_ include the following
    or we can't help:

    - what is the state of the send/receive queues shown with netstat on
      the two ends.
    - what system call is each of the 3 processes stuck in. Use truss on
      solaris or strace on Linux.

    that info gives us the basic knowledge to categorise the problems and
    work out a fix. Many people seem to be assuming there is just one bug
    that needs fixing. That is definitely _not_ the case. There are
    numerous bugs that each give the symptoms of a freeze. The bugs are at
    the TCP level, the syscall level, the application transport level (rsh
    and ssh) and in rsync.  rsync 2.4.3 is my best effort to address the
    problems that are fixable within rsync, but without the above info I
    don't have the basic knowledge to tell which part of the system the
    bug is in. That means I have to guess - and guesswork leads to errors.
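
To make that request concrete, here is a minimal sketch that only prints
the commands to run and does not run them.  The function names are
hypothetical, and "netstat -an" is my assumption about a reasonable way to
see the queues; the tracer choice follows the quoted text (truss on
Solaris, strace on Linux).

```shell
# Pick the system-call tracer for a given OS name (as printed by uname -s).
tracer_for() {
    case "$1" in
        SunOS) echo truss ;;
        Linux) echo strace ;;
        *)     echo "no tracer known for $1" >&2; return 1 ;;
    esac
}

# Print the diagnostic commands for the stuck rsync PIDs given as arguments.
freeze_report_cmds() {
    tracer=$(tracer_for "$(uname -s)") || return 1
    echo "netstat -an"              # send/receive queue state
    for pid in "$@"; do
        echo "$tracer -p $pid"      # attach to see the blocked syscall
    done
}
```

For example, "freeze_report_cmds 1234 1235 1236" with the PIDs of the
three rsync processes (those PIDs are placeholders).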


Hopefully he'll have a chance to look at it.  He has put a --blocking-io
option into the CVS archive, which defaults to on when using rsh.


On Tue, Jun 27, 2000 at 02:53:15AM +1000, Adye, TJ (Tim) wrote:
> 
> Now that I've started to use rsync for local file copies, I'm seeing this
> too. I copy between two local Solaris 2.6 RAID filesystems (not even NFS to
> confuse the issue). rsync tends to hang just before finishing the copy. When
> I rerun it, rsync -v just prints out a few directory names (I guess it still
> needed to update the directory modification dates or something) and
> completes normally. Running it a third time shows that I am then up to date.
> I've seen this behaviour a couple of times with large copies and a colleague
> (running on a different system) saw it several times before.
> 
> Quite apart from having to kill the program, this is annoying as I don't get
> the statistics. I haven't tried 2.3.2. Does anyone have any ideas?


This is even stranger.  Again, I think the -W option should be used for a
local copy and it may help.  It seems to me that it should be the default
for local copies.

- Dave Dykstra
