Ah ha...

Looks like I may give rsync-2.4.3 a try and see how it goes over
ssh.  I am one of the people who were having problems with clients and
servers both using rsync-2.4.1 over ssh-1.2.27 (it was hanging...we are
using various revisions of FreeBSD and BSDI.)

Going back to rsync-2.3.1 on the clients cleared up this problem,
while rsync-2.4.1 is still being run on the server...

I'll give rsync-2.4.3 a try on both the client and server side, with
ssh-1.2.27, and see how it goes.  I'll post a followup with my results
when I'm done, saying whether or not it worked.

-Chris Tracy
 (Telerama Internet -/- Network Administrator -/- www.telerama.com)

On Wed, 14 Jun 2000, Dave Dykstra wrote:

> (I am adding the rsync mailing list to the Cc)
> 
> > Wout van Albada <[EMAIL PROTECTED]> wrote on the ssh mailing list:
> > > 
> > > Hi,
> > > 
> > > I think I encountered a serious bug in ssh 1.2.27. There seems to be
> > > a race condition where the ssh daemon (sshd) drops data when it has
> > > to send it over a slow line. I sent this bug report to
> > > [EMAIL PROTECTED]
> > > and [EMAIL PROTECTED] on 27/03 but have heard nothing from either so
> > > far.
> > > 
> > > I'll try to clarify what happens:
> > > 
> > > There are two machines, server and client. Both machines run Solaris.
> > > The client makes an ssh connection to the server to download a file:
> > > 
> > > server% ls -l /tmp/DATA
> > > -rw-r--r--   1 wout  staff     200000 Mar 23 11:20 DATA
> > > 
> > > client% ssh server cat /tmp/DATA > /tmp/DATA
> > > client% ls -l /tmp/DATA
> > > -rw-r--r--   1 wout  staff     194560 Mar 24 17:10 /tmp/DATA
> > > 
> > > This copies the file /tmp/DATA from the server to /tmp/DATA on the
> > > client. In this particular case DATA was 200000 bytes. The size has
> > > to be larger than the buffers used inside sshd.
> > > 
> > > When the command is run, most data is sent over the line as it should
> > > be. However, when the 'cat' process dies, sshd receives a SIGCHLD and
> > > then fails to read the data left in the pipe to the 'cat' program.
> > > 
> > > To be more precise, sshd only reads the data left in the pipe to 'cat'
> > > if it has space for it in the outgoing buffer (the buffer that is used
> > > to store data going back to the client).
> > > 
> > > So the following happens (all in serverloop.c):
> > > 
> > >  1. For a while sshd reads data from the 'cat' command. This data is
> > >     transmitted to the client, where it is put in /tmp/DATA.
> > >  2. cat writes the final data to the pipe to sshd and exits.
> > >  3. sshd receives a SIGCHLD and sets child_terminated and
> > >     child_just_terminated to 1.
> > >  4. sshd falls out of the select() (line 413) it was in
> > >     (it usually receives the signal during the select() call).
> > >     select() returns -1 because it was interrupted by the signal.
> > >  5. sshd empties readset and writeset (lines 415-422 serverloop.c).
> > >  6. The if statements on lines 426 and 446 fail.
> > >  7. sshd does its usual stuff and then calls
> > >     wait_until_can_do_something().
> > >  8. The call to packet_not_very_much_data_to_write() on line 353
> > >     returns false (because the outgoing buffer contains more than
> > >     16384 bytes). This causes fdout and fderr not to be set in the
> > >     readset file descriptor set (lines 355-358).
> > >  9. select() on line 413 returns, this time with 0 (due to the slow
> > >     network connection to the client). Now the if statement on line
> > >     426 succeeds (child_just_terminated has been set to 0 earlier).
> > > 10. Descriptors fdout, fderr and fdin are closed (lines 432-442),
> > >     so the data still waiting on fdout is never read.
> > > 
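Steps 8 to 10 can be modelled in a few lines of C. This is a hypothetical sketch, not sshd's code: struct child_out, room_in_outgoing_buffer(), build_readset() and buggy_timeout_close() are made-up names. room_in_outgoing_buffer() stands in for packet_not_very_much_data_to_write(), using the 16384-byte threshold from step 8, and "pending" stands in for the bytes still sitting in the pipe from 'cat'.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>

/* Hypothetical model of sshd's view of the child's stdout; not sshd code. */
struct child_out {
    int fd;          /* -1 once closed */
    int eof;         /* set only after read() has returned 0 */
    size_t pending;  /* bytes the child wrote that sshd has not read yet */
};

/* Step 8: stands in for packet_not_very_much_data_to_write(). The report
   says it returns false once more than 16384 bytes are queued. */
static int room_in_outgoing_buffer(size_t queued)
{
    return queued <= 16384;
}

/* Step 8: the child's stdout is only added to the read set when there is
   room in the outgoing buffer for what would be read. */
static void build_readset(fd_set *readset, const struct child_out *out,
                          size_t queued)
{
    FD_ZERO(readset);
    if (out->fd != -1 && room_in_outgoing_buffer(queued))
        FD_SET(out->fd, readset);
}

/* Steps 9-10: select() returned 0 after the child exited. The original
   code closes the descriptor without checking eof, so anything still in
   the pipe is discarded. Returns the number of bytes lost. */
static size_t buggy_timeout_close(struct child_out *out)
{
    size_t lost = 0;

    assert(out != NULL);
    if (out->fd != -1) {
        lost = out->pending;
        out->pending = 0;
        out->fd = -1;    /* models close(fdout) */
    }
    out->eof = 1;        /* the original code also forces fdout_eof = 1 */
    return lost;
}
```

For example, with out = {5, 0, 5440} and 20000 bytes queued, build_readset() leaves descriptor 5 out of the set, so select() can only time out, and buggy_timeout_close() then reports 5440 bytes lost. The 5440 is chosen to match the gap between the 200000-byte original and the 194560-byte copy shown earlier.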
> > > The change I made to fix this is in a patch (diff on original
> > > serverloop.c and modified serverloop.c) you will find attached
> > > to this mail. It changes lines 432-439. Instead of blindly closing
> > > the fdout and fderr descriptors when select() returns 0, it only
> > > closes them if the fdout_eof and fderr_eof flags have been set,
> > > respectively. The bug was that the code in lines 426-443 assumed
> > > that select() always provides information on fdout and fderr, which
> > > is not the case as they had not been set in the readset.
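The idea behind the patch can be sketched with the same kind of toy model. Again this is hypothetical code, not the patched serverloop.c: struct child_out, patched_timeout_close() and drain() are invented names, and drain() stands in for a later successful read() once the outgoing buffer has emptied.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of sshd's view of the child's stdout; not sshd code. */
struct child_out {
    int fd;          /* -1 once closed */
    int eof;         /* set only after read() has returned 0 */
    size_t pending;  /* bytes the child wrote that sshd has not read yet */
};

/* Patched behaviour: on a select() timeout, close the descriptor only if
   EOF has already been seen; otherwise leave it open so the remaining
   data can still be read later. */
static void patched_timeout_close(struct child_out *out)
{
    assert(out != NULL);
    if (out->fd != -1 && out->eof)
        out->fd = -1;    /* models close(fdout) */
}

/* Models a later read pass: take the pending bytes, then see EOF
   (read() returning 0). Returns the number of bytes delivered. */
static size_t drain(struct child_out *out)
{
    size_t got = 0;

    assert(out != NULL);
    if (out->fd != -1) {
        got = out->pending;
        out->pending = 0;
        out->eof = 1;
    }
    return got;
}
```

With this logic, a timeout before EOF leaves the descriptor open, the remaining bytes are delivered by the next read pass, and only the timeout after that closes it. The patch also drops the unconditional `fdout_eof = 1`, which the model reflects by letting only drain() set eof.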
> > > 
> > > For completeness, I also attach the 'sshd -d' output for a faulty
> > > session (original sshd 1.2.27, data is lost) and output for a session
> > > after having applied my patch.
> > > 
> > > Please let me know what you make of this.
> > > 
> > > Wout van Albada
> > > Software Engineer
> > > 
> > > [EMAIL PROTECTED]
> > > 
> > > --- serverloop.c.ORIG   Sun Mar 26 13:20:14 2000
> > > +++ serverloop.c        Sun Mar 26 13:25:15 2000
> > > @@ -429,14 +429,14 @@
> > >        if (cleanup_context)
> > >          pty_cleanup_proc(cleanup_context);
> > > 
> > > -      if (fdout != -1)
> > > +      if (fdout != -1 && fdout_eof) {
> > >          close(fdout);
> > > -      fdout = -1;
> > > -      fdout_eof = 1;
> > > -      if (fderr != -1)
> > > +       fdout = -1;
> > > +      }
> > > +      if (fderr != -1 && fderr_eof) {
> > >          close(fderr);
> > > -      fderr = -1;
> > > -      fderr_eof = 1;
> > > +        fderr = -1;
> > > +      }
> > >        if (fdin != -1)
> > >          close(fdin);
> > >        fdin = -1;
> ..
> 
> On Mon, Jun 12, 2000 at 08:02:43AM -0700, Rick Moen wrote on ssh mailing list:
> > begin  Ville Herva quotation:
> > 
> > > At least rsync-2.4.x has known problems when run over an ssh pipe. See rsync
> > > mailing list archive [http://rsync.samba.org/listproc/rsync/] for details.
> > 
> > It's a select() deadlock.  Ton Hospel posted to the GCC mailing list a 
> > GPLed wrapper for SSH that fixes it.  I keep a copy at 
> > http://linuxmafia.com/pub/linux/security/ssh-rsync-wrapper
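For anyone wondering how a hang like this can come about in general: pipes and socketpairs have a finite kernel buffer, and a blocking write() into a full one waits until the reader drains it. If two processes each block writing to the other while neither is reading, neither can proceed. The sketch below is plain POSIX, not code from ssh, rsync or the wrapper, and fill_pipe() is a hypothetical name. It does not reproduce the ssh hang; it only measures how much a pipe accepts before a non-blocking write fails with EAGAIN, which is the point where a blocking write would have stalled.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write into a pipe (switched to non-blocking) until the kernel refuses
   more data. Returns the number of bytes accepted, or -1 on an
   unexpected error. A blocking descriptor would instead have stalled
   at the point where this returns. */
static long fill_pipe(int wfd)
{
    char chunk[512];
    long total = 0;
    int flags = fcntl(wfd, F_GETFL);

    assert(wfd >= 0);
    if (flags == -1 || fcntl(wfd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;
    memset(chunk, 'x', sizeof chunk);
    for (;;) {
        ssize_t n = write(wfd, chunk, sizeof chunk);
        if (n > 0) {
            total += n;
        } else if (n == -1 && errno == EINTR) {
            continue;                       /* interrupted, just retry */
        } else if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            return total;                   /* pipe buffer is full */
        } else {
            return -1;
        }
    }
}
```

The capacity it reports depends on the system (64 KiB is common on Linux), so treat the number as illustrative.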
> 
> Here's the status of the most recent releases of rsync:
>     2.4.3 - sets O_NONBLOCK on stdin and stdout.  There haven't been
>       reports that it still hangs ssh, but there have been numerous
>       reports that it gets rsync protocol errors ("unexpected tag" is the
>       one most often reported).  I wonder if ssh can't completely handle
>       being in non-blocking mode, and I wonder if Wout's patch would solve
>       those problems. 
>     2.4.2 - similar to 2.4.3 except that it didn't work with rsh, so it was
>       short-lived.
>     2.4.1 - switched to using socketpairs instead of pipes and removed the
>       complicated buffering scheme that worked around ssh hangs.
>       Numerous ssh hangs were reported, at least on Solaris.
>     2.4.0 - similar to 2.4.1 except it had a serious bug, so it was
>       very short-lived.
>     2.3.2 - uses pipes, not socketpairs, and has a complicated buffering
>       scheme that seems to work pretty well to avoid ssh hangs.  Still
>       the preferred version for most people.
>       
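Two of the mechanisms in this list, socketpairs (2.4.1 onwards) and O_NONBLOCK on stdin and stdout (2.4.3), come down to a couple of POSIX calls. Here is a minimal sketch assuming a POSIX system; it is not rsync's code, and set_nonblocking() and make_nonblocking_socketpair() are hypothetical helper names. One detail worth knowing when wondering whether ssh copes with non-blocking mode: O_NONBLOCK is a file status flag kept in the open file description, so it is shared by every descriptor duplicated or inherited from the same one, not only by the process that set it.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Turn on O_NONBLOCK while preserving the descriptor's other status flags. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    assert(fd >= 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Create one bidirectional local channel (rather than two pipes) and make
   both ends non-blocking. Returns 0 on success, -1 on failure. */
static int make_nonblocking_socketpair(int sv[2])
{
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;
    if (set_nonblocking(sv[0]) == -1 || set_nonblocking(sv[1]) == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    return 0;
}
```

After setting the flag, a read() on an empty end returns -1 with errno set to EAGAIN (or EWOULDBLOCK) instead of waiting, so the caller has to be prepared to retry, typically after select() reports the descriptor ready.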
> 
> Wout, what version of rsync were you using when you developed your patch?
> My guess is that it would be most necessary for 2.4.1, and it sounds like
> it will do a better job than turning on O_NONBLOCK.
> 
> I also wonder if OpenSSH has the same problem.
> 
> - Dave Dykstra
> 
