On 12/09/2015 01:06 PM, Marc Aurele La France wrote:
> Greetings.
> 
> The following four commits are some of the changes that have been made
> to the tty layer since kernel version 3.11:
> 
> 1) f95499c3030fe1bfad57745f2db1959c5b43dca8
>     n_tty: Don't wait for buffer work in read() loop
> 
> 2) f8747d4a466ab2cafe56112c51b3379f9fdb7a12
>     tty: Fix pty master read() after slave closes
> 
> 3) 52bce7f8d4fc633c9a9d0646eef58ba6ae9a3b73
>     pty, n_tty: Simplify input processing on final close
> 
> 4) 1a48632ffed61352a7810ce089dc5a8bcd505a60
>     pty: Fix input race when closing
> 
> Commit "4)" corrected an issue whereby EIO could be prematurely
> returned on a read of one end of a master/slave pty pair after the
> other had been completely closed.  Yet, I would argue that EAGAIN
> should not be returned either when there actually is data to be
> returned.  This whether or not the other end has been completely
> closed.
> 
> Indeed, the previous code (before commit "1)") checked the other end
> of the pty pair for any data before returning EAGAIN.  This mimics the
> behaviour of other System-V variants (Solaris, AIX, etc.)
                                                      ^^^^
What other SysV systems were tested?

> in this
> regard and ensured that EAGAIN really did mean no data was available
> at the time of the call.
> 
> Portable OpenSSH, since release 4.6p1 in 2007, relies on this being
> the case and has been broken since commit "1)" introduced spurious
> EAGAIN returns (i.e. as of 3.12 kernels).  The scenario at hand is
> as follows.
> 
> After sshd has been SIGCHLD'ed about the shell's termination, it
> continues to read the master pty until an error occurs.  This error
> will be EIO if no process has the slave pty open.  Otherwise (for
> example when the shell spawned long-running processes in the
> background before terminating), that error is expected to be EAGAIN.
> sshd cannot continue to read until an EIO in all cases, because doing
> so causes the session to hang until all processes have closed the
> slave pty, which is not the desired behaviour.  Thus a spurious EAGAIN
> return causes sshd to lose data, whether or not the slave pty is
> completely closed.

Ah, the games userspace will be up to :)


> I've been using the following script to reproduce the problem.  It
> loops until the issue is detected.
> 
>       #! /bin/bash
> 
>       LOG=sshlog-`date "+%F.%T"`
> 
>       touch ${LOG}
> 
>       while test -z "`grep -n '^Connection' ${LOG} | grep -v '0:Connection'`"
>       do
>        ssh -p 22 -tt root@localhost \
>         '/bin/bash -c "/bin/ping -c4 8.8.8.8"' 2>&1 | \
>         tee -a ${LOG}
>       done
> 
> It should be noted that the problem is extremely rare, but still
> occurs, on real hardware.  This bug is easier to replicate in a
> virtual machine such as those that can be created using Google Cloud.
> 
> The patch below is a suggested fix.  It was developed using a 4.3.0
> kernel and should apply, modulo fuzz, to any release >= 4.0.5.  My
> suggested fix is modeled after commit "2)" mentionned above.  Given
> commit "2)" was later reworked by commit "3)", I fully expect my fix
> to be reworked as well.
> 
> I volunteer to backport the fix this ends up being to any stable
> release >= 3.12 deemed needed.
> 
> Please Reply-To-All.
> 
> Thanks and have a great day.
> 
> Marc.
> 
> Reported-by: Volth <open...@volth.com>
> BugLink: https://bugzilla.mindrot.org/show_bug.cgi?id=52
> BugLink: https://bugzilla.mindrot.org/show_bug.cgi?id=2492
> Signed-off-by: Marc Aurele La France <t...@tuyoix.net>
> 
> --- a/drivers/tty/n_tty.c
> +++ a/drivers/tty/n_tty.c
> @@ -2281,20 +2281,26 @@ static ssize_t n_tty_read(struct tty_struct *tty, 
> struct file *file,
>                       if (!timeout)
>                               break;
>                       if (file->f_flags & O_NONBLOCK) {
> -                             retval = -EAGAIN;
> -                             break;
> -                     }
> -                     if (signal_pending(current)) {
> -                             retval = -ERESTARTSYS;
> -                             break;
> -                     }
> -                     up_read(&tty->termios_rwsem);
> +                             up_read(&tty->termios_rwsem);
> +                             flush_work(&tty->port->buf.work);
> +                             down_read(&tty->termios_rwsem);
> +                             if (!input_available_p(tty, 0)) {
> +                                     retval = -EAGAIN;
> +                                     break;
> +                             }
> +                     } else {
> +                             if (signal_pending(current)) {
> +                                     retval = -ERESTARTSYS;
> +                                     break;
> +                             }
> +                             up_read(&tty->termios_rwsem);

No sense in doing this just for O_NONBLOCK; might as well do it before
all the condition tests.

Which renders the earlier fixes for the slave end closing superfluous,
so might as well rip those out.

n_tty_poll() will need to be fixed as well, because if one application
used read() with O_NONBLOCK to expect to block until i/o became available,
then I guarantee some other application uses poll() with no timeout
for the same purpose.

Regards,
Peter Hurley

> 
> -                     timeout = wait_woken(&wait, TASK_INTERRUPTIBLE,
> -                                          timeout);
> +                             timeout = wait_woken(&wait, TASK_INTERRUPTIBLE,
> +                                                  timeout);
> 
> -                     down_read(&tty->termios_rwsem);
> -                     continue;
> +                             down_read(&tty->termios_rwsem);
> +                             continue;
> +                     }
>               }
> 
>               if (ldata->icanon && !L_EXTPROC(tty)) {
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to