Al, On Fri, 2017-04-07 at 05:12 +0100, Al Viro wrote: > On Fri, Apr 07, 2017 at 01:50:53PM +1000, Michael Neuling wrote: > > > diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c > > index bdf0e6e899..a2a9832a42 100644 > > --- a/drivers/tty/n_tty.c > > +++ b/drivers/tty/n_tty.c > > @@ -1668,11 +1668,17 @@ static int > > n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp, > > char *fp, int count, int flow) > > { > > - struct n_tty_data *ldata = tty->disc_data; > > + struct n_tty_data *ldata; > > int room, n, rcvd = 0, overflow; > > > > down_read(&tty->termios_rwsem); > > > > + ldata = tty->disc_data; > > + if (!ldata) { > > + up_read(&tty->termios_rwsem); > > I very much doubt that it's correct. It shouldn't have been called after > the n_tty_close(); apparently it has been. ->termios_rwsem won't serialize > against it, and something apparently has gone wrong with the exclusion there. > At the very least I would like to see what's to prevent n_tty_close() from > overlapping the exection of this function - if *that* is what broke, your > patch will only paper over the problem.
It does seem like I'm papering over a problem. Would you be happy with the patch if we add a WARN_ON_ONCE()? I think the problem is permanent rather than a race/transient with the disc_data being NULL as if we read it again later, it's still NULL. Benh and I looked at this a bunch and we did notice tty_ldisc_reinit() was being called called without the tty lock in one location. We tried the below patch but it didn't help (not an upstreamable patch, just a test). There has been a few attempts are trying to fix this but none have worked for me: https://lkml.org/lkml/2017/3/23/569 and https://patchwork.kernel.org/patch/9114561/ I'm not that familiar with the tty layer (and I value my sanity) so I'm struggling to root cause it by myself. Mikey diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c index 734a635e73..121402ff25 100644 --- a/drivers/tty/tty_io.c +++ b/drivers/tty/tty_io.c @@ -1454,6 +1454,9 @@ static void tty_driver_remove_tty(struct tty_driver *driver, struct tty_struct * driver->ttys[tty->index] = NULL; } +extern int tty_ldisc_lock(struct tty_struct *tty, unsigned long timeout); +extern void tty_ldisc_unlock(struct tty_struct *tty); + /* * tty_reopen() - fast re-open of an open tty * @tty - the tty to open @@ -1466,6 +1469,7 @@ static void tty_driver_remove_tty(struct tty_driver *driver, struct tty_struct * static int tty_reopen(struct tty_struct *tty) { struct tty_driver *driver = tty->driver; + int rc = 0; if (driver->type == TTY_DRIVER_TYPE_PTY && driver->subtype == PTY_TYPE_MASTER) @@ -1479,10 +1483,12 @@ static int tty_reopen(struct tty_struct *tty) tty->count++; + tty_ldisc_lock(tty, MAX_SCHEDULE_TIMEOUT); if (!tty->ldisc) - return tty_ldisc_reinit(tty, tty->termios.c_line); + rc = tty_ldisc_reinit(tty, tty->termios.c_line); + tty_ldisc_unlock(tty); - return 0; + return rc; } /** diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c index d0e84b6226..3b13ff11c5 100644 --- a/drivers/tty/tty_ldisc.c +++ b/drivers/tty/tty_ldisc.c @@ -334,7 +334,7 @@ static inline void __tty_ldisc_unlock(struct tty_struct *tty) ldsem_up_write(&tty->ldisc_sem); } -static int tty_ldisc_lock(struct tty_struct *tty, unsigned long timeout) +int tty_ldisc_lock(struct tty_struct *tty, unsigned long timeout) { int ret; @@ -345,7 +345,7 @@ static int tty_ldisc_lock(struct tty_struct *tty, unsigned long timeout) return 0; } -static void tty_ldisc_unlock(struct tty_struct *tty) +void tty_ldisc_unlock(struct tty_struct *tty) { clear_bit(TTY_LDISC_HALTED, &tty->flags); __tty_ldisc_unlock(tty);