Re: [PATCH 2/4] tty: Hold tty_ldisc_lock() during tty_reopen()

2018-08-31 Thread Dmitry Safonov
On Fri, 2018-08-31 at 13:21 +0200, Jiri Slaby wrote:
> On 08/31/2018, 01:17 PM, Tetsuo Handa wrote:
> > Also, noisy messages like
> > 
> >   pts pts4033: tty_release: tty->count(10529) != (#fd's(7) +
> > #kopen's(0))
> > 
> > are gone.
> 
> fwiw, thanks to 1/4…
> 
> Dmitry, could you note the message above to the commit log of 1/4, so
> that we know the patch fixed that and downstream can pick it up...
> 

Sure, will do.

-- 
Thanks,
 Dmitry


Re: [PATCH 2/4] tty: Hold tty_ldisc_lock() during tty_reopen()

2018-08-31 Thread Jiri Slaby
On 08/31/2018, 01:17 PM, Tetsuo Handa wrote:
> Also, noisy messages like
> 
>   pts pts4033: tty_release: tty->count(10529) != (#fd's(7) + #kopen's(0))
> 
> are gone.

fwiw, thanks to 1/4…

Dmitry, could you note the message above to the commit log of 1/4, so
that we know the patch fixed that and downstream can pick it up...

-- 
js
suse labs


Re: [PATCH 2/4] tty: Hold tty_ldisc_lock() during tty_reopen()

2018-08-31 Thread Tetsuo Handa
On 2018/08/31 15:51, Jiri Slaby wrote:
> On 08/29/2018, 05:19 PM, Tetsuo Handa wrote:
>> On 2018/08/29 11:23, Dmitry Safonov wrote:
>>> tty_ldisc_reinit() doesn't race with neither tty_ldisc_hangup()
>>> nor set_ldisc() nor tty_ldisc_release() as they use tty lock.
>>> But it races with anyone who expects line discipline to be the same
>>> after hoding read semaphore in tty_ldisc_ref().
>>>
>>> We've seen the following crash on v4.9.108 stable:
>>>
>>> BUG: unable to handle kernel paging request at 2260
>>> IP: [..] n_tty_receive_buf_common+0x5f/0x86d
>>> Workqueue: events_unbound flush_to_ldisc
>>> Call Trace:
>>>  [..] n_tty_receive_buf2
>>>  [..] tty_ldisc_receive_buf
>>>  [..] flush_to_ldisc
>>>  [..] process_one_work
>>>  [..] worker_thread
>>>  [..] kthread
>>>  [..] ret_from_fork
>>>
>>> I think, tty_ldisc_reinit() should be called with ldisc_sem hold for
>>> writing, which will protect any reader against line discipline changes.
>>>
>>> Note: I failed to reproduce the described crash, so obiviously can't
>>> guarantee that this is the place where line discipline was switched.
>>
>> This will be same with a report at
>> https://syzkaller.appspot.com/bug?id=f08670354701fa64cc0dd3c0128a491bdb16adcc
>>  .
>>
>> syzbot is now testing a patch from Jiri Slaby.
> 
> Yes, my patch passed, so could you add:
> Reported-by: syzbot+3aa9784721dfb90e9...@syzkaller.appspotmail.com
> 
> (not adding tested-by as this particular patch was not tested, but
> shoiuld work the same way.)
> 
> thanks,
> 

Tested with all 4 patches applied using syzbot-provided reproducer and
my simplified reproducer. No crashes and no lockdep warnings.
Also, noisy messages like

  pts pts4033: tty_release: tty->count(10529) != (#fd's(7) + #kopen's(0))

are gone. Very nice. Thank you.


Re: [PATCH 2/4] tty: Hold tty_ldisc_lock() during tty_reopen()

2018-08-30 Thread Jiri Slaby
On 08/29/2018, 05:19 PM, Tetsuo Handa wrote:
> On 2018/08/29 11:23, Dmitry Safonov wrote:
>> tty_ldisc_reinit() doesn't race with neither tty_ldisc_hangup()
>> nor set_ldisc() nor tty_ldisc_release() as they use tty lock.
>> But it races with anyone who expects line discipline to be the same
>> after hoding read semaphore in tty_ldisc_ref().
>>
>> We've seen the following crash on v4.9.108 stable:
>>
>> BUG: unable to handle kernel paging request at 2260
>> IP: [..] n_tty_receive_buf_common+0x5f/0x86d
>> Workqueue: events_unbound flush_to_ldisc
>> Call Trace:
>>  [..] n_tty_receive_buf2
>>  [..] tty_ldisc_receive_buf
>>  [..] flush_to_ldisc
>>  [..] process_one_work
>>  [..] worker_thread
>>  [..] kthread
>>  [..] ret_from_fork
>>
>> I think, tty_ldisc_reinit() should be called with ldisc_sem hold for
>> writing, which will protect any reader against line discipline changes.
>>
>> Note: I failed to reproduce the described crash, so obiviously can't
>> guarantee that this is the place where line discipline was switched.
> 
> This will be same with a report at
> https://syzkaller.appspot.com/bug?id=f08670354701fa64cc0dd3c0128a491bdb16adcc 
> .
> 
> syzbot is now testing a patch from Jiri Slaby.

Yes, my patch passed, so could you add:
Reported-by: syzbot+3aa9784721dfb90e9...@syzkaller.appspotmail.com

(not adding tested-by as this particular patch was not tested, but
shoiuld work the same way.)

thanks,
-- 
js
suse labs


Re: [PATCH 2/4] tty: Hold tty_ldisc_lock() during tty_reopen()

2018-08-29 Thread Benjamin Herrenschmidt
On Wed, 2018-08-29 at 13:34 +0900, Sergey Senozhatsky wrote:
> Hi,
> 
> Cc-ing Benjamin on this.
> 
> On (08/29/18 03:23), Dmitry Safonov wrote:
> > BUG: unable to handle kernel paging request at 2260
> > IP: [..] n_tty_receive_buf_common+0x5f/0x86d
> > Workqueue: events_unbound flush_to_ldisc
> > Call Trace:
> >  [..] n_tty_receive_buf2
> >  [..] tty_ldisc_receive_buf
> >  [..] flush_to_ldisc
> >  [..] process_one_work
> >  [..] worker_thread
> >  [..] kthread
> >  [..] ret_from_fork
> 
> Seems that you are not the first one to hit this NULL deref.
> 
> > I think, tty_ldisc_reinit() should be called with ldisc_sem hold for
> > writing, which will protect any reader against line discipline changes.
> 
> Per https://lore.kernel.org/patchwork/patch/777220/
> 
> : Note that we noticed one path that called reinit without the ldisc lock
> : held for writing, we added that, but it didn't fix the problem.
> 
> And I guess that Ben meant the same reinit path which you patched:

This is too old for me to remember buit yes, there definitely was a bug
there...

> > @@ -1267,15 +1267,20 @@ static int tty_reopen(struct tty_struct *tty)
> > if (test_bit(TTY_EXCLUSIVE, &tty->flags) && !capable(CAP_SYS_ADMIN))
> > return -EBUSY;
> >  
> > -   tty->count++;
> > +   retval = tty_ldisc_lock(tty, 5 * HZ);
> > +   if (retval)
> > +   return retval;
> >  
> > +   tty->count++;
> > if (tty->ldisc)
> > -   return 0;
> > +   goto out_unlock;
> >  
> > retval = tty_ldisc_reinit(tty, tty->termios.c_line);
> > if (retval)
> > tty->count--;
> >  
> > +out_unlock:
> > +   tty_ldisc_unlock(tty);
> > return retval;
> >  }
> 
>   -ss



Re: [PATCH 2/4] tty: Hold tty_ldisc_lock() during tty_reopen()

2018-08-29 Thread Dmitry Safonov
On Wed, 2018-08-29 at 16:40 +0200, Jiri Slaby wrote:
> On 08/29/2018, 04:23 AM, Dmitry Safonov wrote:
> > tty_ldisc_reinit() doesn't race with neither tty_ldisc_hangup()
> > nor set_ldisc() nor tty_ldisc_release() as they use tty lock.
> > But it races with anyone who expects line discipline to be the same
> > after hoding read semaphore in tty_ldisc_ref().
> > 
> > We've seen the following crash on v4.9.108 stable:
> > 
> > BUG: unable to handle kernel paging request at 2260
> > IP: [..] n_tty_receive_buf_common+0x5f/0x86d
> > Workqueue: events_unbound flush_to_ldisc
> > Call Trace:
> >  [..] n_tty_receive_buf2
> >  [..] tty_ldisc_receive_buf
> >  [..] flush_to_ldisc
> >  [..] process_one_work
> >  [..] worker_thread
> >  [..] kthread
> >  [..] ret_from_fork
> > 
> > I think, tty_ldisc_reinit() should be called with ldisc_sem hold
> > for
> > writing, which will protect any reader against line discipline
> > changes.
> > 
> > Note: I failed to reproduce the described crash, so obiviously
> > can't
> > guarantee that this is the place where line discipline was
> > switched.
> > 
> > Cc: Greg Kroah-Hartman 
> > Cc: Jiri Slaby 
> > Cc: sta...@vger.kernel.org
> > Signed-off-by: Dmitry Safonov 
> > ---
> >  drivers/tty/tty_io.c | 9 +++--
> >  1 file changed, 7 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
> > index 5e5da9acaf0a..3ef8b977b167 100644
> > --- a/drivers/tty/tty_io.c
> > +++ b/drivers/tty/tty_io.c
> > @@ -1267,15 +1267,20 @@ static int tty_reopen(struct tty_struct
> > *tty)
> > if (test_bit(TTY_EXCLUSIVE, &tty->flags) &&
> > !capable(CAP_SYS_ADMIN))
> > return -EBUSY;
> >  
> > -   tty->count++;
> > +   retval = tty_ldisc_lock(tty, 5 * HZ);
> 
> Why 5 secs? This would cause random errors on machines under heavy
> load.

Yeah, I think MAX_SCHEDULE_TIMEOUT will make more sense here..
Not sure, why I decided to go with 5*HZ instead.
Will resend with new timeout, if everything else looks good to you.
(having in mind my argument for count++ in 1/4)

> 
> > +   if (retval)
> > +   return retval;
> >  
> > +   tty->count++;
> > if (tty->ldisc)
> > -   return 0;
> > +   goto out_unlock;
> >  
> > retval = tty_ldisc_reinit(tty, tty->termios.c_line);
> > if (retval)
> > tty->count--;
> >  
> > +out_unlock:
> > +   tty_ldisc_unlock(tty);
> > return retval;
> 
> So what about:
> tty_ldisc_lock(tty, MAX_SCHEDULE_TIMEOUT);
> if (!tty->ldisc)
> ret = tty_ldisc_reinit(tty, tty->termios.c_line);
> tty_ldisc_unlock(tty);
> 
> if (!ret)
> tty->count++;
> 
> return ret;
> 

-- 
Thanks,
 Dmitry


Re: [PATCH 2/4] tty: Hold tty_ldisc_lock() during tty_reopen()

2018-08-29 Thread Tetsuo Handa
On 2018/08/29 11:23, Dmitry Safonov wrote:
> tty_ldisc_reinit() doesn't race with neither tty_ldisc_hangup()
> nor set_ldisc() nor tty_ldisc_release() as they use tty lock.
> But it races with anyone who expects line discipline to be the same
> after hoding read semaphore in tty_ldisc_ref().
> 
> We've seen the following crash on v4.9.108 stable:
> 
> BUG: unable to handle kernel paging request at 2260
> IP: [..] n_tty_receive_buf_common+0x5f/0x86d
> Workqueue: events_unbound flush_to_ldisc
> Call Trace:
>  [..] n_tty_receive_buf2
>  [..] tty_ldisc_receive_buf
>  [..] flush_to_ldisc
>  [..] process_one_work
>  [..] worker_thread
>  [..] kthread
>  [..] ret_from_fork
> 
> I think, tty_ldisc_reinit() should be called with ldisc_sem hold for
> writing, which will protect any reader against line discipline changes.
> 
> Note: I failed to reproduce the described crash, so obiviously can't
> guarantee that this is the place where line discipline was switched.

This will be same with a report at
https://syzkaller.appspot.com/bug?id=f08670354701fa64cc0dd3c0128a491bdb16adcc .

syzbot is now testing a patch from Jiri Slaby.


Re: [PATCH 2/4] tty: Hold tty_ldisc_lock() during tty_reopen()

2018-08-29 Thread Jiri Slaby
On 08/29/2018, 04:40 PM, Jiri Slaby wrote:
> On 08/29/2018, 04:23 AM, Dmitry Safonov wrote:
>> tty_ldisc_reinit() doesn't race with neither tty_ldisc_hangup()
>> nor set_ldisc() nor tty_ldisc_release() as they use tty lock.
>> But it races with anyone who expects line discipline to be the same
>> after hoding read semaphore in tty_ldisc_ref().
>>
>> We've seen the following crash on v4.9.108 stable:
>>
>> BUG: unable to handle kernel paging request at 2260
>> IP: [..] n_tty_receive_buf_common+0x5f/0x86d
>> Workqueue: events_unbound flush_to_ldisc
>> Call Trace:
>>  [..] n_tty_receive_buf2
>>  [..] tty_ldisc_receive_buf
>>  [..] flush_to_ldisc
>>  [..] process_one_work
>>  [..] worker_thread
>>  [..] kthread
>>  [..] ret_from_fork
>>
>> I think, tty_ldisc_reinit() should be called with ldisc_sem hold for
>> writing, which will protect any reader against line discipline changes.
>>
>> Note: I failed to reproduce the described crash, so obiviously can't
>> guarantee that this is the place where line discipline was switched.

...

> So what about:
> tty_ldisc_lock(tty, MAX_SCHEDULE_TIMEOUT);
> if (!tty->ldisc)
> ret = tty_ldisc_reinit(tty, tty->termios.c_line);
> tty_ldisc_unlock(tty);
> 
> if (!ret)
> tty->count++;
> 
> return ret;

I forgot to add that I debugged a different NULL ptr deref to this very
same root cause today (set_termios called with NULL tty->disc_data). So
really, tty_reinit's ldisc change must be protected by the ldisc_sem,
otherwise other threads will see tty->ldisc, but not tty->disc_data.

thanks,
-- 
js
suse labs


Re: [PATCH 2/4] tty: Hold tty_ldisc_lock() during tty_reopen()

2018-08-29 Thread Jiri Slaby
On 08/29/2018, 04:23 AM, Dmitry Safonov wrote:
> tty_ldisc_reinit() doesn't race with neither tty_ldisc_hangup()
> nor set_ldisc() nor tty_ldisc_release() as they use tty lock.
> But it races with anyone who expects line discipline to be the same
> after hoding read semaphore in tty_ldisc_ref().
> 
> We've seen the following crash on v4.9.108 stable:
> 
> BUG: unable to handle kernel paging request at 2260
> IP: [..] n_tty_receive_buf_common+0x5f/0x86d
> Workqueue: events_unbound flush_to_ldisc
> Call Trace:
>  [..] n_tty_receive_buf2
>  [..] tty_ldisc_receive_buf
>  [..] flush_to_ldisc
>  [..] process_one_work
>  [..] worker_thread
>  [..] kthread
>  [..] ret_from_fork
> 
> I think, tty_ldisc_reinit() should be called with ldisc_sem hold for
> writing, which will protect any reader against line discipline changes.
> 
> Note: I failed to reproduce the described crash, so obiviously can't
> guarantee that this is the place where line discipline was switched.
> 
> Cc: Greg Kroah-Hartman 
> Cc: Jiri Slaby 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Dmitry Safonov 
> ---
>  drivers/tty/tty_io.c | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
> index 5e5da9acaf0a..3ef8b977b167 100644
> --- a/drivers/tty/tty_io.c
> +++ b/drivers/tty/tty_io.c
> @@ -1267,15 +1267,20 @@ static int tty_reopen(struct tty_struct *tty)
>   if (test_bit(TTY_EXCLUSIVE, &tty->flags) && !capable(CAP_SYS_ADMIN))
>   return -EBUSY;
>  
> - tty->count++;
> + retval = tty_ldisc_lock(tty, 5 * HZ);

Why 5 secs? This would cause random errors on machines under heavy load.

> + if (retval)
> + return retval;
>  
> + tty->count++;
>   if (tty->ldisc)
> - return 0;
> + goto out_unlock;
>  
>   retval = tty_ldisc_reinit(tty, tty->termios.c_line);
>   if (retval)
>   tty->count--;
>  
> +out_unlock:
> + tty_ldisc_unlock(tty);
>   return retval;

So what about:
tty_ldisc_lock(tty, MAX_SCHEDULE_TIMEOUT);
if (!tty->ldisc)
ret = tty_ldisc_reinit(tty, tty->termios.c_line);
tty_ldisc_unlock(tty);

if (!ret)
tty->count++;

return ret;

thanks,
-- 
js
suse labs


Re: [PATCH 2/4] tty: Hold tty_ldisc_lock() during tty_reopen()

2018-08-29 Thread Dmitry Safonov
Hi Sergey,

On Wed, 2018-08-29 at 13:34 +0900, Sergey Senozhatsky wrote:
> Hi,
> 
> Cc-ing Benjamin on this.

Thanks!

> On (08/29/18 03:23), Dmitry Safonov wrote:
> > BUG: unable to handle kernel paging request at 2260
> > IP: [..] n_tty_receive_buf_common+0x5f/0x86d
> > Workqueue: events_unbound flush_to_ldisc
> > Call Trace:
> >  [..] n_tty_receive_buf2
> >  [..] tty_ldisc_receive_buf
> >  [..] flush_to_ldisc
> >  [..] process_one_work
> >  [..] worker_thread
> >  [..] kthread
> >  [..] ret_from_fork
> 
> Seems that you are not the first one to hit this NULL deref.
> 
> > I think, tty_ldisc_reinit() should be called with ldisc_sem hold
> > for
> > writing, which will protect any reader against line discipline
> > changes.
> 
> Per https://lore.kernel.org/patchwork/patch/777220/
> 
> : Note that we noticed one path that called reinit without the ldisc
> lock
> : held for writing, we added that, but it didn't fix the problem.

Probably, it's worth to know what exactly has he tried and what was the
backtrace he got in the result..
Hopefully, we'll hear more.

It might be also worth to review tty_ldisc_deinit(), I thought it's
safe to destroy ldisc there under tty lock during tty release, but may
be that is another non-safe place.

-- 
Thanks again,
 Dmitry


Re: [PATCH 2/4] tty: Hold tty_ldisc_lock() during tty_reopen()

2018-08-28 Thread Sergey Senozhatsky
Hi,

Cc-ing Benjamin on this.

On (08/29/18 03:23), Dmitry Safonov wrote:
> BUG: unable to handle kernel paging request at 2260
> IP: [..] n_tty_receive_buf_common+0x5f/0x86d
> Workqueue: events_unbound flush_to_ldisc
> Call Trace:
>  [..] n_tty_receive_buf2
>  [..] tty_ldisc_receive_buf
>  [..] flush_to_ldisc
>  [..] process_one_work
>  [..] worker_thread
>  [..] kthread
>  [..] ret_from_fork

Seems that you are not the first one to hit this NULL deref.

> I think, tty_ldisc_reinit() should be called with ldisc_sem hold for
> writing, which will protect any reader against line discipline changes.

Per https://lore.kernel.org/patchwork/patch/777220/

: Note that we noticed one path that called reinit without the ldisc lock
: held for writing, we added that, but it didn't fix the problem.

And I guess that Ben meant the same reinit path which you patched:

> @@ -1267,15 +1267,20 @@ static int tty_reopen(struct tty_struct *tty)
>   if (test_bit(TTY_EXCLUSIVE, &tty->flags) && !capable(CAP_SYS_ADMIN))
>   return -EBUSY;
>  
> - tty->count++;
> + retval = tty_ldisc_lock(tty, 5 * HZ);
> + if (retval)
> + return retval;
>  
> + tty->count++;
>   if (tty->ldisc)
> - return 0;
> + goto out_unlock;
>  
>   retval = tty_ldisc_reinit(tty, tty->termios.c_line);
>   if (retval)
>   tty->count--;
>  
> +out_unlock:
> + tty_ldisc_unlock(tty);
>   return retval;
>  }

-ss


[PATCH 2/4] tty: Hold tty_ldisc_lock() during tty_reopen()

2018-08-28 Thread Dmitry Safonov
tty_ldisc_reinit() doesn't race with neither tty_ldisc_hangup()
nor set_ldisc() nor tty_ldisc_release() as they use tty lock.
But it races with anyone who expects line discipline to be the same
after hoding read semaphore in tty_ldisc_ref().

We've seen the following crash on v4.9.108 stable:

BUG: unable to handle kernel paging request at 2260
IP: [..] n_tty_receive_buf_common+0x5f/0x86d
Workqueue: events_unbound flush_to_ldisc
Call Trace:
 [..] n_tty_receive_buf2
 [..] tty_ldisc_receive_buf
 [..] flush_to_ldisc
 [..] process_one_work
 [..] worker_thread
 [..] kthread
 [..] ret_from_fork

I think, tty_ldisc_reinit() should be called with ldisc_sem hold for
writing, which will protect any reader against line discipline changes.

Note: I failed to reproduce the described crash, so obiviously can't
guarantee that this is the place where line discipline was switched.

Cc: Greg Kroah-Hartman 
Cc: Jiri Slaby 
Cc: sta...@vger.kernel.org
Signed-off-by: Dmitry Safonov 
---
 drivers/tty/tty_io.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index 5e5da9acaf0a..3ef8b977b167 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -1267,15 +1267,20 @@ static int tty_reopen(struct tty_struct *tty)
if (test_bit(TTY_EXCLUSIVE, &tty->flags) && !capable(CAP_SYS_ADMIN))
return -EBUSY;
 
-   tty->count++;
+   retval = tty_ldisc_lock(tty, 5 * HZ);
+   if (retval)
+   return retval;
 
+   tty->count++;
if (tty->ldisc)
-   return 0;
+   goto out_unlock;
 
retval = tty_ldisc_reinit(tty, tty->termios.c_line);
if (retval)
tty->count--;
 
+out_unlock:
+   tty_ldisc_unlock(tty);
return retval;
 }
 
-- 
2.13.6