Re: net: BUG still has locks held in unix_stream_splice_read

2016-11-17 Thread Hannes Frederic Sowa
On 17.11.2016 22:44, Cong Wang wrote:
> On Sun, Oct 9, 2016 at 8:14 PM, Al Viro  wrote:
>> E.g what will happen if some code does a read on AF_UNIX socket with
>> some local mutex held?  AFAICS, there are exactly two callers of
>> freezable_schedule_timeout() - this one and one in XFS; the latter is
>> in a kernel thread where we do have good warranties about the locking
>> environment, but here it's in the bleeding ->recvmsg/->splice_read and
>> for those assumption that caller doesn't hold any locks is pretty
>> strong, especially since it's not documented anywhere.
>>
>> What's going on there?
> 
> Commit 2b15af6f95 ("af_unix: use freezable blocking calls in read")
> converted schedule_timeout() to its freezable version. That was probably
> correct at the time, but the later commit 2b514574f7e88c8498027ee366
> ("net: af_unix: implement splice for stream af_unix sockets") breaks the
> requirement that a freezable sleep be entered with no locks held:
> 
> commit 0f9548ca10916dec166eaf74c816bded7d8e611d
> 
> lockdep: check that no locks held at freeze time
> 
> We shouldn't try_to_freeze if locks are held.  Holding a lock can cause a
> deadlock if the lock is later acquired in the suspend or hibernate path
> (e.g.  by dpm).  Holding a lock can also cause a deadlock in the case of
> cgroup_freezer if a lock is held inside a frozen cgroup that is later
> acquired by a process outside that group.
> 
> So probably we just need to revert commit 2b15af6f95 now.
> 
> I am going to send a revert for at least -net and -stable, since Dmitry
> saw this warning again.

I am not an expert on freezing, but judging from the freezer code this
looks about right. Awesome, thanks a lot for spotting this one!


Re: net: BUG still has locks held in unix_stream_splice_read

2016-11-17 Thread Cong Wang
On Sun, Oct 9, 2016 at 8:14 PM, Al Viro  wrote:
> E.g what will happen if some code does a read on AF_UNIX socket with
> some local mutex held?  AFAICS, there are exactly two callers of
> freezable_schedule_timeout() - this one and one in XFS; the latter is
> in a kernel thread where we do have good warranties about the locking
> environment, but here it's in the bleeding ->recvmsg/->splice_read and
> for those assumption that caller doesn't hold any locks is pretty
> strong, especially since it's not documented anywhere.
>
> What's going on there?

Commit 2b15af6f95 ("af_unix: use freezable blocking calls in read")
converted schedule_timeout() to its freezable version. That was probably
correct at the time, but the later commit 2b514574f7e88c8498027ee366
("net: af_unix: implement splice for stream af_unix sockets") breaks the
requirement that a freezable sleep be entered with no locks held:

commit 0f9548ca10916dec166eaf74c816bded7d8e611d

lockdep: check that no locks held at freeze time

We shouldn't try_to_freeze if locks are held.  Holding a lock can cause a
deadlock if the lock is later acquired in the suspend or hibernate path
(e.g.  by dpm).  Holding a lock can also cause a deadlock in the case of
cgroup_freezer if a lock is held inside a frozen cgroup that is later
acquired by a process outside that group.

So probably we just need to revert commit 2b15af6f95 now.

I am going to send a revert for at least -net and -stable, since Dmitry
saw this warning again.
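
To make the failure mode above concrete, here is a small userspace model
of the check, not kernel code. Every name in it (model_task, model_lock,
model_try_to_freeze and so on) is hypothetical and only stands in for the
kernel pieces named in the backtrace. It assumes the freeze check runs
when the freezable sleep returns, which is what the inlined
try_to_freeze/freezer_count frames suggest.

```c
#include <assert.h>
#include <stdio.h>

/* Userspace model only: none of these names are kernel symbols. */
struct model_task {
    int held_locks;  /* stands in for lockdep's per-task held-lock count */
    int warnings;    /* number of "still has locks held" reports */
};

static void model_lock(struct model_task *t)   { t->held_locks++; }
static void model_unlock(struct model_task *t) { t->held_locks--; }

/* Models try_to_freeze(): report if the task holds any lock at all. */
static void model_try_to_freeze(struct model_task *t)
{
    if (t->held_locks > 0) {
        t->warnings++;
        fprintf(stderr, "model: task still has %d lock(s) held\n",
                t->held_locks);
    }
}

/* Models a freezable sleep: sleep, then run the freeze check on wakeup. */
static void model_freezable_sleep(struct model_task *t)
{
    /* the real code would block here until data arrives or a timeout */
    model_try_to_freeze(t);
}

/* Models the stream read path, which waits for data with a freezable sleep. */
static void model_stream_read(struct model_task *t)
{
    model_freezable_sleep(t);
}

/* Plain read: the caller holds no locks, so the check stays silent. */
int model_recvmsg(void)
{
    struct model_task t = { 0, 0 };

    model_stream_read(&t);
    return t.warnings;
}

/* Splice read: the caller already holds the pipe lock when it sleeps. */
int model_splice_read(void)
{
    struct model_task t = { 0, 0 };

    model_lock(&t);          /* pipe lock taken before the socket read */
    model_stream_read(&t);
    model_unlock(&t);
    return t.warnings;
}
```

In this model model_recvmsg() reports nothing, while model_splice_read()
reports one held lock, which mirrors the "1 lock held" line in the report.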


Re: net: BUG still has locks held in unix_stream_splice_read

2016-11-17 Thread Dmitry Vyukov
On Mon, Oct 10, 2016 at 10:01 AM, Dmitry Vyukov  wrote:
> On Mon, Oct 10, 2016 at 5:14 AM, Al Viro  wrote:
>> On Mon, Oct 10, 2016 at 03:46:07AM +0100, Al Viro wrote:
>>> On Sun, Oct 09, 2016 at 12:06:14PM +0200, Dmitry Vyukov wrote:
>>> > I suspect this is:
>>> >
>>> > commit 25869262ef7af24ccde988867ac3eb1c3d4b88d4
>>> > Author: Al Viro 
>>> > Date:   Sat Sep 17 21:02:10 2016 -0400
>>> > skb_splice_bits(): get rid of callback
>>> > since pipe_lock is the outermost now, we don't need to drop/regain
>>> > socket locks around the call of splice_to_pipe() from 
>>> > skb_splice_bits(),
>>> > which kills the need to have a socket-specific callback; we can just
>>> > call splice_to_pipe() and be done with that.
>>>
>>> Unlikely, since that particular commit removes unlocking/relocking ->iolock
>>> around the call of splice_to_pipe().  Original would've retaken the same
>>> lock on the way out; it's not as if we could leave the syscall there.
>>>
>>> It might be splice-related, but I don't believe that you've got the right
>>> commit here.
>>
>> It's not that commit
>
> It's highly likely. Sorry for falsely pointing to your commit.
>
>
>> , all right - it's "can't call unix_stream_read_generic()
>> with any locks held" stepped onto a couple of commits prior by
>> "splice: lift pipe_lock out of splice_to_pipe()".  Could somebody explain
>> what is that about?
>>
>> E.g what will happen if some code does a read on AF_UNIX socket with
>> some local mutex held?  AFAICS, there are exactly two callers of
>> freezable_schedule_timeout() - this one and one in XFS; the latter is
>> in a kernel thread where we do have good warranties about the locking
>> environment, but here it's in the bleeding ->recvmsg/->splice_read and
>> for those assumption that caller doesn't hold any locks is pretty
>> strong, especially since it's not documented anywhere.
>>
>> What's going on there?
>
> I never saw that warning before. There is some possibility that fuzzer
> has discovered some new paths, but it's much more likely that
> something has changed recently (the stack looks quite simple -- just a
> splice from unix socket). And my previous pull was like a week ago.

Ping. Just hit it again on 4.9-rc5


[ BUG: syz-executor/15922 still has locks held! ]
4.9.0-rc5+ #43 Not tainted
-
1 lock held by syz-executor/15922:
 #0: [ 1441.143288]  (
[< inline >] pipe_lock_nested fs/pipe.c:66
[] pipe_lock+0x5b/0x70 fs/pipe.c:74

stack backtrace:
CPU: 3 PID: 15922 Comm: syz-executor Not tainted 4.9.0-rc5+ #43
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 88004a98f568 834c2a19 0003 110009531e40
 ed0009531e38 41b58ab3 895758b0 834c272b
 0003 880035256640 0003 88006d122cd8
Call Trace:
 [< inline >] __dump_stack lib/dump_stack.c:15
 [] dump_stack+0x2ee/0x3f5 lib/dump_stack.c:51
 [< inline >] print_held_locks_bug kernel/locking/lockdep.c:4296
 [] debug_check_no_locks_held+0x125/0x140 kernel/locking/lockdep.c:4302
 [< inline >] try_to_freeze include/linux/freezer.h:65
 [< inline >] freezer_count include/linux/freezer.h:127
 [< inline >] freezable_schedule_timeout include/linux/freezer.h:192
 [] unix_stream_data_wait+0x4fd/0x910 net/unix/af_unix.c:2223
 [] unix_stream_read_generic+0x11e2/0x2240 net/unix/af_unix.c:2332
 [] unix_stream_splice_read+0x27f/0x400 net/unix/af_unix.c:2506
 [] sock_splice_read+0xbe/0x100 net/socket.c:772
 [] do_splice_to+0x10f/0x170 fs/splice.c:897
 [< inline >] do_splice fs/splice.c:1185
 [< inline >] SYSC_splice fs/splice.c:1409
 [] SyS_splice+0xfaa/0x16a0 fs/splice.c:1392
 [] entry_SYSCALL_64_fastpath+0x23/0xc6


Re: net: BUG still has locks held in unix_stream_splice_read

2016-10-10 Thread Dmitry Vyukov
On Mon, Oct 10, 2016 at 5:14 AM, Al Viro  wrote:
> On Mon, Oct 10, 2016 at 03:46:07AM +0100, Al Viro wrote:
>> On Sun, Oct 09, 2016 at 12:06:14PM +0200, Dmitry Vyukov wrote:
>> > I suspect this is:
>> >
>> > commit 25869262ef7af24ccde988867ac3eb1c3d4b88d4
>> > Author: Al Viro 
>> > Date:   Sat Sep 17 21:02:10 2016 -0400
>> > skb_splice_bits(): get rid of callback
>> > since pipe_lock is the outermost now, we don't need to drop/regain
>> > socket locks around the call of splice_to_pipe() from 
>> > skb_splice_bits(),
>> > which kills the need to have a socket-specific callback; we can just
>> > call splice_to_pipe() and be done with that.
>>
>> Unlikely, since that particular commit removes unlocking/relocking ->iolock
>> around the call of splice_to_pipe().  Original would've retaken the same
>> lock on the way out; it's not as if we could leave the syscall there.
>>
>> It might be splice-related, but I don't believe that you've got the right
>> commit here.
>
> It's not that commit

It's highly likely. Sorry for falsely pointing to your commit.


> , all right - it's "can't call unix_stream_read_generic()
> with any locks held" stepped onto a couple of commits prior by
> "splice: lift pipe_lock out of splice_to_pipe()".  Could somebody explain
> what is that about?
>
> E.g what will happen if some code does a read on AF_UNIX socket with
> some local mutex held?  AFAICS, there are exactly two callers of
> freezable_schedule_timeout() - this one and one in XFS; the latter is
> in a kernel thread where we do have good warranties about the locking
> environment, but here it's in the bleeding ->recvmsg/->splice_read and
> for those assumption that caller doesn't hold any locks is pretty
> strong, especially since it's not documented anywhere.
>
> What's going on there?

I never saw that warning before. There is some possibility that fuzzer
has discovered some new paths, but it's much more likely that
something has changed recently (the stack looks quite simple -- just a
splice from unix socket). And my previous pull was like a week ago.
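
For reference, the "splice from unix socket" path can be exercised from
userspace with something like the sketch below. This is not the syzkaller
program, which the thread does not show. The helper name
splice_from_unix_socket and the 100 ms delay are assumptions made for
illustration. Whether the lockdep report appears depends on the kernel
version and on lockdep being enabled.

```c
#define _GNU_SOURCE             /* for splice() */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: a child writes `len` bytes of `msg` into an AF_UNIX
 * stream socket after a short delay, while the parent is blocked in
 * splice() from that socket into a pipe. The spliced bytes are then read
 * back from the pipe into `out`.
 * Returns the number of bytes read from the pipe, or -1 on error.
 */
ssize_t splice_from_unix_socket(const char *msg, size_t len,
                                char *out, size_t outlen)
{
    int sv[2], p[2];
    ssize_t n = -1, spliced;
    pid_t pid;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    if (pipe(p) < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    pid = fork();
    if (pid < 0)
        goto out;
    if (pid == 0) {
        /* Child: wait a little so the parent is likely asleep in splice(). */
        usleep(100 * 1000);
        ssize_t w = write(sv[1], msg, len);
        _exit(w == (ssize_t)len ? 0 : 1);
    }

    /* Parent: the socket is still empty, so this waits for data. */
    spliced = splice(sv[0], NULL, p[1], NULL, len, 0);
    if (spliced > 0) {
        size_t want = (size_t)spliced < outlen ? (size_t)spliced : outlen;
        n = read(p[0], out, want);
    }

    waitpid(pid, NULL, 0);
out:
    close(sv[0]);
    close(sv[1]);
    close(p[0]);
    close(p[1]);
    return n;
}
```

Calling splice_from_unix_socket("hello", 5, buf, sizeof(buf)) should copy
the five bytes into buf by way of the pipe. The delay only makes it likely,
not certain, that the parent goes through the blocking wait.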


Re: net: BUG still has locks held in unix_stream_splice_read

2016-10-09 Thread Al Viro
On Mon, Oct 10, 2016 at 03:46:07AM +0100, Al Viro wrote:
> On Sun, Oct 09, 2016 at 12:06:14PM +0200, Dmitry Vyukov wrote:
> > I suspect this is:
> > 
> > commit 25869262ef7af24ccde988867ac3eb1c3d4b88d4
> > Author: Al Viro 
> > Date:   Sat Sep 17 21:02:10 2016 -0400
> > skb_splice_bits(): get rid of callback
> > since pipe_lock is the outermost now, we don't need to drop/regain
> > socket locks around the call of splice_to_pipe() from skb_splice_bits(),
> > which kills the need to have a socket-specific callback; we can just
> > call splice_to_pipe() and be done with that.
> 
> Unlikely, since that particular commit removes unlocking/relocking ->iolock
> around the call of splice_to_pipe().  Original would've retaken the same
> lock on the way out; it's not as if we could leave the syscall there.
> 
> It might be splice-related, but I don't believe that you've got the right
> commit here.

It's not that commit, all right - it's "can't call unix_stream_read_generic()
with any locks held" stepped onto a couple of commits prior by
"splice: lift pipe_lock out of splice_to_pipe()".  Could somebody explain
what is that about?

E.g what will happen if some code does a read on AF_UNIX socket with
some local mutex held?  AFAICS, there are exactly two callers of
freezable_schedule_timeout() - this one and one in XFS; the latter is
in a kernel thread where we do have good warranties about the locking
environment, but here it's in the bleeding ->recvmsg/->splice_read and
for those assumption that caller doesn't hold any locks is pretty
strong, especially since it's not documented anywhere.

What's going on there?


Re: net: BUG still has locks held in unix_stream_splice_read

2016-10-09 Thread Al Viro
On Sun, Oct 09, 2016 at 12:06:14PM +0200, Dmitry Vyukov wrote:
> I suspect this is:
> 
> commit 25869262ef7af24ccde988867ac3eb1c3d4b88d4
> Author: Al Viro 
> Date:   Sat Sep 17 21:02:10 2016 -0400
> skb_splice_bits(): get rid of callback
> since pipe_lock is the outermost now, we don't need to drop/regain
> socket locks around the call of splice_to_pipe() from skb_splice_bits(),
> which kills the need to have a socket-specific callback; we can just
> call splice_to_pipe() and be done with that.

Unlikely, since that particular commit removes unlocking/relocking ->iolock
around the call of splice_to_pipe().  Original would've retaken the same
lock on the way out; it's not as if we could leave the syscall there.

It might be splice-related, but I don't believe that you've got the right
commit here.


Re: net: BUG still has locks held in unix_stream_splice_read

2016-10-09 Thread Dmitry Vyukov
Hello,

While running syzkaller fuzzer on commit
b66484cd74706fa8681d051840fe4b18a3da40ff (Oct 7), I am getting:

[ BUG: syz-executor/15138 still has locks held! ]
4.8.0+ #29 Not tainted
-
1 lock held by syz-executor/15138:
 #0:  (&pipe->mutex/1){+.+.+.}, at: [< inline >] pipe_lock_nested fs/pipe.c:66
 #0:  (&pipe->mutex/1){+.+.+.}, at: [] pipe_lock+0x5b/0x70 fs/pipe.c:74

stack backtrace:
CPU: 1 PID: 15138 Comm: syz-executor Not tainted 4.8.0+ #29
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
 880044d4fa38 82d383c9  fbfff1097248
 88005a44a3c0 88005a44a3c0 dc00 88005a44a3c0
 8800541fb9b8 880044d4fa58 81463cd5 
Call Trace:
 [< inline >] __dump_stack lib/dump_stack.c:15
 [] dump_stack+0x12e/0x185 lib/dump_stack.c:51
 [< inline >] print_held_locks_bug kernel/locking/lockdep.c:4296
 [] debug_check_no_locks_held+0x125/0x140 kernel/locking/lockdep.c:4302
 [< inline >] try_to_freeze include/linux/freezer.h:65
 [< inline >] freezer_count include/linux/freezer.h:127
 [< inline >] freezable_schedule_timeout include/linux/freezer.h:192
 [< inline >] unix_stream_data_wait net/unix/af_unix.c:2223
 [] unix_stream_read_generic+0x1317/0x1b70 net/unix/af_unix.c:2332
 [] unix_stream_splice_read+0x15b/0x1d0 net/unix/af_unix.c:2506
 [] sock_splice_read+0xbe/0x100 net/socket.c:775
 [] do_splice_to+0x10f/0x170 fs/splice.c:908
 [< inline >] do_splice fs/splice.c:1196
 [< inline >] SYSC_splice fs/splice.c:1420
 [] SyS_splice+0x114c/0x15b0 fs/splice.c:1403
 [] entry_SYSCALL_64_fastpath+0x23/0xc6


I suspect this is:

commit 25869262ef7af24ccde988867ac3eb1c3d4b88d4
Author: Al Viro 
Date:   Sat Sep 17 21:02:10 2016 -0400
skb_splice_bits(): get rid of callback
since pipe_lock is the outermost now, we don't need to drop/regain
socket locks around the call of splice_to_pipe() from skb_splice_bits(),
which kills the need to have a socket-specific callback; we can just
call splice_to_pipe() and be done with that.

