Re: BUG: spinlock bad magic in tun_do_read
On 2018-05-09 10:50, Cong Wang wrote:
> On Mon, May 7, 2018 at 11:04 PM, Eric Dumazet wrote:
>> On 05/07/2018 10:54 PM, Cong Wang wrote:
>>> Yeah, we should return early before hitting this uninitialized ptr ring...
>>> Something like:
>>>
>>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>>> index ef33950a45d9..638c87a95247 100644
>>> --- a/drivers/net/tun.c
>>> +++ b/drivers/net/tun.c
>>> @@ -2128,6 +2128,9 @@ static void *tun_ring_recv(struct tun_file *tfile, int noblock, int *err)
>>>         void *ptr = NULL;
>>>         int error = 0;
>>>
>>> +       if (!tfile->tx_ring.queue)
>>> +               goto out;
>>> +
>>>
>>> Or, checking if tun is detached...
>>
>> tx_ring was properly initialized when first ptr_ring_consume() at line 2131
>> was attempted.
>>
>> The bug happens later at line 2143, after a schedule() call, line 2155
>>
>> So a single check at function prologue won't solve the case the thread had to
>> sleep, then some uninit happened.
>
> Very good point. RTNL lock is supposed to protect cleanup path, but I don't
> think we can acquire RTNL for tun_chr_read_iter() path...

I think the root cause is that we try to initialize the ptr ring during TUNSETIFF, since its length depends on dev->tx_queue_len, and try to destroy it when the device is gone.

We can solve this by initializing a zero-size ptr_ring during open() and resizing it if necessary. Then there is no need for any workaround like memset and checking against NULL.

Let me try to cook a patch for this.

Thanks
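To make the proposal concrete, here is a rough userspace sketch of the idea, not the actual patch. Everything named model_* is hypothetical and only stands in for the kernel's ptr_ring; the lock_ready flag stands in for an initialized spinlock. The point it tries to show is that a ring set up at open() with zero capacity is always safe to consume from, and only its capacity changes at attach time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model of a pointer ring; NOT the kernel's ptr_ring. */
struct model_ring {
	void **queue;     /* slot array; a NULL slot is empty */
	size_t size;      /* number of slots, may be 0 */
	size_t head;      /* next slot to produce into */
	size_t tail;      /* next slot to consume from */
	bool lock_ready;  /* stands in for an initialized spinlock */
};

/* Called at open(): the ring is valid from here on, even with size 0. */
static int model_ring_init(struct model_ring *r, size_t size)
{
	r->queue = size ? calloc(size, sizeof(*r->queue)) : NULL;
	if (size && !r->queue)
		return -1;
	r->size = size;
	r->head = r->tail = 0;
	r->lock_ready = true;
	return 0;
}

/* Called at attach time: change capacity only. Pending entries are
 * dropped here for brevity; a real resize would migrate them. */
static int model_ring_resize(struct model_ring *r, size_t size)
{
	void **q = size ? calloc(size, sizeof(*q)) : NULL;

	if (size && !q)
		return -1;
	free(r->queue);
	r->queue = q;
	r->size = size;
	r->head = r->tail = 0;
	return 0;
}

/* Returns 0 on success, -1 if the ring has zero size or is full. */
static int model_ring_produce(struct model_ring *r, void *ptr)
{
	if (!r->size || r->queue[r->head])
		return -1;
	r->queue[r->head] = ptr;
	r->head = (r->head + 1) % r->size;
	return 0;
}

/* Returns the oldest entry, or NULL if the ring is empty or zero-sized. */
static void *model_ring_consume(struct model_ring *r)
{
	void *ptr;

	if (!r->size)
		return NULL;
	ptr = r->queue[r->tail];
	if (ptr) {
		r->queue[r->tail] = NULL;
		r->tail = (r->tail + 1) % r->size;
	}
	return ptr;
}
```

With this shape a reader never sees an uninitialized lock: before attach, consume simply returns NULL, and after model_ring_resize() the same object starts carrying entries.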
Re: BUG: spinlock bad magic in tun_do_read
On Mon, May 7, 2018 at 11:04 PM, Eric Dumazet wrote:
> On 05/07/2018 10:54 PM, Cong Wang wrote:
>> Yeah, we should return early before hitting this uninitialized ptr ring...
>> Something like:
>>
>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>> index ef33950a45d9..638c87a95247 100644
>> --- a/drivers/net/tun.c
>> +++ b/drivers/net/tun.c
>> @@ -2128,6 +2128,9 @@ static void *tun_ring_recv(struct tun_file *tfile, int noblock, int *err)
>>         void *ptr = NULL;
>>         int error = 0;
>>
>> +       if (!tfile->tx_ring.queue)
>> +               goto out;
>> +
>>
>> Or, checking if tun is detached...
>
> tx_ring was properly initialized when first ptr_ring_consume() at line 2131
> was attempted.
>
> The bug happens later at line 2143, after a schedule() call, line 2155
>
> So a single check at function prologue won't solve the case the thread had to
> sleep, then some uninit happened.

Very good point. RTNL lock is supposed to protect cleanup path, but I don't think we can acquire RTNL for tun_chr_read_iter() path...
Re: BUG: spinlock bad magic in tun_do_read
On 05/07/2018 10:54 PM, Cong Wang wrote:
> On Mon, May 7, 2018 at 10:27 PM, syzbot wrote:
>> Hello,
>>
>> syzbot found the following crash on:
>>
>> HEAD commit:     75bc37fefc44 Linux 4.17-rc4
>> git tree:        upstream
>> console output:  https://syzkaller.appspot.com/x/log.txt?x=1162c69780
>> kernel config:   https://syzkaller.appspot.com/x/.config?x=31f4b3733894ef79
>> dashboard link:  https://syzkaller.appspot.com/bug?extid=e8b902c3c3fadf0a9dba
>> compiler:        gcc (GCC) 8.0.1 20180413 (experimental)
>> userspace arch:  i386
>> syzkaller repro: https://syzkaller.appspot.com/x/repro.syz?x=172e4c9780
>>
>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> Reported-by: syzbot+e8b902c3c3fadf0a9...@syzkaller.appspotmail.com
>>
>> random: sshd: uninitialized urandom read (32 bytes read)
>> random: sshd: uninitialized urandom read (32 bytes read)
>> random: sshd: uninitialized urandom read (32 bytes read)
>> IPVS: ftp: loaded support on port[0] = 21
>> BUG: spinlock bad magic on CPU#0, syz-executor0/4586
>>  lock: 0x8801ae8928c8, .magic: , .owner: /-1, .owner_cpu: 0
>> CPU: 0 PID: 4586 Comm: syz-executor0 Not tainted 4.17.0-rc4+ #62
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
>> Call Trace:
>>  __dump_stack lib/dump_stack.c:77 [inline]
>>  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
>>  spin_dump+0x160/0x169 kernel/locking/spinlock_debug.c:67
>>  spin_bug kernel/locking/spinlock_debug.c:75 [inline]
>>  debug_spin_lock_before kernel/locking/spinlock_debug.c:83 [inline]
>>  do_raw_spin_lock.cold.3+0x37/0x3c kernel/locking/spinlock_debug.c:112
>>  __raw_spin_lock include/linux/spinlock_api_smp.h:143 [inline]
>>  _raw_spin_lock+0x32/0x40 kernel/locking/spinlock.c:144
>>  spin_lock include/linux/spinlock.h:310 [inline]
>>  ptr_ring_consume include/linux/ptr_ring.h:335 [inline]
>>  tun_ring_recv drivers/net/tun.c:2143 [inline]
>
> Yeah, we should return early before hitting this uninitialized ptr ring...
> Something like:
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index ef33950a45d9..638c87a95247 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -2128,6 +2128,9 @@ static void *tun_ring_recv(struct tun_file *tfile, int noblock, int *err)
>         void *ptr = NULL;
>         int error = 0;
>
> +       if (!tfile->tx_ring.queue)
> +               goto out;
> +
>
> Or, checking if tun is detached...

tx_ring was properly initialized when the first ptr_ring_consume() at line 2131 was attempted.

The bug happens later at line 2143, after a schedule() call at line 2155.

So a single check at the function prologue won't solve the case where the thread had to sleep and some uninit happened in the meantime.
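The ordering problem can be illustrated with a small single-threaded model. This is only a sketch under loud assumptions: the model_* names are hypothetical, the teardown is forced to happen inside the simulated sleep, and bad_accesses merely counts the accesses that would trip the bad-magic check in the real kernel. It follows the same shape as the flow described above: prologue check, first consume, sleep, second consume.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: `initialized` stands in for a valid ring and lock. */
struct model_tfile {
	bool initialized;
	void *pending;     /* single-slot "ring" */
	int bad_accesses;  /* times the ring was touched while torn down */
};

static void *model_consume(struct model_tfile *t)
{
	void *p;

	if (!t->initialized) {
		t->bad_accesses++;  /* the real code would take a bad lock here */
		return NULL;
	}
	p = t->pending;
	t->pending = NULL;
	return p;
}

/* Stand-in for schedule(): another path tears the ring down meanwhile. */
static void model_sleep_with_teardown(struct model_tfile *t)
{
	t->initialized = false;
}

static void *model_ring_recv(struct model_tfile *t, bool recheck_after_sleep)
{
	void *p;

	if (!t->initialized)           /* prologue check, as in the diff */
		return NULL;
	p = model_consume(t);          /* first attempt: ring still valid */
	if (p)
		return p;
	model_sleep_with_teardown(t);  /* state changes while we sleep */
	if (recheck_after_sleep && !t->initialized)
		return NULL;
	return model_consume(t);       /* second attempt: may be torn down */
}
```

With recheck_after_sleep set to false the prologue check passes and the second consume still touches the torn-down ring. The recheck variant avoids that in this deterministic model only; in the kernel a bare recheck would still race with the teardown unless something pins the ring's lifetime.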
Re: BUG: spinlock bad magic in tun_do_read
On Mon, May 7, 2018 at 10:27 PM, syzbot wrote:
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit:     75bc37fefc44 Linux 4.17-rc4
> git tree:        upstream
> console output:  https://syzkaller.appspot.com/x/log.txt?x=1162c69780
> kernel config:   https://syzkaller.appspot.com/x/.config?x=31f4b3733894ef79
> dashboard link:  https://syzkaller.appspot.com/bug?extid=e8b902c3c3fadf0a9dba
> compiler:        gcc (GCC) 8.0.1 20180413 (experimental)
> userspace arch:  i386
> syzkaller repro: https://syzkaller.appspot.com/x/repro.syz?x=172e4c9780
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+e8b902c3c3fadf0a9...@syzkaller.appspotmail.com
>
> random: sshd: uninitialized urandom read (32 bytes read)
> random: sshd: uninitialized urandom read (32 bytes read)
> random: sshd: uninitialized urandom read (32 bytes read)
> IPVS: ftp: loaded support on port[0] = 21
> BUG: spinlock bad magic on CPU#0, syz-executor0/4586
>  lock: 0x8801ae8928c8, .magic: , .owner: /-1, .owner_cpu: 0
> CPU: 0 PID: 4586 Comm: syz-executor0 Not tainted 4.17.0-rc4+ #62
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
>  spin_dump+0x160/0x169 kernel/locking/spinlock_debug.c:67
>  spin_bug kernel/locking/spinlock_debug.c:75 [inline]
>  debug_spin_lock_before kernel/locking/spinlock_debug.c:83 [inline]
>  do_raw_spin_lock.cold.3+0x37/0x3c kernel/locking/spinlock_debug.c:112
>  __raw_spin_lock include/linux/spinlock_api_smp.h:143 [inline]
>  _raw_spin_lock+0x32/0x40 kernel/locking/spinlock.c:144
>  spin_lock include/linux/spinlock.h:310 [inline]
>  ptr_ring_consume include/linux/ptr_ring.h:335 [inline]
>  tun_ring_recv drivers/net/tun.c:2143 [inline]

Yeah, we should return early before hitting this uninitialized ptr ring...
Something like:

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index ef33950a45d9..638c87a95247 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2128,6 +2128,9 @@ static void *tun_ring_recv(struct tun_file *tfile, int noblock, int *err)
        void *ptr = NULL;
        int error = 0;

+       if (!tfile->tx_ring.queue)
+               goto out;
+

Or, checking if tun is detached...

>  tun_do_read+0x18b1/0x29f0 drivers/net/tun.c:2182
>  tun_chr_read_iter+0xe5/0x1e0 drivers/net/tun.c:2214
>  call_read_iter include/linux/fs.h:1778 [inline]
>  new_sync_read fs/read_write.c:406 [inline]
>  __vfs_read+0x696/0xa50 fs/read_write.c:418
>  vfs_read+0x17f/0x3d0 fs/read_write.c:452
>  ksys_pread64+0x174/0x1a0 fs/read_write.c:626
>  __do_compat_sys_x86_pread arch/x86/ia32/sys_ia32.c:177 [inline]
>  __se_compat_sys_x86_pread arch/x86/ia32/sys_ia32.c:174 [inline]
>  __ia32_compat_sys_x86_pread+0xc4/0x130 arch/x86/ia32/sys_ia32.c:174
>  do_syscall_32_irqs_on arch/x86/entry/common.c:323 [inline]
>  do_fast_syscall_32+0x345/0xf9b arch/x86/entry/common.c:394
>  entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139
> RIP: 0023:0xf7fc0cb9
> RSP: 002b:f7fbc0ac EFLAGS: 0282 ORIG_RAX: 00b4
> RAX: ffda RBX: 0003 RCX: 2080
> RDX: 006e RSI: RDI:
> RBP: R08: R09:
> R10: R11: 0292 R12:
> R13: R14: R15:
>
>
> ---
> This bug is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkal...@googlegroups.com.
>
> syzbot will keep track of this bug report.
> If you forgot to add the Reported-by tag, once the fix for this bug is merged
> into any tree, please reply to this email with:
> #syz fix: exact-commit-title
> If you want to test a patch for this bug, please reply with:
> #syz test: git://repo/address.git branch
> and provide the patch inline or as an attachment.
> To mark this as a duplicate of another syzbot report, please reply with:
> #syz dup: exact-subject-of-another-report
> If it's a one-off invalid bug report, please reply with:
> #syz invalid
> Note: if the crash happens again, it will cause creation of a new bug report.
> Note: all commands must start from beginning of the line in the email body.
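For readers unfamiliar with the "bad magic" message in the report: with spinlock debugging enabled, the lock carries a magic field that is compared against a known constant before the lock is taken. A minimal userspace sketch of that check follows. The model_* names are hypothetical, and the constant 0xdead4ead is what I recall from the kernel's spinlock headers, so treat it as an assumption rather than a quote from the source.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Assumed to match the kernel's debug constant; used here for illustration. */
#define MODEL_SPINLOCK_MAGIC 0xdead4eadu

/* Hypothetical userspace stand-in for a debug-enabled spinlock. */
struct model_spinlock {
	unsigned int magic;  /* set once by the init function */
	int owner_cpu;       /* -1 when nobody holds the lock */
	bool locked;
};

static void model_spin_lock_init(struct model_spinlock *l)
{
	l->magic = MODEL_SPINLOCK_MAGIC;
	l->owner_cpu = -1;
	l->locked = false;
}

/* Returns false where the kernel would print "spinlock bad magic". */
static bool model_spin_lock(struct model_spinlock *l)
{
	if (l->magic != MODEL_SPINLOCK_MAGIC)
		return false;  /* lock was never initialized, or was wiped */
	l->locked = true;
	return true;
}
```

A zeroed or never-initialized lock fails the comparison, which is consistent with the empty .magic field in the report above and with the ring's lock being used before init or after teardown.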