Re: net-next boot error
On Thu, Jul 26, 2018 at 10:17:48AM -0400, Steven Rostedt wrote:
>
> [ Added Thomas Gleixner ]
>
>
> On Thu, 26 Jul 2018 11:34:39 +0200
> Dmitry Vyukov wrote:
>
> > On Thu, Jul 26, 2018 at 11:29 AM, syzbot
> > wrote:
> > > Hello,
> > >
> > > syzbot found the following crash on:
> > >
> > > HEAD commit:    dc66fe43b7eb rds: send: Fix dead code in rds_sendmsg
> > > git tree:       net-next
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=127874c840
> > > kernel config:  https://syzkaller.appspot.com/x/.config?x=f34ce142a9f5f0e8
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=604f8271211546f5b3c7
> > > compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
> > >
> > > Unfortunately, I don't have any reproducer for this crash yet.
> > >
> > > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > > Reported-by: syzbot+604f8271211546f5b...@syzkaller.appspotmail.com
> > >
> > > possible deadlock in static_key_slow_inc
> > > sd 0:0:1:0: [sda] Attached SCSI disk
> > > MACsec IEEE 802.1AE
> > > tun: Universal TUN/TAP device driver, 1.6
> > >
> > >
> > > WARNING: possible recursive locking detected
> >
> > +Tetsuo, perhaps this boot lockdep problem then disables lockdep for
> > actual testing. I think lockdep should respect panic_on_warn.
> >
> >
> > > 4.18.0-rc6+ #141 Not tainted
> > >
> > > swapper/0/1 is trying to acquire lock:
> > > (ptrval) (cpu_hotplug_lock.rw_sem){}, at:
> > > static_key_slow_inc+0x12/0x30 kernel/jump_label.c:124
> > >
> > > but task is already holding lock:
> > > (ptrval) (cpu_hotplug_lock.rw_sem){}, at: get_online_cpus
> > > include/linux/cpu.h:126 [inline]
> > > (ptrval) (cpu_hotplug_lock.rw_sem){}, at:
> > > init_vqs+0xe1a/0x1520 drivers/net/virtio_net.c:2777
>
> Here init_vqs() does:
>
>         get_online_cpus();
>         virtnet_set_affinity(vi);
>         put_online_cpus();
>
> Which disables cpu hotplug and calls virtnet_set_affinity()
>
> Note, get_online_cpus() is no longer recursive.
>
> > >
> > > other info that might help us debug this:
> > >  Possible unsafe locking scenario:
> > >
> > >        CPU0
> > >        ----
> > >   lock(cpu_hotplug_lock.rw_sem);
> > >   lock(cpu_hotplug_lock.rw_sem);
> > >
> > >  *** DEADLOCK ***
> > >
> > >  May be due to missing lock nesting notation
> > >
> > > 3 locks held by swapper/0/1:
> > > #0: (ptrval) (&dev->mutex){}, at: device_lock
> > > include/linux/device.h:1134 [inline]
> > > #0: (ptrval) (&dev->mutex){}, at: __driver_attach+0x15f/0x2f0
> > > drivers/base/dd.c:820
> > > #1: (ptrval) (cpu_hotplug_lock.rw_sem){}, at: get_online_cpus
> > > include/linux/cpu.h:126 [inline]
> > > #1: (ptrval) (cpu_hotplug_lock.rw_sem){}, at:
> > > init_vqs+0xe1a/0x1520 drivers/net/virtio_net.c:2777
> > > #2: (ptrval) (xps_map_mutex){+.+.}, at:
> > > __netif_set_xps_queue+0x243/0x23f0 net/core/dev.c:2278
> > >
> > > stack backtrace:
> > > CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.18.0-rc6+ #141
> > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > > Google 01/01/2011
> > > Call Trace:
> > > __dump_stack lib/dump_stack.c:77 [inline]
> > > dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
> > > print_deadlock_bug kernel/locking/lockdep.c:1765 [inline]
> > > check_deadlock kernel/locking/lockdep.c:1809 [inline]
> > > validate_chain kernel/locking/lockdep.c:2405 [inline]
> > > __lock_acquire.cold.65+0x1fb/0x486 kernel/locking/lockdep.c:3435
> > > lock_acquire+0x1e4/0x540 kernel/locking/lockdep.c:3924
> > > percpu_down_read_preempt_disable include/linux/percpu-rwsem.h:36 [inline]
> > > percpu_down_read include/linux/percpu-rwsem.h:59 [inline]
> > > cpus_read_lock+0x43/0xa0 kernel/cpu.c:289
> > > static_key_slow_inc+0x12/0x30 kernel/jump_label.c:124
> > > __netif_set_xps_queue+0xaac/0x23f0 net/core/dev.c:2320
>
> __netif_set_xps_queue() calls static_key_slow_inc() which will also do
> a get_online_cpus() which will trigger this bug.
>
> There's a static_key_slow_inc_cpuslocked() version that should be used
> when get_online_cpus() is already taken, but I see
> __netif_set_xps_queue() is called from several places, and I doubt it
> is always called with get_online_cpus() held. Thus just using the
> cpuslocked() version is probably not sufficient of a fix.
>
> I don't know the code enough to offer other suggestions.
>
> -- Steve

OK so the guess is it's due to combination of

commit 04157469b7b848f4a9978b63b1ea2ce62ad3a0a3
Author: Amritha Nambiar
Date:   Fri Jun 29 21:26:46 2018 -0700

    net: Use static_key for XPS maps

which uses static_key_slow_inc and

commit 8af2c06ff4b144064b51b7f688194474123d9c9c
Author: Amritha Nambiar
Date:   Fri Jun 29 21:27:07 2018 -0700

    net-sysfs: Add interface for Rx queue(s) map per Tx queue

which makes it all
Re: net-next boot error
[ Added Thomas Gleixner ]

On Thu, 26 Jul 2018 11:34:39 +0200
Dmitry Vyukov wrote:

> On Thu, Jul 26, 2018 at 11:29 AM, syzbot
> wrote:
> > Hello,
> >
> > syzbot found the following crash on:
> >
> > HEAD commit:    dc66fe43b7eb rds: send: Fix dead code in rds_sendmsg
> > git tree:       net-next
> > console output: https://syzkaller.appspot.com/x/log.txt?x=127874c840
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=f34ce142a9f5f0e8
> > dashboard link: https://syzkaller.appspot.com/bug?extid=604f8271211546f5b3c7
> > compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
> >
> > Unfortunately, I don't have any reproducer for this crash yet.
> >
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: syzbot+604f8271211546f5b...@syzkaller.appspotmail.com
> >
> > possible deadlock in static_key_slow_inc
> > sd 0:0:1:0: [sda] Attached SCSI disk
> > MACsec IEEE 802.1AE
> > tun: Universal TUN/TAP device driver, 1.6
> >
> >
> > WARNING: possible recursive locking detected
>
> +Tetsuo, perhaps this boot lockdep problem then disables lockdep for
> actual testing. I think lockdep should respect panic_on_warn.
>
> > 4.18.0-rc6+ #141 Not tainted
> >
> > swapper/0/1 is trying to acquire lock:
> > (ptrval) (cpu_hotplug_lock.rw_sem){}, at:
> > static_key_slow_inc+0x12/0x30 kernel/jump_label.c:124
> >
> > but task is already holding lock:
> > (ptrval) (cpu_hotplug_lock.rw_sem){}, at: get_online_cpus
> > include/linux/cpu.h:126 [inline]
> > (ptrval) (cpu_hotplug_lock.rw_sem){}, at: init_vqs+0xe1a/0x1520
> > drivers/net/virtio_net.c:2777

Here init_vqs() does:

        get_online_cpus();
        virtnet_set_affinity(vi);
        put_online_cpus();

Which disables cpu hotplug and calls virtnet_set_affinity()

Note, get_online_cpus() is no longer recursive.
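To see why a nested get_online_cpus() is a problem even though it usually works, here is a minimal single-threaded model in C. It assumes a writer-preferring read/write lock, which is roughly how the percpu_rw_semaphore behind cpu_hotplug_lock behaves once a CPU-hotplug writer is waiting: new readers queue behind the writer, and the writer waits for existing readers to drain. The struct and function names (model_rwsem, model_down_read() and so on) are hypothetical; the model returns "would wait" instead of blocking so the cycle can be checked deterministically.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical single-threaded model of a writer-preferring read/write
 * lock (roughly how cpu_hotplug_lock behaves once a hotplug writer is
 * waiting). Nothing blocks: each acquire returns false when the caller
 * would have to wait, so the deadlock can be checked deterministically.
 */
struct model_rwsem {
	int readers;         /* read-side holds currently outstanding */
	bool writer_pending; /* a writer is waiting for readers to drain */
	bool writer_active;  /* a writer owns the lock */
};

/* Returns true if the read lock is granted, false if the reader would wait. */
static bool model_down_read(struct model_rwsem *s)
{
	/* New readers queue behind a pending or active writer. */
	if (s->writer_pending || s->writer_active)
		return false;
	s->readers++;
	return true;
}

static void model_up_read(struct model_rwsem *s)
{
	s->readers--;
}

/* Returns true if the write lock is granted; otherwise marks the writer pending. */
static bool model_down_write(struct model_rwsem *s)
{
	if (s->readers > 0) {
		s->writer_pending = true;
		return false;
	}
	s->writer_pending = false;
	s->writer_active = true;
	return true;
}

/*
 * init_vqs() holds the read side, a hotplug writer arrives, then
 * static_key_slow_inc() asks for the read side again. Returns true when
 * the second acquire would wait on a writer that is itself waiting for
 * the first, still outstanding, hold: a deadlock.
 */
static bool nested_read_with_writer_deadlocks(void)
{
	struct model_rwsem s = { 0 };
	bool first = model_down_read(&s);   /* get_online_cpus() */
	bool writer = model_down_write(&s); /* concurrent CPU hotplug */
	bool second = model_down_read(&s);  /* cpus_read_lock() again */

	return first && !writer && !second && s.readers == 1;
}

/*
 * With no writer in between, the nested acquire happens to succeed,
 * which is how the pattern can go unnoticed in normal testing; lockdep
 * flags it from the lock pattern alone.
 */
static bool nested_read_without_writer_ok(void)
{
	struct model_rwsem s = { 0 };
	bool ok = model_down_read(&s) && model_down_read(&s);

	model_up_read(&s);
	model_up_read(&s);
	return ok && s.readers == 0;
}
```

The model deliberately leaves out per-CPU fast paths and real blocking; it only captures the ordering rule that makes a recursive read acquisition unsafe.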
> >
> > other info that might help us debug this:
> >  Possible unsafe locking scenario:
> >
> >        CPU0
> >        ----
> >   lock(cpu_hotplug_lock.rw_sem);
> >   lock(cpu_hotplug_lock.rw_sem);
> >
> >  *** DEADLOCK ***
> >
> >  May be due to missing lock nesting notation
> >
> > 3 locks held by swapper/0/1:
> > #0: (ptrval) (&dev->mutex){}, at: device_lock
> > include/linux/device.h:1134 [inline]
> > #0: (ptrval) (&dev->mutex){}, at: __driver_attach+0x15f/0x2f0
> > drivers/base/dd.c:820
> > #1: (ptrval) (cpu_hotplug_lock.rw_sem){}, at: get_online_cpus
> > include/linux/cpu.h:126 [inline]
> > #1: (ptrval) (cpu_hotplug_lock.rw_sem){}, at:
> > init_vqs+0xe1a/0x1520 drivers/net/virtio_net.c:2777
> > #2: (ptrval) (xps_map_mutex){+.+.}, at:
> > __netif_set_xps_queue+0x243/0x23f0 net/core/dev.c:2278
> >
> > stack backtrace:
> > CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.18.0-rc6+ #141
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > Google 01/01/2011
> > Call Trace:
> > __dump_stack lib/dump_stack.c:77 [inline]
> > dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
> > print_deadlock_bug kernel/locking/lockdep.c:1765 [inline]
> > check_deadlock kernel/locking/lockdep.c:1809 [inline]
> > validate_chain kernel/locking/lockdep.c:2405 [inline]
> > __lock_acquire.cold.65+0x1fb/0x486 kernel/locking/lockdep.c:3435
> > lock_acquire+0x1e4/0x540 kernel/locking/lockdep.c:3924
> > percpu_down_read_preempt_disable include/linux/percpu-rwsem.h:36 [inline]
> > percpu_down_read include/linux/percpu-rwsem.h:59 [inline]
> > cpus_read_lock+0x43/0xa0 kernel/cpu.c:289
> > static_key_slow_inc+0x12/0x30 kernel/jump_label.c:124
> > __netif_set_xps_queue+0xaac/0x23f0 net/core/dev.c:2320

__netif_set_xps_queue() calls static_key_slow_inc() which will also do
a get_online_cpus() which will trigger this bug.
There's a static_key_slow_inc_cpuslocked() version that should be used
when get_online_cpus() is already taken, but I see
__netif_set_xps_queue() is called from several places, and I doubt it
is always called with get_online_cpus() held. Thus just using the
cpuslocked() version is probably not sufficient of a fix.

I don't know the code enough to offer other suggestions.

-- Steve

> > netif_set_xps_queue+0x26/0x30 net/core/dev.c:2455
> > virtnet_set_affinity+0x2ba/0x4b0 drivers/net/virtio_net.c:1944
> > init_vqs+0xe22/0x1520 drivers/net/virtio_net.c:2778
> > virtnet_probe+0x1092/0x2260 drivers/net/virtio_net.c:3016
> > virtio_dev_probe+0x592/0x942 drivers/virtio/virtio.c:245
> > really_probe drivers/base/dd.c:446 [inline]
> > driver_probe_device+0x6ad/0x970 drivers/base/dd.c:588
> > __driver_attach+0x28b/0x2f0 drivers/base/dd.c:822
> > bus_for_each_dev+0x15d/0x1f0 drivers/base/bus.c:311
> > driver_attach+0x3d/0x50 drivers/base/dd.c:841
> > bus_add_driver+0x4b2/0x600 drivers/base/bus.c:667
> > driver_register+0x1c8/0x320 drivers/base/driver.c:170
> > register_virtio_driver+0x79/0xd0
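One way to address the several-callers concern is to split the XPS update into a variant that assumes the hotplug lock is already held and a wrapper that takes it itself. The sketch below is a hypothetical userspace model, not the fix that was merged: model_set_xps_cpuslocked() and model_set_xps() stand in for a lock-held helper and a self-locking entry point, the depth counter stands in for cpu_hotplug_lock, and recursion_bugs plays the role of lockdep's report. It replays the init_vqs() pattern from the report against both layouts.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model, not kernel code. "depth" stands in for
 * the hotplug read lock; a nested acquire is counted as a bug, the way
 * lockdep reports "possible recursive locking detected".
 */
struct model_state {
	int depth;          /* current holds of the modeled hotplug lock */
	int recursion_bugs; /* nested acquisitions lockdep would flag */
	int key_count;      /* modeled static key reference count */
};

static void model_cpus_read_lock(struct model_state *st)
{
	if (st->depth > 0)
		st->recursion_bugs++;
	st->depth++;
}

static void model_cpus_read_unlock(struct model_state *st)
{
	st->depth--;
}

/* Models static_key_slow_inc(): takes the hotplug lock itself. */
static void model_key_inc(struct model_state *st)
{
	model_cpus_read_lock(st);
	st->key_count++;
	model_cpus_read_unlock(st);
}

/* Models static_key_slow_inc_cpuslocked(): the caller must hold the lock. */
static void model_key_inc_cpuslocked(struct model_state *st)
{
	assert(st->depth > 0);
	st->key_count++;
}

/* Hypothetical XPS update for callers that already hold the lock. */
static void model_set_xps_cpuslocked(struct model_state *st)
{
	model_key_inc_cpuslocked(st);
}

/* Hypothetical XPS update for callers that do not hold the lock. */
static void model_set_xps(struct model_state *st)
{
	model_cpus_read_lock(st);
	model_set_xps_cpuslocked(st);
	model_cpus_read_unlock(st);
}

/* The pattern in the report: lock held, then the self-locking increment. */
static void replay_original(struct model_state *st)
{
	model_cpus_read_lock(st);   /* get_online_cpus() in init_vqs() */
	model_key_inc(st);          /* static_key_slow_inc() in the XPS path */
	model_cpus_read_unlock(st); /* put_online_cpus() */
}

/* Split layout: each kind of caller picks the matching entry point. */
static void replay_split(struct model_state *st)
{
	model_cpus_read_lock(st);
	model_set_xps_cpuslocked(st); /* init_vqs()-style caller, lock held */
	model_cpus_read_unlock(st);
	model_set_xps(st);            /* caller without the lock */
}
```

The design choice here is to make the locking requirement explicit in the entry point's name, so a caller that already holds the lock cannot accidentally reach the self-locking increment.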
Re: net-next boot error
On Thu, Jul 26, 2018 at 11:29 AM, syzbot wrote:
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit:    dc66fe43b7eb rds: send: Fix dead code in rds_sendmsg
> git tree:       net-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=127874c840
> kernel config:  https://syzkaller.appspot.com/x/.config?x=f34ce142a9f5f0e8
> dashboard link: https://syzkaller.appspot.com/bug?extid=604f8271211546f5b3c7
> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
>
> Unfortunately, I don't have any reproducer for this crash yet.
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+604f8271211546f5b...@syzkaller.appspotmail.com
>
> possible deadlock in static_key_slow_inc
> sd 0:0:1:0: [sda] Attached SCSI disk
> MACsec IEEE 802.1AE
> tun: Universal TUN/TAP device driver, 1.6
>
>
> WARNING: possible recursive locking detected

+Tetsuo, perhaps this boot lockdep problem then disables lockdep for
actual testing. I think lockdep should respect panic_on_warn.
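The concern here is that lockdep is one-shot: after its first report it switches itself off (via the debug_locks flag), so a boot-time report like this one leaves the rest of the fuzzing run without lock checking. The following is a hypothetical model of that behavior and of the suggested alternative, where honoring panic_on_warn would stop the run at the first report instead. model_checker and its fields are illustrative names only, and the model makes no claim about how lockdep actually interacts with panic_on_warn.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a one-shot lock checker, not lockdep itself. */
struct model_checker {
	bool enabled;       /* analogous to lockdep's debug_locks */
	bool panic_on_warn; /* analogous to the panic_on_warn setting */
	int reports;        /* reports actually printed */
	bool panicked;      /* run stopped at a report */
};

/* Returns true if this bug was reported, false if checking was already off. */
static bool model_report(struct model_checker *c)
{
	if (!c->enabled)
		return false;
	c->reports++;
	c->enabled = false;       /* first report disables further checks */
	if (c->panic_on_warn)
		c->panicked = true;   /* suggested: stop the run instead */
	return true;
}

/*
 * Simulates the boot-time report from this thread followed by later_bugs
 * real bugs during fuzzing. Returns how many of those later bugs occurred
 * while the run kept going with checking silently disabled.
 */
static int model_unchecked_bugs(bool panic_on_warn, int later_bugs)
{
	struct model_checker c = { .enabled = true, .panic_on_warn = panic_on_warn };
	int unchecked = 0;

	model_report(&c);                 /* boot-time report */
	for (int i = 0; i < later_bugs; i++) {
		if (c.panicked)
			break;                    /* run already stopped */
		if (!model_report(&c))
			unchecked++;              /* bug happened, nothing reported */
	}
	return unchecked;
}
```

Under this model, without panic_on_warn every later bug goes unchecked, while with it the boot problem is surfaced immediately and no run continues with lock checking disabled.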

> 4.18.0-rc6+ #141 Not tainted
>
> swapper/0/1 is trying to acquire lock:
> (ptrval) (cpu_hotplug_lock.rw_sem){}, at:
> static_key_slow_inc+0x12/0x30 kernel/jump_label.c:124
>
> but task is already holding lock:
> (ptrval) (cpu_hotplug_lock.rw_sem){}, at: get_online_cpus
> include/linux/cpu.h:126 [inline]
> (ptrval) (cpu_hotplug_lock.rw_sem){}, at: init_vqs+0xe1a/0x1520
> drivers/net/virtio_net.c:2777
>
> other info that might help us debug this:
>  Possible unsafe locking scenario:
>
>        CPU0
>        ----
>   lock(cpu_hotplug_lock.rw_sem);
>   lock(cpu_hotplug_lock.rw_sem);
>
>  *** DEADLOCK ***
>
>  May be due to missing lock nesting notation
>
> 3 locks held by swapper/0/1:
> #0: (ptrval) (&dev->mutex){}, at: device_lock
> include/linux/device.h:1134 [inline]
> #0: (ptrval) (&dev->mutex){}, at: __driver_attach+0x15f/0x2f0
> drivers/base/dd.c:820
> #1: (ptrval) (cpu_hotplug_lock.rw_sem){}, at: get_online_cpus
> include/linux/cpu.h:126 [inline]
> #1: (ptrval) (cpu_hotplug_lock.rw_sem){}, at:
> init_vqs+0xe1a/0x1520 drivers/net/virtio_net.c:2777
> #2: (ptrval) (xps_map_mutex){+.+.}, at:
> __netif_set_xps_queue+0x243/0x23f0 net/core/dev.c:2278
>
> stack backtrace:
> CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.18.0-rc6+ #141
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Call Trace:
> __dump_stack lib/dump_stack.c:77 [inline]
> dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
> print_deadlock_bug kernel/locking/lockdep.c:1765 [inline]
> check_deadlock kernel/locking/lockdep.c:1809 [inline]
> validate_chain kernel/locking/lockdep.c:2405 [inline]
> __lock_acquire.cold.65+0x1fb/0x486 kernel/locking/lockdep.c:3435
> lock_acquire+0x1e4/0x540 kernel/locking/lockdep.c:3924
> percpu_down_read_preempt_disable include/linux/percpu-rwsem.h:36 [inline]
> percpu_down_read include/linux/percpu-rwsem.h:59 [inline]
> cpus_read_lock+0x43/0xa0 kernel/cpu.c:289
> static_key_slow_inc+0x12/0x30 kernel/jump_label.c:124
> __netif_set_xps_queue+0xaac/0x23f0 net/core/dev.c:2320
> netif_set_xps_queue+0x26/0x30 net/core/dev.c:2455
> virtnet_set_affinity+0x2ba/0x4b0 drivers/net/virtio_net.c:1944
> init_vqs+0xe22/0x1520 drivers/net/virtio_net.c:2778
> virtnet_probe+0x1092/0x2260 drivers/net/virtio_net.c:3016
> virtio_dev_probe+0x592/0x942 drivers/virtio/virtio.c:245
> really_probe drivers/base/dd.c:446 [inline]
> driver_probe_device+0x6ad/0x970 drivers/base/dd.c:588
> __driver_attach+0x28b/0x2f0 drivers/base/dd.c:822
> bus_for_each_dev+0x15d/0x1f0 drivers/base/bus.c:311
> driver_attach+0x3d/0x50 drivers/base/dd.c:841
> bus_add_driver+0x4b2/0x600 drivers/base/bus.c:667
> driver_register+0x1c8/0x320 drivers/base/driver.c:170
> register_virtio_driver+0x79/0xd0 drivers/virtio/virtio.c:296
> virtio_net_driver_init+0x8d/0xc9 drivers/net/virtio_net.c:3209
> do_one_initcall+0x127/0x913 init/main.c:884
> do_initcall_level init/main.c:952 [inline]
> do_initcalls init/main.c:960 [inline]
> do_basic_setup init/main.c:978 [inline]
> kernel_init_freeable+0x49b/0x58e init/main.c:1135
> kernel_init+0x11/0x1b3 init/main.c:1061
> ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
> vcan: Virtual CAN interface driver
> vxcan: Virtual CAN Tunnel driver
> slcan: serial line CAN interface driver
> slcan: 10 dynamic interface channels.
> CAN device driver interface
> enic: Cisco VIC Ethernet NIC Driver, ver 2.3.0.53
> e100: Intel(R) PRO/100 Network Driver, 3.5.24-k2-NAPI
> e100: Copyright(c) 1999-2006 Intel Corporation
> e1000: Intel(R) PRO/1000 Network Driver - version 7.3.21-k8-NAPI
> e1000: Copyright (c) 1999-2006 Intel Corporation.
> e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-k
> e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
> sky2: driver version 1.30
> PPP generic