Re: IPv6 issue in next-20171102 - lockdep and BUG handling RA packet.
On 11/7/17 9:31 AM, Eric Dumazet wrote: > On Mon, Nov 6, 2017 at 4:29 PM, David Ahern wrote: >> On 11/7/17 5:56 AM, valdis.kletni...@vt.edu wrote: >>> I've hit this 6 times now, across 3 boots: >>> >>> Nov 3 11:04:54 turing-police kernel: [ 547.814748] BUG: sleeping function >>> called from invalid context at mm/slab.h:422 >>> >>> Nov 3 20:24:11 turing-police kernel: [ 60.093793] BUG: sleeping function >>> called from invalid context at mm/slab.h:422 >>> Nov 4 20:20:54 turing-police kernel: [86264.366955] BUG: sleeping function >>> called from invalid context at mm/slab.h:422 >>> Nov 5 19:17:40 turing-police kernel: [172469.769179] BUG: sleeping >>> function called from invalid context at mm/slab.h:422 >>> Nov 6 06:07:37 turing-police kernel: [211467.239460] BUG: sleeping >>> function called from invalid context at mm/slab.h:422 >>> >>> Nov 6 14:12:43 turing-police kernel: [ 54.891848] BUG: sleeping function >>> called from invalid context at mm/slab.h:422 >>> >>> Something seems to be going astray while handling a RA packet. >> >> Odd. I tested RA before sending the patches and again just now - no >> traceback with an RCU / lock debugging kernel. What is sending the RA's >> in your case? I'd like to understand the config and add a test for this. > > Do you have CONFIG_DEBUG_ATOMIC_SLEEP=y in your .config ? > I needed to set sysctl -w net.ipv6.conf.eth1.use_tempaddr=1. With that I see the trace.
Re: IPv6 issue in next-20171102 - lockdep and BUG handling RA packet.
On 11/7/17 5:56 AM, valdis.kletni...@vt.edu wrote: > I've hit this 6 times now, across 3 boots: > > Nov 3 11:04:54 turing-police kernel: [ 547.814748] BUG: sleeping function > called from invalid context at mm/slab.h:422 > > Nov 3 20:24:11 turing-police kernel: [ 60.093793] BUG: sleeping function > called from invalid context at mm/slab.h:422 > Nov 4 20:20:54 turing-police kernel: [86264.366955] BUG: sleeping function > called from invalid context at mm/slab.h:422 > Nov 5 19:17:40 turing-police kernel: [172469.769179] BUG: sleeping function > called from invalid context at mm/slab.h:422 > Nov 6 06:07:37 turing-police kernel: [211467.239460] BUG: sleeping function > called from invalid context at mm/slab.h:422 > > Nov 6 14:12:43 turing-police kernel: [ 54.891848] BUG: sleeping function > called from invalid context at mm/slab.h:422 > > Something seems to be going astray while handling a RA packet. Odd. I tested RA before sending the patches and again just now - no traceback with an RCU / lock debugging kernel. What is sending the RA's in your case? I'd like to understand the config and add a test for this.
Re: IPv6 issue in next-20171102 - lockdep and BUG handling RA packet.
On 11/7/17 9:31 AM, Eric Dumazet wrote: > Do you have CONFIG_DEBUG_ATOMIC_SLEEP=y in your .config ? dsa@kenny:mgmt:~/kernel-2.git$ grep CONFIG_DEBUG_ATOMIC_SLEEP kbuild/rcu-lock-debug/.config CONFIG_DEBUG_ATOMIC_SLEEP=y Yep, that is on.
Re: IPv6 issue in next-20171102 - lockdep and BUG handling RA packet.
On Mon, Nov 6, 2017 at 4:29 PM, David Ahern wrote: > On 11/7/17 5:56 AM, valdis.kletni...@vt.edu wrote: >> I've hit this 6 times now, across 3 boots: >> >> Nov 3 11:04:54 turing-police kernel: [ 547.814748] BUG: sleeping function >> called from invalid context at mm/slab.h:422 >> >> Nov 3 20:24:11 turing-police kernel: [ 60.093793] BUG: sleeping function >> called from invalid context at mm/slab.h:422 >> Nov 4 20:20:54 turing-police kernel: [86264.366955] BUG: sleeping function >> called from invalid context at mm/slab.h:422 >> Nov 5 19:17:40 turing-police kernel: [172469.769179] BUG: sleeping function >> called from invalid context at mm/slab.h:422 >> Nov 6 06:07:37 turing-police kernel: [211467.239460] BUG: sleeping function >> called from invalid context at mm/slab.h:422 >> >> Nov 6 14:12:43 turing-police kernel: [ 54.891848] BUG: sleeping function >> called from invalid context at mm/slab.h:422 >> >> Something seems to be going astray while handling a RA packet. > > Odd. I tested RA before sending the patches and again just now - no > traceback with an RCU / lock debugging kernel. What is sending the RA's > in your case? I'd like to understand the config and add a test for this. Do you have CONFIG_DEBUG_ATOMIC_SLEEP=y in your .config ?
Re: IPv6 issue in next-20171102 - lockdep and BUG handling RA packet.
On Mon, Nov 6, 2017 at 2:04 PM, Eric Dumazet wrote: > I have a patch, will send in a couple of minutes. Thanks. https://patchwork.ozlabs.org/patch/834983/ ipv6: addrconf: fix a lockdep splat
Re: IPv6 issue in next-20171102 - lockdep and BUG handling RA packet.
On Mon, Nov 6, 2017 at 2:01 PM, Ido Schimmel wrote: > On Mon, Nov 06, 2017 at 03:56:54PM -0500, valdis.kletni...@vt.edu wrote: >> I've hit this 6 times now, across 3 boots: >> >> Nov 3 11:04:54 turing-police kernel: [ 547.814748] BUG: sleeping function >> called from invalid context at mm/slab.h:422 >> >> Nov 3 20:24:11 turing-police kernel: [ 60.093793] BUG: sleeping function >> called from invalid context at mm/slab.h:422 >> Nov 4 20:20:54 turing-police kernel: [86264.366955] BUG: sleeping function >> called from invalid context at mm/slab.h:422 >> Nov 5 19:17:40 turing-police kernel: [172469.769179] BUG: sleeping function >> called from invalid context at mm/slab.h:422 >> Nov 6 06:07:37 turing-police kernel: [211467.239460] BUG: sleeping function >> called from invalid context at mm/slab.h:422 >> >> Nov 6 14:12:43 turing-police kernel: [ 54.891848] BUG: sleeping function >> called from invalid context at mm/slab.h:422 >> >> Something seems to be going astray while handling a RA packet. >> >> Kernel dirty due to hand-patching >> https://patchwork.kernel.org/patch/10003555/ >> (signed int:1 bitfield in sched.h causing tons of warnings) >> >> Unfortunately, the previous next- kernel I built was -20170927 (which worked >> OK). >> >> Googling for things in the traceback in the last month comes up empty, and >> only >> thing in the git log for net/ipv6 that looks vaguely related: >> >> commit f3d9832e56c48e4ca50bab0457e21bcaade4536d >> Author: David Ahern >> Date: Wed Oct 18 09:56:52 2017 -0700 >> >> ipv6: addrconf: cleanup locking in ipv6_add_addr > > Probably right... > > [...] > >> [ 54.891848] BUG: sleeping function called from invalid context at >> mm/slab.h:422 >> [ 54.891855] in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/0 >> [ 54.891859] INFO: lockdep is turned off. >> [ 54.891867] CPU: 0 PID: 0 Comm: swapper/0 Tainted: GW OE >> 4.14.0-rc7-next-20171102-dirty #537 >> [ 54.891872] Hardware name: Dell Inc. Latitude E6530/07Y85M, BIOS A20 >> 05/08/2017 >> [ 54.891877] Call Trace: >> [ 54.891882] >> [ 54.891894] dump_stack+0x7b/0xe4 >> [ 54.891907] ___might_sleep+0x1b0/0x300 >> [ 54.891921] kmem_cache_alloc_trace+0x2c7/0x500 >> [ 54.891931] ? cyc2ns_read_end+0x1e/0x30 >> [ 54.891944] ipv6_add_addr+0x15a/0xc30 >> [ 54.891977] ? ipv6_create_tempaddr+0x2ea/0x5d0 >> [ 54.891986] ipv6_create_tempaddr+0x2ea/0x5d0 > > This function is called from softirq so we should call ipv6_add_addr() > with 'can_block' set to 'false'. > > DavidA, it's already late here so you can probably fix this before I > take a closer look tomorrow morning. :) I have a patch, will send in a couple of minutes. Thanks.
Re: IPv6 issue in next-20171102 - lockdep and BUG handling RA packet.
On Mon, Nov 06, 2017 at 03:56:54PM -0500, valdis.kletni...@vt.edu wrote: > I've hit this 6 times now, across 3 boots: > > Nov 3 11:04:54 turing-police kernel: [ 547.814748] BUG: sleeping function > called from invalid context at mm/slab.h:422 > > Nov 3 20:24:11 turing-police kernel: [ 60.093793] BUG: sleeping function > called from invalid context at mm/slab.h:422 > Nov 4 20:20:54 turing-police kernel: [86264.366955] BUG: sleeping function > called from invalid context at mm/slab.h:422 > Nov 5 19:17:40 turing-police kernel: [172469.769179] BUG: sleeping function > called from invalid context at mm/slab.h:422 > Nov 6 06:07:37 turing-police kernel: [211467.239460] BUG: sleeping function > called from invalid context at mm/slab.h:422 > > Nov 6 14:12:43 turing-police kernel: [ 54.891848] BUG: sleeping function > called from invalid context at mm/slab.h:422 > > Something seems to be going astray while handling a RA packet. > > Kernel dirty due to hand-patching https://patchwork.kernel.org/patch/10003555/ > (signed int:1 bitfield in sched.h causing tons of warnings) > > Unfortunately, the previous next- kernel I built was -20170927 (which worked > OK). > > Googling for things in the traceback in the last month comes up empty, and > only > thing in the git log for net/ipv6 that looks vaguely related: > > commit f3d9832e56c48e4ca50bab0457e21bcaade4536d > Author: David Ahern > Date: Wed Oct 18 09:56:52 2017 -0700 > > ipv6: addrconf: cleanup locking in ipv6_add_addr Probably right... [...] > [ 54.891848] BUG: sleeping function called from invalid context at > mm/slab.h:422 > [ 54.891855] in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/0 > [ 54.891859] INFO: lockdep is turned off. > [ 54.891867] CPU: 0 PID: 0 Comm: swapper/0 Tainted: GW OE > 4.14.0-rc7-next-20171102-dirty #537 > [ 54.891872] Hardware name: Dell Inc. Latitude E6530/07Y85M, BIOS A20 > 05/08/2017 > [ 54.891877] Call Trace: > [ 54.891882] > [ 54.891894] dump_stack+0x7b/0xe4 > [ 54.891907] ___might_sleep+0x1b0/0x300 > [ 54.891921] kmem_cache_alloc_trace+0x2c7/0x500 > [ 54.891931] ? cyc2ns_read_end+0x1e/0x30 > [ 54.891944] ipv6_add_addr+0x15a/0xc30 > [ 54.891977] ? ipv6_create_tempaddr+0x2ea/0x5d0 > [ 54.891986] ipv6_create_tempaddr+0x2ea/0x5d0 This function is called from softirq so we should call ipv6_add_addr() with 'can_block' set to 'false'. DavidA, it's already late here so you can probably fix this before I take a closer look tomorrow morning. :) Thanks > [ 54.892007] ? manage_tempaddrs+0x195/0x220 > [ 54.892023] ? addrconf_prefix_rcv_add_addr+0x1c0/0x4f0 > [ 54.892032] addrconf_prefix_rcv_add_addr+0x1c0/0x4f0 > [ 54.892052] addrconf_prefix_rcv+0x2e5/0x9b0 > [ 54.892068] ? neigh_update+0x446/0xb90 > [ 54.892098] ? ndisc_router_discovery+0x5ab/0xf00 > [ 54.892107] ndisc_router_discovery+0x5ab/0xf00 > [ 54.892120] ? retint_kernel+0x2d/0x2d > [ 54.892152] ndisc_rcv+0x1b6/0x270 > [ 54.892167] icmpv6_rcv+0x6aa/0x9f0 > [ 54.892175] ? ipv6_chk_mcast_addr+0x176/0x530 > [ 54.892183] ? do_csum+0x17b/0x260 > [ 54.892197] ip6_input_finish+0x194/0xb20 > [ 54.892216] ip6_input+0x5b/0x2c0 > [ 54.892228] ? ip6_rcv_finish+0x320/0x320 > [ 54.892244] ip6_mc_input+0x15a/0x250 > [ 54.892252] ipv6_rcv+0x772/0x1050 > [ 54.892261] ? consume_skb+0xbe/0x2d0 > [ 54.892275] ? ip6_make_skb+0x2a0/0x2a0 > [ 54.892283] ? ip6_input+0x2c0/0x2c0 > [ 54.892298] __netif_receive_skb_core+0xa0f/0x1600 > [ 54.892316] ? process_backlog+0xac/0x400 > [ 54.892323] process_backlog+0xfa/0x400 > [ 54.892334] ? net_rx_action+0x145/0x1130 > [ 54.892348] net_rx_action+0x310/0x1130 > [ 54.892407] ? RTUSBBulkReceive+0x11d/0x190 [mt7610u_sta] > [ 54.892430] __do_softirq+0x140/0xaba > [ 54.892455] irq_exit+0x10b/0x160 > [ 54.892464] do_IRQ+0xbb/0x1b0 > [ 54.892476] common_interrupt+0x93/0x93 > [ 54.892481] > [ 54.892490] RIP: 0010:cpuidle_enter_state+0xe6/0x5f0 > [ 54.892496] RSP: 0018:9ca03e60 EFLAGS: 0282 ORIG_RAX: > ffda > [ 54.892508] RAX: 0011b081 RBX: c01bff617430 RCX: > > [ 54.892515] RDX: RSI: 0001 RDI: > 9ca2e580 > [ 54.892521] RBP: 000cbf23c844 R08: 0b84 R09: > > [ 54.892528] R10: R11: R12: > 0005 > [ 54.892534] R13: 000cbf23c844 R14: R15: > 000cbf0f0e82 > [ 54.892574] do_idle+0x191/0x3a0 > [ 54.892587] cpu_startup_entry+0x7b/0x90 > [ 54.892599] start_kernel+0x80b/0x841 > [ 54.892613] secondary_startup_64+0xa5/0xb0
IPv6 issue in next-20171102 - lockdep and BUG handling RA packet.
I've hit this 6 times now, across 3 boots: Nov 3 11:04:54 turing-police kernel: [ 547.814748] BUG: sleeping function called from invalid context at mm/slab.h:422 Nov 3 20:24:11 turing-police kernel: [ 60.093793] BUG: sleeping function called from invalid context at mm/slab.h:422 Nov 4 20:20:54 turing-police kernel: [86264.366955] BUG: sleeping function called from invalid context at mm/slab.h:422 Nov 5 19:17:40 turing-police kernel: [172469.769179] BUG: sleeping function called from invalid context at mm/slab.h:422 Nov 6 06:07:37 turing-police kernel: [211467.239460] BUG: sleeping function called from invalid context at mm/slab.h:422 Nov 6 14:12:43 turing-police kernel: [ 54.891848] BUG: sleeping function called from invalid context at mm/slab.h:422 Something seems to be going astray while handling a RA packet. Kernel dirty due to hand-patching https://patchwork.kernel.org/patch/10003555/ (signed int:1 bitfield in sched.h causing tons of warnings) Unfortunately, the previous next- kernel I built was -20170927 (which worked OK). Googling for things in the traceback in the last month comes up empty, and only thing in the git log for net/ipv6 that looks vaguely related: commit f3d9832e56c48e4ca50bab0457e21bcaade4536d Author: David Ahern Date: Wed Oct 18 09:56:52 2017 -0700 ipv6: addrconf: cleanup locking in ipv6_add_addr Before I go bisecting, this ring any bells? [ 54.750340] [ 54.754060] WARNING: inconsistent lock state [ 54.757758] 4.14.0-rc7-next-20171102-dirty #537 Tainted: GW OE [ 54.761488] [ 54.765143] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. [ 54.768954] swapper/0/0 [HC0[0]:SC1[1]:HE1:SE0] takes: [ 54.772821] (fs_reclaim){+.?.}, at: [] fs_reclaim_acquire.part.60+0x5/0x30 [ 54.776762] {SOFTIRQ-ON-W} state was registered at: [ 54.780407] fs_reclaim_acquire.part.60+0x29/0x30 [ 54.784056] __kmalloc+0x71/0x540 [ 54.787666] smp_store_boot_cpu_info+0xfd/0x169 [ 54.791489] native_smp_prepare_cpus+0x155/0x7fc [ 54.795312] kernel_init_freeable+0x1f4/0x614 [ 54.799130] kernel_init+0xb/0x120 [ 54.802927] ret_from_fork+0x27/0x40 [ 54.806716] irq event stamp: 1159488 [ 54.810481] hardirqs last enabled at (1159488): [] __local_bh_enable_ip+0xae/0x150 [ 54.814186] hardirqs last disabled at (1159487): [] __local_bh_enable_ip+0x64/0x150 [ 54.830096] softirqs last enabled at (1159300): [] irq_enter+0x8c/0xd0 [ 54.833949] softirqs last disabled at (1159301): [] irq_exit+0x10b/0x160 [ 54.837745] other info that might help us debug this: [ 54.845446] Possible unsafe locking scenario: [ 54.853164]CPU0 [ 54.856967] [ 54.860759] lock(fs_reclaim); [ 54.864555] [ 54.868315] lock(fs_reclaim); [ 54.872096] *** DEADLOCK *** [ 54.883265] 2 locks held by swapper/0/0: [ 54.887028] #0: (rcu_read_lock){}, at: [] process_backlog+0xac/0x400 [ 54.891014] #1: (rcu_read_lock){}, at: [] ip6_input_finish+0x5/0xb20 [ 54.891030] stack backtrace: [ 54.891040] CPU: 0 PID: 0 Comm: swapper/0 Tainted: GW OE 4.14.0-rc7-next-20171102-dirty #537 [ 54.891043] Hardware name: Dell Inc. Latitude E6530/07Y85M, BIOS A20 05/08/2017 [ 54.891045] Call Trace: [ 54.891051] [ 54.891060] dump_stack+0x7b/0xe4 [ 54.891070] print_usage_bug+0x267/0x320 [ 54.891081] mark_lock+0x5f9/0x7f0 [ 54.891087] ? check_usage_backwards+0x160/0x160 [ 54.891095] ? sched_clock_cpu+0x18/0x1d0 [ 54.891101] ? sched_clock_cpu+0x18/0x1d0 [ 54.89] __lock_acquire+0x628/0x1ca0 [ 54.891121] ? sched_clock_cpu+0x18/0x1d0 [ 54.891126] ? sched_clock_cpu+0x18/0x1d0 [ 54.891135] ? __lock_acquire+0x2e3/0x1ca0 [ 54.891146] lock_acquire+0xb3/0x2f0 [ 54.891153] ? fs_reclaim_acquire.part.60+0x5/0x30 [ 54.891165] fs_reclaim_acquire.part.60+0x29/0x30 [ 54.891170] ? fs_reclaim_acquire.part.60+0x5/0x30 [ 54.891178] kmem_cache_alloc_trace+0x3f/0x500 [ 54.891186] ? cyc2ns_read_end+0x1e/0x30 [ 54.891196] ipv6_add_addr+0x15a/0xc30 [ 54.891217] ? ipv6_create_tempaddr+0x2ea/0x5d0 [ 54.891223] ipv6_create_tempaddr+0x2ea/0x5d0 [ 54.891238] ? manage_tempaddrs+0x195/0x220 [ 54.891249] ? addrconf_prefix_rcv_add_addr+0x1c0/0x4f0 [ 54.891255] addrconf_prefix_rcv_add_addr+0x1c0/0x4f0 [ 54.891268] addrconf_prefix_rcv+0x2e5/0x9b0 [ 54.891279] ? neigh_update+0x446/0xb90 [ 54.891298] ? ndisc_router_discovery+0x5ab/0xf00 [ 54.891303] ndisc_router_discovery+0x5ab/0xf00 [ 54.891311] ? retint_kernel+0x2d/0x2d [ 54.891331] ndisc_rcv+0x1b6/0x270 [ 54.891340] icmpv6_rcv+0x6aa/0x9f0 [ 54.891345] ? ipv6_chk_mcast_addr+0x176/0x530 [ 54.891351] ? do_csum+0x17b/0x260 [ 54.891360] ip6_input_finish+0x194/0xb20 [ 54.891372] ip6_input+0x5b/0x2c0 [ 54.891380] ? ip6_rcv_finish+0x320/0x320 [ 54.891389] ip6_mc_input+0x