Re: Oops with commit 6d18c73 bridge: start hello_timer when enabling KERNEL_STP in br_stp_start
On 01/06/17 17:16, Nikolay Aleksandrov wrote: > On 01/06/17 17:00, Nikolay Aleksandrov wrote: >> On 01/06/17 15:34, Sebastian Ott wrote: >>> On Thu, 1 Jun 2017, Xin Long wrote: On Thu, Jun 1, 2017 at 12:32 AM, Sebastian Ott wrote: > [...] I couldn't see any bridge-related thing here, and it couldn't be reproduced with virbr0 (stp=1) on my box (on both s390x and x86_64), I guess there is something else in you machine. With the latest upstream kernel, can you remove libvirt (virbr0) and boot your machine normally, then: # brctl addbr br0 # ip link set br0 up # brctl stp br0 on to check if it will still hang. >>> >>> Nope. That doesn't hang. >>> >>> If it can't be reproduced in this way, pls add this on your kernel: --- a/net/bridge/br_stp_if.c +++ b/net/bridge/br_stp_if.c @@ -178,9 +178,11 @@ static void br_stp_start(struct net_bridge *br) br->stp_enabled = BR_KERNEL_STP; br_debug(br, "using kernel STP\n"); + WARN_ON(1); /* To start timers on any ports left in blocking */ mod_timer(&br->hello_timer, jiffies + br->hello_time); br_port_state_selection(br); + pr_warn("hello timer start done\n"); } spin_unlock_bh(&br->lock); diff --git a/net/bridge/br_stp_timer.c b/net/bridge/br_stp_timer.c index 60b6fe2..c98b3e5 100644 --- a/net/bridge/br_stp_timer.c +++ b/net/bridge/br_stp_timer.c @@ -40,7 +40,7 @@ static void br_hello_timer_expired(unsigned long arg) if (br->dev->flags & IFF_UP) { br_config_bpdu_generation(br); - if (br->stp_enabled == BR_KERNEL_STP) + if (br->stp_enabled != BR_USER_STP) mod_timer(&br->hello_timer, round_jiffies(jiffies + br->hello_time)); let's see if it hangs when starting the timer. Thanks. >>> >>> No hang either: >>> >> [snip] >> Could you please try the patch below ? >> >> --- >> >> diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c >> index 4efd5d54498a..89110319ef0f 100644 >> --- a/net/bridge/br_stp_if.c >> +++ b/net/bridge/br_stp_if.c >> @@ -173,7 +173,8 @@ static void br_stp_start(struct net_bridge *br) >> br_debug(br, "using kernel STP\n"); >> >> /* To start timers on any ports left in blocking */ >> -mod_timer(&br->hello_timer, jiffies + br->hello_time); >> +if (br->dev->flags & IFF_UP) >> +mod_timer(&br->hello_timer, jiffies + br->hello_time); >> br_port_state_selection(br); >> } >> >> > > Ah nevermind, this patch reverts it back to the previous state. > Okay, I saw the problem and can reliably reproduce it. I will send a fix for testing in a few minutes. I think the issue is that the timer can be started before the bridge even goes up, i.e. create bridge -> brctl stp br0 on -> ip l del br0 so the del_timer_sync() doesn't get executed and thus it's still armed. $ while :; do ip l add br0 type bridge hello_time 100; brctl stp br0 on; ip l del br0; done;
Re: Oops with commit 6d18c73 bridge: start hello_timer when enabling KERNEL_STP in br_stp_start
On 01/06/17 17:00, Nikolay Aleksandrov wrote: > On 01/06/17 15:34, Sebastian Ott wrote: >> On Thu, 1 Jun 2017, Xin Long wrote: >>> On Thu, Jun 1, 2017 at 12:32 AM, Sebastian Ott >>> wrote: [...] >>> I couldn't see any bridge-related thing here, and it couldn't be reproduced >>> with virbr0 (stp=1) on my box (on both s390x and x86_64), I guess there >>> is something else in you machine. >>> >>> With the latest upstream kernel, can you remove libvirt (virbr0) and boot >>> your >>> machine normally, then: >>> # brctl addbr br0 >>> # ip link set br0 up >>> # brctl stp br0 on >>> >>> to check if it will still hang. >> >> Nope. That doesn't hang. >> >> >>> If it can't be reproduced in this way, pls add this on your kernel: >>> >>> --- a/net/bridge/br_stp_if.c >>> +++ b/net/bridge/br_stp_if.c >>> @@ -178,9 +178,11 @@ static void br_stp_start(struct net_bridge *br) >>> br->stp_enabled = BR_KERNEL_STP; >>> br_debug(br, "using kernel STP\n"); >>> >>> + WARN_ON(1); >>> /* To start timers on any ports left in blocking */ >>> mod_timer(&br->hello_timer, jiffies + br->hello_time); >>> br_port_state_selection(br); >>> + pr_warn("hello timer start done\n"); >>> } >>> >>> spin_unlock_bh(&br->lock); >>> diff --git a/net/bridge/br_stp_timer.c b/net/bridge/br_stp_timer.c >>> index 60b6fe2..c98b3e5 100644 >>> --- a/net/bridge/br_stp_timer.c >>> +++ b/net/bridge/br_stp_timer.c >>> @@ -40,7 +40,7 @@ static void br_hello_timer_expired(unsigned long arg) >>> if (br->dev->flags & IFF_UP) { >>> br_config_bpdu_generation(br); >>> >>> - if (br->stp_enabled == BR_KERNEL_STP) >>> + if (br->stp_enabled != BR_USER_STP) >>> mod_timer(&br->hello_timer, >>> round_jiffies(jiffies + br->hello_time)); >>> >>> >>> let's see if it hangs when starting the timer. Thanks. >> >> No hang either: >> > [snip] > Could you please try the patch below ? > > --- > > diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c > index 4efd5d54498a..89110319ef0f 100644 > --- a/net/bridge/br_stp_if.c > +++ b/net/bridge/br_stp_if.c > @@ -173,7 +173,8 @@ static void br_stp_start(struct net_bridge *br) > br_debug(br, "using kernel STP\n"); > > /* To start timers on any ports left in blocking */ > - mod_timer(&br->hello_timer, jiffies + br->hello_time); > + if (br->dev->flags & IFF_UP) > + mod_timer(&br->hello_timer, jiffies + br->hello_time); > br_port_state_selection(br); > } > > Ah nevermind, this patch reverts it back to the previous state.
Re: Oops with commit 6d18c73 bridge: start hello_timer when enabling KERNEL_STP in br_stp_start
On 01/06/17 15:34, Sebastian Ott wrote: > On Thu, 1 Jun 2017, Xin Long wrote: >> On Thu, Jun 1, 2017 at 12:32 AM, Sebastian Ott >> wrote: >>> [...] >> I couldn't see any bridge-related thing here, and it couldn't be reproduced >> with virbr0 (stp=1) on my box (on both s390x and x86_64), I guess there >> is something else in you machine. >> >> With the latest upstream kernel, can you remove libvirt (virbr0) and boot >> your >> machine normally, then: >> # brctl addbr br0 >> # ip link set br0 up >> # brctl stp br0 on >> >> to check if it will still hang. > > Nope. That doesn't hang. > > >> If it can't be reproduced in this way, pls add this on your kernel: >> >> --- a/net/bridge/br_stp_if.c >> +++ b/net/bridge/br_stp_if.c >> @@ -178,9 +178,11 @@ static void br_stp_start(struct net_bridge *br) >> br->stp_enabled = BR_KERNEL_STP; >> br_debug(br, "using kernel STP\n"); >> >> + WARN_ON(1); >> /* To start timers on any ports left in blocking */ >> mod_timer(&br->hello_timer, jiffies + br->hello_time); >> br_port_state_selection(br); >> + pr_warn("hello timer start done\n"); >> } >> >> spin_unlock_bh(&br->lock); >> diff --git a/net/bridge/br_stp_timer.c b/net/bridge/br_stp_timer.c >> index 60b6fe2..c98b3e5 100644 >> --- a/net/bridge/br_stp_timer.c >> +++ b/net/bridge/br_stp_timer.c >> @@ -40,7 +40,7 @@ static void br_hello_timer_expired(unsigned long arg) >> if (br->dev->flags & IFF_UP) { >> br_config_bpdu_generation(br); >> >> - if (br->stp_enabled == BR_KERNEL_STP) >> + if (br->stp_enabled != BR_USER_STP) >> mod_timer(&br->hello_timer, >> round_jiffies(jiffies + br->hello_time)); >> >> >> let's see if it hangs when starting the timer. Thanks. > > No hang either: > [snip] Could you please try the patch below ? --- diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c index 4efd5d54498a..89110319ef0f 100644 --- a/net/bridge/br_stp_if.c +++ b/net/bridge/br_stp_if.c @@ -173,7 +173,8 @@ static void br_stp_start(struct net_bridge *br) br_debug(br, "using kernel STP\n"); /* To start timers on any ports left in blocking */ - mod_timer(&br->hello_timer, jiffies + br->hello_time); + if (br->dev->flags & IFF_UP) + mod_timer(&br->hello_timer, jiffies + br->hello_time); br_port_state_selection(br); }
Re: Oops with commit 6d18c73 bridge: start hello_timer when enabling KERNEL_STP in br_stp_start
On Thu, 1 Jun 2017, Xin Long wrote: > On Thu, Jun 1, 2017 at 12:32 AM, Sebastian Ott > wrote: > > [...] > I couldn't see any bridge-related thing here, and it couldn't be reproduced > with virbr0 (stp=1) on my box (on both s390x and x86_64), I guess there > is something else in you machine. > > With the latest upstream kernel, can you remove libvirt (virbr0) and boot your > machine normally, then: > # brctl addbr br0 > # ip link set br0 up > # brctl stp br0 on > > to check if it will still hang. Nope. That doesn't hang. > If it can't be reproduced in this way, pls add this on your kernel: > > --- a/net/bridge/br_stp_if.c > +++ b/net/bridge/br_stp_if.c > @@ -178,9 +178,11 @@ static void br_stp_start(struct net_bridge *br) > br->stp_enabled = BR_KERNEL_STP; > br_debug(br, "using kernel STP\n"); > > + WARN_ON(1); > /* To start timers on any ports left in blocking */ > mod_timer(&br->hello_timer, jiffies + br->hello_time); > br_port_state_selection(br); > + pr_warn("hello timer start done\n"); > } > > spin_unlock_bh(&br->lock); > diff --git a/net/bridge/br_stp_timer.c b/net/bridge/br_stp_timer.c > index 60b6fe2..c98b3e5 100644 > --- a/net/bridge/br_stp_timer.c > +++ b/net/bridge/br_stp_timer.c > @@ -40,7 +40,7 @@ static void br_hello_timer_expired(unsigned long arg) > if (br->dev->flags & IFF_UP) { > br_config_bpdu_generation(br); > > - if (br->stp_enabled == BR_KERNEL_STP) > + if (br->stp_enabled != BR_USER_STP) > mod_timer(&br->hello_timer, > round_jiffies(jiffies + br->hello_time)); > > > let's see if it hangs when starting the timer. Thanks. No hang either: [ 134.018104] [ cut here ] [ 134.018144] WARNING: CPU: 1 PID: 1339 at net/bridge/br_stp_if.c:181 br_stp_set_enabled+0x154/0x2b0 [bridge] [ 134.018149] Modules linked in: bridge stp llc rdma_ucm ib_ucm ib_uverbs [...] [ 134.018257] CPU: 1 PID: 1339 Comm: brctl Not tainted 4.12.0-rc3-00011-gf511c0b-dirty #587 [ 134.018262] Hardware name: IBM 2827 H66 705 (LPAR) [ 134.018266] task: d141c100 task.stack: d143 [ 134.018271] Krnl PSW : 0704f0018000 03ff802bc4c4 (br_stp_set_enabled+0x154/0x2b0 [bridge]) [ 134.018286]R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:3 PM:0 RI:0 EA:3 [ 134.018294] Krnl GPRS: c5eae501 05dc 0bb8 0001 [ 134.018298]03ff802bc42c d1433c78 0001 d3ad2d60 [ 134.018303]0002 03ff802c21a8 d3ad2d60 fffe [ 134.018308]d1671738 26a0 03ff802bc42c d1433c38 [ 134.018320] Krnl Code: 03ff802bc4b4: e54ca9180001mvhi 2328(%r10),1 03ff802bc4ba: c004brcl 0,3ff802bc4ba #03ff802bc4c0: a7f40001brc 15,3ff802bc4c2 >03ff802bc4c4: c418b5aalgrl %r1,3ff802b3018 03ff802bc4ca: 4120ac10la %r2,3088(%r10) 03ff802bc4ce: e3301004lg %r3,0(%r1) 03ff802bc4d4: e330a8d80008ag %r3,2264(%r10) 03ff802bc4da: c0e5bc8bbrasl %r14,3ff802b3df0 [ 134.018374] Call Trace: [ 134.018384] ([<03ff802bc42c>] br_stp_set_enabled+0xbc/0x2b0 [bridge]) [ 134.018393] [<03ff802c21d2>] set_stp_state+0x2a/0x40 [bridge] [ 134.018402] [<03ff802c0f30>] store_bridge_parm+0xa8/0xf8 [bridge] [ 134.018410] [<004012f2>] kernfs_fop_write+0x132/0x208 [ 134.018417] [<0036088e>] __vfs_write+0x36/0x140 [ 134.018422] [<00361b54>] vfs_write+0xbc/0x1a0 [ 134.018427] [<0036323e>] SyS_write+0x66/0xc0 [ 134.018434] [<008ccc80>] system_call+0xc4/0x28c [ 134.018438] 5 locks held by brctl/1339: [ 134.018443] #0: (sb_writers#5){.+.+.+}, at: [<00361b3e>] vfs_write+0xa6/0x1a0 [ 134.018462] #1: (&of->mutex){+.+.+.}, at: [<00401372>] kernfs_fop_write+0x1b2/0x208 [ 134.018478] #2: (s_active#116){.+.+.+}, at: [<0040137e>] kernfs_fop_write+0x1be/0x208 [ 134.018496] #3: (rtnl_mutex){+.+.+.}, at: [<03ff802c0f08>] store_bridge_parm+0x80/0xf8 [bridge] [ 134.018517] #4: (&(&br->lock)->rlock){+.}, at: [<03ff802bc42c>] br_stp_set_enabled+0xbc/0x2b0 [bridge] [ 134.018537] Last Breaking-Event-Address: [ 134.018546] [<03ff802bc4c0>] br_stp_set_enabled+0x150/0x2b0 [bridge] [ 134.018551] ---[ end trace 0fc342e82de9b3d7 ]--- [ 134.018638] hello timer start done In the system dump I observed that 3 cpus are within mod_timer (different timers) and spin for some lock (one o
Re: Oops with commit 6d18c73 bridge: start hello_timer when enabling KERNEL_STP in br_stp_start
On Thu, Jun 1, 2017 at 12:32 AM, Sebastian Ott wrote: [...] > > A system running v4.12-rc3-11-gf511c0b on s390 hangs after boot with no > messages on the console. The message buffer obtained via a system dump > looked like this: > > [...] > [ 17.870712] virbr0: port 1(virbr0-nic) entered disabled state > [ 19.618523] Unable to handle kernel pointer dereference in virtual kernel > address space > [ 250.028426] INFO: task jbd2/dasda1-8:100 blocked for more than 120 seconds. > [ 250.028427] Not tainted 4.12.0-rc3-00011-gf511c0b #573 > [ 250.028428] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables > this message. > [ 250.028429] jbd2/dasda1-8 D12808 100 2 0x > [ 250.028437] Stack: > [ 250.028437]e8c4f9b0 00233afe > e8c48100 > [ 250.028441]e8c4f978 001b1c98 e8c4f978 > e8c4f9d8 > [ 250.028444]0400efdcce00 e8c48890 > efdcce18 > [ 250.028447]e8c48100 efdcce00 e8ce8100 > e73c6900 > [ 250.028450]008da090 008c4f54 e8c4f9d8 > e8c4fa60 > [ 250.028453] Call Trace: > [ 250.028458] ([<008c4f54>] __schedule+0xb14/0xc90) > [ 250.028459] [<008c5164>] schedule+0x94/0xc0 > [ 250.028462] [<001802ac>] io_schedule+0x34/0x58 > [ 250.028464] [<002a44c2>] wait_on_page_bit+0x16a/0x198 > [ 250.028465] [<002a4576>] __filemap_fdatawait_range+0x86/0x188 > [ 250.028467] [<002a46a6>] filemap_fdatawait_range+0x2e/0x58 > [ 250.028471] [<004719d4>] > jbd2_journal_commit_transaction+0x10e4/0x2200 > [ 250.028473] [<0047890a>] kjournald2+0xda/0x2c0 > [ 250.028475] [<0016da5e>] kthread+0x166/0x178 > [ 250.028477] [<008cce7a>] kernel_thread_starter+0x6/0xc > [ 250.028479] [<008cce74>] kernel_thread_starter+0x0/0xc > [ 250.028480] INFO: lockdep is turned off. > [...] I couldn't see any bridge-related thing here, and it couldn't be reproduced with virbr0 (stp=1) on my box (on both s390x and x86_64), I guess there is something else in you machine. With the latest upstream kernel, can you remove libvirt (virbr0) and boot your machine normally, then: # brctl addbr br0 # ip link set br0 up # brctl stp br0 on to check if it will still hang. If it can't be reproduced in this way, pls add this on your kernel: --- a/net/bridge/br_stp_if.c +++ b/net/bridge/br_stp_if.c @@ -178,9 +178,11 @@ static void br_stp_start(struct net_bridge *br) br->stp_enabled = BR_KERNEL_STP; br_debug(br, "using kernel STP\n"); + WARN_ON(1); /* To start timers on any ports left in blocking */ mod_timer(&br->hello_timer, jiffies + br->hello_time); br_port_state_selection(br); + pr_warn("hello timer start done\n"); } spin_unlock_bh(&br->lock); diff --git a/net/bridge/br_stp_timer.c b/net/bridge/br_stp_timer.c index 60b6fe2..c98b3e5 100644 --- a/net/bridge/br_stp_timer.c +++ b/net/bridge/br_stp_timer.c @@ -40,7 +40,7 @@ static void br_hello_timer_expired(unsigned long arg) if (br->dev->flags & IFF_UP) { br_config_bpdu_generation(br); - if (br->stp_enabled == BR_KERNEL_STP) + if (br->stp_enabled != BR_USER_STP) mod_timer(&br->hello_timer, round_jiffies(jiffies + br->hello_time)); let's see if it hangs when starting the timer. Thanks.