Re: [PATCH] bridge: Fix a deadlock when enabling multicast snooping
On 04/12/2020 00:42, Huang, Joseph wrote: >> From: Huang, Joseph >> Sent: Thursday, December 3, 2020 4:53 PM >> To: Nikolay Aleksandrov ; Jakub Kicinski >> >> Cc: Roopa Prabhu ; David S. Miller >> ; bri...@lists.linux-foundation.org; >> net...@vger.kernel.org; linux-kernel@vger.kernel.org; Linus Lüssing >> >> Subject: RE: [PATCH] bridge: Fix a deadlock when enabling multicast snooping >> >>> From: Nikolay Aleksandrov >>> Sent: Thursday, December 3, 2020 3:47 PM >>> To: Jakub Kicinski ; Huang, Joseph >>> >>> Cc: Roopa Prabhu ; David S. Miller >>> ; bri...@lists.linux-foundation.org; >>> net...@vger.kernel.org; linux-kernel@vger.kernel.org; Linus Lüssing >>> >>> Subject: Re: [PATCH] bridge: Fix a deadlock when enabling multicast >>> snooping >>> >>> On 03/12/2020 20:28, Jakub Kicinski wrote: >>>> On Tue, 1 Dec 2020 16:40:47 -0500 Joseph Huang wrote: >>>>> When enabling multicast snooping, bridge module deadlocks on >>>>> multicast_lock if 1) IPv6 is enabled, and 2) there is an existing >>>>> querier on the same L2 network. >>>>> >>>>> The deadlock was caused by the following sequence: While holding the >>>>> lock, br_multicast_open calls br_multicast_join_snoopers, which >>>>> eventually causes IP stack to (attempt to) send out a Listener Report (in >>> igmp6_join_group). >>>>> Since the destination Ethernet address is a multicast address, >>>>> br_dev_xmit feeds the packet back to the bridge via br_multicast_rcv, >>>>> which in turn calls br_multicast_add_group, which then deadlocks on >>> multicast_lock. >>>>> >>>>> The fix is to move the call br_multicast_join_snoopers outside of the >>>>> critical section. This works since br_multicast_join_snoopers only >>>>> deals with IP and does not modify any multicast data structures of >>>>> the bridge, so there's no need to hold the lock. >>>>> >>>>> Fixes: 4effd28c1245 ("bridge: join all-snoopers multicast address") >>>>> >>>>> Signed-off-by: Joseph Huang >>>> >>>> Nik, Linus - how does this one look? >>>> >>> >>> Hi, >>> Thanks, somehow I missed this one too. Need to check my email config. :) I >>> believe I see how it can happen, although it's not straight-forward to >>> follow. >>> A selftest for this case would be great, and any traces (e.g. hung task) >> would >>> help a lot as well. >>> Correct me if I'm wrong but the sequence is something like: >>> br_multicast_join_snoopers -> ipv6_dev_mc_inc -> __ipv6_dev_mc_inc -> >>> igmp6_group_added >>> -> MLDv1 (mode) igmp6_join_group() -> Again MLDv1 mode >>> -> igmp6_join_group() -> igmp6_join_group >>> -> igmp6_send() on the bridge device -> br_dev_xmit and onto the bridge >>> -> mcast processing code >>> which uses the multicast_lock spinlock. Right? >> >> That is correct. >> >> Here's a stack trace from a typical run: >> >> echo -n 1 > /sys/devices/virtual/net/gmn0/bridge/multicast_snooping >> [ 936.146754] rcu: INFO: rcu_preempt self-detected stall on CPU >> [ 936.152534] rcu: 0-: (5594 ticks this GP) >> idle=75a/1/0x4002 softirq=2787/2789 fqs=2625 >> [ 936.162026](t=5253 jiffies g=4205 q=12) >> [ 936.166041] Task dump for CPU 0: >> [ 936.169272] sh R running task0 1315 1295 >> 0x0002 >> [ 936.176332] Call trace: >> [ 936.178797] dump_backtrace+0x0/0x140 >> [ 936.182469] show_stack+0x14/0x20 >> [ 936.185793] sched_show_task+0x108/0x138 >> [ 936.189727] dump_cpu_task+0x40/0x50 >> [ 936.193313] rcu_dump_cpu_stacks+0x94/0xd0 >> [ 936.197420] rcu_sched_clock_irq+0x75c/0x9c0 >> [ 936.201698] update_process_times+0x2c/0x68 >> [ 936.205893] tick_sched_handle.isra.0+0x30/0x50 >> [ 936.210432] tick_sched_timer+0x48/0x98 >> [ 936.214272] __hrtimer_run_queues+0x110/0x1b0 >> [ 936.218635] hrtimer_interrupt+0xe4/0x240 >> [ 936.222656] arch_timer_handler_phys+0x30/0x40 >> [ 936.227106] handle_percpu_devid_irq+0x80/0x140 >> [ 936.231654] generic_handle_irq+0x24/0x38 >> [ 936.235669] __handle_domain_irq+0x60/0xb8 >> [ 936.239774] gic_handle_irq+0x5c/0x148 >> [ 936.243535] el1_irq+0xb8/0x180 >> [ 936.246689] queued_spin_lo
Re: [PATCH] bridge: Fix a deadlock when enabling multicast snooping
> From: Huang, Joseph > Sent: Thursday, December 3, 2020 4:53 PM > To: Nikolay Aleksandrov ; Jakub Kicinski > > Cc: Roopa Prabhu ; David S. Miller > ; bri...@lists.linux-foundation.org; > net...@vger.kernel.org; linux-kernel@vger.kernel.org; Linus Lüssing > > Subject: RE: [PATCH] bridge: Fix a deadlock when enabling multicast snooping > > > From: Nikolay Aleksandrov > > Sent: Thursday, December 3, 2020 3:47 PM > > To: Jakub Kicinski ; Huang, Joseph > > > > Cc: Roopa Prabhu ; David S. Miller > > ; bri...@lists.linux-foundation.org; > > net...@vger.kernel.org; linux-kernel@vger.kernel.org; Linus Lüssing > > > > Subject: Re: [PATCH] bridge: Fix a deadlock when enabling multicast > > snooping > > > > On 03/12/2020 20:28, Jakub Kicinski wrote: > > > On Tue, 1 Dec 2020 16:40:47 -0500 Joseph Huang wrote: > > >> When enabling multicast snooping, bridge module deadlocks on > > >> multicast_lock if 1) IPv6 is enabled, and 2) there is an existing > > >> querier on the same L2 network. > > >> > > >> The deadlock was caused by the following sequence: While holding the > > >> lock, br_multicast_open calls br_multicast_join_snoopers, which > > >> eventually causes IP stack to (attempt to) send out a Listener Report (in > > igmp6_join_group). > > >> Since the destination Ethernet address is a multicast address, > > >> br_dev_xmit feeds the packet back to the bridge via br_multicast_rcv, > > >> which in turn calls br_multicast_add_group, which then deadlocks on > > multicast_lock. > > >> > > >> The fix is to move the call br_multicast_join_snoopers outside of the > > >> critical section. This works since br_multicast_join_snoopers only > > >> deals with IP and does not modify any multicast data structures of > > >> the bridge, so there's no need to hold the lock. > > >> > > >> Fixes: 4effd28c1245 ("bridge: join all-snoopers multicast address") > > >> > > >> Signed-off-by: Joseph Huang > > > > > > Nik, Linus - how does this one look? > > > > > > > Hi, > > Thanks, somehow I missed this one too. Need to check my email config. :) I > > believe I see how it can happen, although it's not straight-forward to > > follow. > > A selftest for this case would be great, and any traces (e.g. hung task) > would > > help a lot as well. > > Correct me if I'm wrong but the sequence is something like: > > br_multicast_join_snoopers -> ipv6_dev_mc_inc -> __ipv6_dev_mc_inc -> > > igmp6_group_added > > -> MLDv1 (mode) igmp6_join_group() -> Again MLDv1 mode > > -> igmp6_join_group() -> igmp6_join_group > > -> igmp6_send() on the bridge device -> br_dev_xmit and onto the bridge > > -> mcast processing code > > which uses the multicast_lock spinlock. Right? > > That is correct. > > Here's a stack trace from a typical run: > > echo -n 1 > /sys/devices/virtual/net/gmn0/bridge/multicast_snooping > [ 936.146754] rcu: INFO: rcu_preempt self-detected stall on CPU > [ 936.152534] rcu: 0-: (5594 ticks this GP) > idle=75a/1/0x4002 softirq=2787/2789 fqs=2625 > [ 936.162026](t=5253 jiffies g=4205 q=12) > [ 936.166041] Task dump for CPU 0: > [ 936.169272] sh R running task0 1315 1295 > 0x0002 > [ 936.176332] Call trace: > [ 936.178797] dump_backtrace+0x0/0x140 > [ 936.182469] show_stack+0x14/0x20 > [ 936.185793] sched_show_task+0x108/0x138 > [ 936.189727] dump_cpu_task+0x40/0x50 > [ 936.193313] rcu_dump_cpu_stacks+0x94/0xd0 > [ 936.197420] rcu_sched_clock_irq+0x75c/0x9c0 > [ 936.201698] update_process_times+0x2c/0x68 > [ 936.205893] tick_sched_handle.isra.0+0x30/0x50 > [ 936.210432] tick_sched_timer+0x48/0x98 > [ 936.214272] __hrtimer_run_queues+0x110/0x1b0 > [ 936.218635] hrtimer_interrupt+0xe4/0x240 > [ 936.222656] arch_timer_handler_phys+0x30/0x40 > [ 936.227106] handle_percpu_devid_irq+0x80/0x140 > [ 936.231654] generic_handle_irq+0x24/0x38 > [ 936.235669] __handle_domain_irq+0x60/0xb8 > [ 936.239774] gic_handle_irq+0x5c/0x148 > [ 936.243535] el1_irq+0xb8/0x180 > [ 936.246689] queued_spin_lock_slowpath+0x118/0x3b0 > [ 936.251495] _raw_spin_lock+0x5c/0x68 > [ 936.255221] br_multicast_add_group+0x40/0x170 [bridge] > [ 936.260491] br_multicast_rcv+0x7ac/0xe30 [bridge] > [ 936.265322] br_dev_xmit+0x140/0x368 [bridge] > [ 936.269689] dev_hard_start_xmit+0x94/0x158 > [ 936.273876] __dev_queue_xmit+0x5ac
RE: [PATCH] bridge: Fix a deadlock when enabling multicast snooping
> From: Nikolay Aleksandrov > Sent: Thursday, December 3, 2020 3:47 PM > To: Jakub Kicinski ; Huang, Joseph > > Cc: Roopa Prabhu ; David S. Miller > ; bri...@lists.linux-foundation.org; > net...@vger.kernel.org; linux-kernel@vger.kernel.org; Linus Lüssing > > Subject: Re: [PATCH] bridge: Fix a deadlock when enabling multicast > snooping > > On 03/12/2020 20:28, Jakub Kicinski wrote: > > On Tue, 1 Dec 2020 16:40:47 -0500 Joseph Huang wrote: > >> When enabling multicast snooping, bridge module deadlocks on > >> multicast_lock if 1) IPv6 is enabled, and 2) there is an existing > >> querier on the same L2 network. > >> > >> The deadlock was caused by the following sequence: While holding the > >> lock, br_multicast_open calls br_multicast_join_snoopers, which > >> eventually causes IP stack to (attempt to) send out a Listener Report (in > igmp6_join_group). > >> Since the destination Ethernet address is a multicast address, > >> br_dev_xmit feeds the packet back to the bridge via br_multicast_rcv, > >> which in turn calls br_multicast_add_group, which then deadlocks on > multicast_lock. > >> > >> The fix is to move the call br_multicast_join_snoopers outside of the > >> critical section. This works since br_multicast_join_snoopers only > >> deals with IP and does not modify any multicast data structures of > >> the bridge, so there's no need to hold the lock. > >> > >> Fixes: 4effd28c1245 ("bridge: join all-snoopers multicast address") > >> > >> Signed-off-by: Joseph Huang > > > > Nik, Linus - how does this one look? > > > > Hi, > Thanks, somehow I missed this one too. Need to check my email config. :) I > believe I see how it can happen, although it's not straight-forward to follow. > A selftest for this case would be great, and any traces (e.g. hung task) would > help a lot as well. > Correct me if I'm wrong but the sequence is something like: > br_multicast_join_snoopers -> ipv6_dev_mc_inc -> __ipv6_dev_mc_inc -> > igmp6_group_added > -> MLDv1 (mode) igmp6_join_group() -> Again MLDv1 mode > -> igmp6_join_group() -> igmp6_join_group > -> igmp6_send() on the bridge device -> br_dev_xmit and onto the bridge > -> mcast processing code > which uses the multicast_lock spinlock. Right? That is correct. Here's a stack trace from a typical run: echo -n 1 > /sys/devices/virtual/net/gmn0/bridge/multicast_snooping [ 936.146754] rcu: INFO: rcu_preempt self-detected stall on CPU [ 936.152534] rcu: 0-: (5594 ticks this GP) idle=75a/1/0x4002 softirq=2787/2789 fqs=2625 [ 936.162026] (t=5253 jiffies g=4205 q=12) [ 936.166041] Task dump for CPU 0: [ 936.169272] sh R running task0 1315 1295 0x0002 [ 936.176332] Call trace: [ 936.178797] dump_backtrace+0x0/0x140 [ 936.182469] show_stack+0x14/0x20 [ 936.185793] sched_show_task+0x108/0x138 [ 936.189727] dump_cpu_task+0x40/0x50 [ 936.193313] rcu_dump_cpu_stacks+0x94/0xd0 [ 936.197420] rcu_sched_clock_irq+0x75c/0x9c0 [ 936.201698] update_process_times+0x2c/0x68 [ 936.205893] tick_sched_handle.isra.0+0x30/0x50 [ 936.210432] tick_sched_timer+0x48/0x98 [ 936.214272] __hrtimer_run_queues+0x110/0x1b0 [ 936.218635] hrtimer_interrupt+0xe4/0x240 [ 936.222656] arch_timer_handler_phys+0x30/0x40 [ 936.227106] handle_percpu_devid_irq+0x80/0x140 [ 936.231654] generic_handle_irq+0x24/0x38 [ 936.235669] __handle_domain_irq+0x60/0xb8 [ 936.239774] gic_handle_irq+0x5c/0x148 [ 936.243535] el1_irq+0xb8/0x180 [ 936.246689] queued_spin_lock_slowpath+0x118/0x3b0 [ 936.251495] _raw_spin_lock+0x5c/0x68 [ 936.255221] br_multicast_add_group+0x40/0x170 [bridge] [ 936.260491] br_multicast_rcv+0x7ac/0xe30 [bridge] [ 936.265322] br_dev_xmit+0x140/0x368 [bridge] [ 936.269689] dev_hard_start_xmit+0x94/0x158 [ 936.273876] __dev_queue_xmit+0x5ac/0x7f8 [ 936.277890] dev_queue_xmit+0x10/0x18 [ 936.281563] neigh_resolve_output+0xec/0x198 [ 936.285845] ip6_finish_output2+0x240/0x710 [ 936.290039] __ip6_finish_output+0x130/0x170 [ 936.294318] ip6_output+0x6c/0x1c8 [ 936.297731] NF_HOOK.constprop.0+0xd8/0xe8 [ 936.301834] igmp6_send+0x358/0x558 [ 936.305326] igmp6_join_group.part.0+0x30/0xf0 [ 936.309774] igmp6_group_added+0xfc/0x110 [ 936.313787] __ipv6_dev_mc_inc+0x1a4/0x290 [ 936.317885] ipv6_dev_mc_inc+0x10/0x18 [ 936.321677] br_multicast_open+0xbc/0x110 [bridge] [ 936.326506] br_multicast_toggle+0xec/0x140 [bridge] > > One question - shouldn't leaving have the same problem? I.e. > br_multicast_toggle -> br_multicast_leave_snoopers > -> br_ip6_multicast_leave_snoopers -> ipv6_dev_mc_dec -> > -> igmp6_group_drop
Re: [PATCH] bridge: Fix a deadlock when enabling multicast snooping
On 03/12/2020 20:28, Jakub Kicinski wrote: > On Tue, 1 Dec 2020 16:40:47 -0500 Joseph Huang wrote: >> When enabling multicast snooping, bridge module deadlocks on multicast_lock >> if 1) IPv6 is enabled, and 2) there is an existing querier on the same L2 >> network. >> >> The deadlock was caused by the following sequence: While holding the lock, >> br_multicast_open calls br_multicast_join_snoopers, which eventually causes >> IP stack to (attempt to) send out a Listener Report (in igmp6_join_group). >> Since the destination Ethernet address is a multicast address, br_dev_xmit >> feeds the packet back to the bridge via br_multicast_rcv, which in turn >> calls br_multicast_add_group, which then deadlocks on multicast_lock. >> >> The fix is to move the call br_multicast_join_snoopers outside of the >> critical section. This works since br_multicast_join_snoopers only deals >> with IP and does not modify any multicast data structures of the bridge, >> so there's no need to hold the lock. >> >> Fixes: 4effd28c1245 ("bridge: join all-snoopers multicast address") >> >> Signed-off-by: Joseph Huang > > Nik, Linus - how does this one look? > Hi, Thanks, somehow I missed this one too. Need to check my email config. :) I believe I see how it can happen, although it's not straight-forward to follow. A selftest for this case would be great, and any traces (e.g. hung task) would help a lot as well. Correct me if I'm wrong but the sequence is something like: br_multicast_join_snoopers -> ipv6_dev_mc_inc -> __ipv6_dev_mc_inc -> igmp6_group_added -> MLDv1 (mode) igmp6_join_group() -> Again MLDv1 mode igmp6_join_group() -> igmp6_join_group -> igmp6_send() on the bridge device -> br_dev_xmit and onto the bridge mcast processing code which uses the multicast_lock spinlock. Right? One question - shouldn't leaving have the same problem? I.e. br_multicast_toggle -> br_multicast_leave_snoopers -> br_ip6_multicast_leave_snoopers -> ipv6_dev_mc_dec -> igmp6_group_dropped -> igmp6_leave_group -> MLDv1 mode && last reporter -> igmp6_send() ? I think it was saved by the fact that !br_opt_get(br, BROPT_MULTICAST_ENABLED) would be true and the multicast lock won't be acquired in the br_dev_xmit path? If so, I'd appreciate a comment about that because it's not really trivial to find out. :) Anyhow, the patch is fine as-is too: Acked-by: Nikolay Aleksandrov Thanks, Nik
Re: [PATCH] bridge: Fix a deadlock when enabling multicast snooping
On Tue, 1 Dec 2020 16:40:47 -0500 Joseph Huang wrote: > When enabling multicast snooping, bridge module deadlocks on multicast_lock > if 1) IPv6 is enabled, and 2) there is an existing querier on the same L2 > network. > > The deadlock was caused by the following sequence: While holding the lock, > br_multicast_open calls br_multicast_join_snoopers, which eventually causes > IP stack to (attempt to) send out a Listener Report (in igmp6_join_group). > Since the destination Ethernet address is a multicast address, br_dev_xmit > feeds the packet back to the bridge via br_multicast_rcv, which in turn > calls br_multicast_add_group, which then deadlocks on multicast_lock. > > The fix is to move the call br_multicast_join_snoopers outside of the > critical section. This works since br_multicast_join_snoopers only deals > with IP and does not modify any multicast data structures of the bridge, > so there's no need to hold the lock. > > Fixes: 4effd28c1245 ("bridge: join all-snoopers multicast address") > > Signed-off-by: Joseph Huang Nik, Linus - how does this one look?
[PATCH] bridge: Fix a deadlock when enabling multicast snooping
When enabling multicast snooping, bridge module deadlocks on multicast_lock if 1) IPv6 is enabled, and 2) there is an existing querier on the same L2 network. The deadlock was caused by the following sequence: While holding the lock, br_multicast_open calls br_multicast_join_snoopers, which eventually causes IP stack to (attempt to) send out a Listener Report (in igmp6_join_group). Since the destination Ethernet address is a multicast address, br_dev_xmit feeds the packet back to the bridge via br_multicast_rcv, which in turn calls br_multicast_add_group, which then deadlocks on multicast_lock. The fix is to move the call br_multicast_join_snoopers outside of the critical section. This works since br_multicast_join_snoopers only deals with IP and does not modify any multicast data structures of the bridge, so there's no need to hold the lock. Fixes: 4effd28c1245 ("bridge: join all-snoopers multicast address") Signed-off-by: Joseph Huang --- net/bridge/br_device.c| 3 +++ net/bridge/br_multicast.c | 8 net/bridge/br_private.h | 5 + 3 files changed, 12 insertions(+), 4 deletions(-) diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c index 7730c8f3cb53..8d3ec29d3875 100644 --- a/net/bridge/br_device.c +++ b/net/bridge/br_device.c @@ -177,6 +177,9 @@ static int br_dev_open(struct net_device *dev) br_stp_enable_bridge(br); br_multicast_open(br); + if (br_opt_get(br, BROPT_MULTICAST_ENABLED)) + br_multicast_join_snoopers(br); + return 0; } diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c index eae898c3cff7..6c8d6b2eed91 100644 --- a/net/bridge/br_multicast.c +++ b/net/bridge/br_multicast.c @@ -3286,7 +3286,7 @@ static inline void br_ip6_multicast_join_snoopers(struct net_bridge *br) } #endif -static void br_multicast_join_snoopers(struct net_bridge *br) +void br_multicast_join_snoopers(struct net_bridge *br) { br_ip4_multicast_join_snoopers(br); br_ip6_multicast_join_snoopers(br); @@ -3336,9 +3336,6 @@ static void __br_multicast_open(struct net_bridge *br, void br_multicast_open(struct net_bridge *br) { - if (br_opt_get(br, BROPT_MULTICAST_ENABLED)) - br_multicast_join_snoopers(br); - __br_multicast_open(br, &br->ip4_own_query); #if IS_ENABLED(CONFIG_IPV6) __br_multicast_open(br, &br->ip6_own_query); @@ -3509,6 +3506,9 @@ int br_multicast_toggle(struct net_bridge *br, unsigned long val) unlock: spin_unlock_bh(&br->multicast_lock); + if (br_opt_get(br, BROPT_MULTICAST_ENABLED)) + br_multicast_join_snoopers(br); + return 0; } diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h index 345118e35c42..7085dd747130 100644 --- a/net/bridge/br_private.h +++ b/net/bridge/br_private.h @@ -792,6 +792,7 @@ void br_multicast_del_port(struct net_bridge_port *port); void br_multicast_enable_port(struct net_bridge_port *port); void br_multicast_disable_port(struct net_bridge_port *port); void br_multicast_init(struct net_bridge *br); +void br_multicast_join_snoopers(struct net_bridge *br); void br_multicast_open(struct net_bridge *br); void br_multicast_stop(struct net_bridge *br); void br_multicast_dev_del(struct net_bridge *br); @@ -969,6 +970,10 @@ static inline void br_multicast_init(struct net_bridge *br) { } +static inline void br_multicast_join_snoopers(struct net_bridge *br) +{ +} + static inline void br_multicast_open(struct net_bridge *br) { } -- 2.29.2