Re: bond: take rcu lock in bond_poll_controller
On Fri, Sep 28, 2018 at 12:03:22PM -0700, Cong Wang wrote:
> On Fri, Sep 28, 2018 at 12:02 PM Cong Wang wrote:
> >
> > On Fri, Sep 28, 2018 at 11:26 AM Dave Jones wrote:
> > > diff --git a/net/core/netpoll.c b/net/core/netpoll.c
> > > index 3219a2932463..4f9494381635 100644
> > > --- a/net/core/netpoll.c
> > > +++ b/net/core/netpoll.c
> > > @@ -330,6 +330,7 @@ void netpoll_send_skb_on_dev(struct netpoll *np, struct sk_buff *skb,
> > >  	/* It is up to the caller to keep npinfo alive. */
> > >  	struct netpoll_info *npinfo;
> > >
> > > +	rcu_read_lock();
> > >  	lockdep_assert_irqs_disabled();
> > >
> > >  	npinfo = rcu_dereference_bh(np->dev->npinfo);
> >
> > I think you probably need rcu_read_lock_bh() to satisfy
> > rcu_dereference_bh()...
>
> But irq is disabled here, so not sure if rcu_read_lock_bh()
> could cause trouble... Interesting...

I was wondering for a moment why I never got a warning, then I
remembered I disabled lockdep for that machine because nfs spews
stuff. I'll doublecheck, and post v4.

lol, this looked like a 2 minute fix at first.

Dave
Re: bond: take rcu lock in bond_poll_controller
On Fri, Sep 28, 2018 at 12:02 PM Cong Wang wrote:
>
> On Fri, Sep 28, 2018 at 11:26 AM Dave Jones wrote:
> > diff --git a/net/core/netpoll.c b/net/core/netpoll.c
> > index 3219a2932463..4f9494381635 100644
> > --- a/net/core/netpoll.c
> > +++ b/net/core/netpoll.c
> > @@ -330,6 +330,7 @@ void netpoll_send_skb_on_dev(struct netpoll *np, struct sk_buff *skb,
> >  	/* It is up to the caller to keep npinfo alive. */
> >  	struct netpoll_info *npinfo;
> >
> > +	rcu_read_lock();
> >  	lockdep_assert_irqs_disabled();
> >
> >  	npinfo = rcu_dereference_bh(np->dev->npinfo);
>
> I think you probably need rcu_read_lock_bh() to satisfy
> rcu_dereference_bh()...

But irq is disabled here, so not sure if rcu_read_lock_bh()
could cause trouble...

Interesting...
Re: bond: take rcu lock in bond_poll_controller
On Fri, Sep 28, 2018 at 11:26 AM Dave Jones wrote:
> diff --git a/net/core/netpoll.c b/net/core/netpoll.c
> index 3219a2932463..4f9494381635 100644
> --- a/net/core/netpoll.c
> +++ b/net/core/netpoll.c
> @@ -330,6 +330,7 @@ void netpoll_send_skb_on_dev(struct netpoll *np, struct sk_buff *skb,
>  	/* It is up to the caller to keep npinfo alive. */
>  	struct netpoll_info *npinfo;
>
> +	rcu_read_lock();
>  	lockdep_assert_irqs_disabled();
>
>  	npinfo = rcu_dereference_bh(np->dev->npinfo);

I think you probably need rcu_read_lock_bh() to satisfy
rcu_dereference_bh()...
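A toy userspace model may make the rcu_read_lock() vs rcu_read_lock_bh() question more concrete. It assumes, from our reading of 4.19-era kernel/rcu/update.c and not from anything stated in this thread, that with lockdep enabled the check behind rcu_dereference_bh() (rcu_read_lock_bh_held()) is satisfied when softirqs or hard IRQs are disabled, while the check used by the slave iterator (rcu_read_lock_held()) only counts plain rcu_read_lock() sections. Every toy_* name is hypothetical; this is a sketch, not kernel API, and it ignores most real lockdep details.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU state for the toy model (not kernel API). */
struct toy_cpu {
	int  rcu_nesting;  /* depth of toy_rcu_read_lock() sections */
	int  bh_disabled;  /* depth of toy_rcu_read_lock_bh() sections */
	bool irqs_off;     /* stands in for local_irq_disable() */
};

static inline void toy_rcu_read_lock(struct toy_cpu *c)
{
	c->rcu_nesting++;
}

static inline void toy_rcu_read_unlock(struct toy_cpu *c)
{
	assert(c->rcu_nesting > 0);  /* catch unbalanced unlocks */
	c->rcu_nesting--;
}

/* Stands in for rcu_read_lock_bh(), which disables softirqs. */
static inline void toy_rcu_read_lock_bh(struct toy_cpu *c)
{
	c->bh_disabled++;
}

static inline void toy_rcu_read_unlock_bh(struct toy_cpu *c)
{
	assert(c->bh_disabled > 0);
	c->bh_disabled--;
}

/* Assumed shape of rcu_read_lock_held() with lockdep enabled:
 * only a plain rcu_read_lock() section counts. */
static inline bool toy_rcu_read_lock_held(const struct toy_cpu *c)
{
	return c->rcu_nesting > 0;
}

/* Assumed shape of rcu_read_lock_bh_held() with lockdep enabled:
 * softirqs disabled or hard IRQs disabled both count. */
static inline bool toy_rcu_read_lock_bh_held(const struct toy_cpu *c)
{
	return c->bh_disabled > 0 || c->irqs_off;
}
```

In this model, a context with IRQs off and no RCU lock taken passes the bh-flavored check but fails the plain one, and rcu_read_lock_bh() alone does not make the plain check pass. That is one way to read Cong's remark that IRQs are already disabled here, under the stated assumptions.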
Re: bond: take rcu lock in bond_poll_controller
On 09/28/2018 11:24 AM, Dave Jones wrote:
> Callers of bond_for_each_slave_rcu are expected to hold the rcu lock,
> otherwise a trace like below is shown
>
> WARNING: CPU: 2 PID: 179 at net/core/dev.c:6567 netdev_lower_get_next_private_rcu+0x34/0x40
> CPU: 2 PID: 179 Comm: kworker/u16:15 Not tainted 4.19.0-rc5-backup+ #1
>
>
> Suggested-by: Cong Wang
> Signed-off-by: Dave Jones
>

You forgot to change patch title.
bond: take rcu lock in bond_poll_controller
Callers of bond_for_each_slave_rcu are expected to hold the rcu lock,
otherwise a trace like below is shown

WARNING: CPU: 2 PID: 179 at net/core/dev.c:6567 netdev_lower_get_next_private_rcu+0x34/0x40
CPU: 2 PID: 179 Comm: kworker/u16:15 Not tainted 4.19.0-rc5-backup+ #1
Workqueue: bond0 bond_mii_monitor
RIP: 0010:netdev_lower_get_next_private_rcu+0x34/0x40
Code: 48 89 fb e8 fe 29 63 ff 85 c0 74 1e 48 8b 45 00 48 81 c3 c0 00 00 00 48 8b 00 48 39 d8 74 0f 48 89 45 00 48 8b 40 f8 5b 5d c3 <0f> 0b eb de 31 c0 eb f5 0f 1f 40 00 0f 1f 44 00 00 48 8>
RSP: 0018:c987fa68 EFLAGS: 00010046
RAX: RBX: 880429614560 RCX:
RDX: 0001 RSI: RDI: a184ada0
RBP: c987fa80 R08: 0001 R09:
R10: c987f9f0 R11: 880429798040 R12: 8804289d5980
R13: a1511f60 R14: 00c8 R15:
FS: () GS:88042f88() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 7f4b78fce180 CR3: 00018180f006 CR4: 001606e0
Call Trace:
 bond_poll_controller+0x52/0x170
 netpoll_poll_dev+0x79/0x290
 netpoll_send_skb_on_dev+0x158/0x2c0
 netpoll_send_udp+0x2d5/0x430
 write_ext_msg+0x1e0/0x210
 console_unlock+0x3c4/0x630
 vprintk_emit+0xfa/0x2f0
 printk+0x52/0x6e
 ? __netdev_printk+0x12b/0x220
 netdev_info+0x64/0x80
 ? bond_3ad_set_carrier+0xe9/0x180
 bond_select_active_slave+0x1fc/0x310
 bond_mii_monitor+0x709/0x9b0
 process_one_work+0x221/0x5e0
 worker_thread+0x4f/0x3b0
 kthread+0x100/0x140
 ? process_one_work+0x5e0/0x5e0
 ? kthread_delayed_work_timer_fn+0x90/0x90
 ret_from_fork+0x24/0x30

Suggested-by: Cong Wang
Signed-off-by: Dave Jones

--
v3: Do this in netpoll_send_skb_on_dev as Cong suggests.

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 3219a2932463..4f9494381635 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -330,6 +330,7 @@ void netpoll_send_skb_on_dev(struct netpoll *np, struct sk_buff *skb,
 	/* It is up to the caller to keep npinfo alive. */
 	struct netpoll_info *npinfo;
 
+	rcu_read_lock();
 	lockdep_assert_irqs_disabled();
 
 	npinfo = rcu_dereference_bh(np->dev->npinfo);
@@ -374,6 +375,7 @@ void netpoll_send_skb_on_dev(struct netpoll *np, struct sk_buff *skb,
 		skb_queue_tail(&npinfo->txq, skb);
 		schedule_delayed_work(&npinfo->tx_work,0);
 	}
+	rcu_read_unlock();
 }
 EXPORT_SYMBOL(netpoll_send_skb_on_dev);
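The v3 change moves the lock into netpoll_send_skb_on_dev() so that everything it calls, including the bond_poll_controller() iteration in the trace above, runs inside one read-side section. A minimal toy model of that call chain might look like the following. It assumes a single global nesting counter in place of the kernel's per-CPU state; all toy_* names are hypothetical, and the warning counter only stands in for the WARN in netdev_lower_get_next_private_rcu().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy state (not kernel API): one global read-side depth
 * and a count of iterator calls made outside any read-side section. */
static int toy_rcu_nesting;
static int toy_warnings;

static void toy_rcu_read_lock(void)
{
	toy_rcu_nesting++;
}

static void toy_rcu_read_unlock(void)
{
	assert(toy_rcu_nesting > 0);
	toy_rcu_nesting--;
}

struct toy_slave {
	bool up;
	int  polled;
};

/* Stands in for the RCU slave iterator: counts every call made without
 * a read-side section, where the kernel's check would warn (once). */
static struct toy_slave *toy_next_slave(struct toy_slave *s, size_t n,
					size_t *i)
{
	if (toy_rcu_nesting == 0)
		toy_warnings++;
	return *i < n ? &s[(*i)++] : NULL;
}

/* Stands in for bond_poll_controller(): poll every slave that is up. */
static void toy_poll_controller(struct toy_slave *s, size_t n)
{
	size_t i = 0;
	struct toy_slave *sl;

	while ((sl = toy_next_slave(s, n, &i)) != NULL) {
		if (sl->up)
			sl->polled++;
	}
}

/* Stands in for netpoll_send_skb_on_dev(); take_lock selects the
 * pre-v3 (false) or v3 (true) behavior. */
static void toy_send_skb_on_dev(struct toy_slave *s, size_t n, bool take_lock)
{
	if (take_lock)
		toy_rcu_read_lock();
	toy_poll_controller(s, n);
	if (take_lock)
		toy_rcu_read_unlock();
}
```

Called without the outer lock, every step of the iteration trips the toy check; with the lock taken once in the shared caller, none does, and the nesting depth returns to zero afterwards.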
Re: bond: take rcu lock in bond_poll_controller
On Fri, Sep 28, 2018 at 10:31:39AM -0700, Cong Wang wrote:
> On Fri, Sep 28, 2018 at 10:25 AM Dave Jones wrote:
> >
> > On Fri, Sep 28, 2018 at 09:55:52AM -0700, Cong Wang wrote:
> > > On Fri, Sep 28, 2018 at 9:18 AM Dave Jones wrote:
> > > >
> > > > Callers of bond_for_each_slave_rcu are expected to hold the rcu lock,
> > > > otherwise a trace like below is shown
> > >
> > > So why not take rcu read lock in netpoll_send_skb_on_dev() where
> > > RCU is also assumed?
> >
> > that does seem to solve the backtrace spew I saw too, so if that's
> > preferable I can respin the patch.
>
> From my observations, netpoll_send_skb_on_dev() does not take
> RCU read lock _and_ it relies on rcu read lock because it calls
> rcu_dereference_bh().
>
> If my observation is correct, you should catch an RCU warning like
> this but within netpoll_send_skb_on_dev().
>
> > > As I said, I can't explain why you didn't trigger the RCU warning in
> > > netpoll_send_skb_on_dev()...
> >
> > netpoll_send_skb_on_dev takes the rcu lock itself.
>
> Could you please point me to where exactly the rcu lock is here?
>
> I am too stupid to see it. :)

No, I'm the stupid one. I looked at the tree I had just edited to
try your proposed change.

Now that I've untangled myself, I'll repost with your suggested change.

Dave
Re: bond: take rcu lock in bond_poll_controller
On Fri, Sep 28, 2018 at 10:25 AM Dave Jones wrote:
>
> On Fri, Sep 28, 2018 at 09:55:52AM -0700, Cong Wang wrote:
> > On Fri, Sep 28, 2018 at 9:18 AM Dave Jones wrote:
> > >
> > > Callers of bond_for_each_slave_rcu are expected to hold the rcu lock,
> > > otherwise a trace like below is shown
> >
> > So why not take rcu read lock in netpoll_send_skb_on_dev() where
> > RCU is also assumed?
>
> that does seem to solve the backtrace spew I saw too, so if that's
> preferable I can respin the patch.

From my observations, netpoll_send_skb_on_dev() does not take
RCU read lock _and_ it relies on rcu read lock because it calls
rcu_dereference_bh().

If my observation is correct, you should catch an RCU warning like
this but within netpoll_send_skb_on_dev().

> > As I said, I can't explain why you didn't trigger the RCU warning in
> > netpoll_send_skb_on_dev()...
>
> netpoll_send_skb_on_dev takes the rcu lock itself.

Could you please point me to where exactly the rcu lock is here?

I am too stupid to see it. :)
Re: bond: take rcu lock in bond_poll_controller
On Fri, Sep 28, 2018 at 09:55:52AM -0700, Cong Wang wrote:
> On Fri, Sep 28, 2018 at 9:18 AM Dave Jones wrote:
> >
> > Callers of bond_for_each_slave_rcu are expected to hold the rcu lock,
> > otherwise a trace like below is shown
>
> So why not take rcu read lock in netpoll_send_skb_on_dev() where
> RCU is also assumed?

that does seem to solve the backtrace spew I saw too, so if that's
preferable I can respin the patch.

> As I said, I can't explain why you didn't trigger the RCU warning in
> netpoll_send_skb_on_dev()...

netpoll_send_skb_on_dev takes the rcu lock itself.

Dave
Re: bond: take rcu lock in bond_poll_controller
On Fri, Sep 28, 2018 at 9:18 AM Dave Jones wrote:
>
> Callers of bond_for_each_slave_rcu are expected to hold the rcu lock,
> otherwise a trace like below is shown

So why not take rcu read lock in netpoll_send_skb_on_dev() where
RCU is also assumed?

As I said, I can't explain why you didn't trigger the RCU warning in
netpoll_send_skb_on_dev()...
bond: take rcu lock in bond_poll_controller
Callers of bond_for_each_slave_rcu are expected to hold the rcu lock,
otherwise a trace like below is shown

WARNING: CPU: 2 PID: 179 at net/core/dev.c:6567 netdev_lower_get_next_private_rcu+0x34/0x40
CPU: 2 PID: 179 Comm: kworker/u16:15 Not tainted 4.19.0-rc5-backup+ #1
Workqueue: bond0 bond_mii_monitor
RIP: 0010:netdev_lower_get_next_private_rcu+0x34/0x40
Code: 48 89 fb e8 fe 29 63 ff 85 c0 74 1e 48 8b 45 00 48 81 c3 c0 00 00 00 48 8b 00 48 39 d8 74 0f 48 89 45 00 48 8b 40 f8 5b 5d c3 <0f> 0b eb de 31 c0 eb f5 0f 1f 40 00 0f 1f 44 00 00 48 8>
RSP: 0018:c987fa68 EFLAGS: 00010046
RAX: RBX: 880429614560 RCX:
RDX: 0001 RSI: RDI: a184ada0
RBP: c987fa80 R08: 0001 R09:
R10: c987f9f0 R11: 880429798040 R12: 8804289d5980
R13: a1511f60 R14: 00c8 R15:
FS: () GS:88042f88() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 7f4b78fce180 CR3: 00018180f006 CR4: 001606e0
Call Trace:
 bond_poll_controller+0x52/0x170
 netpoll_poll_dev+0x79/0x290
 netpoll_send_skb_on_dev+0x158/0x2c0
 netpoll_send_udp+0x2d5/0x430
 write_ext_msg+0x1e0/0x210
 console_unlock+0x3c4/0x630
 vprintk_emit+0xfa/0x2f0
 printk+0x52/0x6e
 ? __netdev_printk+0x12b/0x220
 netdev_info+0x64/0x80
 ? bond_3ad_set_carrier+0xe9/0x180
 bond_select_active_slave+0x1fc/0x310
 bond_mii_monitor+0x709/0x9b0
 process_one_work+0x221/0x5e0
 worker_thread+0x4f/0x3b0
 kthread+0x100/0x140
 ? process_one_work+0x5e0/0x5e0
 ? kthread_delayed_work_timer_fn+0x90/0x90
 ret_from_fork+0x24/0x30

Signed-off-by: Dave Jones

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index c05c01a00755..77a3607a7099 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -977,6 +977,7 @@ static void bond_poll_controller(struct net_device *bond_dev)
 		if (bond_3ad_get_active_agg_info(bond, &ad_info))
 			return;
 
+	rcu_read_lock();
 	bond_for_each_slave_rcu(bond, slave, iter) {
 		if (!bond_slave_is_up(slave))
 			continue;
@@ -992,6 +993,7 @@ static void bond_poll_controller(struct net_device *bond_dev)
 
 		netpoll_poll_dev(slave->dev);
 	}
+	rcu_read_unlock();
 }
 
 static void bond_netpoll_cleanup(struct net_device *bond_dev)
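One detail of this version is where the lock is taken: after the early return on bond_3ad_get_active_agg_info(), so that path never leaves a read-side section open, while the continue inside the loop stays within it. The toy model below sketches that shape. All names are hypothetical, the boolean agg_lookup_fails merely stands in for that function returning nonzero, and balance is checked with a plain counter rather than lockdep.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy read-side depth (not kernel API). */
static int toy_rcu_nesting;

static void toy_rcu_read_lock(void)
{
	toy_rcu_nesting++;
}

static void toy_rcu_read_unlock(void)
{
	assert(toy_rcu_nesting > 0);
	toy_rcu_nesting--;
}

/* Mirrors the shape of bond_poll_controller() after this patch.
 * agg_lookup_fails stands in for bond_3ad_get_active_agg_info()
 * returning nonzero; returns how many slaves were "polled". */
static int toy_poll_controller(bool agg_lookup_fails, const bool *slave_up,
			       int n)
{
	int polled = 0;

	if (agg_lookup_fails)
		return 0;	/* no lock taken yet, nothing to release */

	toy_rcu_read_lock();
	for (int i = 0; i < n; i++) {
		if (!slave_up[i])
			continue;	/* stays inside the read-side section */
		polled++;
	}
	toy_rcu_read_unlock();	/* single exit point for the locked region */
	return polled;
}
```

Had the lock been taken before the early return, that path would need its own unlock; taking it afterwards keeps a single lock/unlock pair around the loop.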
Re: bond: take rcu lock in bond_poll_controller
From: Dave Jones
Date: Mon, 24 Sep 2018 15:23:17 -0400

> Callers of bond_for_each_slave_rcu are expected to hold the rcu lock,
> otherwise a trace like below is shown
 ...
> Signed-off-by: Dave Jones

Hey Dave, after some recent changes by Eric Dumazet this no longer
applies.

Please respin against 'net'.

Thanks.
Re: bond: take rcu lock in bond_poll_controller
On Mon, Sep 24, 2018 at 1:08 PM Dave Jones wrote:
>
> Callers of bond_for_each_slave_rcu are expected to hold the rcu lock,
> otherwise a trace like below is shown

Interesting, netpoll_send_skb_on_dev() already assumes RCU read lock
when it calls rcu_dereference_bh()...

I wonder why it didn't catch such a warning before the one you reported.
bond: take rcu lock in bond_poll_controller
Callers of bond_for_each_slave_rcu are expected to hold the rcu lock,
otherwise a trace like below is shown

WARNING: CPU: 2 PID: 179 at net/core/dev.c:6567 netdev_lower_get_next_private_rcu+0x34/0x40
CPU: 2 PID: 179 Comm: kworker/u16:15 Not tainted 4.19.0-rc5-backup+ #1
Workqueue: bond0 bond_mii_monitor
RIP: 0010:netdev_lower_get_next_private_rcu+0x34/0x40
Code: 48 89 fb e8 fe 29 63 ff 85 c0 74 1e 48 8b 45 00 48 81 c3 c0 00 00 00 48 8b 00 48 39 d8 74 0f 48 89 45 00 48 8b 40 f8 5b 5d c3 <0f> 0b eb de 31 c0 eb f5 0f 1f 40 00 0f 1f 44 00 00 48 8>
RSP: 0018:c987fa68 EFLAGS: 00010046
RAX: RBX: 880429614560 RCX:
RDX: 0001 RSI: RDI: a184ada0
RBP: c987fa80 R08: 0001 R09:
R10: c987f9f0 R11: 880429798040 R12: 8804289d5980
R13: a1511f60 R14: 00c8 R15:
FS: () GS:88042f88() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 7f4b78fce180 CR3: 00018180f006 CR4: 001606e0
Call Trace:
 bond_poll_controller+0x52/0x170
 netpoll_poll_dev+0x79/0x290
 netpoll_send_skb_on_dev+0x158/0x2c0
 netpoll_send_udp+0x2d5/0x430
 write_ext_msg+0x1e0/0x210
 console_unlock+0x3c4/0x630
 vprintk_emit+0xfa/0x2f0
 printk+0x52/0x6e
 ? __netdev_printk+0x12b/0x220
 netdev_info+0x64/0x80
 ? bond_3ad_set_carrier+0xe9/0x180
 bond_select_active_slave+0x1fc/0x310
 bond_mii_monitor+0x709/0x9b0
 process_one_work+0x221/0x5e0
 worker_thread+0x4f/0x3b0
 kthread+0x100/0x140
 ? process_one_work+0x5e0/0x5e0
 ? kthread_delayed_work_timer_fn+0x90/0x90
 ret_from_fork+0x24/0x30

Signed-off-by: Dave Jones

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index a764a83f99da..519968d4513b 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -978,6 +978,7 @@ static void bond_poll_controller(struct net_device *bond_dev)
 		if (bond_3ad_get_active_agg_info(bond, &ad_info))
 			return;
 
+	rcu_read_lock();
 	bond_for_each_slave_rcu(bond, slave, iter) {
 		ops = slave->dev->netdev_ops;
 		if (!bond_slave_is_up(slave) || !ops->ndo_poll_controller)
@@ -998,6 +999,7 @@ static void bond_poll_controller(struct net_device *bond_dev)
 		ops->ndo_poll_controller(slave->dev);
 		up(&ni->dev_lock);
 	}
+	rcu_read_unlock();
 }
 
 static void bond_netpoll_cleanup(struct net_device *bond_dev)