Re: bond: take rcu lock in bond_poll_controller

2018-09-28 Thread Dave Jones
On Fri, Sep 28, 2018 at 12:03:22PM -0700, Cong Wang wrote:
 > On Fri, Sep 28, 2018 at 12:02 PM Cong Wang  wrote:
 > >
 > > On Fri, Sep 28, 2018 at 11:26 AM Dave Jones wrote:
 > > > diff --git a/net/core/netpoll.c b/net/core/netpoll.c
 > > > index 3219a2932463..4f9494381635 100644
 > > > --- a/net/core/netpoll.c
 > > > +++ b/net/core/netpoll.c
 > > > @@ -330,6 +330,7 @@ void netpoll_send_skb_on_dev(struct netpoll *np, struct sk_buff *skb,
 > > > /* It is up to the caller to keep npinfo alive. */
 > > > struct netpoll_info *npinfo;
 > > >
 > > > +   rcu_read_lock();
 > > > lockdep_assert_irqs_disabled();
 > > >
 > > > npinfo = rcu_dereference_bh(np->dev->npinfo);
 > >
 > > I think you probably need rcu_read_lock_bh() to satisfy
 > > rcu_dereference_bh()...
 > 
 > But irq is disabled here, so not sure if rcu_read_lock_bh()
 > could cause trouble... Interesting...

I was wondering for a moment why I never got a warning, then I
remembered I disabled lockdep for that machine because nfs spews stuff.

I'll doublecheck, and post v4. lol, this looked like a 2 minute fix at first.

Dave


Re: bond: take rcu lock in bond_poll_controller

2018-09-28 Thread Cong Wang
On Fri, Sep 28, 2018 at 12:02 PM Cong Wang  wrote:
>
> On Fri, Sep 28, 2018 at 11:26 AM Dave Jones  wrote:
> > diff --git a/net/core/netpoll.c b/net/core/netpoll.c
> > index 3219a2932463..4f9494381635 100644
> > --- a/net/core/netpoll.c
> > +++ b/net/core/netpoll.c
> > @@ -330,6 +330,7 @@ void netpoll_send_skb_on_dev(struct netpoll *np, struct sk_buff *skb,
> > /* It is up to the caller to keep npinfo alive. */
> > struct netpoll_info *npinfo;
> >
> > +   rcu_read_lock();
> > lockdep_assert_irqs_disabled();
> >
> > npinfo = rcu_dereference_bh(np->dev->npinfo);
>
> I think you probably need rcu_read_lock_bh() to satisfy
> rcu_dereference_bh()...

But irq is disabled here, so not sure if rcu_read_lock_bh()
could cause trouble... Interesting...


Re: bond: take rcu lock in bond_poll_controller

2018-09-28 Thread Cong Wang
On Fri, Sep 28, 2018 at 11:26 AM Dave Jones  wrote:
> diff --git a/net/core/netpoll.c b/net/core/netpoll.c
> index 3219a2932463..4f9494381635 100644
> --- a/net/core/netpoll.c
> +++ b/net/core/netpoll.c
> @@ -330,6 +330,7 @@ void netpoll_send_skb_on_dev(struct netpoll *np, struct sk_buff *skb,
> /* It is up to the caller to keep npinfo alive. */
> struct netpoll_info *npinfo;
>
> +   rcu_read_lock();
> lockdep_assert_irqs_disabled();
>
> npinfo = rcu_dereference_bh(np->dev->npinfo);

I think you probably need rcu_read_lock_bh() to satisfy
rcu_dereference_bh()...


Re: bond: take rcu lock in bond_poll_controller

2018-09-28 Thread Eric Dumazet



On 09/28/2018 11:24 AM, Dave Jones wrote:
> Callers of bond_for_each_slave_rcu are expected to hold the rcu lock,
> otherwise a trace like below is shown
> 
> WARNING: CPU: 2 PID: 179 at net/core/dev.c:6567 netdev_lower_get_next_private_rcu+0x34/0x40
> CPU: 2 PID: 179 Comm: kworker/u16:15 Not tainted 4.19.0-rc5-backup+ #1
>
> 
> Suggested-by: Cong Wang 
> Signed-off-by: Dave Jones 
> 


You forgot to change patch title.


bond: take rcu lock in bond_poll_controller

2018-09-28 Thread Dave Jones
Callers of bond_for_each_slave_rcu are expected to hold the rcu lock,
otherwise a trace like below is shown

WARNING: CPU: 2 PID: 179 at net/core/dev.c:6567 netdev_lower_get_next_private_rcu+0x34/0x40
CPU: 2 PID: 179 Comm: kworker/u16:15 Not tainted 4.19.0-rc5-backup+ #1
Workqueue: bond0 bond_mii_monitor
RIP: 0010:netdev_lower_get_next_private_rcu+0x34/0x40
Code: 48 89 fb e8 fe 29 63 ff 85 c0 74 1e 48 8b 45 00 48 81 c3 c0 00 00 00 48 8b 00 48 39 d8 74 0f 48 89 45 00 48 8b 40 f8 5b 5d c3 <0f> 0b eb de 31 c0 eb f5 0f 1f 40 00 0f 1f 44 00 00 48 8>
RSP: 0018:c987fa68 EFLAGS: 00010046
RAX:  RBX: 880429614560 RCX: 
RDX: 0001 RSI:  RDI: a184ada0
RBP: c987fa80 R08: 0001 R09: 
R10: c987f9f0 R11: 880429798040 R12: 8804289d5980
R13: a1511f60 R14: 00c8 R15: 
FS:  () GS:88042f88() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7f4b78fce180 CR3: 00018180f006 CR4: 001606e0
Call Trace:
 bond_poll_controller+0x52/0x170
 netpoll_poll_dev+0x79/0x290
 netpoll_send_skb_on_dev+0x158/0x2c0
 netpoll_send_udp+0x2d5/0x430
 write_ext_msg+0x1e0/0x210
 console_unlock+0x3c4/0x630
 vprintk_emit+0xfa/0x2f0
 printk+0x52/0x6e
 ? __netdev_printk+0x12b/0x220
 netdev_info+0x64/0x80
 ? bond_3ad_set_carrier+0xe9/0x180
 bond_select_active_slave+0x1fc/0x310
 bond_mii_monitor+0x709/0x9b0
 process_one_work+0x221/0x5e0
 worker_thread+0x4f/0x3b0
 kthread+0x100/0x140
 ? process_one_work+0x5e0/0x5e0
 ? kthread_delayed_work_timer_fn+0x90/0x90
 ret_from_fork+0x24/0x30

Suggested-by: Cong Wang 
Signed-off-by: Dave Jones 

-- 
v3: Do this in netpoll_send_skb_on_dev as Cong suggests.

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 3219a2932463..4f9494381635 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -330,6 +330,7 @@ void netpoll_send_skb_on_dev(struct netpoll *np, struct sk_buff *skb,
 	/* It is up to the caller to keep npinfo alive. */
 	struct netpoll_info *npinfo;
 
+	rcu_read_lock();
 	lockdep_assert_irqs_disabled();
 
 	npinfo = rcu_dereference_bh(np->dev->npinfo);
@@ -374,6 +375,7 @@ void netpoll_send_skb_on_dev(struct netpoll *np, struct sk_buff *skb,
 		skb_queue_tail(&npinfo->txq, skb);
 		schedule_delayed_work(&npinfo->tx_work,0);
 	}
+	rcu_read_unlock();
 }
 EXPORT_SYMBOL(netpoll_send_skb_on_dev);
 


Re: bond: take rcu lock in bond_poll_controller

2018-09-28 Thread Dave Jones
On Fri, Sep 28, 2018 at 10:31:39AM -0700, Cong Wang wrote:
 > On Fri, Sep 28, 2018 at 10:25 AM Dave Jones  wrote:
 > >
 > > On Fri, Sep 28, 2018 at 09:55:52AM -0700, Cong Wang wrote:
 > >  > On Fri, Sep 28, 2018 at 9:18 AM Dave Jones wrote:
 > >  > >
 > >  > > Callers of bond_for_each_slave_rcu are expected to hold the rcu lock,
 > >  > > otherwise a trace like below is shown
 > >  >
 > >  > So why not take rcu read lock in netpoll_send_skb_on_dev() where
 > >  > RCU is also assumed?
 > >
 > > that does seem to solve the backtrace spew I saw too, so if that's
 > > preferable I can respin the patch.
 > 
 > 
 > From my observations, netpoll_send_skb_on_dev() does not take
 > RCU read lock _and_ it relies on rcu read lock because it calls
 > rcu_dereference_bh().
 > 
 > If my observation is correct, you should catch an RCU warning like
 > this but within netpoll_send_skb_on_dev().
 >
 > >  > As I said, I can't explain why you didn't trigger the RCU warning in
 > >  > netpoll_send_skb_on_dev()...
 > >
 > > netpoll_send_skb_on_dev takes the rcu lock itself.
 > 
 > Could you please point me where exactly is the rcu lock here?
 > 
 > I am too stupid to see it. :)

No, I'm the stupid one. I looked at the tree I had just edited to try your
proposed change. 

Now that I've untangled myself, I'll repost with your suggested change.

Dave



Re: bond: take rcu lock in bond_poll_controller

2018-09-28 Thread Cong Wang
On Fri, Sep 28, 2018 at 10:25 AM Dave Jones  wrote:
>
> On Fri, Sep 28, 2018 at 09:55:52AM -0700, Cong Wang wrote:
>  > On Fri, Sep 28, 2018 at 9:18 AM Dave Jones  wrote:
>  > >
>  > > Callers of bond_for_each_slave_rcu are expected to hold the rcu lock,
>  > > otherwise a trace like below is shown
>  >
>  > So why not take rcu read lock in netpoll_send_skb_on_dev() where
>  > RCU is also assumed?
>
> that does seem to solve the backtrace spew I saw too, so if that's
> preferable I can respin the patch.


From my observations, netpoll_send_skb_on_dev() does not take
RCU read lock _and_ it relies on rcu read lock because it calls
rcu_dereference_bh().

If my observation is correct, you should catch an RCU warning like
this but within netpoll_send_skb_on_dev().


>
>  > As I said, I can't explain why you didn't trigger the RCU warning in
>  > netpoll_send_skb_on_dev()...
>
> netpoll_send_skb_on_dev takes the rcu lock itself.

Could you please point me where exactly is the rcu lock here?

I am too stupid to see it. :)


Re: bond: take rcu lock in bond_poll_controller

2018-09-28 Thread Dave Jones
On Fri, Sep 28, 2018 at 09:55:52AM -0700, Cong Wang wrote:
 > On Fri, Sep 28, 2018 at 9:18 AM Dave Jones  wrote:
 > >
 > > Callers of bond_for_each_slave_rcu are expected to hold the rcu lock,
 > > otherwise a trace like below is shown
 > 
 > So why not take rcu read lock in netpoll_send_skb_on_dev() where
 > RCU is also assumed?

that does seem to solve the backtrace spew I saw too, so if that's
preferable I can respin the patch.

 > As I said, I can't explain why you didn't trigger the RCU warning in
 > netpoll_send_skb_on_dev()...

netpoll_send_skb_on_dev takes the rcu lock itself.

Dave



Re: bond: take rcu lock in bond_poll_controller

2018-09-28 Thread Cong Wang
On Fri, Sep 28, 2018 at 9:18 AM Dave Jones  wrote:
>
> Callers of bond_for_each_slave_rcu are expected to hold the rcu lock,
> otherwise a trace like below is shown

So why not take rcu read lock in netpoll_send_skb_on_dev() where
RCU is also assumed?

As I said, I can't explain why you didn't trigger the RCU warning in
netpoll_send_skb_on_dev()...


bond: take rcu lock in bond_poll_controller

2018-09-28 Thread Dave Jones
Callers of bond_for_each_slave_rcu are expected to hold the rcu lock,
otherwise a trace like below is shown

WARNING: CPU: 2 PID: 179 at net/core/dev.c:6567 netdev_lower_get_next_private_rcu+0x34/0x40
CPU: 2 PID: 179 Comm: kworker/u16:15 Not tainted 4.19.0-rc5-backup+ #1
Workqueue: bond0 bond_mii_monitor
RIP: 0010:netdev_lower_get_next_private_rcu+0x34/0x40
Code: 48 89 fb e8 fe 29 63 ff 85 c0 74 1e 48 8b 45 00 48 81 c3 c0 00 00 00 48 8b 00 48 39 d8 74 0f 48 89 45 00 48 8b 40 f8 5b 5d c3 <0f> 0b eb de 31 c0 eb f5 0f 1f 40 00 0f 1f 44 00 00 48 8>
RSP: 0018:c987fa68 EFLAGS: 00010046
RAX:  RBX: 880429614560 RCX: 
RDX: 0001 RSI:  RDI: a184ada0
RBP: c987fa80 R08: 0001 R09: 
R10: c987f9f0 R11: 880429798040 R12: 8804289d5980
R13: a1511f60 R14: 00c8 R15: 
FS:  () GS:88042f88() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7f4b78fce180 CR3: 00018180f006 CR4: 001606e0
Call Trace:
 bond_poll_controller+0x52/0x170
 netpoll_poll_dev+0x79/0x290
 netpoll_send_skb_on_dev+0x158/0x2c0
 netpoll_send_udp+0x2d5/0x430
 write_ext_msg+0x1e0/0x210
 console_unlock+0x3c4/0x630
 vprintk_emit+0xfa/0x2f0
 printk+0x52/0x6e
 ? __netdev_printk+0x12b/0x220
 netdev_info+0x64/0x80
 ? bond_3ad_set_carrier+0xe9/0x180
 bond_select_active_slave+0x1fc/0x310
 bond_mii_monitor+0x709/0x9b0
 process_one_work+0x221/0x5e0
 worker_thread+0x4f/0x3b0
 kthread+0x100/0x140
 ? process_one_work+0x5e0/0x5e0
 ? kthread_delayed_work_timer_fn+0x90/0x90
 ret_from_fork+0x24/0x30

Signed-off-by: Dave Jones 

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index c05c01a00755..77a3607a7099 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -977,6 +977,7 @@ static void bond_poll_controller(struct net_device *bond_dev)
 		if (bond_3ad_get_active_agg_info(bond, &ad_info))
 			return;
 
+	rcu_read_lock();
 	bond_for_each_slave_rcu(bond, slave, iter) {
 		if (!bond_slave_is_up(slave))
 			continue;
@@ -992,6 +993,7 @@ static void bond_poll_controller(struct net_device *bond_dev)
 
 		netpoll_poll_dev(slave->dev);
 	}
+	rcu_read_unlock();
 }
 
 static void bond_netpoll_cleanup(struct net_device *bond_dev)



Re: bond: take rcu lock in bond_poll_controller

2018-09-26 Thread David Miller
From: Dave Jones 
Date: Mon, 24 Sep 2018 15:23:17 -0400

> Callers of bond_for_each_slave_rcu are expected to hold the rcu lock,
> otherwise a trace like below is shown
 ...
> Signed-off-by: Dave Jones 

Hey Dave, after some recent changes by Eric Dumazet this no longer
applies.

Please respin against 'net'.

Thanks.


Re: bond: take rcu lock in bond_poll_controller

2018-09-25 Thread Cong Wang
On Mon, Sep 24, 2018 at 1:08 PM Dave Jones  wrote:
>
> Callers of bond_for_each_slave_rcu are expected to hold the rcu lock,
> otherwise a trace like below is shown

Interesting, netpoll_send_skb_on_dev() already assumes RCU read lock
when it calls rcu_dereference_bh()...

I wonder how it can't catch such a warning before the one you reported.


bond: take rcu lock in bond_poll_controller

2018-09-24 Thread Dave Jones
Callers of bond_for_each_slave_rcu are expected to hold the rcu lock,
otherwise a trace like below is shown

WARNING: CPU: 2 PID: 179 at net/core/dev.c:6567 netdev_lower_get_next_private_rcu+0x34/0x40
CPU: 2 PID: 179 Comm: kworker/u16:15 Not tainted 4.19.0-rc5-backup+ #1
Workqueue: bond0 bond_mii_monitor
RIP: 0010:netdev_lower_get_next_private_rcu+0x34/0x40
Code: 48 89 fb e8 fe 29 63 ff 85 c0 74 1e 48 8b 45 00 48 81 c3 c0 00 00 00 48 8b 00 48 39 d8 74 0f 48 89 45 00 48 8b 40 f8 5b 5d c3 <0f> 0b eb de 31 c0 eb f5 0f 1f 40 00 0f 1f 44 00 00 48 8>
RSP: 0018:c987fa68 EFLAGS: 00010046
RAX:  RBX: 880429614560 RCX: 
RDX: 0001 RSI:  RDI: a184ada0
RBP: c987fa80 R08: 0001 R09: 
R10: c987f9f0 R11: 880429798040 R12: 8804289d5980
R13: a1511f60 R14: 00c8 R15: 
FS:  () GS:88042f88() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7f4b78fce180 CR3: 00018180f006 CR4: 001606e0
Call Trace:
 bond_poll_controller+0x52/0x170
 netpoll_poll_dev+0x79/0x290
 netpoll_send_skb_on_dev+0x158/0x2c0
 netpoll_send_udp+0x2d5/0x430
 write_ext_msg+0x1e0/0x210
 console_unlock+0x3c4/0x630
 vprintk_emit+0xfa/0x2f0
 printk+0x52/0x6e
 ? __netdev_printk+0x12b/0x220
 netdev_info+0x64/0x80
 ? bond_3ad_set_carrier+0xe9/0x180
 bond_select_active_slave+0x1fc/0x310
 bond_mii_monitor+0x709/0x9b0
 process_one_work+0x221/0x5e0
 worker_thread+0x4f/0x3b0
 kthread+0x100/0x140
 ? process_one_work+0x5e0/0x5e0
 ? kthread_delayed_work_timer_fn+0x90/0x90
 ret_from_fork+0x24/0x30

Signed-off-by: Dave Jones 

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index a764a83f99da..519968d4513b 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -978,6 +978,7 @@ static void bond_poll_controller(struct net_device *bond_dev)
 		if (bond_3ad_get_active_agg_info(bond, &ad_info))
 			return;
 
+	rcu_read_lock();
 	bond_for_each_slave_rcu(bond, slave, iter) {
 		ops = slave->dev->netdev_ops;
 		if (!bond_slave_is_up(slave) || !ops->ndo_poll_controller)
@@ -998,6 +999,7 @@ static void bond_poll_controller(struct net_device *bond_dev)
 		ops->ndo_poll_controller(slave->dev);
 		up(&ni->dev_lock);
 	}
+	rcu_read_unlock();
 }
 
 static void bond_netpoll_cleanup(struct net_device *bond_dev)