Re: [PATCH net-next] net: sched: act_ife: disable bh when taking ife_mod_lock

2018-08-14 Thread Cong Wang
On Tue, Aug 14, 2018 at 10:35 AM Vlad Buslov  wrote:
>
>
> On Mon 13 Aug 2018 at 23:18, Cong Wang  wrote:
> > Hi, Vlad,
> >
> > Could you help to test my fixes?
> >
> > I just pushed them into my own git repo:
> > https://github.com/congwang/linux/commits/net-sched-fixes
> >
> > Particularly, this is the revert:
> > https://github.com/congwang/linux/commit/b3f51c4ab8272cc8d3244848e528fce1426c4659
> > and this is my fix for the lockdep warning you reported:
> > https://github.com/congwang/linux/commit/ecadcde94919183e9f0d5bc376f05e731baf2661
> >
> > I don't have environment to test ife modules.
>
> Hi Cong,
>
> I've run the test with your patch applied and couldn't reproduce the
> lockdep warning.
>

Thank you! I will add your Test-by and send them out.


Re: [PATCH net-next] net: sched: act_ife: disable bh when taking ife_mod_lock

2018-08-14 Thread Vlad Buslov


On Mon 13 Aug 2018 at 23:18, Cong Wang  wrote:
> Hi, Vlad,
>
> Could you help to test my fixes?
>
> I just pushed them into my own git repo:
> https://github.com/congwang/linux/commits/net-sched-fixes
>
> Particularly, this is the revert:
> https://github.com/congwang/linux/commit/b3f51c4ab8272cc8d3244848e528fce1426c4659
> and this is my fix for the lockdep warning you reported:
> https://github.com/congwang/linux/commit/ecadcde94919183e9f0d5bc376f05e731baf2661
>
> I don't have environment to test ife modules.

Hi Cong,

I've run the test with your patch applied and couldn't reproduce the
lockdep warning.

>
> BTW, this is the fix for the deadlock I spotted:
> https://github.com/congwang/linux/commit/44f3d7f5b6ed2d4a46177e6c658fa23b76141afa
>
> Thanks!



Re: [PATCH net-next] net: sched: act_ife: disable bh when taking ife_mod_lock

2018-08-13 Thread Cong Wang
Hi, Vlad,

Could you help to test my fixes?

I just pushed them into my own git repo:
https://github.com/congwang/linux/commits/net-sched-fixes

Particularly, this is the revert:
https://github.com/congwang/linux/commit/b3f51c4ab8272cc8d3244848e528fce1426c4659
and this is my fix for the lockdep warning you reported:
https://github.com/congwang/linux/commit/ecadcde94919183e9f0d5bc376f05e731baf2661

I don't have environment to test ife modules.

BTW, this is the fix for the deadlock I spotted:
https://github.com/congwang/linux/commit/44f3d7f5b6ed2d4a46177e6c658fa23b76141afa

Thanks!


Re: [PATCH net-next] net: sched: act_ife: disable bh when taking ife_mod_lock

2018-08-13 Thread Cong Wang
On Mon, Aug 13, 2018 at 3:53 PM David Miller  wrote:
>
> From: Cong Wang 
> Date: Mon, 13 Aug 2018 12:16:52 -0700
>
> > Your fix doesn't make sense, because what ife_mod_lock protects
> > is absolutely not touched in BH context, they have no race.
>
> It does make sense, the problem is if you acquire ife_mod_lock and
> take a software interrupt while you hold it.
>
> If that software interrupt takes the tcfa_lock, we're setup for an
> AB-BA deadlock.


The lockdep does make sense, for sure. The fix does NOT.



>
> And there is also no easy way to reverse the lock ordering to
> avoid this either.

There is.


>
> I therefore think his fix is perfectly fine and that's why I
> applied it.

I will send a revert and a better fix.

Thanks.


Re: [PATCH net-next] net: sched: act_ife: disable bh when taking ife_mod_lock

2018-08-13 Thread David Miller
From: Cong Wang 
Date: Mon, 13 Aug 2018 12:16:52 -0700

> Your fix doesn't make sense, because what ife_mod_lock protects
> is absolutely not touched in BH context, they have no race.

It does make sense, the problem is if you acquire ife_mod_lock and
take a software interrupt while you hold it.

If that software interrupt takes the tcfa_lock, we're setup for an
AB-BA deadlock.

And there is also no easy way to reverse the lock ordering to
avoid this either.

I therefore think his fix is perfectly fine and that's why I
applied it.


Re: [PATCH net-next] net: sched: act_ife: disable bh when taking ife_mod_lock

2018-08-13 Thread Cong Wang
On Mon, Aug 13, 2018 at 12:16 PM Cong Wang  wrote:
>
> On Mon, Aug 13, 2018 at 10:20 AM Vlad Buslov  wrote:
> >
> > Lockdep reports deadlock for following locking scenario in ife action:
> >
> > Task one:
> > 1) Executes ife action update.
> > 2) Takes tcfa_lock.
> > 3) Waits on ife_mod_lock which is already taken by task two.
> >
> > Task two:
> >
> > 1) Executes any path that obtains ife_mod_lock without disabling bh (any
> > path that takes ife_mod_lock while holding tcfa_lock has bh disabled) like
> > loading a meta module, or creating new action.
> > 2) Takes ife_mod_lock.
> > 3) Task is preempted by rate estimator timer.
> > 4) Timer callback waits on tcfa_lock which is taken by task one.
> >
> > In described case tasks deadlock because they take same two locks in
> > different order. To prevent potential deadlock reported by lockdep, always
> > disable bh when obtaining ife_mod_lock.
>
> Your fix doesn't make sense, because what ife_mod_lock protects
> is absolutely not touched in BH context, they have no race.
>
> The only time you need tcfa_lock is when adding it to ->metalist:
>
> list_add_tail(>metalist, >metalist);
>
> when it already exists.
>
> Which means you can just take tcfa_lock after taking ife_mod_lock.

BTW, there is an obvious deadlock:

use_all_metadata() acquires read_lock(_mod_lock), then calls
add_metainfo() which calls find_ife_oplist() which acquires the same
lock

But this is _irreverent_ to your fix, just want to point it out.


Re: [PATCH net-next] net: sched: act_ife: disable bh when taking ife_mod_lock

2018-08-13 Thread Cong Wang
On Mon, Aug 13, 2018 at 10:20 AM Vlad Buslov  wrote:
>
> Lockdep reports deadlock for following locking scenario in ife action:
>
> Task one:
> 1) Executes ife action update.
> 2) Takes tcfa_lock.
> 3) Waits on ife_mod_lock which is already taken by task two.
>
> Task two:
>
> 1) Executes any path that obtains ife_mod_lock without disabling bh (any
> path that takes ife_mod_lock while holding tcfa_lock has bh disabled) like
> loading a meta module, or creating new action.
> 2) Takes ife_mod_lock.
> 3) Task is preempted by rate estimator timer.
> 4) Timer callback waits on tcfa_lock which is taken by task one.
>
> In described case tasks deadlock because they take same two locks in
> different order. To prevent potential deadlock reported by lockdep, always
> disable bh when obtaining ife_mod_lock.

Your fix doesn't make sense, because what ife_mod_lock protects
is absolutely not touched in BH context, they have no race.

The only time you need tcfa_lock is when adding it to ->metalist:

list_add_tail(>metalist, >metalist);

when it already exists.

Which means you can just take tcfa_lock after taking ife_mod_lock.


Re: [PATCH net-next] net: sched: act_ife: disable bh when taking ife_mod_lock

2018-08-13 Thread David Miller
From: Vlad Buslov 
Date: Mon, 13 Aug 2018 20:20:11 +0300

> Lockdep reports deadlock for following locking scenario in ife action:
> 
> Task one:
> 1) Executes ife action update.
> 2) Takes tcfa_lock.
> 3) Waits on ife_mod_lock which is already taken by task two.
> 
> Task two:
> 
> 1) Executes any path that obtains ife_mod_lock without disabling bh (any
> path that takes ife_mod_lock while holding tcfa_lock has bh disabled) like
> loading a meta module, or creating new action.
> 2) Takes ife_mod_lock.
> 3) Task is preempted by rate estimator timer.
> 4) Timer callback waits on tcfa_lock which is taken by task one.
> 
> In described case tasks deadlock because they take same two locks in
> different order. To prevent potential deadlock reported by lockdep, always
> disable bh when obtaining ife_mod_lock.
> 
> Lockdep warning:
 ...
> Fixes: ef6980b6becb ("introduce IFE action")
> Signed-off-by: Vlad Buslov 

Applied and queued up for -stable, thanks.


Re: [PATCH net-next] net: sched: act_ife: disable bh when taking ife_mod_lock

2018-08-13 Thread Vlad Buslov


On Mon 13 Aug 2018 at 17:23, Jamal Hadi Salim  wrote:
> On 2018-08-13 1:20 p.m., Vlad Buslov wrote:
>> Lockdep reports deadlock for following locking scenario in ife action:
>> 
>> Task one:
>> 1) Executes ife action update.
>> 2) Takes tcfa_lock.
>> 3) Waits on ife_mod_lock which is already taken by task two.
>> 
>> Task two:
>> 
>> 1) Executes any path that obtains ife_mod_lock without disabling bh (any
>> path that takes ife_mod_lock while holding tcfa_lock has bh disabled) like
>> loading a meta module, or creating new action.
>> 2) Takes ife_mod_lock.
>> 3) Task is preempted by rate estimator timer.
>> 4) Timer callback waits on tcfa_lock which is taken by task one.
>> 
>> In described case tasks deadlock because they take same two locks in
>> different order. To prevent potential deadlock reported by lockdep, always
>> disable bh when obtaining ife_mod_lock.
>> 
>
> Looks like your recent changes on net-next exposed this.

Its because I've recently expanded my private tests to create all kinds
of actions with estimator.

>
> Acked-by: Jamal Hadi Salim 
>
> cheers,
> jamal



Re: [PATCH net-next] net: sched: act_ife: disable bh when taking ife_mod_lock

2018-08-13 Thread David Miller
From: Vlad Buslov 
Date: Mon, 13 Aug 2018 20:26:40 +0300

> Is it okay to submit a fix for issue I uncovered when testing actions
> with estimators, or I should resubmit to net when net-next is moved?

Yes, this is fine.


Re: [PATCH net-next] net: sched: act_ife: disable bh when taking ife_mod_lock

2018-08-13 Thread Vlad Buslov
Hi David,

Is it okay to submit a fix for issue I uncovered when testing actions
with estimators, or I should resubmit to net when net-next is moved?

Thanks,
Vlad


Re: [PATCH net-next] net: sched: act_ife: disable bh when taking ife_mod_lock

2018-08-13 Thread Jamal Hadi Salim

On 2018-08-13 1:20 p.m., Vlad Buslov wrote:

Lockdep reports deadlock for following locking scenario in ife action:

Task one:
1) Executes ife action update.
2) Takes tcfa_lock.
3) Waits on ife_mod_lock which is already taken by task two.

Task two:

1) Executes any path that obtains ife_mod_lock without disabling bh (any
path that takes ife_mod_lock while holding tcfa_lock has bh disabled) like
loading a meta module, or creating new action.
2) Takes ife_mod_lock.
3) Task is preempted by rate estimator timer.
4) Timer callback waits on tcfa_lock which is taken by task one.

In described case tasks deadlock because they take same two locks in
different order. To prevent potential deadlock reported by lockdep, always
disable bh when obtaining ife_mod_lock.



Looks like your recent changes on net-next exposed this.

Acked-by: Jamal Hadi Salim 

cheers,
jamal


[PATCH net-next] net: sched: act_ife: disable bh when taking ife_mod_lock

2018-08-13 Thread Vlad Buslov
Lockdep reports deadlock for following locking scenario in ife action:

Task one:
1) Executes ife action update.
2) Takes tcfa_lock.
3) Waits on ife_mod_lock which is already taken by task two.

Task two:

1) Executes any path that obtains ife_mod_lock without disabling bh (any
path that takes ife_mod_lock while holding tcfa_lock has bh disabled) like
loading a meta module, or creating new action.
2) Takes ife_mod_lock.
3) Task is preempted by rate estimator timer.
4) Timer callback waits on tcfa_lock which is taken by task one.

In described case tasks deadlock because they take same two locks in
different order. To prevent potential deadlock reported by lockdep, always
disable bh when obtaining ife_mod_lock.

Lockdep warning:

[  508.101192] =
[  508.107708] WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
[  508.114728] 4.18.0-rc8+ #646 Not tainted
[  508.119050] -
[  508.125559] tc/5460 [HC0[0]:SC0[2]:HE1:SE0] is trying to acquire:
[  508.132025] 5a938c68 (ife_mod_lock){}, at: 
find_ife_oplist+0x1e/0xc0 [act_ife]
[  508.140996]
   and this task is already holding:
[  508.147548] d46f6c56 (&(>tcfa_lock)->rlock){+.-.}, at: 
tcf_ife_init+0x6ae/0xf40 [act_ife]
[  508.157371] which would create a new lock dependency:
[  508.162828]  (&(>tcfa_lock)->rlock){+.-.} -> (ife_mod_lock){}
[  508.169572]
   but this new dependency connects a SOFTIRQ-irq-safe lock:
[  508.178197]  (&(>tcfa_lock)->rlock){+.-.}
[  508.178201]
   ... which became SOFTIRQ-irq-safe at:
[  508.189771]   _raw_spin_lock+0x2c/0x40
[  508.193906]   est_fetch_counters+0x41/0xb0
[  508.198391]   est_timer+0x83/0x3c0
[  508.202180]   call_timer_fn+0x16a/0x5d0
[  508.206400]   run_timer_softirq+0x399/0x920
[  508.210967]   __do_softirq+0x157/0x97d
[  508.215102]   irq_exit+0x152/0x1c0
[  508.21]   smp_apic_timer_interrupt+0xc0/0x4e0
[  508.223976]   apic_timer_interrupt+0xf/0x20
[  508.228540]   cpuidle_enter_state+0xf8/0x5d0
[  508.233198]   do_idle+0x28a/0x350
[  508.236881]   cpu_startup_entry+0xc7/0xe0
[  508.241296]   start_secondary+0x2e8/0x3f0
[  508.245678]   secondary_startup_64+0xa5/0xb0
[  508.250347]
   to a SOFTIRQ-irq-unsafe lock:  (ife_mod_lock){}
[  508.256531]
   ... which became SOFTIRQ-irq-unsafe at:
[  508.267279] ...
[  508.267283]   _raw_write_lock+0x2c/0x40
[  508.273653]   register_ife_op+0x118/0x2c0 [act_ife]
[  508.278926]   do_one_initcall+0xf7/0x4d9
[  508.283214]   do_init_module+0x18b/0x44e
[  508.287521]   load_module+0x4167/0x5730
[  508.291739]   __do_sys_finit_module+0x16d/0x1a0
[  508.296654]   do_syscall_64+0x7a/0x3f0
[  508.300788]   entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  508.306302]
   other info that might help us debug this:

[  508.315286]  Possible interrupt unsafe locking scenario:

[  508.322771]CPU0CPU1
[  508.327681]
[  508.332604]   lock(ife_mod_lock);
[  508.336300]local_irq_disable();
[  508.342608]lock(&(>tcfa_lock)->rlock);
[  508.349793]lock(ife_mod_lock);
[  508.355990]   
[  508.358974] lock(&(>tcfa_lock)->rlock);
[  508.363803]
*** DEADLOCK ***

[  508.370715] 2 locks held by tc/5460:
[  508.374680]  #0: e27e4fa4 (rtnl_mutex){+.+.}, at: 
rtnetlink_rcv_msg+0x583/0x7b0
[  508.383366]  #1: d46f6c56 (&(>tcfa_lock)->rlock){+.-.}, at: 
tcf_ife_init+0x6ae/0xf40 [act_ife]
[  508.393648]
   the dependencies between SOFTIRQ-irq-safe lock and the holding 
lock:
[  508.403505] -> (&(>tcfa_lock)->rlock){+.-.} ops: 1001553 {
[  508.409646]HARDIRQ-ON-W at:
[  508.413136] _raw_spin_lock_bh+0x34/0x40
[  508.419059] gnet_stats_start_copy_compat+0xa2/0x230
[  508.426021] gnet_stats_start_copy+0x16/0x20
[  508.432333] tcf_action_copy_stats+0x95/0x1d0
[  508.438735] tcf_action_dump_1+0xb0/0x4e0
[  508.444795] tcf_action_dump+0xca/0x200
[  508.450673] tcf_exts_dump+0xd9/0x320
[  508.456392] fl_dump+0x1b7/0x4a0 [cls_flower]
[  508.462798] tcf_fill_node+0x380/0x530
[  508.468601] tfilter_notify+0xdf/0x1c0
[  508.474404] tc_new_tfilter+0x84a/0xc90
[  508.480270] rtnetlink_rcv_msg+0x5bd/0x7b0
[  508.486419] netlink_rcv_skb+0x184/0x220
[  508.492394] netlink_unicast+0x31b/0x460
[  508.507411] netlink_sendmsg+0x3fb/0x840
[  508.513390] sock_sendmsg+0x7b/0xd0
[  508.518907] ___sys_sendmsg+0x4c6/0x610
[  508.524797] __sys_sendmsg+0xd7/0x150