Re: Lockdep splat with tc filter on 4.11

2017-06-13 Thread Cong Wang
On Tue, Jun 13, 2017 at 10:28 AM, Laura Abbott  wrote:
> Hi,
>
> Fedora got a bug report https://bugzilla.redhat.com/show_bug.cgi?id=1459626
> of a hang on 4.11.3 with lockdep splat:
>
> [  129.100206] BUG: sleeping function called from invalid context at 
> mm/slab.h:432
> [  129.100237] in_atomic(): 1, irqs_disabled(): 0, pid: 1793, name: tc
> [  129.100239] 2 locks held by tc/1793:
> [  129.100241]  #0:  (rtnl_mutex){+.+.+.}, at: [] 
> rtnl_lock+0x17/0x20
> [  129.100250]  #1:  (&(>tcfa_lock)->rlock){+...+.}, at: 
> [] tcf_act_police_init+0x22d/0x580 [act_police]
> [  129.100259] CPU: 2 PID: 1793 Comm: tc Not tainted 
> 4.11.3-202.fc25.x86_64+debug #1
> [  129.100261] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> 1.9.3-1.fc25 04/01/2014
> [  129.100262] Call Trace:
> [  129.100270]  dump_stack+0x86/0xc3
> [  129.100274]  ___might_sleep+0x17d/0x250
> [  129.100277]  __might_sleep+0x4a/0x80
> [  129.100280]  kmem_cache_alloc_trace+0x1e6/0x2d0
> [  129.100284]  ? gen_new_estimator+0x76/0x1d0
> [  129.100288]  gen_new_estimator+0x76/0x1d0
> [  129.100293]  gen_replace_estimator+0xe/0x10
> [  129.100296]  tcf_act_police_init+0x255/0x580 [act_police]


Thanks for the report!

That spinlock is intended to serialize police->tcf_rate_est,
but it looks unnecessary since we already have rtnl lock.

I will work on a patch.


Lockdep splat with tc filter on 4.11

2017-06-13 Thread Laura Abbott
Hi,

Fedora got a bug report https://bugzilla.redhat.com/show_bug.cgi?id=1459626
of a hang on 4.11.3 with lockdep splat:

[  129.100206] BUG: sleeping function called from invalid context at 
mm/slab.h:432
[  129.100237] in_atomic(): 1, irqs_disabled(): 0, pid: 1793, name: tc
[  129.100239] 2 locks held by tc/1793:
[  129.100241]  #0:  (rtnl_mutex){+.+.+.}, at: [] 
rtnl_lock+0x17/0x20
[  129.100250]  #1:  (&(>tcfa_lock)->rlock){+...+.}, at: 
[] tcf_act_police_init+0x22d/0x580 [act_police]
[  129.100259] CPU: 2 PID: 1793 Comm: tc Not tainted 
4.11.3-202.fc25.x86_64+debug #1
[  129.100261] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.9.3-1.fc25 04/01/2014
[  129.100262] Call Trace:
[  129.100270]  dump_stack+0x86/0xc3
[  129.100274]  ___might_sleep+0x17d/0x250
[  129.100277]  __might_sleep+0x4a/0x80
[  129.100280]  kmem_cache_alloc_trace+0x1e6/0x2d0
[  129.100284]  ? gen_new_estimator+0x76/0x1d0
[  129.100288]  gen_new_estimator+0x76/0x1d0
[  129.100293]  gen_replace_estimator+0xe/0x10
[  129.100296]  tcf_act_police_init+0x255/0x580 [act_police]
[  129.100304]  tcf_action_init_1+0xb9/0x340
[  129.100316]  tcf_exts_validate+0x52/0x110
[  129.100320]  fw_change_attrs+0xbb/0x2b0 [cls_fw]
[  129.100324]  ? rcu_read_lock_sched_held+0x4a/0x80
[  129.100329]  fw_change+0x3ad/0x4bb [cls_fw]
[  129.100338]  tc_ctl_tfilter+0x470/0xa80
[  129.100350]  rtnetlink_rcv_msg+0xe6/0x210
[  129.100354]  ? sched_clock+0x9/0x10
[  129.100358]  ? sched_clock_cpu+0x11/0xc0
[  129.100360]  ? netlink_unicast+0x1cd/0x2f0
[  129.100363]  ? rtnl_newlink+0x8e0/0x8e0
[  129.100366]  netlink_rcv_skb+0xa4/0xc0
[  129.100369]  rtnetlink_rcv+0x2a/0x40
[  129.100371]  netlink_unicast+0x1f5/0x2f0
[  129.100375]  netlink_sendmsg+0x302/0x3c0
[  129.100382]  sock_sendmsg+0x38/0x50
[  129.100385]  ___sys_sendmsg+0x2f6/0x310
[  129.100392]  ? _raw_spin_unlock+0x27/0x40
[  129.100396]  ? __handle_mm_fault+0x539/0x1170
[  129.100401]  ? __do_page_fault+0x28e/0x520
[  129.100405]  ? kvm_sched_clock_read+0x25/0x40
[  129.100407]  ? sched_clock+0x9/0x10
[  129.100409]  ? sched_clock_cpu+0x11/0xc0
[  129.100411]  ? __do_page_fault+0x28e/0x520
[  129.100420]  __sys_sendmsg+0x54/0x90
[  129.100422]  ? __sys_sendmsg+0x54/0x90
[  129.100430]  SyS_sendmsg+0x12/0x20
[  129.100432]  entry_SYSCALL_64_fastpath+0x1f/0xc2
[  129.100434] RIP: 0033:0x7f50dffca037
[  129.100436] RSP: 002b:7ffe99d12698 EFLAGS: 0246 ORIG_RAX: 
002e
[  129.100438] RAX: ffda RBX: 7ffe99d1e7e0 RCX: 7f50dffca037
[  129.100439] RDX:  RSI: 7ffe99d126e0 RDI: 0003
[  129.100441] RBP: 7ffe99d126e0 R08: 0001 R09: 
[  129.100442] R10: 05eb R11: 0246 R12: 7ffe99d12720
[  129.100443] R13: 00668060 R14: 7ffe99d1a7b0 R15: 

[  129.100454] =
[  129.100455] [ INFO: possible recursive locking detected ]
[  129.100456] 4.11.3-202.fc25.x86_64+debug #1 Tainted: GW  
[  129.100456] -
[  129.100457] tc/1793 is trying to acquire lock:
[  129.100457]  (&(>tcfa_lock)->rlock){+...+.}, at: [] 
est_fetch_counters+0x23/0x50
[  129.100460] 
   but task is already holding lock:
[  129.100460]  (&(>tcfa_lock)->rlock){+...+.}, at: [] 
tcf_act_police_init+0x22d/0x580 [act_police]
[  129.100463] 
   other info that might help us debug this:
[  129.100463]  Possible unsafe locking scenario:

[  129.100464]CPU0
[  129.100465]
[  129.100465]   lock(&(>tcfa_lock)->rlock);
[  129.100466]   lock(&(>tcfa_lock)->rlock);
[  129.100467] 
*** DEADLOCK ***

[  129.100468]  May be due to missing lock nesting notation

[  129.100469] 2 locks held by tc/1793:
[  129.100470]  #0:  (rtnl_mutex){+.+.+.}, at: [] 
rtnl_lock+0x17/0x20
[  129.100472]  #1:  (&(>tcfa_lock)->rlock){+...+.}, at: 
[] tcf_act_police_init+0x22d/0x580 [act_police]
[  129.100474] 
   stack backtrace:
[  129.100475] CPU: 2 PID: 1793 Comm: tc Tainted: GW   
4.11.3-202.fc25.x86_64+debug #1
[  129.100476] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.9.3-1.fc25 04/01/2014
[  129.100477] Call Trace:
[  129.100478]  dump_stack+0x86/0xc3
[  129.100480]  __lock_acquire+0x858/0x1450
[  129.100481]  ? ___slab_alloc+0x2a9/0x5f0
[  129.100483]  lock_acquire+0xe3/0x1d0
[  129.100484]  ? gen_new_estimator+0x76/0x1d0
[  129.100486]  ? lock_acquire+0xe3/0x1d0
[  129.100487]  ? est_fetch_counters+0x23/0x50
[  129.100488]  _raw_spin_lock+0x34/0x70
[  129.100489]  ? est_fetch_counters+0x23/0x50
[  129.100491]  est_fetch_counters+0x23/0x50
[  129.100492]  gen_new_estimator+0xdf/0x1d0
[  129.100493]  gen_replace_estimator+0xe/0x10
[  129.100495]  tcf_act_police_init+0x255/0x580 [act_police]
[  129.100496]  tcf_action_init_1+0xb9/0x340
[  129.100499]  tcf_exts_validate+0x52/0x110
[  129.100537]