Bob Copeland wrote:
> On Tue, Apr 14, 2009 at 3:19 PM, Ben Greear <gree...@candelatech.com> wrote:
>> Patrick McHardy has been working on patches for this feature
>> for me.  I've hacked on it a small bit too, so he doesn't get all the blame.
> 
> Great, thanks for working on it!  Looks like you put a lot of work into it.
> 
>> lockdep warning on some race with handler callbacks.
> 
> What's the warning?



=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.29.1c3 #20
-------------------------------------------------------
ip/3220 is trying to acquire lock:
(&ifsta->work){--..}, at: [<c0135c9f>] __cancel_work_timer+0x3f/0x190

but task is already holding lock:
(rtnl_mutex){--..}, at: [<c032d71f>] rtnl_lock+0xf/0x20

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #3 (rtnl_mutex){--..}:
       [<c0149963>] validate_chain+0xb83/0x1120
       [<c014a137>] __lock_acquire+0x237/0xa30
       [<c014a98c>] lock_acquire+0x5c/0x80
       [<c039e53d>] mutex_lock_nested+0x8d/0x2f0
       [<c032d71f>] rtnl_lock+0xf/0x20
       [<d062eabb>] nl80211_new_interface+0x9b/0x130 [cfg80211]
       [<c033e27a>] genl_rcv_msg+0x15a/0x190
       [<c033b53d>] netlink_rcv_skb+0x7d/0xa0
       [<c033d78e>] genl_rcv+0x1e/0x30
       [<c033b34d>] netlink_unicast+0x23d/0x240
       [<c033bb06>] netlink_sendmsg+0x1d6/0x290
       [<c0315421>] sock_sendmsg+0xd1/0xf0
       [<c031554a>] sys_sendmsg+0x10a/0x220
       [<c0316a13>] sys_socketcall+0x263/0x290
       [<c010341d>] sysenter_do_call+0x12/0x31
       [<ffffffff>] 0xffffffff

-> #2 (&drv->mtx){--..}:
       [<c0149963>] validate_chain+0xb83/0x1120
       [<c014a137>] __lock_acquire+0x237/0xa30
       [<c014a98c>] lock_acquire+0x5c/0x80
       [<c039e53d>] mutex_lock_nested+0x8d/0x2f0
       [<d062a55a>] cfg80211_get_dev_from_ifindex+0x4a/0x70 [cfg80211]
       [<d062c503>] get_drv_dev_by_info_ifindex+0x53/0x70 [cfg80211]
       [<d062f52f>] nl80211_get_interface+0x1f/0xc0 [cfg80211]
       [<c033e27a>] genl_rcv_msg+0x15a/0x190
       [<c033b53d>] netlink_rcv_skb+0x7d/0xa0
       [<c033d78e>] genl_rcv+0x1e/0x30
       [<c033b34d>] netlink_unicast+0x23d/0x240
       [<c033bb06>] netlink_sendmsg+0x1d6/0x290
       [<c0315421>] sock_sendmsg+0xd1/0xf0
       [<c031554a>] sys_sendmsg+0x10a/0x220
       [<c0316a13>] sys_socketcall+0x263/0x290
       [<c010341d>] sysenter_do_call+0x12/0x31
       [<ffffffff>] 0xffffffff

-> #1 (cfg80211_drv_mutex){--..}:
       [<c0149963>] validate_chain+0xb83/0x1120
       [<c014a137>] __lock_acquire+0x237/0xa30
       [<c014a98c>] lock_acquire+0x5c/0x80
       [<c039e53d>] mutex_lock_nested+0x8d/0x2f0
       [<d062bd25>] regulatory_hint_11d+0x25/0x320 [cfg80211]
       [<d07580b8>] ieee80211_sta_work+0x328/0x11a0 [mac80211]
       [<c01357c1>] run_workqueue+0x161/0x1f0
       [<c0136307>] worker_thread+0x97/0xf0
       [<c0139002>] kthread+0x42/0x70
       [<c0103ca7>] kernel_thread_helper+0x7/0x10
       [<ffffffff>] 0xffffffff

-> #0 (&ifsta->work){--..}:
       [<c0149447>] validate_chain+0x667/0x1120
       [<c014a137>] __lock_acquire+0x237/0xa30
       [<c014a98c>] lock_acquire+0x5c/0x80
       [<c0135cd9>] __cancel_work_timer+0x79/0x190
       [<c0135e0a>] cancel_work_sync+0xa/0x10
       [<d075a2cc>] ieee80211_stop+0x1cc/0x4f0 [mac80211]
       [<c03239a2>] dev_close+0x62/0xb0
       [<c0323797>] dev_change_flags+0x77/0x180
       [<c032cc15>] do_setlink+0x325/0x400
       [<c032e174>] rtnl_newlink+0x344/0x420
       [<c032d918>] rtnetlink_rcv_msg+0x1c8/0x200
       [<c033b53d>] netlink_rcv_skb+0x7d/0xa0
       [<c032d747>] rtnetlink_rcv+0x17/0x20
       [<c033b34d>] netlink_unicast+0x23d/0x240
       [<c033bb06>] netlink_sendmsg+0x1d6/0x290
       [<c0315421>] sock_sendmsg+0xd1/0xf0
       [<c031554a>] sys_sendmsg+0x10a/0x220
       [<c0316a13>] sys_socketcall+0x263/0x290
       [<c010341d>] sysenter_do_call+0x12/0x31
       [<ffffffff>] 0xffffffff

other info that might help us debug this:

1 lock held by ip/3220:
#0:  (rtnl_mutex){--..}, at: [<c032d71f>] rtnl_lock+0xf/0x20

stack backtrace:
Pid: 3220, comm: ip Tainted: P           2.6.29.1c3 #20
Call Trace:
[<c0148d88>] print_circular_bug_tail+0x78/0xd0
[<c0149447>] validate_chain+0x667/0x1120
[<c01491c8>] ? validate_chain+0x3e8/0x1120
[<c014a137>] __lock_acquire+0x237/0xa30
[<c014a137>] ? __lock_acquire+0x237/0xa30
[<c014a98c>] lock_acquire+0x5c/0x80
[<c0135c9f>] ? __cancel_work_timer+0x3f/0x190
[<c0135cd9>] __cancel_work_timer+0x79/0x190
[<c0135c9f>] ? __cancel_work_timer+0x3f/0x190
[<c03a03f5>] ? _spin_unlock_irqrestore+0x55/0x70
[<c014866c>] ? trace_hardirqs_on_caller+0xfc/0x190
[<c012edc9>] ? del_timer+0x59/0x70
[<c014870b>] ? trace_hardirqs_on+0xb/0x10
[<c03a03d9>] ? _spin_unlock_irqrestore+0x39/0x70
[<c0135e0a>] cancel_work_sync+0xa/0x10
[<d075a2cc>] ieee80211_stop+0x1cc/0x4f0 [mac80211]
[<d075a17a>] ? ieee80211_stop+0x7a/0x4f0 [mac80211]
[<c03239a2>] dev_close+0x62/0xb0
[<c0321d8a>] ? dev_set_rx_mode+0x2a/0x40
[<c0323797>] dev_change_flags+0x77/0x180
[<c032cc15>] do_setlink+0x325/0x400
[<c014846c>] ? mark_held_locks+0x3c/0x90
[<c014846c>] ? mark_held_locks+0x3c/0x90
[<c01618dc>] ? __rcu_process_callbacks+0x16c/0x230
[<c032e174>] rtnl_newlink+0x344/0x420
[<c01491c8>] ? validate_chain+0x3e8/0x1120
[<c014849f>] ? mark_held_locks+0x6f/0x90
[<c039e6d9>] ? mutex_lock_nested+0x229/0x2f0
[<c032de30>] ? rtnl_newlink+0x0/0x420
[<c032d918>] rtnetlink_rcv_msg+0x1c8/0x200
[<c032d71f>] ? rtnl_lock+0xf/0x20
[<c032d750>] ? rtnetlink_rcv_msg+0x0/0x200
[<c033b53d>] netlink_rcv_skb+0x7d/0xa0
[<c032d747>] rtnetlink_rcv+0x17/0x20
[<c033b34d>] netlink_unicast+0x23d/0x240
[<c031dad1>] ? memcpy_fromiovec+0x41/0x60
[<c033bb06>] netlink_sendmsg+0x1d6/0x290
[<c0315421>] sock_sendmsg+0xd1/0xf0
[<c01392e0>] ? autoremove_wake_function+0x0/0x50
[<c017e4c0>] ? might_fault+0x50/0xa0
[<c023195c>] ? copy_from_user+0x3c/0x70
[<c031dd6c>] ? verify_iovec+0x2c/0x90
[<c031554a>] sys_sendmsg+0x10a/0x220
[<c01491c8>] ? validate_chain+0x3e8/0x1120
[<c017b8d9>] ? __do_fault+0x199/0x400
[<c014a137>] ? __lock_acquire+0x237/0xa30
[<c017e4c0>] ? might_fault+0x50/0xa0
[<c017e4c0>] ? might_fault+0x50/0xa0
[<c0316a13>] sys_socketcall+0x263/0x290
[<c0118ca0>] ? do_page_fault+0x0/0x630
[<c014866c>] ? trace_hardirqs_on_caller+0xfc/0x190
[<c010341d>] sysenter_do_call+0x12/0x31




> 
>> soft-lockup in irq occasionally
> 
> Do you get a stack trace here?  (try booting with nmi_watchdog=1 if not).


ath5k phy0: noise floor calibration failed (2462MHz)
BUG: soft lockup - CPU#0 stuck for 61s! [iwconfig:24158]
Modules linked in: michael_mic ath5k mac80211 cfg80211 nfs lockd auth_rpcgss 
8021q garp stp llc nf_d]
irq event stamp: 90171
hardirqs last  enabled at (90170): [<c010354c>] restore_nocheck_notrace+0x0/0xe
hardirqs last disabled at (90171): [<c0103ae6>] apic_timer_interrupt+0x26/0x40
softirqs last  enabled at (5956): [<f8af0bdf>] ath5k_reset+0x10f/0x350 [ath5k]
softirqs last disabled at (5961): [<c012a695>] do_softirq+0x55/0x60

Pid: 24158, comm: iwconfig Tainted: P           (2.6.29.1c3 #22)
EIP: 0060:[<c012a56b>] EFLAGS: 00200246 CPU: 0
EIP is at __do_softirq+0x4b/0x120
EAX: 0000174a EBX: 00000002 ECX: 00000004 EDX: f79ff77c
ESI: 00000000 EDI: eed24788 EBP: e99ffc70 ESP: e99ffc4c
  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
CR0: 8005003b CR2: 43a0ae74 CR3: 2b47c000 CR4: 00000690
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
Call Trace:
  [<c013fa15>] ? profile_tick+0x35/0x80
  [<c012a695>] do_softirq+0x55/0x60
  [<c012a98a>] irq_exit+0x7a/0x80
  [<c0114b49>] smp_apic_timer_interrupt+0x49/0x80
  [<c0231534>] ? trace_hardirqs_off_thunk+0xc/0x18
  [<c0103aed>] apic_timer_interrupt+0x2d/0x40
  [<f89acd21>] ? ieee80211_wake_queues+0x1/0x10 [mac80211]
  [<f8af0e4f>] ? ath5k_reset_wake+0x2f/0x40 [ath5k]
  [<f8af2054>] ath5k_config_interface+0xa4/0x540 [ath5k]
  [<c01491c8>] ? validate_chain+0x3e8/0x1120
  [<f8994f9a>] ieee80211_if_config+0x9a/0x170 [mac80211]
  [<c014a137>] ? __lock_acquire+0x237/0xa30
  [<f899c93f>] ieee80211_sta_set_bssid+0xaf/0xd0 [mac80211]
  [<f8995ca2>] ieee80211_ioctl_siwap+0xc2/0x150 [mac80211]
  [<c03931f3>] ioctl_standard_call+0x53/0x330
  [<c032d70f>] ? rtnl_lock+0xf/0x20
  [<c032d70f>] ? rtnl_lock+0xf/0x20
  [<c032127d>] ? __dev_get_by_name+0x7d/0xa0
  [<c0392f8d>] wext_handle_ioctl+0x1dd/0x1f0
  [<f8995be0>] ? ieee80211_ioctl_siwap+0x0/0x150 [mac80211]
  [<c03149e0>] ? sock_ioctl+0x0/0x270
  [<c032498b>] dev_ioctl+0x3ab/0x500
  [<c03149e0>] ? sock_ioctl+0x0/0x270
  [<c0314a33>] sock_ioctl+0x53/0x270
  [<c03149e0>] ? sock_ioctl+0x0/0x270
  [<c019d368>] vfs_ioctl+0x28/0x80
  [<c019d6ac>] do_vfs_ioctl+0x1ec/0x570
  [<c0118eb5>] ? do_page_fault+0x215/0x630
  [<c013d0d6>] ? up_read+0x16/0x30
  [<c0118eb5>] ? do_page_fault+0x215/0x630
  [<c031682b>] ? sys_socketcall+0x8b/0x290
  [<c010354c>] ? restore_nocheck_notrace+0x0/0xe
  [<c0118ca0>] ? do_page_fault+0x0/0x630
  [<c019da69>] sys_ioctl+0x39/0x60
  [<c010341d>] sysenter_do_call+0x12/0x31

> 
>> modprobe ath5k with nohwaccel=1  (I think that's the name...test system is
> 
> nohwcrypt=1... but it would be nice if we swapped in/out keys into the
> hw crypto key cache based on vif.

It seems the hardware hashes on the AP's MAC address, so if you have multiple
VIFs associated with the same AP, it breaks.  This is from memory,
so I could be wrong here.

>> I would welcome inclusion of any/all of this code and/or help
>> with debugging if anyone wants to give it a try.
> 
> Ok, well this change touches a lot of different things (mac80211 and nl80211
> changes e.g. should at least be reviewed by the relevant maintainers).
> So when you think it's ready, a good plan of attack is to split it up
> into smaller patches that only do one logical change -- that will make
> it easier to review.
> 
> Also note that the ath5k/9k merge hit wireless-testing yesterday evening
> so the patch needs rebasing to apply on top of the current tree.  It'd
> be nice if 9k and 5k did this stuff the same way; I think Jouni Malinen
> posted VAP patches around 3/30 to linux-wireless for ath9k.

Yes, I put it all in one patch since we don't have things sorted out
quite yet.

I will probably wait to merge until this stuff hits mainline, unless there is
a reason (i.e., it will fix some bugs we're hitting) to merge against a dev tree.

Thanks,
Ben

-- 
Ben Greear <gree...@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

_______________________________________________
ath5k-devel mailing list
ath5k-devel@lists.ath5k.org
https://lists.ath5k.org/mailman/listinfo/ath5k-devel
