BUG when doing rmmod

2014-08-28 Thread Kalle Valo
Hi,

while I was testing testmode patches I noticed ath10k was crashing
during rmmod. Further investigation showed that I also see the crash
without the testmode patches (using commit 346c16654356). I just need
to reload the ath10k modules in sequence; usually 10-15 times is enough
to reproduce the crash.

I also tested with the patch "ath10k: re-enable interrupts properly in
hw recovery", but it doesn't seem to make any difference.

[  118.520582] cfg80211: Calling CRDA to update world regulatory domain
[  118.554828] cfg80211: World regulatory domain updated:
[  118.554897] cfg80211:  DFS Master region: unset
[  118.554951] cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp), (dfs_cac_time)
[  118.555064] cfg80211:   (2402000 KHz - 2472000 KHz @ 4 KHz), (N/A, 2000 mBm), (N/A)
[  118.555122] cfg80211:   (2457000 KHz - 2482000 KHz @ 4 KHz), (N/A, 2000 mBm), (N/A)
[  118.555178] cfg80211:   (2474000 KHz - 2494000 KHz @ 2 KHz), (N/A, 2000 mBm), (N/A)
[  118.555235] cfg80211:   (517 KHz - 525 KHz @ 8 KHz), (N/A, 2000 mBm), (N/A)
[  118.555294] cfg80211:   (5735000 KHz - 5835000 KHz @ 8 KHz), (N/A, 2000 mBm), (N/A)
[  118.555629] cfg80211:   (5724 KHz - 6372 KHz @ 216 KHz), (N/A, 0 mBm), (N/A)
[  118.881727] ath10k_pci :02:00.0: irq 49 for MSI/MSI-X
[  118.882042] ath10k_pci :02:00.0: pci irq msi interrupts 1 irq_mode 0 reset_mode 0
[  120.335878] ath10k_pci :02:00.0: qca988x hw2.0 (0x4100016c, 0x043202ff) fw 10.2-00082-4-2 api 3 htt 2.1
[  120.335946] ath10k_pci :02:00.0: debug 1 debugfs 1 tracing 1 dfs 1
[  120.856049] ath: EEPROM regdomain: 0x0
[  120.856115] ath: EEPROM indicates default country code should be used
[  120.856163] ath: doing EEPROM country-regdmn map search
[  120.856212] ath: country maps to regdmn code: 0x3a
[  120.856260] ath: Country alpha2 being used: US
[  120.856307] ath: Regpair used: 0x3a
[  120.872948] cfg80211: Calling CRDA for country: US
[  120.901453] cfg80211: Regulatory domain changed to country: US
[  120.901531] cfg80211:  DFS Master region: FCC
[  120.901594] cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp), (dfs_cac_time)
[  120.901708] cfg80211:   (2402000 KHz - 2472000 KHz @ 4 KHz), (N/A, 3000 mBm), (N/A)
[  120.901765] cfg80211:   (517 KHz - 525 KHz @ 8 KHz), (N/A, 1700 mBm), (N/A)
[  120.901824] cfg80211:   (525 KHz - 533 KHz @ 8 KHz), (N/A, 2300 mBm), (0 s)
[  120.901943] cfg80211:   (5735000 KHz - 5835000 KHz @ 8 KHz), (N/A, 3000 mBm), (N/A)
[  120.902006] cfg80211:   (5724 KHz - 6372 KHz @ 216 KHz), (N/A, 4000 mBm), (N/A)
[  121.136426] cfg80211: Calling CRDA to update world regulatory domain
[  121.173459] cfg80211: World regulatory domain updated:
[  121.173528] cfg80211:  DFS Master region: unset
[  121.173582] cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp), (dfs_cac_time)
[  121.173696] cfg80211:   (2402000 KHz - 2472000 KHz @ 4 KHz), (N/A, 2000 mBm), (N/A)
[  121.173753] cfg80211:   (2457000 KHz - 2482000 KHz @ 4 KHz), (N/A, 2000 mBm), (N/A)
[  121.173810] cfg80211:   (2474000 KHz - 2494000 KHz @ 2 KHz), (N/A, 2000 mBm), (N/A)
[  121.173868] cfg80211:   (517 KHz - 525 KHz @ 8 KHz), (N/A, 2000 mBm), (N/A)
[  121.173971] cfg80211:   (5735000 KHz - 5835000 KHz @ 8 KHz), (N/A, 2000 mBm), (N/A)
[  121.174028] cfg80211:   (5724 KHz - 6372 KHz @ 216 KHz), (N/A, 0 mBm), (N/A)
[  121.512350] ath10k_pci :02:00.0: irq 49 for MSI/MSI-X
[  121.512717] ath10k_pci :02:00.0: pci irq msi interrupts 1 irq_mode 0 reset_mode 0
[  122.962504] ath10k_pci :02:00.0: qca988x hw2.0 (0x4100016c, 0x043202ff) fw 10.2-00082-4-2 api 3 htt 2.1
[  122.962569] ath10k_pci :02:00.0: debug 1 debugfs 1 tracing 1 dfs 1
[  123.481643] ath: EEPROM regdomain: 0x0
[  123.481703] ath: EEPROM indicates default country code should be used
[  123.481752] ath: doing EEPROM country-regdmn map search
[  123.481801] ath: country maps to regdmn code: 0x3a
[  123.481848] ath: Country alpha2 being used: US
[  123.481896] ath: Regpair used: 0x3a
[  123.500142] cfg80211: Calling CRDA for country: US
[  123.527647] cfg80211: Regulatory domain changed to country: US
[  123.527711] cfg80211:  DFS Master region: FCC
[  123.527761] cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp), (dfs_cac_time)
[  123.527867] cfg80211:   (2402000 KHz - 2472000 KHz @ 4 KHz), (N/A, 3000 mBm), (N/A)
[  123.527920] cfg80211:   (517 KHz - 525 KHz @ 8 KHz), (N/A, 1700 mBm), (N/A)
[  123.527974] cfg80211:   (525 KHz - 533 KHz @ 8 KHz), (N/A, 2300 mBm), (0 s)
[  123.528026] cfg80211:   (5735000 KHz - 5835000 KHz @ 8 KHz), (N/A, 3000 mBm), (N/A)
[  123.528080] cfg80211:   (5724 KHz - 6372 KHz @ 216 KHz), (N/A, 4000 mBm), (N/A)
[  123.547470] BUG: unable to handle kernel paging request at fe589030
[  123.547745] IP: 

Re: BUG when doing rmmod

2014-08-28 Thread Kalle Valo
Michal Kazior michal.kaz...@tieto.com writes:

> On 28 August 2014 08:58, Kalle Valo kv...@qca.qualcomm.com wrote:
>
>> [  123.552499] Call Trace:
>> [  123.554957]  [fe576c1b] ath10k_pci_tasklet+0x1b/0x60 [ath10k_pci]
>> [  123.557436]  [c1053fbe] tasklet_action+0x9e/0xb0
>> [  123.559874]  [c10534f1] __do_softirq+0xf1/0x3f0
>> [  123.562277]  [c1053400] ? ftrace_raw_event_irq_handler_entry+0xa0/0xa0
>> [  123.564720]  [c1004999] do_softirq_own_stack+0x29/0x40
>> [  123.567096]  IRQ
>> [...]
>> [  123.643338]  [fe5740b3] ath10k_pci_release+0x33/0x40 [ath10k_pci]
>> [  123.645289]  [fe575d4b] ath10k_pci_remove+0x7b/0x90 [ath10k_pci]
>> [  123.647174]  [c132f5b8] pci_device_remove+0x28/0x50
>> [  123.649056]  [c146cbee] __device_release_driver+0x4e/0xb0
>
> I should've expected spurious interrupts in pci_remove().. Does the
> following fix the problem?
>
> diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c
> index 144eb8a3..a03d885 100644
> --- a/drivers/net/wireless/ath/ath10k/pci.c
> +++ b/drivers/net/wireless/ath/ath10k/pci.c
> @@ -2598,6 +2598,7 @@ static void ath10k_pci_remove(struct pci_dev *pdev)
>
>      ath10k_core_unregister(ar);
>      ath10k_pci_free_irq(ar);
> +    ath10k_pci_kill_tasklet(ar);
>      ath10k_pci_deinit_irq(ar);
>      ath10k_pci_ce_deinit(ar);
>      ath10k_pci_free_ce(ar);

Yup, this seems to fix it. Earlier my script didn't survive even 30
seconds, now it has been running 10 minutes without problems. Can you
write a proper patch for this, please?
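
For illustration only, here is a minimal user-space sketch of why the order in the removal path matters. It assumes the crash comes from a tasklet that was scheduled by a late interrupt and then runs after the state it uses has been torn down. Every `model_*` name is hypothetical; none of them are ath10k or kernel functions, and the "kill" step is only a rough stand-in for letting a pending tasklet finish before teardown continues.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names exist in ath10k or the kernel. */
struct model_dev {
	bool irq_registered;   /* models an installed interrupt handler */
	bool tasklet_pending;  /* models a scheduled but not yet run tasklet */
	bool resources_valid;  /* models state the tasklet dereferences */
	int stale_accesses;    /* tasklet runs that touched freed state */
};

void model_init(struct model_dev *dev)
{
	assert(dev != NULL);
	dev->irq_registered = true;
	dev->tasklet_pending = false;
	dev->resources_valid = true;
	dev->stale_accesses = 0;
}

/* An interrupt (possibly spurious) schedules the tasklet. */
void model_irq(struct model_dev *dev)
{
	if (dev->irq_registered)
		dev->tasklet_pending = true;
}

/* Runs the tasklet if it is pending, as the softirq path would. */
void model_run_pending(struct model_dev *dev)
{
	if (!dev->tasklet_pending)
		return;
	dev->tasklet_pending = false;
	if (!dev->resources_valid)
		dev->stale_accesses++;  /* in a real driver this would be a crash */
}

/* Models the removal path; returns the number of stale accesses. */
int model_remove(struct model_dev *dev, bool kill_tasklet)
{
	dev->irq_registered = false;    /* free the IRQ: nothing new gets scheduled */
	if (kill_tasklet)
		model_run_pending(dev); /* kill step: a pending run finishes first */
	dev->resources_valid = false;   /* tear down what the tasklet uses */
	model_run_pending(dev);         /* softirq fires after teardown */
	return dev->stale_accesses;
}
```

In this model, calling `model_irq()` and then `model_remove(&dev, false)` reports one stale access, while `model_remove(&dev, true)` reports none, which mirrors the effect of adding the kill step between freeing the IRQ and releasing the rest.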

Was this a regression due to some recent patches? If yes, that would be
good to document as well. Helps with people who port our patches to
older kernels.

Actually I did see the BUG below in the logs. But I don't really have
time to debug that right now and I just assume it's not an ath10k bug.
(Please correct me if I'm wrong.)

[ 3236.078802] BUG: MAX_STACK_TRACE_ENTRIES too low!
[ 3236.078891] turning off the locking correctness validator.
[ 3236.078942] Please attach the output of /proc/lock_stat to the bug report
[ 3236.078994] CPU: 1 PID: 14428 Comm: rmmod Not tainted 3.16.0-wl-ath+ #570
[ 3236.079068] Hardware name: Hewlett-Packard HP ProBook 6540b/1722, BIOS 68CDD Ver. F.04 01/27/2010
[ 3236.079119]    ed01dd8c c17fea78 c2150100 ed01dd98 c10a0b45 ef23b2b8
[ 3236.079700]  ed01ddcc c10a312e ed01ddb4 00927500 fc611da4 fc60bc60 03c0ca42 02f8
[ 3236.080258]  ef23ad40 0002  ef23b2b8 ef23b2b0 ed01de4c c10a3bef 008ef5a8
[ 3236.080810] Call Trace:
[ 3236.080896]  [c17fea78] dump_stack+0x48/0x60
[ 3236.081003]  [c10a0b45] save_trace+0x95/0xa0
[ 3236.081091]  [c10a312e] mark_lock+0x11e/0x640
[ 3236.081147]  [c10a3bef] __lock_acquire+0x59f/0x1b40
[ 3236.081205]  [c1087b95] ? local_clock+0x25/0x30
[ 3236.081262]  [c18085d7] ? _raw_spin_unlock_irqrestore+0x57/0x60
[ 3236.081315]  [c10a57c9] lock_acquire+0x79/0x1a0
[ 3236.081370]  [c106a250] ? queue_delayed_work_on+0x80/0x80
[ 3236.081495]  [c1319a59] ? __debug_object_init+0x89/0x330
[ 3236.081596]  [c106a28d] flush_work+0x3d/0x250
[ 3236.081701]  [c106a250] ? queue_delayed_work_on+0x80/0x80
[ 3236.081810]  [c105b694] ? timer_fixup_assert_init+0x64/0x70
[ 3236.081881]  [c131a24b] ? debug_object_assert_init+0xbb/0xe0
[ 3236.081937]  [c106bc7d] ? __cancel_work_timer+0x8d/0xf0
[ 3236.081992]  [c10a6114] ? trace_hardirqs_on_caller+0xf4/0x1c0
[ 3236.082046]  [c106bc59] __cancel_work_timer+0x69/0xf0
[ 3236.082101]  [c106bcf2] cancel_delayed_work_sync+0x12/0x20
[ 3236.082221]  [fc5c12de] regulatory_exit+0x1e/0xf0 [cfg80211]
[ 3236.082339]  [fc60130e] cfg80211_exit+0x26/0x40 [cfg80211]
[ 3236.082448]  [c10d652c] SyS_delete_module+0xfc/0x170
[ 3236.082557]  [c116a8a6] ? vm_munmap+0x46/0x60
[ 3236.082629]  [c1808cc7] ? sysenter_exit+0xf/0x16
[ 3236.082685]  [c10a6114] ? trace_hardirqs_on_caller+0xf4/0x1c0
[ 3236.082740]  [c116a8a6] ? vm_munmap+0x46/0x60
[ 3236.082795]  [c1808c98] sysenter_do_call+0x12/0x12

-- 
Kalle Valo
