Re: [ath9k-devel] ath9k_tasklet() can be interrupted by a hardware IRQ

2012-12-03 Thread Felix Fietkau
On 2012-11-20 11:05 AM, Felix Liao wrote:
 Hi Felix F, I'm Felix Liao from WatchGuard, I think we should 
 spin_lock_irqsave and spin_unlock_irqrestore on sc_pcu_lock 
 in the ath9k_tasklet() instead of spin_lock/spin_unlock, just to avoid the 
 tasklet being interrupted by the hardware IRQ,
 do you think so?
Using an IRQ spinlock in the tasklet is not the correct solution. If the
IRQ fires while the tasklet is running, it means that the driver does
not properly disable IRQs in the handler. They're supposed to get
disabled there and re-enabled only after the tasklet has completed.

- Felix

___
ath9k-devel mailing list
ath9k-devel@lists.ath9k.org
https://lists.ath9k.org/mailman/listinfo/ath9k-devel


Re: [ath9k-devel] ath9k_tasklet() can be interrupted by a hardware IRQ

2012-12-03 Thread Chaoxing Lin

On 2012-11-20 11:05 AM, Felix Liao wrote:
 Hi Felix F, I'm Felix Liao from WatchGuard, I think we should 
 spin_lock_irqsave and spin_unlock_irqrestore on sc_pcu_lock in the 
 ath9k_tasklet() instead of spin_lock/spin_unlock, just to avoid the tasklet 
 being interrupted by the hardware IRQ, do you think so?
Using an IRQ spinlock in the tasklet is not the correct solution. If the IRQ 
fires while the tasklet is running, it means that the driver does not properly 
disable IRQs in the handler. They're supposed to get disabled there and 
re-enabled only after the tasklet has completed.


My 2 cents. 
I disagree this statement They're supposed to get disabled there and 
re-enabled only after the tasklet has completed.
In general, IRQs should be able to interrupt tasklet. Otherwise, it defeats the 
purpose of using tasklet. 

Only when both tasklet and IRQs can access common resources(critical region), 
IRQ needs to be disabled before tasklet access this resource and re-enable 
after that.

___
ath9k-devel mailing list
ath9k-devel@lists.ath9k.org
https://lists.ath9k.org/mailman/listinfo/ath9k-devel
___
ath9k-devel mailing list
ath9k-devel@lists.ath9k.org
https://lists.ath9k.org/mailman/listinfo/ath9k-devel


Re: [ath9k-devel] ath9k_tasklet() can be interrupted by a hardware IRQ

2012-11-20 Thread Felix Liao
Hi Felix F, I'm Felix Liao from WatchGuard, I think we should spin_lock_irqsave 
and spin_unlock_irqrestore on sc_pcu_lock 
in the ath9k_tasklet() instead of spin_lock/spin_unlock, just to avoid the 
tasklet being interrupted by the hardware IRQ,
do you think so?

Thanks,
- Felix Liao

-Original Message-
From: adrian.ch...@gmail.com [mailto:adrian.ch...@gmail.com] On Behalf Of 
Adrian Chadd
Sent: Sunday, November 18, 2012 10:11 PM
To: Felix Liao; Felix Fietkau
Cc: ath9k-devel@lists.ath9k.org
Subject: Re: [ath9k-devel] ath9k_tasklet() can be interrupted by a hardware IRQ

On 12 November 2012 01:54, Felix Liao felix.l...@watchguard.com wrote:
 Thanks for you information.
 It is running in AP mode, and uses compat-wireless-3.5-3, but the irq 16: 
 nobody cared issue still happens when I update to compat-wireless-3.6.6-1.
 I found a mail list talk about this, 
 https://patchwork.kernel.org/patch/1536701/, I see this patch not in 3.6.6-1, 
 and just set the mask to ATH9K_INT_FATAL, then the ath_isr() will just reset 
 the hardware, it is just same with what I think in my mind, and I will take a 
 try for it.

 Another question I said, do you think we should spin_lock_irqsave and 
 spin_unlock_irqrestore in the ath9k_tasklet instead of spin_lock/spin_unlock?

That's a question for someone with a more intimate knowledge of the current 
workings of ath9k.

Felix (F) - would you mind taking a look at this thread? He's having issues 
with fatal sync interrupts not causing the NIC to be reset?


Adrian
___
ath9k-devel mailing list
ath9k-devel@lists.ath9k.org
https://lists.ath9k.org/mailman/listinfo/ath9k-devel


Re: [ath9k-devel] ath9k_tasklet() can be interrupted by a hardware IRQ

2012-11-18 Thread Adrian Chadd
On 12 November 2012 01:54, Felix Liao felix.l...@watchguard.com wrote:
 Thanks for you information.
 It is running in AP mode, and uses compat-wireless-3.5-3, but the irq 16: 
 nobody cared issue still happens when I update to compat-wireless-3.6.6-1.
 I found a mail list talk about this, 
 https://patchwork.kernel.org/patch/1536701/, I see this patch not in 3.6.6-1, 
 and just set the mask to ATH9K_INT_FATAL,
 then the ath_isr() will just reset the hardware, it is just same with what I 
 think in my mind, and I will take a try for it.

 Another question I said, do you think we should spin_lock_irqsave and 
 spin_unlock_irqrestore in the ath9k_tasklet instead of spin_lock/spin_unlock?

That's a question for someone with a more intimate knowledge of the
current workings of ath9k.

Felix (F) - would you mind taking a look at this thread? He's having
issues with fatal sync interrupts not causing the NIC to be reset?


Adrian
___
ath9k-devel mailing list
ath9k-devel@lists.ath9k.org
https://lists.ath9k.org/mailman/listinfo/ath9k-devel


Re: [ath9k-devel] ath9k_tasklet() can be interrupted by a hardware IRQ

2012-11-12 Thread Felix Liao
Thanks for you information.
It is running in AP mode, and uses compat-wireless-3.5-3, but the irq 16: 
nobody cared issue still happens when I update to compat-wireless-3.6.6-1.
I found a mail list talk about this, 
https://patchwork.kernel.org/patch/1536701/, I see this patch not in 3.6.6-1, 
and just set the mask to ATH9K_INT_FATAL, 
then the ath_isr() will just reset the hardware, it is just same with what I 
think in my mind, and I will take a try for it.

Another question I said, do you think we should spin_lock_irqsave and 
spin_unlock_irqrestore in the ath9k_tasklet instead of spin_lock/spin_unlock?

Thanks,
- Felix

-Original Message-
From: adrian.ch...@gmail.com [mailto:adrian.ch...@gmail.com] On Behalf Of 
Adrian Chadd
Sent: Monday, November 12, 2012 5:44 PM
To: Felix Liao
Cc: ath9k-devel@lists.ath9k.org
Subject: Re: [ath9k-devel] ath9k_tasklet() can be interrupted by a hardware IRQ

Well, any sync error like that where the PCIe glue throws an error like that is 
a bad thing.

It could be from a variety of places. It could be a busted PCIe interconnect 
somehow. I've seen this happen before. The only real way to diagnose that is to 
grab a PCIe bus analyser.

It however could also be from the NIC going in and out of network sleep. Is 
this in STA or AP mode?


Adrian

On 12 November 2012 01:06, Felix Liao felix.l...@watchguard.com wrote:
 Hi ath9k team,
 I trace the ath9k_hw_getisr() today to find out why I get the zero ISR 
 mask, and I found that when I get the zero ISR mask, this function 
 return the boolean true, then I trace into the function 
 ar9003_hw_get_isr(),

 static bool ar9003_hw_get_isr(struct ath_hw *ah, enum ath9k_int 
 *masked) {
 
 async_cause = REG_READ(ah, AR_INTR_ASYNC_CAUSE);

 if (async_cause  (AR_INTR_MAC_IRQ | AR_INTR_ASYNC_MASK_MCI)) {
 if ((REG_READ(ah, AR_RTC_STATUS)  AR_RTC_STATUS_M)
 == AR_RTC_STATUS_ON)
 isr = REG_READ(ah, AR_ISR);
 }


 sync_cause = REG_READ(ah, AR_INTR_SYNC_CAUSE)  
 AR_INTR_SYNC_DEFAULT;

 *masked = 0;

 if (!isr  !sync_cause  !async_cause)
 return false;

 if (isr == 0) {
 //debug value here is isr=0, async_cause=0, sync_cause=0x20 
 -- AR_INTR_SYNC_HOST1_FATAL
 }
 ...
 }

 I have no idea why I always get the isr=0 and sync_cause=0x20 which means 
 AR_INTR_SYNC_HOST1_FATAL, but we can't deal with this value in the source 
 code, do you know any about this?

 Thanks,
 - Felix

 -Original Message-
 From: Felix Liao
 Sent: Monday, November 12, 2012 10:10 AM
 To: 'ath9k-devel@lists.ath9k.org'
 Subject: RE: ath9k_tasklet() can be interrupted by a hardware IRQ

 Hi ath9k team:
 Another interest question is since the ath9k interrupt was disabled, how can 
 we get a new IRQ to interrupt the tasklet?
 So I investigated and found that the cause of the issue irq 16: nobody 
 cared in my box is the ath9k's irq handler return too many IRQ_NONE since I 
 read the ISR register always return 0 as below.

 /*
  * Figure out the reason(s) for the interrupt.  Note
  * that the hal returns a pseudo-ISR that may include
  * bits we haven't explicitly enabled so we mask the
  * value to insure we only process bits we requested.
  */
 ath9k_hw_getisr(ah, status);   /* NB: clears ISR too */
 status = ah-imask;/* discard unasked-for bits */

 /*
  * If there are no status bits set, then this interrupt was not
  * for me (should have been caught above).
  */
 if (!status)
 return IRQ_NONE;

 if I go to reset chip instead of return IRQ_NONE in this position, the things 
 become a little healthier than before, I will never see the message irq 16: 
 nobody cared.

 I check each IRQ using if(AR_SREV_9300(ah)) and they are all passed, that 
 is, no other hardware share the irq with ath9k.
 I need to know why we read the ISR register and get value 0, and return 
 IRQ_NONE, the chip I used is AR9300.
 Can we using this change below in the ath_isr() function:
 if (!status)
 goto reset_chip;

 Thanks,
 - Felix
 -
 From: Felix Liao
 Sent: Friday, November 09, 2012 9:33 AM
 To: 'ath9k-devel@lists.ath9k.org'
 Subject: ath9k_tasklet() can be interrupted by a hardware IRQ

 Hi all,
 I met a kernel call trace below using the ath9k driver, we can see that the 
 ath9k_tasklet() was interrupted by a hardware IRQ, and at this time the ath9k 
 IRQ was disabled, so the kernel will complain that there is none irq handler 
 for this interrupt.

 [   60.977474] irq 16: nobody cared (try booting with the irqpoll option) [ 
   60.984194] Call Trace:
 [   60.986660] [dfff1f50] [c000843c] show_stack+0x70/0x1c8 (unreliable) [   
 60.993041] [dfff1f90] [c00777a0] __report_bad_irq+0x44/0xe8 [   60.998718] 
 [dfff1fb0] [c0077a44] note_interrupt+0x200

Re: [ath9k-devel] ath9k_tasklet() can be interrupted by a hardware IRQ

2012-11-11 Thread Felix Liao
Hi ath9k team:
Another interest question is since the ath9k interrupt was disabled, how can we 
get a new IRQ to interrupt the tasklet? 
So I investigated and found that the cause of the issue irq 16: nobody cared 
in my box is the ath9k's irq handler return 
too many IRQ_NONE since I read the ISR register always return 0 as below.

/*
 * Figure out the reason(s) for the interrupt.  Note
 * that the hal returns a pseudo-ISR that may include
 * bits we haven't explicitly enabled so we mask the
 * value to insure we only process bits we requested.
 */
ath9k_hw_getisr(ah, status);   /* NB: clears ISR too */
status = ah-imask;/* discard unasked-for bits */

/*
 * If there are no status bits set, then this interrupt was not
 * for me (should have been caught above).
 */
if (!status) 
return IRQ_NONE;

if I go to reset chip instead of return IRQ_NONE in this position, the things 
become a little healthier than before, 
I will never see the message irq 16: nobody cared.

I check each IRQ using if(AR_SREV_9300(ah)) and they are all passed, that is, 
no other hardware share the irq with ath9k.
I need to know why we read the ISR register and get value 0, and return 
IRQ_NONE, the chip I used is AR9300.
Can we using this change below in the ath_isr() function:
if (!status) 
goto reset_chip;

Thanks,
- Felix
-
From: Felix Liao 
Sent: Friday, November 09, 2012 9:33 AM
To: 'ath9k-devel@lists.ath9k.org'
Subject: ath9k_tasklet() can be interrupted by a hardware IRQ

Hi all,
I met a kernel call trace below using the ath9k driver, we can see that the 
ath9k_tasklet() was interrupted by a hardware IRQ, 
and at this time the ath9k IRQ was disabled, so the kernel will complain that 
there is none irq handler for this interrupt.

[   60.977474] irq 16: nobody cared (try booting with the irqpoll option)
[   60.984194] Call Trace:
[   60.986660] [dfff1f50] [c000843c] show_stack+0x70/0x1c8 (unreliable)
[   60.993041] [dfff1f90] [c00777a0] __report_bad_irq+0x44/0xe8
[   60.998718] [dfff1fb0] [c0077a44] note_interrupt+0x200/0x260
[   61.004396] [dfff1fe0] [c0078704] handle_fasteoi_irq+0xc4/0x10c
[   61.010340] [dfff1ff0] [c0010fdc] call_handle_irq+0x18/0x28
[   61.015929] [dfff3e50] [c00051c8] do_IRQ+0xcc/0x1b4
[   61.020825] [dfff3e80] [c0011fc4] ret_from_except+0x0/0x18
[   61.026349] --- Exception: 501 at ioread32+0x8/0x14
[   61.026355] LR = ath_descdma_setup+0x294/0x664 [ath9k]
[   61.036733] [dfff3f40] [c04f5bdc] softirq_to_name+0x0/0x28 (unreliable)
[   61.043390] [dfff3f50] [e5ad66fc] ath9k_hw_enable_interrupts+0x160/0x1b0 
[ath9k_hw]
[   61.051074] [dfff3f70] [e5b41368] ath9k_tasklet+0xc0/0x208 [ath9k]
[   61.057274] [dfff3f90] [c0043298] tasklet_action+0xcc/0xf8
[   61.062777] [dfff3fb0] [c0043d48] __do_softirq+0xdc/0x18c
[   61.068194] [dfff3ff0] [c0010fb4] call_do_softirq+0x14/0x24
[   61.073784] [d8613e60] [c0004fec] do_softirq+0x8c/0x98
[   61.078939] [d8613e80] [c0043bd0] local_bh_enable+0x9c/0xa0
[   61.084536] [d8613e90] [c036d910] unix_create1+0x184/0x1ac
[   61.090039] [d8613eb0] [c036d9a0] unix_create+0x68/0xbc
[   61.095286] [d8613ec0] [c02c6c34] __sock_create+0x114/0x248
[   61.100877] [d8613ef0] [c02c7008] sys_socket+0x54/0x84
[   61.106031] [d8613f10] [c02c70e8] sys_socketcall+0xb0/0x258
[   61.111622] [d8613f40] [c0011970] ret_from_syscall+0x0/0x3c
[   61.117338] --- Exception: c01 at 0xeff41fc
[   61.117343] LR = 0xeff4350
[   61.124578] handlers:
[   61.126851] [e5b3eb44] (ath_isr+0x0/0x21c [ath9k])
[   61.131840] Disabling IRQ #16
[   61.143234] ath: phy0: Failed to stop TX DMA, queues=0x001!

I made a patch below for this issue and use it to reduce the probability of 
this issue, you can see that, 
I can't disable the IRQ from start to end of this function, I need to restore 
the IRQ flags before we call ath_rx_tasklet and ath_tx_tasklet, 
since these functions will disable and enable the local softirq using 
spin_lock_bh and spin_unlock_bh,  since with this gap of time, 
the tasklet still can be interrupted by the hardware IRQ.

diff -pNaur compat-wireless-3.6.6-1-orig/drivers/net/wireless/ath/ath9k/main.c 
compat-wireless-3.6.6-1-fixed/drivers/net/wireless/ath/ath9k/main.c
--- compat-wireless-3.6.6-1-orig/drivers/net/wireless/ath/ath9k/main.c    
2012-11-08 00:17:08.0 +0800
+++ 
compat-wireless-3.6.6-1-fixed/drivers/net/wireless/ath/ath9k/main.c  
 2012-11-09 07:27:28.459837051 +0800
@@ -368,6 +368,7 @@ void ath9k_tasklet(unsigned long data)
  u32 status = sc-intrstatus;
  u32 rxmask;

+ local_irq_save(flags);
  ath9k_ps_wakeup(sc);
  spin_lock(sc-sc_pcu_lock);

@@ -383,7 +384,7 @@ void ath9k_tasklet(unsigned long data)
 goto out;
  }

-  spin_lock_irqsave(sc-sc_pm_lock, flags);
+