Re: [ath9k-devel] ath9k_tasklet() can be interrupted by a hardware IRQ
On 2012-11-20 11:05 AM, Felix Liao wrote: Hi Felix F, I'm Felix Liao from WatchGuard, I think we should spin_lock_irqsave and spin_unlock_irqrestore on sc_pcu_lock in the ath9k_tasklet() instead of spin_lock/spin_unlock, just to avoid the tasklet being interrupted by the hardware IRQ, do you think so? Using an IRQ spinlock in the tasklet is not the correct solution. If the IRQ fires while the tasklet is running, it means that the driver does not properly disable IRQs in the handler. They're supposed to get disabled there and re-enabled only after the tasklet has completed. - Felix ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] ath9k_tasklet() can be interrupted by a hardware IRQ
On 2012-11-20 11:05 AM, Felix Liao wrote: Hi Felix F, I'm Felix Liao from WatchGuard, I think we should spin_lock_irqsave and spin_unlock_irqrestore on sc_pcu_lock in the ath9k_tasklet() instead of spin_lock/spin_unlock, just to avoid the tasklet being interrupted by the hardware IRQ, do you think so? Using an IRQ spinlock in the tasklet is not the correct solution. If the IRQ fires while the tasklet is running, it means that the driver does not properly disable IRQs in the handler. They're supposed to get disabled there and re-enabled only after the tasklet has completed. My 2 cents. I disagree this statement They're supposed to get disabled there and re-enabled only after the tasklet has completed. In general, IRQs should be able to interrupt tasklet. Otherwise, it defeats the purpose of using tasklet. Only when both tasklet and IRQs can access common resources(critical region), IRQ needs to be disabled before tasklet access this resource and re-enable after that. ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] ath9k_tasklet() can be interrupted by a hardware IRQ
Hi Felix F, I'm Felix Liao from WatchGuard, I think we should spin_lock_irqsave and spin_unlock_irqrestore on sc_pcu_lock in the ath9k_tasklet() instead of spin_lock/spin_unlock, just to avoid the tasklet being interrupted by the hardware IRQ, do you think so? Thanks, - Felix Liao -Original Message- From: adrian.ch...@gmail.com [mailto:adrian.ch...@gmail.com] On Behalf Of Adrian Chadd Sent: Sunday, November 18, 2012 10:11 PM To: Felix Liao; Felix Fietkau Cc: ath9k-devel@lists.ath9k.org Subject: Re: [ath9k-devel] ath9k_tasklet() can be interrupted by a hardware IRQ On 12 November 2012 01:54, Felix Liao felix.l...@watchguard.com wrote: Thanks for you information. It is running in AP mode, and uses compat-wireless-3.5-3, but the irq 16: nobody cared issue still happens when I update to compat-wireless-3.6.6-1. I found a mail list talk about this, https://patchwork.kernel.org/patch/1536701/, I see this patch not in 3.6.6-1, and just set the mask to ATH9K_INT_FATAL, then the ath_isr() will just reset the hardware, it is just same with what I think in my mind, and I will take a try for it. Another question I said, do you think we should spin_lock_irqsave and spin_unlock_irqrestore in the ath9k_tasklet instead of spin_lock/spin_unlock? That's a question for someone with a more intimate knowledge of the current workings of ath9k. Felix (F) - would you mind taking a look at this thread? He's having issues with fatal sync interrupts not causing the NIC to be reset? Adrian ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] ath9k_tasklet() can be interrupted by a hardware IRQ
On 12 November 2012 01:54, Felix Liao felix.l...@watchguard.com wrote: Thanks for you information. It is running in AP mode, and uses compat-wireless-3.5-3, but the irq 16: nobody cared issue still happens when I update to compat-wireless-3.6.6-1. I found a mail list talk about this, https://patchwork.kernel.org/patch/1536701/, I see this patch not in 3.6.6-1, and just set the mask to ATH9K_INT_FATAL, then the ath_isr() will just reset the hardware, it is just same with what I think in my mind, and I will take a try for it. Another question I said, do you think we should spin_lock_irqsave and spin_unlock_irqrestore in the ath9k_tasklet instead of spin_lock/spin_unlock? That's a question for someone with a more intimate knowledge of the current workings of ath9k. Felix (F) - would you mind taking a look at this thread? He's having issues with fatal sync interrupts not causing the NIC to be reset? Adrian ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] ath9k_tasklet() can be interrupted by a hardware IRQ
Thanks for you information. It is running in AP mode, and uses compat-wireless-3.5-3, but the irq 16: nobody cared issue still happens when I update to compat-wireless-3.6.6-1. I found a mail list talk about this, https://patchwork.kernel.org/patch/1536701/, I see this patch not in 3.6.6-1, and just set the mask to ATH9K_INT_FATAL, then the ath_isr() will just reset the hardware, it is just same with what I think in my mind, and I will take a try for it. Another question I said, do you think we should spin_lock_irqsave and spin_unlock_irqrestore in the ath9k_tasklet instead of spin_lock/spin_unlock? Thanks, - Felix -Original Message- From: adrian.ch...@gmail.com [mailto:adrian.ch...@gmail.com] On Behalf Of Adrian Chadd Sent: Monday, November 12, 2012 5:44 PM To: Felix Liao Cc: ath9k-devel@lists.ath9k.org Subject: Re: [ath9k-devel] ath9k_tasklet() can be interrupted by a hardware IRQ Well, any sync error like that where the PCIe glue throws an error like that is a bad thing. It could be from a variety of places. It could be a busted PCIe interconnect somehow. I've seen this happen before. The only real way to diagnose that is to grab a PCIe bus analyser. It however could also be from the NIC going in and out of network sleep. Is this in STA or AP mode? Adrian On 12 November 2012 01:06, Felix Liao felix.l...@watchguard.com wrote: Hi ath9k team, I trace the ath9k_hw_getisr() today to find out why I get the zero ISR mask, and I found that when I get the zero ISR mask, this function return the boolean true, then I trace into the function ar9003_hw_get_isr(), static bool ar9003_hw_get_isr(struct ath_hw *ah, enum ath9k_int *masked) { async_cause = REG_READ(ah, AR_INTR_ASYNC_CAUSE); if (async_cause (AR_INTR_MAC_IRQ | AR_INTR_ASYNC_MASK_MCI)) { if ((REG_READ(ah, AR_RTC_STATUS) AR_RTC_STATUS_M) == AR_RTC_STATUS_ON) isr = REG_READ(ah, AR_ISR); } sync_cause = REG_READ(ah, AR_INTR_SYNC_CAUSE) AR_INTR_SYNC_DEFAULT; *masked = 0; if (!isr !sync_cause !async_cause) return false; if (isr == 0) { //debug value here is isr=0, async_cause=0, sync_cause=0x20 -- AR_INTR_SYNC_HOST1_FATAL } ... } I have no idea why I always get the isr=0 and sync_cause=0x20 which means AR_INTR_SYNC_HOST1_FATAL, but we can't deal with this value in the source code, do you know any about this? Thanks, - Felix -Original Message- From: Felix Liao Sent: Monday, November 12, 2012 10:10 AM To: 'ath9k-devel@lists.ath9k.org' Subject: RE: ath9k_tasklet() can be interrupted by a hardware IRQ Hi ath9k team: Another interest question is since the ath9k interrupt was disabled, how can we get a new IRQ to interrupt the tasklet? So I investigated and found that the cause of the issue irq 16: nobody cared in my box is the ath9k's irq handler return too many IRQ_NONE since I read the ISR register always return 0 as below. /* * Figure out the reason(s) for the interrupt. Note * that the hal returns a pseudo-ISR that may include * bits we haven't explicitly enabled so we mask the * value to insure we only process bits we requested. */ ath9k_hw_getisr(ah, status); /* NB: clears ISR too */ status = ah-imask;/* discard unasked-for bits */ /* * If there are no status bits set, then this interrupt was not * for me (should have been caught above). */ if (!status) return IRQ_NONE; if I go to reset chip instead of return IRQ_NONE in this position, the things become a little healthier than before, I will never see the message irq 16: nobody cared. I check each IRQ using if(AR_SREV_9300(ah)) and they are all passed, that is, no other hardware share the irq with ath9k. I need to know why we read the ISR register and get value 0, and return IRQ_NONE, the chip I used is AR9300. Can we using this change below in the ath_isr() function: if (!status) goto reset_chip; Thanks, - Felix - From: Felix Liao Sent: Friday, November 09, 2012 9:33 AM To: 'ath9k-devel@lists.ath9k.org' Subject: ath9k_tasklet() can be interrupted by a hardware IRQ Hi all, I met a kernel call trace below using the ath9k driver, we can see that the ath9k_tasklet() was interrupted by a hardware IRQ, and at this time the ath9k IRQ was disabled, so the kernel will complain that there is none irq handler for this interrupt. [ 60.977474] irq 16: nobody cared (try booting with the irqpoll option) [ 60.984194] Call Trace: [ 60.986660] [dfff1f50] [c000843c] show_stack+0x70/0x1c8 (unreliable) [ 60.993041] [dfff1f90] [c00777a0] __report_bad_irq+0x44/0xe8 [ 60.998718] [dfff1fb0] [c0077a44] note_interrupt+0x200
Re: [ath9k-devel] ath9k_tasklet() can be interrupted by a hardware IRQ
Hi ath9k team: Another interest question is since the ath9k interrupt was disabled, how can we get a new IRQ to interrupt the tasklet? So I investigated and found that the cause of the issue irq 16: nobody cared in my box is the ath9k's irq handler return too many IRQ_NONE since I read the ISR register always return 0 as below. /* * Figure out the reason(s) for the interrupt. Note * that the hal returns a pseudo-ISR that may include * bits we haven't explicitly enabled so we mask the * value to insure we only process bits we requested. */ ath9k_hw_getisr(ah, status); /* NB: clears ISR too */ status = ah-imask;/* discard unasked-for bits */ /* * If there are no status bits set, then this interrupt was not * for me (should have been caught above). */ if (!status) return IRQ_NONE; if I go to reset chip instead of return IRQ_NONE in this position, the things become a little healthier than before, I will never see the message irq 16: nobody cared. I check each IRQ using if(AR_SREV_9300(ah)) and they are all passed, that is, no other hardware share the irq with ath9k. I need to know why we read the ISR register and get value 0, and return IRQ_NONE, the chip I used is AR9300. Can we using this change below in the ath_isr() function: if (!status) goto reset_chip; Thanks, - Felix - From: Felix Liao Sent: Friday, November 09, 2012 9:33 AM To: 'ath9k-devel@lists.ath9k.org' Subject: ath9k_tasklet() can be interrupted by a hardware IRQ Hi all, I met a kernel call trace below using the ath9k driver, we can see that the ath9k_tasklet() was interrupted by a hardware IRQ, and at this time the ath9k IRQ was disabled, so the kernel will complain that there is none irq handler for this interrupt. [ 60.977474] irq 16: nobody cared (try booting with the irqpoll option) [ 60.984194] Call Trace: [ 60.986660] [dfff1f50] [c000843c] show_stack+0x70/0x1c8 (unreliable) [ 60.993041] [dfff1f90] [c00777a0] __report_bad_irq+0x44/0xe8 [ 60.998718] [dfff1fb0] [c0077a44] note_interrupt+0x200/0x260 [ 61.004396] [dfff1fe0] [c0078704] handle_fasteoi_irq+0xc4/0x10c [ 61.010340] [dfff1ff0] [c0010fdc] call_handle_irq+0x18/0x28 [ 61.015929] [dfff3e50] [c00051c8] do_IRQ+0xcc/0x1b4 [ 61.020825] [dfff3e80] [c0011fc4] ret_from_except+0x0/0x18 [ 61.026349] --- Exception: 501 at ioread32+0x8/0x14 [ 61.026355] LR = ath_descdma_setup+0x294/0x664 [ath9k] [ 61.036733] [dfff3f40] [c04f5bdc] softirq_to_name+0x0/0x28 (unreliable) [ 61.043390] [dfff3f50] [e5ad66fc] ath9k_hw_enable_interrupts+0x160/0x1b0 [ath9k_hw] [ 61.051074] [dfff3f70] [e5b41368] ath9k_tasklet+0xc0/0x208 [ath9k] [ 61.057274] [dfff3f90] [c0043298] tasklet_action+0xcc/0xf8 [ 61.062777] [dfff3fb0] [c0043d48] __do_softirq+0xdc/0x18c [ 61.068194] [dfff3ff0] [c0010fb4] call_do_softirq+0x14/0x24 [ 61.073784] [d8613e60] [c0004fec] do_softirq+0x8c/0x98 [ 61.078939] [d8613e80] [c0043bd0] local_bh_enable+0x9c/0xa0 [ 61.084536] [d8613e90] [c036d910] unix_create1+0x184/0x1ac [ 61.090039] [d8613eb0] [c036d9a0] unix_create+0x68/0xbc [ 61.095286] [d8613ec0] [c02c6c34] __sock_create+0x114/0x248 [ 61.100877] [d8613ef0] [c02c7008] sys_socket+0x54/0x84 [ 61.106031] [d8613f10] [c02c70e8] sys_socketcall+0xb0/0x258 [ 61.111622] [d8613f40] [c0011970] ret_from_syscall+0x0/0x3c [ 61.117338] --- Exception: c01 at 0xeff41fc [ 61.117343] LR = 0xeff4350 [ 61.124578] handlers: [ 61.126851] [e5b3eb44] (ath_isr+0x0/0x21c [ath9k]) [ 61.131840] Disabling IRQ #16 [ 61.143234] ath: phy0: Failed to stop TX DMA, queues=0x001! I made a patch below for this issue and use it to reduce the probability of this issue, you can see that, I can't disable the IRQ from start to end of this function, I need to restore the IRQ flags before we call ath_rx_tasklet and ath_tx_tasklet, since these functions will disable and enable the local softirq using spin_lock_bh and spin_unlock_bh, since with this gap of time, the tasklet still can be interrupted by the hardware IRQ. diff -pNaur compat-wireless-3.6.6-1-orig/drivers/net/wireless/ath/ath9k/main.c compat-wireless-3.6.6-1-fixed/drivers/net/wireless/ath/ath9k/main.c --- compat-wireless-3.6.6-1-orig/drivers/net/wireless/ath/ath9k/main.c 2012-11-08 00:17:08.0 +0800 +++ compat-wireless-3.6.6-1-fixed/drivers/net/wireless/ath/ath9k/main.c 2012-11-09 07:27:28.459837051 +0800 @@ -368,6 +368,7 @@ void ath9k_tasklet(unsigned long data) u32 status = sc-intrstatus; u32 rxmask; + local_irq_save(flags); ath9k_ps_wakeup(sc); spin_lock(sc-sc_pcu_lock); @@ -383,7 +384,7 @@ void ath9k_tasklet(unsigned long data) goto out; } - spin_lock_irqsave(sc-sc_pm_lock, flags); +