On 02/28/2018 09:31 AM, Michał Kazior wrote:
On 28 February 2018 at 02:22, <[email protected]> wrote: [...]@@ -2086,8 +2087,28 @@ static void ath10k_pci_hif_stop(struct ath10k *ar) ath10k_pci_irq_disable(ar); ath10k_pci_irq_sync(ar); ath10k_pci_flush(ar); - napi_synchronize(&ar->napi); - napi_disable(&ar->napi); + + /* Calling napi_disable twice in a row (w/out starting it and/or without + * having NAPI active leads to deadlock because napi_disable sets + * NAPI_STATE_SCHED and NAPI_STATE_NPSVC when it returns, as far as I + * can tell. So, guard this call to napi_disable. I believe the + * failure case is something like this: + * rmmod ath10k_pci ath10k_core + * Firmware crashes before hif_stop is called by the rmmod path + * The crash handling logic calls hif_stop + * Then rmmod gets around to calling hif_stop, but spins endlessly + * in napi_synchronize. + * + * I think one way this could happen is that ath10k_stop checks + * for state != ATH10K_STATE_OFF, but STATE_RESTARTING is also + * a possibility. That might be how we can have hif_stop called twice + * without a hif_start in between. --Ben + */ + if (ar->napi_enabled) { + napi_synchronize(&ar->napi); + napi_disable(&ar->napi); + ar->napi_enabled = false; + }Looking at the code and your comment the described fail case seems legit. I would consider tuning ath10k_stop() so that it calls ath10k_halt() only if ar->state != OFF & ar->state != RESTARTING though. Or did you try already?
I did not try tuning ath10k_stop(). The code in this area is quite complex, and in my opinion trying to keep the start/stop calls exactly matched without individual 'has_started' flags seems ripe for bugs.
While your approach will probably work it won't prevent other non-NAPI bad things from happening. Even if there's nothing today it might creep up in the future. And you'd need to update ahb.c too.
I'll update ahb.c to match. Thanks, Ben
Michał
-- Ben Greear <[email protected]> Candela Technologies Inc http://www.candelatech.com
