[PATCH v3] ath10k: implement NAPI support
Add NAPI support for rx and tx completion. NAPI poll is scheduled from interrupt handler. The design is as below - on interrupt - schedule napi and mask interrupts - on poll - process all pipes (no actual Tx/Rx) - process Rx within budget - if quota exceeds budget reschedule napi poll by returning budget - process Tx completions and update budget if necessary - process Tx fetch indications (pull-push) - push any other pending Tx (if possible) - before resched or napi completion replenish htt rx ring buffer - if work done < budget, complete napi poll and unmask interrupts This change also get rid of two tasklets (intr_tq and txrx_compl_task). Measured peak throughput with NAPI on IPQ4019 platform in controlled environment. No noticeable reduction in throughput is seen and also observed improvements in CPU usage. Approx. 15% CPU usage got reduced in UDP uplink case. DL: AP DUT Tx UL: AP DUT Rx IPQ4019 (avg. cpu usage %) TOT +NAPI === = TCP DL 644 Mbps (42%)645 Mbps (36%) TCP UL 673 Mbps (30%)675 Mbps (26%) UDP DL 682 Mbps (49%)680 Mbps (49%) UDP UL 720 Mbps (28%)717 Mbps (11%) Signed-off-by: Rajkumar Manoharan --- v2: rebased change v3: napi_synchronize and napi_disable should be called on deinit. Otherwise it is causing invalid memory access. Thanks to Kalle for reporting this issue. drivers/net/wireless/ath/ath10k/ahb.c| 10 +- drivers/net/wireless/ath/ath10k/core.c | 2 + drivers/net/wireless/ath/ath10k/core.h | 8 ++ drivers/net/wireless/ath/ath10k/htt.h| 2 +- drivers/net/wireless/ath/ath10k/htt_rx.c | 154 +++ drivers/net/wireless/ath/ath10k/htt_tx.c | 2 - drivers/net/wireless/ath/ath10k/pci.c| 71 -- drivers/net/wireless/ath/ath10k/pci.h| 6 +- 8 files changed, 159 insertions(+), 96 deletions(-) diff --git a/drivers/net/wireless/ath/ath10k/ahb.c b/drivers/net/wireless/ath/ath10k/ahb.c index acec16b9cf49..83ecfb6d1a36 100644 --- a/drivers/net/wireless/ath/ath10k/ahb.c +++ b/drivers/net/wireless/ath/ath10k/ahb.c @@ -462,13 +462,13 @@ static void ath10k_ahb_halt_chip(struct ath10k *ar) static irqreturn_t ath10k_ahb_interrupt_handler(int irq, void *arg) { struct ath10k *ar = arg; - struct ath10k_pci *ar_pci = ath10k_pci_priv(ar); if (!ath10k_pci_irq_pending(ar)) return IRQ_NONE; ath10k_pci_disable_and_clear_legacy_irq(ar); - tasklet_schedule(&ar_pci->intr_tq); + ath10k_pci_irq_msi_fw_mask(ar); + napi_schedule(&ar->napi); return IRQ_HANDLED; } @@ -717,6 +717,9 @@ static void ath10k_ahb_hif_stop(struct ath10k *ar) synchronize_irq(ar_ahb->irq); ath10k_pci_flush(ar); + + napi_synchronize(&ar->napi); + napi_disable(&ar->napi); } static int ath10k_ahb_hif_power_up(struct ath10k *ar) @@ -748,6 +751,7 @@ static int ath10k_ahb_hif_power_up(struct ath10k *ar) ath10k_err(ar, "could not wake up target CPU: %d\n", ret); goto err_ce_deinit; } + napi_enable(&ar->napi); return 0; @@ -831,7 +835,7 @@ static int ath10k_ahb_probe(struct platform_device *pdev) goto err_resource_deinit; } - ath10k_pci_init_irq_tasklets(ar); + ath10k_pci_init_napi(ar); ret = ath10k_ahb_request_irq_legacy(ar); if (ret) diff --git a/drivers/net/wireless/ath/ath10k/core.c b/drivers/net/wireless/ath/ath10k/core.c index e88982921aa3..0ca58cf0ffea 100644 --- a/drivers/net/wireless/ath/ath10k/core.c +++ b/drivers/net/wireless/ath/ath10k/core.c @@ -2249,6 +2249,8 @@ struct ath10k *ath10k_core_create(size_t priv_size, struct device *dev, INIT_WORK(&ar->register_work, ath10k_core_register_work); INIT_WORK(&ar->restart_work, ath10k_core_restart); + init_dummy_netdev(&ar->napi_dev); + ret = ath10k_debug_create(ar); if (ret) goto err_free_aux_wq; diff --git a/drivers/net/wireless/ath/ath10k/core.h b/drivers/net/wireless/ath/ath10k/core.h index 30ae5bf81611..f36c2b274ee5 100644 --- a/drivers/net/wireless/ath/ath10k/core.h +++ b/drivers/net/wireless/ath/ath10k/core.h @@ -65,6 +65,10 @@ #define ATH10K_KEEPALIVE_MAX_IDLE 3895 #define ATH10K_KEEPALIVE_MAX_UNRESPONSIVE 3900 +/* NAPI poll budget */ +#define ATH10K_NAPI_BUDGET 64 +#define ATH10K_NAPI_QUOTA_LIMIT 60 + struct ath10k; enum ath10k_bus { @@ -936,6 +940,10 @@ struct ath10k { struct ath10k_thermal thermal; struct ath10k_wow wow; + /* NAPI */ + struct net_device napi_dev; + struct napi_struct napi; + /* must be last */ u8 drv_priv[0] __aligned(sizeof(void *)); }; diff --git a/drivers/net/wireless/ath/ath10k/htt.h b/drivers/net/wireless/ath/ath10k/htt.h index 430a83e142aa..98c14247021b 100644 --- a/drivers/net/wireless/ath/at
Re: [PATCH v3] ath10k: implement NAPI support
I'm always rather big on people testing latency under load, and napi tends to add some.
Re: [PATCH v3] ath10k: implement NAPI support
On Fri, 2016-08-26 at 03:48 -0700, Dave Taht wrote: > I'm always rather big on people testing latency under load, and napi > tends to add some. That's a completely useless comment. Obviously, everybody uses NAPI; it's necessary for system load and thus performance, and lets drivers take advantage of TCP merging to reduce ACKs, which is tremendously helpful (over wifi in particular.) Please stop making such drive-by comments that focus only on the single thing you find important above all; not all people can care only about that single thing, and unconstructively reiterating it over and over doesn't help. Thanks, johannes
Re: [PATCH v3] ath10k: implement NAPI support
On Fri, Aug 26, 2016 at 4:12 AM, Johannes Berg wrote: > On Fri, 2016-08-26 at 03:48 -0700, Dave Taht wrote: >> I'm always rather big on people testing latency under load, and napi >> tends to add some. > > That's a completely useless comment. > > Obviously, everybody uses NAPI; it's necessary for system load and thus > performance, and lets drivers take advantage of TCP merging to reduce > ACKs, which is tremendously helpful (over wifi in particular.) > > Please stop making such drive-by comments that focus only on the single > thing you find important above all; not all people can care only about > that single thing, and unconstructively reiterating it over and over > doesn't help. Well, I apologize for being testy. It is I spent a lot of time testing michal's patchset for the ath10k back in may, and I *will* go and retest ath10k, when these patches land. My principal concern with using napi is at lower rates than the maxes typically reported in a patchset. But it would be nice if people always did test for latency under load when making improvements, before getting to me, and despite having helped make a very comprehensive test suite available (flent) that tests all sorts of things for wifi, getting people to actually use it to see real problems, (in addition to latency under load!) while their fingers are still hot in the codebase, and track/plot their results, remains an ongoing issue across the entire industry. http://blog.cerowrt.org/post/fq_codel_on_ath10k/ There are many other problems in wifi, of course, that could use engineering mental internalization, like airtime fairness, and the mis-behavior of the hardware queues, http://blog.cerowrt.org/post/cs5_lockout/ wifi channel scans http://blog.cerowrt.org/post/disabling_channel_scans/ and so on. I have a ton more datasets and blog entries left to write up from the ath9k work thus far which point to some other issues (minstrel, aggregation, retries) > Thanks, > johannes -- Dave Täht Let's go make home routers and wifi faster! With better software! http://blog.cerowrt.org
Re: [PATCH v3] ath10k: implement NAPI support
On 2016-08-26 17:19, Dave Taht wrote: On Fri, Aug 26, 2016 at 4:12 AM, Johannes Berg wrote: On Fri, 2016-08-26 at 03:48 -0700, Dave Taht wrote: I'm always rather big on people testing latency under load, and napi tends to add some. That's a completely useless comment. Obviously, everybody uses NAPI; it's necessary for system load and thus performance, and lets drivers take advantage of TCP merging to reduce ACKs, which is tremendously helpful (over wifi in particular.) Please stop making such drive-by comments that focus only on the single thing you find important above all; not all people can care only about that single thing, and unconstructively reiterating it over and over doesn't help. Well, I apologize for being testy. It is I spent a lot of time testing michal's patchset for the ath10k back in may, and I *will* go and retest ath10k, when these patches land. My principal concern with using napi is at lower rates than the maxes typically reported in a patchset. You are always welcome to validate this change and share your feedback. But it would be nice if people always did test for latency under load when making improvements, before getting to me, and despite having helped make a very comprehensive test suite available (flent) that tests all sorts of things for wifi, getting people to actually use it to see real problems, (in addition to latency under load!) while their fingers are still hot in the codebase, and track/plot their results, remains an ongoing issue across the entire industry. http://blog.cerowrt.org/post/fq_codel_on_ath10k/ As you know, NAPI is designed to improve performance of high speed n/w devices. From LWN: "NAPI is a proven (www.cyberus.ca/~hadi/usenix-paper.tgz) technique to improve network performance on Linux." Even most of Gig-speed network drivers were already migrated to NAPI. Tasklets are heavy CPU consumers and it will impact performance of other low priority tasks. The article[1] explains the problems of tasklet. From my observations, average CPU usage got reduced by 10% under heavy data traffic against same peak throughput. I validated this change on both IPQ4019 platform (Quad-Core ARM Cortex A7 processor) and AP135 platform (uni-core MIPS 720 MHz processor). I did not observe any regression. There are many other problems in wifi, of course, that could use engineering mental internalization, like airtime fairness, and the mis-behavior of the hardware queues, http://blog.cerowrt.org/post/cs5_lockout/ wifi channel scans http://blog.cerowrt.org/post/disabling_channel_scans/ and so on. I have a ton more datasets and blog entries left to write up from the ath9k work thus far which point to some other issues (minstrel, aggregation, retries) your data are really impressive. Once again, feel free to validate this change and share your inputs. [1] http://lwn.net/Articles/239633/ -Rajkumar