On 4/2/21 7:20 PM, Gatis Peisenieks wrote:
> Tx queue cleanup happens in the interrupt handler on the same core as rx
> queue processing.
> Both can take a considerable amount of processing in high packet-per-second
> scenarios.
>
> Sending large numbers of packets can stall rx processing, which is unfair,
> and can also lead to an out-of-memory condition, since __dev_kfree_skb_irq
> queues the skbs for later kfree in softirq, which is not allowed to happen
> under heavy load in the interrupt handler.
>
> This puts tx cleanup in its own napi and enables threaded napi to allow the
> rx/tx queue processing to happen on different cores.
>
> The ability to sustain equal amounts of tx/rx traffic increased:
> from 280Kpps to 1130Kpps on Threadripper 3960X with upcoming Mikrotik 10/25G NIC,
> from 520Kpps to 850Kpps on Intel i3-3320 with Mikrotik RB44Ge adapter.
>
> Signed-off-by: Gatis Peisenieks <ga...@mikrotik.com>
> ---
> drivers/net/ethernet/atheros/atl1c/atl1c.h | 2 +
> .../net/ethernet/atheros/atl1c/atl1c_main.c | 43 +++++++++++++++++--
> 2 files changed, 41 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c.h b/drivers/net/ethernet/atheros/atl1c/atl1c.h
> index a0562a90fb6d..4404fa44d719 100644
> --- a/drivers/net/ethernet/atheros/atl1c/atl1c.h
> +++ b/drivers/net/ethernet/atheros/atl1c/atl1c.h
> @@ -506,6 +506,7 @@ struct atl1c_adapter {
> struct net_device *netdev;
> struct pci_dev *pdev;
> struct napi_struct napi;
> + struct napi_struct tx_napi;
> struct page *rx_page;
> unsigned int rx_page_offset;
> unsigned int rx_frag_size;
> @@ -529,6 +530,7 @@ struct atl1c_adapter {
> u16 link_duplex;
>
> spinlock_t mdio_lock;
> + spinlock_t irq_mask_lock;
> atomic_t irq_sem;
>
> struct work_struct common_task;
> diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> index 3f65f2b370c5..f51b28e8b6dc 100644
> --- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> +++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> @@ -813,6 +813,7 @@ static int atl1c_sw_init(struct atl1c_adapter *adapter)
> atl1c_set_rxbufsize(adapter, adapter->netdev);
> atomic_set(&adapter->irq_sem, 1);
> spin_lock_init(&adapter->mdio_lock);
> + spin_lock_init(&adapter->irq_mask_lock);
> set_bit(__AT_DOWN, &adapter->flags);
>
> return 0;
> @@ -1530,7 +1531,7 @@ static inline void atl1c_clear_phy_int(struct atl1c_adapter *adapter)
> spin_unlock(&adapter->mdio_lock);
> }
>
> -static bool atl1c_clean_tx_irq(struct atl1c_adapter *adapter,
> +static unsigned atl1c_clean_tx_irq(struct atl1c_adapter *adapter,
> enum atl1c_trans_queue type)
This v2 is much better, thanks.
You might rename atl1c_clean_tx_irq(), because it no longer runs
under hard irqs?
Maybe merge atl1c_clean_tx_irq() and atl1c_clean_tx() into a single function?
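
To illustrate the shape a single merged, budget-aware cleanup routine
could take, here is a rough userspace sketch. It is not the atl1c code:
struct tx_ring_model, tx_poll_model() and RING_SIZE are hypothetical
names, and clearing a flag stands in for freeing the skb. It only shows
the "reclaim up to budget, return the count" pattern that the change of
return type from bool to unsigned suggests.

```c
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8u /* hypothetical ring size, chosen for illustration */

/* Hypothetical userspace stand-in for a tx ring; not the atl1c structures. */
struct tx_ring_model {
	uint16_t next_to_clean;  /* first descriptor software has not reclaimed */
	uint16_t hw_cons_idx;    /* index the hardware reports as consumed */
	bool in_use[RING_SIZE];  /* true while a buffer is still attached */
};

/*
 * Reclaim up to `budget` completed descriptors and return how many were
 * reclaimed. A return value below `budget` means the ring is drained,
 * which is the point where a NAPI poll routine would normally complete
 * and re-enable the tx interrupt.
 */
static unsigned int tx_poll_model(struct tx_ring_model *ring,
				  unsigned int budget)
{
	unsigned int cleaned = 0;

	while (ring->next_to_clean != ring->hw_cons_idx && cleaned < budget) {
		/* Stands in for unmapping and freeing the transmitted skb. */
		ring->in_use[ring->next_to_clean] = false;
		ring->next_to_clean =
			(uint16_t)((ring->next_to_clean + 1u) % RING_SIZE);
		cleaned++;
	}

	return cleaned;
}
```

With one function doing both the ring walk and the budget accounting,
the _irq suffix can go away and the poll callback becomes a thin wrapper
(or disappears entirely).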