> -----Original Message-----
> From: Simon Horman <[email protected]>
> Sent: Saturday, June 13, 2026 4:20 AM
> To: [email protected]
> Cc: Simon Horman <[email protected]>; [email protected];
> [email protected]; KY Srinivasan <[email protected]>; Haiyang Zhang
> <[email protected]>; [email protected]; Dexuan Cui
> <[email protected]>; Long Li <[email protected]>;
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; Konstantin Taranov
> <[email protected]>; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]; linux-
> [email protected]; [email protected]; Paul Rosswurm
> <[email protected]>
> Subject: [EXTERNAL] Re: [PATCH net-next v3] net: mana: Add Interrupt
> Moderation support
>
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
> Full review at:
> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fsashiko.
> dev%2F&data=05%7C02%7Chaiyangz%40microsoft.com%7Ccbca53cce3d24759006708dec
> 924a556%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C639169356403391033%7C
> Unknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJX
> aW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=D2ZPGJ5XznEkcL
> uH3Cxtze0FYOL3xxDIY%2Bg2DP508Q8%3D&reserved=0
> ---
> Date: Thu Jun 11 19:02:22 2026 +0000
>
> net: mana: Add Interrupt Moderation support
>
> This commit adds Static and Dynamic Interrupt Moderation (DIM) support
> for
> Rx and Tx. It updates the queue creation procedure, adds functions to
> collect stats for DIM, and provides ethtool handler support to get/set
> the moderation settings.
>
> > diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c
> b/drivers/net/ethernet/microsoft/mana/mana_en.c
> > index 26aef21c6c2c8..d92e022d8533c 100644
> > --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> > +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> > @@ -1579,6 +1579,9 @@ int mana_create_wq_obj(struct mana_port_context
> *apc,
> >
> > mana_gd_init_req_hdr(&req.hdr, MANA_CREATE_WQ_OBJ,
> > sizeof(req), sizeof(resp));
> > +
> > + req.hdr.req.msg_version = GDMA_MESSAGE_V3;
> > + req.hdr.resp.msg_version = GDMA_MESSAGE_V2;
> > req.vport = vport;
> > req.wq_type = wq_type;
> > req.wq_gdma_region = wq_spec->gdma_region;
> > @@ -1587,6 +1590,9 @@ int mana_create_wq_obj(struct mana_port_context
> *apc,
> > req.cq_size = cq_spec->queue_size;
> > req.cq_moderation_ctx_id = cq_spec->modr_ctx_id;
> > req.cq_parent_qid = cq_spec->attached_eq;
> > + req.req_cq_moderation = cq_spec->req_cq_moderation;
> > + req.cq_moderation_comp = cq_spec->cq_moderation_comp;
> > + req.cq_moderation_usec = cq_spec->cq_moderation_usec;
>
> [Severity: High]
> Does this break queue creation on older Azure host firmwares?
>
> The code unconditionally bumps the WQ object creation request to use
> GDMA_MESSAGE_V3 and passes the req_cq_moderation flag. If the firmware
> does
> not support GDMA_PF_CAP_FLAG_1_DYN_INTERRUPT_MODERATION, could the
> firmware
> reject the queue creation request and break device initialization
> entirely?
Our firmware is forward compatible with newer message versions, so the
old firmware still properly handles this message, just the new feature
fields are ignored, and queue creation will be successful.
And if the DIM capability bit is zero from FW, driver will keep the DIM
feature to be off and unchangeable.
>
> [ ... ]
>
> > +static void mana_update_tx_dim(struct mana_cq *cq)
> > +{
> > + struct mana_port_context *apc = netdev_priv(cq->txq->ndev);
> > + struct dim_sample dim_sample = {};
> > + struct mana_txq *txq = cq->txq;
> > +
> > + if (!apc->tx_dim_enabled)
> > + return;
> > +
> > + dim_update_sample(READ_ONCE(cq->dim_event_ctr), txq->stats.packets,
> > + txq->stats.bytes, &dim_sample);
> > + net_dim(&cq->dim, &dim_sample);
> > +}
>
> [Severity: High]
> Does this measure the CPU enqueue rate rather than the hardware completion
> rate?
>
> It appears that txq->stats.packets is incremented in mana_start_xmit()
> upon enqueue, not during completion in mana_poll_tx_cq(). Dynamic
> interrupt
> moderation relies on measuring the actual hardware completion rate to
> function correctly.
>
> Additionally, is it safe to read txq->stats.packets here without the
> u64_stats_fetch_begin() retry loop? Reading a 64-bit statistic locklessly
> without the synchronization primitives could result in torn reads on 32-
> bit
> systems.
In the next patch version, I added counters to cq, and track the HW comp rate,
And the reader is in the same NAPI thread to prevent any torn reads.
>
> [ ... ]
>
> > @@ -2324,7 +2434,13 @@ static int mana_cq_handler(void *context, struct
> gdma_queue *gdma_queue)
> > if (w < cq->budget) {
> > mana_gd_ring_cq(gdma_queue, SET_ARM_BIT);
> > cq->work_done_since_doorbell = 0;
> > - napi_complete_done(&cq->napi, w);
> > +
> > + if (napi_complete_done(&cq->napi, w)) {
> > + if (cq->type == MANA_CQ_TYPE_RX)
> > + mana_update_rx_dim(cq);
> > + else
> > + mana_update_tx_dim(cq);
> > + }
>
> [Severity: High]
> Is it safe to call mana_update_rx_dim() and mana_update_tx_dim() after
> napi_complete_done()?
>
> Since napi_complete_done() ends the NAPI polling cycle and relinquishes
> mutual exclusion, and the hardware interrupt was already re-armed by
> mana_gd_ring_cq() right before this, couldn't a new packet trigger an
> interrupt immediately? This could allow NAPI to be scheduled on another
> CPU
> concurrently while this CPU is still running the DIM update, potentially
> corrupting the non-thread-safe net_dim() state machine.
Thanks, I moved the dim update before the napi_complete_done() to avoid
concurrently running it.
>
> [ ... ]
>
> > @@ -2562,6 +2685,9 @@ static int mana_create_txq(struct
> mana_port_context *apc,
> > napi_enable_locked(&cq->napi);
> > txq->napi_initialized = true;
> >
> > + INIT_WORK(&cq->dim.work, mana_tx_dim_work);
> > + cq->dim.mode = DIM_CQ_PERIOD_MODE_START_FROM_EQE;
>
> [Severity: High]
> Could this allow NAPI to schedule an uninitialized work_struct?
>
> napi_enable_locked() is called right before INIT_WORK(). Once NAPI is
> enabled, it can be scheduled to run immediately (for example, via
> netpoll).
> If NAPI executes before INIT_WORK completes, mana_update_tx_dim() would
> pass the uninitialized cq->dim to net_dim(), which could cause a kernel
> panic when it attempts to call schedule_work().
>
> This pattern also occurs in mana_create_rxq().
I have updated the init order in the new version.
>
> > diff --git a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> > index 94e658d07a27e..67a2b282ff4da 100644
> > --- a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> > +++ b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
>
> [ ... ]
>
> > @@ -440,18 +474,123 @@ static int mana_set_coalesce(struct net_device
> *ndev,
> > + if (enable_rx_dim)
> > + apc->rx_dim_enabled = true;
> > + if (enable_tx_dim)
> > + apc->tx_dim_enabled = true;
>
> [Severity: High]
> Could the lack of memory barriers here expose uninitialized DIM structures
> to NAPI?
>
> The driver initializes DIM via mana_dim_change() and then sets
> apc->rx_dim_enabled to true. Without an smp_store_release() here and a
> corresponding smp_load_acquire() in mana_update_rx_dim(), weakly-ordered
> CPUs like ARM64 might reorder the stores. Concurrently, NAPI polling might
> observe the flag as true before the initialization is fully visible in
> memory,
> potentially invoking net_dim() on garbage memory.
I added smp_store_release() & smp_load_acquire() to fix it in the next version.
And, I will submit the next version soon.
Thanks,
- Haiyang