Re: [PATCH net 1/2] net: mana: Add NULL guards in teardown path to prevent panic on attach failure

sashiko-bot Tue, 19 May 2026 12:53:17 -0700

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) 
to consider:
- [High] Missing re-allocation of `apc->rxqs` during interface bring-up causes 
deterministic NULL pointer dereference
- [High] Uninitialized queue handles (0 and INVALID_MANA_HANDLE) are sent to 
the firmware during error teardown, corrupting hardware state
- [High] GDMA admin commands are issued immediately after a PCI Function Level 
Reset (FLR), causing severe cascading timeouts and hardware state violations
--


commit 2129283d37aabfe724d7509e34346eab93d302a2
Author: Dipayaan Roy <[email protected]>

net: mana: Add NULL guards in teardown path to prevent panic on attach failure
    
When queue allocation fails partway through, the error cleanup frees
and NULLs apc->tx_qp and apc->rxqs. The patch adds NULL guards for these
pointers in teardown paths such as mana_destroy_vport(), mana_fence_rqs(),
and mana_dealloc_queues() to prevent NULL pointer dereferences.

> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c 
> b/drivers/net/ethernet/microsoft/mana/mana_en.c
> index 9afc786b297a8..9e7e4bf526bf6 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
[ ... ]
> @@ -2858,13 +2861,16 @@ static void mana_destroy_vport(struct 
> mana_port_context *apc)
>       struct mana_rxq *rxq;
>       u32 rxq_idx;
>  
> -     for (rxq_idx = 0; rxq_idx < apc->num_queues; rxq_idx++) {
> -             rxq = apc->rxqs[rxq_idx];
> -             if (!rxq)
> -                     continue;
> +     if (apc->rxqs) {
>  
> -             mana_destroy_rxq(apc, rxq, true);
> -             apc->rxqs[rxq_idx] = NULL;
> +             for (rxq_idx = 0; rxq_idx < apc->num_queues; rxq_idx++) {
> +                     rxq = apc->rxqs[rxq_idx];
> +                     if (!rxq)
> +                             continue;
> +
> +                     mana_destroy_rxq(apc, rxq, true);
> +                     apc->rxqs[rxq_idx] = NULL;
> +             }
>       }

If the teardown path frees apc->rxqs and sets it to NULL on attach failure,
how does the interface recover if the user subsequently brings it back up?

When mana_open() is invoked, it directly calls mana_alloc_queues(). Inside
mana_alloc_queues() -> mana_add_rx_queues(), the code does:

    apc->rxqs[i] = rxq;

If apc->rxqs was left as NULL during the teardown, will this cause a NULL
pointer dereference on the next interface bring-up?

[ ... ]
> @@ -3287,33 +3294,38 @@ static int mana_dealloc_queues(struct net_device 
> *ndev)
>        * number of queues.
>        */
>  
> -     for (i = 0; i < apc->num_queues; i++) {
> -             txq = &apc->tx_qp[i].txq;
> -             tsleep = 1000;
> -             while (atomic_read(&txq->pending_sends) > 0 &&
> -                    time_before(jiffies, timeout)) {
> -                     usleep_range(tsleep, tsleep + 1000);
> -                     tsleep <<= 1;
> -             }
> -             if (atomic_read(&txq->pending_sends)) {
> -                     err = pcie_flr(to_pci_dev(gd->gdma_context->dev));
> -                     if (err) {
> -                             netdev_err(ndev, "flr failed %d with %d pkts 
> pending in txq %u\n",
> -                                        err, 
> atomic_read(&txq->pending_sends),
> -                                        txq->gdma_txq_id);
> +     if (apc->tx_qp) {

In the error path where queue creation fails partway through, will
mana_destroy_txq() be called?

If mana_destroy_txq() unconditionally iterates over all apc->num_queues:

    for (i = 0; i < apc->num_queues; i++) {
        ...
        mana_destroy_wq_obj(apc, GDMA_SQ, apc->tx_qp[i].tx_object);
        ...

Will it pass 0 or INVALID_MANA_HANDLE to mana_destroy_wq_obj() for queues
that were never fully initialized?

Does mana_destroy_wq_obj() send a MANA_DESTROY_WQ_OBJ command to the hardware
with these uninitialized handles, and could handle 0 corrupt the firmware
state?

> +             for (i = 0; i < apc->num_queues; i++) {
> +                     txq = &apc->tx_qp[i].txq;
> +                     tsleep = 1000;
> +                     while (atomic_read(&txq->pending_sends) > 0 &&
> +                            time_before(jiffies, timeout)) {
> +                             usleep_range(tsleep, tsleep + 1000);
> +                             tsleep <<= 1;
> +                     }
> +                     if (atomic_read(&txq->pending_sends)) {
> +                             err =
> +                                 pcie_flr(to_pci_dev(gd->gdma_context->dev));

If pcie_flr() is triggered here, the PCI Function Level Reset wipes the
internal GDMA state.

Further down in this function, mana_disable_vport_rx(), mana_fence_rqs(),
and mana_destroy_vport() are called. Will these functions attempt to send
commands to the now-destroyed GDMA admin queue via mana_send_request() and
wait for completions?

Could this lead to cascading timeouts since the hardware was just reset and
cannot process admin commands?

> +                             if (err) {
> +                                     netdev_err(ndev, "flr failed %d with %d 
> pkts pending in txq %u\n",
> +                                                err,
> +                                         atomic_read(&txq->pending_sends),
> +                                         txq->gdma_txq_id);
> +                             }
> +                             break;
>                       }
> -                     break;
>               }

-- 
Sashiko AI review · 
https://sashiko.dev/#/patchset/[email protected]?part=1

Re: [PATCH net 1/2] net: mana: Add NULL guards in teardown path to prevent panic on attach failure

Reply via email to