On 12/5/2025 12:24 AM, Aaron Ma wrote:
After wakeup from suspend, IRDMA is initialized with error:

kernel: ice 0000:60:00.0: IRDMA hardware initialization FAILED init_state=4 status=-110
kernel: ice 0000:60:00.1: IRDMA hardware initialization FAILED init_state=4 status=-110
kernel: irdma.gen_2 ice.roce.1: probe with driver irdma.gen_2 failed with error -110
kernel: irdma.gen_2 ice.roce.2: probe with driver irdma.gen_2 failed with error -110

IRDMA times out because it is initialized before the scheduled reset.
The ice_init_rdma() function already calls ice_plug_aux_dev() internally,
ensuring proper initialization order.

Fixes: bc69ad74867db ("ice: avoid IRQ collision to fix init failure on ACPI S3 resume")
Reviewed-by: Aleksandr Loktionov <[email protected]>
Signed-off-by: Aaron Ma <[email protected]>
---
V1 -> V2: no changes.

  drivers/net/ethernet/intel/ice/ice_main.c | 12 ++++++------
  1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 2533876f1a2fd..c6dd04d24ac09 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -5677,11 +5677,6 @@ static int ice_resume(struct device *dev)
        if (ret)
                dev_err(dev, "Cannot restore interrupt scheme: %d\n", ret);
-       ret = ice_init_rdma(pf);
-       if (ret)
-               dev_err(dev, "Reinitialize RDMA during resume failed: %d\n",
-                       ret);
-
        clear_bit(ICE_DOWN, pf->state);
        /* Now perform PF reset and rebuild */
        reset_type = ICE_RESET_PFR;
@@ -7805,7 +7800,12 @@ static void ice_rebuild(struct ice_pf *pf, enum ice_reset_req reset_type)
        ice_health_clear(pf);
-       ice_plug_aux_dev(pf);
+       /* Initialize RDMA after control queues are ready */
+       err = ice_init_rdma(pf);

ice_init_rdma() allocates a new pf->cdev_info on each call. While that works for this particular flow, ice_rebuild() is called on all reset paths, so this can leak cdev_info because RDMA is not deinitialized on reset.
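
To illustrate the concern, here is a minimal user-space sketch. It is not driver code: struct leak_pf, the fixed-size pool, and every leak_* function are hypothetical stand-ins, and the only behavior taken from the above is that init allocates a fresh cdev_info on every call and that the reset path has no matching deinit.

```c
#include <stddef.h>

/* Hypothetical stand-ins; none of these names exist in the ice driver. */
struct leak_cdev_info { int in_use; };
struct leak_pf { struct leak_cdev_info *cdev_info; };

#define LEAK_POOL_SIZE 4
static struct leak_cdev_info leak_pool[LEAK_POOL_SIZE];

/* Toy allocator so that outstanding objects can be counted. */
static struct leak_cdev_info *leak_alloc(void)
{
	for (size_t i = 0; i < LEAK_POOL_SIZE; i++) {
		if (!leak_pool[i].in_use) {
			leak_pool[i].in_use = 1;
			return &leak_pool[i];
		}
	}
	return NULL;
}

static int leak_outstanding(void)
{
	int n = 0;

	for (size_t i = 0; i < LEAK_POOL_SIZE; i++)
		n += leak_pool[i].in_use;
	return n;
}

/* Models an init that allocates unconditionally on each call. */
static int leak_init_rdma(struct leak_pf *pf)
{
	struct leak_cdev_info *info = leak_alloc();

	if (!info)
		return -1;
	pf->cdev_info = info;	/* any previous pointer is overwritten */
	return 0;
}

static void leak_deinit_rdma(struct leak_pf *pf)
{
	if (!pf->cdev_info)
		return;
	pf->cdev_info->in_use = 0;
	pf->cdev_info = NULL;
}

/* Models a rebuild that re-runs init with no deinit on the reset path. */
static int leak_rebuild(struct leak_pf *pf)
{
	return leak_init_rdma(pf);
}
```

In this model, one init followed by one rebuild leaves two objects outstanding, and a single deinit can release only the most recent one.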

Additionally, ice_init_rdma() seems well placed in ice_resume(), where it mirrors the deinit in ice_suspend(). As you mentioned, the problem is caused by the plug occurring before a reset. I think the call to ice_plug_aux_dev() should be removed from ice_init_rdma() to stop this from happening. With that change, the plug won't occur before a reset; following the reset, the plug will be called as part of rebuild, when everything is up and ready. As ice_init_rdma() is also called in one other location (probe), ice_plug_aux_dev() should be added there after the RDMA init to preserve the current flow.

Corresponding changes should be made to the cleanup function as well. That is, mirror the removal of ice_plug_aux_dev() from ice_init_rdma() by removing ice_unplug_aux_dev() from ice_deinit_rdma(), and precede each call to ice_deinit_rdma() with ice_unplug_aux_dev().
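
The ordering I have in mind can be modeled roughly as below. This is a user-space sketch only: struct flow_pf, its flags, and all flow_* functions are hypothetical, and the PF reset that ice_resume() schedules is collapsed into a direct call to the rebuild step for simplicity.

```c
#include <stdbool.h>

/* Hypothetical model of PF state; this is not the real ice driver. */
struct flow_pf {
	bool rdma_inited;
	bool ctrlq_ready;	/* control queues usable (rebuild finished) */
	bool aux_plugged;
	int early_plugs;	/* plugs attempted before control queues were ready */
};

static void flow_plug_aux_dev(struct flow_pf *pf)
{
	if (!pf->ctrlq_ready)
		pf->early_plugs++;	/* the failure mode from the report */
	pf->aux_plugged = true;
}

static void flow_unplug_aux_dev(struct flow_pf *pf)
{
	pf->aux_plugged = false;
}

/* Init and deinit no longer plug or unplug on their own. */
static void flow_init_rdma(struct flow_pf *pf)
{
	pf->rdma_inited = true;
}

static void flow_deinit_rdma(struct flow_pf *pf)
{
	pf->rdma_inited = false;
}

/* Rebuild plugs only once everything is up and ready. */
static void flow_rebuild(struct flow_pf *pf)
{
	pf->ctrlq_ready = true;
	flow_plug_aux_dev(pf);
}

/* Probe: plug explicitly after the RDMA init to preserve the current flow. */
static void flow_probe(struct flow_pf *pf)
{
	pf->ctrlq_ready = true;
	flow_init_rdma(pf);
	flow_plug_aux_dev(pf);
}

/* Suspend: unplug precedes deinit. */
static void flow_suspend(struct flow_pf *pf)
{
	flow_unplug_aux_dev(pf);
	flow_deinit_rdma(pf);
	pf->ctrlq_ready = false;
}

/* Resume: init mirrors the deinit in suspend; the plug waits for rebuild. */
static void flow_resume(struct flow_pf *pf)
{
	flow_init_rdma(pf);
	flow_rebuild(pf);	/* stands in for the scheduled PF reset and rebuild */
}
```

With this split, a probe, suspend, resume sequence never plugs before the control queues are ready, and init stays paired with deinit across suspend and resume.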

Thanks,
Tony


+       if (err)
+               dev_err(dev, "Reinitialize RDMA after rebuild failed: %d\n",
+                       err);
+
        if (ice_is_feature_supported(pf, ICE_F_SRIOV_LAG))
                ice_lag_rebuild(pf);
