On Thu, Jul 17, 2025 at 09:57:09AM -0700, Jacob Keller wrote:
> The ice_cfg_tx_topo function attempts to apply Tx scheduler topology
> configuration based on NVM parameters, selecting either a 5 or 9 layer
> topology.
> 
> As part of this flow, the driver acquires the "Global Configuration Lock",
> which is a hardware resource associated with programming the DDP package
> to the device. This "lock" is implemented by firmware as a way to
> guarantee that only one PF can program the DDP for a device. Unlike a
> traditional lock, once a PF has acquired this lock, no other PF will be
> able to acquire it again (including that PF) until a CORER of the device.
> Future requests to acquire the lock report that global configuration has
> already completed.
> 
> The following flow is used to program the Tx topology:
> 
>  * Read the DDP package for scheduler configuration data
>  * Acquire the global configuration lock
>  * Program Tx scheduler topology according to DDP package data
>  * Trigger a CORER which clears the global configuration lock
> 
> This is followed by the flow for programming the DDP package:
> 
>  * Acquire the global configuration lock (again)
>  * Download the DDP package to the device
>  * Release the global configuration lock.
> 
> However, if configuration of the Tx topology fails, (i.e.
> ice_get_set_tx_topo returns an error code), the driver exits
> ice_cfg_tx_topo() immediately, and fails to trigger CORER.
> 
> While the global configuration lock is held, the firmware rejects most
> AdminQ commands, as it is waiting for the DDP package download (or Tx
> scheduler topology programming) to occur.
> 
> The current driver flows assume that the global configuration lock has been
> reset by CORER after programming the Tx topology. Thus, the same PF
> attempts to acquire the global lock again, and fails. This results in the
> driver reporting "an unknown error occurred when loading the DDP package".
> It then attempts to enter safe mode, but ultimately fails to finish
> ice_probe() since nearly all AdminQ command report error codes, and the
> driver stops loading the device at some point during its initialization.
> 
> The only currently known way that ice_get_set_tx_topo() can fail is with
> certain older DDP packages which contain invalid topology configuration, on
> firmware versions which strictly validate this data. The most recent
> releases of the DDP have resolved the invalid data. However, it is still
> poor practice to essentially brick the device, and prevent access to the
> device even through safe mode or recovery mode. It is also plausible that
> this command could fail for some other reason in the future.
> 
> We cannot simply release the global lock after a failed call to
> ice_get_set_tx_topo(). Releasing the lock indicates to firmware that global
> configuration (downloading of the DDP) has completed. Future attempts by
> this or other PFs to load the DDP will fail with a report that the DDP
> package has already been downloaded. Then, PFs will enter safe mode as they
> realize that the package on the device does not meet the minimum version
> requirement to load. The reported error messages are confusing, as they
> indicate the version of the default "safe mode" package in the NVM, rather
> than the version of the file loaded from /lib/firmware.
> 
> Instead, we need to trigger CORER to clear global configuration. This is
> the lowest level of hardware reset which clears the global configuration
> lock and related state. It also clears any already downloaded DDP.
> Crucially, it does *not* clear the Tx scheduler topology configuration.
> 
> Refactor ice_cfg_tx_topo() to always trigger a CORER after acquiring the
> global lock, regardless of success or failure of the topology
> configuration.
> 
> We need to re-initialize the HW structure when we trigger the CORER. Thus,
> it makes sense for this to be the responsibility of ice_cfg_tx_topo()
> rather than its caller, ice_init_tx_topology(). This avoids needless
> re-initialization in cases where we don't attempt to update the Tx
> scheduler topology, such as if it has already been programmed.
> 
> There is one catch: failure to re-initialize the HW struct should stop
> ice_probe(). If this function fails, we won't have a valid HW structure and
> cannot ensure the device is functioning properly. To handle this, ensure
> ice_cfg_tx_topo() returns a limited set of error codes. Set aside one
> specifically, -ENODEV, to indicate that the ice_init_tx_topology() should
> fail and stop probe.
> 
> Other error codes indicate failure to apply the Tx scheduler topology. This
> is treated as a non-fatal error, with an informational message informing
> the system administrator that the updated Tx topology did not apply. This
> allows the device to load and function with the default Tx scheduler
> topology, rather than failing to load entirely.
> 
> Note that this use of CORER will not result in loops with future PFs
> attempting to also load the invalid Tx topology configuration. The first PF
> will acquire the global configuration lock as part of programming the DDP.
> Each PF after this will attempt to acquire the global lock as part of
> programming the Tx topology, and will fail with the indication from
> firmware that global configuration is already complete. Tx scheduler
> topology configuration is only performed during driver init (probe or
> devlink reload) and not during cleanup for a CORER that happens after probe
> completes.
> 
> Fixes: 91427e6d9030 ("ice: Support 5 layer topology")
> Signed-off-by: Jacob Keller <[email protected]>

Thanks for the extensive explanation.

I have a minor nit below, but that notwithstanding this looks good to me.

Reviewed-by: Simon Horman <[email protected]>


> ---
>  drivers/net/ethernet/intel/ice/ice_ddp.c  | 44 
> ++++++++++++++++++++++---------
>  drivers/net/ethernet/intel/ice/ice_main.c | 14 ++++++----
>  2 files changed, 41 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/ice/ice_ddp.c 
> b/drivers/net/ethernet/intel/ice/ice_ddp.c
> index 59323c019544..bc525de019de 100644
> --- a/drivers/net/ethernet/intel/ice/ice_ddp.c
> +++ b/drivers/net/ethernet/intel/ice/ice_ddp.c
> @@ -2374,7 +2374,13 @@ ice_get_set_tx_topo(struct ice_hw *hw, u8 *buf, u16 
> buf_size,
>   * The function will apply the new Tx topology from the package buffer
>   * if available.
>   *
> - * Return: zero when update was successful, negative values otherwise.
> + * Return:
> + * * 0 - Successfully applied topology configuration.
> + * * -EBUSY - Failed to acquire global configuration lock.
> + * * -EEXIST - Topology configuration has already been applied.
> + * * -EIO - Unable to apply topology configuration.
> + * * -ENODEV - Failed to re-initialize device after applying configuration.
> + * * Other negative error codes indicate unexpected failures.
>   */
>  int ice_cfg_tx_topo(struct ice_hw *hw, const void *buf, u32 len)
>  {
> @@ -2407,7 +2413,7 @@ int ice_cfg_tx_topo(struct ice_hw *hw, const void *buf, 
> u32 len)
>  
>       if (status) {
>               ice_debug(hw, ICE_DBG_INIT, "Get current topology is failed\n");
> -             return status;
> +             return -EIO;
>       }
>  
>       /* Is default topology already applied ? */
> @@ -2494,31 +2500,45 @@ int ice_cfg_tx_topo(struct ice_hw *hw, const void 
> *buf, u32 len)
>                                ICE_GLOBAL_CFG_LOCK_TIMEOUT);
>       if (status) {
>               ice_debug(hw, ICE_DBG_INIT, "Failed to acquire global lock\n");
> -             return status;
> +             return -EBUSY;
>       }
>  
>       /* Check if reset was triggered already. */
>       reg = rd32(hw, GLGEN_RSTAT);
>       if (reg & GLGEN_RSTAT_DEVSTATE_M) {
> -             /* Reset is in progress, re-init the HW again */
>               ice_debug(hw, ICE_DBG_INIT, "Reset is in progress. Layer 
> topology might be applied already\n");
>               ice_check_reset(hw);
> -             return 0;
> +             /* Reset is in progress, re-init the HW again */
> +             goto reinit_hw;
>       }
>  
>       /* Set new topology */
>       status = ice_get_set_tx_topo(hw, new_topo, size, NULL, NULL, true);
>       if (status) {
> -             ice_debug(hw, ICE_DBG_INIT, "Failed setting Tx topology\n");
> -             return status;
> +             ice_debug(hw, ICE_DBG_INIT, "Failed to set Tx topology, status 
> %pe\n",
> +                       ERR_PTR(status));
> +             /* only report -EIO here as the caller checks the error value
> +              * and reports an informational error message informing that
> +              * the driver failed to program Tx topology.
> +              */
> +             status = -EIO;
>       }
>  
> -     /* New topology is updated, delay 1 second before issuing the CORER */
> +     /* Even if Tx topology config failed, we need to CORE reset here to
> +      * clear the global configuration lock. Delay 1 second to allow
> +      * hardware to settle then issue a CORER
> +      */
>       msleep(1000);
>       ice_reset(hw, ICE_RESET_CORER);
> -     /* CORER will clear the global lock, so no explicit call
> -      * required for release.
> -      */
> +     ice_check_reset(hw);
>  
> -     return 0;
> +reinit_hw:

nit: I think you can move this label above ice_check_reset().
     As the only place that jumps to this label calls ice_check_reset()
     immediately before doing so. If so, renaming the label might
     also be appropriate (up to you on all fronts:)

> +     /* Since we triggered a CORER, re-initialize hardware */
> +     ice_deinit_hw(hw);
> +     if (ice_init_hw(hw)) {
> +             ice_debug(hw, ICE_DBG_INIT, "Failed to re-init hardware after 
> setting Tx topology\n");
> +             return -ENODEV;
> +     }
> +
> +     return status;
>  }

...

Reply via email to