On 5/4/2026 10:14 PM, Jacob Keller wrote:
> From: Bart Van Assche <[email protected]>
> 
> Move the mutex_lock() call up to prevent that DCB settings change after
> the first ice_query_port_ets() call. The second ice_query_port_ets()
> call in ice_dcb_rebuild() is already protected by pf->tc_mutex.
> 
> This also fixes a bug in an error path, as before taking the first
> "goto dcb_error" in the function jumped over mutex_lock() to
> mutex_unlock().
> 
> This bug has been detected by the clang thread-safety analyzer.
> 
> Cc: Aleksandr Loktionov <[email protected]>
> Cc: [email protected]
> Fixes: 242b5e068b25 ("ice: Fix DCB rebuild after reset")
> Signed-off-by: Bart Van Assche <[email protected]>
> Reviewed-by: Aleksandr Loktionov <[email protected]>
> Reviewed-by: Przemek Kitszel <[email protected]>
> Tested-by: Arpana Arland <[email protected]>
> Signed-off-by: Jacob Keller <[email protected]>
> ---
>  drivers/net/ethernet/intel/ice/ice_dcb_lib.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/ice/ice_dcb_lib.c 
> b/drivers/net/ethernet/intel/ice/ice_dcb_lib.c
> index 16aa25535152..0bc6dd375687 100644
> --- a/drivers/net/ethernet/intel/ice/ice_dcb_lib.c
> +++ b/drivers/net/ethernet/intel/ice/ice_dcb_lib.c
> @@ -537,14 +537,14 @@ void ice_dcb_rebuild(struct ice_pf *pf)
>       struct ice_dcbx_cfg *err_cfg;
>       int ret;
>  
> +     mutex_lock(&pf->tc_mutex);
> +
>       ret = ice_query_port_ets(pf->hw.port_info, &buf, sizeof(buf), NULL);
>       if (ret) {
>               dev_err(dev, "Query Port ETS failed\n");
>               goto dcb_error;
>       }
>  
> -     mutex_lock(&pf->tc_mutex);
> -
>       if (!pf->hw.port_info->qos_cfg.is_sw_lldp)
>               ice_cfg_etsrec_defaults(pf->hw.port_info);
>  
> 

Sashiko thinks there is a problem here:

> Does moving mutex_lock(&pf->tc_mutex) to the top of ice_dcb_rebuild()
> introduce an AB-BA deadlock with rtnl_lock?
> In standard DCB netlink operations, the netlink layer calls into the driver
> with rtnl_lock already held, and the driver subsequently acquires
> pf->tc_mutex. This establishes a required lock ordering of rtnl_lock
> followed by pf->tc_mutex.
> With this change, if the first ice_query_port_ets() fails, we take the goto
> dcb_error path while already holding pf->tc_mutex.
> Looking at the error cleanup path lower down in ice_dcb_rebuild():
> dcb_error:
>     dev_err(dev, "Disabling DCB until new settings occur\n");
>     [ ... ]
>     ice_pf_dcb_cfg(pf, err_cfg, false);
> Because the locked parameter is passed as false, ice_pf_dcb_cfg() will
> unconditionally acquire rtnl_lock().
> Does this create a lock inversion (pf->tc_mutex followed by rtnl_lock)
> against concurrent DCB netlink operations?
This seems like a fully pre-existing error. We already jump to dcb_error
else where in the function.

I don't know if this locking order really is an ABBA violation (I did
not review any of the other flows that take tc_mutex to confirm), but I
don't think it should hold this fix.

Someone from the ice team will need to investigate and see what the best
solution is. I suspect we'll have to take RTNL lock then the tc_mutex
and pass true to the ice_pf_dcb_cfg function. Or, better yet, see if
this converts to the netdev per-instance lock and we could drop the
tc_mutex entirely, relying on netdev_lock?

Thanks,
Jake

Reply via email to