On 5/4/2026 10:14 PM, Jacob Keller wrote: > From: Bart Van Assche <[email protected]> > > Move the mutex_lock() call up to prevent that DCB settings change after > the first ice_query_port_ets() call. The second ice_query_port_ets() > call in ice_dcb_rebuild() is already protected by pf->tc_mutex. > > This also fixes a bug in an error path, as before taking the first > "goto dcb_error" in the function jumped over mutex_lock() to > mutex_unlock(). > > This bug has been detected by the clang thread-safety analyzer. > > Cc: Aleksandr Loktionov <[email protected]> > Cc: [email protected] > Fixes: 242b5e068b25 ("ice: Fix DCB rebuild after reset") > Signed-off-by: Bart Van Assche <[email protected]> > Reviewed-by: Aleksandr Loktionov <[email protected]> > Reviewed-by: Przemek Kitszel <[email protected]> > Tested-by: Arpana Arland <[email protected]> > Signed-off-by: Jacob Keller <[email protected]> > --- > drivers/net/ethernet/intel/ice/ice_dcb_lib.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/ethernet/intel/ice/ice_dcb_lib.c > b/drivers/net/ethernet/intel/ice/ice_dcb_lib.c > index 16aa25535152..0bc6dd375687 100644 > --- a/drivers/net/ethernet/intel/ice/ice_dcb_lib.c > +++ b/drivers/net/ethernet/intel/ice/ice_dcb_lib.c > @@ -537,14 +537,14 @@ void ice_dcb_rebuild(struct ice_pf *pf) > struct ice_dcbx_cfg *err_cfg; > int ret; > > + mutex_lock(&pf->tc_mutex); > + > ret = ice_query_port_ets(pf->hw.port_info, &buf, sizeof(buf), NULL); > if (ret) { > dev_err(dev, "Query Port ETS failed\n"); > goto dcb_error; > } > > - mutex_lock(&pf->tc_mutex); > - > if (!pf->hw.port_info->qos_cfg.is_sw_lldp) > ice_cfg_etsrec_defaults(pf->hw.port_info); > >
Sashiko thinks there is a problem here: > Does moving mutex_lock(&pf->tc_mutex) to the top of ice_dcb_rebuild() > introduce an AB-BA deadlock with rtnl_lock? > In standard DCB netlink operations, the netlink layer calls into the driver > with rtnl_lock already held, and the driver subsequently acquires > pf->tc_mutex. This establishes a required lock ordering of rtnl_lock > followed by pf->tc_mutex. > With this change, if the first ice_query_port_ets() fails, we take the goto > dcb_error path while already holding pf->tc_mutex. > Looking at the error cleanup path lower down in ice_dcb_rebuild(): > dcb_error: > dev_err(dev, "Disabling DCB until new settings occur\n"); > [ ... ] > ice_pf_dcb_cfg(pf, err_cfg, false); > Because the locked parameter is passed as false, ice_pf_dcb_cfg() will > unconditionally acquire rtnl_lock(). > Does this create a lock inversion (pf->tc_mutex followed by rtnl_lock) > against concurrent DCB netlink operations? This seems like a fully pre-existing error. We already jump to dcb_error else where in the function. I don't know if this locking order really is an ABBA violation (I did not review any of the other flows that take tc_mutex to confirm), but I don't think it should hold this fix. Someone from the ice team will need to investigate and see what the best solution is. I suspect we'll have to take RTNL lock then the tc_mutex and pass true to the ice_pf_dcb_cfg function. Or, better yet, see if this converts to the netdev per-instance lock and we could drop the tc_mutex entirely, relying on netdev_lock? Thanks, Jake
