> Subject: [EXTERNAL] Re: [PATCH v2 2/8] net/netvsc: fix race conditions on VF > add/remove events > > On Fri, 20 Feb 2026 18:45:21 -0800 > [email protected] wrote: > > > From: Long Li <[email protected]> > > > > Netvsc gets notification from VSP on VF add/remove over VMBUS, but the > > timing may not match the DPDK sequence of device events triggered from > > uevents from kernel. > > > > Remove the retry logic from the code when attach to VF and rely on > > DPDK event to attach to VF. With this change, both the notifications > > from VSP and the DPDK will attempt a VF attach. > > > > Also implement locking when checking on all VF related fields. > > > > Fixes: a2a23a794b3a ("net/netvsc: support VF device hot add/remove") > > Cc: [email protected] > > > > Signed-off-by: Long Li <[email protected]> > > AI review spotted related issue. > > **Patch 2 (net/netvsc: fix race conditions on VF add/remove events)** — the > most complex patch in the series. > > **What it fixes correctly:** > > The old Tx/Rx paths had a TOCTOU race: they checked `vf_vsc_switched` without > the lock, acquired the lock, then re-checked. A VF remove could complete > between the first check and the lock acquisition. The new code takes the read > lock *before* any VF state checks — correct fix. The lock is properly > released on > both paths. > > The upgrade of `hn_vf_close()` from read lock to write lock is also a real > bug fix, > since it modifies `vf_attached` and calls `rte_eth_dev_close()`. > > Moving callback registration into `hn_vf_attach()` with proper rollback (via > the > new `hn_vf_detach()` helper) is a good structural improvement that ties > callback > lifetime to VF attach/detach lifecycle. > > The unconditional clear of `vf_vsc_switched` in `hn_vf_remove_unlocked()` is > correct — if the VF is being removed, the switched flag must be cleared > regardless of whether `hn_nvs_set_datapath(SYNTHETIC)` succeeded. > > **One potential concern (~60% confidence):** > > If the VF is successfully configured and started but `hn_nvs_set_datapath(VF)` > fails at the `switch_data_path:` label, the function returns an error but > leaves the > VF started and attached. The callers don't clean this up. This is a > pre-existing > design issue the patch doesn't worsen, and the hypervisor may retry, but it > could > confuse subsequent add/remove cycles.
I think it's safe to leave this code as is. Although VF is started and attached, netvsc will never use the VF since vf_vsc_switched is not set. This is similar to what netvsc does in the kernel.

