> Subject: [EXTERNAL] Re: [PATCH v2 2/8] net/netvsc: fix race conditions on VF
> add/remove events
> 
> On Fri, 20 Feb 2026 18:45:21 -0800
> [email protected] wrote:
> 
> > From: Long Li <[email protected]>
> >
> > Netvsc gets notification from VSP on VF add/remove over VMBUS, but the
> > timing may not match the DPDK sequence of device events triggered from
> > uevents from kernel.
> >
> > Remove the retry logic from the code when attach to VF and rely on
> > DPDK event to attach to VF. With this change, both the notifications
> > from VSP and the DPDK will attempt a VF attach.
> >
> > Also implement locking when checking on all VF related fields.
> >
> > Fixes: a2a23a794b3a ("net/netvsc: support VF device hot add/remove")
> > Cc: [email protected]
> >
> > Signed-off-by: Long Li <[email protected]>
> 
> AI review spotted related issue.
> 
> **Patch 2 (net/netvsc: fix race conditions on VF add/remove events)** — the
> most complex patch in the series.
> 
> **What it fixes correctly:**
> 
> The old Tx/Rx paths had a TOCTOU race: they checked `vf_vsc_switched` without
> the lock, acquired the lock, then re-checked. A VF remove could complete
> between the first check and the lock acquisition. The new code takes the read
> lock *before* any VF state checks — correct fix. The lock is properly 
> released on
> both paths.
> 
> The upgrade of `hn_vf_close()` from read lock to write lock is also a real 
> bug fix,
> since it modifies `vf_attached` and calls `rte_eth_dev_close()`.
> 
> Moving callback registration into `hn_vf_attach()` with proper rollback (via 
> the
> new `hn_vf_detach()` helper) is a good structural improvement that ties 
> callback
> lifetime to VF attach/detach lifecycle.
> 
> The unconditional clear of `vf_vsc_switched` in `hn_vf_remove_unlocked()` is
> correct — if the VF is being removed, the switched flag must be cleared
> regardless of whether `hn_nvs_set_datapath(SYNTHETIC)` succeeded.
> 
> **One potential concern (~60% confidence):**
> 
> If the VF is successfully configured and started but `hn_nvs_set_datapath(VF)`
> fails at the `switch_data_path:` label, the function returns an error but 
> leaves the
> VF started and attached. The callers don't clean this up. This is a 
> pre-existing
> design issue the patch doesn't worsen, and the hypervisor may retry, but it 
> could
> confuse subsequent add/remove cycles.

I think it's safe to leave this code as is. Although VF is started and 
attached, netvsc will never use the VF since vf_vsc_switched is not set. This 
is similar to what netvsc does in the kernel.

Reply via email to