> Long Li <[email protected]> wrote: > > > When DPDK stops a netvsc device (e.g. on testpmd quit), the data path > > was left pointing to the VF/MANA device. If the kernel netvsc driver > > subsequently reloads the MANA device and opens it, incoming traffic > > arrives on the MANA device immediately, before the queues are fully > > initialized. This causes bogus RX completion events to appear on the > > TX completion queue, triggering a kernel WARNING in mana_poll_tx_cq(). > > > > Fix this by switching the data path back to synthetic (via > > NVS_DATAPATH_SYNTHETIC) in hn_vf_stop() before stopping the VF device. > > This tells the host to route traffic through the synthetic path, so > > that when the MANA driver recreates its queues, no unexpected traffic > > arrives until netvsc explicitly switches back to VF. > > > > Also update hn_vf_start() to switch the data path back to VF after the > > VF device is started, enabling correct stop/start cycling. > > > > Both functions now use write locks instead of read locks since they > > modify vf_vsc_switched state. > > > > Fixes: dc7680e8597c ("net/netvsc: support integrated VF") > > Cc: [email protected] > > > > Signed-off-by: Long Li <[email protected]> > > Looks good to me you might want to address the error condition spotted by AI > review. > > --- > > **Patch: net/netvsc: switch data path to synthetic on device stop** > > This patch addresses a real race condition where stopping a netvsc device > leaves > the data path pointing to the VF/MANA device, causing kernel warnings when the > MANA driver later reinitializes. The fix is logically sound — switch to > synthetic > before stopping the VF, and re-switch to VF on start. > > **Error: `hn_vf_stop()` — `vf_vsc_switched` cleared even when > `hn_nvs_set_datapath()` fails** > > In `hn_vf_stop()`, if `hn_nvs_set_datapath(hv, NVS_DATAPATH_SYNTHETIC)` fails, > `vf_vsc_switched` is unconditionally set to `false`. This means the driver > believes > it switched to synthetic when the host may still be routing traffic through > the VF. > On a subsequent `hn_vf_start()`, the `!hv->vf_ctx.vf_vsc_switched` check will > pass and the driver will try to re-switch to VF — but since the host never > left VF > mode, this is a no-op at best or confusing at worst. More importantly, if > stop is > being called on the path to teardown, the flag is now wrong. > > I note that `hn_vf_remove_unlocked()` has the same pattern (clears the flag > regardless, with the comment "Clear switched flag regardless — VF is being > removed"), so this may be intentional for netvsc since on the remove path you > want to forget the state. But on the stop path the device is still present > and will > be restarted — propagating the error and leaving `vf_vsc_switched = true` > might > be more correct so that `hn_vf_start()` retries the switch. Worth confirming > this is > intentional. > > **Warning: `hn_vf_start()` — error from `hn_nvs_set_datapath()` returned but > VF > device left started** > > In `hn_vf_start()`, if `rte_eth_dev_start()` succeeds but the subsequent > `hn_nvs_set_datapath(hv, NVS_DATAPATH_VF)` fails, the function logs the error > and returns the failure code. However, the VF device is left in the started > state. > The caller sees a failure from `hn_vf_start()`, but the VF is running with no > traffic > routed to it. This is a resource consistency issue — if the datapath switch > fails, > should the VF be stopped again to maintain consistent state?
I'm sending v2 to fix this. > > **Warning: `hn_vf_add_unlocked()` — change defers datapath switch but still > returns 0 on the deferred path** > > The patch modifies `hn_vf_add_unlocked()` to skip the datapath switch when > `!dev->data->dev_started`. This is correct, but note that in the original > code the > function would return the result of `hn_nvs_set_datapath()` — if that failed, > it > returned an error. Now on the deferred path, `ret` retains whatever value it > had > from the attach/configure path (could be 0 from a successful attach), so the > caller > gets success even though the datapath switch was not attempted. This is fine > for > the hot-add-before-start case, just noting the behavior change. > This warning can be ignored. Thanks, Long > **Info: Lock upgrade from read to write is correct** > > Both `hn_vf_start()` and `hn_vf_stop()` correctly switch from > `rte_rwlock_read_lock` to `rte_rwlock_write_lock` since they now modify > `vf_vsc_switched`. This matches the locking pattern used by `hn_vf_close()` > and > `hn_nvs_handle_vfassoc()`.

