> Long Li <[email protected]> wrote:
> 
> > When DPDK stops a netvsc device (e.g. on testpmd quit), the data path
> > was left pointing to the VF/MANA device. If the kernel netvsc driver
> > subsequently reloads the MANA device and opens it, incoming traffic
> > arrives on the MANA device immediately, before the queues are fully
> > initialized. This causes bogus RX completion events to appear on the
> > TX completion queue, triggering a kernel WARNING in mana_poll_tx_cq().
> >
> > Fix this by switching the data path back to synthetic (via
> > NVS_DATAPATH_SYNTHETIC) in hn_vf_stop() before stopping the VF device.
> > This tells the host to route traffic through the synthetic path, so
> > that when the MANA driver recreates its queues, no unexpected traffic
> > arrives until netvsc explicitly switches back to VF.
> >
> > Also update hn_vf_start() to switch the data path back to VF after the
> > VF device is started, enabling correct stop/start cycling.
> >
> > Both functions now use write locks instead of read locks since they
> > modify vf_vsc_switched state.
> >
> > Fixes: dc7680e8597c ("net/netvsc: support integrated VF")
> > Cc: [email protected]
> >
> > Signed-off-by: Long Li <[email protected]>
> 
> Looks good to me you might want to address the error condition spotted by AI
> review.
> 
> ---
> 
> **Patch: net/netvsc: switch data path to synthetic on device stop**
> 
> This patch addresses a real race condition where stopping a netvsc device 
> leaves
> the data path pointing to the VF/MANA device, causing kernel warnings when the
> MANA driver later reinitializes. The fix is logically sound — switch to 
> synthetic
> before stopping the VF, and re-switch to VF on start.
> 
> **Error: `hn_vf_stop()` — `vf_vsc_switched` cleared even when
> `hn_nvs_set_datapath()` fails**
> 
> In `hn_vf_stop()`, if `hn_nvs_set_datapath(hv, NVS_DATAPATH_SYNTHETIC)` fails,
> `vf_vsc_switched` is unconditionally set to `false`. This means the driver 
> believes
> it switched to synthetic when the host may still be routing traffic through 
> the VF.
> On a subsequent `hn_vf_start()`, the `!hv->vf_ctx.vf_vsc_switched` check will
> pass and the driver will try to re-switch to VF — but since the host never 
> left VF
> mode, this is a no-op at best or confusing at worst. More importantly, if 
> stop is
> being called on the path to teardown, the flag is now wrong.
> 
> I note that `hn_vf_remove_unlocked()` has the same pattern (clears the flag
> regardless, with the comment "Clear switched flag regardless — VF is being
> removed"), so this may be intentional for netvsc since on the remove path you
> want to forget the state. But on the stop path the device is still present 
> and will
> be restarted — propagating the error and leaving `vf_vsc_switched = true` 
> might
> be more correct so that `hn_vf_start()` retries the switch. Worth confirming 
> this is
> intentional.
> 
> **Warning: `hn_vf_start()` — error from `hn_nvs_set_datapath()` returned but 
> VF
> device left started**
> 
> In `hn_vf_start()`, if `rte_eth_dev_start()` succeeds but the subsequent
> `hn_nvs_set_datapath(hv, NVS_DATAPATH_VF)` fails, the function logs the error
> and returns the failure code. However, the VF device is left in the started 
> state.
> The caller sees a failure from `hn_vf_start()`, but the VF is running with no 
> traffic
> routed to it. This is a resource consistency issue — if the datapath switch 
> fails,
> should the VF be stopped again to maintain consistent state?

I'm sending v2 to fix this.

> 
> **Warning: `hn_vf_add_unlocked()` — change defers datapath switch but still
> returns 0 on the deferred path**
> 
> The patch modifies `hn_vf_add_unlocked()` to skip the datapath switch when
> `!dev->data->dev_started`. This is correct, but note that in the original 
> code the
> function would return the result of `hn_nvs_set_datapath()` — if that failed, 
> it
> returned an error. Now on the deferred path, `ret` retains whatever value it 
> had
> from the attach/configure path (could be 0 from a successful attach), so the 
> caller
> gets success even though the datapath switch was not attempted. This is fine 
> for
> the hot-add-before-start case, just noting the behavior change.
> 

This warning can be ignored.

Thanks,
Long

> **Info: Lock upgrade from read to write is correct**
> 
> Both `hn_vf_start()` and `hn_vf_stop()` correctly switch from
> `rte_rwlock_read_lock` to `rte_rwlock_write_lock` since they now modify
> `vf_vsc_switched`. This matches the locking pattern used by `hn_vf_close()` 
> and
> `hn_nvs_handle_vfassoc()`.

Reply via email to