On Thu, Jun 29, 2023 at 9:43 PM Ilya Maximets <i.maxim...@ovn.org> wrote: > > On 6/20/23 13:26, David Marchand wrote: > > At some point in OVS history, some virtio features were announced as > > supported (ECN and UFO virtio features). > > > > The userspace TSO code, which has been added later, does not support > > those features and tries to disable them. > > > > This breaks OVS upgrades: if an existing VM already negotiated such > > features, their lack on reconnection to an upgraded OVS triggers a > > vhost socket disconnection by Qemu. > > This results in an endless loop because Qemu then retries with the same > > set of virtio features. > > > > This patch proposes to try and detect those vhost socket disconnection > > and fallback restoring the old virtio features (and disabling TSO for this > > vhost port). > > I suppose, even if we can't fix live migration case, this patch sill > makes some sense, right? > > We will not be able to enable TSO by default though without some extra > work, i.e. adding some knobs...
Sorry, I have been thinking about the live migration case and did not reply yet on the other thread (well, I have no solution for it). Handling a simple case of upgrading OVS could still be handled with this current patch. So I think it still makes sense to take this patch. > > Some comments inline. > > > > > Signed-off-by: David Marchand <david.march...@redhat.com> > > Acked-by: Mike Pattrick <m...@redhat.com> > > --- > > Note: resending this v4 for CI (I added Mike ack). > > > > Changelog since v3: > > - updated documentation now that the interface offloads status is reported > > in ovsdb, > > - fixed one coding style issue, > > > > Changelog since v2: > > - reported workaround presence in the ovsdb port status field and > > updated documentation accordingly, > > - tried to use "better" names, to distinguish ECN virtio feature from > > TSO OVS netdev feature, > > > > Changelog since v1: > > - added a note in the documentation, > > - fixed vhost unregister trigger (so that both disabling and re-enabling > > TSO is handled), > > - cleared netdev features when disabling TSO, > > - changed level and ratelimited log message on vhost socket disconnect, > > > > --- > > Documentation/topics/userspace-tso.rst | 30 ++++++++- > > lib/netdev-dpdk.c | 85 +++++++++++++++++++++++++- > > 2 files changed, 111 insertions(+), 4 deletions(-) > > > > diff --git a/Documentation/topics/userspace-tso.rst > > b/Documentation/topics/userspace-tso.rst > > index 5a43c2e86b..d4111f1ab2 100644 > > --- a/Documentation/topics/userspace-tso.rst > > +++ b/Documentation/topics/userspace-tso.rst > > @@ -68,7 +68,7 @@ as follows. > > connection is established, `TSO` is thus advertised to the guest as an > > available feature: > > > > -QEMU Command Line Parameter:: > > +1. QEMU Command Line Parameter:: > > If you want to fix the numbered list, we should also fix the second point. > It is broken in two lines incorrectly (indentation is wrond), so it's not > counted as part of the numbered list, while this one will. Mm, not easy to notice on the generated documentation. I'll fix it too. > > > > > $ sudo $QEMU_DIR/x86_64-softmmu/qemu-system-x86_64 \ > > ... > > @@ -83,6 +83,34 @@ used to enable same:: > > $ ethtool -K eth0 tso on > > $ ethtool -k eth0 > > > > +**Note:** Enabling this feature impacts the virtio features exposed by the > > DPDK > > +vHost User backend to a guest. If a guest was already connected to OvS > > before > > +enabling TSO and restarting OvS, this guest ports won't have TSO > > available:: > > + > > + $ ovs-vsctl get interface vhost0 status | grep -o '[^ ]*_offload=[^ > > ]*"' > > + tx_ip_csum_offload="true" > > + tx_sctp_csum_offload="true" > > + tx_tcp_csum_offload="true" > > + tx_tcp_seg_offload="false" > > + tx_udp_csum_offload="true" > > + > > +To help diagnose the issue, those ports have some additional information in > > +their status field in ovsdb:: > > + > > + $ ovs-vsctl get interface vhost0 status | grep -o 'disabled_tso=[^ ]*"' > > + disabled_tso="true" > > + > > +To restore TSO for this guest ports, this guest QEMU process must be > > stopped, > > +then started again. OvS will then report:: > > + > > + $ ovs-vsctl get interface vhost0 status | grep -o '[^ ]*_offload=[^ > > ]*"' > > + tx_ip_csum_offload="true" > > + tx_sctp_csum_offload="true" > > + tx_tcp_csum_offload="true" > > + tx_tcp_seg_offload="true" > > + tx_udp_csum_offload="true" > > + $ ovs-vsctl get interface vhost0 status | grep -o 'disabled_tso=[^ ]*"' > > + > > ~~~~~~~~~~~ > > Limitations > > ~~~~~~~~~~~ > > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c > > index c6b41e8365..2dbb8b42e3 100644 > > --- a/lib/netdev-dpdk.c > > +++ b/lib/netdev-dpdk.c > > @@ -473,6 +473,15 @@ struct netdev_dpdk { > > /* True if vHost device is 'up' and has been reconfigured at least > > once */ > > bool vhost_reconfigured; > > > > + /* Set on driver start (which means after a vHost connection is > > + * accepted), and cleared when the vHost device gets configured. */ > > + bool vhost_initial_config; > > + > > + /* Set on disconnection if an initial configuration did not finish. > > + * This triggers a workaround for Virtio features negotiation, that > > + * makes TSO unavailable. */ > > + bool vhost_workaround_enable_ecn; > > + > > atomic_uint8_t vhost_tx_retries_max; > > /* 2 pad bytes here. */ > > These bytes are consumed now. Argh, I had made a mental note to fix this. > > > ); > > @@ -1359,6 +1368,7 @@ common_construct(struct netdev *netdev, dpdk_port_t > > port_no, > > dev->requested_lsc_interrupt_mode = 0; > > ovsrcu_index_init(&dev->vid, -1); > > dev->vhost_reconfigured = false; > > + dev->vhost_initial_config = false; > > dev->attached = false; > > dev->started = false; > > dev->reset_needed = false; > > @@ -3883,6 +3893,10 @@ netdev_dpdk_vhost_user_get_status(const struct > > netdev *netdev, > > xasprintf("%d", vring.size)); > > } > > > > + if (dev->vhost_workaround_enable_ecn) { > > + smap_add_format(args, "disabled_tso", "true"); > > I'm not sure if it's a good name, but I'm not sure how the good one > should look like either. Maybe "userspace-tso": "disabled" ? > We can put any string as a value. Either we go with your proposal, or we could add a mention in the documentation to check if the virtio features ECN bit is present. Something like: # if [ $(($(ovs-vsctl get interface vhost0 status:features | sed 's/"//g') & (1 << 13))) != 0 ]; then echo KO; else echo ok; fi This is not user friendly, but I don't think we really want users to start relying on a specific string for this ugly situation. WDYT? > > > + } > > + > > ovs_mutex_unlock(&dev->mutex); > > return 0; > > } > > @@ -4214,6 +4228,12 @@ netdev_dpdk_remap_txqs(struct netdev_dpdk *dev) > > free(enabled_queues); > > } > > > > +static bool > > +netdev_dpdk_vhost_tso_enabled(struct netdev_dpdk *dev) > > +{ > > + return userspace_tso_enabled() && !dev->vhost_workaround_enable_ecn; > > +} > > + > > /* > > * A new virtio-net device is added to a vhost port. > > */ > > @@ -4256,6 +4276,7 @@ new_device(int vid) > > } else { > > /* Reconfiguration not required. */ > > dev->vhost_reconfigured = true; > > + dev->vhost_initial_config = false; > > So, we're in the new_device() call. That means the guest successfully > negotiated features. Shouldn't we drop vhost_initial_config here > unconditionally? Why we need to wait for reconfiguration? Or does > QEMU disconnect after the new_device() ? Indeed, QEMU does not disconnect because of features once we reach the new_device stage. So it seems ok. > > > } > > > > if (rte_vhost_get_negotiated_features(vid, &features)) { > > @@ -4268,7 +4289,7 @@ new_device(int vid) > > dev->hw_ol_features |= NETDEV_TX_SCTP_CKSUM_OFFLOAD; > > } > > > > - if (userspace_tso_enabled()) { > > + if (netdev_dpdk_vhost_tso_enabled(dev)) { > > if (features & (1ULL << VIRTIO_NET_F_GUEST_TSO4) > > && features & (1ULL << VIRTIO_NET_F_GUEST_TSO6)) { > > > > @@ -4501,6 +4522,21 @@ vring_state_changed(int vid, uint16_t queue_id, int > > enable) > > return 0; > > } > > > > +/* Requires that the 'path' socket had been registered before. */ > > +static bool > > +dpdk_vhost_has_ecn_feature(const char *path) > > +{ > > + /* In the past we had this feature enabled, report it by default. */ > > + bool has_ecn_feature = true; > > + uint64_t driver_features; > > + > > + if (rte_vhost_driver_get_features(path, &driver_features) == 0) { > > + has_ecn_feature = driver_features & (1ULL << > > VIRTIO_NET_F_HOST_ECN); > > + } > > + > > + return has_ecn_feature; > > +} > > + > > static void > > destroy_connection(int vid) > > { > > @@ -4515,6 +4551,7 @@ destroy_connection(int vid) > > ovs_mutex_lock(&dev->mutex); > > if (nullable_string_is_equal(ifname, dev->vhost_id)) { > > uint32_t qp_num = NR_QUEUE; > > + bool failed_without_ecn; > > > > if (netdev_dpdk_get_vid(dev) >= 0) { > > VLOG_ERR("Connection on socket '%s' destroyed while vhost " > > @@ -4528,6 +4565,26 @@ destroy_connection(int vid) > > dev->requested_n_txq = qp_num; > > netdev_request_reconfigure(&dev->up); > > } > > + > > + if (dev->vhost_initial_config) { > > + VLOG_WARN_RL(&rl, "Connection on socket '%s' closed during > > " > > + "initialization.", dev->vhost_id); > > + } > > + > > + failed_without_ecn = dev->vhost_initial_config > > + && !dpdk_vhost_has_ecn_feature(dev->vhost_id); > > + if (failed_without_ecn != dev->vhost_workaround_enable_ecn) { > > + /* Either an early disconnection was detected (and we can > > try > > + * to re-enable ECN for a next connection) or the > > disconnection > > + * does not seem incorrect (and we can try to disable ECN > > for a > > + * next connection). > > + * > > + * In both cases, this vhost socket must be reconfigured > > + * (see netdev_dpdk_vhost_client_reconfigure()). */ > > + dev->vhost_workaround_enable_ecn = failed_without_ecn; > > + netdev_request_reconfigure(&dev->up); > > + } > > If we reset vhost_initial_config unconditionally in new_device(), > vhost_initial_config == true here will mean that the guest, likely, failed > feature negotiation. We can't be sure for 100%, but we can't be sure with > the current code either. So, we could enable workaround in this case. > If vhost_initial_config == false, we already had a successful connection > sometime in the past, so we should not try the workaround. > > Does that make sense? It seems ok but I need to retest. > > > + > > ovs_mutex_unlock(&dev->mutex); > > exists = true; > > break; > > @@ -5431,6 +5488,7 @@ dpdk_vhost_reconfigure_helper(struct netdev_dpdk *dev) > > > > if (dev->vhost_reconfigured == false) { > > dev->vhost_reconfigured = true; > > + dev->vhost_initial_config = false; > > This will not be needed if we clear in the new_device. Ok. > > > /* Carrier status may need updating. */ > > netdev_change_seq_changed(&dev->up); > > } > > @@ -5458,10 +5516,30 @@ static int > > netdev_dpdk_vhost_client_reconfigure(struct netdev *netdev) > > { > > struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); > > + bool unregister = false; > > + bool enable_tso; > > + char *vhost_id; > > int err; > > > > ovs_mutex_lock(&dev->mutex); > > > > + enable_tso = netdev_dpdk_vhost_tso_enabled(dev); > > + if (dev->vhost_driver_flags & RTE_VHOST_USER_CLIENT && dev->vhost_id > > + && enable_tso != !dpdk_vhost_has_ecn_feature(dev->vhost_id)) { > > In context of previous comments, the check should be; > > && dev->vhost_initial_config && dev->vhost_workaround_enable_ecn > > ? For a "registered" vhost-user port, we have two transitions to handle: enabling the workaround (for a "faulty" guest after a first try withouth ECN) and disabling it (after a "faulty" guest, which had successfully configured, is stopped and OVS prepares for the next connection). Checking dev->vhost_initial_config here would prevent from handling the second case. > > We might as well rename it to 'vhost_has_ever_connected' or something > and inverse the value. > > > + > > + dev->vhost_driver_flags &= ~RTE_VHOST_USER_CLIENT; > > + vhost_id = dev->vhost_id; > > + unregister = true; > > + } > > + > > + ovs_mutex_unlock(&dev->mutex); > > + > > + if (unregister) { > > + dpdk_vhost_driver_unregister(dev, vhost_id); > > + } > > + > > + ovs_mutex_lock(&dev->mutex); > > + > > /* Configure vHost client mode if requested and if the following > > criteria > > * are met: > > * 1. Device hasn't been registered yet. > > @@ -5491,7 +5569,7 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev > > *netdev) > > } > > > > /* Enable External Buffers if TCP Segmentation Offload is enabled. > > */ > > - if (userspace_tso_enabled()) { > > + if (enable_tso) { > > vhost_flags |= RTE_VHOST_USER_EXTBUF_SUPPORT; > > } > > > > @@ -5516,7 +5594,7 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev > > *netdev) > > goto unlock; > > } > > > > - if (userspace_tso_enabled()) { > > + if (enable_tso) { > > virtio_unsup_features = 1ULL << VIRTIO_NET_F_HOST_ECN > > | 1ULL << VIRTIO_NET_F_HOST_UFO; > > VLOG_DBG("%s: TSO enabled on vhost port", > > @@ -5539,6 +5617,7 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev > > *netdev) > > goto unlock; > > } > > > > + dev->vhost_initial_config = true; > > err = rte_vhost_driver_start(dev->vhost_id); > > if (err) { > > VLOG_ERR("rte_vhost_driver_start failed for vhost user " > -- David Marchand _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev