On 6/20/23 13:26, David Marchand wrote:
> At some point in OVS history, some virtio features were announced as
> supported (ECN and UFO virtio features).
> 
> The userspace TSO code, which has been added later, does not support
> those features and tries to disable them.
> 
> This breaks OVS upgrades: if an existing VM already negotiated such
> features, their lack on reconnection to an upgraded OVS triggers a
> vhost socket disconnection by Qemu.
> This results in an endless loop because Qemu then retries with the same
> set of virtio features.
> 
> This patch proposes to try and detect those vhost socket disconnection
> and fallback restoring the old virtio features (and disabling TSO for this
> vhost port).

I suppose, even if we can't fix live migration case, this patch sill
makes some sense, right?

We will not be able to enable TSO by default though without some extra
work, i.e. adding some knobs...

Some comments inline.

> 
> Signed-off-by: David Marchand <david.march...@redhat.com>
> Acked-by: Mike Pattrick <m...@redhat.com>
> ---
> Note: resending this v4 for CI (I added Mike ack).
> 
> Changelog since v3:
> - updated documentation now that the interface offloads status is reported
>   in ovsdb,
> - fixed one coding style issue,
> 
> Changelog since v2:
> - reported workaround presence in the ovsdb port status field and
>   updated documentation accordingly,
> - tried to use "better" names, to distinguish ECN virtio feature from
>   TSO OVS netdev feature, 
> 
> Changelog since v1:
> - added a note in the documentation,
> - fixed vhost unregister trigger (so that both disabling and re-enabling
>   TSO is handled),
> - cleared netdev features when disabling TSO,
> - changed level and ratelimited log message on vhost socket disconnect,
> 
> ---
>  Documentation/topics/userspace-tso.rst | 30 ++++++++-
>  lib/netdev-dpdk.c                      | 85 +++++++++++++++++++++++++-
>  2 files changed, 111 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/topics/userspace-tso.rst 
> b/Documentation/topics/userspace-tso.rst
> index 5a43c2e86b..d4111f1ab2 100644
> --- a/Documentation/topics/userspace-tso.rst
> +++ b/Documentation/topics/userspace-tso.rst
> @@ -68,7 +68,7 @@ as follows.
>  connection is established, `TSO` is thus advertised to the guest as an
>  available feature:
>  
> -QEMU Command Line Parameter::
> +1. QEMU Command Line Parameter::

If you want to fix the numbered list, we should also fix the second point.
It is broken in two lines incorrectly (indentation is wrond), so it's not
counted as part of the numbered list, while this one will.

>  
>      $ sudo $QEMU_DIR/x86_64-softmmu/qemu-system-x86_64 \
>      ...
> @@ -83,6 +83,34 @@ used to enable same::
>      $ ethtool -K eth0 tso on
>      $ ethtool -k eth0
>  
> +**Note:** Enabling this feature impacts the virtio features exposed by the 
> DPDK
> +vHost User backend to a guest. If a guest was already connected to OvS before
> +enabling TSO and restarting OvS, this guest ports won't have TSO available::
> +
> +    $ ovs-vsctl get interface vhost0 status | grep -o '[^ ]*_offload=[^ ]*"'
> +    tx_ip_csum_offload="true"
> +    tx_sctp_csum_offload="true"
> +    tx_tcp_csum_offload="true"
> +    tx_tcp_seg_offload="false"
> +    tx_udp_csum_offload="true"
> +
> +To help diagnose the issue, those ports have some additional information in
> +their status field in ovsdb::
> +
> +    $ ovs-vsctl get interface vhost0 status | grep -o 'disabled_tso=[^ ]*"'
> +    disabled_tso="true"
> +
> +To restore TSO for this guest ports, this guest QEMU process must be stopped,
> +then started again. OvS will then report::
> +
> +    $ ovs-vsctl get interface vhost0 status | grep -o '[^ ]*_offload=[^ ]*"'
> +    tx_ip_csum_offload="true"
> +    tx_sctp_csum_offload="true"
> +    tx_tcp_csum_offload="true"
> +    tx_tcp_seg_offload="true"
> +    tx_udp_csum_offload="true"
> +    $ ovs-vsctl get interface vhost0 status | grep -o 'disabled_tso=[^ ]*"'
> +
>  ~~~~~~~~~~~
>  Limitations
>  ~~~~~~~~~~~
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index c6b41e8365..2dbb8b42e3 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -473,6 +473,15 @@ struct netdev_dpdk {
>          /* True if vHost device is 'up' and has been reconfigured at least 
> once */
>          bool vhost_reconfigured;
>  
> +        /* Set on driver start (which means after a vHost connection is
> +         * accepted), and cleared when the vHost device gets configured. */
> +        bool vhost_initial_config;
> +
> +        /* Set on disconnection if an initial configuration did not finish.
> +         * This triggers a workaround for Virtio features negotiation, that
> +         * makes TSO unavailable. */
> +        bool vhost_workaround_enable_ecn;
> +
>          atomic_uint8_t vhost_tx_retries_max;
>          /* 2 pad bytes here. */

These bytes are consumed now.

>      );
> @@ -1359,6 +1368,7 @@ common_construct(struct netdev *netdev, dpdk_port_t 
> port_no,
>      dev->requested_lsc_interrupt_mode = 0;
>      ovsrcu_index_init(&dev->vid, -1);
>      dev->vhost_reconfigured = false;
> +    dev->vhost_initial_config = false;
>      dev->attached = false;
>      dev->started = false;
>      dev->reset_needed = false;
> @@ -3883,6 +3893,10 @@ netdev_dpdk_vhost_user_get_status(const struct netdev 
> *netdev,
>                          xasprintf("%d", vring.size));
>      }
>  
> +    if (dev->vhost_workaround_enable_ecn) {
> +        smap_add_format(args, "disabled_tso", "true");

I'm not sure if it's a good name, but I'm not sure how the good one
should look like either.  Maybe "userspace-tso": "disabled" ?
We can put any string as a value.

> +    }
> +
>      ovs_mutex_unlock(&dev->mutex);
>      return 0;
>  }
> @@ -4214,6 +4228,12 @@ netdev_dpdk_remap_txqs(struct netdev_dpdk *dev)
>      free(enabled_queues);
>  }
>  
> +static bool
> +netdev_dpdk_vhost_tso_enabled(struct netdev_dpdk *dev)
> +{
> +    return userspace_tso_enabled() && !dev->vhost_workaround_enable_ecn;
> +}
> +
>  /*
>   * A new virtio-net device is added to a vhost port.
>   */
> @@ -4256,6 +4276,7 @@ new_device(int vid)
>              } else {
>                  /* Reconfiguration not required. */
>                  dev->vhost_reconfigured = true;
> +                dev->vhost_initial_config = false;

So, we're in the new_device() call.  That means the guest successfully
negotiated features.  Shouldn't we drop vhost_initial_config here
unconditionally?  Why we need to wait for reconfiguration?  Or does
QEMU disconnect after the new_device() ?

>              }
>  
>              if (rte_vhost_get_negotiated_features(vid, &features)) {
> @@ -4268,7 +4289,7 @@ new_device(int vid)
>                      dev->hw_ol_features |= NETDEV_TX_SCTP_CKSUM_OFFLOAD;
>                  }
>  
> -                if (userspace_tso_enabled()) {
> +                if (netdev_dpdk_vhost_tso_enabled(dev)) {
>                      if (features & (1ULL << VIRTIO_NET_F_GUEST_TSO4)
>                          && features & (1ULL << VIRTIO_NET_F_GUEST_TSO6)) {
>  
> @@ -4501,6 +4522,21 @@ vring_state_changed(int vid, uint16_t queue_id, int 
> enable)
>      return 0;
>  }
>  
> +/* Requires that the 'path' socket had been registered before. */
> +static bool
> +dpdk_vhost_has_ecn_feature(const char *path)
> +{
> +    /* In the past we had this feature enabled, report it by default. */
> +    bool has_ecn_feature = true;
> +    uint64_t driver_features;
> +
> +    if (rte_vhost_driver_get_features(path, &driver_features) == 0) {
> +        has_ecn_feature = driver_features & (1ULL << VIRTIO_NET_F_HOST_ECN);
> +    }
> +
> +    return has_ecn_feature;
> +}
> +
>  static void
>  destroy_connection(int vid)
>  {
> @@ -4515,6 +4551,7 @@ destroy_connection(int vid)
>          ovs_mutex_lock(&dev->mutex);
>          if (nullable_string_is_equal(ifname, dev->vhost_id)) {
>              uint32_t qp_num = NR_QUEUE;
> +            bool failed_without_ecn;
>  
>              if (netdev_dpdk_get_vid(dev) >= 0) {
>                  VLOG_ERR("Connection on socket '%s' destroyed while vhost "
> @@ -4528,6 +4565,26 @@ destroy_connection(int vid)
>                  dev->requested_n_txq = qp_num;
>                  netdev_request_reconfigure(&dev->up);
>              }
> +
> +            if (dev->vhost_initial_config) {
> +                VLOG_WARN_RL(&rl, "Connection on socket '%s' closed during "
> +                             "initialization.", dev->vhost_id);
> +            }
> +
> +            failed_without_ecn = dev->vhost_initial_config
> +                && !dpdk_vhost_has_ecn_feature(dev->vhost_id);
> +            if (failed_without_ecn != dev->vhost_workaround_enable_ecn) {
> +                /* Either an early disconnection was detected (and we can try
> +                 * to re-enable ECN for a next connection) or the 
> disconnection
> +                 * does not seem incorrect (and we can try to disable ECN 
> for a
> +                 * next connection).
> +                 *
> +                 * In both cases, this vhost socket must be reconfigured
> +                 * (see netdev_dpdk_vhost_client_reconfigure()). */
> +                dev->vhost_workaround_enable_ecn = failed_without_ecn;
> +                netdev_request_reconfigure(&dev->up);
> +            }

If we reset vhost_initial_config unconditionally in new_device(),
vhost_initial_config == true here will mean that the guest, likely, failed
feature negotiation.  We can't be sure for 100%, but we can't be sure with
the current code either.  So, we could enable workaround in this case.
If vhost_initial_config == false, we already had a successful connection
sometime in the past, so we should not try the workaround.

Does that make sense?

> +
>              ovs_mutex_unlock(&dev->mutex);
>              exists = true;
>              break;
> @@ -5431,6 +5488,7 @@ dpdk_vhost_reconfigure_helper(struct netdev_dpdk *dev)
>  
>          if (dev->vhost_reconfigured == false) {
>              dev->vhost_reconfigured = true;
> +            dev->vhost_initial_config = false;

This will not be needed if we clear in the new_device.

>              /* Carrier status may need updating. */
>              netdev_change_seq_changed(&dev->up);
>          }
> @@ -5458,10 +5516,30 @@ static int
>  netdev_dpdk_vhost_client_reconfigure(struct netdev *netdev)
>  {
>      struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> +    bool unregister = false;
> +    bool enable_tso;
> +    char *vhost_id;
>      int err;
>  
>      ovs_mutex_lock(&dev->mutex);
>  
> +    enable_tso = netdev_dpdk_vhost_tso_enabled(dev);
> +    if (dev->vhost_driver_flags & RTE_VHOST_USER_CLIENT && dev->vhost_id
> +        && enable_tso != !dpdk_vhost_has_ecn_feature(dev->vhost_id)) {

In context of previous comments, the check should be;

           && dev->vhost_initial_config && dev->vhost_workaround_enable_ecn

?

We might as well rename it to 'vhost_has_ever_connected' or something
and inverse the value.

> +
> +        dev->vhost_driver_flags &= ~RTE_VHOST_USER_CLIENT;
> +        vhost_id = dev->vhost_id;
> +        unregister = true;
> +    }
> +
> +    ovs_mutex_unlock(&dev->mutex);
> +
> +    if (unregister) {
> +        dpdk_vhost_driver_unregister(dev, vhost_id);
> +    }
> +
> +    ovs_mutex_lock(&dev->mutex);
> +
>      /* Configure vHost client mode if requested and if the following criteria
>       * are met:
>       *  1. Device hasn't been registered yet.
> @@ -5491,7 +5569,7 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev 
> *netdev)
>          }
>  
>          /* Enable External Buffers if TCP Segmentation Offload is enabled. */
> -        if (userspace_tso_enabled()) {
> +        if (enable_tso) {
>              vhost_flags |= RTE_VHOST_USER_EXTBUF_SUPPORT;
>          }
>  
> @@ -5516,7 +5594,7 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev 
> *netdev)
>              goto unlock;
>          }
>  
> -        if (userspace_tso_enabled()) {
> +        if (enable_tso) {
>              virtio_unsup_features = 1ULL << VIRTIO_NET_F_HOST_ECN
>                                      | 1ULL << VIRTIO_NET_F_HOST_UFO;
>              VLOG_DBG("%s: TSO enabled on vhost port",
> @@ -5539,6 +5617,7 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev 
> *netdev)
>              goto unlock;
>          }
>  
> +        dev->vhost_initial_config = true;
>          err = rte_vhost_driver_start(dev->vhost_id);
>          if (err) {
>              VLOG_ERR("rte_vhost_driver_start failed for vhost user "

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to