On Mon, Sep 12, 2022 at 5:52 AM David Marchand <david.march...@redhat.com>
wrote:

> Hi Flavio,
>
> On Fri, Sep 9, 2022 at 7:58 PM Flavio Leitner <f...@redhat.com> wrote:
> >
> > Thanks for working on this patch!
> >
> > It seems possible to change this patch later when the other TSO series
> > gets merged to disable TSO only on the affected port.
> > Mike, any thoughts?
>
> It should be what this patch already does: disable TSO only for the
> affected port as I mentionned in the commitlog below.
> Can you clarify?


Oops, I should have added more context. The issue is that TSO is
all-or-nothing
because there is no software fallback currently [see
netdev_send_prepare_packet()].
So, if the problem happens and the reconnection with disabled TSO succeeds,
all
TSO packets to that port will be dropped. The question is "should we allow
the update
but run the risk of dropping packets or fail to update?"

The software fallback  is provided in the posted TSO series. In that case,
we can
have a per-port TSO knob because if it gets disabled, the packets will be
segmented
in software and not dropped.
So, the patch makes sense with the posted TSO series applied, but I am not
so sure
with the current code.

What do you think?

Thanks,
fbl


>
> > I made a couple comments below.
> >
> > On Fri, Sep 9, 2022 at 10:57 AM David Marchand <
> david.march...@redhat.com> wrote:
> >>
> >> At some point in OVS history, some virtio features were announced as
> >> supported (ECN and UFO virtio features).
> >>
> >> The userspace TSO code, which has been added later, does not support
> >> those features and tries to disable them.
> >>
> >> This breaks OVS upgrades: if an existing VM already negotiated such
> >> features, their lack on reconnection to an upgraded OVS triggers a
> >> vhost socket disconnection by Qemu.
> >> This results in an endless loop because Qemu then retries with the same
> >> set of virtio features.
> >>
> >> This patch proposes to try and detect those vhost socket disconnection
> >> and fallback restoring the old virtio features (and disabling TSO for
> this
> >> vhost port).
>
> ^^^
>
>
> >>
> >> Signed-off-by: David Marchand <david.march...@redhat.com>
> >> ---
> >>  lib/netdev-dpdk.c | 52 +++++++++++++++++++++++++++++++++++++++++++++--
> >>  1 file changed, 50 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> >> index 0dd655507b..13d7ed3d62 100644
> >> --- a/lib/netdev-dpdk.c
> >> +++ b/lib/netdev-dpdk.c
> >> @@ -465,6 +465,15 @@ struct netdev_dpdk {
> >>          /* True if vHost device is 'up' and has been reconfigured at
> least once */
> >>          bool vhost_reconfigured;
> >>
> >> +        /* Set on driver start (which means after a vHost connection is
> >> +         * accepted), and cleared when the vHost device gets
> configured. */
> >> +        bool vhost_initial_config;
> >> +
> >> +        /* Set on disconnection if an initial configuration did not
> finish.
> >> +         * This triggers a workaround for Virtio features negotiation,
> that
> >> +         * makes TSO unavailable. */
> >> +        bool vhost_workaround_disable_tso;
> >> +
> >>          atomic_uint8_t vhost_tx_retries_max;
> >>          /* 2 pad bytes here. */
> >>      );
> >> @@ -1293,6 +1302,7 @@ common_construct(struct netdev *netdev,
> dpdk_port_t port_no,
> >>      dev->requested_lsc_interrupt_mode = 0;
> >>      ovsrcu_index_init(&dev->vid, -1);
> >>      dev->vhost_reconfigured = false;
> >> +    dev->vhost_initial_config = false;
> >>      dev->attached = false;
> >>      dev->started = false;
> >>      dev->reset_needed = false;
> >> @@ -3986,6 +3996,7 @@ new_device(int vid)
> >>              } else {
> >>                  /* Reconfiguration not required. */
> >>                  dev->vhost_reconfigured = true;
> >> +                dev->vhost_initial_config = false;
> >>              }
> >>
> >>              ovsrcu_index_set(&dev->vid, vid);
> >> @@ -4154,6 +4165,16 @@ destroy_connection(int vid)
> >>                  dev->requested_n_txq = qp_num;
> >>                  netdev_request_reconfigure(&dev->up);
> >>              }
> >> +
> >> +            if (dev->vhost_initial_config) {
> >> +                VLOG_ERR("Connection on socket '%s' seems
> prematurately "
> >> +                         "closed.", dev->vhost_id);
> >
> >
> > Perhaps use a message more specific like below?
> > "Connection on socket '%s' closed during initialization."
> > Or change to below if that makes sense:
> > "Connection on socket '%s' closed during feature negotiation."
>
> Yes, worth rewording.
>
> I'll adjust the log level too: "systemctl restart openvswitch"
> complains about catching an error in logs otherwise.
> It is also worth adding some ratelimit on it.
>
>
> >
> >
> >>
> >> +                dev->vhost_workaround_disable_tso = true;
> >> +                netdev_request_reconfigure(&dev->up);
> >> +            } else {
> >> +                dev->vhost_workaround_disable_tso = false;
> >> +            }
> >> +
> >>              ovs_mutex_unlock(&dev->mutex);
> >>              exists = true;
> >>              break;
> >> @@ -5058,6 +5079,7 @@ dpdk_vhost_reconfigure_helper(struct netdev_dpdk
> *dev)
> >>
> >>          if (dev->vhost_reconfigured == false) {
> >>              dev->vhost_reconfigured = true;
> >> +            dev->vhost_initial_config = false;
> >>              /* Carrier status may need updating. */
> >>              netdev_change_seq_changed(&dev->up);
> >>          }
> >> @@ -5086,6 +5108,31 @@ netdev_dpdk_vhost_client_reconfigure(struct
> netdev *netdev)
> >>      int err;
> >>      uint64_t vhost_flags = 0;
> >>      uint64_t vhost_unsup_flags;
> >> +    bool tso_enabled = userspace_tso_enabled();
> >> +
> >> +    if (tso_enabled) {
> >> +        bool unregister;
> >> +        char *vhost_id;
> >> +
> >> +        ovs_mutex_lock(&dev->mutex);
> >> +        unregister = dev->vhost_id != NULL;
> >> +        unregister &= dev->vhost_workaround_disable_tso;
> >> +        if (unregister) {
> >> +            /* There was an issue with a previous connection that did
> not
> >> +             * finish initialising, one common reason is virtio
> features
> >> +             * negotiation. Disable TSO as a workaround. */
> >> +            tso_enabled = false;
> >> +            dev->vhost_driver_flags &= ~RTE_VHOST_USER_CLIENT;
> >> +            vhost_id = dev->vhost_id;
>
> Eeeeouch, it is not the right patch: it will loop on unregistering and
> never finish the port configuration.
> This will be fixed in v2.
>
> While testing a bit more... I'll try to make this mechanism more
> robust too (avoiding unneeded reconfigure calls).
>
>
> >> +            VLOG_WARN("Disabling TSO for %s as a workaround because of
> "
> >> +                      "previous connection drop.", dev->up.name);
> >
> >
> > Should we update Documentation/topics/userspace-tso.rst with
> > this new condition since it is external and visible to the users?
>
> +1
>
>
> --
> David Marchand
>
>
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to