Re: [PATCH 3/6] vhost: Do not reset suspended devices on stop

Stefan Hajnoczi Thu, 27 Jul 2023 13:44:10 -0700

On Thu, Jul 27, 2023 at 02:49:04PM +0200, Eugenio Perez Martin wrote:
> On Wed, Jul 26, 2023 at 8:57 AM Hanna Czenczek <hre...@redhat.com> wrote:
> >
> > On 25.07.23 20:53, Eugenio Perez Martin wrote:
> > > On Tue, Jul 25, 2023 at 3:09 PM Hanna Czenczek <hre...@redhat.com> wrote:
> > >> On 25.07.23 12:03, Eugenio Perez Martin wrote:
> > >>> On Tue, Jul 25, 2023 at 9:53 AM Hanna Czenczek <hre...@redhat.com> 
> > >>> wrote:
> > >>>> On 24.07.23 17:48, Eugenio Perez Martin wrote:
> > >>>>> On Fri, Jul 21, 2023 at 6:07 PM Hanna Czenczek <hre...@redhat.com> 
> > >>>>> wrote:
> > >>>>>> On 21.07.23 17:25, Eugenio Perez Martin wrote:
> > >>>>>>> On Tue, Jul 11, 2023 at 5:52 PM Hanna Czenczek <hre...@redhat.com> 
> > >>>>>>> wrote:
> > >>>>>>>> Move the `suspended` field from vhost_vdpa into the global 
> > >>>>>>>> vhost_dev
> > >>>>>>>> struct, so vhost_dev_stop() can check whether the back-end has been
> > >>>>>>>> suspended by `vhost_ops->vhost_dev_start(hdev, false)`.  If it has,
> > >>>>>>>> there is no need to reset it; the reset is just a fall-back to stop
> > >>>>>>>> device operations for back-ends that do not support suspend.
> > >>>>>>>>
> > >>>>>>>> Unfortunately, for vDPA specifically, RESUME is not yet 
> > >>>>>>>> implemented, so
> > >>>>>>>> when the device is re-started, we still have to do the reset to 
> > >>>>>>>> have it
> > >>>>>>>> un-suspend.
> > >>>>>>>>
> > >>>>>>>> Signed-off-by: Hanna Czenczek <hre...@redhat.com>
> > >>>>>>>> ---
> > >>>>>>>>      include/hw/virtio/vhost-vdpa.h |  2 --
> > >>>>>>>>      include/hw/virtio/vhost.h      |  8 ++++++++
> > >>>>>>>>      hw/virtio/vhost-vdpa.c         | 11 +++++++----
> > >>>>>>>>      hw/virtio/vhost.c              |  8 +++++++-
> > >>>>>>>>      4 files changed, 22 insertions(+), 7 deletions(-)
> > >>>>>>>>
> > >>>>>>>> diff --git a/include/hw/virtio/vhost-vdpa.h 
> > >>>>>>>> b/include/hw/virtio/vhost-vdpa.h
> > >>>>>>>> index e64bfc7f98..72c3686b7f 100644
> > >>>>>>>> --- a/include/hw/virtio/vhost-vdpa.h
> > >>>>>>>> +++ b/include/hw/virtio/vhost-vdpa.h
> > >>>>>>>> @@ -42,8 +42,6 @@ typedef struct vhost_vdpa {
> > >>>>>>>>          bool shadow_vqs_enabled;
> > >>>>>>>>          /* Vdpa must send shadow addresses as IOTLB key for data 
> > >>>>>>>> queues, not GPA */
> > >>>>>>>>          bool shadow_data;
> > >>>>>>>> -    /* Device suspended successfully */
> > >>>>>>>> -    bool suspended;
> > >>>>>>>>          /* IOVA mapping used by the Shadow Virtqueue */
> > >>>>>>>>          VhostIOVATree *iova_tree;
> > >>>>>>>>          GPtrArray *shadow_vqs;
> > >>>>>>>> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> > >>>>>>>> index 6a173cb9fa..69bf59d630 100644
> > >>>>>>>> --- a/include/hw/virtio/vhost.h
> > >>>>>>>> +++ b/include/hw/virtio/vhost.h
> > >>>>>>>> @@ -120,6 +120,14 @@ struct vhost_dev {
> > >>>>>>>>          uint64_t backend_cap;
> > >>>>>>>>          /* @started: is the vhost device started? */
> > >>>>>>>>          bool started;
> > >>>>>>>> +    /**
> > >>>>>>>> +     * @suspended: Whether the vhost device is currently 
> > >>>>>>>> suspended.  Set
> > >>>>>>>> +     * and reset by implementations (vhost-user, vhost-vdpa, 
> > >>>>>>>> ...), which
> > >>>>>>>> +     * are supposed to automatically suspend/resume in their
> > >>>>>>>> +     * vhost_dev_start handlers as required.  Must also be 
> > >>>>>>>> cleared when
> > >>>>>>>> +     * the device is reset.
> > >>>>>>>> +     */
> > >>>>>>>> +    bool suspended;
> > >>>>>>>>          bool log_enabled;
> > >>>>>>>>          uint64_t log_size;
> > >>>>>>>>          Error *migration_blocker;
> > >>>>>>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > >>>>>>>> index 7b7dee468e..f7fd19a203 100644
> > >>>>>>>> --- a/hw/virtio/vhost-vdpa.c
> > >>>>>>>> +++ b/hw/virtio/vhost-vdpa.c
> > >>>>>>>> @@ -858,13 +858,12 @@ static int vhost_vdpa_get_device_id(struct 
> > >>>>>>>> vhost_dev *dev,
> > >>>>>>>>
> > >>>>>>>>      static int vhost_vdpa_reset_device(struct vhost_dev *dev)
> > >>>>>>>>      {
> > >>>>>>>> -    struct vhost_vdpa *v = dev->opaque;
> > >>>>>>>>          int ret;
> > >>>>>>>>          uint8_t status = 0;
> > >>>>>>>>
> > >>>>>>>>          ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, 
> > >>>>>>>> &status);
> > >>>>>>>>          trace_vhost_vdpa_reset_device(dev);
> > >>>>>>>> -    v->suspended = false;
> > >>>>>>>> +    dev->suspended = false;
> > >>>>>>>>          return ret;
> > >>>>>>>>      }
> > >>>>>>>>
> > >>>>>>>> @@ -1278,7 +1277,7 @@ static void vhost_vdpa_suspend(struct 
> > >>>>>>>> vhost_dev *dev)
> > >>>>>>>>              if (unlikely(r)) {
> > >>>>>>>>                  error_report("Cannot suspend: %s(%d)", 
> > >>>>>>>> g_strerror(errno), errno);
> > >>>>>>>>              } else {
> > >>>>>>>> -            v->suspended = true;
> > >>>>>>>> +            dev->suspended = true;
> > >>>>>>>>                  return;
> > >>>>>>>>              }
> > >>>>>>>>          }
> > >>>>>>>> @@ -1313,6 +1312,10 @@ static int vhost_vdpa_dev_start(struct 
> > >>>>>>>> vhost_dev *dev, bool started)
> > >>>>>>>>                  return -1;
> > >>>>>>>>              }
> > >>>>>>>>              vhost_vdpa_set_vring_ready(dev);
> > >>>>>>>> +        if (dev->suspended) {
> > >>>>>>>> +            /* TODO: When RESUME is available, use it instead of 
> > >>>>>>>> resetting */
> > >>>>>>>> +            vhost_vdpa_reset_status(dev);
> > >>>>>>> How is that we reset the status at each vhost_vdpa_dev_start? That
> > >>>>>>> will clean all the vqs configured, features negotiated, etc. in the
> > >>>>>>> vDPA device. Or am I missing something?
> > >>>>>> What alternative do you propose?  We don’t have RESUME for vDPA in 
> > >>>>>> qemu,
> > >>>>>> but we somehow need to lift the previous SUSPEND so the device will
> > >>>>>> again respond to guest requests, do we not?
> > >>>>>>
> > >>>>> Reset also clears the suspend state in vDPA, and it should be called
> > >>>>> at vhost_dev_stop. So the device should never be in suspended state
> > >>>>> here. Does that solve your concerns?
> > >>>> My intention with this patch was precisely not to reset in
> > >>>> vhost_dev_stop when suspending is supported.  So now I’m more confused
> > >>>> than before.
> > >>>>
> > >>> At this moment, I think that should be focused as an optimization and
> > >>> not to be included in the main series.
> > >> It is absolutely not an optimization but vital for my use case.
> > >> virtiofsd does not currently implement resetting, but if it did (and we
> > >> want this support in the future), resetting it during stop/cont would be
> > >> catastrophic.  There must be a way to have qemu not issue a reset.  This
> > >> patch is the reason why this series exists.
> > >>
> > > Sorry, I see I confused things with the first reply. Let me do a recap.
> > >
> > > If I understand the problem correctly, your use case requires that
> > > qemu does not reset the device before the device state is fetched with
> > > virtio_save in the case of a migration.
> >
> > That is only part of the problem, the bigger picture has nothing to do
> > with migration at all.  The problem is that when the VM is paused
> > (stop), we invoke vhost_dev_stop(), and when it is resumed (cont), we
> > invoke vhost_dev_start().  To me, it therefore sounds absolutely wrong
> > to reset the back-end in either of these functions.  For stateless
> > devices, it was determined to not be an issue (I still find it extremely
> > strange), and as far as I’ve understood, we’ve come to the agreement
> > that it’s basically a fallback for when there is no other way to stop
> > the back-end.  But stateful devices like virtio-fs would be completely
> > broken by resetting them there.
> >
> > Therefore, if virtiofsd did implement reset through SET_STATUS,
> > stop/cont would break it today.  Maybe other vhost-user devices, too,
> > which just implement RESET_OWNER/RESET_DEVICE, which aren’t even called
> > when the device is supposed to be reset in vhost_dev_stop() (patch 6).
> >
> > So not just because of migration, but in general, there must be a way
> > for back-ends to force qemu not to reset them in vhost_dev_start()/stop().
> >
> > Or we stop using vhost_dev_start()/stop() when the VM is paused/resumed
> > (stop/cont).
> >
> 
> Yes, that comes back to the thread [1].
> 
> As a third alternative, you can keep vhost_dev_start and let the
> function check the current state and initialize the device only if
> needed. But you can keep symmetrical functions and call one or another
> at the device's code, of course. Not sure what is cleaner or requires
> less changes.
> 
> > > This is understandable and I
> > > think we have a solution for that: to move the vhost_ops call to
> > > virtio_reset and the end of virtio_save.
> >
> > Why would we reset the device in virtio_save()?
> >
> 
> If the VM continues in the source because of whatever reason,
> vhost_dev_start would expect the device to be clean. You can test it
> with the command "cont" after the LM.
> 
> > > To remove the reset call from
> > > vhost_dev_stop is somehow mandatory, as it is called before
> > > virtio_save.
> > >
> > > But we cannot move to vhost_vdpa_dev_start, as proposed here. The reasons 
> > > are:
> > > * All the features, vq parameters, etc are set before any
> > > vhost_vdpa_dev_start call. To reset at any vhost_vdpa_dev_start would
> > > wipe them.
> > > * The device needs to hold all the resources until it is reset. Things
> > > like descriptor status etc.
> > >
> > > And, regarding the comment "When RESUME is available, use it instead
> > > of resetting", we cannot use resume to replace reset in all cases.
> > > This is because the semantics are different.
> > >
> > > For example, vhost_dev_stop and vhost_dev_start are also called when
> > > the guest reset by itself the device. You can check it rmmoding and
> > > modprobing virtio-net driver, for example. In this case, the driver
> > > expects to find all vqs to start with 0, but the resume must not reset
> > > these indexes.
> >
> > This isn’t quite clear to me.  I understand this to mean that there must
> > be a reset somewhere in vhost_dev_stop() and/or vhost_dev_start().  But
> > above, you proposed moving the reset from vhost_dev_stop() into
> > virtio_reset().  Is virtio_reset() called in addition to
> > vhost_dev_stop() and vhost_dev_start() when the guest driver is changed?
> >
> 
> Right. Maybe another option is virtio_set_status?
> 
> The point is that other parts of qemu / guest trust the device to be
> reset after (current version of) vhost_dev_stop, so if we are going to
> move the reset it must be added to the callers at least. To trace
> these callers is needed, so maybe it is easy to add another function
> (vhost_dev_suspend?), your first alternative.
> 
> > Because we can’t have an always-present reset in vhost_dev_stop() or
> > vhost_dev_start().  It just doesn’t work with stop/cont.  At the same
> > time, I understand that you say we must have it because
> > vhost_dev_{stop,start}() are also used when the guest driver changes.
> > Consequently, it sounds clear to me that using the exact same functions
> > in stop/cont and when the guest driver is unloaded/loaded is and has
> > always been wrong.  Because in stop/cont, the guest driver never
> > changes, so we shouldn’t tell the back-end that it did (by sending
> > SET_STATUS(0)).
> >
> 
> Talking just about vhost_net here, the device dumps all the state
> (like vqs) to qemu in vhost_dev_stop. That is what allows to have, for
> example, an unified code in migrating: qemu only needs to know how to
> migrate its emulated device, and vhost code just writes or read at
> suspend / continue or live migrating. To suspend and continue is the
> same operation actually for vhost_net.


Hi Longpeng,
This discussion made me realize that --device vhost-vdpa-device-pci does
not support stateful vDPA devices.

For example, a vdpa-net device that implements the controlq will lose
state (e.g. packet receive filtering) when the guest is stopped with the
QEMU 'stop' command.

QEMU needs to use the VHOST_VDPA_RESUME ioctl so it can resume stateful
devices instead of resetting them across 'stop'/'cont'.

Whatever solution we agree on in this thread should also work for
vhost-vdpa and solve the issue for --device vhost-vdpa-device-pci.

Stefan

signature.asc
Description: PGP signature

Re: [PATCH 3/6] vhost: Do not reset suspended devices on stop

Reply via email to