On Tue, Nov 15, 2022 at 09:18:27AM +0100, Christian Borntraeger wrote:
>
> On 14.11.22 at 18:20, Michael S. Tsirkin wrote:
> > On Mon, Nov 14, 2022 at 06:15:30PM +0100, Christian Borntraeger wrote:
> > >
> > > On 14.11.22 at 18:10, Michael S. Tsirkin wrote:
> > > > On Mon, Nov 14, 2022 at 05:55:09PM +0100, Christian Borntraeger wrote:
> > > > >
> > > > > On 14.11.22 at 17:37, Michael S. Tsirkin wrote:
> > > > > > On Mon, Nov 14, 2022 at 05:18:53PM +0100, Christian Borntraeger wrote:
> > > > > > > On 08.11.22 at 10:23, Alex Bennée wrote:
> > > > > > > > The previous fix to virtio_device_started revealed a problem in its
> > > > > > > > use by both the core and the device code. The core code should be able
> > > > > > > > to handle the device "starting" while the VM isn't running to handle
> > > > > > > > the restoration of migration state. To solve this dual use introduce a
> > > > > > > > new helper for use by the vhost-user backends who all use it to feed a
> > > > > > > > should_start variable.
> > > > > > > >
> > > > > > > > We can also pick up a change vhost_user_blk_set_status while we are at
> > > > > > > > it which follows the same pattern.
> > > > > > > >
> > > > > > > > Fixes: 9f6bcfd99f (hw/virtio: move vm_running check to virtio_device_started)
> > > > > > > > Fixes: 27ba7b027f (hw/virtio: add boilerplate for vhost-user-gpio device)
> > > > > > > > Signed-off-by: Alex Bennée <alex.ben...@linaro.org>
> > > > > > > > Cc: "Michael S. Tsirkin" <m...@redhat.com>
> > > > > > >
> > > > > > > Hmmm, is this
> > > > > > > commit 259d69c00b67c02a67f3bdbeeea71c2c0af76c35
> > > > > > > Author:     Alex Bennée <alex.ben...@linaro.org>
> > > > > > > AuthorDate: Mon Nov 7 12:14:07 2022 +0000
> > > > > > > Commit:     Michael S. Tsirkin <m...@redhat.com>
> > > > > > > CommitDate: Mon Nov 7 14:08:18 2022 -0500
> > > > > > >
> > > > > > >     hw/virtio: introduce virtio_device_should_start
> > > > > > >
> > > > > > > and older version?
> > > > > >
> > > > > > This is what got merged:
> > > > > > https://lore.kernel.org/r/20221107121407.1010913-1-alex.bennee%40linaro.org
> > > > > > This patch was sent after I merged the RFC.
> > > > > > I think the only difference is the commit log but I might be missing
> > > > > > something.
> > > > > >
> > > > > > > This does not seem to fix the regression that I have reported.
> > > > > >
> > > > > > This was applied on top of 9f6bcfd99f which IIUC does, right?
> > > > >
> > > > > QEMU master still fails for me for suspend/resume to disk:
> > > > >
> > > > > #0  0x000003ff8e3980a6 in __pthread_kill_implementation () at /lib64/libc.so.6
> > > > > #1  0x000003ff8e348580 in raise () at /lib64/libc.so.6
> > > > > #2  0x000003ff8e32b5c0 in abort () at /lib64/libc.so.6
> > > > > #3  0x000003ff8e3409da in __assert_fail_base () at /lib64/libc.so.6
> > > > > #4  0x000003ff8e340a4e in () at /lib64/libc.so.6
> > > > > #5  0x000002aa1ffa8966 in vhost_vsock_common_pre_save (opaque=<optimized out>) at ../hw/virtio/vhost-vsock-common.c:203
> > > > > #6  0x000002aa1fe5e0ee in vmstate_save_state_v (f=f@entry=0x2aa21bdc170, vmsd=0x2aa204ac5f0 <vmstate_virtio_vhost_vsock>, opaque=0x2aa21bac9f8, vmdesc=vmdesc@entry=0x3fddc08eb30, version_id=version_id@entry=0) at ../migration/vmstate.c:329
> > > > > #7  0x000002aa1fe5ebf8 in vmstate_save_state (f=f@entry=0x2aa21bdc170, vmsd=<optimized out>, opaque=<optimized out>, vmdesc_id=vmdesc_id@entry=0x3fddc08eb30) at ../migration/vmstate.c:317
> > > > > #8  0x000002aa1fe75bd0 in vmstate_save (f=f@entry=0x2aa21bdc170, se=se@entry=0x2aa21bdbe90, vmdesc=vmdesc@entry=0x3fddc08eb30) at ../migration/savevm.c:908
> > > > > #9  0x000002aa1fe79584 in qemu_savevm_state_complete_precopy_non_iterable (f=f@entry=0x2aa21bdc170, in_postcopy=in_postcopy@entry=false, inactivate_disks=inactivate_disks@entry=true)
> > > > >     at ../migration/savevm.c:1393
> > > > > #10 0x000002aa1fe79a96 in qemu_savevm_state_complete_precopy (f=0x2aa21bdc170, iterable_only=iterable_only@entry=false, inactivate_disks=inactivate_disks@entry=true) at ../migration/savevm.c:1459
> > > > > #11 0x000002aa1fe6d6ee in migration_completion (s=0x2aa218ef600) at ../migration/migration.c:3314
> > > > > #12 migration_iteration_run (s=0x2aa218ef600) at ../migration/migration.c:3761
> > > > > #13 migration_thread (opaque=opaque@entry=0x2aa218ef600) at ../migration/migration.c:3989
> > > > > #14 0x000002aa201f0b8c in qemu_thread_start (args=<optimized out>) at ../util/qemu-thread-posix.c:505
> > > > > #15 0x000003ff8e396248 in start_thread () at /lib64/libc.so.6
> > > > > #16 0x000003ff8e41183e in thread_start () at /lib64/libc.so.6
> > > > >
> > > > > Michael, your previous branch did work if I recall correctly.
> > > >
> > > > That one was failing under github CI though (for reasons we didn't
> > > > really address, such as disconnect during stop causing a recursive
> > > > call to stop, but there you are).
> > > Even the double revert of everything?
> >
> > I don't remember at this point.
> >
> > > So how do we proceed now?
> >
> > I'm hopeful Alex will come up with a fix.
>
>
> The initial fix changed to qemu/master does still work for me
>
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index a973811cbfc6..fb3072838119 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -411,14 +411,14 @@ static inline bool virtio_device_started(VirtIODevice *vdev, uint8_t status)
>    */
>   static inline bool virtio_device_should_start(VirtIODevice *vdev, uint8_t status)
>   {
> -    if (vdev->use_started) {
> -        return vdev->started;
> -    }
> -
>       if (!vdev->vm_running) {
>           return false;
>       }
> +    if (vdev->use_started) {
> +        return vdev->started;
> +    }
> +
>       return status & VIRTIO_CONFIG_S_DRIVER_OK;
>   }
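To see what the reordering in the quoted diff changes, here is a minimal, self-contained C sketch. It is an illustration under stated assumptions, not QEMU code: FakeVirtIODevice, should_start_old() and should_start_new() are hypothetical names, and the struct models only the three fields the helper reads (vm_running, use_started, started). VIRTIO_CONFIG_S_DRIVER_OK is defined as 4, the DRIVER_OK bit of the virtio device status field. With the old order, a device that sets use_started reports "should start" even while the VM is stopped; with the new order, a stopped VM always wins.

```c
#include <stdbool.h>
#include <stdint.h>

/* Virtio device status bit: driver is set up and ready to drive the device. */
#define VIRTIO_CONFIG_S_DRIVER_OK 4

/* Hypothetical stand-in for VirtIODevice: only the fields the helper reads. */
typedef struct {
    bool vm_running;   /* is the guest VM currently running? */
    bool use_started;  /* does the device track start state in 'started'? */
    bool started;      /* device-tracked start state */
} FakeVirtIODevice;

/* Check order before the fix: use_started short-circuits the vm_running test. */
bool should_start_old(const FakeVirtIODevice *vdev, uint8_t status)
{
    if (vdev->use_started) {
        return vdev->started;
    }
    if (!vdev->vm_running) {
        return false;
    }
    return status & VIRTIO_CONFIG_S_DRIVER_OK;
}

/* Check order after the fix: a stopped VM always means "do not start". */
bool should_start_new(const FakeVirtIODevice *vdev, uint8_t status)
{
    if (!vdev->vm_running) {
        return false;
    }
    if (vdev->use_started) {
        return vdev->started;
    }
    return status & VIRTIO_CONFIG_S_DRIVER_OK;
}
```

For a device with vm_running = false, use_started = true and started = true, should_start_old() returns true while should_start_new() returns false; with the VM running, both return the value of started.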

Hmm, this makes sense to me. And with the new API the fallout should be
minimal. Let's see how it behaves in GitHub CI.

It would be nice to fix the recursive stop problem properly too, but I'm
not optimistic about that for this release.

--
MST