Hi

On Tue, Apr 30, 2024 at 4:31 PM Fiona Ebner <f.eb...@proxmox.com> wrote:
>
> Am 12.03.24 um 15:02 schrieb marcandre.lur...@redhat.com:
> > From: Marc-André Lureau <marcandre.lur...@redhat.com>
> >
> > The current post-loading code for scanout has a FIXME: it doesn't take
> > the resource region/rect into account. But there is more, when adding
> > blob migration support in commit f66767f75c9, I didn't realize that blob
> > resources could be used for scanouts. This situationn leads to a crash
> > during post-load, as they don't have an associated res->image.
> >
> > virtio_gpu_do_set_scanout() handle all cases, but requires the
> > associated virtio_gpu_framebuffer, which is currently not saved during
> > migration.
> >
> > Add a v2 of "virtio-gpu-one-scanout" with the framebuffer fields, so we
> > can restore blob scanouts, as well as fixing the existing FIXME.
> >
> > Signed-off-by: Marc-André Lureau <marcandre.lur...@redhat.com>
> > Reviewed-by: Sebastian Ott <seb...@redhat.com>
>
> Hi,
> unfortunately, this broke migration from pre-9.0 to 9.0:
>
> > vmstate_load_state_field virtio-gpu:virtio-gpu
> > vmstate_load_state_field virtio-gpu-scanouts:parent_obj.enable
> > vmstate_load_state_field virtio-gpu-scanouts:parent_obj.conf.max_outputs
> > vmstate_load_state_field virtio-gpu-scanouts:parent_obj.scanout
> > vmstate_load_state_field virtio-gpu-one-scanout:resource_id
> > vmstate_load_state_field virtio-gpu-one-scanout:width
> > vmstate_load_state_field virtio-gpu-one-scanout:height
> > vmstate_load_state_field virtio-gpu-one-scanout:x
> > vmstate_load_state_field virtio-gpu-one-scanout:y
> > vmstate_load_state_field virtio-gpu-one-scanout:cursor.resource_id
> > vmstate_load_state_field virtio-gpu-one-scanout:cursor.hot_x
> > vmstate_load_state_field virtio-gpu-one-scanout:cursor.hot_y
> > vmstate_load_state_field virtio-gpu-one-scanout:cursor.pos.x
> > vmstate_load_state_field virtio-gpu-one-scanout:cursor.pos.y
> > vmstate_load_state_field virtio-gpu-one-scanout:fb.format
> > vmstate_load_state_field virtio-gpu-one-scanout:fb.bytes_pp
> > vmstate_load_state_field virtio-gpu-one-scanout:fb.width
> > vmstate_load_state_field virtio-gpu-one-scanout:fb.height
> > vmstate_load_state_field virtio-gpu-one-scanout:fb.stride
> > vmstate_load_state_field virtio-gpu-one-scanout:fb.offset
> > qemu-system-x86_64: Missing section footer for 0000:00:02.0/virtio-gpu
> > qemu-system-x86_64: Error -22 while loading VM state
>
> It wrongly tries to load the fb fields even though they should be
> guarded by version 2.
>
> Looking at it with GDB, in vmstate_load_state(), when we come to
> field->name == parent_obj.scanout, the
>
> >                 } else if (field->flags & VMS_STRUCT) {
> >                     ret = vmstate_load_state(f, field->vmsd, curr_elem,
> >                                              field->vmsd->version_id);
>
> branch will be taken and suddenly we'll have a call to
> vmstate_load_state() for vmsd==vmstate_virtio_gpu_scanout with
> version_id==2 rather than version_id==1, because that is
> field->vmsd->version_id (i.e. the .version_id in VMStateDescription
> vmstate_virtio_gpu_scanout).
>
> Would it have been necessary to version the VMStateDescription
> vmstate_virtio_gpu_scanouts too using VMS_VSTRUCT (or am I
> misinterpreting the use case for that)?
>

Sigh.. this is embarrassing. This patch didn't receive enough testing
and breaks pre-9.0/9.0 forward and backward migrations.

I think it would be reasonable to revert the patch, but it's already
in 9.0. Is that acceptable?

There are various issues about versioning:
- VMStateDescription (vmsd) version_id are not in the stream (only
start & full section headers), so any nested version bump should
somehow be bubbled up/down..
- machine versions don't have specific vmsd version_id associations
(that I can see, or is there any way to do that?)
- virtio-gpu uses even lower-level vmstate API and hardcodes version
to 1 in general

Ideally, I wished adding versionized fields the way done in this patch
would simply work, but this is really, sadly, not supported atm.

And it would be great to have some testing for back/forward migrations.

Any thoughts, corrections, encouragement perhaps? :)

thanks


--
Marc-André Lureau

Reply via email to