Re: [PATCH v2 4/5] ramfb: make migration conditional

Marc-André Lureau Tue, 03 Oct 2023 01:24:26 -0700

Hi

On Tue, Oct 3, 2023 at 11:43 AM Cédric Le Goater <c...@redhat.com> wrote:
>
> On 10/2/23 22:38, Alex Williamson wrote:
> > On Mon, 2 Oct 2023 21:41:55 +0200
> > Laszlo Ersek <ler...@redhat.com> wrote:
> >
> >> On 10/2/23 21:26, Alex Williamson wrote:
> >>> On Mon, 2 Oct 2023 20:24:11 +0200
> >>> Laszlo Ersek <ler...@redhat.com> wrote:
> >>>
> >>>> On 10/2/23 16:41, Alex Williamson wrote:
> >>>>> On Mon, 2 Oct 2023 15:38:10 +0200
> >>>>> Cédric Le Goater <c...@redhat.com> wrote:
> >>>>>
> >>>>>> On 10/2/23 13:11, marcandre.lur...@redhat.com wrote:
> >>>>>>> From: Marc-André Lureau <marcandre.lur...@redhat.com>
> >>>>>>>
> >>>>>>> RAMFB migration was unsupported until now, let's make it conditional.
> >>>>>>> The following patch will prevent machines <= 8.1 from migrating it.
> >>>>>>>
> >>>>>>> Signed-off-by: Marc-André Lureau <marcandre.lur...@redhat.com>
> >>>>>> Maybe localize the new 'ramfb_migrate' attribute close to 'enable_ramfb'
> >>>>>> in VFIOPCIDevice. Anyhow,
> >>>>>
> >>>>> Shouldn't this actually be tied to whether the device is migratable
> >>>>> (which for GVT-g - the only ramfb user afaik - it's not)?  What does it
> >>>>> mean to have a ramfb-migrate=true property on a device that doesn't
> >>>>> support migration, or false on a device that does support migration?  I
> >>>>> don't understand why this is a user controllable property.  Thanks,
> >>>>
> >>>> The comments in <https://bugzilla.redhat.com/show_bug.cgi?id=1859424>
> >>>> (which are unfortunately not public :/ ) suggest that ramfb migration
> >>>> was simply forgotten when vGPU migration was implemented. So, "now
> >>>> that vGPU migration is done", this should be added.
> >>>>
> >>>> Comment 8 suggests that the following domain XML snippet
> >>>>
> >>>>      <hostdev mode='subsystem' type='mdev' managed='no'
> >>>>               model='vfio-pci' display='on' ramfb='on'>
> >>>>        <source>
> >>>>          <address uuid='b155147a-663a-4009-ae7f-e9a96805b3ce'/>
> >>>>        </source>
> >>>>        <alias name='ua-b155147a-663a-4009-ae7f-e9a96805b3ce'/>
> >>>>        <address type='pci' domain='0x0000' bus='0x07' slot='0x00'
> >>>>                 function='0x0'/>
> >>>>      </hostdev>
> >>>>
> >>>> is migratable, but the ramfb device malfunctions on the destination
> >>>> host.
> >>>>
> >>>> There's also a huge QEMU cmdline in comment#0 of the bug; I've not
> >>>> tried to read that.
> >>>>
> >>>> AIUI BTW the property is not for the user to control, it's just a
> >>>> compat knob for versioned machine types. AIUI those are usually
> >>>> implemented with such (user-visible / -tweakable) device properties.
> >>>
> >>> If it's not for user control it's unfortunate that we expose it to the
> >>> user at all, but should it at least use the "x-" prefix to indicate that
> >>> it's not intended to be an API?
> >>
> >> I *think* it was your commit db32d0f43839 ("vfio/pci: Add option to
> >> disable GeForce quirks", 2018-02-06) that had introduced me to the "x-"
> >> prefixed properties!
> >>
> >> For some reason though, machine type compat knobs are never named like
> >> that, AFAIR.
> >
> > Maybe I'm misunderstanding your comment, but it appears quite common to
> > use "x-" prefixed properties in the compat tables...
> >
> > GlobalProperty hw_compat_8_0[] = {
> >      { "migration", "multifd-flush-after-each-section", "on"},
> >      { TYPE_PCI_DEVICE, "x-pcie-ari-nextfn-1", "on" },
> >      { TYPE_VIRTIO_NET, "host_uso", "off"},
> >      { TYPE_VIRTIO_NET, "guest_uso4", "off"},
> >      { TYPE_VIRTIO_NET, "guest_uso6", "off"},
> > };
> > const size_t hw_compat_8_0_len = G_N_ELEMENTS(hw_compat_8_0);
> >
> > GlobalProperty hw_compat_7_2[] = {
> >      { "e1000e", "migrate-timadj", "off" },
> >      { "virtio-mem", "x-early-migration", "false" },
> >      { "migration", "x-preempt-pre-7-2", "true" },
> >      { TYPE_PCI_DEVICE, "x-pcie-err-unc-mask", "off" },
> > };
> > const size_t hw_compat_7_2_len = G_N_ELEMENTS(hw_compat_7_2);
> > [etc]
> >
> >>> It's still odd to think that we can
> >>> have scenarios of a non-migratable vfio device registering a migratable
> >>> ramfb, and vice versa, but I suppose in the end it doesn't matter.
> >>
> >> I do think it matters! For one, if migration is not possible with
> >> vfio-pci-nohotplug, then how can QE (or anyone else) *test* the patch
> >> (i.e. that it makes a difference)? In that case, the ramfb_setup() call
> >> from vfio-pci-nohotplug should just open-code "false" for the
> >> "migratable" parameter.
> >
> > Some vfio devices support migration, most don't.  I was thinking
> > ramfb_setup might be called with something like:
> >
> >       (vdev->ramfb_migrate && vdev->enable_migration)
> >
> > so that at least the ramfb migration state matches the device, but I
> > think ultimately it only saves a little bit of overhead in registering
> > the vmstate, either one not supporting migration should block migration.
> >
> > Hmm, since enable_migration is auto/on/off, it seems like device
> > realize should fail if set to 'on' and ramfb_migrate is false.  I think
> > that's the only way the device options don't become self contradictory.
>
> Why isn't VFIODisplay a QOM object? vfio_display_probe() is more or
> less a realize routine, and we have reset and finalize handlers for it.
>
> (thinking aloud) the "ramfb-migrate" property could then be moved
> down to VFIODisplay, along with the other display-specific properties.
> Compatibility could be handled with property aliases. "enable_migration"
> could set "ramfb-migrate". This looks like it would be a nice model cleanup.
>
> Maybe not the right time?


Yes, I thought about some similar changes (though I am not sure QOM is
necessary).

Now I am trying to test my changes that add a VFIODisplay migration
subsection, but I don't think I have a GVT-g GPU (TGL GT1). When I try
with a random PCI device, I get "VFIO migration is not supported in
kernel". I can try to comment out some code, but that seems hazardous.
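
For reference, the consistency rule Alex describes above (fail realize
when enable_migration is 'on' but ramfb migration is off, otherwise
register ramfb as migratable only when both sides allow it) might look
roughly like the standalone sketch below. This is not QEMU code: DemoDev,
MigSetting, device_can_migrate and demo_resolve_ramfb_migration() are
hypothetical names made up for illustration, and mapping the auto/on/off
setting onto a boolean this way is only an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tri-state mirroring an auto/on/off property. */
typedef enum {
    MIG_AUTO,
    MIG_ON,
    MIG_OFF,
} MigSetting;

/* Hypothetical stand-in for the relevant device fields. */
typedef struct {
    bool enable_ramfb;            /* ramfb requested for this device */
    bool ramfb_migrate;           /* compat knob discussed in this thread */
    MigSetting enable_migration;  /* auto/on/off */
    bool device_can_migrate;      /* assumed: what the host reports */
} DemoDev;

/*
 * Decide whether ramfb state should be registered as migratable.
 *
 * Returns 0 and stores the decision in *ramfb_migratable, or returns -1
 * and sets *errmsg when the options contradict each other
 * (migration forced on while ramfb migration is disabled).
 */
int demo_resolve_ramfb_migration(const DemoDev *dev,
                                 bool *ramfb_migratable,
                                 const char **errmsg)
{
    *errmsg = NULL;
    *ramfb_migratable = false;

    if (!dev->enable_ramfb) {
        /* No ramfb, so there is nothing to migrate and nothing to check. */
        return 0;
    }

    if (dev->enable_migration == MIG_ON && !dev->ramfb_migrate) {
        *errmsg = "enable-migration=on requires ramfb migration";
        return -1;
    }

    /* The device migrates unless switched off or unsupported by the host. */
    bool dev_migrates = dev->enable_migration != MIG_OFF &&
                        dev->device_can_migrate;

    /* Equivalent in spirit to (ramfb_migrate && enable_migration). */
    *ramfb_migratable = dev->ramfb_migrate && dev_migrates;
    return 0;
}
```

A realize routine would call such a helper once, report the error
message on failure, and otherwise pass the resulting boolean on as the
"migratable" argument. Whether a forced 'on' should also fail when the
host cannot migrate the device at all is left out of this sketch.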





--
Marc-André Lureau
