On Wed, Sep 08, 2021 at 03:41:35PM +0200, Stefano Garzarella wrote:
> On Tue, Sep 07, 2021 at 03:47:56PM +0200, Stefano Garzarella wrote:
> > On Tue, Sep 07, 2021 at 02:22:24PM +0100, Daniel P. Berrangé wrote:
> > > On Tue, Sep 07, 2021 at 02:49:35PM +0200, Stefano Garzarella wrote:
> > > > Commit 1e08fd0a46 ("vhost-vsock: SOCK_SEQPACKET feature bit support")
> > > > enabled the SEQPACKET feature bit.
> > > > This commit was released with QEMU 6.1, so if we try to migrate a VM
> > > > where the host kernel supports SEQPACKET but the machine type version
> > > > is less than 6.1, we get the following errors:
> > > >
> > > > Features 0x130000002 unsupported. Allowed features: 0x179000000
> > > > Failed to load virtio-vhost_vsock:virtio
> > > > error while loading state for instance 0x0 of device
> > > > '0000:00:05.0/virtio-vhost_vsock'
> > > > load of migration failed: Operation not permitted
> > > >
> > > > Let's disable the feature bit for machine types < 6.1, adding a
> > > > `features` field to VHostVSock to simplify the handling of upcoming
> > > > features we will support.
> > >
> > > IIUC, this will still leave migration broken for anyone migrating
> > > a >= 6.1 machine type between a kernel that supports SEQPACKET and
> > > a kernel lacking that, or vice versa.
> >
> > This should be true for migrating from a kernel that supports SEQPACKET
> > to a kernel lacking that.
> >
> > For vice-versa I'm not sure, since vhost_get_features() will disable
> > that feature if the host kernel doesn't support it, and the guest will
> > not have acked it.
>
> I did some testing and the migration is only broken in the case of
> kernel 5.14+ (SEQPACKET supported) -> kernel 5.13 (SEQPACKET not supported).
>
> Vice-versa works well because the feature is not acked.
>
> >
> > >
> > > If a feature is dependent on a host kernel feature we can't turn
> > > that on automatically as part of the machine type, as we need
> > > ABI stability across migration independent of kernel version.
> > >
> >
> > How do we typically handle this?
> >
> > I wrongly thought it was an expected behavior that migrating a guest
> > using a vhost device from a new kernel to an old one can fail if not all
> > features are supported.
> >
> > I need to take a look at the other vhost devices.
>
> I took a look at vhost-net and vhost-scsi and we don't seem to handle this
> case. Maybe I'm missing something...

We've never done very well at having a consistent story wrt deps
on kernel features. So I wouldn't be surprised to see differences
or omissions anywhere, with people not noticing the issue.

> So following your advice, the best thing would be to have this feature
> disabled by default and require the user to enable it explicitly so we are
> sure it is needed. At this point a migration to a kernel that doesn't
> support it is rightly broken.
>
> Or is there something better we can do?
>
> @Michael @Jason any thoughts?
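
To make the failure in the quoted log concrete, here is a minimal
sketch of the kind of subset check that rejects the incoming state.
It assumes the "Features" value is what the guest acked on the source
and the "Allowed features" value is what the destination can offer.
The helper names are hypothetical and this is not the actual QEMU
code; the bit number is taken to match VIRTIO_VSOCK_F_SEQPACKET in
the Linux UAPI header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match VIRTIO_VSOCK_F_SEQPACKET (a bit number, not a mask). */
#define SEQPACKET_BIT 1

/*
 * Hypothetical helper: return the feature bits the guest acked on the
 * source that the destination is not able to offer.
 */
static uint64_t unsupported_features(uint64_t guest_acked,
                                     uint64_t dest_allowed)
{
    return guest_acked & ~dest_allowed;
}

/* Hypothetical helper: is SEQPACKET among the given feature bits? */
static bool has_seqpacket(uint64_t features)
{
    return (features & (UINT64_C(1) << SEQPACKET_BIT)) != 0;
}

/*
 * Hypothetical helper: the incoming state is acceptable only if every
 * acked bit is also allowed on the destination.
 */
static bool can_load_features(uint64_t guest_acked, uint64_t dest_allowed)
{
    return unsupported_features(guest_acked, dest_allowed) == 0;
}
```

With the values from the log, unsupported_features(0x130000002,
0x179000000) leaves only bit 1 set, i.e. SEQPACKET, so the load is
refused. In the opposite direction the guest never acked that bit, so
nothing is left over and the check passes, which matches what Stefano
saw in testing.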

Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|