Hi:

Thanks for unfolding your idea. The picture is clearer to me now. I didn't 
realize that you also want to support cross-hardware migration. After thinking 
about it for a while, I suspect cross-hardware migration might not be popular 
in the vGPU case but could be quite popular in other mdev cases.

Let me continue my summary:

The mdev type name already includes a parent driver name, a group name, a 
physical device version and a configuration type, for example i915-GVTg_V5_4. 
The driver name and the group name can already distinguish the vendor and the 
product between different mdevs, e.g. between Intel and Nvidia, or between 
vGPU and vOther.

For each mdev type, a device provides a collection of the device state (data 
stream) versions it supports, in preferred order, as a newer version of the 
device state might contain more information, which might help performance.

Let's say there is a new device N and an old device O, and both support 
mdev_type M.

For example:
Device N is newer and supports these versions of device state: [ 6.3 6.2 6.1 ] 
in mdev type M
Device O is older and supports these versions of device state: [ 5.3 5.2 5.1 ] 
in mdev type M

- Version scheme of device state in the backwards compatibility case: migrate 
a VM from device O to device N; the mdev type is M.

Device N: [ 6.3 6.2 6.1 5.3 ] in M
Device O: [ 5.3 5.2 5.1 ] in M
Version used in migration: 5.3
The new device directly supports mdev_type M with the version preferred by 
device O. This is the best situation.

Device N: [ 6.3 6.2 6.1 5.2 ] in M
Device O: [ 5.3 5.2 5.1 ] in M
Version used in migration: 5.2
The new device supports mdev_type M, but not the preferred version. After the 
migration, the vendor driver might have to disable some features which are not 
covered by the 5.2 device state. But this totally depends on the vendor driver. 
If the user wishes to get the best experience, they should update the vendor 
driver on device N to one that supports the version preferred by device O.

Device N: [ 6.3 6.2 6.1 ] in M
Device O: [ 5.3 5.2 5.1 ] in M
Version used in migration: None
No version matches, so the migration would fail. The user should update the 
vendor driver on device N and device O.

- Version scheme of device state in the forwards compatibility case: migrate a 
VM from device N to device O; the mdev type is M.

Device N: [ 6.3 6.2 6.1 ] in M
Device O: [ 5.3 5.2 5.1 ] in M, but the user updates the vendor driver on 
device O. Now device O can support [ 5.3 5.2 5.1 6.1 ]. (As an old device, 
device O still prefers version 5.3.)
Version used in migration: 6.1
As the new device state is going to be migrated to an old device, the vendor 
driver on the old device might have to handle the new version of the device 
state specially. It depends on the vendor driver. 
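
To make the four cases above concrete, here is a minimal sketch of one 
selection rule that reproduces them. It assumes the source device's preferred 
order drives the choice and that the target only has to list the version; the 
function name choose_version and the list representation are hypothetical, 
nothing here is an existing QEMU or libvirt interface.

```python
from typing import Optional, Sequence


def choose_version(source_prefs: Sequence[str],
                   dest_supported: Sequence[str]) -> Optional[str]:
    """Return the first version in the source's preferred order that the
    destination also supports, or None if nothing matches."""
    dest = set(dest_supported)
    for version in source_prefs:
        if version in dest:
            return version
    return None


# Backwards compatibility: migrate from device O to device N.
device_o = ["5.3", "5.2", "5.1"]
print(choose_version(device_o, ["6.3", "6.2", "6.1", "5.3"]))  # 5.3
print(choose_version(device_o, ["6.3", "6.2", "6.1", "5.2"]))  # 5.2
print(choose_version(device_o, ["6.3", "6.2", "6.1"]))         # None

# Forwards compatibility: migrate from device N to an updated device O.
device_n = ["6.3", "6.2", "6.1"]
print(choose_version(device_n, ["5.3", "5.2", "5.1", "6.1"]))  # 6.1
```

A None result corresponds to the "migration would fail" case.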

- QEMU has to figure out and choose the version of the device state before 
reading the device state from the region. (Perhaps we can put the selection 
option in the control part of the region as well.)
- Libvirt will check whether any version in the collections of device O and 
device N matches before migration.
- Each mdev_type has its own collection of versions. (A device can support 
different versions in different types.)
- The collection had better not be a range; it should be a collection of 
version strings. (The vendor driver might drop some versions during an 
upgrade because they are not ideal.)
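
As a rough illustration of the last three points, the pre-migration check 
could look like the sketch below. The per-host dictionary (mdev type mapped to 
a list of version strings) and the name common_versions are assumptions made 
up for this example; how libvirt would actually obtain the collections is 
still open.

```python
from typing import Dict, List

# Hypothetical per-host data: mdev type -> device state versions,
# listed in the order the device prefers them.
HostVersions = Dict[str, List[str]]


def common_versions(src: HostVersions, dst: HostVersions,
                    mdev_type: str) -> List[str]:
    """Versions of mdev_type that both hosts support, in the source's order.
    An empty list means the migration should be refused up front."""
    dst_set = set(dst.get(mdev_type, []))
    return [v for v in src.get(mdev_type, []) if v in dst_set]


src_host = {"M": ["5.3", "5.2", "5.1"]}
# The destination driver dropped 5.2 during an upgrade. A "5.1 to 6.3" range
# would wrongly claim support for it; an explicit collection does not.
dst_host = {"M": ["6.3", "6.2", "6.1", "5.3", "5.1"], "M2": ["1.0"]}

print(common_versions(src_host, dst_host, "M"))   # ['5.3', '5.1']
print(common_versions(src_host, dst_host, "M2"))  # []
```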

That's the picture so far in my mind.

Thanks,
Zhi.

-----Original Message-----
From: Alex Williamson [mailto:alex.william...@redhat.com] 
Sent: Wednesday, August 1, 2018 8:19 PM
To: Wang, Zhi A <zhi.a.w...@intel.com>
Cc: libvir-list@redhat.com; kwankh...@nvidia.com
Subject: Re: Matching the type of mediated devices in the migration

On Wed, 1 Aug 2018 10:22:39 +0000
"Wang, Zhi A" <zhi.a.w...@intel.com> wrote:

> Hi:
> 
> Let me summarize the understanding so far I got from the discussions since I 
> am new to this discussion.
> 
> The mdev_type would be a generic stuff since we don't want userspace 
> application to be confused. The example of mdev_type is:

I don't think 'generic' is the right term here.  An mdev_type is a specific 
thing with a defined interface, we just don't define what that interface is.
 
> There are several pre-defined mdev_types with different configurations, let's 
> say MDEV_TYPE A/B/C. The HW 1.0 might only support MDEV_TYPE A, the HW 2.0 
> might support both MDEV_TYPE A and B, but due to HW difference, we cannot 
> migrate MDEV_TYPE A with HW 1.0 to MDEV_TYPE A with HW 2.0 even they have the 
> same MDEV_TYPE. So we need a device version either in the existing MDEV_TYPE 
> or a new sysfs entry.

This is correct, if a foo_type_a is exposed by the same vendor driver on 
different hardware, then the vendor driver is guaranteeing those mdev devices 
are software compatible to the user.  Whether the vendor driver is willing or 
able to support migration across the underlying hardware is a separate 
question.  Migration compatibility and user compatibility are separate features.

> Libvirt would have to check MDEV_TYPE match between source machine and 
> destination machine, then the device version. If any of them is different, 
> then it fails the migration.

Device version of what?  The hardware?  The mdev?  If the device version 
represents a different software interface, then the mdev type should be 
different.  If the device version represents a migration interface 
compatibility then we should define it as such.

> If my above understanding is correct, for VFIO part, we could define the 
> device version as string or a magic number. For example, the vendor mdev 
> driver could pass the vendor/device id and a version to VFIO and VFIO could 
> expose them in the UUID sysfs no matter through a new sysfs entry or through 
> existing MDEV_TYPE.

As above, why are we trying to infer migration compatibility from a device 
version?  What does a device version imply?  What if a vendor driver wants to 
support cross version migration?

> I prefer to expose it in the mdev_supported_types, since the libvirt node 
> device list could extract the device version when it enumerates the host PCI 
> devices or other devices which support mdev. We can also put it into UUID 
> sysfs, but the user might have to first log on to the target machine and then 
> check the UUID and the device version by themselves, based on current code of 
> libvirt. I suppose all the host device management would be in node device in 
> libvirt, which provides remote management of the host devices.
> 
> For the format of a device version, an example would be:
> 
> Vendor ID(16bit)Device ID(16bit)Class ID(16bit)Version(16bit)

This is no different from the mdev type, these are user visible attributes of 
the device which should not change without also changing the type.  Why do 
these necessarily convey that the migration stream is also compatible?

> For string version of the device version, I guess we have to define the max 
> string length, which is hard to say yet. Also, a magic number is easier to be 
> put into the state data header during the migration.

I don't think we've accomplished anything with this "device version".
If anything, I think we're looking for a sysfs representation of a migration 
stream version where userspace would match the vendor, type, and migration 
stream version to determine compatibility.  For vendor drivers that want to 
provide backwards compatibility, perhaps an optional minimum migration stream 
version would be provided, which would therefore imply that the format of the 
version can be parsed into a monotonically increasing value so that userspace 
can compare a stream produced by a source to a range supported by a target.  
Thanks,

Alex

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
