Hi,

Since I am new to this thread, let me summarize my understanding of the
discussion so far.

The mdev_type should be generic, since we don't want userspace
applications to be confused. As an example: there are several
pre-defined mdev_types with different configurations, say MDEV_TYPE
A/B/C. HW 1.0 might only support MDEV_TYPE A, while HW 2.0 might support
both MDEV_TYPE A and B. Due to hardware differences, however, we cannot
migrate MDEV_TYPE A on HW 1.0 to MDEV_TYPE A on HW 2.0, even though they
have the same MDEV_TYPE. So we need a device version, either in the
existing MDEV_TYPE or in a new sysfs entry.

Libvirt would have to check that the MDEV_TYPE matches between the
source machine and the destination machine, and then check the device
version. If either of them differs, the migration fails.

If my understanding above is correct, then for the VFIO part we could
define the device version as a string or as a magic number. For example,
the vendor mdev driver could pass the vendor/device ID and a version to
VFIO, and VFIO could expose them in the per-UUID sysfs directory,
whether through a new sysfs entry or through the existing MDEV_TYPE.

I prefer to expose it in mdev_supported_types, since the libvirt node
device list could then extract the device version when it enumerates the
host PCI devices, or other devices, that support mdev. We could also put
it into the UUID sysfs directory, but with the current libvirt code the
user might first have to log on to the target machine and check the UUID
and the device version themselves. I assume all host device management
lives in the node device code in libvirt, which provides remote
management of the host devices.

For the format of the device version, an example would be:

  Vendor ID (16 bit) | Device ID (16 bit) | Class ID (16 bit) | Version (16 bit)

For a string form of the device version, I guess we would have to define
a maximum string length, which is hard to decide yet. Also, a magic
number is easier to put into the state data header during the migration.

Thanks,
Zhi.

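P.S. To make the magic-number layout concrete, here is a minimal sketch
in Python. It assumes the four 16-bit fields are packed big-endian in
the order listed above. The function names, the byte order, and the
sample values are placeholders of mine and are not part of VFIO or
libvirt. The check is the strict one described above: the mdev type must
match first, then the device version.

```python
import struct

# Assumed layout: Vendor ID, Device ID, Class ID, Version, 16 bits each,
# packed big-endian into one 64-bit magic number.
_FMT = ">HHHH"


def pack_device_version(vendor_id, device_id, class_id, version):
    """Pack the four 16-bit fields into a single 64-bit integer."""
    raw = struct.pack(_FMT, vendor_id, device_id, class_id, version)
    return int.from_bytes(raw, "big")


def unpack_device_version(magic):
    """Split a 64-bit magic number back into its four 16-bit fields."""
    return struct.unpack(_FMT, magic.to_bytes(8, "big"))


def can_migrate(src_type, src_magic, dst_type, dst_magic):
    """Strict check: same mdev type first, then same device version."""
    return src_type == dst_type and src_magic == dst_magic


# Placeholder values, reusing 0xdead/0xbeef from the XML example in the thread.
hw1 = pack_device_version(0xDEAD, 0xBEEF, 0x0300, 1)
hw2 = pack_device_version(0xDEAD, 0xBEEF, 0x0300, 2)
same_hw = can_migrate("MDEV_TYPE_A", hw1, "MDEV_TYPE_A", hw1)
cross_hw = can_migrate("MDEV_TYPE_A", hw1, "MDEV_TYPE_A", hw2)
```

With these placeholder values, same_hw is True and cross_hw is False,
which is the HW 1.0 to HW 2.0 case above: same MDEV_TYPE, different
device version, so the migration is refused. A fixed-width integer like
this also drops straight into a state data header, which a
variable-length string would not.
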
-----Original Message-----
From: Alex Williamson [mailto:alex.william...@redhat.com]
Sent: Tuesday, July 31, 2018 12:49 AM
To: Wang, Zhi A <zhi.a.w...@intel.com>
Cc: libvir-list@redhat.com; kwankh...@nvidia.com
Subject: Re: Matching the type of mediated devices in the migration

On Tue, 31 Jul 2018 04:05:11 +0800
Zhi Wang <zhi.a.w...@intel.com> wrote:

> On 07/30/18 23:56, Alex Williamson wrote:
> > On Sun, 29 Jul 2018 21:19:41 +0000
> > "Wang, Zhi A" <zhi.a.w...@intel.com> wrote:
> >
> >> BACKGROUND
> >>
> >> As the live migration of mdev is going to be supported in VFIO, a scheme
> >> of deciding if a mdev could be migratable between the source machine and
> >> the destination machine is needed. Mostly, this email is going to discuss
> >> a possible solution which needs fewer modifications of libvirt/VFIO.
> >>
> >> The configuration of a mdev is located in the domain XML, which guides
> >> libvirt how to find the mdev and generating the command line for QEMU. It
> >> basically only includes the UUID of a mdev. The domain XML of the source
> >> machine and destination machine are going to be compared before the
> >> migration really happens. Each configuration item would be compared and
> >> checked by libvirt. If one item of the source machine is different from
> >> the item of destination machine, the migration fails. For mdev, there is
> >> no any check/match before the migration happens yet.
> >>
> >> The user could use the node device list of libvirt to list the host
> >> devices and see the capabilities of those devices. The current node device
> >> code of libvirt has already been able to extract the supported mdev types
> >> from a host PCI device, plus some basic information, like max supported
> >> mdev instance of a host PCI device.
> >>
> >> THE SOLUTION
> >>
> >> To strictly check the mdev type and make sure the migration happens
> >> between the compatible mediated devices, three new mandatory elements in
> >> the domain XML below the hostdev element would be introduced:
> >>
> >> vendorid: The vendor ID of the mdev, which comes from the host PCI device.
> >> A user could obtain this information from the host PCI device which
> >> supports mdev in the node device list.
> >> productid: The product ID of the mdev, which also comes from the host PCI
> >> device. A user could obtain this information from the same approach above.
> >>
> >
> > The parent of an mdev device is not necessarily a PCI device.
> Good point. I didn't get that.
> >
> >> mdevtype: The type of the mdev. As the creation of the mdev is managed by
> >> the user, the user knows the type of the mdev and would be responsible for
> >> filling out this information.
> >>
> >> These three elements are only needed when the device API of a mdev is
> >> "vfio-pci". Take the example of mdev configuration from
> >> https://libvirt.org/formatdomain.html to illustrate the modification:
> >>
> >> <devices>
> >>   <hostdev mode='subsystem' type='mdev' model='vfio-pci'>
> >>     <source>
> >>       <address uuid='c2177883-f1bb-47f0-914d-32a22e3a8804'/>
> >>       <!-- The VID of the host PCI device which supports this mdev -->
> >>       <vendorid>0xdead</vendorid>
> >>       <!-- The PID of the host PCI device which supports this mdev -->
> >>       <productid>0xbeef</productid>
> >>       <!-- The vendor-specific mdev type string -->
> >>       <mdevtype>type</mdevtype>
> >>     </source>
> >>   </hostdev>
> >> </devices>
> >>
> >> With the newly introduced elements above, the flow of the creation of a
> >> domain XML with mdev will be like:
> >>
> >> 1. The user obtains the vendorid/productid from node device list
> >> 2. The user fills the vendorid/productid/mdevtype in the domain XML
> >> 3. When a migration happens, libvirt checks these elements.
> >>    If one item is different between two domain XML, then migration
> >>    fails.
> >
> > I don't see how this solves anything. The vendor and product are
> > redundant and specific to PCI hosted mdev devices. These do nothing
> > to enhance the definition of an mdev type, where we've decided the
> > mdev type is a guest software compatible definition of a device.
> > Simply knowing the type doesn't help me know that the state data
> > between source and target is compatible. This is the difference
> > between knowing I'm migrating from machine 'pc-i440fx' to 'pc-i440fx'
> > versus 'pc-i440fx-2.12' to 'pc-i440fx-2.11'. We need somehow to
> > define a version of a device, what we consider to be compatible
> > versions for migration, and hopefully some standard(ish) mechanism
> > libvirt could use to determine this. Thanks,
> >
>
> I see your point. We could combine these stuff together and improve
> "mdev" type, not by introducing new stuff to decide the compatibility.
> Let me know if I misunderstood.
>
> I guess you are now talking about "the thing" we should give libvirt.
> Are you implying that the mdev type we give in libvirt should be a
> string? If we could take the inspiration of PCI device? Like:
>
> class name - vendor name - product name - version
>
> mdev type gpu-intel-gen9-11
>           gpu-nvidia-grid-11
>
> Then every mdev driver needs to fill these information and VFIO could
> combine and expose them as the name of folder in mdev_supported_types.
> Libvirt could address the mdev type by reading the mdev_type in UUID folder.

I don't think this is practical; the mdev vendor driver already
guarantees that a given mdev type is software compatible regardless of
the underlying hardware or driver version. If it's not compatible in
these ways, different mdev types should be used. If we then cross that
definition with migration compatibility, then the mdev type changes
arbitrarily based on the version of the vendor driver in use.
How would a user's scripts accommodate a kernel update that changes the
available mdev types? Also, would such a scheme even resolve our
problem? For example, are vendor drivers going to maintain compatibility
with previous versions in their latest driver? Does a version imply that
we can only migrate to an identical version, or does it imply any newer
version?

> BTW,
>
> As far as I read the code, the migration check function would check
> quite a lot of things before migration really happens, not only
> machine type.
>
> Mdev is listed as a sub-hierarchy of hostdev in the migration check
> function. "hostdev" in the code means "a host device", like a
> passthrough PCI device. The function would check the compatibility of
> source device and destination device by types. e.g. for PCI
> passthrough device, it would check the BDF.

Probably an example of how this code has never been used. Matching BDF
between source and target is pretty much only relevant to the XML; it
has nothing to do with the compatibility of the device itself.

> For mdev, it doesn't check anything
> right now. That's how this idea come out: Let libvirt have something
> to check and know if the mdevs between source machine and destination
> machine are compatible.
>
> Simply knowing the type is not enough currently and we need prepare
> something to let libvirt check the compatibility.
>
> For how libvirt could check the compatibility of mdev, the above
> investigation might be a hint.

It's good that there at least exists some framework for testing device
compatibility in libvirt, but we need to take it from the stub it seems
to be now for hostdev to something that actually provides some
reliability and robustness. I'm also not sure libvirt is the only place
we need to address this; QEMU itself should be able to attach mdev
defined meta data to the vmstate for a device. I don't trust vendor
drivers enough to let them bury this inside their opaque device state
stream.
Thanks,
Alex

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list