On Fri, Oct 30, 2020 at 9:46 AM Jason Wang <jasow...@redhat.com> wrote:
> On 2020/10/30 2:21 PM, Stefan Hajnoczi wrote:
> > On Fri, Oct 30, 2020 at 3:04 AM Alex Williamson
> > <alex.william...@redhat.com> wrote:
> >> It's great to revisit ideas, but proclaiming a uAPI is bad solely
> >> because the data transfer is opaque, without defining why that's
> >> bad, evaluating the feasibility and implementation of defining a
> >> well-specified data format rather than protocol, including
> >> cross-vendor support, or proposing any sort of alternative is not
> >> so helpful imo.
> >
> > The migration approaches in VFIO and vDPA/vhost were designed for
> > different requirements and I think this is why there are different
> > perspectives on this. Here is a comparison and how VFIO could be
> > extended in the future. I see 3 levels of device state
> > compatibility:
> >
> > 1. The device cannot save/load state blobs; instead, userspace
> > fetches and restores specific values of the device's runtime state
> > (e.g. the last processed ring index). This is the vhost approach.
> >
> > 2. The device can save/load state in a standard format. This is
> > similar to #1 except that there is a single read/write blob
> > interface instead of fine-grained get_FOO()/set_FOO() interfaces.
> > This approach pushes the migration state parsing into the device so
> > that userspace doesn't need knowledge of every device type. With
> > this approach it is possible for a device from vendor A to migrate
> > to a device from vendor B, as long as they both implement the same
> > standard migration format. The limitation of this approach is that
> > vendor-specific state cannot be transferred.
> >
> > 3. The device can save/load opaque blobs. This is the initial VFIO
> > approach.
>
> I still don't get why it must be opaque.

If the device state format needs to be in the VMM then each device
needs explicit enablement in each VMM (QEMU, cloud-hypervisor, etc).

Let's invert the question: why does the VMM need to understand the
device state of a _passthrough_ device?
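To make that concrete, here is a very rough C sketch of what levels 1
and 3 ask of the VMM. The interfaces are hypothetical (this is not the
real vhost or VFIO uAPI), but the shape is the point: level 1 code
multiplies across device types, while the level 3 loop is the same for
every device:

  #include <stdint.h>
  #include <stdio.h>

  /* Level 1, vhost-style: the VMM names every field it transfers, so
   * each device type needs dedicated VMM code (cf. VHOST_GET_VRING_BASE,
   * which returns the last processed ring index). */
  struct ring_state {
      uint16_t last_avail_idx;
      uint16_t last_used_idx;
  };

  static void save_level1(const struct ring_state *s, FILE *out)
  {
      /* Device-specific: a new device type means new VMM code. */
      fwrite(&s->last_avail_idx, sizeof s->last_avail_idx, 1, out);
      fwrite(&s->last_used_idx, sizeof s->last_used_idx, 1, out);
  }

  /* Level 3, VFIO-style: the VMM only moves bytes it cannot parse.
   * One generic loop covers every device and vendor, but only a
   * compatible implementation can load the blob again. */
  static void save_level3(const void *blob, size_t len, FILE *out)
  {
      fwrite(blob, 1, len, out);
  }

  int main(void)
  {
      struct ring_state rs = { .last_avail_idx = 42, .last_used_idx = 40 };
      uint8_t vendor_blob[64] = { 0 }; /* opaque to the VMM */

      save_level1(&rs, stdout);
      save_level3(vendor_blob, sizeof vendor_blob, stdout);
      return 0;
  }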
> > A device from vendor A cannot migrate to a device from vendor B
> > because the format is incompatible. This approach works well when
> > devices have unique guest-visible hardware interfaces so the guest
> > wouldn't be able to handle migrating a device from vendor A to a
> > device from vendor B anyway.
>
> For VFIO I guess cross-vendor live migration can't succeed unless we
> do some cheats in device/vendor id.

Yes. I haven't looked into the details of PCI (Sub-)Device/Vendor IDs
and how best to enable migration, but I hope that can be solved. The
simplest approach is to override the IDs and make them part of the
guest configuration (see the P.S. below for a sketch of what that
could look like).

> For at least virtio, they will still go with virtio/vDPA. The
> advantages are:
>
> 1) virtio/vDPA can serve kernel subsystems which VFIO can't; this is
> very important for containers

I'm not sure I understand this. If the kernel wants to use the device
then it doesn't use VFIO; it runs the kernel driver instead.

One part I believe is missing from VFIO/mdev is attaching an mdev
device to the kernel. That seems to be an example of the limitation
you mentioned.

> 2) virtio/vDPA is bus independent, we can present a virtio-mmio
> device which is based on vDPA PCI hardware for e.g. microvm

Yes. This is neat, although microvm supports PCI now
(https://www.kraxel.org/blog/2020/10/qemu-microvm-acpi/).

> I'm not familiar with NVMe but they should go the same way instead
> of depending on VFIO.

There are pros/cons to both approaches. I'm not even sure all VIRTIO
hardware vendors will use vDPA. Two examples:

1. A tiny VMM with strict security requirements. The VFIO approach is
less complex because the VMM is much less involved with the device.

2. A vendor shipping a hardware VIRTIO PCI device as a PF - no SR-IOV,
no software VFs, just a single instance. A passthrough PCI device is a
much simpler way to deliver this device than vDPA + vhost + VMM
support.

vDPA is very useful, but there are situations where the VFIO approach
is attractive too.

Stefan
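P.S. To make the ID override above concrete: with QEMU's vfio-pci
device it could be expressed as part of the guest configuration
roughly like this. x-pci-vendor-id and friends are experimental
options, and the BDF and IDs below are made up (0x1af4/0x1041 is the
modern virtio-net PCI ID), so treat this as a sketch rather than a
recommendation:

  qemu-system-x86_64 ... \
      -device vfio-pci,host=0000:3b:00.1,x-pci-vendor-id=0x1af4,x-pci-device-id=0x1041,x-pci-sub-vendor-id=0x1af4,x-pci-sub-device-id=0x1041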