On Thu, 2015-10-22 at 00:52 +0800, Lan Tianyu wrote: > This patchset is Qemu part for live migration support for SRIOV NIC. > kernel part patch information is in the following link. > http://marc.info/?l=kvm&m=144544635330193&w=2 > > > Lan Tianyu (3): > Qemu: Add pci-assign.h to share functions and struct definition with > new file > Qemu: Add post_load_state() to run after restoring CPU state > Qemu: Introduce pci-sriov device type to support VF live migration > > hw/i386/kvm/Makefile.objs | 2 +- > hw/i386/kvm/pci-assign.c | 113 +---------------------- > hw/i386/kvm/pci-assign.h | 109 +++++++++++++++++++++++ > hw/i386/kvm/sriov.c | 213 > ++++++++++++++++++++++++++++++++++++++++++++ > include/migration/vmstate.h | 2 + > migration/savevm.c | 15 ++++ > 6 files changed, 344 insertions(+), 110 deletions(-) > create mode 100644 hw/i386/kvm/pci-assign.h > create mode 100644 hw/i386/kvm/sriov.c >
Hi Lan, Seems like there are a couple immediate problems with this approach. The first is that you're modifying legacy KVM device assignment, which is deprecated upstream and not even enabled by some distros. VFIO is the supported mechanism for doing PCI device assignment now and any features like this need to be added there first. It's not only more secure than legacy KVM device assignment, but it also doesn't limit this to an x86-only solution. Surely you want to support 82599 VF migration on other platforms as well. Using sysfs to interact with the PF is also problematic since that means that libvirt needs to grant qemu access to these files, adding one more layer to the stack. If we were to use VFIO, we could potentially enable this through a save-state region on the device file descriptor and if necessary, virtual interrupt channels for the device as well. This of course implies that the kernel internal channels are made as general as possible in order to support any PF driver. That said, there are some nice features here. Using unused PCI config bytes to communicate with the guest driver and enable guest-based page dirtying is a nice hack. However, if we want to add this capability to other devices, we're not always going to be able to use fixed addresses 0xf0 and 0xf1. I would suggest that we probably want to create a virtual capability in the config space of the VF, perhaps a Vendor Specific capability. Obviously some devices won't have room for a full capability in the standard config space, so we may need to optionally expose it in extended config space. Those device would be limited to only supporting migration in PCI-e configurations in the guest. Also, plenty of devices make use of undefined PCI config space, so we may not be able to simply add a capability to a region we think is unused, maybe it needs to happen through reserved space in another capability or perhaps defining a virtual BAR that unenlightened guest drivers would ignore. The point is that we somehow need to standardize that so that rather than implicitly know that it's at 0xf0/0xf1 on 82599 VFs. Also, I haven't looked at the kernel-side patches yet, but the saved state received from and loaded into the PF driver needs to be versioned and maybe we need some way to know whether versions are compatible. Migration version information is difficult enough for QEMU, it's a completely foreign concept in the kernel. Thanks, Alex