On Tue, 2016-01-26 at 21:21 +0000, Tian, Kevin wrote: > > From: Alex Williamson [mailto:alex.william...@redhat.com] > > Sent: Wednesday, January 27, 2016 12:37 AM > > > > On Tue, 2016-01-26 at 22:05 +0800, Yang Zhang wrote: > > > On 2016/1/26 15:41, Jike Song wrote: > > > > On 01/26/2016 05:30 AM, Alex Williamson wrote: > > > > > [cc +Neo @Nvidia] > > > > > > > > > > Hi Jike, > > > > > > > > > > On Mon, 2016-01-25 at 19:34 +0800, Jike Song wrote: > > > > > > On 01/20/2016 05:05 PM, Tian, Kevin wrote: > > > > > > > I would expect we can spell out next level tasks toward above > > > > > > > direction, upon which Alex can easily judge whether there are > > > > > > > some common VFIO framework changes that he can help :-) > > > > > > > > > > > > Hi Alex, > > > > > > > > > > > > Here is a draft task list after a short discussion w/ Kevin, > > > > > > would you please have a look? > > > > > > > > > > > > Bus Driver > > > > > > > > > > > > { in i915/vgt/xxx.c } > > > > > > > > > > > > - define a subset of vfio_pci interfaces > > > > > > - selective pass-through (say aperture) > > > > > > - trap MMIO: interface w/ QEMU > > > > > > > > > > What's included in the subset? Certainly the bus reset ioctls really > > > > > don't apply, but you'll need to support the full device interface, > > > > > right? That includes the region info ioctl and access through the > > > > > vfio > > > > > device file descriptor as well as the interrupt info and setup ioctls. > > > > > > > > > > > > > [All interfaces I thought are via ioctl:) For other stuff like file > > > > descriptor we'll definitely keep it.] > > > > > > > > The list of ioctl commands provided by vfio_pci: > > > > > > > > - VFIO_DEVICE_GET_PCI_HOT_RESET_INFO > > > > - VFIO_DEVICE_PCI_HOT_RESET > > > > > > > > As you said, above 2 don't apply. But for this: > > > > > > > > - VFIO_DEVICE_RESET > > > > > > > > In my opinion it should be kept, no matter what will be provided in > > > > the bus driver. > > > > > > > > - VFIO_PCI_ROM_REGION_INDEX > > > > - VFIO_PCI_VGA_REGION_INDEX > > > > > > > > I suppose above 2 don't apply neither? For a vgpu we don't provide a > > > > ROM BAR or VGA region. > > > > > > > > - VFIO_DEVICE_GET_INFO > > > > - VFIO_DEVICE_GET_REGION_INFO > > > > - VFIO_DEVICE_GET_IRQ_INFO > > > > - VFIO_DEVICE_SET_IRQS > > > > > > > > Above 4 are needed of course. > > > > > > > > We will need to extend: > > > > > > > > - VFIO_DEVICE_GET_REGION_INFO > > > > > > > > > > > > a) adding a flag: DONT_MAP. For example, the MMIO of vgpu > > > > should be trapped instead of being mmap-ed. > > > > > > I may not in the context, but i am curious how to handle the DONT_MAP in > > > vfio driver? Since there are no real MMIO maps into the region and i > > > suppose the access to the region should be handled by vgpu in i915 > > > driver, but currently most of the mmio accesses are handled by Qemu. > > > > VFIO supports the following region attributes: > > > > #define VFIO_REGION_INFO_FLAG_READ (1 << 0) /* Region supports read */ > > #define VFIO_REGION_INFO_FLAG_WRITE (1 << 1) /* Region supports write */ > > #define VFIO_REGION_INFO_FLAG_MMAP (1 << 2) /* Region supports mmap */ > > > > If MMAP is not set, then the QEMU driver will do pread and/or pwrite to > > the specified offsets of the device file descriptor, depending on what > > accesses are supported. This is all reported through the REGION_INFO > > ioctl for a given index. If mmap is supported, the VM will have direct > > access to the area, without faulting to KVM other than to populate the > > mapping. Without mmap support, a VM MMIO access traps into KVM, which > > returns out to QEMU to service the request, which then finds the > > MemoryRegion serviced through vfio, which will then perform a > > pread/pwrite through to the kernel vfio bus driver to handle the > > access. Thanks, > > > > Today KVMGT (not using VFIO yet) registers I/O emulation callbacks to > KVM, so VM MMIO access will be forwarded to KVMGT directly for > emulation in kernel. If we reuse above R/W flags, the whole emulation > path would be unnecessarily long with obvious performance impact. We > either need a new flag here to indicate in-kernel emulation (bias from > passthrough support), or just hide the region alternatively (let KVMGT > to handle I/O emulation itself like today).
That sounds like a future optimization TBH. There's very strict layering between vfio and kvm. Physical device assignment could make use of it as well, avoiding a round trip through userspace when an ioread/write would do. Userspace also needs to orchestrate those kinds of accelerators, there might be cases where userspace wants to see those transactions for debugging or manipulating the device. We can't simply take shortcuts to provide such direct access. Thanks, Alex