On Tue, 2016-01-26 at 21:50 +0000, Tian, Kevin wrote: > > From: Alex Williamson [mailto:alex.william...@redhat.com] > > Sent: Wednesday, January 27, 2016 5:44 AM > > > > On Tue, 2016-01-26 at 21:21 +0000, Tian, Kevin wrote: > > > > From: Alex Williamson [mailto:alex.william...@redhat.com] > > > > Sent: Wednesday, January 27, 2016 12:37 AM > > > > > > > > On Tue, 2016-01-26 at 22:05 +0800, Yang Zhang wrote: > > > > > On 2016/1/26 15:41, Jike Song wrote: > > > > > > On 01/26/2016 05:30 AM, Alex Williamson wrote: > > > > > > > [cc +Neo @Nvidia] > > > > > > > > > > > > > > Hi Jike, > > > > > > > > > > > > > > On Mon, 2016-01-25 at 19:34 +0800, Jike Song wrote: > > > > > > > > On 01/20/2016 05:05 PM, Tian, Kevin wrote: > > > > > > > > > I would expect we can spell out next level tasks toward above > > > > > > > > > direction, upon which Alex can easily judge whether there are > > > > > > > > > some common VFIO framework changes that he can help :-) > > > > > > > > > > > > > > > > Hi Alex, > > > > > > > > > > > > > > > > Here is a draft task list after a short discussion w/ Kevin, > > > > > > > > would you please have a look? > > > > > > > > > > > > > > > > Bus Driver > > > > > > > > > > > > > > > > { in i915/vgt/xxx.c } > > > > > > > > > > > > > > > > - define a subset of vfio_pci interfaces > > > > > > > > - selective pass-through (say aperture) > > > > > > > > - trap MMIO: interface w/ QEMU > > > > > > > > > > > > > > What's included in the subset? Certainly the bus reset ioctls > > > > > > > really > > > > > > > don't apply, but you'll need to support the full device interface, > > > > > > > right? That includes the region info ioctl and access through > > > > > > > the vfio > > > > > > > device file descriptor as well as the interrupt info and setup > > > > > > > ioctls. > > > > > > > > > > > > > > > > > > > [All interfaces I thought are via ioctl:) For other stuff like file > > > > > > descriptor we'll definitely keep it.] > > > > > > > > > > > > The list of ioctl commands provided by vfio_pci: > > > > > > > > > > > > - VFIO_DEVICE_GET_PCI_HOT_RESET_INFO > > > > > > - VFIO_DEVICE_PCI_HOT_RESET > > > > > > > > > > > > As you said, above 2 don't apply. But for this: > > > > > > > > > > > > - VFIO_DEVICE_RESET > > > > > > > > > > > > In my opinion it should be kept, no matter what will be provided in > > > > > > the bus driver. > > > > > > > > > > > > - VFIO_PCI_ROM_REGION_INDEX > > > > > > - VFIO_PCI_VGA_REGION_INDEX > > > > > > > > > > > > I suppose above 2 don't apply neither? For a vgpu we don't provide a > > > > > > ROM BAR or VGA region. > > > > > > > > > > > > - VFIO_DEVICE_GET_INFO > > > > > > - VFIO_DEVICE_GET_REGION_INFO > > > > > > - VFIO_DEVICE_GET_IRQ_INFO > > > > > > - VFIO_DEVICE_SET_IRQS > > > > > > > > > > > > Above 4 are needed of course. > > > > > > > > > > > > We will need to extend: > > > > > > > > > > > > - VFIO_DEVICE_GET_REGION_INFO > > > > > > > > > > > > > > > > > > a) adding a flag: DONT_MAP. For example, the MMIO of vgpu > > > > > > should be trapped instead of being mmap-ed. > > > > > > > > > > I may not in the context, but i am curious how to handle the DONT_MAP > > > > > in > > > > > vfio driver? Since there are no real MMIO maps into the region and i > > > > > suppose the access to the region should be handled by vgpu in i915 > > > > > driver, but currently most of the mmio accesses are handled by Qemu. > > > > > > > > VFIO supports the following region attributes: > > > > > > > > #define VFIO_REGION_INFO_FLAG_READ (1 << 0) /* Region supports > > > > read */ > > > > #define VFIO_REGION_INFO_FLAG_WRITE (1 << 1) /* Region supports > > > > write */ > > > > #define VFIO_REGION_INFO_FLAG_MMAP (1 << 2) /* Region supports > > > > mmap */ > > > > > > > > If MMAP is not set, then the QEMU driver will do pread and/or pwrite to > > > > the specified offsets of the device file descriptor, depending on what > > > > accesses are supported. This is all reported through the REGION_INFO > > > > ioctl for a given index. If mmap is supported, the VM will have direct > > > > access to the area, without faulting to KVM other than to populate the > > > > mapping. Without mmap support, a VM MMIO access traps into KVM, which > > > > returns out to QEMU to service the request, which then finds the > > > > MemoryRegion serviced through vfio, which will then perform a > > > > pread/pwrite through to the kernel vfio bus driver to handle the > > > > access. Thanks, > > > > > > > > > > Today KVMGT (not using VFIO yet) registers I/O emulation callbacks to > > > KVM, so VM MMIO access will be forwarded to KVMGT directly for > > > emulation in kernel. If we reuse above R/W flags, the whole emulation > > > path would be unnecessarily long with obvious performance impact. We > > > either need a new flag here to indicate in-kernel emulation (bias from > > > passthrough support), or just hide the region alternatively (let KVMGT > > > to handle I/O emulation itself like today). > > > > That sounds like a future optimization TBH. There's very strict > > layering between vfio and kvm. Physical device assignment could make > > use of it as well, avoiding a round trip through userspace when an > > ioread/write would do. Userspace also needs to orchestrate those kinds > > of accelerators, there might be cases where userspace wants to see those > > transactions for debugging or manipulating the device. We can't simply > > take shortcuts to provide such direct access. Thanks, > > > > But we have to balance such debugging flexibility and acceptable performance. > To me the latter one is more important otherwise there'd be no real usage > around this technique, while for debugging there are other alternative (e.g. > ftrace) Consider some extreme case with 100k traps/second and then see > how much impact a 2-3x longer emulation path can bring...
Are you jumping to the conclusion that it cannot be done with proper layering in place? Performance is important, but it's not an excuse to abandon designing interfaces between independent components. Thanks, Alex