On Wed, 2016-01-27 at 09:52 +0800, Yang Zhang wrote:
> On 2016/1/27 6:56, Alex Williamson wrote:
> > On Tue, 2016-01-26 at 22:39 +0000, Tian, Kevin wrote:
> > > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > > Sent: Wednesday, January 27, 2016 6:27 AM
> > > > 
> > > > On Tue, 2016-01-26 at 22:15 +0000, Tian, Kevin wrote:
> > > > > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > > > > Sent: Wednesday, January 27, 2016 6:08 AM
> > > > > > 
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > Today KVMGT (not using VFIO yet) registers I/O emulation
> > > > > > > > > callbacks with KVM, so VM MMIO accesses are forwarded to KVMGT
> > > > > > > > > directly for emulation in the kernel. If we reuse the above R/W
> > > > > > > > > flags, the whole emulation path would be unnecessarily long, with
> > > > > > > > > an obvious performance impact. We either need a new flag here to
> > > > > > > > > indicate in-kernel emulation (deviating from passthrough
> > > > > > > > > support), or alternatively just hide the region (letting KVMGT
> > > > > > > > > handle I/O emulation itself, as it does today).
> > > > > > > > 
> > > > > > > > That sounds like a future optimization TBH.  There's very strict
> > > > > > > > layering between vfio and kvm.  Physical device assignment could
> > > > > > > > make use of it as well, avoiding a round trip through userspace
> > > > > > > > when an ioread/write would do.  Userspace also needs to
> > > > > > > > orchestrate those kinds of accelerators; there might be cases
> > > > > > > > where userspace wants to see those transactions for debugging or
> > > > > > > > manipulating the device.  We can't simply take shortcuts to
> > > > > > > > provide such direct access.  Thanks,
> > > > > > > > 
> > > > > > > 
> > > > > > > But we have to balance such debugging flexibility against acceptable
> > > > > > > performance. To me the latter is more important, otherwise there'd
> > > > > > > be no real use for this technique, while for debugging there are
> > > > > > > other alternatives (e.g. ftrace). Consider an extreme case with 100k
> > > > > > > traps/second and then see how much impact a 2-3x longer emulation
> > > > > > > path can bring...
> > > > > > 
> > > > > > Are you jumping to the conclusion that it cannot be done with proper
> > > > > > layering in place?  Performance is important, but it's not an excuse
> > > > > > to abandon designing interfaces between independent components.
> > > > > > Thanks,
> > > > > > 
> > > > > 
> > > > > The two are not contradictory. My point is to remove the unnecessarily
> > > > > long trip where possible. On further thought, yes, we can reuse the
> > > > > existing read/write flags:
> > > > >       - KVMGT will expose a private control variable indicating whether
> > > > >         in-kernel delivery is required;
> > > > 
> > > > But in-kernel delivery is never *required*.  Wouldn't userspace want to
> > > > deliver in-kernel any time it possibly could?
> > > > 
> > > > >       - when the variable is true, KVMGT will register in-kernel MMIO
> > > > >         emulation callbacks, so VM MMIO requests are delivered to KVMGT
> > > > >         directly;
> > > > >       - when the variable is false, KVMGT will not register anything.
> > > > >         VM MMIO requests will then be delivered to QEMU, and ioread/write
> > > > >         will be used to finally reach the KVMGT emulation logic;
> > > > 
> > > > No, that means the interface is entirely dependent on a backdoor through
> > > > KVM.  Why can't userspace (QEMU) do something like register an MMIO
> > > > region with KVM handled via a provided file descriptor and offset?
> > > > Couldn't KVM then call the file ops without a kernel exit?  Thanks,
> > > > 
> > > 
> > > Could you elaborate on this thought? If it can achieve the purpose w/o
> > > a kernel exit, we can definitely adapt to it. :-)
> > 
> > I only thought of it when replying to the last email and have been doing
> > some research, but we already do quite a bit of synchronization through
> > file descriptors.  The kvm-vfio pseudo device uses a group file
> > descriptor to ensure a user has access to a group, allowing some degree
> > of interaction between modules.  Eventfds and irqfds already make use of
> > f_ops on file descriptors to poke data.  So, if KVM had information that
> > an MMIO region was backed by a file descriptor for which it already has
> > a reference via fdget() (and verified access rights and whatnot), then
> > it ought to be a simple matter to get to f_ops->read/write knowing the
> > base offset of that MMIO region.  Perhaps it could even simply use
> > __vfs_read/write().  Then we've got a proper reference to the file
> > descriptor for ownership purposes and we've transparently jumped across
> > modules without any implicit knowledge of the other end.  Could it work?
> > Thanks,
> 
> ioeventfd is a good example.
> As far as I know, all accesses to the IGD MMIO are trapped into the
> kernel. Also, the PCI config space is emulated by QEMU. The same goes
> for VGA, which is emulated too. Guest interrupts are also emulated
> (which means we cannot benefit from VT-d posted interrupts). Most
> importantly, KVMGT doesn't require a hardware IOMMU. As we know, VFIO
> is for direct device assignment, but most things for KVMGT are
> emulated, so why should we use VFIO for it?

What is a vGPU?  It's a PCI device exposed to QEMU that needs to support
emulated and direct MMIO paths into the kernel driver, PCI config space
emulation, and various interrupt models.  What does the VFIO API
provide?  Exactly those things.
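For illustration only, a minimal userspace sketch of what "those things" look like from the consumer side, assuming the Linux uAPI header <linux/vfio.h> and a device fd already obtained through the usual container/group ioctls (not shown). The helper name vfio_config_read32() is hypothetical. Config space is one region of the device fd: its offset is queried with VFIO_DEVICE_GET_REGION_INFO and then accessed with plain pread(), whether the kernel side passes the access through to hardware or emulates it.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/vfio.h>

/* Hypothetical helper: read one 32-bit dword of PCI config space through
 * a VFIO device fd. The region offset must be taken from
 * VFIO_DEVICE_GET_REGION_INFO rather than hard-coded. The value is in
 * PCI (little-endian) byte order. Returns 0 on success, or -1 with errno
 * set on failure. */
static int vfio_config_read32(int device_fd, uint32_t reg, uint32_t *val)
{
    struct vfio_region_info info = {
        .argsz = sizeof(info),
        .index = VFIO_PCI_CONFIG_REGION_INDEX,
    };
    ssize_t n;

    if (ioctl(device_fd, VFIO_DEVICE_GET_REGION_INFO, &info) < 0)
        return -1;                      /* errno set by ioctl, e.g. EBADF */
    if ((uint64_t)reg + sizeof(*val) > info.size) {
        errno = EINVAL;                 /* access would run past the region */
        return -1;
    }
    /* Same fd, file position = region offset + register offset. */
    n = pread(device_fd, val, sizeof(*val), (off_t)(info.offset + reg));
    if (n < 0)
        return -1;
    if (n != (ssize_t)sizeof(*val)) {
        errno = EIO;                    /* short read */
        return -1;
    }
    return 0;
}
```

Called with an invalid fd (e.g. -1), the region-info ioctl fails and the helper returns -1 with errno set to EBADF.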

Yes, vfio is typically used for assigning physical devices, but it has a
very modular infrastructure which allows sub-drivers to be written that
can do much more complicated and device specific passthrough and
emulation in the kernel.  vfio typically works with a platform IOMMU,
but any devices that can provide isolation and translation services will
work.  In the case of graphics cards, there's effectively already an
IOMMU on the device; in the case of vGPU, this is mediated through the
physical GPU driver.
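As a rough illustration of the "sub-driver" point, here is a userspace model (not kernel code) of how such a driver could route reads on its device fd to per-region emulation. All names here (model_vgpu, model_read(), and so on) are hypothetical. The region index is packed into the top bits of the file offset, mirroring the convention vfio-pci uses internally (VFIO_PCI_OFFSET_SHIFT is 40 there). Userspace should still rely only on the offsets reported by VFIO_DEVICE_GET_REGION_INFO.

```c
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical model: region index lives in the offset bits above 40,
 * as vfio-pci does internally. */
#define MODEL_OFFSET_SHIFT 40
#define MODEL_OFFSET_MASK  ((1ULL << MODEL_OFFSET_SHIFT) - 1)

/* Index values follow the vfio-pci layout (BAR0 = 0, config = 7). */
enum { MODEL_BAR0 = 0, MODEL_CONFIG = 7, MODEL_NUM_REGIONS = 9 };

struct model_vgpu {
    uint8_t  config[256];   /* emulated PCI config space */
    uint32_t mmio[64];      /* emulated BAR0 registers */
};

typedef ssize_t (*region_read_fn)(struct model_vgpu *, void *, size_t, uint64_t);

/* Emulated config space: copy from the shadow array, bounds-checked. */
static ssize_t config_read(struct model_vgpu *v, void *buf, size_t count, uint64_t off)
{
    if (off > sizeof(v->config) || count > sizeof(v->config) - off)
        return -1;          /* a kernel driver would return -EINVAL */
    memcpy(buf, v->config + off, count);
    return (ssize_t)count;
}

/* Emulated BAR0: registers live in memory, no hardware access. */
static ssize_t bar0_read(struct model_vgpu *v, void *buf, size_t count, uint64_t off)
{
    if (off > sizeof(v->mmio) || count > sizeof(v->mmio) - off)
        return -1;
    memcpy(buf, (uint8_t *)v->mmio + off, count);
    return (ssize_t)count;
}

/* Entry point standing in for the device fd's read(): decode the region
 * from the high offset bits and dispatch to that region's handler. */
static ssize_t model_read(struct model_vgpu *v, void *buf, size_t count, uint64_t pos)
{
    static const region_read_fn handlers[MODEL_NUM_REGIONS] = {
        [MODEL_BAR0]   = bar0_read,
        [MODEL_CONFIG] = config_read,
    };
    uint64_t index = pos >> MODEL_OFFSET_SHIFT;

    if (index >= MODEL_NUM_REGIONS || handlers[index] == NULL)
        return -1;          /* region not implemented by this device */
    return handlers[index](v, buf, count, pos & MODEL_OFFSET_MASK);
}
```

The point of the model is that the caller only sees a file offset. Whether a region is backed by real hardware or by emulation is the sub-driver's private business, which is the kind of layering being argued for above.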

So what's the benefit?  VFIO already has the IOMMU and device access
interfaces, is already supported by QEMU and libvirt, and re-using these
for vGPU avoids a proliferation of new vendor specific devices, each
with their own implementation of these interfaces and each requiring
unique libvirt and upper level management device specific knowledge.
That's why.  Thanks,

Alex
