On 01/29/2016 03:20 PM, Jike Song wrote:
> This discussion becomes a little difficult for a newbie like me :(
>
> On 01/28/2016 11:23 PM, Alex Williamson wrote:
>> On Thu, 2016-01-28 at 14:00 +0800, Jike Song wrote:
>>> On 01/28/2016 12:19 AM, Alex Williamson wrote:
>>>> On Wed, 2016-01-27 at 13:43 +0800, Jike Song wrote:
>>> {snip}
>>>
>>>>> Had a look at eventfd, I would say yes, technically we are able to
>>>>> achieve the goal: introduce a fd, with fop->{read|write} defined in KVM,
>>>>> call into vgpu device-model, also an iodev registered for a MMIO GPA
>>>>> range to invoke the fop->{read|write}. I just didn't understand why
>>>>> userspace can't register an iodev via API directly.
>>>>
>>>> Please elaborate on how it would work via iodev.
>>>>
>>>
>>> QEMU forwards BAR0 write to the bus driver, in the bus driver, if
>>> found that MEM bit is enabled, register an iodev to KVM: with an
>>> ops:
>>>
>>> const struct kvm_io_device_ops trap_mmio_ops = {
>>>         .read = kvmgt_guest_mmio_read,
>>>         .write = kvmgt_guest_mmio_write,
>>> };
>>>
>>> I may not be able to illustrate it clearly with descriptions but this
>>> should not be a problem, thanks to your explanation, I can understand
>>> and adopt it for KVMGT.
>>
>> You're still crossing modules with direct callbacks, right? What's the
>> advantage versus using the file descriptor + offset approach which could
>> offer the same performance and improve KVM overall by creating a new
>> option for generically handling MMIO?
>>
>
> Yes, the method I gave above is the current way: calling kvm_io_device_ops
> from KVM hypervisor, and then going to vgpu device-model directly.
>
> From KVMGT's side this is almost the same as what you suggested, I don't
> think now we have a problem here. I will adopt your suggestion.
>
>>>>> Besides, this doesn't necessarily require another thread, right?
>>>>> I guess it can be within the VCPU thread?
>>>>
>>>> I would think so too, the vcpu is blocked on the MMIO access, we should
>>>> be able to service it in that context. I hope.
>>>>
>>>
>>> Thanks for confirmation.
>>>
>>>>> And this brought another question: except the vfio bus driver and
>>>>> iommu backend (and the page_track utility used for guest memory
>>>>> write-protection),
>>>>> is KVMGT allowed to call into kvm.ko (or modify)? Though we are
>>>>> becoming less and less willing to do that with VFIO, it's still better
>>>>> to know that before going wrong.
>>>>
>>>> kvm and vfio are separate modules, for the most part, they know nothing
>>>> about each other and have no hard dependencies between them. We do have
>>>> various accelerations we can use to avoid paths through userspace, but
>>>> these are all via APIs that are agnostic of the party on the other end.
>>>> For example, vfio signals interrupts through eventfds and has no concept
>>>> of whether that eventfd terminates in userspace or into an irqfd in KVM.
>>>> vfio supports direct access to device MMIO regions via mmaps, but vfio
>>>> has no idea if that mmap gets directly mapped into a VM address space.
>>>> Even with posted interrupts, we've introduced an irq bypass manager
>>>> allowing interrupt producers and consumers to register independently to
>>>> form a connection without directly knowing anything about the other
>>>> module. That sort of proper software layering needs to continue. It
>>>> would be wrong for a vfio bus driver to assume KVM is the user and
>>>> directly call into KVM interfaces. Thanks,
>>>>
>>>
>>> I understand and agree with your point, it's bad if the bus driver
>>> assume KVM is the user and/or call into KVM interfaces.
>>>
>>> However, the vgpu device-model, in intel case also a part of i915 driver,
>>> will always need to call some hypervisor-specific interfaces.
>>
>> No, think differently.
>>
>>> For example, when a guest gfx driver submit GPU commands, the device-model
>>> may want to scan it for security or whatever-else purpose:
>>>
>>> - get a GPA (from GPU page tables)
>>> - want to read 16 bytes from that GPA
>>> - call hypervisor-specific read_gpa() method
>>>     - for Xen, the GPA belongs to a foreign domain, it must find
>>>       a way to map & read it - beyond our scope here;
>>>     - for KVM, the GPA can be converted to HVA, copy_from_user (if
>>>       called from vcpu thread) or access_remote_vm (if called from
>>>       other threads);
>>>
>>> Please note that this is not from the vfio bus driver, but from the vgpu
>>> device-model; also this is not DMA addr from GPU tables, but real GPA.
>>
>> This is exactly why we're proposing that the vfio IOMMU interface be
>> used as a database of guest translations. The type1 IOMMU model in QEMU
>> maps all of guest memory through the IOMMU, in the vGPU model type1 is
>> simply collecting these and they map GPA to process virtual memory.
>
> GPA to HVA mappings are maintained in KVM/QEMU, via memslots.
> Do you mean making type1 to duplicate the GPA <-> HVA/HPA translations from
> KVM? Even technically this could be done, how to synchronize it with KVM
> hypervisor? e.g. What is expected if guest hot-add a memslot?
>
> What's more, GPA is totally a virtualization term. When VFIO is used for
> device assignment, it uses GPA as IOVA, maps it to HPA, that's true.
> But for KVMGT, since vGPU doesn't have its own DMA requester ID, VFIO
> won't call IOMMU-API, but DMA-API instead. GPAs from different guests
> may be identical, while IGD can only have 1 single IOMMU domain ...
>
>
>> When the GPU driver wants to get a GPA, it does so from this database.
>> If it wants to read from it, it could get the mm and read from the
>> virtual memory or pin the page for a GPA to HPA translation and read
>> from the HPA. There is no reason to poke directly through to the
>> hypervisor here.
>> Let's design what you need into the vgpu version of
>> the type1 IOMMU instead. Thanks,
>
> For KVM, to access a GPA, having it translated to HVA is enough.
>
> IIUC this may be the only remaining problem between us: where should
> a GPA be translated to HVA, KVM or VFIO?
>

Unfortunately it's not the only one. Another example: the device-model
may want to write-protect a gfn (RAM). If this request goes to VFIO,
how is it supposed to reach the KVM MMU?

>
--
Thanks,
Jike
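
For illustration only, here is a minimal user-space sketch of the GPA-to-HVA lookup debated in this thread: a table of contiguous GPA ranges backed by process virtual memory, and a read_gpa() helper built on top of it. The names struct gpa_slot and gpa_to_hva() are hypothetical and are not KVM or VFIO interfaces; read_gpa() borrows the name used in the discussion, but its signature here is an assumption. A real in-kernel version would also need locking against slot updates and would go through copy_from_user or access_remote_vm rather than a plain memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical slot: one contiguous GPA range backed by process memory. */
struct gpa_slot {
    uint64_t gpa_start; /* first guest-physical address covered */
    uint64_t size;      /* length of the range in bytes */
    uint8_t *hva;       /* host virtual address backing gpa_start */
};

/*
 * Translate [gpa, gpa + len) to a host virtual address.
 * Returns NULL unless the whole range lies inside a single slot.
 */
static uint8_t *gpa_to_hva(const struct gpa_slot *slots, size_t nslots,
                           uint64_t gpa, uint64_t len)
{
    for (size_t i = 0; i < nslots; i++) {
        const struct gpa_slot *s = &slots[i];

        /* Written to avoid overflow in gpa + len. */
        if (gpa >= s->gpa_start && len <= s->size &&
            gpa - s->gpa_start <= s->size - len)
            return s->hva + (gpa - s->gpa_start);
    }
    return NULL;
}

/* Copy len bytes starting at gpa into buf; 0 on success, -1 if unmapped. */
static int read_gpa(const struct gpa_slot *slots, size_t nslots,
                    uint64_t gpa, void *buf, size_t len)
{
    const uint8_t *src = gpa_to_hva(slots, nslots, gpa, len);

    if (!src)
        return -1;
    memcpy(buf, src, len);
    return 0;
}
```

Whether such a table lives in KVM (memslots) or in a vgpu flavour of the type1 IOMMU backend is exactly the open question above; the sketch takes no position on that, and it does not address how a write-protect request would reach the KVM MMU.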