> From: Zhiyuan Lv
> Sent: Tuesday, February 02, 2016 3:35 PM
>
> Hi Gerd/Alex,
>
> On Mon, Feb 01, 2016 at 02:44:55PM -0700, Alex Williamson wrote:
> > On Mon, 2016-02-01 at 14:10 +0100, Gerd Hoffmann wrote:
> > > Hi,
> > >
> > > > > Unfortunately it's not the only one. Another example is, device-model
> > > > > may want to write-protect a gfn (RAM). In case that this request goes
> > > > > to VFIO .. how it is supposed to reach KVM MMU?
> > > >
> > > > Well, let's work through the problem. How is the GFN related to the
> > > > device? Is this some sort of page table for device mappings with a base
> > > > register in the vgpu hardware?
> > >
> > > IIRC this is needed to make sure the guest can't bypass execbuffer
> > > verification and works like this:
> > >
> > > (1) guest submits execbuffer.
> > > (2) host makes execbuffer readonly for the guest
> > > (3) verify the buffer (make sure it only accesses resources owned by
> > > the vm).
> > > (4) pass on execbuffer to the hardware.
> > > (5) when the gpu is done with it make the execbuffer writable again.
> >
> > Ok, so are there opportunities to do those page protections outside of
> > KVM? We should be able to get the vma for the buffer, can we do
> > something with that to make it read-only. Alternatively can the vgpu
> > driver copy it to a private buffer and hardware can execute from that?
> > I'm not a virtual memory expert, but it doesn't seem like an
> > insurmountable problem. Thanks,
>
> Originally iGVT-g used write-protection for privilege execbuffers, as Gerd
> described. Now the latest implementation has removed wp to do buffer copy
> instead, since the privilege command buffers are usually small. So that part
> is fine.
>
> But we need write-protection for graphics page table shadowing as well. Once
> guest driver modifies gpu page table, we need to know that and manipulate
> shadow page table accordingly. buffer copy cannot help here. Thanks!
>
After walking through the whole thread again, let me summarize here so
everyone can be on the same page.

First, before his vacation Jike told me that, according to community
comments, we cannot make any change to the KVM module. Now I think that's
not true. We can make the necessary changes, as long as they are done in
a structured/layered approach, without a hard assumption that KVMGT is
the only user. That's the guideline we need to follow. :-)

Mostly we care about two aspects of a vgpu driver:
  - services/callbacks which the vgpu driver provides to the external
    framework (e.g. the vgpu core driver and VFIO);
  - services/callbacks which the vgpu driver relies on for proper
    emulation (e.g. from VFIO and/or the hypervisor).

The former is being discussed in another thread, so here let's focus
on the latter.

In general Intel GVT-g requires the following services for emulation:

1) Selectively pass through a region to a VM
--
This can be supported by today's VFIO framework, by setting
VFIO_REGION_INFO_FLAG_MMAP for the concerned regions. Qemu will then
mmap that region, which is finally added to the EPT table of the
target VM.

2) Trap-and-emulate a region
--
Similarly, this can be easily achieved by clearing the MMAP flag for
the concerned regions. Then every access from the VM goes through Qemu,
then VFIO, and finally reaches the vgpu driver. The only concern is
performance. We need some general mechanism to allow delivering I/O
emulation requests directly from KVM in the kernel. For example, Alex
mentioned a flavor based on a file descriptor + offset. Most likely we
should move forward with the default Qemu forwarding for now, while
brainstorming exit-less delivery in parallel.

3) Inject a virtual interrupt
--
We can leverage the existing VFIO IRQ injection interface, including
the configuration and irqfd interfaces.

4) Map/unmap guest memory
--
It's there for KVM.

5) Pin/unpin guest memory
--
IGD or any PCI passthrough device should have the same requirement, so
we should be able to leverage existing code in VFIO.
The only tricky thing (Jike may elaborate after he is back) is that
KVMGT also requires pinning EPT entries, which needs some further
changes on the KVM side. But I'm not sure whether that still holds
after the design changes made in this thread, so I'll leave it to Jike
to comment further.

6) Write-protect a guest memory page
--
The primary purpose is GPU page table shadowing. We need to track
modifications to the guest GPU page table so that the shadow part can
be synchronized accordingly; just think of CPU page table shadowing.
An older example, as Zhiyuan pointed out, is write-protecting the guest
command buffer, but that is no longer necessary.

So we need KVM to provide an interface through which agents can request
such write protection (not just for KVMGT; it could serve other tracking
usages too). Guangrong has been working on a general page tracking
mechanism, upon which write protection can easily be built. The review
is still in progress.

7) GPA->IOVA/HVA translation
--
It's required in various places, e.g.:
  - reading a guest structure according to its GPA;
  - replacing GPAs with IOVAs in various shadow structures.

We can maintain both translations in the vfio-iommu-type1 driver, since
the necessary information is available at the map interface, and we
should use a MemoryListener to update the database. This is already
done for physical device passthrough (Qemu uses a MemoryListener and
then relays to vfio).

vfio-vgpu will expose a query interface, through the vgpu core driver,
so that the vgpu driver can use the above database for whatever purpose.

----
Well, by the end of this write-up I realize that most of the open
issues have been covered with a solution. We should now move forward
to come up with a prototype, upon which we can identify anything
missing or overlooked (there definitely will be something), and also
discuss the several remaining open issues atop it (such as exit-less
emulation, pin/unpin, etc.). Another thing we need to think about is
whether this new design is still compatible with the Xen side.
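To make item 6 (write-protect a guest memory page for GPU page table shadowing) more concrete, here is a minimal user-space sketch of the split between a generic write-protection service and one tracking agent. Everything in it is hypothetical: wp_tracker, wp_add(), wp_remove(), wp_guest_write(), shadow_gtt and shadow_sync() are illustrative names, not Guangrong's page tracking API or any existing KVM interface, and the fixed offset in shadow_sync() merely stands in for the real address translation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TRACKED   16
#define PTES_PER_PAGE 8   /* toy page: 8 64-bit entries */

typedef uint64_t gfn_t;

/* Callback a tracking agent (e.g. the vgpu driver) registers; it runs
 * after a guest store to a write-protected page has been emulated. */
typedef void (*wp_notify_fn)(void *opaque, gfn_t gfn, size_t idx,
                             uint64_t val);

/* Hypothetical stand-in for a hypervisor write-protection service.
 * This is NOT an existing KVM interface. */
struct wp_tracker {
    gfn_t gfns[MAX_TRACKED];
    size_t count;
    wp_notify_fn notify;
    void *opaque;
};

static bool wp_is_tracked(const struct wp_tracker *t, gfn_t gfn)
{
    for (size_t i = 0; i < t->count; i++)
        if (t->gfns[i] == gfn)
            return true;
    return false;
}

/* Agent requests write protection for one guest page. */
static bool wp_add(struct wp_tracker *t, gfn_t gfn)
{
    if (wp_is_tracked(t, gfn))
        return true;
    if (t->count == MAX_TRACKED)
        return false;
    t->gfns[t->count++] = gfn;
    return true;
}

/* Agent drops protection, e.g. once the page is no longer a GPU page table. */
static void wp_remove(struct wp_tracker *t, gfn_t gfn)
{
    for (size_t i = 0; i < t->count; i++) {
        if (t->gfns[i] == gfn) {
            t->gfns[i] = t->gfns[--t->count];
            return;
        }
    }
}

/* Simulates a guest store to entry idx of page gfn. Untracked pages are
 * written directly; for tracked pages the store traps, is emulated, and
 * the agent is notified. Returns true if the store trapped. */
static bool wp_guest_write(struct wp_tracker *t, uint64_t *page,
                           gfn_t gfn, size_t idx, uint64_t val)
{
    assert(idx < PTES_PER_PAGE);
    page[idx] = val;
    if (!wp_is_tracked(t, gfn))
        return false;
    if (t->notify)
        t->notify(t->opaque, gfn, idx, val);
    return true;
}

/* Toy shadow GPU page table: each guest entry is copied with a fixed
 * offset added, standing in for the real GPA->host translation. */
struct shadow_gtt {
    uint64_t entries[PTES_PER_PAGE];
    uint64_t offset;
};

/* wp_notify_fn implementation: resync one shadow entry. */
static void shadow_sync(void *opaque, gfn_t gfn, size_t idx, uint64_t val)
{
    struct shadow_gtt *s = opaque;

    (void)gfn;  /* single shadowed page in this toy */
    s->entries[idx] = val + s->offset;
}
```

In this sketch the agent only registers a callback and the pages it cares about; trapping and emulating the store stay on the hypervisor side. That split is what lets the same interface serve tracking users other than KVMGT, as item 6 asks.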
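Item 7 (GPA->IOVA/HVA translation) describes a database filled at map time and queried later by the vgpu driver. A minimal sketch follows, assuming as a simplification that each map call supplies the GPA, HVA and IOVA of a range. xlate_db, xlate_map(), xlate_unmap() and xlate_gpa() are hypothetical names, not vfio-iommu-type1 functions; in the design discussed, the map/unmap calls would ultimately be driven by Qemu's MemoryListener, and xlate_gpa() plays the role of the query interface exposed through the vgpu core driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MAPPINGS 32

/* One guest memory range as recorded at map time (hypothetical). */
struct xlate_entry {
    uint64_t gpa;   /* guest physical start */
    uint64_t hva;   /* host virtual start (Qemu's mapping of guest RAM) */
    uint64_t iova;  /* device-visible start */
    uint64_t size;  /* length in bytes */
};

struct xlate_db {
    struct xlate_entry e[MAX_MAPPINGS];
    size_t count;
};

/* Map path: record a range, rejecting overlap with an existing GPA range. */
static bool xlate_map(struct xlate_db *db, uint64_t gpa, uint64_t hva,
                      uint64_t iova, uint64_t size)
{
    assert(db->count <= MAX_MAPPINGS);
    if (size == 0 || db->count == MAX_MAPPINGS)
        return false;
    for (size_t i = 0; i < db->count; i++) {
        const struct xlate_entry *x = &db->e[i];
        if (gpa < x->gpa + x->size && x->gpa < gpa + size)
            return false;
    }
    db->e[db->count++] = (struct xlate_entry){ gpa, hva, iova, size };
    return true;
}

/* Unmap path: forget the range that starts exactly at gpa. */
static bool xlate_unmap(struct xlate_db *db, uint64_t gpa)
{
    for (size_t i = 0; i < db->count; i++) {
        if (db->e[i].gpa == gpa) {
            db->e[i] = db->e[--db->count];
            return true;
        }
    }
    return false;
}

/* Query path: translate one GPA to HVA and/or IOVA (either out pointer
 * may be NULL). Returns false if the GPA is not mapped. */
static bool xlate_gpa(const struct xlate_db *db, uint64_t gpa,
                      uint64_t *hva, uint64_t *iova)
{
    for (size_t i = 0; i < db->count; i++) {
        const struct xlate_entry *x = &db->e[i];
        if (gpa >= x->gpa && gpa - x->gpa < x->size) {
            uint64_t off = gpa - x->gpa;
            if (hva)
                *hva = x->hva + off;
            if (iova)
                *iova = x->iova + off;
            return true;
        }
    }
    return false;
}
```

The linear scan only keeps the sketch short; since lookups sit on hot shadowing paths, a real implementation would more likely use a tree keyed by address.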
Thanks a lot to all for the great discussion (especially Alex, with many
good inputs)! I believe how to integrate KVMGT with VFIO is much clearer
now than it was 2 weeks ago. :-)

Thanks
Kevin