On Fri, 30 Aug 2019 08:06:32 +0000 "Tian, Kevin" <kevin.t...@intel.com> wrote:
> > From: Tian, Kevin
> > Sent: Friday, August 30, 2019 3:26 PM
> > > [...]
> > > How does QEMU handle the fact that IOVAs are potentially dynamic
> > > while performing the live portion of a migration?  For example,
> > > each time a guest driver calls dma_map_page() or dma_unmap_page(),
> > > a MemoryRegionSection pops in or out of the AddressSpace for the
> > > device (I'm assuming a vIOMMU where the device AddressSpace is not
> > > system_memory).  I don't see any QEMU code that intercepts that
> > > change in the AddressSpace such that the IOVA dirty pfns could be
> > > recorded and translated to GFNs.  The vendor driver can't track
> > > these beyond getting an unmap notification since it only knows the
> > > IOVA pfns, which can be re-used with different GFN backing.  Once
> > > the DMA mapping is torn down, it seems those dirty pfns are lost in
> > > the ether.  If this works in QEMU, please help me find the code
> > > that handles it.
> >
> > I'm curious about this part too.  Interestingly, I didn't find any
> > log_sync callback registered by emulated devices in QEMU, so it
> > looks like dirty pages from emulated DMAs are recorded in some
> > implicit way.  But KVM always reports dirty pages by GFN instead of
> > IOVA, regardless of the presence of a vIOMMU.  If QEMU also tracks
> > dirty pages by GFN for emulated DMAs (the translation can be done
> > when the DMA happens), then we don't need to worry about the
> > transient mapping from IOVA to GFN.  Along the same lines, we also
> > want a GFN-based dirty bitmap to be reported through VFIO, similar
> > to what KVM does.  Vendor drivers then need to translate from IOVA
> > to HVA to GFN when tracking DMA activity on VFIO devices.  IOVA->HVA
> > is provided by VFIO; HVA->GFN can be provided by KVM, but I'm not
> > sure whether it's exposed now.
> >

HVA->GFN can be done through hva_to_gfn_memslot in kvm_host.h.

I thought it was bad enough that we have vendor drivers that depend on
KVM, but designing a vfio interface that only supports a KVM interface
is more undesirable.  I also note without comment that gfn_to_memslot()
is a GPL symbol.  Thanks,

Alex

> Above flow works for a software-tracked dirty mechanism, e.g. in
> KVMGT, where the GFN-based dirty flag is set when a guest page is
> mapped into the device MMU.  The IOVA->HPA->GFN translation is done
> at that time, so it is immune to further IOVA->GFN changes.
>
> When the hardware IOMMU supports a dirty bit (D-bit) in the 2nd-level
> translation (e.g. VT-d rev 3.0), there are two scenarios:
>
> 1) nested translation: the guest manages the 1st-level translation
> (IOVA->GPA) and the host manages the 2nd-level translation
> (GPA->HPA).  The 2nd level is not affected by guest mapping
> operations, so the IOMMU driver can retrieve GFN-based dirty pages by
> directly scanning the 2nd-level structures upon request from
> userspace.
>
> 2) shadowed translation (IOVA->HPA) in the 2nd level: in this case
> the dirty information is tied to the IOVA, so the IOMMU driver is
> expected to maintain an internal GFN-based dirty bitmap.  Upon any
> IOVA->GPA change notification from VFIO, the IOMMU driver should
> flush the dirty status of the affected 2nd-level entries into that
> internal bitmap; again, an IOVA->HVA->GPA translation is required for
> the GFN-based recording.  When userspace queries the dirty bitmap,
> the IOMMU driver needs to flush the latest 2nd-level dirty status to
> the internal bitmap, which is then copied out to userspace.
>
> Given the trickiness of 2), we aim to enable 1) in the intel-iommu
> driver.
>
> Thanks
> Kevin
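
For reference, below is a rough sketch of what the HVA->GFN step via
hva_to_gfn_memslot() could look like from a vendor driver that already
holds a struct kvm reference -- which is precisely the KVM dependency
objected to above.  The memslot walk and srcu locking follow the
kvm_host.h helpers of the kernels current in this thread;
example_hva_to_gfn() itself is an invented name, not an existing API.

#include <linux/kvm_host.h>

/*
 * Illustrative sketch only -- example_hva_to_gfn() is not an existing
 * API.  A vendor driver holding a struct kvm reference could resolve
 * an HVA to a GFN with the kvm_host.h helpers: walk the memslots, find
 * the slot whose userspace_addr range covers the HVA, then call
 * hva_to_gfn_memslot().  Non-default address spaces (e.g. SMM) are
 * ignored for brevity, and note that the memslot machinery is GPL-only
 * and ties the driver to KVM.
 */
static gfn_t example_hva_to_gfn(struct kvm *kvm, unsigned long hva)
{
	struct kvm_memslots *slots;
	struct kvm_memory_slot *slot;
	gfn_t gfn = ~(gfn_t)0;		/* sentinel: not guest memory */
	int idx;

	idx = srcu_read_lock(&kvm->srcu);
	slots = kvm_memslots(kvm);

	kvm_for_each_memslot(slot, slots) {
		unsigned long start = slot->userspace_addr;
		unsigned long end = start + (slot->npages << PAGE_SHIFT);

		if (hva >= start && hva < end) {
			gfn = hva_to_gfn_memslot(hva, slot);
			break;
		}
	}

	srcu_read_unlock(&kvm->srcu, idx);
	return gfn;
}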
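
A sketch of scenario 1) above: since the 2nd level already maps
GPA->HPA, its index is effectively the GFN, so a dirty-bitmap query
only has to read-and-clear D-bits, with no IOVA translation involved.
The helper name below is a hypothetical stand-in for intel-iommu
internals, not an existing function.

/*
 * Hypothetical sketch of scenario 1) (nested translation).  The 2nd
 * level maps GPA->HPA, so walking the requested GFN range and
 * harvesting D-bits yields a GFN-based dirty bitmap directly.
 * second_level_test_and_clear_dirty() is invented for illustration.
 */
static void report_dirty_nested(struct dmar_domain *domain,
				unsigned long start_gfn,
				unsigned long nr_pages,
				unsigned long *bitmap)
{
	unsigned long i;

	for (i = 0; i < nr_pages; i++) {
		/* Read and clear the D-bit of the 2nd-level leaf entry. */
		if (second_level_test_and_clear_dirty(domain, start_gfn + i))
			set_bit(i, bitmap);
	}

	/*
	 * A real implementation would also need to flush the IOTLB so
	 * that later writes set the D-bits again.
	 */
}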
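
A sketch of scenario 2) above: the D-bits are indexed by IOVA, so they
have to be transferred into a GFN-indexed bitmap while the IOVA->GPA
mapping still exists, i.e. on every unmap/remap notification from VFIO
as well as at query time.  Again purely illustrative; the harvest and
translation helpers are invented names.

/*
 * Hypothetical sketch of scenario 2) (shadowed IOVA->HPA in the 2nd
 * level).  Dirty status is recorded in IOVA-indexed entries, so the
 * D-bits must be harvested and re-recorded by GFN while the IOVA->GPA
 * mapping is still valid.  Both helpers are invented for illustration.
 */
static void flush_dirty_before_unmap(struct dmar_domain *domain,
				     unsigned long iova, size_t size,
				     unsigned long *gfn_dirty_bitmap)
{
	unsigned long off, nr = size >> PAGE_SHIFT;

	for (off = 0; off < nr; off++) {
		unsigned long cur = iova + (off << PAGE_SHIFT);

		/* Read and clear the D-bit of the shadow 2nd-level entry. */
		if (!shadow_test_and_clear_dirty(domain, cur))
			continue;

		/*
		 * Translate while the mapping still exists; once it is
		 * torn down the dirty GFN would be lost "in the ether".
		 */
		set_bit(shadow_iova_to_gfn(domain, cur), gfn_dirty_bitmap);
	}
}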