> From: Tian, Kevin
> Sent: Friday, August 30, 2019 3:26 PM
> [...]
> > How does QEMU handle the fact that IOVAs are potentially dynamic while
> > performing the live portion of a migration? For example, each time a
> > guest driver calls dma_map_page() or dma_unmap_page(), a
> > MemoryRegionSection pops in or out of the AddressSpace for the device
> > (I'm assuming a vIOMMU where the device AddressSpace is not
> > system_memory). I don't see any QEMU code that intercepts that change
> > in the AddressSpace such that the IOVA dirty pfns could be recorded and
> > translated to GFNs. The vendor driver can't track these beyond getting
> > an unmap notification, since it only knows the IOVA pfns, which can be
> > re-used with different GFN backing. Once the DMA mapping is torn down,
> > it seems those dirty pfns are lost in the ether. If this works in QEMU,
> > please help me find the code that handles it.
>
> I'm curious about this part too. Interestingly, I didn't find any log_sync
> callback registered by emulated devices in Qemu. It looks like dirty pages
> from emulated DMAs are recorded in some implicit way. But KVM always
> reports dirty pages in GFN instead of IOVA, regardless of the presence of
> a vIOMMU. If Qemu also tracks dirty pages in GFN for emulated DMAs
> (translation can be done when the DMA happens), then we don't need to
> worry about the transient mapping from IOVA to GFN. Along this way we
> also want a GFN-based dirty bitmap to be reported through VFIO,
> similar to what KVM does. For vendor drivers, this means translating
> from IOVA to HVA to GFN when tracking DMA activities on VFIO
> devices. IOVA->HVA is provided by VFIO. For HVA->GFN, it can be
> provided by KVM, but I'm not sure whether it's exposed now.
HVA->GFN can be done through hva_to_gfn_memslot() in kvm_host.h.

The above flow works for software-tracked dirty mechanisms, e.g. in KVMGT, where a GFN-based 'dirty' flag is marked when a guest page is mapped into the device MMU. The IOVA->HPA->GFN translation is done at that time, so the recorded GFN is immune to later IOVA->GFN changes.

When the hardware IOMMU supports the D-bit in second-level translation (e.g. VT-d rev 3.0), there are two scenarios:

1) nested translation: the guest manages the first-level translation (IOVA->GPA) and the host manages the second-level translation (GPA->HPA). The second level is not affected by guest mapping operations, so the IOMMU driver can retrieve GFN-based dirty pages by directly scanning the second-level structures upon request from user space.

2) shadowed translation (IOVA->HPA) in the second level: in this case the dirty information is tied to the IOVA, so the IOMMU driver is expected to maintain an internal GFN-based dirty bitmap. Upon any IOVA->GPA change notification from VFIO, the IOMMU driver should flush the dirty status of the affected second-level entries into that internal bitmap; here again an IOVA->HVA->GPA translation is required for GFN-based recording. When user space queries the dirty bitmap, the IOMMU driver flushes the latest second-level dirty status into the internal bitmap, which is then copied to user space.

Given the trickiness of 2), we aim to enable 1) in the intel-iommu driver.

Thanks
Kevin
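P.S. a rough user-space sketch of the flush flow in 2), in case it helps. Everything here is a hypothetical simplification: struct vfio_mapping and struct memslot stand in for the real struct vfio_dma and struct kvm_memory_slot, and the d_bits parameter stands in for reading (and clearing) the hardware D-bits of the affected second-level entries. Only the arithmetic in hva_to_gfn() mirrors hva_to_gfn_memslot() in kvm_host.h.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/* Hypothetical stand-in for one IOVA range in VFIO's mapping database
 * (the real kernel type is struct vfio_dma). */
struct vfio_mapping {
    uint64_t iova;   /* guest IO virtual address of the range */
    uint64_t vaddr;  /* userspace HVA backing the range */
    uint64_t size;
};

/* Hypothetical stand-in for struct kvm_memory_slot, keeping only the
 * fields used by the HVA->GFN translation. */
struct memslot {
    uint64_t base_gfn;
    uint64_t userspace_addr; /* HVA where the slot starts */
    uint64_t npages;
};

/* IOVA->HVA: what VFIO's mapping database provides. */
static uint64_t iova_to_hva(const struct vfio_mapping *m, uint64_t iova)
{
    return m->vaddr + (iova - m->iova);
}

/* HVA->GFN: same arithmetic as hva_to_gfn_memslot() in kvm_host.h. */
static uint64_t hva_to_gfn(const struct memslot *s, uint64_t hva)
{
    return s->base_gfn + ((hva - s->userspace_addr) >> PAGE_SHIFT);
}

/* On an IOVA->GPA change notification from VFIO, flush the D-bit state
 * of the affected second-level entries into the internal GFN-based
 * bitmap before the IOVA->GFN association is lost. d_bits simulates
 * the harvested D-bits, one bit per page of the range. */
static void flush_dirty_range(const struct vfio_mapping *m,
                              const struct memslot *s,
                              uint64_t d_bits,
                              uint64_t *gfn_bitmap)
{
    for (uint64_t pg = 0; pg < (m->size >> PAGE_SHIFT); pg++) {
        if (d_bits & (1ULL << pg)) {
            uint64_t hva = iova_to_hva(m, m->iova + (pg << PAGE_SHIFT));
            uint64_t gfn = hva_to_gfn(s, hva);
            gfn_bitmap[gfn / 64] |= 1ULL << (gfn % 64);
        }
    }
}
```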