tugouxp <[email protected]> writes:

(I'll preface this with my not-an-expert hat on, adding some iommu
maintainers to the CC who might correct my ramblings)

> Hi folks:
>
> Hello everyone, I have a few questions regarding the QEMU device
> passthrough feature that I’d like to ask for help with.
>
> Both my HOST OS and GUEST OS are running Ubuntu 20.4.6. I passed through a
> dedicated NVIDIA MX250 GPU from the HOST to the GUEST OS. On the HOST, I
> installed the VFIO-PCI driver for passthrough, while the GUEST OS uses the
> default Nouveau driver from Ubuntu 20.4.6. I also enabled IOMMU in the
> GUEST OS and checked the IOMMU group layout from
> sysfs"/sys/kernel/iommu_group/xxxx/type". The passthrough MX250 operates in
> “DMA” translation mode,which means the translation really work. Thanks to
> your excellent work, the setup process went smoothly and everything runs
> well. However, I have a couple of questions:
>
> 1 Is the IOMMU (DMAR) in the GUEST OS emulated by QEMU, or does it
> share the same IOMMU as the HOST OS?

Generally the guest IOMMU is emulated. You do not want the guest to be
able to directly program the host HW because that would open up
security issues. However, for simplicity the IOMMU presented to the guest
is usually the same as the host hardware - whatever the architecturally
mandated IOMMU is.

There are also fully virtual IOMMUs (e.g. virtio-iommu) which completely
abstract the host hardware away.

In both these cases it is QEMU's responsibility to take the guest
programming and apply those changes to the host hardware to ensure the
mappings work properly.

There are also host IOMMUs which virtualise some of their interfaces so
the guest can directly program them (within certain bounds) for their
mappings. I have no idea if the intel-iommu is one of these.

> 2 Given that both the GUEST OS and HOST OS have IOMMU enabled, when the MX250
> performs DMA, it should go through two-stage
> page table translation—first in the GUEST OS and then in the HOST OS—with
> VFIO-PCI assisting in this process, correct? If so, are
> both stages handled by hardware?
> I understand that the second stage is
> definitely hardware-assisted, but I’m uncertain about the
> first stage: whether the translation from IOVA to GPA (IPA) within
> the GUEST OS is also hardware-assisted.

I think this will depend on the implementation details of the particular
IOMMU. The guest will create/manage tables to map IOVA -> GPA. There are
two options for QEMU now.

The first is to monitor the guest page tables for changes and then create
a shadow page table that mirrors the guest but maps the IOVA directly to
the final host physical address (HPA). This would be a single-stage
translation. I think this is how intel-iommu,caching-mode=on works.

The second option is for IOMMUs that support a full two-stage HW
translation (much in the same way as hypervisors have a second
stage). The initial lookup would be via the guest's IOMMU table
(IOVA->GPA) before a second stage controlled by the host would map to
the final address (GPA->HPA). I think two-stage IOMMUs are a requirement
if you are handling nested VMs.

> Those are my two questions. Thank you very much for your help!
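
To make the difference between the two options a bit more concrete,
here is a toy Python model. It is only a sketch: the dictionaries, page
numbers and function names (translate, build_shadow, translate_nested)
are made up for illustration and do not correspond to QEMU, VFIO or
VT-d internals. It assumes 4 KiB pages and flat one-level tables.

```python
PAGE_SHIFT = 12                      # assume 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


def translate(table, addr):
    """One stage: map the page frame number, keep the offset in the page."""
    pfn = table[addr >> PAGE_SHIFT]  # a KeyError stands in for a DMA fault
    return (pfn << PAGE_SHIFT) | (addr & PAGE_MASK)


def build_shadow(guest_table, host_table):
    """Option 1: fold IOVA->GPA and GPA->HPA into a single IOVA->HPA table."""
    return {iova_pfn: host_table[gpa_pfn]
            for iova_pfn, gpa_pfn in guest_table.items()}


def translate_nested(guest_table, host_table, iova):
    """Option 2: walk the guest-owned stage, then the host-owned stage."""
    gpa = translate(guest_table, iova)
    return translate(host_table, gpa)


# Hypothetical mappings, expressed as page frame numbers.
guest_table = {0x10: 0x200, 0x11: 0x201}     # IOVA pfn -> GPA pfn (guest)
host_table = {0x200: 0x9000, 0x201: 0x7abc}  # GPA pfn  -> HPA pfn (host)

iova = 0x10123
shadow = build_shadow(guest_table, host_table)
hpa_shadow = translate(shadow, iova)
hpa_nested = translate_nested(guest_table, host_table, iova)
print(hex(hpa_shadow), hex(hpa_nested))      # both paths agree

# The guest now remaps the same IOVA page to a different GPA page.
guest_table[0x10] = 0x201
stale = translate(shadow, iova)              # shadow has not been rebuilt
fresh = translate_nested(guest_table, host_table, iova)
print(hex(stale), hex(fresh))                # the two now disagree

shadow = build_shadow(guest_table, host_table)  # resync the shadow table
print(hex(translate(shadow, iova)))
```

The stale lookup after the remap is the reason the shadow approach only
works if the guest's table updates are noticed and the shadow is
resynchronised, whereas the two-stage walk always sees the guest's
current table.
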
> some information about my env:
>
> 1 Qemu Launch VM command:
>   qemu-system-x86_64 -cpu qemu64,+mtrr,+ssse3,sse4.1,+sse4.2 -m 4096 -smp 4
>   --enable-kvm -drive file=./test-vm-1.qcow2,if=virtio
>   -machine q35,kernel-irqchip=split
>   -device intel-iommu,intremap=on,caching-mode=on
>   -device vfio-pci,host=02:00.0
>
> 2 vms@vms-Standard-PC-i440FX-PIIX-1996:/sys/class/iommu/dmar0/devices$ ls
> 0000:00:00.0 0000:00:01.0 0000:00:02.0 0000:00:03.0 0000:00:04.0
> 0000:00:1f.0 0000:00:1f.2 0000:00:1f.3
> vms@vms-Standard-PC-i440FX-PIIX-1996:/sys/class/iommu/dmar0/devices$ lspci
> 00:00.0 Host bridge: Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller
> 00:01.0 VGA compatible controller: Device 1234:1111 (rev 02)
> 00:02.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
> 00:03.0 3D controller: NVIDIA Corporation GP108M [GeForce MX250] (rev a1)
> 00:04.0 SCSI storage controller: Red Hat, Inc. Virtio block device
> 00:1f.0 ISA bridge: Intel Corporation 82801IB (ICH9) LPC Interface Controller (rev 02)
> 00:1f.2 SATA controller: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode] (rev 02)
> 00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 02)
> vms@vms-Standard-PC-i440FX-PIIX-1996:/sys/class/iommu/dmar0/devices$ cat /sys/kernel/iommu_groups/
> 0/ 1/ 2/ 3/ 4/ 5/
> vms@vms-Standard-PC-i440FX-PIIX-1996:/sys/class/iommu/dmar0/devices$ cat /sys/kernel/iommu_groups/*/type
> DMA
> DMA
> DMA
> DMA
> DMA
> DMA
> vms@vms-Standard-PC-i440FX-PIIX-1996:/sys/class/iommu/dmar0/devices
>
> BRs
> zlc

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro
