On Wed, 2025-12-17 at 11:54 +0000, Alex Bennée wrote:
> Caution: External email. Do not open attachments or click links, unless
> this email comes from a known sender and you know the content is safe.
>
> tugouxp <[email protected]> writes:
>
> (I'll preface this with my not-an-expert hat on, adding some iommu
> maintainers to the CC who might correct my ramblings)
>
> > Hi folks:
> > Hello everyone, I have a few questions regarding the QEMU device
> > passthrough feature that I’d like to ask for help with.
> >
> > Both my HOST OS and GUEST OS are running Ubuntu 20.04.6. I passed
> > through a dedicated NVIDIA MX250 GPU from the HOST to the GUEST OS.
> > On the HOST, I installed the VFIO-PCI driver for passthrough, while
> > the GUEST OS uses the default Nouveau driver from Ubuntu 20.04.6.
> > I also enabled IOMMU in the GUEST OS and checked the IOMMU group
> > layout from sysfs "/sys/kernel/iommu_groups/xxxx/type". The
> > passed-through MX250 operates in “DMA” translation mode, which means
> > the translation really works. Thanks to your excellent work, the
> > setup process went smoothly and everything runs well. However, I
> > have a couple of questions:
> >
> > 1 Is the IOMMU (DMAR) in the GUEST OS emulated by QEMU, or does it
> > share the same IOMMU as the HOST OS?
>
> Generally the guest IOMMU is emulated. You do not want the guest to be
> able to directly program the host HW because that would open up
> security issues. However, for simplicity, the IOMMU presented to the
> guest is usually the same as the host hardware - whatever the
> architecturally mandated IOMMU is.
>
> There are fully virtual IOMMUs (e.g. virtio-iommu) which completely
> abstract the host hardware away.
>
> In both these cases it is QEMU's responsibility to take the guest
> programming and apply those changes to the host hardware to ensure the
> mappings work properly.
>
> There are also host IOMMUs which virtualise some of the interfaces so
> the guest can directly program them (within certain bounds) for their
> mappings. I have no idea if the intel-iommu is one of these.
>
> > 2 Given that both the GUEST OS and HOST OS have IOMMU enabled, when
> > the MX250 performs DMA, it should go through two-stage page table
> > translation—first in the GUEST OS and then in the HOST OS—with
> > VFIO-PCI assisting in this process, correct? If so, are both stages
> > handled by hardware? I understand that the second stage is
> > definitely hardware-assisted, but I’m uncertain about the first
> > stage: whether the translation from IOVA to GPA (IPA) within the
> > GUEST OS is also hardware-assisted.
>
> I think this will depend on the implementation details of the
> particular IOMMU.
>
> The guest will create/manage tables to map IOVA -> GPA.
>
> There are two options for QEMU now.
>
> The first is to monitor the guest page tables for changes and then
> create a shadow page table that mirrors the guest but maps the IOVA
> directly to the final host physical address (HPA). This would be a
> single-stage translation. I think this is how
> intel-iommu,caching-mode=on works.
Indeed, caching mode allows us to trap and hook where needed to build
the shadow page table (a rough sketch of this flow is appended at the
end of this mail). A new mode based on nested translation is under
development. I recommend reading this if you want more details:
https://lists.nongnu.org/archive/html/qemu-devel/2025-12/msg01796.html

> The second option is for IOMMUs that support a full two-stage HW
> translation (much in the same way as hypervisors have a second stage).
> The initial lookup would be via the guest's iommu table (IOVA->GPA)
> before a second stage controlled by the host would map to the final
> address (GPA->HPA). I think two-stage IOMMUs are a requirement if you
> are handling nested VMs.
>
> > Those are my two questions. Thank you very much for your help!
> > Some information about my env:
> >
> > 1 QEMU launch VM command: qemu-system-x86_64 -cpu
> > qemu64,+mtrr,+ssse3,sse4.1,+sse4.2 -m 4096 -smp 4 --enable-kvm
> > -drive file=./test-vm-1.qcow2,if=virtio -machine
> > q35,kernel-irqchip=split -device
> > intel-iommu,intremap=on,caching-mode=on -device vfio-pci,host=02:00.0
> > 2 vms@vms-Standard-PC-i440FX-PIIX-1996:/sys/class/iommu/dmar0/devices$ ls
> > 0000:00:00.0  0000:00:01.0  0000:00:02.0  0000:00:03.0  0000:00:04.0
> > 0000:00:1f.0  0000:00:1f.2  0000:00:1f.3
> > vms@vms-Standard-PC-i440FX-PIIX-1996:/sys/class/iommu/dmar0/devices$ lspci
> > 00:00.0 Host bridge: Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller
> > 00:01.0 VGA compatible controller: Device 1234:1111 (rev 02)
> > 00:02.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
> > 00:03.0 3D controller: NVIDIA Corporation GP108M [GeForce MX250] (rev a1)
> > 00:04.0 SCSI storage controller: Red Hat, Inc. Virtio block device
> > 00:1f.0 ISA bridge: Intel Corporation 82801IB (ICH9) LPC Interface Controller (rev 02)
> > 00:1f.2 SATA controller: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode] (rev 02)
> > 00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 02)
> > vms@vms-Standard-PC-i440FX-PIIX-1996:/sys/class/iommu/dmar0/devices$ cat /sys/kernel/iommu_groups/
> > 0/ 1/ 2/ 3/ 4/ 5/
> > vms@vms-Standard-PC-i440FX-PIIX-1996:/sys/class/iommu/dmar0/devices$ cat /sys/kernel/iommu_groups/*/type
> > DMA
> > DMA
> > DMA
> > DMA
> > DMA
> > DMA
> > vms@vms-Standard-PC-i440FX-PIIX-1996:/sys/class/iommu/dmar0/devices
> >
> > BRs
> > zlc
>
> --
> Alex Bennée
> Virtualisation Tech Lead @ Linaro
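P.S. For anyone following the shadow-page-table discussion above, here
is a rough, compile-only sketch of what that path boils down to. This
is not QEMU source: the two lookup helpers and the shadow_map()
function are hypothetical stand-ins for the emulated DMAR page-table
walk and QEMU's RAM lookup; only the VFIO ioctl and its structure are
the real kernel interface.

  /*
   * Conceptual sketch, assuming intel-iommu,caching-mode=on.  When the
   * guest installs an IOVA->GPA mapping in the emulated DMAR, QEMU
   * resolves the GPA to a host virtual address backing that guest RAM
   * and asks VFIO to map IOVA->HVA; the kernel pins the pages and
   * programs IOVA->HPA into the *host* IOMMU, so the device sees a
   * single-stage translation.
   */
  #include <stdint.h>
  #include <sys/ioctl.h>
  #include <linux/vfio.h>

  /* Hypothetical helpers standing in for QEMU internals. */
  uint64_t viommu_translate(uint64_t iova);  /* IOVA -> GPA via guest DMAR tables */
  void    *gpa_to_hva(uint64_t gpa);         /* GPA  -> host virtual address      */

  /* Called when the emulated DMAR signals a new mapping (caching mode). */
  static int shadow_map(int container_fd, uint64_t iova, uint64_t size)
  {
      uint64_t gpa = viommu_translate(iova);   /* stage 1, emulated by QEMU  */
      void *hva = gpa_to_hva(gpa);             /* where that guest RAM lives */

      struct vfio_iommu_type1_dma_map map = {
          .argsz = sizeof(map),
          .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
          .vaddr = (uintptr_t)hva,
          .iova  = iova,                       /* device-visible address */
          .size  = size,
      };

      /* The kernel pins the pages and installs IOVA -> HPA in the host
       * IOMMU, collapsing both stages into one table for the device. */
      return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
  }

With the nested-translation mode mentioned above, this software
composition goes away: the guest's stage-1 tables are consumed by the
hardware directly, and the host only provides the GPA->HPA stage-2
tables.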
