On Wed, 2025-12-17 at 11:54 +0000, Alex Bennée wrote:
> tugouxp <[email protected]> writes:
> 
> (I'll preface this with my not-an-expert hat on, adding some iommu  
> maintainers to the CC who might correct my ramblings)
> 
> 
> > Hi folks:
> >   Hello everyone, I have a few questions regarding the QEMU device 
> > passthrough feature that I’d like to ask for help with.
> > 
> > Both my HOST OS and GUEST OS are running Ubuntu 20.04.6. I passed through a
> > dedicated NVIDIA MX250 GPU from the HOST to the GUEST OS. On the HOST, I
> > installed the VFIO-PCI driver for passthrough, while the GUEST OS uses the
> > default Nouveau driver from Ubuntu 20.04.6. I also enabled IOMMU in the
> > GUEST OS and checked the IOMMU group layout from sysfs
> > ("/sys/kernel/iommu_groups/xxxx/type"). The passed-through MX250 operates in
> > “DMA” translation mode, which means the translation really works. Thanks to
> > your excellent work, the setup process went smoothly and everything runs
> > well. However, I have a couple of questions:
> > 
> > 1 Is the IOMMU (DMAR) in the GUEST OS emulated by QEMU, or does it
> > share the same IOMMU as the HOST OS?
> 
> 
> Generally the guest IOMMU is emulated. You do not want the guest to be
> able to directly program the host HW because that would open up security
> issues. However, for simplicity the IOMMU presented to the guest is
> usually the same as the host hardware - whatever the architecturally
> mandated IOMMU is.
> 
> There are fully virtual IOMMUs (e.g. virtio-iommu) which completely
> abstract the host hardware away.
> 
> In both these cases it is QEMU's responsibility to take the guest
> programming and apply those changes to the host hardware to ensure the
> mappings work properly.
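To make that concrete: on a Linux host the classic mechanism QEMU uses for
this is the VFIO type1 container API. The IOVA ranges QEMU wants the device
to be able to reach are installed with the VFIO_IOMMU_MAP_DMA ioctl (and
removed with VFIO_IOMMU_UNMAP_DMA), which pins the backing pages and
programs the host IOMMU. A minimal sketch of that host-side primitive, with
the container/group setup and error handling QEMU actually does left out
(the helper names below are mine; the real code lives under hw/vfio/ in the
QEMU tree):

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/vfio.h>

    /*
     * Install one IOVA range in the host IOMMU.  'vaddr' is the host
     * virtual address backing the guest memory the IOVA should reach
     * (guest RAM is mmap()ed into QEMU's address space).
     */
    int host_dma_map(int container_fd, uint64_t iova, void *vaddr,
                     uint64_t size)
    {
        struct vfio_iommu_type1_dma_map map = {
            .argsz = sizeof(map),
            .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
            .vaddr = (uintptr_t)vaddr,
            .iova  = iova,
            .size  = size,
        };
        /* The kernel pins the pages and programs the host IOMMU. */
        return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
    }

    int host_dma_unmap(int container_fd, uint64_t iova, uint64_t size)
    {
        struct vfio_iommu_type1_dma_unmap unmap = {
            .argsz = sizeof(unmap),
            .iova  = iova,
            .size  = size,
        };
        return ioctl(container_fd, VFIO_IOMMU_UNMAP_DMA, &unmap);
    }

Without a vIOMMU in the guest, QEMU simply maps all of guest RAM up front
this way (IOVA == GPA); with an emulated IOMMU the maps/unmaps have to
follow whatever the guest programs, which is where the two options below
come in.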
> 
> There are also host IOMMUs which virtualise some of the interfaces so
> the guest can directly program them (within certain bounds) for its
> mappings. I have no idea if the intel-iommu is one of these.
> 
> 
> 
> > 2 Given that both the GUEST OS and HOST OS have IOMMU enabled, when the 
> > MX250 performs DMA, it should go through two-stage
> >  page table translation—first in the GUEST OS and then in the HOST OS—with 
> > VFIO-PCI assisting in this process, correct? If so, are
> >  both stages handled by hardware? I understand that the second stage is 
> > definitely hardware-assisted, but I’m uncertain about the
> >  first stage: whether the translation from IOVA to GPA (IPA) within
> > the GUEST OS is also hardware-assisted.
> 
> 
> I think this will depend on the implementation details of the particular  
> IOMMU.
> 
> The guest will create/manage tables to map IOVA -> GPA.
> 
> There are two options for QEMU now.
> 
> The first is to monitor the guest page tables for changes and then create
> a shadow page table that mirrors the guest's but maps the IOVA directly
> to the final host physical address (HPA). This would be a single-stage
> translation. I think this is how intel-iommu,caching-mode=on works.

Indeed, caching mode allows us to trap and hook where needed to build the 
shadow page table.
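In case it helps the original poster picture it: the CM (caching mode) bit
tells the guest driver it must issue an invalidation even when it creates a
brand-new mapping, so QEMU sees every IOVA->GPA change and can replay it
into the host. A rough sketch of that replay step, reusing the
host_dma_map()/host_dma_unmap() helpers sketched earlier - gpa_to_hva() is
just a stand-in for QEMU's real guest-RAM lookup, assuming one flat RAM
block for simplicity:

    #include <stdbool.h>
    #include <stdint.h>

    int host_dma_map(int container_fd, uint64_t iova, void *vaddr,
                     uint64_t size);               /* sketched earlier */
    int host_dma_unmap(int container_fd, uint64_t iova, uint64_t size);

    static void *ram_hva;   /* where the (single, flat) guest RAM block
                               is mmap()ed inside QEMU */

    static void *gpa_to_hva(uint64_t gpa)
    {
        return (uint8_t *)ram_hva + gpa;
    }

    /*
     * Called for each trapped guest invalidation: compose IOVA -> GPA
     * (from the guest's VT-d tables) with GPA -> HVA (QEMU's RAM
     * mapping) in software, so the host IOMMU only ever holds a flat
     * single-stage IOVA -> HPA table.
     */
    static void shadow_replay(int container_fd, uint64_t iova,
                              uint64_t gpa, uint64_t size, bool map)
    {
        if (map) {
            host_dma_map(container_fd, iova, gpa_to_hva(gpa), size);
        } else {
            host_dma_unmap(container_fd, iova, size);
        }
    }

The cost is a trap into QEMU for every mapping change, which (as I
understand it) is part of what the nested-translation work below is meant
to avoid.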

A new mode based on nested translation is under development.  
I recommend reading this if you want more details: 
https://lists.nongnu.org/archive/html/qemu-devel/2025-12/msg01796.html

> 
> The second option is for IOMMUs that support a full two-stage HW
> translation (much in the same way as hypervisors have a second stage).
> The initial lookup would be via the guest's IOMMU table (IOVA->GPA)
> before a second stage controlled by the host would map to the final
> address (GPA->HPA). I think two-stage IOMMUs are a requirement if you
> are handling nested VMs.
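For what it's worth, the difference between the two schemes boils down to
who evaluates the composition of the two translations, and when. A purely
conceptual sketch (stage1_walk()/stage2_walk() are placeholders for the
hardware page-table walks, not real APIs):

    #include <stdint.h>

    uint64_t stage1_walk(uint64_t iova);  /* guest-owned: IOVA -> GPA */
    uint64_t stage2_walk(uint64_t gpa);   /* host-owned:  GPA  -> HPA */

    /*
     * Nested/two-stage mode: the IOMMU evaluates this composition in
     * hardware on every DMA access, so the guest can update its
     * stage-1 tables without QEMU having to trap each change.
     */
    static uint64_t nested_translate(uint64_t iova)
    {
        return stage2_walk(stage1_walk(iova));
    }

The shadowing scheme above computes exactly the same composition, just once
per mapping inside QEMU instead of once per access in hardware.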
> 
> 
> > Those are my two questions. Thank you very much for your help!
> > Some information about my environment:
> > 
> > 1 QEMU VM launch command:
> >    qemu-system-x86_64 -cpu qemu64,+mtrr,+ssse3,sse4.1,+sse4.2 \
> >      -m 4096 -smp 4 --enable-kvm \
> >      -drive file=./test-vm-1.qcow2,if=virtio \
> >      -machine q35,kernel-irqchip=split \
> >      -device intel-iommu,intremap=on,caching-mode=on \
> >      -device vfio-pci,host=02:00.0
> > 2 vms@vms-Standard-PC-i440FX-PIIX-1996:/sys/class/iommu/dmar0/devices$ ls
> > 0000:00:00.0  0000:00:01.0  0000:00:02.0  0000:00:03.0  0000:00:04.0  
> > 0000:00:1f.0  0000:00:1f.2  0000:00:1f.3
> > vms@vms-Standard-PC-i440FX-PIIX-1996:/sys/class/iommu/dmar0/devices$ lspci
> > 00:00.0 Host bridge: Intel Corporation 82G33/G31/P35/P31 Express DRAM 
> > Controller
> > 00:01.0 VGA compatible controller: Device 1234:1111 (rev 02)
> > 00:02.0 Ethernet controller: Intel Corporation 82574L Gigabit Network 
> > Connection
> > 00:03.0 3D controller: NVIDIA Corporation GP108M [GeForce MX250] (rev a1)
> > 00:04.0 SCSI storage controller: Red Hat, Inc. Virtio block device
> > 00:1f.0 ISA bridge: Intel Corporation 82801IB (ICH9) LPC Interface 
> > Controller (rev 02)
> > 00:1f.2 SATA controller: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 
> > port SATA Controller [AHCI mode] (rev 02)
> > 00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 
> > 02)
> > vms@vms-Standard-PC-i440FX-PIIX-1996:/sys/class/iommu/dmar0/devices$ cat 
> > /sys/kernel/iommu_groups/
> > 0/ 1/ 2/ 3/ 4/ 5/
> > vms@vms-Standard-PC-i440FX-PIIX-1996:/sys/class/iommu/dmar0/devices$ cat 
> > /sys/kernel/iommu_groups/*/type
> > DMA
> > DMA
> > DMA
> > DMA
> > DMA
> > DMA
> > vms@vms-Standard-PC-i440FX-PIIX-1996:/sys/class/iommu/dmar0/devices
> > 
> > BRs
> > zlc
> 
> 
> --  
> Alex Bennée  
> Virtualisation Tech Lead @ Linaro
