Thanks for your kind help, things are much clearer now!
So, if I understand correctly, the QEMU options -device intel-iommu and -device
virtio-iommu that you mentioned both implement purely software-emulated IOMMUs, is that right? I
have another question: Both Intel IOMMU and ARM SMMU support two-stage
translation, where the second stage is managed by VFIO to handle the
translation from IPA to HPA. Then, who manages the first stage? I find it hard
to believe that the first stage is directly managed by the VM OS because, as
you mentioned earlier, simultaneous access to the IOMMU hardware by both the VM
and the host would pose security issues. Therefore, it is highly likely that
the first stage is also managed by QEMU. However, in both QEMU's code and
VFIO's code, I only see calls for creating second-stage IOMMU domains, and I
haven’t traced any calls related to creating a first-stage IOMMU domain. This
is where my understanding gets stuck. Am I misunderstanding something here?
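To check my own understanding of the shadow-table approach you described earlier, I sketched it as a toy model (purely illustrative Python, not actual QEMU code; all names and addresses here are made up):

```python
# Toy model of the shadow-page-table approach (as with caching-mode=on):
# QEMU traps guest IOMMU programming (IOVA -> GPA), composes it with the
# VFIO-maintained stage-2 mapping (GPA -> HPA), and installs a single-stage
# IOVA -> HPA mapping in the host IOMMU. Real code walks hardware page
# tables; plain dicts stand in for them here.

guest_stage1 = {0x1000: 0x8000}    # hypothetical guest entries: IOVA -> GPA
host_stage2 = {0x8000: 0x42000}    # hypothetical host entries: GPA -> HPA

def build_shadow(stage1, stage2):
    """Collapse the two translations into one IOVA -> HPA shadow table."""
    return {iova: stage2[gpa] for iova, gpa in stage1.items()}

shadow = build_shadow(guest_stage1, host_stage2)
print(shadow)  # {4096: 270336}, i.e. IOVA 0x1000 -> HPA 0x42000
```

If this is roughly right, then the guest only ever writes its own stage-1 tables in emulated registers/memory, and QEMU (not the guest) is what programs the real hardware with the composed result.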
Thank you for your guidance.
BRs
zlcao
At 2025-12-17 22:12:47, "CLEMENT MATHIEU--DRIF"
<[email protected]> wrote:
>
>On Wed, 2025-12-17 at 11:54 +0000, Alex Bennée wrote:
>>
>>
>> tugouxp <[[email protected]](mailto:[email protected])> writes:
>>
>> (I'll preface this with my not-an-expert hat on, adding some iommu
>> maintainers to the CC who might correct my ramblings)
>>
>>
>> > Hi folks:
>> > Hello everyone, I have a few questions regarding the QEMU device
>> > passthrough feature that I’d like to ask for help with.
>> >
>> > Both my HOST OS and GUEST OS are running Ubuntu 20.04.6. I passed through a
>> > dedicated NVIDIA MX250 GPU from the HOST to the
>> > GUEST OS. On the HOST, I installed the VFIO-PCI driver for passthrough,
>> > while the GUEST OS uses the default Nouveau driver from
>> > Ubuntu 20.04.6. I also enabled IOMMU in the GUEST OS and checked the IOMMU
>> > group layout from
>> > sysfs ("/sys/kernel/iommu_groups/xxxx/type"). The passed-through MX250
>> > operates in “DMA” translation mode, which means translation really takes
>> > effect. Thanks to your excellent work, the setup process went
>> > smoothly and everything runs well. However, I have a couple of
>> > questions:
>> >
>> > 1 Is the IOMMU (DMAR) in the GUEST OS emulated by QEMU, or does it
>> > share the same IOMMU as the HOST OS?
>>
>>
>> Generally the guest IOMMU is emulated. You do not want the guest to be
>> able to directly program the host HW because that would open up security
>> issues. However, for simplicity, the IOMMU presented to the guest is
>> usually the same as the host hardware - whatever the architecturally
>> mandated IOMMU is.
>>
>> There are fully virtual IOMMUs (e.g. virtio-iommu) which completely
>> abstract the host hardware away.
>>
>> In both these cases it is QEMU's responsibility to take the guest
>> programming and apply those changes to the host hardware to ensure the
>> mappings work properly.
>>
>> There are also host IOMMUs which virtualise some of their interfaces
>> so the guest can directly program them (within certain bounds) for their
>> mappings. I have no idea if the intel-iommu is one of these.
>>
>>
>>
>> > 2 Given that both the GUEST OS and HOST OS have IOMMU enabled, when the
>> > MX250 performs DMA, it should go through two-stage
>> > page table translation—first in the GUEST OS and then in the HOST OS—with
>> > VFIO-PCI assisting in this process, correct? If so, are
>> > both stages handled by hardware? I understand that the second stage is
>> > definitely hardware-assisted, but I’m uncertain about the
>> > first stage: whether the translation from IOVA to GPA (IPA) within
>> > the GUEST OS is also hardware-assisted.
>>
>>
>> I think this will depend on the implementation details of the particular
>> IOMMU.
>>
>> The guest will create/manage tables to map IOVA -> GPA.
>>
>> There are two options for QEMU now.
>>
>> The first is to monitor the guest page tables for changes and then create a
>> shadow page table that mirrors the guest's but maps the IOVA directly to
>> the final host physical address (HPA). This would be a single stage
>> translation. I think this is how intel-iommu,caching-mode=on works.
>
>Indeed, caching mode allows us to trap and hook where needed to build the
>shadow page table.
>
>A new mode based on nested translation is under development.
>I recommend reading this if you want more details:
>https://lists.nongnu.org/archive/html/qemu-devel/2025-12/msg01796.html
>
>>
>> The second option is for IOMMUs that support a full two-stage HW
>> translation (much in the same way as hypervisors have a second stage).
>> The initial lookup would be via the guest's IOMMU table (IOVA->GPA)
>> before a second stage controlled by the host would map to the final
>> address (GPA->HPA). I think two-stage IOMMUs are a requirement if you
>> are handling nested VMs.
>>
>>
>> > Those are my two questions. Thank you very much for your help!
>> > some information about my env:
>> >
>> > 1 QEMU VM launch command:
>> >   qemu-system-x86_64 -cpu qemu64,+mtrr,+ssse3,+sse4.1,+sse4.2 -m 4096 -smp 4 \
>> >     --enable-kvm -drive file=./test-vm-1.qcow2,if=virtio \
>> >     -machine q35,kernel-irqchip=split \
>> >     -device intel-iommu,intremap=on,caching-mode=on \
>> >     -device vfio-pci,host=02:00.0
>> > 2 vms@vms-Standard-PC-i440FX-PIIX-1996:/sys/class/iommu/dmar0/devices$ ls
>> > 0000:00:00.0 0000:00:01.0 0000:00:02.0 0000:00:03.0 0000:00:04.0
>> > 0000:00:1f.0 0000:00:1f.2 0000:00:1f.3
>> > vms@vms-Standard-PC-i440FX-PIIX-1996:/sys/class/iommu/dmar0/devices$ lspci
>> > 00:00.0 Host bridge: Intel Corporation 82G33/G31/P35/P31 Express DRAM
>> > Controller
>> > 00:01.0 VGA compatible controller: Device 1234:1111 (rev 02)
>> > 00:02.0 Ethernet controller: Intel Corporation 82574L Gigabit Network
>> > Connection
>> > 00:03.0 3D controller: NVIDIA Corporation GP108M [GeForce MX250] (rev a1)
>> > 00:04.0 SCSI storage controller: Red Hat, Inc. Virtio block device
>> > 00:1f.0 ISA bridge: Intel Corporation 82801IB (ICH9) LPC Interface
>> > Controller (rev 02)
>> > 00:1f.2 SATA controller: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6
>> > port SATA Controller [AHCI mode] (rev 02)
>> > 00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller
>> > (rev 02)
>> > vms@vms-Standard-PC-i440FX-PIIX-1996:/sys/class/iommu/dmar0/devices$ cat
>> > /sys/kernel/iommu_groups/
>> > 0/ 1/ 2/ 3/ 4/ 5/
>> > vms@vms-Standard-PC-i440FX-PIIX-1996:/sys/class/iommu/dmar0/devices$ cat
>> > /sys/kernel/iommu_groups/*/type
>> > DMA
>> > DMA
>> > DMA
>> > DMA
>> > DMA
>> > DMA
>> > vms@vms-Standard-PC-i440FX-PIIX-1996:/sys/class/iommu/dmar0/devices
>> >
>> > BRs
>> > zlc
>>
>>
>> --
>> Alex Bennée
>> Virtualisation Tech Lead @ Linaro