Hi Yi,
Thank you very much. Through our previous communication, I think I now have a general understanding. Please help confirm whether it is correct: in my application scenario, I am passing a GPU through to the VM, and the guest OS does not have an IOMMU enabled. As I understand it, current mainline QEMU and the Linux kernel can already support this. However, if I want the passthrough GPU to use both stage-1 and stage-2 (nested) translation, I must use a special QEMU development branch. Is that correct?

By the way, I mentioned earlier that I was using QEMU 4.2.1. After double-checking, I found that the passthrough GPU had not successfully loaded its own driver; instead, the guest was using the emulated bochs-drm driver. I am not sure whether this is related to the lack of nested IOMMU support, but it seems very likely.

Thank you once again.

At 2025-12-18 14:47:10, "Yi Liu" <[email protected]> wrote:
>On 2025/12/18 09:33, tugouxp wrote:
>>
>> Thanks for your kind help; things are much clearer now!
>>
>> So it seems that the QEMU parameters `-device intel-iommu` and
>> `virtio-iommu` you mentioned both implement purely software-emulated
>> IOMMUs, is that correct? I have another question: both the Intel IOMMU
>> and the ARM SMMU support two-stage translation, where the second stage
>> is managed by VFIO to handle the translation from IPA to HPA. Who,
>> then, manages the first stage?
>
>In nested translation mode, the guest manages the first stage.
>
>> I find it hard to believe that the first stage is directly managed by
>> the VM OS because, as you mentioned earlier, simultaneous access to
>> the IOMMU hardware by both the VM and the host would pose security
>> issues.
>
>In nested translation, any output of the first-stage translation is
>subjected to the second stage, and the second stage is under the VMM's
>control. So the guest cannot harm the system even if it manages the
>first stage.
>
>> Therefore, it is highly likely that the first stage is also managed by
>> QEMU. However, in both QEMU's code and VFIO's code, I only see calls
>> for creating second-stage IOMMU domains, and I have not traced any
>> calls related to creating a first-stage IOMMU domain. This is where my
>> understanding gets stuck. Am I misunderstanding something here?
>
>Nested translation mode is a work in progress. You can get a full view
>by referring to the links below.
>
>[1]
>https://lore.kernel.org/qemu-devel/[email protected]/
>[2]
>https://lore.kernel.org/qemu-devel/[email protected]/
>
>>> > 2 Given that both the GUEST OS and HOST OS have IOMMU enabled,
>>> > when the MX250 performs DMA, it should go through two-stage page
>>> > table translation, first in the GUEST OS and then in the HOST OS,
>>> > with VFIO-PCI assisting in this process, correct? If so, are both
>>> > stages handled by hardware? I understand that the second stage is
>>> > definitely hardware-assisted, but I'm uncertain about the first
>>> > stage: whether the translation from IOVA to GPA (IPA) within the
>>> > GUEST OS is also hardware-assisted.
>
>Alex has provided a comprehensive response to this question. I'd like
>to emphasize one key point in case there are any remaining questions:
>for passthrough devices, DMA address translation is invariably handled
>by the hardware IOMMU. The VMM is responsible for configuring the
>appropriate translation type and establishing the correct page table
>mappings.
>
>Regards,
>Yi Liu
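P.S. For anyone following along, a minimal sketch of the two setups discussed in this thread. The PCI address 0000:01:00.0 and the image path are placeholders; the nested (guest stage-1 on top of host stage-2) case for passthrough devices is not covered by these options alone and still needs the in-development QEMU/kernel branches linked above.

```shell
# Plain VFIO passthrough, no vIOMMU exposed to the guest (works on mainline);
# the host IOMMU alone translates the device's DMA (GPA -> HPA):
qemu-system-x86_64 \
    -machine q35 -enable-kvm -m 4G \
    -device vfio-pci,host=0000:01:00.0 \
    -drive file=guest.img,format=qcow2

# Exposing an emulated intel-iommu to the guest; q35 with a split irqchip
# is required, and caching-mode=on is needed for vfio-pci to work behind
# the emulated vIOMMU (mainline shadows the guest's mappings in software):
qemu-system-x86_64 \
    -machine q35,kernel-irqchip=split -enable-kvm -m 4G \
    -device intel-iommu,intremap=on,caching-mode=on \
    -device vfio-pci,host=0000:01:00.0 \
    -drive file=guest.img,format=qcow2
```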

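On the bochs-drm observation: inside the guest you can tell which kernel driver is bound to a PCI device by resolving its `driver` symlink under sysfs. The sketch below mocks the sysfs layout in a temp directory purely to show the mechanics; on a real guest you would resolve /sys/bus/pci/devices/<BDF>/driver (the BDF 0000:00:05.0 here is a placeholder). If it resolves to bochs-drm rather than the vendor driver, the guest is rendering on the emulated VGA, which is a driver-binding problem and not, by itself, proof of missing nested-IOMMU support.

```shell
# Mock the relevant sysfs layout (real path: /sys/bus/pci/devices/<BDF>/driver).
demo=$(mktemp -d)
mkdir -p "$demo/bus/pci/drivers/bochs-drm" \
         "$demo/bus/pci/devices/0000:00:05.0"
ln -s ../../drivers/bochs-drm "$demo/bus/pci/devices/0000:00:05.0/driver"

# The actual check: which driver is bound to the device?
basename "$(readlink "$demo/bus/pci/devices/0000:00:05.0/driver")"
# prints: bochs-drm
```

`lspci -nnk` inside the guest reports the same "Kernel driver in use:" information without touching sysfs by hand.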