Hi Yi,


Thank you very much. From our previous exchanges I think I now have a general
picture. Could you please confirm whether my understanding below is correct?

In my application scenario, I am passing a GPU through to a VM, and the guest
OS does not have an IOMMU enabled. As I understand it, current mainline QEMU
and the Linux kernel can already support this. However, if I want the
passthrough GPU to use both Stage 1 and Stage 2 translation (nested
translation), I must use a special development branch of QEMU. Is that correct?
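For reference, my understanding of the "plain passthrough, no vIOMMU" case is
roughly the invocation below; the PCI address, image path, and sizes are
placeholders for illustration, not a tested configuration:

```shell
# Minimal sketch: vfio-pci passthrough with mainline QEMU, no vIOMMU
# device exposed to the guest (so the guest sees no IOMMU at all).
# host=01:00.0 and guest.img are example values only.
qemu-system-x86_64 \
    -machine q35,accel=kvm \
    -cpu host \
    -m 4G \
    -device vfio-pci,host=01:00.0 \
    -drive file=guest.img,format=qcow2
```

In this mode only the second-stage (IPA-to-HPA) mapping exists, set up by VFIO
on the host; the guest programs the device with guest-physical addresses
directly.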

By the way, I mentioned earlier that I was using QEMU 4.2.1. After
double-checking, I found that the passthrough GPU had not successfully loaded
its own driver; instead, the guest was using the emulated BOCHS DRM driver. I
am not sure whether this is related to the lack of nested IOMMU support, but
it seems highly likely.
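For what it's worth, one quick way to check this kind of driver-binding
question from inside the guest is to follow the device's sysfs `driver`
symlink; a minimal sketch (the PCI addresses and driver names below are
placeholders, not my actual topology):

```shell
# Report which kernel driver is currently bound to a PCI device by
# resolving its sysfs "driver" symlink, e.g.
#   bound_driver /sys/bus/pci/devices/0000:01:00.0
bound_driver() {
    dev="$1"
    if [ -L "$dev/driver" ]; then
        # The symlink target is .../drivers/<name>; print just <name>.
        basename "$(readlink -f "$dev/driver")"
    else
        echo "(no driver bound)"
    fi
}
```

`lspci -nnk` gives the same information per device ("Kernel driver in use:"),
which makes it easy to see whether the GPU ended up on its vendor driver or on
an emulated one like bochs-drm.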

Thank you very much once again.


At 2025-12-18 14:47:10, "Yi Liu" <[email protected]> wrote:
>On 2025/12/18 09:33, tugouxp wrote:
>> 
>> Thanks for your kind help, it seems much clearer now!
>> 
>>        So it seems that the QEMU parameters |-device intel-iommu| and 
>> |virtio-iommu| you said both implement purely software-emulated IOMMUs, 
>> is that correct? I have another question: Both Intel IOMMU and ARM SMMU 
>> support two-stage translation, where the second stage is managed by VFIO 
>> to handle the translation from IPA to HPA. Then, who manages the first 
>> stage?
>
>In nested translation mode, guest manages the first stage.
>
>> I find it hard to believe that the first stage is directly 
>> managed by the VM OS because, as you mentioned earlier, simultaneous 
>> access to the IOMMU hardware by both the VM and the host would pose 
>> security issues.
>
>In nested translation, any output of the first stage translation is
>subjected to the second stage, and the second stage is under the VMM's
>control. So the guest cannot harm the system even if it manages the
>first stage.
>
>> Therefore, it is highly likely that the first stage is 
>> also managed by QEMU. However, in both QEMU's code and VFIO's code, I 
>> only see calls for creating second-stage IOMMU domains, and I haven’t 
>> traced any calls related to creating a first-stage IOMMU domain. This is 
>> where my understanding gets stuck. Am I misunderstanding something here?
>
>Nested translation mode is a work in progress. You can get a full view
>by referring to the links below.
>
>[1] 
>https://lore.kernel.org/qemu-devel/[email protected]/
>[2] 
>https://lore.kernel.org/qemu-devel/[email protected]/
>
> >>> > 2 Given that both the GUEST OS and HOST OS have IOMMU enabled, 
>when the MX250 performs DMA, it should go through two-stage
> >>> >  page table translation—first in the GUEST OS and then in the 
>HOST OS—with VFIO-PCI assisting in this process, correct? If so, are
> >>> >  both stages handled by hardware? I understand that the second 
>stage is definitely hardware-assisted, but I’m uncertain about the
> >>> >  first stage: whether the translation from IOVA to GPA (IPA) within
> >>> > the GUEST OS is also hardware-assisted.
>
>Alex has provided a comprehensive response to this question. I'd like to
>emphasize one key point in case there are any remaining questions: For
>passthrough devices, DMA address translation is invariably handled by
>the hardware IOMMU. The VMM is responsible for configuring the
>appropriate translation type and establishing the correct page table
>mappings.
>
>Regards,
>Yi Liu
