Hi tugouxp,

On 2025/12/22 14:14, tugouxp wrote:
Hi Tao:
    Thanks for your answer!

     So your environment involves simulating an aarch64 VM OS on an x86 host, 
and then within that simulated aarch64 VM, you've installed another KVM aarch64 
VM? Essentially, it's an environment with two nested aarch64 VMs running on an 
x86 host? I can hardly believe it; that's a genius idea! I'm quite concerned 
about its performance, though. Will it be very slow and laggy?

Yes, a virtualization-capable aarch64 TCG VM can be set up using -machine virt,virtualization=on regardless of the host architecture, including on an x86 host, thanks to QEMU’s TCG cross-architecture dynamic translation mechanism. TCG emulates the guest CPU and system well enough that the guest OS sees and can exercise ARM virtualization extensions (EL2 and Stage-2), so from the guest kernel’s perspective the machine does genuinely have virtualization support enabled. This makes setting up a KVM VM inside this TCG guest reasonable.
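For reference, the essential options boil down to something like this (just a minimal sketch; the full command line I used is quoted further down in this mail):

qemu-system-aarch64 \
-machine virt,virtualization=on,gic-version=3 \
-cpu max -accel tcg \
...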

In fact, the Linux kernel and the qemu-system-* binaries running inside the TCG host VM are the same binaries you would run on a physical machine; there is no modification or special build for this TCG environment. Of course, performance will be slower than on real hardware, since TCG translates guest instructions in software instead of running them on physical virtualization extensions, but simple tests that are not heavily I/O-intensive work fine. For example, I passed through a PCIe NVMe device to the nested VM, mounted it, and was able to read and write files on it just like on real hardware.
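The read/write check itself was nothing fancy; roughly along these lines inside the nested KVM guest (the device name /dev/nvme0n1 and the mount point are assumptions, yours may differ):

mkfs.ext4 /dev/nvme0n1        # device name is an assumption
mount /dev/nvme0n1 /mnt
dd if=/dev/urandom of=/mnt/test.bin bs=1M count=64
md5sum /mnt/test.bin
umount /mnt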


I'm sure this is doable on aarch64, but I'm not sure about x86_64/i386. From a cursory search of the code, setting up a TCG VM with virtualization capability enabled seems to be unsupported on x86_64/i386. I'm not familiar with that area, though, so we may need an x86 specialist to help here.


By the way, isn't the KVM (because you use -enable-kvm in the nested aarch64 
vm) in the aarch64 VM also emulated? After all, KVM can only be used when 
host and guest share the same ISA.

They are all emulated, and they are all aarch64 machines. The KVM inside the TCG guest is the unmodified arm64 KVM, running on the emulated EL2, so from its point of view the ISA does match.
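If you want to convince yourself, boot the TCG guest and check that the stock arm64 KVM really initializes on the emulated EL2; something like the following, though the exact dmesg wording is from memory and may differ on your kernel:

dmesg | grep -i kvm    # expect something like "kvm [1]: Hyp mode initialized successfully"
ls -l /dev/kvm         # the device node should exist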


Regards,

Tao



BRs

zlcao.

At 2025-12-22 11:49:33, "Tao Tang" <[email protected]> wrote:
Hi zlcao,

Hi guys:
     I want to learn about how Intel IOMMU second-level translation
works. Does anyone have any materials or pages on this topic, such as
WIP repositories, operational guides, etc.? Thanks!
BRs
zlcao.

 From my experience, the best approach to learning this is to set up a
nested virtualization environment, find a scenario that triggers
second-level translation, and then trace all Intel IOMMU events or use
gdb to step line by line through hw/i386/intel_iommu.c. You may also need
to get the Intel VT-d (IOMMU) specification and find the chapter that
describes second-level translation.
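For example, something like this should get you started (the breakpoint target vtd_iova_to_slpte is my guess at the relevant function in hw/i386/intel_iommu.c; please double-check the name against your QEMU tree):

# enable all VT-d trace points and log them to a file
qemu-system-x86_64 ... -trace "vtd_*" -D vtd.log

# or step through the translation path in gdb
gdb --args qemu-system-x86_64 ...
(gdb) break vtd_iova_to_slpte
(gdb) run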


I have been working on SMMU recently and ran into the same problem as you.
I found that setting up a nested virtualization environment and then passing
a PCIe device through from the TCG host VM into the KVM guest VM is a good
way to debug an IOMMU implementation. This link [1] shows how it works for
the Arm SMMU.

[1]
https://lore.kernel.org/qemu-devel/[email protected]/


......

Non-Secure Regression: To ensure that existing functionality remains
intact, I ran a nested virtualization test. A TCG guest was created on
the host, with iommu=smmuv3 and with an emulated PCIe NVMe device assigned.
The command line of the TCG VM is below:

qemu-system-aarch64 \
-machine virt,virtualization=on,gic-version=3,iommu=smmuv3 \
-cpu max -smp 1 -m 4080M \
-accel tcg,thread=single,tb-size=512 \
-kernel Image \
-append 'nokaslr root=/dev/vda rw rootfstype=ext4 iommu.passthrough=on' \
-device pcie-root-port,bus=pcie.0,id=rp0,addr=0x4.0,chassis=1,port=0x10 \
-device pcie-root-port,bus=pcie.0,id=rp1,addr=0x5.0,chassis=2,port=0x11 \
-drive if=none,file=u2204fs.img.qcow2,format=qcow2,id=hd0 \
-device virtio-blk-device,drive=hd0 \
-qmp unix:/tmp/qmp-sock12,server=on,wait=off \
-netdev user,id=eth0,hostfwd=tcp::10022-:22,hostfwd=tcp::59922-:5922 \
-device virtio-net-device,netdev=eth0 \
-drive if=none,file=nvme.img,format=raw,id=nvme0 \
-device nvme,drive=nvme0,serial=deadbeef \
-d unimp,guest_errors -trace events=smmu-events.txt -D qemu.log -nographic

Inside this TCG VM, a KVM guest was launched, and the same NVMe device was
re-assigned to it via VFIO.
The command line of the KVM VM inside the TCG VM is below:

sudo qemu-system-aarch64 \
-enable-kvm -m 1024 -smp 1 \
-machine virt,gic-version=3 \
-cpu max -append "nokaslr" \
-monitor stdio \
-kernel 5.15.Image \
-initrd rootfs.cpio.gz \
-display vnc=:22,id=primary \
-device vfio-pci,host=00:01.0

The KVM guest was able to perform I/O on the device
correctly, confirming that the non-secure path is not broken.

......
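One step the excerpt does not show is re-assigning the NVMe device to VFIO inside the TCG VM before launching the KVM guest. It is the usual sysfs dance, roughly as below; the BDF 0000:00:01.0 is inferred from the vfio-pci host=00:01.0 option above:

echo vfio-pci > /sys/bus/pci/devices/0000:00:01.0/driver_override
echo 0000:00:01.0 > /sys/bus/pci/devices/0000:00:01.0/driver/unbind
echo 0000:00:01.0 > /sys/bus/pci/drivers_probe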



I'm not familiar with the Intel IOMMU, so I'm not able to help with the
right options for PCIe passthrough on the Intel IOMMU side.
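The closest I can offer, based only on the QEMU documentation and not on anything I have tested myself, is the commonly cited vIOMMU incantation; please verify it before relying on it:

qemu-system-x86_64 \
-machine q35,kernel-irqchip=split \
-device intel-iommu,intremap=on,caching-mode=on \
...
# and boot the guest kernel with intel_iommu=on on its command line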


BTW, I have submitted a patch series introducing iommu-testdev [2],
which allows testing IOMMU functionality purely with QTest, without
setting up a complex software stack. Once you have a clear understanding
of second-level translation, you are very welcome to share your
findings and help improve the Intel IOMMU implementation in iommu-testdev.


[2]
https://lore.kernel.org/qemu-devel/[email protected]/


Regards,

Tao

