> -----Original Message-----
> From: Duan, Zhenzhong <zhenzhong.d...@intel.com>
> Sent: Tuesday, July 15, 2025 11:46 AM
> To: Shameerali Kolothum Thodi
> <shameerali.kolothum.th...@huawei.com>; qemu-...@nongnu.org;
> qemu-devel@nongnu.org
> Cc: eric.au...@redhat.com; peter.mayd...@linaro.org; j...@nvidia.com;
> nicol...@nvidia.com; ddut...@redhat.com; berra...@redhat.com;
> nath...@nvidia.com; mo...@nvidia.com; smost...@google.com; Linuxarm
> <linux...@huawei.com>; Wangzhou (B) <wangzh...@hisilicon.com>;
> jiangkunkun <jiangkun...@huawei.com>; Jonathan Cameron
> <jonathan.came...@huawei.com>; zhangfei....@linaro.org;
> shameerkolot...@gmail.com
> Subject: RE: [RFC PATCH v3 00/15] hw/arm/virt: Add support for user-
> creatable accelerated SMMUv3
> 
> Hi Shameer,
> 
> >-----Original Message-----
> >From: Shameer Kolothum <shameerali.kolothum.th...@huawei.com>
> >Subject: [RFC PATCH v3 00/15] hw/arm/virt: Add support for
> >user-creatable accelerated SMMUv3
> >
> >Hi All,
> >
> >This patch series introduces initial support for a user-creatable,
> >accelerated SMMUv3 device (-device arm-smmuv3,accel=on) in QEMU.
> >
> >This is based on the user-creatable SMMUv3 device series [0].
> >
> >Why this is needed:
> >
> >On ARM, to enable vfio-pci pass-through devices in a VM, the host
> >SMMUv3 must be set up in nested translation mode (Stage 1 + Stage 2),
> >with Stage 1 (S1) controlled by the guest and Stage 2 (S2) managed by
> >the host.
> >
> >This series introduces an optional accel property for the SMMUv3
> >device, indicating that the guest will try to leverage host SMMUv3
> >features for acceleration. By default, enabling accel configures the
> >host SMMUv3 in nested mode to support vfio-pci pass-through.
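> >
> >In its minimal form this looks like the following sketch (just the
> >accel-relevant options; the full command line used for testing is
> >further below):
> >
> >  -object iommufd,id=iommufd0 \
> >  -device arm-smmuv3,primary-bus=pcie.0,id=smmuv3.0,accel=on \
> >  -device vfio-pci,host=0000:75:00.1,bus=pcie.0,iommufd=iommufd0 \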
> >
> >This new accelerated, user-creatable SMMUv3 device lets you:
> >
> > -Set up a VM with multiple SMMUv3s, each tied to a different physical
> >  SMMUv3 on the host. Typically, you’d have multiple PCIe PXB root
> >  complexes in the VM (one per virtual NUMA node), and each of them can
> >  have its own SMMUv3. This setup mirrors the host's layout, where each
> >  NUMA node has its own SMMUv3, and helps build VMs that are more
> >  aligned with the host's NUMA topology.
> 
> Is it a must to mirror the host layout?
> Does this mirroring include smmuv3.0, which is linked to pcie.0?
> Do we have to create the same number of SMMUv3s for the guest as there
> are host SMMUv3s?
> What happens if we don't mirror correctly, e.g., a vfio device linked
> to smmuv3.0 in the guest while on the host it is linked to smmuv3.1?

It is not a must to mirror the host layout. But NUMA alignment will help you
achieve better performance when you have PCI pass-through devices assigned
to the VM. Normally, in a HW system, each PCIe root complex and its
associated IOMMU belong to a particular NUMA node. So if you don't align
them correctly, the memory accesses won't be optimal.

https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/7/html/virtualization_tuning_and_optimization_guide/sect-virtualization_tuning_optimization_guide-numa-numa_and_libvirt#sect-Virtualization_Tuning_Optimization_Guide-NUMA-Node_Locality_for_PCI
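
As a rough sketch of such an alignment (illustrative only: the memory
backend sizes, host-nodes bindings, and numa_node assignments below are
made-up values, not from the test setup in this series):

  -object memory-backend-ram,id=mem0,size=8G,policy=bind,host-nodes=0 \
  -object memory-backend-ram,id=mem1,size=8G,policy=bind,host-nodes=1 \
  -numa node,nodeid=0,memdev=mem0 \
  -numa node,nodeid=1,memdev=mem1 \
  -device pxb-pcie,id=pcie.1,bus_nr=2,bus=pcie.0,numa_node=0 \
  -device arm-smmuv3,primary-bus=pcie.1,id=smmuv3.1,accel=on \
  -device pxb-pcie,id=pcie.2,bus_nr=32,bus=pcie.0,numa_node=1 \
  -device arm-smmuv3,primary-bus=pcie.2,id=smmuv3.2,accel=on \

Each pxb-pcie is advertised to the guest as belonging to a guest NUMA
node via its numa_node property; with the guest nodes bound to the
matching host nodes, guest drivers will allocate DMA buffers from memory
that is local to the physical device.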

Thanks,
Shameer

> >
> > -The host–guest SMMUv3 association results in reduced invalidation
> >  broadcasts and lookups for devices behind different physical SMMUv3s.
> >
> > -Simplifies handling of host SMMUv3s with differing feature sets.
> >
> > -Lays the groundwork for additional capabilities like vCMDQ support.
> >
> >Changes from RFCv2[1] and key points in RFCv3:
> >
> > -Unlike RFCv2, there is no arm-smmuv3-accel device now. The
> >  accelerated mode is enabled using -device arm-smmuv3,accel=on.
> >
> > -When accel=on is specified, the SMMUv3 will allow only vfio-pci
> >  endpoint devices and any non-endpoint devices like PCI bridges and
> >  root ports used to plug in the vfio-pci. See patch#6
> >
> > -I have tried to keep this RFC simple and basic so we can focus on
> >  the structure of this new accelerated support. That means there is
> >  no support for ATS, PASID, or PRI. Only vfio-pci devices that don’t
> >  require these features will work.
> >
> > -Some clarity is still needed on the final approach to handle MSI
> >  translation. Hence, RMR support (which is required for this) is not
> >  included yet, but available in the git branch provided below for
> >  testing.
> >
> > -At least one vfio-pci device must currently be cold-plugged to a
> >  PCIe root complex associated with arm-smmuv3,accel=on. This is
> >  required to:
> >  1. associate a guest SMMUv3 with a host SMMUv3
> >  2. retrieve the host SMMUv3 feature registers for guest export
> >  This still needs discussion, as there were concerns previously about
> >  this approach, and it also breaks hotplug/unplug scenarios. See patch#14
> >
> > -This version does not yet support host SMMUv3 fault handling or
> >  other event notifications. These will be addressed in a future patch
> >  series.
> >
> >Branch for testing:
> >
> >This is based on v8 of the SMMUv3 device series and has a dependency
> >on the Intel series here [3].
> >
> >https://github.com/hisilicon/qemu/tree/smmuv3-dev-v8-accel-rfcv3
> >
> >
> >Tested on a HiSilicon platform with multiple SMMUv3s.
> >
> >./qemu-system-aarch64 \
> >  -machine virt,accel=kvm,gic-version=3 \
> >  -object iommufd,id=iommufd0 \
> >  -bios QEMU_EFI \
> >  -cpu host -smp cpus=4 -m size=16G,slots=4,maxmem=256G -nographic \
> >  -device virtio-blk-device,drive=fs \
> >  -drive if=none,file=ubuntu.img,id=fs \
> >  -kernel Image \
> >  -device arm-smmuv3,primary-bus=pcie.0,id=smmuv3.0,accel=on \
> 
> Here accel=on, so only vfio-pci devices are allowed on pcie.0?
> 
> >  -device vfio-pci,host=0000:75:00.1,bus=pcie.0,iommufd=iommufd0 \
> >  -device pxb-pcie,id=pcie.1,bus_nr=2,bus=pcie.0 \
> >  -device arm-smmuv3,primary-bus=pcie.1,id=smmuv3.1,accel=on \
> >  -device pcie-root-port,id=pcie1.port1,chassis=2,bus=pcie.1,pref64-reserve=2M,io-reserve=1K \
> >  -device vfio-pci,host=0000:7d:02.1,bus=pcie1.port1,iommufd=iommufd0,id=net1 \
> >  -append "rdinit=init console=ttyAMA0 root=/dev/vda rw
> >earlycon=pl011,0x9000000" \
> >  -device pxb-pcie,id=pcie.2,bus_nr=32,bus=pcie.0 \
> >  -device arm-smmuv3,primary-bus=pcie.2,id=smmuv3.2 \
> >  -device pcie-root-port,id=pcie2.port1,chassis=8,bus=pcie.2 \
> >  -device virtio-9p-pci,fsdev=p9fs,mount_tag=p9,bus=pcie2.port1 \
> >  -fsdev local,id=p9fs,path=p9root,security_model=mapped \
> >  -net none \
> >  -nographic
> >
> >
> >Guest output:
> >
> >root@ubuntu:/# dmesg |grep smmu
> > arm-smmu-v3 arm-smmu-v3.0.auto: option mask 0x0
> > arm-smmu-v3 arm-smmu-v3.0.auto: ias 44-bit, oas 44-bit (features
> >0x00008305)
> > arm-smmu-v3 arm-smmu-v3.0.auto: allocated 65536 entries for cmdq
> > arm-smmu-v3 arm-smmu-v3.0.auto: allocated 32768 entries for evtq
> > arm-smmu-v3 arm-smmu-v3.1.auto: option mask 0x0
> > arm-smmu-v3 arm-smmu-v3.1.auto: ias 44-bit, oas 44-bit (features
> >0x00008305)
> > arm-smmu-v3 arm-smmu-v3.1.auto: allocated 65536 entries for cmdq
> > arm-smmu-v3 arm-smmu-v3.1.auto: allocated 32768 entries for evtq
> > arm-smmu-v3 arm-smmu-v3.2.auto: option mask 0x0
> > arm-smmu-v3 arm-smmu-v3.2.auto: ias 44-bit, oas 44-bit (features
> >0x00008305)
> > arm-smmu-v3 arm-smmu-v3.2.auto: allocated 65536 entries for cmdq
> > arm-smmu-v3 arm-smmu-v3.2.auto: allocated 32768 entries for evtq
> >root@ubuntu:/#
> >
> >root@ubuntu:/# lspci -tv
> >-+-[0000:20]---00.0-[21]----00.0  Red Hat, Inc Virtio filesystem
> > +-[0000:02]---00.0-[03]----00.0  Huawei Technologies Co., Ltd. Device
> >a22e
> > \-[0000:00]-+-00.0  Red Hat, Inc. QEMU PCIe Host bridge
> >             +-01.0  Huawei Technologies Co., Ltd. Device a251
> >             +-02.0  Red Hat, Inc. QEMU PCIe Expander bridge
> >             \-03.0  Red Hat, Inc. QEMU PCIe Expander bridge
> 
> Are these all the devices in this guest config?
> Won't QEMU create some default devices implicitly even if we don't ask
> for them on the cmdline?
> 
> Thanks
> Zhenzhong
