> -----Original Message-----
> From: Duan, Zhenzhong <zhenzhong.d...@intel.com>
> Sent: Tuesday, July 15, 2025 11:46 AM
> To: Shameerali Kolothum Thodi
> <shameerali.kolothum.th...@huawei.com>; qemu-...@nongnu.org;
> qemu-devel@nongnu.org
> Cc: eric.au...@redhat.com; peter.mayd...@linaro.org; j...@nvidia.com;
> nicol...@nvidia.com; ddut...@redhat.com; berra...@redhat.com;
> nath...@nvidia.com; mo...@nvidia.com; smost...@google.com; Linuxarm
> <linux...@huawei.com>; Wangzhou (B) <wangzh...@hisilicon.com>;
> jiangkunkun <jiangkun...@huawei.com>; Jonathan Cameron
> <jonathan.came...@huawei.com>; zhangfei....@linaro.org;
> shameerkolot...@gmail.com
> Subject: RE: [RFC PATCH v3 00/15] hw/arm/virt: Add support for
> user-creatable accelerated SMMUv3
>
> Hi Shameer,
>
> >-----Original Message-----
> >From: Shameer Kolothum <shameerali.kolothum.th...@huawei.com>
> >Subject: [RFC PATCH v3 00/15] hw/arm/virt: Add support for
> >user-creatable accelerated SMMUv3
> >
> >Hi All,
> >
> >This patch series introduces initial support for a user-creatable,
> >accelerated SMMUv3 device (-device arm-smmuv3,accel=on) in QEMU.
> >
> >This is based on the user-creatable SMMUv3 device series [0].
> >
> >Why this is needed:
> >
> >On ARM, to enable vfio-pci pass-through devices in a VM, the host
> >SMMUv3 must be set up in nested translation mode (Stage 1 + Stage 2),
> >with Stage 1 (S1) controlled by the guest and Stage 2 (S2) managed by
> >the host.
> >
> >This series introduces an optional accel property for the SMMUv3
> >device, indicating that the guest will try to leverage host SMMUv3
> >features for acceleration. By default, enabling accel configures the
> >host SMMUv3 in nested mode to support vfio-pci pass-through.
> >
> >This new accelerated, user-creatable SMMUv3 device lets you:
> >
> > -Set up a VM with multiple SMMUv3s, each tied to a different
> >  physical SMMUv3 on the host. Typically, you’d have multiple PCIe
> >  PXB root complexes in the VM (one per virtual NUMA node), and each
> >  of them can have its own SMMUv3. This setup mirrors the host's
> >  layout, where each NUMA node has its own SMMUv3, and helps build
> >  VMs that are more closely aligned with the host's NUMA topology.
>
> Is it a must to mirror the host layout?
> Does this mirror include smmuv3.0, which is linked to pcie.0?
> Do we have to create the same number of SMMUv3s in the guest as on
> the host?
> What happens if we don't mirror correctly, e.g., a vfio device linked
> to smmuv3.0 in the guest while on the host it is linked to smmuv3.1?
It is not a must to mirror the host layout, but NUMA alignment will
help you achieve better performance when PCI pass-through devices are
assigned to the VM. Normally, in a hardware system, each PCIe root
complex and its associated IOMMU belong to a particular NUMA node, so
if you don't align them correctly, memory accesses won't be optimal.
See:

https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/7/html/virtualization_tuning_and_optimization_guide/sect-virtualization_tuning_optimization_guide-numa-numa_and_libvirt#sect-Virtualization_Tuning_Optimization_Guide-NUMA-Node_Locality_for_PCI
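For example, you can make that alignment explicit by defining guest
NUMA nodes and pinning each PXB (and hence the SMMUv3 behind it) to
one of them via the pxb-pcie numa_node property. A minimal, untested
sketch, assuming 0000:75:00.1 sits behind a host SMMUv3 on host node 0
and 0000:7d:02.1 behind one on host node 1 (bus numbers and BDFs are
placeholders reused from the cover-letter command line):

./qemu-system-aarch64 \
  -machine virt,accel=kvm,gic-version=3 \
  -object iommufd,id=iommufd0 \
  -cpu host -smp cpus=4 -m 16G \
  -object memory-backend-ram,id=mem0,size=8G \
  -object memory-backend-ram,id=mem1,size=8G \
  -numa node,nodeid=0,cpus=0-1,memdev=mem0 \
  -numa node,nodeid=1,cpus=2-3,memdev=mem1 \
  -device pxb-pcie,id=pcie.1,bus_nr=2,numa_node=0,bus=pcie.0 \
  -device arm-smmuv3,primary-bus=pcie.1,id=smmuv3.1,accel=on \
  -device pcie-root-port,id=pcie1.port1,chassis=2,bus=pcie.1 \
  -device vfio-pci,host=0000:75:00.1,bus=pcie1.port1,iommufd=iommufd0 \
  -device pxb-pcie,id=pcie.2,bus_nr=32,numa_node=1,bus=pcie.0 \
  -device arm-smmuv3,primary-bus=pcie.2,id=smmuv3.2,accel=on \
  -device pcie-root-port,id=pcie2.port1,chassis=8,bus=pcie.2 \
  -device vfio-pci,host=0000:7d:02.1,bus=pcie2.port1,iommufd=iommufd0 \
  ...

Pinning the vCPUs of each guest node to the corresponding host node
(e.g. via libvirt) then completes the alignment.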
Thanks,
Shameer

> >
> > -The host–guest SMMUv3 association results in reduced invalidation
> >  broadcasts and lookups for devices behind different physical
> >  SMMUv3s.
> >
> > -Simplifies handling of host SMMUv3s with differing feature sets.
> >
> > -Lays the groundwork for additional capabilities like vCMDQ support.
> >
> >Changes from RFCv2 [1] and key points in RFCv3:
> >
> > -Unlike RFCv2, there is no arm-smmuv3-accel device now. The
> >  accelerated mode is enabled using -device arm-smmuv3,accel=on.
> >
> > -When accel=on is specified, the SMMUv3 will allow only vfio-pci
> >  endpoint devices, plus any non-endpoint devices like PCI bridges
> >  and root ports used to plug in the vfio-pci. See patch #6.
> >
> > -I have tried to keep this RFC simple and basic so we can focus on
> >  the structure of this new accelerated support. That means there is
> >  no support for ATS, PASID, or PRI. Only vfio-pci devices that don’t
> >  require these features will work.
> >
> > -Some clarity is still needed on the final approach to handle MSI
> >  translation. Hence, RMR support (which is required for this) is not
> >  included yet, but it is available in the git branch provided below
> >  for testing.
> >
> > -At least one vfio-pci device must currently be cold-plugged to a
> >  PCIe root complex associated with arm-smmuv3,accel=on. This is
> >  required to:
> >  1. associate a guest SMMUv3 with a host SMMUv3
> >  2. retrieve the host SMMUv3 feature registers for guest export
> >  This still needs discussion, as there were concerns previously
> >  about this approach, and it also breaks hotplug/unplug scenarios.
> >  See patch #14.
> >
> > -This version does not yet support host SMMUv3 fault handling or
> >  other event notifications. These will be addressed in a future
> >  patch series.
> >
> >Branch for testing:
> >
> >This is based on v8 of the SMMUv3 device series and has a dependency
> >on the Intel series here [3]:
> >
> >https://github.com/hisilicon/qemu/tree/smmuv3-dev-v8-accel-rfcv3
> >
> >Tested on a HiSilicon platform with multiple SMMUv3s.
> >
> >./qemu-system-aarch64 \
> >  -machine virt,accel=kvm,gic-version=3 \
> >  -object iommufd,id=iommufd0 \
> >  -bios QEMU_EFI \
> >  -cpu host -smp cpus=4 -m size=16G,slots=4,maxmem=256G -nographic \
> >  -device virtio-blk-device,drive=fs \
> >  -drive if=none,file=ubuntu.img,id=fs \
> >  -kernel Image \
> >  -device arm-smmuv3,primary-bus=pcie.0,id=smmuv3.0,accel=on \
>
> Here accel=on, so only vfio devices are allowed on pcie.0?
>
> >  -device vfio-pci,host=0000:75:00.1,bus=pcie.0,iommufd=iommufd0 \
> >  -device pxb-pcie,id=pcie.1,bus_nr=2,bus=pcie.0 \
> >  -device arm-smmuv3,primary-bus=pcie.1,id=smmuv3.1,accel=on \
> >  -device pcie-root-port,id=pcie1.port1,chassis=2,bus=pcie.1,pref64-reserve=2M,io-reserve=1K \
> >  -device vfio-pci,host=0000:7d:02.1,bus=pcie1.port1,iommufd=iommufd0,id=net1 \
> >  -append "rdinit=init console=ttyAMA0 root=/dev/vda rw earlycon=pl011,0x9000000" \
> >  -device pxb-pcie,id=pcie.2,bus_nr=32,bus=pcie.0 \
> >  -device arm-smmuv3,primary-bus=pcie.2,id=smmuv3.2 \
> >  -device pcie-root-port,id=pcie2.port1,chassis=8,bus=pcie.2 \
> >  -device virtio-9p-pci,fsdev=p9fs,mount_tag=p9,bus=pcie2.port1 \
> >  -fsdev local,id=p9fs,path=p9root,security_model=mapped \
> >  -net none \
> >  -nographic
> >
> >Guest output:
> >
> >root@ubuntu:/# dmesg |grep smmu
> > arm-smmu-v3 arm-smmu-v3.0.auto: option mask 0x0
> > arm-smmu-v3 arm-smmu-v3.0.auto: ias 44-bit, oas 44-bit (features 0x00008305)
> > arm-smmu-v3 arm-smmu-v3.0.auto: allocated 65536 entries for cmdq
> > arm-smmu-v3 arm-smmu-v3.0.auto: allocated 32768 entries for evtq
> > arm-smmu-v3 arm-smmu-v3.1.auto: option mask 0x0
> > arm-smmu-v3 arm-smmu-v3.1.auto: ias 44-bit, oas 44-bit (features 0x00008305)
> > arm-smmu-v3 arm-smmu-v3.1.auto: allocated 65536 entries for cmdq
> > arm-smmu-v3 arm-smmu-v3.1.auto: allocated 32768 entries for evtq
> > arm-smmu-v3 arm-smmu-v3.2.auto: option mask 0x0
> > arm-smmu-v3 arm-smmu-v3.2.auto: ias 44-bit, oas 44-bit (features 0x00008305)
> > arm-smmu-v3 arm-smmu-v3.2.auto: allocated 65536 entries for cmdq
> > arm-smmu-v3 arm-smmu-v3.2.auto: allocated 32768 entries for evtq
> >root@ubuntu:/#
> >
> >root@ubuntu:/# lspci -tv
> >-+-[0000:20]---00.0-[21]----00.0  Red Hat, Inc Virtio filesystem
> > +-[0000:02]---00.0-[03]----00.0  Huawei Technologies Co., Ltd. Device a22e
> > \-[0000:00]-+-00.0  Red Hat, Inc. QEMU PCIe Host bridge
> >             +-01.0  Huawei Technologies Co., Ltd. Device a251
> >             +-02.0  Red Hat, Inc. QEMU PCIe Expander bridge
> >             \-03.0  Red Hat, Inc. QEMU PCIe Expander bridge
>
> Are these all the devices in this guest config?
> Won't QEMU create some default devices implicitly, even if we don't
> ask for them on the cmdline?
>
> Thanks
> Zhenzhong