Hi All,

This patch series introduces initial support for a user-creatable,
accelerated SMMUv3 device (-device arm-smmuv3,accel=on) in QEMU.

This is based on the user-creatable SMMUv3 device series [0].

Why this is needed:

On ARM, to enable vfio-pci pass-through devices in a VM, the host SMMUv3
must be set up in nested translation mode (Stage 1 + Stage 2), with
Stage 1 (S1) controlled by the guest and Stage 2 (S2) managed by the host.

This series introduces an optional accel property for the SMMUv3 device,
indicating that the guest will try to leverage host SMMUv3 features for
acceleration. By default, enabling accel configures the host SMMUv3 in
nested mode to support vfio-pci pass-through.

This new accelerated, user-creatable SMMUv3 device:

 -Lets you set up a VM with multiple SMMUv3s, each tied to a different
  physical SMMUv3 on the host. Typically, you’d have multiple PCIe PXB root
  complexes in the VM (one per virtual NUMA node), and each of them can have
  its own SMMUv3. This setup mirrors the host's layout, where each NUMA node
  has its own SMMUv3, and helps build VMs that are more aligned with the
  host's NUMA topology.

 -Reduces invalidation broadcasts and lookups for devices behind different
  physical SMMUv3s, thanks to the host–guest SMMUv3 association.

 -Simplifies handling of host SMMUv3s with differing feature sets.

 -Lays the groundwork for additional capabilities like vCMDQ support.

Changes from RFCv2[1] and key points in RFCv3:

 -Unlike RFCv2, there is no arm-smmuv3-accel device now. The accelerated
  mode is enabled using -device arm-smmuv3,accel=on.

 -When accel=on is specified, the SMMUv3 will allow only vfio-pci endpoint
  devices and any non-endpoint devices like PCI bridges and root ports used
  to plug in the vfio-pci device. See patch#6.

 -I have tried to keep this RFC simple and basic so we can focus on the
  structure of this new accelerated support. That means there is no support
  for ATS, PASID, or PRI. Only vfio-pci devices that don’t require these
  features will work.

 -Some clarity is still needed on the final approach to handle MSI translation.
  Hence, RMR support (which is required for this) is not included yet, but it
  is available in the git branch provided below for testing.
 
 -At least one vfio-pci device must currently be cold-plugged to a PCIe root
  complex associated with arm-smmuv3,accel=on. This is required to:
  1. associate a guest SMMUv3 with a host SMMUv3
  2. retrieve the host SMMUv3 feature registers for guest export
  This still needs discussion, as there were concerns previously about this
  approach and it also breaks hotplug/unplug scenarios. See patch#14.

 -This version does not yet support host SMMUv3 fault handling or other event
  notifications. These will be addressed in a future patch series.
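
The endpoint restriction and the cold-plug requirement listed above can be
illustrated with a small Python model. This is only a sketch for reasoning
about command lines, not the QEMU implementation: the Device and Smmu
classes, buses_under() and check_accel_smmu() are hypothetical names, the
device type strings are the ones used on the command line in this letter, and
treating pxb-pcie as the start of a separate root complex (so it is not
followed when walking the hierarchy) is an assumption of the model.

```python
from dataclasses import dataclass

# Hypothetical model for illustration; none of these names exist in QEMU.
BRIDGE_TYPES = {"pcie-root-port", "pci-bridge"}   # extend the same hierarchy
ALLOWED_NON_ENDPOINTS = BRIDGE_TYPES | {"pxb-pcie"}


@dataclass
class Device:
    id: str                  # for bridges, also the id of the bus they provide
    type: str                # e.g. "vfio-pci", "pcie-root-port"
    bus: str                 # id of the bus the device is plugged into
    hotplugged: bool = False


@dataclass
class Smmu:
    id: str
    primary_bus: str
    accel: bool = False


def buses_under(root, devices):
    """Return the root bus plus the buses provided by bridges beneath it."""
    found = {root}
    changed = True
    while changed:
        changed = False
        for d in devices:
            if d.type in BRIDGE_TYPES and d.bus in found and d.id not in found:
                found.add(d.id)
                changed = True
    return found


def check_accel_smmu(smmu, devices):
    """Return a list of problems for one SMMUv3 instance (empty if none)."""
    if not smmu.accel:
        return []
    buses = buses_under(smmu.primary_bus, devices)
    behind = [d for d in devices if d.bus in buses]
    # Rule 1: only vfio-pci endpoints and non-endpoint devices are allowed.
    problems = [
        f"{d.id}: {d.type} not allowed behind accel SMMUv3 {smmu.id}"
        for d in behind
        if d.type != "vfio-pci" and d.type not in ALLOWED_NON_ENDPOINTS
    ]
    # Rule 2: at least one vfio-pci device must be cold-plugged.
    if not any(d.type == "vfio-pci" and not d.hotplugged for d in behind):
        problems.append(f"{smmu.id}: no cold-plugged vfio-pci device")
    return problems
```

In this model, an accel=on SMMUv3 whose root complex holds only a root port
and a virtio device would be reported twice: once for the disallowed endpoint
and once for the missing cold-plugged vfio-pci device.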

Branch for testing:

This is based on v8 of the SMMUv3 device series and has a dependency on the
Intel series here [2].

https://github.com/hisilicon/qemu/tree/smmuv3-dev-v8-accel-rfcv3


Tested on a HiSilicon platform with multiple SMMUv3s.

./qemu-system-aarch64 \
  -machine virt,accel=kvm,gic-version=3 \
  -object iommufd,id=iommufd0 \
  -bios QEMU_EFI \
  -cpu host -smp cpus=4 -m size=16G,slots=4,maxmem=256G -nographic \
  -device virtio-blk-device,drive=fs \
  -drive if=none,file=ubuntu.img,id=fs \
  -kernel Image \
  -device arm-smmuv3,primary-bus=pcie.0,id=smmuv3.0,accel=on \
  -device vfio-pci,host=0000:75:00.1,bus=pcie.0,iommufd=iommufd0 \
  -device pxb-pcie,id=pcie.1,bus_nr=2,bus=pcie.0 \
  -device arm-smmuv3,primary-bus=pcie.1,id=smmuv3.1,accel=on \
  -device pcie-root-port,id=pcie1.port1,chassis=2,bus=pcie.1,pref64-reserve=2M,io-reserve=1K \
  -device vfio-pci,host=0000:7d:02.1,bus=pcie1.port1,iommufd=iommufd0,id=net1 \
  -append "rdinit=init console=ttyAMA0 root=/dev/vda rw earlycon=pl011,0x9000000" \
  -device pxb-pcie,id=pcie.2,bus_nr=32,bus=pcie.0 \
  -device arm-smmuv3,primary-bus=pcie.2,id=smmuv3.2 \
  -device pcie-root-port,id=pcie2.port1,chassis=8,bus=pcie.2 \
  -device virtio-9p-pci,fsdev=p9fs,mount_tag=p9,bus=pcie2.port1 \
  -fsdev local,id=p9fs,path=p9root,security_model=mapped \
  -net none \
  -nographic
  

Guest output:
  
root@ubuntu:/# dmesg |grep smmu
 arm-smmu-v3 arm-smmu-v3.0.auto: option mask 0x0
 arm-smmu-v3 arm-smmu-v3.0.auto: ias 44-bit, oas 44-bit (features 0x00008305)
 arm-smmu-v3 arm-smmu-v3.0.auto: allocated 65536 entries for cmdq
 arm-smmu-v3 arm-smmu-v3.0.auto: allocated 32768 entries for evtq
 arm-smmu-v3 arm-smmu-v3.1.auto: option mask 0x0
 arm-smmu-v3 arm-smmu-v3.1.auto: ias 44-bit, oas 44-bit (features 0x00008305)
 arm-smmu-v3 arm-smmu-v3.1.auto: allocated 65536 entries for cmdq
 arm-smmu-v3 arm-smmu-v3.1.auto: allocated 32768 entries for evtq
 arm-smmu-v3 arm-smmu-v3.2.auto: option mask 0x0
 arm-smmu-v3 arm-smmu-v3.2.auto: ias 44-bit, oas 44-bit (features 0x00008305)
 arm-smmu-v3 arm-smmu-v3.2.auto: allocated 65536 entries for cmdq
 arm-smmu-v3 arm-smmu-v3.2.auto: allocated 32768 entries for evtq
root@ubuntu:/# 

root@ubuntu:/# lspci -tv
-+-[0000:20]---00.0-[21]----00.0  Red Hat, Inc Virtio filesystem
 +-[0000:02]---00.0-[03]----00.0  Huawei Technologies Co., Ltd. Device a22e
 \-[0000:00]-+-00.0  Red Hat, Inc. QEMU PCIe Host bridge
             +-01.0  Huawei Technologies Co., Ltd. Device a251
             +-02.0  Red Hat, Inc. QEMU PCIe Expander bridge
             \-03.0  Red Hat, Inc. QEMU PCIe Expander bridge
root@ubuntu:/# 

root@ubuntu:/# 
root@ubuntu:/# dmesg |grep Adding
 hns3 0000:03:00.0: Adding to iommu group 0
 hisi_zip 0000:00:01.0: Adding to iommu group 1
 pcieport 0000:20:00.0: Adding to iommu group 2
 pcieport 0000:02:00.0: Adding to iommu group 3
 virtio-pci 0000:21:00.0: Adding to iommu group 4
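
For anyone scripting further tests, the "Adding to iommu group" lines above
can be collected into a per-group map with a few lines of Python. This is a
minimal sketch: iommu_groups() is a hypothetical helper name, and the pattern
assumes the line layout shown in the output above (driver name, PCI address,
then the group number).

```python
import re

# Matches lines such as "hns3 0000:03:00.0: Adding to iommu group 0".
# re.search is used so that a leading dmesg timestamp is tolerated.
GROUP_RE = re.compile(
    r"(\S+) ([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"Adding to iommu group (\d+)"
)


def iommu_groups(dmesg_text):
    """Map each IOMMU group number to a list of (driver, PCI address)."""
    groups = {}
    for line in dmesg_text.splitlines():
        m = GROUP_RE.search(line)
        if m:
            driver, bdf, group = m.groups()
            groups.setdefault(int(group), []).append((driver, bdf))
    return groups
```

Feeding it the output of "dmesg | grep Adding" from the guest gives one entry
per group, which makes it easy to check whether any devices share a group.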

Further tests are always welcome.

Please take a look and let me know your feedback.

Thanks,
Shameer

[0] https://lore.kernel.org/qemu-devel/20250711084749.18300-1-shameerali.kolothum.th...@huawei.com/
[1] https://lore.kernel.org/qemu-devel/20250311141045.66620-1-shameerali.kolothum.th...@huawei.com/
[2] https://lore.kernel.org/qemu-devel/20250708110601.633308-1-zhenzhong.d...@intel.com/

Nicolin Chen (8):
  backends/iommufd: Introduce iommufd_backend_alloc_viommu
  backends/iommufd: Introduce iommufd_vdev_alloc
  hw/arm/smmuv3-accel: Add set/unset_iommu_device callback
  hw/arm/smmuv3-accel: Support nested STE install/uninstall support
  hw/arm/smmuv3-accel: Allocate a vDEVICE object for device
  hw/arm/smmuv3-accel: Introduce helpers to batch and issue cache
    invalidations
  hw/arm/smmuv3: Forward invalidation commands to hw
  Read and validate host SMMUv3 feature bits

Shameer Kolothum (7):
  hw/arm/smmu-common: Factor out common helper functions and export
  hw/arm/smmu-common: Introduce smmu_iommu_ops_by_type() helper
  hw/arm/smmuv3-accel: Introduce smmuv3 accel device
  hw/arm/smmuv3-accel: Restrict accelerated SMMUv3 to vfio-pci endpoints
    with iommufd
  hw/arm/smmuv3: Implement get_viommu_cap() callback
  hw/pci/pci: Introduce optional get_msi_address_space() callback.
  hw/arm/smmu-common: Add accel property for SMMU dev

 backends/iommufd.c                  |  51 +++
 backends/trace-events               |   2 +
 hw/arm/meson.build                  |   3 +-
 hw/arm/smmu-common.c                |  70 ++-
 hw/arm/smmuv3-accel.c               | 631 ++++++++++++++++++++++++++++
 hw/arm/smmuv3-accel.h               |  93 ++++
 hw/arm/smmuv3-internal.h            |  27 ++
 hw/arm/smmuv3.c                     |  44 +-
 hw/arm/trace-events                 |   5 +
 hw/arm/virt.c                       |  12 +
 hw/pci-bridge/pci_expander_bridge.c |   1 -
 hw/pci/pci.c                        |  19 +
 include/hw/arm/smmu-common.h        |  10 +
 include/hw/arm/smmuv3.h             |   1 +
 include/hw/pci/pci.h                |  16 +
 include/hw/pci/pci_bridge.h         |   1 +
 include/system/iommufd.h            |  19 +
 target/arm/kvm.c                    |   2 +-
 18 files changed, 981 insertions(+), 26 deletions(-)
 create mode 100644 hw/arm/smmuv3-accel.c
 create mode 100644 hw/arm/smmuv3-accel.h

-- 
2.34.1

