Hi Nicolin,

> -----Original Message-----
> From: Nicolin Chen <[email protected]>
> Sent: 30 September 2025 01:11
> To: Shameer Kolothum <[email protected]>
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]; Jason Gunthorpe
> <[email protected]>; [email protected]; [email protected]; Nathan
> Chen <[email protected]>; Matt Ochs <[email protected]>;
> [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]
> Subject: Re: [PATCH v4 06/27] hw/arm/smmuv3-accel: Restrict accelerated
> SMMUv3 to vfio-pci endpoints with iommufd
>
> On Mon, Sep 29, 2025 at 02:36:22PM +0100, Shameer Kolothum wrote:
> > Accelerated SMMUv3 is only useful when the device can take advantage of
> > the host's SMMUv3 in nested mode. To keep things simple and correct, we
> > only allow this feature for vfio-pci endpoint devices that use the
> > iommufd backend. We also allow non-endpoint emulated devices like PCI
> > bridges and root ports, so that users can plug in these vfio-pci
> > devices. We can only enforce this if devices are cold plugged. For
> > hotplug cases, give appropriate warnings.
> >
> > Another reason for this limit is to avoid problems with IOTLB
> > invalidations. Some commands (e.g., CMD_TLBI_NH_ASID) lack an
> > associated SID, making it difficult to trace the originating device.
> > If we allowed emulated endpoint devices, QEMU would have to invalidate
> > both its own software IOTLB and the host's hardware IOTLB, which could
> > slow things down.
> >
> > Since vfio-pci devices in nested mode rely on the host SMMUv3's nested
> > translation (S1+S2), their get_address_space() callback must return the
> > system address space so that the VFIO core can set up correct S2
> > mappings for guest RAM.
> >
> > So in short:
> > - vfio-pci devices (with iommufd as backend) return the system address
> >   space.
> > - bridges and root ports return the IOMMU address space.
> >
> > Signed-off-by: Shameer Kolothum <[email protected]>
>
> Reviewed-by: Nicolin Chen <[email protected]>
>
> With some nits:
>
> > +    /*
> > +     * We return the system address for vfio-pci devices (with iommufd as
> > +     * backend) so that the VFIO core can set up Stage-2 (S2) mappings for
> > +     * guest RAM. This is needed because, in the accelerated SMMUv3 case,
> > +     * the host SMMUv3 runs in nested (S1 + S2) mode where the guest
> > +     * manages its own S1 page tables while the host manages S2.
> > +     *
> > +     * We are using the global &address_space_memory here, as this will
> > +     * ensure the same system address space pointer for all devices behind
> > +     * the accelerated SMMUv3s in a VM. That way VFIO/iommufd can reuse a
> > +     * single IOAS ID in iommufd_cdev_attach(), allowing the Stage-2 page
> > +     * tables to be shared within the VM instead of duplicating them for
> > +     * every SMMUv3 instance.
> > +     */
> > +    if (vfio_pci) {
> > +        return &address_space_memory;
>
> How about:
>
> /*
>  * In the accelerated case, a vfio-pci device passed through via the iommufd
>  * backend must stay in the system address space, as it is always translated
>  * by its physical SMMU (using a stage-2-only STE or a nested STE), in which
>  * case the stage-2 nesting parent page table is allocated by the vfio core,
>  * backing up the system address space.
>  *
>  * So, return the system address space via the global address_space_memory.
>  * The shared address_space_memory also allows devices under different vSMMU
>  * instances in a VM to reuse a single nesting parent HWPT in the vfio core.
>  */
> ?
Ok. I will go through the descriptions and comments in this series again
and try to improve them.

Thanks,
Shameer
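
For reference, a minimal sketch of the get_address_space() selection logic
being discussed. The callback signature matches QEMU's PCIIOMMUOps, but the
two helpers are hypothetical stand-ins, not names from this series:

/*
 * Sketch only: smmuv3_accel_dev_is_vfio_pci() and smmuv3_accel_iommu_as()
 * are assumed helpers for the vfio-pci/iommufd check and the emulated
 * IOMMU address space lookup, respectively.
 */
#include "qemu/osdep.h"
#include "exec/address-spaces.h"
#include "hw/pci/pci.h"

bool smmuv3_accel_dev_is_vfio_pci(PCIBus *bus, int devfn);   /* hypothetical */
AddressSpace *smmuv3_accel_iommu_as(PCIBus *bus, void *opaque,
                                    int devfn);              /* hypothetical */

static AddressSpace *smmuv3_accel_find_add_as(PCIBus *bus, void *opaque,
                                              int devfn)
{
    if (smmuv3_accel_dev_is_vfio_pci(bus, devfn)) {
        /*
         * vfio-pci via iommufd: the host SMMUv3 runs nested (S1 + S2) and
         * the VFIO core allocates the S2 nesting parent backing guest RAM,
         * so the device must stay in the system address space. Returning
         * the shared global also lets devices under different vSMMU
         * instances in a VM reuse a single nesting parent HWPT.
         */
        return &address_space_memory;
    }
    /* Bridges and root ports keep the emulated IOMMU address space. */
    return smmuv3_accel_iommu_as(bus, opaque, devfn);
}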
