Hi Nicolin,

> -----Original Message-----
> From: Nicolin Chen <[email protected]>
> Sent: 30 September 2025 01:11
> To: Shameer Kolothum <[email protected]>
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]; Jason Gunthorpe
> <[email protected]>; [email protected]; [email protected]; Nathan
> Chen <[email protected]>; Matt Ochs <[email protected]>;
> [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]
> Subject: Re: [PATCH v4 06/27] hw/arm/smmuv3-accel: Restrict accelerated
> SMMUv3 to vfio-pci endpoints with iommufd
> 
> On Mon, Sep 29, 2025 at 02:36:22PM +0100, Shameer Kolothum wrote:
> > Accelerated SMMUv3 is only useful when the device can take advantage of
> > the host's SMMUv3 in nested mode. To keep things simple and correct, we
> > only allow this feature for vfio-pci endpoint devices that use the iommufd
> > backend. We also allow non-endpoint emulated devices like PCI bridges and
> > root ports, so that users can plug in these vfio-pci devices. We can only
> > enforce this if devices are cold plugged. For hotplug cases, give
> > appropriate warnings.
> >
> > Another reason for this limit is to avoid problems with IOTLB
> > invalidations. Some commands (e.g., CMD_TLBI_NH_ASID) lack an associated
> > SID, making it difficult to trace the originating device. If we allowed
> > emulated endpoint devices, QEMU would have to invalidate both its own
> > software IOTLB and the host's hardware IOTLB, which could slow things
> > down.
> >
> > Since vfio-pci devices in nested mode rely on the host SMMUv3's nested
> > translation (S1+S2), their get_address_space() callback must return the
> > system address space so that the VFIO core can set up correct S2 mappings
> > for guest RAM.
> >
> > So in short:
> >  - vfio-pci devices (with iommufd as backend) return the system address
> >    space.
> >  - bridges and root ports return the IOMMU address space.
> >
> > Signed-off-by: Shameer Kolothum <[email protected]>
> 
> Reviewed-by: Nicolin Chen <[email protected]>
> 
> With some nits:
> 
> > +    /*
> > +     * We return the system address space for vfio-pci devices (with
> > +     * iommufd as backend) so that the VFIO core can set up Stage-2 (S2)
> > +     * mappings for guest RAM. This is needed because, in the accelerated
> > +     * SMMUv3 case, the host SMMUv3 runs in nested (S1 + S2) mode where
> > +     * the guest manages its own S1 page tables while the host manages S2.
> > +     *
> > +     * We use the global &address_space_memory here, as this ensures the
> > +     * same system address space pointer for all devices behind the
> > +     * accelerated SMMUv3s in a VM. That way VFIO/iommufd can reuse a
> > +     * single IOAS ID in iommufd_cdev_attach(), allowing the Stage-2 page
> > +     * tables to be shared within the VM instead of duplicating them for
> > +     * every SMMUv3 instance.
> > +     */
> > +    if (vfio_pci) {
> > +        return &address_space_memory;
> 
> How about:
> 
>     /*
>      * In the accelerated case, a vfio-pci device passed through via the
>      * iommufd backend must stay in the system address space, as it is
>      * always translated by its physical SMMU (using a stage-2-only STE or
>      * a nested STE), in which case the stage-2 nesting parent page table
>      * is allocated by the vfio core, backing up the system address space.
>      *
>      * So, return the system address space via the global
>      * address_space_memory. The shared address_space_memory also allows
>      * devices under different vSMMU instances in a VM to reuse a single
>      * nesting parent HWPT in the vfio core.
>      */
> ?

Ok. I will go through the descriptions and comments in this series again and
try to improve them.
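
Just to capture the intent here, the callback ends up doing roughly the
following (a simplified sketch only; the function and helper names, such as
smmu_lookup_sdev() and sdev_is_vfio_pci_iommufd(), are placeholders rather
than the actual helpers used in this series):

    static AddressSpace *smmuv3_accel_find_add_as(PCIBus *bus, void *opaque,
                                                  int devfn)
    {
        SMMUState *s = opaque;
        /* Look up (or create) per-device state, as smmu_find_add_as() does */
        SMMUDevice *sdev = smmu_lookup_sdev(s, bus, devfn); /* placeholder */

        if (sdev_is_vfio_pci_iommufd(sdev)) {               /* placeholder */
            /*
             * The host SMMUv3 translates this device in nested (S1 + S2)
             * mode; returning the global system address space lets the
             * VFIO core set up the shared S2 (nesting parent) mappings
             * of guest RAM.
             */
            return &address_space_memory;
        }

        /* Emulated bridges/root ports keep the emulated IOMMU address space */
        return &sdev->as;
    }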

Thanks,
Shameer

