On Thu, Sep 18, 2025 at 04:57:44AM -0400, Zhenzhong Duan wrote:
> Introduce a new PCIIOMMUOps optional callback, get_viommu_flags() which
> allows to retrieve flags exposed by a vIOMMU. The first planned vIOMMU
> device flag is VIOMMU_FLAG_WANT_NESTING_PARENT that advertises the
> support of HW nested stage translation scheme and wants other sub-system
> like VFIO's cooperation to create nesting parent HWPT.
> 
> pci_device_get_viommu_flags() is a wrapper that can be called on a PCI
> device potentially protected by a vIOMMU.
> 
> get_viommu_flags() is designed to return 64bit bitmap of purely vIOMMU
> flags which are only determined by user's configuration, no host
> capabilities involved. Reasons are:
> 
> 1. host may has heterogeneous IOMMUs, each with different capabilities
> 2. this is migration friendly, return value is consistent between source
>    and target.
> 3. host IOMMU capabilities are passed to vIOMMU through set_iommu_device()
>    interface which have to be after attach_device(), when get_viommu_flags()
>    is called in attach_device(), there is no way for vIOMMU to get host
>    IOMMU capabilities yet, so only pure vIOMMU flags can be returned.

"no way" sounds too strong..

There is an iommufd_backend_get_device_info() call there. So, we
could have passed the host IOMMU capabilities to a vIOMMU. Just,
we chose not to (assuming for migration reason?).

>    See below sequence:
> 
>      vfio_device_attach():
>          iommufd_cdev_attach():
>              pci_device_get_viommu_flags() for HW nesting cap
>              create a nesting parent HWPT
>              attach device to the HWPT
>              vfio_device_hiod_create_and_realize() creating hiod
>      ...
>      pci_device_set_iommu_device(hiod)
> 
> Suggested-by: Yi Liu <yi.l....@intel.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.d...@intel.com>

Despite some nits, patch looks good to me:

Reviewed-by: Nicolin Chen <nicol...@nvidia.com>

> +enum {
> +    /* Nesting parent HWPT will be reused by vIOMMU to create nested HWPT */
> +     VIOMMU_FLAG_WANT_NESTING_PARENT = BIT_ULL(0),
> +};

How about adding a name and move the note here:

/*
 * Theoretical vIOMMU flags. Only determined by the vIOMMU device properties and
 * independent on the actual host IOMMU capabilities they may depend on.
 */
enum viommu_flags {
        ...
};

> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> index bde9dca8e2..c54f2b53ae 100644
> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -462,6 +462,23 @@ typedef struct PCIIOMMUOps {
>       * @devfn: device and function number of the PCI device.
>       */
>      void (*unset_iommu_device)(PCIBus *bus, void *opaque, int devfn);
> +    /**
> +     * @get_viommu_flags: get vIOMMU flags
> +     *
> +     * Optional callback, if not implemented, then vIOMMU doesn't support
> +     * exposing flags to other sub-system, e.g., VFIO. Each flag can be
> +     * an expectation or request to other sub-system or just a pure vIOMMU
> +     * capability. vIOMMU can choose which flags to expose.

The 2nd statement is somewhat redundant. Perhaps we could squash
it into the notes at enum viommu_flags above, if we really need.

> +     *
> +     * @opaque: the data passed to pci_setup_iommu().
> +     *
> +     * Returns: 64bit bitmap with each bit represents a flag that vIOMMU
> +     * wants to expose. See VIOMMU_FLAG_* in include/hw/iommu.h for all
> +     * possible flags currently used. These flags are theoretical which
> +     * are only determined by vIOMMU device properties and independent on
> +     * the actual host capabilities they may depend on.
> +     */
> +    uint64_t (*get_viommu_flags)(void *opaque);

With the notes above, we could simplify this:

     * Returns: bitmap with each representing a vIOMMU flag defined in
     * enum viommu_flags

> +/**
> + * pci_device_get_viommu_flags: get vIOMMU flags.
> + *
> + * Returns a 64bit bitmap with each bit represents a vIOMMU exposed
> + * flags, 0 if vIOMMU doesn't support that.
> + *
> + * @dev: PCI device pointer.
> + */
> +uint64_t pci_device_get_viommu_flags(PCIDevice *dev);
 
and could make this aligned too:

     * Returns: bitmap with each representing a vIOMMU flag defined in
     * enum viommu_flags. Or 0 if vIOMMU doesn't report any.

Nicolin

Reply via email to