Hi Eric, >-----Original Message----- >From: Eric Auger <eric.au...@redhat.com> >Sent: Wednesday, July 16, 2025 8:09 PM >To: Duan, Zhenzhong <zhenzhong.d...@intel.com>; >qemu-devel@nongnu.org >Cc: alex.william...@redhat.com; c...@redhat.com; m...@redhat.com; >jasow...@redhat.com; pet...@redhat.com; ddut...@redhat.com; >j...@nvidia.com; nicol...@nvidia.com; >shameerali.kolothum.th...@huawei.com; joao.m.mart...@oracle.com; >clement.mathieu--d...@eviden.com; Tian, Kevin <kevin.t...@intel.com>; Liu, >Yi L <yi.l....@intel.com>; Peng, Chao P <chao.p.p...@intel.com> >Subject: Re: [PATCH v3 07/20] intel_iommu: Check for compatibility with >IOMMUFD backed device when x-flts=on > >Hi Zhenzhong, > >On 7/16/25 12:31 PM, Duan, Zhenzhong wrote: >> Hi Eric, >> >>> -----Original Message----- >>> From: Eric Auger <eric.au...@redhat.com> >>> Subject: Re: [PATCH v3 07/20] intel_iommu: Check for compatibility with >>> IOMMUFD backed device when x-flts=on >>> >>> Hi Zhenzhong, >>> >>> On 7/8/25 1:05 PM, Zhenzhong Duan wrote: >>>> When vIOMMU is configured x-flts=on in scalable mode, stage-1 page >table >>>> is passed to host to construct nested page table. We need to check >>>> compatibility of some critical IOMMU capabilities between vIOMMU and >>>> host IOMMU to ensure guest stage-1 page table could be used by host. >>>> >>>> For instance, vIOMMU supports stage-1 1GB huge page mapping, but >host >>>> does not, then this IOMMUFD backed device should fail. >>>> >>>> Even of the checks pass, for now we willingly reject the association >>>> because all the bits are not there yet. >>>> >>>> Signed-off-by: Yi Liu <yi.l....@intel.com> >>>> Signed-off-by: Zhenzhong Duan <zhenzhong.d...@intel.com> >>>> --- >>>> hw/i386/intel_iommu.c | 30 >>> +++++++++++++++++++++++++++++- >>>> hw/i386/intel_iommu_internal.h | 1 + >>>> 2 files changed, 30 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c >>>> index e90fd2f28f..c57ca02cdd 100644 >>>> --- a/hw/i386/intel_iommu.c >>>> +++ b/hw/i386/intel_iommu.c >>>> @@ -40,6 +40,7 @@ >>>> #include "kvm/kvm_i386.h" >>>> #include "migration/vmstate.h" >>>> #include "trace.h" >>>> +#include "system/iommufd.h" >>>> >>>> /* context entry operations */ >>>> #define VTD_CE_GET_RID2PASID(ce) \ >>>> @@ -4355,7 +4356,34 @@ static bool vtd_check_hiod(IntelIOMMUState >*s, >>> HostIOMMUDevice *hiod, >>>> return true; >>>> } >>>> >>>> - error_setg(errp, "host device is uncompatible with stage-1 >>> translation"); >>>> +#ifdef CONFIG_IOMMUFD >>>> + struct HostIOMMUDeviceCaps *caps = &hiod->caps; >>>> + struct iommu_hw_info_vtd *vtd = &caps->vendor_caps.vtd; >>> I am now confused about how this relates to vtd_get_viommu_cap(). >>> PCIIOMMUOps.set_iommu_device = vtd_dev_set_iommu_device calls >>> vtd_check_hiod() >>> viommu might return HW_NESTED_CAP through >>> PCIIOMMUOps.get_viommu_cap >>> without making sure the underlying HW IOMMU does support it. Is that a >>> correct understanding? Maybe we should clarify the calling order between >>> set_iommu_device vs get_viommu_cap? Could we check HW IOMMU >>> prerequisites in vtd_get_viommu_cap() by enforcing this is called after >>> set_iommu_device. I think we should clarify the exact semantic of >>> get_viommu_cap().Thanks Eric >> My understanding get_viommu_cap() returns pure vIOMMU's capabilities >> with no host IOMMU's capabilities involved. >> >> For example, returned HW_NESTED_CAP means this vIOMMU has code >> to support creating nested hwpt and attaching, no matter if host IOMMU >> supports nesting or not. > >Then I think you need to refine the description in 2/20 to make this clear. >stating explicitly get_viommu_cap returns theoretical capabilities which >are independent on the actual host capabilities they may depend on.
Will do. For virtual vtd, we are unable to return capabilities depending on host capacities, Because different host IOMMU may have different capabilities, we want to return a consistent result only depending on user's cmdline config. >> >> The compatibility check between host IOMMU vs vIOMMU is done in >> set_iommu_device(), see vtd_check_hiod(). >> >> It's too late for VFIO to call get_viommu_cap() after set_iommu_device() >> because we need get_viommu_cap() to determine if creating nested parent >> hwpt or not at attaching stage. >isn't it possible to rework the call sequence? I think not. Current sequence: attach_device() get_viommu_cap() create hwpt ... create hiod set_iommu_device(hiod) Hiod realize needs iommufd,devid and hwpt_id which are ready after attach_device(). Thanks Zhenzhong