Hi Eric, >-----Original Message----- >From: Eric Auger <eric.au...@redhat.com> >Subject: Re: [PATCH v3 07/20] intel_iommu: Check for compatibility with >IOMMUFD backed device when x-flts=on > >Hi Zhenzhong, > >On 7/17/25 5:47 AM, Duan, Zhenzhong wrote: >> Hi Eric, >> >>> -----Original Message----- >>> From: Eric Auger <eric.au...@redhat.com> >>> Sent: Wednesday, July 16, 2025 8:09 PM >>> To: Duan, Zhenzhong <zhenzhong.d...@intel.com>; >>> qemu-devel@nongnu.org >>> Cc: alex.william...@redhat.com; c...@redhat.com; m...@redhat.com; >>> jasow...@redhat.com; pet...@redhat.com; ddut...@redhat.com; >>> j...@nvidia.com; nicol...@nvidia.com; >>> shameerali.kolothum.th...@huawei.com; joao.m.mart...@oracle.com; >>> clement.mathieu--d...@eviden.com; Tian, Kevin <kevin.t...@intel.com>; >Liu, >>> Yi L <yi.l....@intel.com>; Peng, Chao P <chao.p.p...@intel.com> >>> Subject: Re: [PATCH v3 07/20] intel_iommu: Check for compatibility with >>> IOMMUFD backed device when x-flts=on >>> >>> Hi Zhenzhong, >>> >>> On 7/16/25 12:31 PM, Duan, Zhenzhong wrote: >>>> Hi Eric, >>>> >>>>> -----Original Message----- >>>>> From: Eric Auger <eric.au...@redhat.com> >>>>> Subject: Re: [PATCH v3 07/20] intel_iommu: Check for compatibility with >>>>> IOMMUFD backed device when x-flts=on >>>>> >>>>> Hi Zhenzhong, >>>>> >>>>> On 7/8/25 1:05 PM, Zhenzhong Duan wrote: >>>>>> When vIOMMU is configured x-flts=on in scalable mode, stage-1 page >>> table >>>>>> is passed to host to construct nested page table. We need to check >>>>>> compatibility of some critical IOMMU capabilities between vIOMMU >and >>>>>> host IOMMU to ensure guest stage-1 page table could be used by host. >>>>>> >>>>>> For instance, vIOMMU supports stage-1 1GB huge page mapping, but >>> host >>>>>> does not, then this IOMMUFD backed device should fail. >>>>>> >>>>>> Even of the checks pass, for now we willingly reject the association >>>>>> because all the bits are not there yet. >>>>>> >>>>>> Signed-off-by: Yi Liu <yi.l....@intel.com> >>>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.d...@intel.com> >>>>>> --- >>>>>> hw/i386/intel_iommu.c | 30 >>>>> +++++++++++++++++++++++++++++- >>>>>> hw/i386/intel_iommu_internal.h | 1 + >>>>>> 2 files changed, 30 insertions(+), 1 deletion(-) >>>>>> >>>>>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c >>>>>> index e90fd2f28f..c57ca02cdd 100644 >>>>>> --- a/hw/i386/intel_iommu.c >>>>>> +++ b/hw/i386/intel_iommu.c >>>>>> @@ -40,6 +40,7 @@ >>>>>> #include "kvm/kvm_i386.h" >>>>>> #include "migration/vmstate.h" >>>>>> #include "trace.h" >>>>>> +#include "system/iommufd.h" >>>>>> >>>>>> /* context entry operations */ >>>>>> #define VTD_CE_GET_RID2PASID(ce) \ >>>>>> @@ -4355,7 +4356,34 @@ static bool >vtd_check_hiod(IntelIOMMUState >>> *s, >>>>> HostIOMMUDevice *hiod, >>>>>> return true; >>>>>> } >>>>>> >>>>>> - error_setg(errp, "host device is uncompatible with stage-1 >>>>> translation"); >>>>>> +#ifdef CONFIG_IOMMUFD >>>>>> + struct HostIOMMUDeviceCaps *caps = &hiod->caps; >>>>>> + struct iommu_hw_info_vtd *vtd = &caps->vendor_caps.vtd; >>>>> I am now confused about how this relates to vtd_get_viommu_cap(). >>>>> PCIIOMMUOps.set_iommu_device = vtd_dev_set_iommu_device calls >>>>> vtd_check_hiod() >>>>> viommu might return HW_NESTED_CAP through >>>>> PCIIOMMUOps.get_viommu_cap >>>>> without making sure the underlying HW IOMMU does support it. Is that >a >>>>> correct understanding? Maybe we should clarify the calling order >between >>>>> set_iommu_device vs get_viommu_cap? Could we check HW IOMMU >>>>> prerequisites in vtd_get_viommu_cap() by enforcing this is called after >>>>> set_iommu_device. I think we should clarify the exact semantic of >>>>> get_viommu_cap().Thanks Eric >>>> My understanding get_viommu_cap() returns pure vIOMMU's capabilities >>>> with no host IOMMU's capabilities involved. >>>> >>>> For example, returned HW_NESTED_CAP means this vIOMMU has code >>>> to support creating nested hwpt and attaching, no matter if host IOMMU >>>> supports nesting or not. >>> Then I think you need to refine the description in 2/20 to make this clear. >>> stating explicitly get_viommu_cap returns theoretical capabilities which >>> are independent on the actual host capabilities they may depend on. >> Will do. >> >> For virtual vtd, we are unable to return capabilities depending on host >capacities, >> Because different host IOMMU may have different capabilities, we want to >return >> a consistent result only depending on user's cmdline config. >ok >> >>>> The compatibility check between host IOMMU vs vIOMMU is done in >>>> set_iommu_device(), see vtd_check_hiod(). >>>> >>>> It's too late for VFIO to call get_viommu_cap() after set_iommu_device() >>>> because we need get_viommu_cap() to determine if creating nested >parent >>>> hwpt or not at attaching stage. >>> isn't it possible to rework the call sequence? >> I think not. Current sequence: >> >> attach_device() >> get_viommu_cap() >> create hwpt >> ... >> create hiod >> set_iommu_device(hiod) >> >> Hiod realize needs iommufd,devid and hwpt_id which are ready after >attach_device(). >OK. I would add this explanation in the commit msg too.
Got it, will do. Thanks Zhenzhong