Hi Eric,

>-----Original Message-----
>From: Eric Auger <eric.au...@redhat.com>
>Subject: Re: [PATCH v3 07/20] intel_iommu: Check for compatibility with
>IOMMUFD backed device when x-flts=on
>
>Hi Zhenzhong,
>
>On 7/17/25 5:47 AM, Duan, Zhenzhong wrote:
>> Hi Eric,
>>
>>> -----Original Message-----
>>> From: Eric Auger <eric.au...@redhat.com>
>>> Sent: Wednesday, July 16, 2025 8:09 PM
>>> To: Duan, Zhenzhong <zhenzhong.d...@intel.com>;
>>> qemu-devel@nongnu.org
>>> Cc: alex.william...@redhat.com; c...@redhat.com; m...@redhat.com;
>>> jasow...@redhat.com; pet...@redhat.com; ddut...@redhat.com;
>>> j...@nvidia.com; nicol...@nvidia.com;
>>> shameerali.kolothum.th...@huawei.com; joao.m.mart...@oracle.com;
>>> clement.mathieu--d...@eviden.com; Tian, Kevin <kevin.t...@intel.com>;
>Liu,
>>> Yi L <yi.l....@intel.com>; Peng, Chao P <chao.p.p...@intel.com>
>>> Subject: Re: [PATCH v3 07/20] intel_iommu: Check for compatibility with
>>> IOMMUFD backed device when x-flts=on
>>>
>>> Hi Zhenzhong,
>>>
>>> On 7/16/25 12:31 PM, Duan, Zhenzhong wrote:
>>>> Hi Eric,
>>>>
>>>>> -----Original Message-----
>>>>> From: Eric Auger <eric.au...@redhat.com>
>>>>> Subject: Re: [PATCH v3 07/20] intel_iommu: Check for compatibility with
>>>>> IOMMUFD backed device when x-flts=on
>>>>>
>>>>> Hi Zhenzhong,
>>>>>
>>>>> On 7/8/25 1:05 PM, Zhenzhong Duan wrote:
>>>>>> When vIOMMU is configured x-flts=on in scalable mode, stage-1 page
>>> table
>>>>>> is passed to host to construct nested page table. We need to check
>>>>>> compatibility of some critical IOMMU capabilities between vIOMMU
>and
>>>>>> host IOMMU to ensure guest stage-1 page table could be used by host.
>>>>>>
>>>>>> For instance, vIOMMU supports stage-1 1GB huge page mapping, but
>>> host
>>>>>> does not, then this IOMMUFD backed device should fail.
>>>>>>
>>>>>> Even of the checks pass, for now we willingly reject the association
>>>>>> because all the bits are not there yet.
>>>>>>
>>>>>> Signed-off-by: Yi Liu <yi.l....@intel.com>
>>>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.d...@intel.com>
>>>>>> ---
>>>>>>  hw/i386/intel_iommu.c          | 30
>>>>> +++++++++++++++++++++++++++++-
>>>>>>  hw/i386/intel_iommu_internal.h |  1 +
>>>>>>  2 files changed, 30 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>>>>>> index e90fd2f28f..c57ca02cdd 100644
>>>>>> --- a/hw/i386/intel_iommu.c
>>>>>> +++ b/hw/i386/intel_iommu.c
>>>>>> @@ -40,6 +40,7 @@
>>>>>>  #include "kvm/kvm_i386.h"
>>>>>>  #include "migration/vmstate.h"
>>>>>>  #include "trace.h"
>>>>>> +#include "system/iommufd.h"
>>>>>>
>>>>>>  /* context entry operations */
>>>>>>  #define VTD_CE_GET_RID2PASID(ce) \
>>>>>> @@ -4355,7 +4356,34 @@ static bool
>vtd_check_hiod(IntelIOMMUState
>>> *s,
>>>>> HostIOMMUDevice *hiod,
>>>>>>          return true;
>>>>>>      }
>>>>>>
>>>>>> -    error_setg(errp, "host device is uncompatible with stage-1
>>>>> translation");
>>>>>> +#ifdef CONFIG_IOMMUFD
>>>>>> +    struct HostIOMMUDeviceCaps *caps = &hiod->caps;
>>>>>> +    struct iommu_hw_info_vtd *vtd = &caps->vendor_caps.vtd;
>>>>> I am now confused about how this relates to vtd_get_viommu_cap().
>>>>> PCIIOMMUOps.set_iommu_device = vtd_dev_set_iommu_device calls
>>>>> vtd_check_hiod()
>>>>> viommu might return HW_NESTED_CAP through
>>>>> PCIIOMMUOps.get_viommu_cap
>>>>> without making sure the underlying HW IOMMU does support it. Is that
>a
>>>>> correct understanding? Maybe we should clarify the calling order
>between
>>>>> set_iommu_device vs get_viommu_cap? Could we check HW IOMMU
>>>>> prerequisites in vtd_get_viommu_cap() by enforcing this is called after
>>>>> set_iommu_device. I think we should clarify the exact semantic of
>>>>> get_viommu_cap().Thanks Eric
>>>> My understanding get_viommu_cap() returns pure vIOMMU's capabilities
>>>> with no host IOMMU's capabilities involved.
>>>>
>>>> For example, returned HW_NESTED_CAP means this vIOMMU has code
>>>> to support creating nested hwpt and attaching, no matter if host IOMMU
>>>> supports nesting or not.
>>> Then I think you need to refine the description in 2/20 to make this clear.
>>> stating explicitly get_viommu_cap returns theoretical capabilities which
>>> are independent on the actual host capabilities they may depend on.
>> Will do.
>>
>> For virtual vtd, we are unable to return capabilities depending on host
>capacities,
>> Because different host IOMMU may have different capabilities, we want to
>return
>> a consistent result only depending on user's cmdline config.
>ok
>>
>>>> The compatibility check between host IOMMU vs vIOMMU is done in
>>>> set_iommu_device(), see vtd_check_hiod().
>>>>
>>>> It's too late for VFIO to call get_viommu_cap() after set_iommu_device()
>>>> because we need get_viommu_cap() to determine if creating nested
>parent
>>>> hwpt or not at attaching stage.
>>> isn't it possible to rework the call sequence?
>> I think not. Current sequence:
>>
>> attach_device()
>>     get_viommu_cap()
>>     create hwpt
>> ...
>> create hiod
>> set_iommu_device(hiod)
>>
>> Hiod realize needs iommufd,devid and hwpt_id which are ready after
>attach_device().
>OK. I would add this explanation in the commit msg too.

Got it, will do.

Thanks
Zhenzhong

Reply via email to