>-----Original Message-----
>From: Cédric Le Goater <c...@redhat.com>
>Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>compatibility check with host IOMMU cap/ecap
>
>On 4/25/24 10:46, Duan, Zhenzhong wrote:
>> Hi Cédric,
>>
>>> -----Original Message-----
>>> From: Cédric Le Goater <c...@redhat.com>
>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>>> compatibility check with host IOMMU cap/ecap
>>>
>>> Hello Zhenzhong,
>>>
>>> On 4/18/24 10:42, Duan, Zhenzhong wrote:
>>>> Hi Cédric,
>>>>
>>>>> -----Original Message-----
>>>>> From: Cédric Le Goater <c...@redhat.com>
>>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>>>>> compatibility check with host IOMMU cap/ecap
>>>>>
>>>>> Hello Zhenzhong
>>>>>
>>>>> On 4/17/24 11:24, Duan, Zhenzhong wrote:
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Cédric Le Goater <c...@redhat.com>
>>>>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>>>>>>> compatibility check with host IOMMU cap/ecap
>>>>>>>
>>>>>>> On 4/17/24 06:21, Duan, Zhenzhong wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Cédric Le Goater <c...@redhat.com>
>>>>>>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>>>>>>>>> compatibility check with host IOMMU cap/ecap
>>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> On 4/16/24 09:09, Duan, Zhenzhong wrote:
>>>>>>>>>> Hi Cédric,
>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: Cédric Le Goater <c...@redhat.com>
>>>>>>>>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to
>do
>>>>>>>>>>> compatibility check with host IOMMU cap/ecap
>>>>>>>>>>>
>>>>>>>>>>> On 4/8/24 10:44, Zhenzhong Duan wrote:
>>>>>>>>>>>> From: Yi Liu <yi.l....@intel.com>
>>>>>>>>>>>>
>>>>>>>>>>>> If check fails, the host side device(either vfio or vdpa device)
>>> should
>>>>>>> not
>>>>>>>>>>>> be passed to guest.
>>>>>>>>>>>>
>>>>>>>>>>>> Implementation details for different backends will be in
>>> following
>>>>>>>>> patches.
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Yi Liu <yi.l....@intel.com>
>>>>>>>>>>>> Signed-off-by: Yi Sun <yi.y....@linux.intel.com>
>>>>>>>>>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.d...@intel.com>
>>>>>>>>>>>> ---
>>>>>>>>>>>>        hw/i386/intel_iommu.c | 35
>>>>>>>>>>> +++++++++++++++++++++++++++++++++++
>>>>>>>>>>>>        1 file changed, 35 insertions(+)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>>>>>>>>>>>> index 4f84e2e801..a49b587c73 100644
>>>>>>>>>>>> --- a/hw/i386/intel_iommu.c
>>>>>>>>>>>> +++ b/hw/i386/intel_iommu.c
>>>>>>>>>>>> @@ -35,6 +35,7 @@
>>>>>>>>>>>>        #include "sysemu/kvm.h"
>>>>>>>>>>>>        #include "sysemu/dma.h"
>>>>>>>>>>>>        #include "sysemu/sysemu.h"
>>>>>>>>>>>> +#include "sysemu/iommufd.h"
>>>>>>>>>>>>        #include "hw/i386/apic_internal.h"
>>>>>>>>>>>>        #include "kvm/kvm_i386.h"
>>>>>>>>>>>>        #include "migration/vmstate.h"
>>>>>>>>>>>> @@ -3819,6 +3820,32 @@ VTDAddressSpace
>>>>>>>>>>> *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus,
>>>>>>>>>>>>            return vtd_dev_as;
>>>>>>>>>>>>        }
>>>>>>>>>>>>
>>>>>>>>>>>> +static int vtd_check_legacy_hdev(IntelIOMMUState *s,
>>>>>>>>>>>> +                                 HostIOMMUDevice *hiod,
>>>>>>>>>>>> +                                 Error **errp)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +    return 0;
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>> +static int vtd_check_iommufd_hdev(IntelIOMMUState *s,
>>>>>>>>>>>> +                                  HostIOMMUDevice *hiod,
>>>>>>>>>>>> +                                  Error **errp)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +    return 0;
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>> +static int vtd_check_hdev(IntelIOMMUState *s,
>>>>>>>>> VTDHostIOMMUDevice
>>>>>>>>>>> *vtd_hdev,
>>>>>>>>>>>> +                          Error **errp)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +    HostIOMMUDevice *hiod = vtd_hdev->dev;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    if (object_dynamic_cast(OBJECT(hiod),
>>> TYPE_HIOD_IOMMUFD))
>>>>> {
>>>>>>>>>>>> +        return vtd_check_iommufd_hdev(s, hiod, errp);
>>>>>>>>>>>> +    }
>>>>>>>>>>>> +
>>>>>>>>>>>> +    return vtd_check_legacy_hdev(s, hiod, errp);
>>>>>>>>>>>> +}
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I think we should be using the .get_host_iommu_info() class
>>> handler
>>>>>>>>>>> instead. Can we refactor the code slightly to avoid this check on
>>>>>>>>>>> the type ?
>>>>>>>>>>
>>>>>>>>>> There is some difficulty ini avoiding this check, the behavior of
>>>>>>>>> vtd_check_legacy_hdev
>>>>>>>>>> and vtd_check_iommufd_hdev are different especially after
>>> nesting
>>>>>>>>> support introduced.
>>>>>>>>>> vtd_check_iommufd_hdev() has much wider check over
>cap/ecap
>>> bits
>>>>>>>>> besides aw_bits.
>>>>>>>>>
>>>>>>>>> I think it is important to fully separate the vIOMMU model from
>the
>>>>>>>>> host IOMMU backing device.
>>>>>
>>>>> This comment is true for the structures also.
>>>>>
>>>>>>>>> Could we introduce a new HostIOMMUDeviceClass
>>>>>>>>> handler .check_hdev() handler, which would
>>>>> call .get_host_iommu_info() ?
>>>>>
>>>>> This means that HIOD_LEGACY_INFO and HIOD_IOMMUFD_INFO
>should
>>> be
>>>>> a common structure 'HostIOMMUDeviceInfo' holding all attributes
>>>>> for the different backends. Each .get_host_iommu_info()
>implementation
>>>>> would translate the specific host iommu device data presentation
>>>>> into the common 'HostIOMMUDeviceInfo', this is true for
>host_aw_bits.
>>>>
>>>> I see, it's just not easy to define the unified elements in
>>> HostIOMMUDeviceInfo
>>>> so that they maps to bits or fields in host return IOMMU info.
>>>
>>> The proposal is adding a vIOMMU <-> HostIOMMUDevice interface and a
>>> new
>>> API needs to be completely defined for it. The IOMMU backend
>>> implementation
>>> could be anything, legacy, iommufd, iommufd v2, some other framework
>>> and
>>> the vIOMMU shouldn't be aware of its implementation.
>>>
>>> Exposing the kernel structures as done below should be avoided because
>>> they are part of the QEMU <-> kernel IOMMUFD interface.
>>>
>>>
>>>> Different platform returned host IOMMU info is platform specific.
>>>> For vtd and siommu:
>>>>
>>>> struct iommu_hw_info_vtd {
>>>>           __u32 flags;
>>>>           __u32 __reserved;
>>>>           __aligned_u64 cap_reg;
>>>>           __aligned_u64 ecap_reg;
>>>> };
>>>>
>>>> struct iommu_hw_info_arm_smmuv3 {
>>>>          __u32 flags;
>>>>          __u32 __reserved;
>>>>          __u32 idr[6];
>>>>          __u32 iidr;
>>>>          __u32 aidr;
>>>> };
>>>>
>>>> I can think of two kinds of declaration of HostIOMMUDeviceInfo:
>>>>
>>>> struct HostIOMMUDeviceInfo {
>>>>       uint8_t aw_bits;
>>>>       enum iommu_hw_info_type type;
>>>>       union {
>>>>           struct iommu_hw_info_vtd vtd;
>>>>           struct iommu_hw_info_arm_smmuv3;
>>>>           ......
>>>>       } data;
>>>> }
>>>>
>>>> or
>>>>
>>>> struct HostIOMMUDeviceInfo {
>>>>       uint8_t aw_bits;
>>>>       enum iommu_hw_info_type type;
>>>>       __u32 flags;
>>>>       __aligned_u64 cap_reg;
>>>>       __aligned_u64 ecap_reg;
>>>>       __u32 idr[6];
>>>>       __u32 iidr;
>>>>       __u32 aidr;
>>>>      ......
>>>> }
>>>>
>>>> Not clear if any is your expected format.
>>>>
>>>>> 'type' could be handled the same way, with a 'HostIOMMUDeviceInfo'
>>>>> type attribute and host iommu device type definitions, or as you
>>>>> suggested with a QOM interface. This is more complex however. In
>>>>> this case, I would suggest to implement a .compatible() handler to
>>>>> compare the host iommu device type with the vIOMMU type.
>>>>>
>>>>> The resulting check_hdev routine would look something like :
>>>>>
>>>>> static int vtd_check_hdev(IntelIOMMUState *s,
>VTDHostIOMMUDevice
>>>>> *vtd_hdev,
>>>>>                             Error **errp)
>>>>> {
>>>>>       HostIOMMUDevice *hiod = vtd_hdev->dev;
>>>>>       HostIOMMUDeviceClass *hiodc =
>>>>> HOST_IOMMU_DEVICE_GET_CLASS(hiod);
>>>>>       HostIOMMUDevice info;
>>>>>       int host_aw_bits, ret;
>>>>>
>>>>>       ret = hiodc->get_host_iommu_info(hiod, &info, sizeof(info), errp);
>>>>>       if (ret) {
>>>>>           return ret;
>>>>>       }
>>>>>
>>>>>       ret = hiodc->is_compatible(hiod, VIOMMU_INTERFACE(s));
>>>>>       if (ret) {
>>>>>           return ret;
>>>>>       }
>>>>>
>>>>>       if (s->aw_bits > info.aw_bits) {
>>>>>           error_setg(errp, "aw-bits %d > host aw-bits %d",
>>>>>                      s->aw_bits, info.aw_bits);
>>>>>           return -EINVAL;
>>>>>       }
>>>>> }
>>>>>
>>>>> and the HostIOMMUDeviceClass::is_compatible() handler would call a
>>>>> vIOMMUInterface::compatible() handler simply returning
>>>>> IOMMU_HW_INFO_TYPE_INTEL_VTD. How does that sound ?
>>>>
>>>> Not quite get what HostIOMMUDeviceClass::is_compatible() does.
>>>
>>> HostIOMMUDeviceClass::is_compatible() calls in the host IOMMU
>backend
>>> to determine which IOMMU types are exposed by the host, then calls the
>>> vIOMMUInterface::compatible() handler to do the compare. API is to be
>>> defined.
>>>
>>> As a refinement, we could introduce in the vIOMMU <->
>HostIOMMUDevice
>>> interface capabilities, or features, to check more precisely the level
>>> of compatibility between the vIOMMU and the host IOMMU device. This
>is
>>> similar to what is done between QEMU and KVM.
>>>
>>> If you think this is too complex, include type in HostIOMMUDeviceInfo.
>>>
>>>> Currently legacy and IOMMUFD host device has different check logic,
>how
>>> it can help
>>>> in merging vtd_check_legacy_hdev() and vtd_check_iommufd_hdev()
>into
>>> a single vtd_check_hdev()?
>>>
>>> IMHO, IOMMU shouldn't be aware of the IOMMU backend
>implementation,
>>> but
>>> if you think the Intel vIOMMU should access directly the iommufd
>backend
>>> when available, then we should drop this proposal and revisit the design
>>> to take a different approach.
>>
>> I implemented a draft following your suggestions so we could explore
>further.
>> See
>https://github.com/yiliu1765/qemu/tree/zhenzhong/iommufd_nesting_pre
>q_v3_tmp
>>
>> In this draft, it uses .check_cap() to query HOST_IOMMU_DEVICE_CAP_xxx
>> just like KVM CAPs.
>> A common HostIOMMUDeviceCaps structure is introduced to be used by
>> both legacy and iommufd backend.
>>
>> It indeed is cleaner. Only problem is I failed to implement .compatible()
>> as all the check could go ahead by just calling check_cap().
>> Could you help a quick check to see if I misunderstood any of your
>suggestion?
>
>Thanks for the changes. It looks cleaner and simpler ! Some comments,
>
>* HostIOMMUDeviceIOMMUFDClass seems useless as it is empty. I don't
>   remember if you told me already you had plans for future changes.
>   Sorry about that if this is the case. I forgot.

Never mind😊, reason is:

In nesting series 
https://github.com/yiliu1765/qemu/commits/zhenzhong/iommufd_nesting_rfcv2/
This commit 
https://github.com/yiliu1765/qemu/commit/581fc900aa296988eaa48abee6d68d3670faf8c9
implement [at|de]tach_hwpt handlers.

So I add an extra layer of abstract HostIOMMUDeviceIOMMUFDClass to define
[at|de]tach_hwpt handlers.

>
>* I would use the 'host_iommu_device_' prefix for external routines
>   which are part of the HostIOMMUDevice API and use 'hiod_' for
>   internal routines where it makes sense, to limit the name length for
>   instance.

Good idea, will do.

>
>* I would rename HOST_IOMMU_DEVICE_CAP_IOMMUFD_V1 to
>   HOST_IOMMU_DEVICE_CAP_IOMMUFD. I mentioned IOMMUFD v2 as a
>   theoretical example of a different IOMMU interface. I don't think we
>   need to anticipate yet :)

Will do.

>
>* HostIOMMUDeviceCaps is using 'enum iommu_hw_info_type' from
>   'linux/iommufd.h', that's not my preferred choice but I won't
>   object. The result looks good.

Ok, will keep it for now. We can change when you want in future.

>
>* HostIOMMUDevice now has a realize() routine to query the host IOMMU
>   capability for later usage. This is a good idea. However, you could
>   change the return value to bool and avoid the ERRP_GUARD() prologue.

Will do.

>
>* Beware of :
>
>     struct Range {
>         /*
>          * Do not access members directly, use the functions!
>          * A non-empty range has @lob <= @upb.
>          * An empty range has @lob == @upb + 1.
>          */
>         uint64_t lob;        /* inclusive lower bound */
>         uint64_t upb;        /* inclusive upper bound */
>     };

I remember😊, will add the change in formal version.

>
>
>* I think you could introduce a new VFIOIOMMUClass attribute. Let's
>   call it VFIOIOMMUClass::hiod_typename. The creation of
>HostIOMMUDevice
>   would become generic and you could move :
>
>     hiod=
>HOST_IOMMU_DEVICE(object_new(TYPE_HOST_IOMMU_DEVICE_LEGACY_V
>FIO));
>     HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev, errp);
>     if (*errp) {
>         object_unref(hiod);
>         return -EINVAL;
>     }
>     vbasedev->hiod = hiod;
>
>   at the end of vfio_attach_device().

Good suggestion! Will do.

Thanks
Zhenzhong

Reply via email to