>-----Original Message----- >From: Cédric Le Goater <c...@redhat.com> >Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do >compatibility check with host IOMMU cap/ecap > >On 4/25/24 10:46, Duan, Zhenzhong wrote: >> Hi Cédric, >> >>> -----Original Message----- >>> From: Cédric Le Goater <c...@redhat.com> >>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do >>> compatibility check with host IOMMU cap/ecap >>> >>> Hello Zhenzhong, >>> >>> On 4/18/24 10:42, Duan, Zhenzhong wrote: >>>> Hi Cédric, >>>> >>>>> -----Original Message----- >>>>> From: Cédric Le Goater <c...@redhat.com> >>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do >>>>> compatibility check with host IOMMU cap/ecap >>>>> >>>>> Hello Zhenzhong >>>>> >>>>> On 4/17/24 11:24, Duan, Zhenzhong wrote: >>>>>> >>>>>> >>>>>>> -----Original Message----- >>>>>>> From: Cédric Le Goater <c...@redhat.com> >>>>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do >>>>>>> compatibility check with host IOMMU cap/ecap >>>>>>> >>>>>>> On 4/17/24 06:21, Duan, Zhenzhong wrote: >>>>>>>> >>>>>>>> >>>>>>>>> -----Original Message----- >>>>>>>>> From: Cédric Le Goater <c...@redhat.com> >>>>>>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do >>>>>>>>> compatibility check with host IOMMU cap/ecap >>>>>>>>> >>>>>>>>> Hello, >>>>>>>>> >>>>>>>>> On 4/16/24 09:09, Duan, Zhenzhong wrote: >>>>>>>>>> Hi Cédric, >>>>>>>>>> >>>>>>>>>>> -----Original Message----- >>>>>>>>>>> From: Cédric Le Goater <c...@redhat.com> >>>>>>>>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to >do >>>>>>>>>>> compatibility check with host IOMMU cap/ecap >>>>>>>>>>> >>>>>>>>>>> On 4/8/24 10:44, Zhenzhong Duan wrote: >>>>>>>>>>>> From: Yi Liu <yi.l....@intel.com> >>>>>>>>>>>> >>>>>>>>>>>> If check fails, the host side device(either vfio or vdpa device) >>> should >>>>>>> not >>>>>>>>>>>> be passed to guest. >>>>>>>>>>>> >>>>>>>>>>>> Implementation details for different backends will be in >>> following >>>>>>>>> patches. >>>>>>>>>>>> >>>>>>>>>>>> Signed-off-by: Yi Liu <yi.l....@intel.com> >>>>>>>>>>>> Signed-off-by: Yi Sun <yi.y....@linux.intel.com> >>>>>>>>>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.d...@intel.com> >>>>>>>>>>>> --- >>>>>>>>>>>> hw/i386/intel_iommu.c | 35 >>>>>>>>>>> +++++++++++++++++++++++++++++++++++ >>>>>>>>>>>> 1 file changed, 35 insertions(+) >>>>>>>>>>>> >>>>>>>>>>>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c >>>>>>>>>>>> index 4f84e2e801..a49b587c73 100644 >>>>>>>>>>>> --- a/hw/i386/intel_iommu.c >>>>>>>>>>>> +++ b/hw/i386/intel_iommu.c >>>>>>>>>>>> @@ -35,6 +35,7 @@ >>>>>>>>>>>> #include "sysemu/kvm.h" >>>>>>>>>>>> #include "sysemu/dma.h" >>>>>>>>>>>> #include "sysemu/sysemu.h" >>>>>>>>>>>> +#include "sysemu/iommufd.h" >>>>>>>>>>>> #include "hw/i386/apic_internal.h" >>>>>>>>>>>> #include "kvm/kvm_i386.h" >>>>>>>>>>>> #include "migration/vmstate.h" >>>>>>>>>>>> @@ -3819,6 +3820,32 @@ VTDAddressSpace >>>>>>>>>>> *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, >>>>>>>>>>>> return vtd_dev_as; >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> +static int vtd_check_legacy_hdev(IntelIOMMUState *s, >>>>>>>>>>>> + HostIOMMUDevice *hiod, >>>>>>>>>>>> + Error **errp) >>>>>>>>>>>> +{ >>>>>>>>>>>> + return 0; >>>>>>>>>>>> +} >>>>>>>>>>>> + >>>>>>>>>>>> +static int vtd_check_iommufd_hdev(IntelIOMMUState *s, >>>>>>>>>>>> + HostIOMMUDevice *hiod, >>>>>>>>>>>> + Error **errp) >>>>>>>>>>>> +{ >>>>>>>>>>>> + return 0; >>>>>>>>>>>> +} >>>>>>>>>>>> + >>>>>>>>>>>> +static int vtd_check_hdev(IntelIOMMUState *s, >>>>>>>>> VTDHostIOMMUDevice >>>>>>>>>>> *vtd_hdev, >>>>>>>>>>>> + Error **errp) >>>>>>>>>>>> +{ >>>>>>>>>>>> + HostIOMMUDevice *hiod = vtd_hdev->dev; >>>>>>>>>>>> + >>>>>>>>>>>> + if (object_dynamic_cast(OBJECT(hiod), >>> TYPE_HIOD_IOMMUFD)) >>>>> { >>>>>>>>>>>> + return vtd_check_iommufd_hdev(s, hiod, errp); >>>>>>>>>>>> + } >>>>>>>>>>>> + >>>>>>>>>>>> + return vtd_check_legacy_hdev(s, hiod, errp); >>>>>>>>>>>> +} >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I think we should be using the .get_host_iommu_info() class >>> handler >>>>>>>>>>> instead. Can we refactor the code slightly to avoid this check on >>>>>>>>>>> the type ? >>>>>>>>>> >>>>>>>>>> There is some difficulty ini avoiding this check, the behavior of >>>>>>>>> vtd_check_legacy_hdev >>>>>>>>>> and vtd_check_iommufd_hdev are different especially after >>> nesting >>>>>>>>> support introduced. >>>>>>>>>> vtd_check_iommufd_hdev() has much wider check over >cap/ecap >>> bits >>>>>>>>> besides aw_bits. >>>>>>>>> >>>>>>>>> I think it is important to fully separate the vIOMMU model from >the >>>>>>>>> host IOMMU backing device. >>>>> >>>>> This comment is true for the structures also. >>>>> >>>>>>>>> Could we introduce a new HostIOMMUDeviceClass >>>>>>>>> handler .check_hdev() handler, which would >>>>> call .get_host_iommu_info() ? >>>>> >>>>> This means that HIOD_LEGACY_INFO and HIOD_IOMMUFD_INFO >should >>> be >>>>> a common structure 'HostIOMMUDeviceInfo' holding all attributes >>>>> for the different backends. Each .get_host_iommu_info() >implementation >>>>> would translate the specific host iommu device data presentation >>>>> into the common 'HostIOMMUDeviceInfo', this is true for >host_aw_bits. >>>> >>>> I see, it's just not easy to define the unified elements in >>> HostIOMMUDeviceInfo >>>> so that they maps to bits or fields in host return IOMMU info. >>> >>> The proposal is adding a vIOMMU <-> HostIOMMUDevice interface and a >>> new >>> API needs to be completely defined for it. The IOMMU backend >>> implementation >>> could be anything, legacy, iommufd, iommufd v2, some other framework >>> and >>> the vIOMMU shouldn't be aware of its implementation. >>> >>> Exposing the kernel structures as done below should be avoided because >>> they are part of the QEMU <-> kernel IOMMUFD interface. >>> >>> >>>> Different platform returned host IOMMU info is platform specific. >>>> For vtd and siommu: >>>> >>>> struct iommu_hw_info_vtd { >>>> __u32 flags; >>>> __u32 __reserved; >>>> __aligned_u64 cap_reg; >>>> __aligned_u64 ecap_reg; >>>> }; >>>> >>>> struct iommu_hw_info_arm_smmuv3 { >>>> __u32 flags; >>>> __u32 __reserved; >>>> __u32 idr[6]; >>>> __u32 iidr; >>>> __u32 aidr; >>>> }; >>>> >>>> I can think of two kinds of declaration of HostIOMMUDeviceInfo: >>>> >>>> struct HostIOMMUDeviceInfo { >>>> uint8_t aw_bits; >>>> enum iommu_hw_info_type type; >>>> union { >>>> struct iommu_hw_info_vtd vtd; >>>> struct iommu_hw_info_arm_smmuv3; >>>> ...... >>>> } data; >>>> } >>>> >>>> or >>>> >>>> struct HostIOMMUDeviceInfo { >>>> uint8_t aw_bits; >>>> enum iommu_hw_info_type type; >>>> __u32 flags; >>>> __aligned_u64 cap_reg; >>>> __aligned_u64 ecap_reg; >>>> __u32 idr[6]; >>>> __u32 iidr; >>>> __u32 aidr; >>>> ...... >>>> } >>>> >>>> Not clear if any is your expected format. >>>> >>>>> 'type' could be handled the same way, with a 'HostIOMMUDeviceInfo' >>>>> type attribute and host iommu device type definitions, or as you >>>>> suggested with a QOM interface. This is more complex however. In >>>>> this case, I would suggest to implement a .compatible() handler to >>>>> compare the host iommu device type with the vIOMMU type. >>>>> >>>>> The resulting check_hdev routine would look something like : >>>>> >>>>> static int vtd_check_hdev(IntelIOMMUState *s, >VTDHostIOMMUDevice >>>>> *vtd_hdev, >>>>> Error **errp) >>>>> { >>>>> HostIOMMUDevice *hiod = vtd_hdev->dev; >>>>> HostIOMMUDeviceClass *hiodc = >>>>> HOST_IOMMU_DEVICE_GET_CLASS(hiod); >>>>> HostIOMMUDevice info; >>>>> int host_aw_bits, ret; >>>>> >>>>> ret = hiodc->get_host_iommu_info(hiod, &info, sizeof(info), errp); >>>>> if (ret) { >>>>> return ret; >>>>> } >>>>> >>>>> ret = hiodc->is_compatible(hiod, VIOMMU_INTERFACE(s)); >>>>> if (ret) { >>>>> return ret; >>>>> } >>>>> >>>>> if (s->aw_bits > info.aw_bits) { >>>>> error_setg(errp, "aw-bits %d > host aw-bits %d", >>>>> s->aw_bits, info.aw_bits); >>>>> return -EINVAL; >>>>> } >>>>> } >>>>> >>>>> and the HostIOMMUDeviceClass::is_compatible() handler would call a >>>>> vIOMMUInterface::compatible() handler simply returning >>>>> IOMMU_HW_INFO_TYPE_INTEL_VTD. How does that sound ? >>>> >>>> Not quite get what HostIOMMUDeviceClass::is_compatible() does. >>> >>> HostIOMMUDeviceClass::is_compatible() calls in the host IOMMU >backend >>> to determine which IOMMU types are exposed by the host, then calls the >>> vIOMMUInterface::compatible() handler to do the compare. API is to be >>> defined. >>> >>> As a refinement, we could introduce in the vIOMMU <-> >HostIOMMUDevice >>> interface capabilities, or features, to check more precisely the level >>> of compatibility between the vIOMMU and the host IOMMU device. This >is >>> similar to what is done between QEMU and KVM. >>> >>> If you think this is too complex, include type in HostIOMMUDeviceInfo. >>> >>>> Currently legacy and IOMMUFD host device has different check logic, >how >>> it can help >>>> in merging vtd_check_legacy_hdev() and vtd_check_iommufd_hdev() >into >>> a single vtd_check_hdev()? >>> >>> IMHO, IOMMU shouldn't be aware of the IOMMU backend >implementation, >>> but >>> if you think the Intel vIOMMU should access directly the iommufd >backend >>> when available, then we should drop this proposal and revisit the design >>> to take a different approach. >> >> I implemented a draft following your suggestions so we could explore >further. >> See >https://github.com/yiliu1765/qemu/tree/zhenzhong/iommufd_nesting_pre >q_v3_tmp >> >> In this draft, it uses .check_cap() to query HOST_IOMMU_DEVICE_CAP_xxx >> just like KVM CAPs. >> A common HostIOMMUDeviceCaps structure is introduced to be used by >> both legacy and iommufd backend. >> >> It indeed is cleaner. Only problem is I failed to implement .compatible() >> as all the check could go ahead by just calling check_cap(). >> Could you help a quick check to see if I misunderstood any of your >suggestion? > >Thanks for the changes. It looks cleaner and simpler ! Some comments, > >* HostIOMMUDeviceIOMMUFDClass seems useless as it is empty. I don't > remember if you told me already you had plans for future changes. > Sorry about that if this is the case. I forgot.
Never mind😊, reason is: In nesting series https://github.com/yiliu1765/qemu/commits/zhenzhong/iommufd_nesting_rfcv2/ This commit https://github.com/yiliu1765/qemu/commit/581fc900aa296988eaa48abee6d68d3670faf8c9 implement [at|de]tach_hwpt handlers. So I add an extra layer of abstract HostIOMMUDeviceIOMMUFDClass to define [at|de]tach_hwpt handlers. > >* I would use the 'host_iommu_device_' prefix for external routines > which are part of the HostIOMMUDevice API and use 'hiod_' for > internal routines where it makes sense, to limit the name length for > instance. Good idea, will do. > >* I would rename HOST_IOMMU_DEVICE_CAP_IOMMUFD_V1 to > HOST_IOMMU_DEVICE_CAP_IOMMUFD. I mentioned IOMMUFD v2 as a > theoretical example of a different IOMMU interface. I don't think we > need to anticipate yet :) Will do. > >* HostIOMMUDeviceCaps is using 'enum iommu_hw_info_type' from > 'linux/iommufd.h', that's not my preferred choice but I won't > object. The result looks good. Ok, will keep it for now. We can change when you want in future. > >* HostIOMMUDevice now has a realize() routine to query the host IOMMU > capability for later usage. This is a good idea. However, you could > change the return value to bool and avoid the ERRP_GUARD() prologue. Will do. > >* Beware of : > > struct Range { > /* > * Do not access members directly, use the functions! > * A non-empty range has @lob <= @upb. > * An empty range has @lob == @upb + 1. > */ > uint64_t lob; /* inclusive lower bound */ > uint64_t upb; /* inclusive upper bound */ > }; I remember😊, will add the change in formal version. > > >* I think you could introduce a new VFIOIOMMUClass attribute. Let's > call it VFIOIOMMUClass::hiod_typename. The creation of >HostIOMMUDevice > would become generic and you could move : > > hiod= >HOST_IOMMU_DEVICE(object_new(TYPE_HOST_IOMMU_DEVICE_LEGACY_V >FIO)); > HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev, errp); > if (*errp) { > object_unref(hiod); > return -EINVAL; > } > vbasedev->hiod = hiod; > > at the end of vfio_attach_device(). Good suggestion! Will do. Thanks Zhenzhong