>-----Original Message----- >From: Cédric Le Goater <c...@redhat.com> >Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do >compatibility check with host IOMMU cap/ecap > >Hello Zhenzhong, > >On 4/18/24 10:42, Duan, Zhenzhong wrote: >> Hi Cédric, >> >>> -----Original Message----- >>> From: Cédric Le Goater <c...@redhat.com> >>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do >>> compatibility check with host IOMMU cap/ecap >>> >>> Hello Zhenzhong >>> >>> On 4/17/24 11:24, Duan, Zhenzhong wrote: >>>> >>>> >>>>> -----Original Message----- >>>>> From: Cédric Le Goater <c...@redhat.com> >>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do >>>>> compatibility check with host IOMMU cap/ecap >>>>> >>>>> On 4/17/24 06:21, Duan, Zhenzhong wrote: >>>>>> >>>>>> >>>>>>> -----Original Message----- >>>>>>> From: Cédric Le Goater <c...@redhat.com> >>>>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do >>>>>>> compatibility check with host IOMMU cap/ecap >>>>>>> >>>>>>> Hello, >>>>>>> >>>>>>> On 4/16/24 09:09, Duan, Zhenzhong wrote: >>>>>>>> Hi Cédric, >>>>>>>> >>>>>>>>> -----Original Message----- >>>>>>>>> From: Cédric Le Goater <c...@redhat.com> >>>>>>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do >>>>>>>>> compatibility check with host IOMMU cap/ecap >>>>>>>>> >>>>>>>>> On 4/8/24 10:44, Zhenzhong Duan wrote: >>>>>>>>>> From: Yi Liu <yi.l....@intel.com> >>>>>>>>>> >>>>>>>>>> If check fails, the host side device(either vfio or vdpa device) >should >>>>> not >>>>>>>>>> be passed to guest. >>>>>>>>>> >>>>>>>>>> Implementation details for different backends will be in >following >>>>>>> patches. >>>>>>>>>> >>>>>>>>>> Signed-off-by: Yi Liu <yi.l....@intel.com> >>>>>>>>>> Signed-off-by: Yi Sun <yi.y....@linux.intel.com> >>>>>>>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.d...@intel.com> >>>>>>>>>> --- >>>>>>>>>> hw/i386/intel_iommu.c | 35 >>>>>>>>> +++++++++++++++++++++++++++++++++++ >>>>>>>>>> 1 file changed, 35 insertions(+) >>>>>>>>>> >>>>>>>>>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c >>>>>>>>>> index 4f84e2e801..a49b587c73 100644 >>>>>>>>>> --- a/hw/i386/intel_iommu.c >>>>>>>>>> +++ b/hw/i386/intel_iommu.c >>>>>>>>>> @@ -35,6 +35,7 @@ >>>>>>>>>> #include "sysemu/kvm.h" >>>>>>>>>> #include "sysemu/dma.h" >>>>>>>>>> #include "sysemu/sysemu.h" >>>>>>>>>> +#include "sysemu/iommufd.h" >>>>>>>>>> #include "hw/i386/apic_internal.h" >>>>>>>>>> #include "kvm/kvm_i386.h" >>>>>>>>>> #include "migration/vmstate.h" >>>>>>>>>> @@ -3819,6 +3820,32 @@ VTDAddressSpace >>>>>>>>> *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, >>>>>>>>>> return vtd_dev_as; >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> +static int vtd_check_legacy_hdev(IntelIOMMUState *s, >>>>>>>>>> + HostIOMMUDevice *hiod, >>>>>>>>>> + Error **errp) >>>>>>>>>> +{ >>>>>>>>>> + return 0; >>>>>>>>>> +} >>>>>>>>>> + >>>>>>>>>> +static int vtd_check_iommufd_hdev(IntelIOMMUState *s, >>>>>>>>>> + HostIOMMUDevice *hiod, >>>>>>>>>> + Error **errp) >>>>>>>>>> +{ >>>>>>>>>> + return 0; >>>>>>>>>> +} >>>>>>>>>> + >>>>>>>>>> +static int vtd_check_hdev(IntelIOMMUState *s, >>>>>>> VTDHostIOMMUDevice >>>>>>>>> *vtd_hdev, >>>>>>>>>> + Error **errp) >>>>>>>>>> +{ >>>>>>>>>> + HostIOMMUDevice *hiod = vtd_hdev->dev; >>>>>>>>>> + >>>>>>>>>> + if (object_dynamic_cast(OBJECT(hiod), >TYPE_HIOD_IOMMUFD)) >>> { >>>>>>>>>> + return vtd_check_iommufd_hdev(s, hiod, errp); >>>>>>>>>> + } >>>>>>>>>> + >>>>>>>>>> + return vtd_check_legacy_hdev(s, hiod, errp); >>>>>>>>>> +} >>>>>>>>> >>>>>>>>> >>>>>>>>> I think we should be using the .get_host_iommu_info() class >handler >>>>>>>>> instead. Can we refactor the code slightly to avoid this check on >>>>>>>>> the type ? >>>>>>>> >>>>>>>> There is some difficulty ini avoiding this check, the behavior of >>>>>>> vtd_check_legacy_hdev >>>>>>>> and vtd_check_iommufd_hdev are different especially after >nesting >>>>>>> support introduced. >>>>>>>> vtd_check_iommufd_hdev() has much wider check over cap/ecap >bits >>>>>>> besides aw_bits. >>>>>>> >>>>>>> I think it is important to fully separate the vIOMMU model from the >>>>>>> host IOMMU backing device. >>> >>> This comment is true for the structures also. >>> >>>>>>> Could we introduce a new HostIOMMUDeviceClass >>>>>>> handler .check_hdev() handler, which would >>> call .get_host_iommu_info() ? >>> >>> This means that HIOD_LEGACY_INFO and HIOD_IOMMUFD_INFO should >be >>> a common structure 'HostIOMMUDeviceInfo' holding all attributes >>> for the different backends. Each .get_host_iommu_info() implementation >>> would translate the specific host iommu device data presentation >>> into the common 'HostIOMMUDeviceInfo', this is true for host_aw_bits. >> >> I see, it's just not easy to define the unified elements in >HostIOMMUDeviceInfo >> so that they maps to bits or fields in host return IOMMU info. > >The proposal is adding a vIOMMU <-> HostIOMMUDevice interface and a >new >API needs to be completely defined for it. The IOMMU backend >implementation >could be anything, legacy, iommufd, iommufd v2, some other framework >and >the vIOMMU shouldn't be aware of its implementation.
Kernel IOMMUFD provide a unified uAPI supporting different platform, arm, x86, etc. Userspace representive IOMMUFD backend(backend/iommufd.c) does the same, I'm treating it as a general library for other modules. That means other modules, i.e., vfio, vdpa and vIOMMUs can call into the functions iommufd library provides. VFIO supports two backends, "VFIO legacy backend"(driver/vfio/container.c) and "VFIO IOMMUFD backend" (driver/vfio/iommufd.c). The 2nd one is a thin layer calling into "general IOMMUFD backend". "VFIO legacy backend" is targeting VFIO usage so its scope is small than "general IOMMUFD backend". I think we don't need to hide the general IOMMUFD backend from vIOMMU, VFIO only need to provide critical data, such as iommufd/devid/etc to vIOMMU, vIOMMU can call "general IOMMUFD backend" itself. It looks unnecessary and inefficient to wrap "general IOMMUFD backend" provided functions attach/detach_hwpt, alloc_ioas, free_id, etc in vIOMMU <-> HostIOMMUDevice interface. On the other hand, legacy will be deprecated by iommufd finally. iommufd v2 should be backward compatible with iommufd v1, v2 should not break existing iommufd related code in vIOMMU. If it does, we treat it same as an other framework. For other framework, if it's only for VFIO usage and support nesting, I remember the oldest design is working that way and replaced by IOMMUFD. If it's also a general framework, I think we don't need to hide it under VFIO, we can extend vIOMMU to call into it too. > >Exposing the kernel structures as done below should be avoided because >they are part of the QEMU <-> kernel IOMMUFD interface. But VFIO is only an agent calling kernel for host IOMMU info and pass it to vIOMMU. VFIO doesn't need to know details of the kernel structures. Is that ok? For legacy VFIO backend, kernel doesn't provide a uAPI to get host IOMMU info. So VFIO constructs a host_iommu_info for vIOMMU with its knowledge. > > >> Different platform returned host IOMMU info is platform specific. >> For vtd and siommu: >> >> struct iommu_hw_info_vtd { >> __u32 flags; >> __u32 __reserved; >> __aligned_u64 cap_reg; >> __aligned_u64 ecap_reg; >> }; >> >> struct iommu_hw_info_arm_smmuv3 { >> __u32 flags; >> __u32 __reserved; >> __u32 idr[6]; >> __u32 iidr; >> __u32 aidr; >> }; >> >> I can think of two kinds of declaration of HostIOMMUDeviceInfo: >> >> struct HostIOMMUDeviceInfo { >> uint8_t aw_bits; >> enum iommu_hw_info_type type; >> union { >> struct iommu_hw_info_vtd vtd; >> struct iommu_hw_info_arm_smmuv3; >> ...... >> } data; >> } >> >> or >> >> struct HostIOMMUDeviceInfo { >> uint8_t aw_bits; >> enum iommu_hw_info_type type; >> __u32 flags; >> __aligned_u64 cap_reg; >> __aligned_u64 ecap_reg; >> __u32 idr[6]; >> __u32 iidr; >> __u32 aidr; >> ...... >> } >> >> Not clear if any is your expected format. >> >>> 'type' could be handled the same way, with a 'HostIOMMUDeviceInfo' >>> type attribute and host iommu device type definitions, or as you >>> suggested with a QOM interface. This is more complex however. In >>> this case, I would suggest to implement a .compatible() handler to >>> compare the host iommu device type with the vIOMMU type. >>> >>> The resulting check_hdev routine would look something like : >>> >>> static int vtd_check_hdev(IntelIOMMUState *s, VTDHostIOMMUDevice >>> *vtd_hdev, >>> Error **errp) >>> { >>> HostIOMMUDevice *hiod = vtd_hdev->dev; >>> HostIOMMUDeviceClass *hiodc = >>> HOST_IOMMU_DEVICE_GET_CLASS(hiod); >>> HostIOMMUDevice info; >>> int host_aw_bits, ret; >>> >>> ret = hiodc->get_host_iommu_info(hiod, &info, sizeof(info), errp); >>> if (ret) { >>> return ret; >>> } >>> >>> ret = hiodc->is_compatible(hiod, VIOMMU_INTERFACE(s)); >>> if (ret) { >>> return ret; >>> } >>> >>> if (s->aw_bits > info.aw_bits) { >>> error_setg(errp, "aw-bits %d > host aw-bits %d", >>> s->aw_bits, info.aw_bits); >>> return -EINVAL; >>> } >>> } >>> >>> and the HostIOMMUDeviceClass::is_compatible() handler would call a >>> vIOMMUInterface::compatible() handler simply returning >>> IOMMU_HW_INFO_TYPE_INTEL_VTD. How does that sound ? >> >> Not quite get what HostIOMMUDeviceClass::is_compatible() does. > >HostIOMMUDeviceClass::is_compatible() calls in the host IOMMU backend >to determine which IOMMU types are exposed by the host, then calls the >vIOMMUInterface::compatible() handler to do the compare. API is to be >defined. > >As a refinement, we could introduce in the vIOMMU <-> HostIOMMUDevice >interface capabilities, or features, to check more precisely the level >of compatibility between the vIOMMU and the host IOMMU device. This is >similar to what is done between QEMU and KVM. > >If you think this is too complex, include type in HostIOMMUDeviceInfo. Yea, feel hard to define a common capabilities or features in HostIOMMUDeviceInfo for different host platform IOMMUs. > >> Currently legacy and IOMMUFD host device has different check logic, how >it can help >> in merging vtd_check_legacy_hdev() and vtd_check_iommufd_hdev() into >a single vtd_check_hdev()? > >IMHO, IOMMU shouldn't be aware of the IOMMU backend implementation, >but >if you think the Intel vIOMMU should access directly the iommufd backend >when available, then we should drop this proposal and revisit the design >to take a different approach. See my explanation above of why iommufd backend isn't separated from vIOMMU. I like your suggestion of QOM which helped in device attach/detach_hwpt interface. We only need to discuss further to get a consensus on host_iommu_info and host iommu device interface with vIOMMU. > >> Below is the two functions after nesting series, for your easy reference: >> >> static int vtd_check_legacy_hdev() >> { >> if (s->scalable_modern) { >> /* Modern vIOMMU and legacy backend */ >> error_setg(errp, "Need IOMMUFD backend in scalable modern >mode"); >> return -EINVAL; >> } > >This part would typically go in the compatible() handler. > >> >> ret = hiodc->get_host_iommu_info(hiod, &info, sizeof(info), errp); >> if (ret) { >> return ret; >> } >> >> if (s->aw_bits > info.aw_bits) { >> error_setg(errp, "aw-bits %d > host aw-bits %d", >> s->aw_bits, info.aw_bits); >> return -EINVAL; >> } >> >> return 0; >> } >> >> static int vtd_check_iommufd_hdev(IntelIOMMUState *s, >> VTDHostIOMMUDevice *vtd_hdev, >> Error **errp) >> { >> ret = hiodc->get_host_iommu_info(hiod, &info, sizeof(info), errp); >> if (ret) { >> return ret; >> } >> >> if (info.type != IOMMU_HW_INFO_TYPE_INTEL_VTD) { >> error_setg(errp, "IOMMU hardware is not compatible"); >> return -EINVAL; >> } >> >> vtd = &info.data.vtd; >> host_aw_bits = VTD_MGAW_FROM_CAP(vtd->cap_reg) + 1; >> if (s->aw_bits > host_aw_bits) { >> error_setg(errp, "aw-bits %d > host aw-bits %d", >> s->aw_bits, host_aw_bits); >> return -EINVAL; >> } >> >> if (!s->scalable_modern) { >> goto done; >> } >> >> if (!(vtd->ecap_reg & VTD_ECAP_NEST)) { >> error_setg(errp, "Host IOMMU doesn't support nested translation"); >> return -EINVAL; >> } >> >> if (s->fl1gp && !(vtd->cap_reg & VTD_CAP_FL1GP)) { >> error_setg(errp, "Stage-1 1GB huge page is unsupported by host >IOMMU"); >> return -EINVAL; >> } > >These checks above would typically go in the compatible() handler also. I'm afraid we still need "if (object_dynamic_cast(OBJECT(hiod), TYPE_HIOD_IOMMUFD)))" check in .compatible() so that we can do different check for legacy and iommufd hdev. > >Now, the question is how useful will that framework be if hotplugging >devices always fail because the vIOMMU and host IOMMU devices have >incompatible settings/capabilities ? Shouldn't we also add properties >at the vIOMMU level ? Yes, we should. We do add a property "x-cap-fl1gp" for s->fl1gp. See https://github.com/yiliu1765/qemu/commit/0716f874527b20e8784ef1afaff66069cc3d7b60 Property "aw_bits" already exist for a long time. The check on VTD_ECAP_NEST failure means host doesn't support nesting, Property "x-scalable-mode=legacy" can help. Thanks Zhenzhong