>-----Original Message-----
>From: Cédric Le Goater <c...@redhat.com>
>Sent: Monday, November 13, 2023 7:05 PM
>Subject: Re: [PATCH v5 03/20] vfio/iommufd: Implement the iommufd backend
>
>On 11/10/23 11:18, Duan, Zhenzhong wrote:
>>
>>
>>> -----Original Message-----
>>> From: Cédric Le Goater <c...@redhat.com>
>>> Sent: Friday, November 10, 2023 5:34 PM
>>> Subject: Re: [PATCH v5 03/20] vfio/iommufd: Implement the iommufd
>backend
>>>
>>> On 11/9/23 12:45, Zhenzhong Duan wrote:
>>>> From: Yi Liu <yi.l....@intel.com>
>>>>
>>>> Add the iommufd backend. The IOMMUFD container class is implemented
>>>> based on the new /dev/iommu user API. This backend obviously depends
>>>> on CONFIG_IOMMUFD.
>>>>
>>>> So far, the iommufd backend doesn't support dirty page sync yet due
>>>> to missing support in the host kernel.
>>>>
>>>> Co-authored-by: Eric Auger <eric.au...@redhat.com>
>>>> Signed-off-by: Yi Liu <yi.l....@intel.com>
>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.d...@intel.com>
>>>> ---
>>>> v5: Switch to IOAS attach/detach and hide hwpt
>>>>
>>>>    include/hw/vfio/vfio-common.h |  11 +
>>>>    hw/vfio/common.c              |  20 +-
>>>>    hw/vfio/iommufd.c             | 429 ++++++++++++++++++++++++++++++++++
>>>>    hw/vfio/meson.build           |   3 +
>>>>    hw/vfio/trace-events          |  10 +
>>>>    5 files changed, 469 insertions(+), 4 deletions(-)
>>>>    create mode 100644 hw/vfio/iommufd.c
>>>>
>>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-
>common.h
>>>> index 24ecc0e7ee..3dac5c167e 100644
>>>> --- a/include/hw/vfio/vfio-common.h
>>>> +++ b/include/hw/vfio/vfio-common.h
>>>> @@ -89,6 +89,14 @@ typedef struct VFIOHostDMAWindow {
>>>>        QLIST_ENTRY(VFIOHostDMAWindow) hostwin_next;
>>>>    } VFIOHostDMAWindow;
>>>>
>>>> +typedef struct IOMMUFDBackend IOMMUFDBackend;
>>>> +
>>>> +typedef struct VFIOIOMMUFDContainer {
>>>> +    VFIOContainerBase bcontainer;
>>>> +    IOMMUFDBackend *be;
>>>> +    uint32_t ioas_id;
>>>> +} VFIOIOMMUFDContainer;
>>>> +
>>>>    typedef struct VFIODeviceOps VFIODeviceOps;
>>>>
>>>>    typedef struct VFIODevice {
>>>> @@ -116,6 +124,8 @@ typedef struct VFIODevice {
>>>>        OnOffAuto pre_copy_dirty_page_tracking;
>>>>        bool dirty_pages_supported;
>>>>        bool dirty_tracking;
>>>> +    int devid;
>>>> +    IOMMUFDBackend *iommufd;
>>>>    } VFIODevice;
>>>>
>>>>    struct VFIODeviceOps {
>>>> @@ -201,6 +211,7 @@ typedef QLIST_HEAD(VFIODeviceList, VFIODevice)
>>> VFIODeviceList;
>>>>    extern VFIOGroupList vfio_group_list;
>>>>    extern VFIODeviceList vfio_device_list;
>>>>    extern const VFIOIOMMUOps vfio_legacy_ops;
>>>> +extern const VFIOIOMMUOps vfio_iommufd_ops;
>>>>    extern const MemoryListener vfio_memory_listener;
>>>>    extern int vfio_kvm_device_fd;
>>>>
>>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>>>> index 572ae7c934..3b7e11158f 100644
>>>> --- a/hw/vfio/common.c
>>>> +++ b/hw/vfio/common.c
>>>> @@ -19,6 +19,7 @@
>>>>     */
>>>>
>>>>    #include "qemu/osdep.h"
>>>> +#include CONFIG_DEVICES /* CONFIG_IOMMUFD */
>>>>    #include <sys/ioctl.h>
>>>>    #ifdef CONFIG_KVM
>>>>    #include <linux/kvm.h>
>>>> @@ -1462,10 +1463,13 @@ VFIOAddressSpace
>>> *vfio_get_address_space(AddressSpace *as)
>>>>
>>>>    void vfio_put_address_space(VFIOAddressSpace *space)
>>>>    {
>>>> -    if (QLIST_EMPTY(&space->containers)) {
>>>> -        QLIST_REMOVE(space, list);
>>>> -        g_free(space);
>>>> +    if (!QLIST_EMPTY(&space->containers)) {
>>>> +        return;
>>>
>>> I think this change deserves to be in a separate patch, even if simple.
>>> Is there some relation with iommufd ? This is not clear.
>>
>> OK, will do. It's unrelated to iommufd, just avoid unnecessary check below.
>>
>>>
>>>>        }
>>>> +
>>>> +    QLIST_REMOVE(space, list);
>>>> +    g_free(space);
>>>> +
>>>>        if (QLIST_EMPTY(&vfio_address_spaces)) {
>>>>            qemu_unregister_reset(vfio_reset_handler, NULL);
>>>>        }
>>>> @@ -1498,8 +1502,16 @@ retry:
>>>>    int vfio_attach_device(char *name, VFIODevice *vbasedev,
>>>>                           AddressSpace *as, Error **errp)
>>>>    {
>>>> -    const VFIOIOMMUOps *ops = &vfio_legacy_ops;
>>>> +    const VFIOIOMMUOps *ops;
>>>>
>>>> +#ifdef CONFIG_IOMMUFD
>>>> +    if (vbasedev->iommufd) {
>>>> +        ops = &vfio_iommufd_ops;
>>>> +    } else
>>>> +#endif
>>>> +    {
>>>> +        ops = &vfio_legacy_ops;
>>>> +    }
>>>
>>> Simply adding :
>>>
>>>   +#ifdef CONFIG_IOMMUFD
>>>   +    if (vbasedev->iommufd) {
>>>   +        ops = &vfio_iommufd_ops;
>>>   +    }
>>>   +#endif
>>>
>>> would have the same effect with less change.
>>
>> Indeed, will do.
>>
>>>
>>> That said, it would also be nice to find a way to avoid the use of
>>> CONFIG_IOMMUFD in hw/vfio/common.c. May be with a helper returning
>>> 'const VFIOIOMMUOps *'. This is minor. Still, I find some redundancy
>>> with vfio_container_init() and I don't a good alternative yet :)
>>
>> Sure, will do, guess you mean a helper function in hw/vfio/helpers.c with
>> CONFIG_IOMMUFD check?
>
>Yes. That was the idea. I took a look and the benefits are minimal.
>I am not sure it is worth the effort.

So you are neutral, then I'd like to keep it😊
In fact, we have removed almost all CONFIG_IOMMUFD checks, this is the only one
in common.c, Looks no difference of this check in common.c or helpers.c for me.

>
>I also tried to minize the use of CONFIG_IOMMUFD in our various
>VFIO devices with an intermediate QOM interface. See below.
>
>+
>+#define TYPE_IOMMUFD_INTERFACE "iommufd-interface"
>+#define IOMMUFD_INTERFACE(obj) \
>+     INTERFACE_CHECK(IOMMUFDInterface, (obj), TYPE_IOMMUFD_INTERFACE)
>+typedef struct IOMMUFDInterfaceClass IOMMUFDInterfaceClass;
>+DECLARE_CLASS_CHECKERS(IOMMUFDInterfaceClass, IOMMUFD_INTERFACE,
>+                       TYPE_IOMMUFD_INTERFACE)
>+
>+typedef struct IOMMUFDInterface IOMMUFDInterface;
>+
>+struct IOMMUFDInterfaceClass {
>+    InterfaceClass parent;
>+    void (*set_fd)(Object *obj, const char *str, Error **errp);
>+};
>+
>+#ifdef CONFIG_IOMMUFD
>+static void iommufd_interface_set_fd(Object *obj, const char *str, Error 
>**errp)
>+{
>+    IOMMUFDInterfaceClass *iik = IOMMUFD_INTERFACE_GET_CLASS(obj);
>+
>+    iik->set_fd(obj, str, errp);
>+}
>+#endif
>+
>+static void iommufd_interface_class_init(ObjectClass *klass, void *data)
>+{
>+#ifdef CONFIG_IOMMUFD
>+    object_class_property_add_str(klass, "fd", NULL, 
>iommufd_interface_set_fd);
>+#endif
>+}
>+
>+static void iommufd_interface_add_property(Object *obj, Object **iommufd)
>+{
>+#ifdef CONFIG_IOMMUFD
>+    object_property_add_link(obj, "iommufd", TYPE_IOMMUFD_BACKEND,
>iommufd,
>+                             NULL, OBJ_PROP_LINK_STRONG);
>+#endif
>+}
>+
>+static const TypeInfo iommufd_interface_info = {
>+    .name = TYPE_IOMMUFD_INTERFACE,
>+    .parent = TYPE_INTERFACE,
>+    .class_size = sizeof(IOMMUFDInterfaceClass),
>+    .class_init = iommufd_interface_class_init,
>+};
>
>  /*
>   * Functions used whatever the injection method
>@@ -649,21 +696,18 @@ static Property vfio_platform_dev_proper
>  static void vfio_platform_instance_init(Object *obj)
>  {
>      VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(obj);
>+    Object *iommufd = OBJECT(vdev->vbasedev.iommufd);
>
>      vdev->vbasedev.fd = -1;
>+    iommufd_interface_add_property(obj, &iommufd);
>  }
>
>  static void vfio_platform_class_init(ObjectClass *klass, void *data)
>  {
>      DeviceClass *dc = DEVICE_CLASS(klass);
>      SysBusDeviceClass *sbc = SYS_BUS_DEVICE_CLASS(klass);
>+    IOMMUFDInterfaceClass *iik = IOMMUFD_INTERFACE_CLASS(klass);
>
>      dc->realize = vfio_platform_realize;
>      device_class_set_props(dc, vfio_platform_dev_properties);
>      dc->vmsd = &vfio_platform_vmstate;
>      dc->desc = "VFIO-based platform device assignment";
>      sbc->connect_irq_notifier = vfio_start_irqfd_injection;
>      set_bit(DEVICE_CATEGORY_MISC, dc->categories);
>      /* Supported by TYPE_VIRT_MACHINE */
>      dc->user_creatable = true;
>+    iik->set_fd = vfio_platform_set_fd;
>  }
>
>  static const TypeInfo vfio_platform_dev_info = {
>@@ -703,11 +745,16 @@ static const TypeInfo vfio_platform_dev_
>      .instance_init = vfio_platform_instance_init,
>      .class_init = vfio_platform_class_init,
>      .class_size = sizeof(VFIOPlatformDeviceClass),
>+    .interfaces    = (InterfaceInfo[]) {
>+        { TYPE_IOMMUFD_INTERFACE },
>+        { }
>+    }
>  };
>
>
>Not really satisfying compared to the #ifdef. Let's keep them.

Still nice try out, thanks

>
>
>
>>
>>>
>>>
>>>>        return ops->attach_device(name, vbasedev, as, errp);
>>>>    }
>>>>
>>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>>> new file mode 100644
>>>> index 0000000000..ea4e23f4ec
>>>> --- /dev/null
>>>> +++ b/hw/vfio/iommufd.c
>>>> @@ -0,0 +1,429 @@
>>>> +/*
>>>> + * iommufd container backend
>>>> + *
>>>> + * Copyright (C) 2023 Intel Corporation.
>>>> + * Copyright Red Hat, Inc. 2023
>>>> + *
>>>> + * Authors: Yi Liu <yi.l....@intel.com>
>>>> + *          Eric Auger <eric.au...@redhat.com>
>>>> + *
>>>> + * SPDX-License-Identifier: GPL-2.0-or-later
>>>> + */
>>>> +
>>>> +#include "qemu/osdep.h"
>>>> +#include <sys/ioctl.h>
>>>> +#include <linux/vfio.h>
>>>> +#include <linux/iommufd.h>
>>>> +
>>>> +#include "hw/vfio/vfio-common.h"
>>>> +#include "qemu/error-report.h"
>>>> +#include "trace.h"
>>>> +#include "qapi/error.h"
>>>> +#include "sysemu/iommufd.h"
>>>> +#include "hw/qdev-core.h"
>>>> +#include "sysemu/reset.h"
>>>> +#include "qemu/cutils.h"
>>>> +#include "qemu/chardev_open.h"
>>>> +
>>>> +static int iommufd_map(VFIOContainerBase *bcontainer, hwaddr iova,
>>>> +                       ram_addr_t size, void *vaddr, bool readonly)
>>>> +{
>>>> +    VFIOIOMMUFDContainer *container =
>>>> +        container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
>>>> +
>>>> +    return iommufd_backend_map_dma(container->be,
>>>> +                                   container->ioas_id,
>>>> +                                   iova, size, vaddr, readonly);
>>>> +}
>>>> +
>>>> +static int iommufd_unmap(VFIOContainerBase *bcontainer,
>>>> +                         hwaddr iova, ram_addr_t size,
>>>> +                         IOMMUTLBEntry *iotlb)
>>>> +{
>>>> +    VFIOIOMMUFDContainer *container =
>>>> +        container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
>>>> +
>>>> +    /* TODO: Handle dma_unmap_bitmap with iotlb args (migration) */
>>>> +    return iommufd_backend_unmap_dma(container->be,
>>>> +                                     container->ioas_id, iova, size);
>>>> +}
>>>> +
>>>> +static void iommufd_cdev_kvm_device_add(VFIODevice *vbasedev)
>>>> +{
>>>> +    Error *err = NULL;
>>>> +
>>>> +    if (vfio_kvm_device_add_fd(vbasedev->fd, &err)) {
>>>> +        error_report_err(err);
>>>
>>> We should propagate this error to the callers instead.
>>
>> This is to follow legacy backend where the error doesn't treated as
>> a serious issue.
>>
>>>
>>>> +    }
>>>> +}
>>>> +
>>>> +static void iommufd_cdev_kvm_device_del(VFIODevice *vbasedev)
>>>> +{
>>>> +    Error *err = NULL;
>>>> +
>>>> +    if (vfio_kvm_device_del_fd(vbasedev->fd, &err)) {
>>>> +        error_report_err(err);
>>>
>>> Possibly this one also but It might be more complex. Let's keep it that
>>> way.
>>>
>>>> +    }
>>>> +}
>>>> +
>>>> +static int iommufd_connect_and_bind(VFIODevice *vbasedev, Error **errp)
>>>> +{
>>>> +    IOMMUFDBackend *iommufd = vbasedev->iommufd;
>>>> +    struct vfio_device_bind_iommufd bind = {
>>>> +        .argsz = sizeof(bind),
>>>> +        .flags = 0,
>>>> +    };
>>>> +    int ret;
>>>> +
>>>> +    ret = iommufd_backend_connect(iommufd, errp);
>>>> +    if (ret) {
>>>> +        return ret;
>>>> +    }
>>>> +
>>>> +    /*
>>>> +     * Add device to kvm-vfio to be prepared for the tracking
>>>> +     * in KVM. Especially for some emulated devices, it requires
>>>> +     * to have kvm information in the device open.
>>>> +     */
>>>> +    iommufd_cdev_kvm_device_add(vbasedev);
>>>
>>> We shoud return a possible error.
>>
>> This is to follow legacy backend where this error is reported and ignored.
>> Do we need a difference for iommufd BE?
>
>I don't know ! You tell me :) It is always better to raise the
>exception and let the upper layers decide on the action to take.

Got it, will fix.

>
>>
>>>
>>>> +
>>>> +    /* Bind device to iommufd */
>>>> +    bind.iommufd = iommufd->fd;
>>>> +    ret = ioctl(vbasedev->fd, VFIO_DEVICE_BIND_IOMMUFD, &bind);
>>>> +    if (ret) {
>>>> +        error_setg_errno(errp, errno, "error bind device fd=%d to
>iommufd=%d",
>>>> +                         vbasedev->fd, bind.iommufd);
>>>> +        goto err_bind;
>>>> +    }
>>>> +
>>>> +    vbasedev->devid = bind.out_devid;
>>>> +    trace_iommufd_connect_and_bind(bind.iommufd, vbasedev->name,
>>> vbasedev->fd,
>>>> +                                   vbasedev->devid);
>>>> +    return ret;
>>>> +err_bind:
>>>> +    iommufd_cdev_kvm_device_del(vbasedev);
>>>> +    iommufd_backend_disconnect(iommufd);
>>>
>>> These two calls look like iommufd_unbind_and_disconnect() below.
>>
>> Yes, they are same as iommufd doesn't support explicit device unbind.
>> But it looks strange to call iommufd_unbind_and_disconnect in
>> iommufd_connect_and_bind.
>
>This is just a little redudant. This is minor.

With above fix, there is a new label in between, so not an issue now.

Thanks
Zhenzhong

Reply via email to