>-----Original Message-----
>From: Joao Martins <joao.m.mart...@oracle.com>
>Subject: Re: [PATCH v3 00/10] hw/vfio: IOMMUFD Dirty Tracking
>
>On 11/07/2024 08:41, Cédric Le Goater wrote:
>> Hello Joao,
>>
>> On 7/8/24 4:34 PM, Joao Martins wrote:
>>> This small series adds support for IOMMU dirty tracking via the
>>> IOMMUFD backend. The hardware capability is available on most recent
>>> x86 hardware. The series is organized as follows:
>>>
>>> * Patch 1: Fixes a regression in mdev support with IOMMUFD. This
>>>             one is independent of the series but happened to cross it
>>>             while testing mdev with this series
>>>
>>> * Patch 2: Adds support to iommufd_get_device_info() for capabilities
>>>
>>> * Patches 3 - 7: IOMMUFD backend support for dirty tracking;
>>>
>>> Introduce auto domains -- Patch 3 goes into more detail, but the gist
>>> is that we will find and attach a device to a compatible IOMMU domain,
>>> or allocate a new hardware pagetable, *or* rely on kernel IOAS attach
>>> (for mdevs). Afterwards the workflow is relatively simple:
>>>
>>> 1) Probe device and allow dirty tracking in the HWPT
>>> 2) Toggling dirty tracking on/off
>>> 3) Read-and-clear of Dirty IOVAs
>>>
>>> The heuristics selected for (1) were to always request the HWPT with
>>> dirty tracking if supported, or otherwise rely on device dirty page
>>> tracking. This is a little simplistic, and we aren't necessarily
>>> utilizing IOMMU dirty tracking even if we ask for it during hwpt
>>> allocation.
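>>>
>>> As a rough sketch against the iommufd uAPI (locals made up, error
>>> handling and exact struct layouts elided, so treat it as illustrative
>>> only), (1)-(3) boil down to:
>>>
>>>     /* (1) allocate the HWPT with dirty tracking allowed, after having
>>>      * seen IOMMU_HW_CAP_DIRTY_TRACKING in IOMMU_GET_HW_INFO */
>>>     struct iommu_hwpt_alloc alloc = {
>>>         .size = sizeof(alloc),
>>>         .flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING,
>>>         .dev_id = dev_id,
>>>         .pt_id = ioas_id,
>>>     };
>>>     ioctl(iommufd, IOMMU_HWPT_ALLOC, &alloc);
>>>
>>>     /* (2) toggle dirty tracking on the HWPT */
>>>     struct iommu_hwpt_set_dirty_tracking set = {
>>>         .size = sizeof(set),
>>>         .flags = IOMMU_HWPT_DIRTY_TRACKING_ENABLE,
>>>         .hwpt_id = alloc.out_hwpt_id,
>>>     };
>>>     ioctl(iommufd, IOMMU_HWPT_SET_DIRTY_TRACKING, &set);
>>>
>>>     /* (3) read-and-clear dirty IOVAs into a user bitmap, one bit
>>>      * per page_size unit of the requested range */
>>>     struct iommu_hwpt_get_dirty_bitmap get = {
>>>         .size = sizeof(get),
>>>         .hwpt_id = alloc.out_hwpt_id,
>>>         .iova = iova,
>>>         .length = size,
>>>         .page_size = 4096,
>>>         .data = (uintptr_t)bitmap,
>>>     };
>>>     ioctl(iommufd, IOMMU_HWPT_GET_DIRTY_BITMAP, &get);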
>>>
>>> The unmap case is deferred until further vIOMMU support with migration
>>> is added[3], which will then introduce the usage of
>>> IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR in the GET_DIRTY_BITMAP ioctl
>>> in the dma unmap bitmap flow.
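>>>
>>> (That is the same ioctl as (3) above, just with the dirty bits left
>>> intact -- sketch:)
>>>
>>>     get.flags = IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR;
>>>     /* read dirty bits without resetting them; the IOVAs are about
>>>      * to be unmapped anyway */
>>>     ioctl(iommufd, IOMMU_HWPT_GET_DIRTY_BITMAP, &get);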
>>>
>>> * Patches 8-10: Don't block live migration when there's no VF dirty
>>> tracker, considering that we have IOMMU dirty tracking.
>>>
>>> Comments and feedback appreciated.
>>>
>>> Cheers,
>>>      Joao
>>>
>>> P.S. Suggest linux-next (or future v6.11) as hypervisor kernel as there
>>> are some bugs fixed there with regards to IOMMU hugepage dirty tracking.
>>>
>>> Changes since RFCv2[4]:
>>> * Always allocate hwpt with IOMMU_HWPT_ALLOC_DIRTY_TRACKING even if
>>> we end up not actually toggling dirty tracking. (Avihai)
>>> * Fix error handling widely in auto domains logic and all patches (Avihai)
>>> * Reuse iommufd_backend_get_device_info() for capabilities (Zhenzhong)
>>> * New patches 1 and 2 taking into consideration previous comments.
>>> * Store hwpt::flags to know if we have dirty tracking (Avihai)
>>> * New patch 8, which allows querying dirty tracking support after
>>> provisioning. This is a cleaner way to check IOMMU dirty tracking support
>>> when vfio::migration is initialized, as opposed to the RFCv2 way via
>>> device caps. The device caps way is still used because at vfio attach
>>> time we don't yet have a fully initialized migration state.
>>> * Adopt error propagation in query/set dirty tracking
>>> * Misc improvements overall, broadly following Avihai's comments
>>> * Drop hugepages as it's a bit unrelated; I can pursue that patch
>>> separately. The main motivation is to provide a way to test without
>>> hugepages, similar to what vfio_type1_iommu.disable_hugepages=1 does.
>>>
>>> Changes since RFCv1[2]:
>>> * Remove intel/amd dirty tracking emulation enabling
>>> * Remove the dirtyrate improvement for VF/IOMMU dirty tracking
>>> [Will pursue these two in separate series]
>>> * Introduce auto domains support
>>> * Enforce dirty tracking following the IOMMUFD UAPI for this
>>> * Add support for toggling hugepages in IOMMUFD
>>> * Auto-enable IOMMU dirty tracking when the VF supports migration
>>> but doesn't have VF dirty tracking
>>> * Add a parameter to toggle VF dirty tracking
>>>
>>> [0] https://lore.kernel.org/qemu-devel/20240201072818.327930-1-zhenzhong.d...@intel.com/
>>> [1] https://lore.kernel.org/qemu-devel/20240201072818.327930-10-zhenzhong.d...@intel.com/
>>> [2] https://lore.kernel.org/qemu-devel/20220428211351.3897-1-joao.m.mart...@oracle.com/
>>> [3] https://lore.kernel.org/qemu-devel/20230622214845.3980-1-joao.m.mart...@oracle.com/
>>> [4] https://lore.kernel.org/qemu-devel/20240212135643.5858-1-joao.m.mart...@oracle.com/
>>>
>>> Joao Martins (10):
>>>    vfio/iommufd: don't fail to realize on IOMMU_GET_HW_INFO failure
>>>    backends/iommufd: Extend iommufd_backend_get_device_info() to fetch
>>>      HW capabilities
>>>    vfio/iommufd: Return errno in iommufd_cdev_attach_ioas_hwpt()
>>>    vfio/iommufd: Introduce auto domain creation
>>>    vfio/iommufd: Probe and request hwpt dirty tracking capability
>>>    vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support
>>>    vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support
>>>    vfio/iommufd: Parse hw_caps and store dirty tracking support
>>>    vfio/migration: Don't block migration device dirty tracking is
>>>      unsupported
>>>    vfio/common: Allow disabling device dirty page tracking
>>>
>>>   include/hw/vfio/vfio-common.h      |  11 ++
>>>   include/sysemu/host_iommu_device.h |   2 +
>>>   include/sysemu/iommufd.h           |  12 +-
>>>   backends/iommufd.c                 |  81 ++++++++++-
>>>   hw/vfio/common.c                   |   3 +
>>>   hw/vfio/iommufd.c                  | 217 +++++++++++++++++++++++++++--
>>>   hw/vfio/migration.c                |   7 +-
>>>   hw/vfio/pci.c                      |   3 +
>>>   backends/trace-events              |   3 +
>>>   9 files changed, 325 insertions(+), 14 deletions(-)
>>
>>
>> I am a bit confused by all the inline proposals. Would you mind
>> resending a v4, please?
>>
>
>Yeap, I'll send it out today, or worst case tomorrow morning.
>
>> Regarding my comments on error handling,
>>
>> The error should be set in case of failure, which means a routine
>> can not return 'false' or '-errno' without also setting the
>> 'Error **' parameter.
>>
>> If the returned value needs to be interpreted in some way, for a
>> retry or any reason, then it makes sense to use an int, else please
>> use a bool. This is to avoid random negative values being interpreted
>> as an errno when they are not.
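>>
>> For instance (made-up routines, just to illustrate the convention):
>>
>>     /* Caller only needs success/failure: return a bool */
>>     static bool foo_start(Foo *f, Error **errp)
>>     {
>>         if (ioctl(f->fd, FOO_START) < 0) {
>>             error_setg_errno(errp, errno, "foo start failed");
>>             return false;
>>         }
>>         return true;
>>     }
>>
>>     /* Caller interprets the value (e.g. retries on -EINVAL): return
>>      * an int, but the 'Error **' parameter is still set on failure */
>>     static int foo_attach(Foo *f, uint32_t id, Error **errp)
>>     {
>>         if (ioctl(f->fd, FOO_ATTACH, &id) < 0) {
>>             error_setg_errno(errp, errno, "foo attach failed");
>>             return -errno;
>>         }
>>         return 0;
>>     }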
>>
>OK, I'll retain the Error* creation even when expecting to test the errno.
>
>> With VFIO migration support, low level errors (from the adapter FW
>> through the VFIO PCI variant driver) now reach the core migration
>> subsystem. It is preferable to propagate this error, possibly literal,
>> to the VMM, monitor or libvirt. It's not fully symmetric today because
>> the log_global_stop handler for dirty tracking enablement is not
>> addressed. Anyhow, an effort on error reporting needs to be made, and
>> any use of error_report() in a low level function is a sign for
>> improvement.
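>>
>> In other words, instead of consuming the error at the bottom of the
>> stack (sketch):
>>
>>     if (ret < 0) {
>>         error_report("failed to start dirty tracking: %s",
>>                      strerror(-ret));
>>     }
>>
>> let the literal error travel up to the migration core:
>>
>>     if (ret < 0) {
>>         error_setg_errno(errp, -ret, "failed to start dirty tracking");
>>         return false;
>>     }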
>>
>Gotcha. My earlier comment was mostly that it sounded like there was no
>place for returning -errno, but it seems it's not that binary and the
>Error* is the thing that really matters here.
>
>> I think it would be valuable to probe the host IOMMU device early for
>> its HW features. If the results were cached in the HostIOMMUDevice
>> struct, it would then remove unnecessary and redundant calls to the
>> host kernel and avoid error handling in complex code paths. I hope
>> this is feasible. I haven't looked closely tbh.
>>
>OK, I'll post in this series what I had inline[0], as that's what I did.
>
>[0] https://lore.kernel.org/qemu-devel/4e85db04-fbaa-4a6b-b133-59170c471...@oracle.com/
>
>The gotcha in my opinion is that I cache IOMMUFD-specific data returned
>by the GET_HW_INFO ioctl inside a new HostIOMMUDeviceCaps::iommufd. The
>reason being that vfio_device_get_aw_bits() has a hidden assumption that
>the container is already populated with the list of allowed iova ranges,
>which is not true for the first device. So rather than have a partial set
>of caps initialized, I essentially ended up fetching the raw caps, storing
>them, and serializing them into named features (e.g. caps::aw_bits) in
>HostIOMMUDevice::realize().
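>
>Roughly this shape (field names not final, just to show the idea):
>
>    typedef struct HostIOMMUDeviceCaps {
>        uint32_t type;
>        uint8_t aw_bits;            /* serialized in realize() */
>        struct {
>            uint32_t data_type;     /* out_data_type from GET_HW_INFO */
>            uint64_t hw_caps;       /* out_capabilities, e.g. dirty tracking */
>        } iommufd;                  /* raw caps cached at probe time */
>    } HostIOMMUDeviceCaps;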

Another way is to call vfio_device_get_aw_bits() and return its result
directly in get_cap(); then there is no need to initialize caps::aw_bits.
This way the host IOMMU device creation can be moved ahead, as Cédric
suggested.
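
Something like below (untested, following the shape of the existing
get_cap() callback, just to show the idea):

    static int hiod_iommufd_vfio_get_cap(HostIOMMUDevice *hiod, int cap,
                                         Error **errp)
    {
        VFIODevice *vdev = hiod->agent;

        switch (cap) {
        case HOST_IOMMU_DEVICE_CAP_AW_BITS:
            /* computed on demand from the container's iova ranges,
             * so nothing needs caching at realize() time */
            return vfio_device_get_aw_bits(vdev);
        default:
            error_setg(errp, "%s: unsupported capability %x",
                       hiod->name, cap);
            return -EINVAL;
        }
    }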

Thanks
Zhenzhong

>
>> We are reaching soft freeze in ~10 days. There is a chance this
>> proposal could make it for 9.1.
>>
>> Thanks,
>>
>> C.
>>
