On Mon, 6 Mar 2023 14:37:04 +0000 Joao Martins <joao.m.mart...@oracle.com> wrote:
> On 06/03/2023 13:41, Cédric Le Goater wrote: > > On 3/4/23 02:43, Joao Martins wrote: > >> According to the device DMA logging uAPI, IOVA ranges to be logged by > >> the device must be provided all at once upon DMA logging start. > >> > >> As preparation for the following patches which will add device dirty > >> page tracking, keep a record of all DMA mapped IOVA ranges so later they > >> can be used for DMA logging start. > >> > >> Note that when vIOMMU is enabled DMA mapped IOVA ranges are not tracked. > >> This is due to the dynamic nature of vIOMMU DMA mapping/unmapping. > >> > >> Signed-off-by: Avihai Horon <avih...@nvidia.com> > >> Signed-off-by: Joao Martins <joao.m.mart...@oracle.com> > >> --- > >> hw/vfio/common.c | 84 +++++++++++++++++++++++++++++++++++ > >> hw/vfio/trace-events | 1 + > >> include/hw/vfio/vfio-common.h | 11 +++++ > >> 3 files changed, 96 insertions(+) > >> > >> diff --git a/hw/vfio/common.c b/hw/vfio/common.c > >> index ed908e303dbb..d84e5fd86bb4 100644 > >> --- a/hw/vfio/common.c > >> +++ b/hw/vfio/common.c > >> @@ -44,6 +44,7 @@ > >> #include "migration/blocker.h" > >> #include "migration/qemu-file.h" > >> #include "sysemu/tpm.h" > >> +#include "qemu/iova-tree.h" > >> VFIOGroupList vfio_group_list = > >> QLIST_HEAD_INITIALIZER(vfio_group_list); > >> @@ -1313,11 +1314,94 @@ static int > >> vfio_set_dirty_page_tracking(VFIOContainer > >> *container, bool start) > >> return ret; > >> } > >> +/* > >> + * Called for the dirty tracking memory listener to calculate the iova/end > >> + * for a given memory region section. The checks here, replicate the logic > >> + * in vfio_listener_region_{add,del}() used for the same purpose. And thus > >> + * both listener should be kept in sync. > >> + */ > >> +static bool vfio_get_section_iova_range(VFIOContainer *container, > >> + MemoryRegionSection *section, > >> + hwaddr *out_iova, hwaddr *out_end) > >> +{ > >> + Int128 llend; > >> + hwaddr iova; > >> + > >> + iova = REAL_HOST_PAGE_ALIGN(section->offset_within_address_space); > >> + llend = int128_make64(section->offset_within_address_space); > >> + llend = int128_add(llend, section->size); > >> + llend = int128_and(llend, int128_exts64(qemu_real_host_page_mask())); > >> + > >> + if (int128_ge(int128_make64(iova), llend)) { > >> + return false; > >> + } > >> + > >> + *out_iova = iova; > >> + *out_end = int128_get64(llend) - 1; > >> + return true; > >> +} > >> + > >> +static void vfio_dirty_tracking_update(MemoryListener *listener, > >> + MemoryRegionSection *section) > >> +{ > >> + VFIOContainer *container = container_of(listener, VFIOContainer, > >> + tracking_listener); > >> + VFIODirtyTrackingRange *range = &container->tracking_range; > >> + hwaddr max32 = (1ULL << 32) - 1ULL; > >> + hwaddr iova, end; > >> + > >> + if (!vfio_listener_valid_section(section) || > >> + !vfio_get_section_iova_range(container, section, &iova, &end)) { > >> + return; > >> + } > >> + > >> + WITH_QEMU_LOCK_GUARD(&container->tracking_mutex) { > >> + if (iova < max32 && end <= max32) { > >> + if (range->min32 > iova) { > >> + range->min32 = iova; > >> + } > >> + if (range->max32 < end) { > >> + range->max32 = end; > >> + } > >> + trace_vfio_device_dirty_tracking_update(iova, end, > >> + range->min32, range->max32); > >> + } else { > >> + if (!range->min64 || range->min64 > iova) { > >> + range->min64 = iova; > >> + } > >> + if (range->max64 < end) { > >> + range->max64 = end; > >> + } > >> + trace_vfio_device_dirty_tracking_update(iova, end, > >> + range->min64, range->max64); > >> + } > >> + } > >> + return; > >> +} > >> + > >> +static const MemoryListener vfio_dirty_tracking_listener = { > >> + .name = "vfio-tracking", > >> + .region_add = vfio_dirty_tracking_update, > >> +}; > >> + > >> +static void vfio_dirty_tracking_init(VFIOContainer *container) > >> +{ > >> + memset(&container->tracking_range, 0, > >> sizeof(container->tracking_range)); > >> + qemu_mutex_init(&container->tracking_mutex); > >> + container->tracking_listener = vfio_dirty_tracking_listener; > >> + memory_listener_register(&container->tracking_listener, > >> + container->space->as); > > > > The following unregister+destroy calls seem to belong to a _fini routine. > > Am I missing something ? > > > The thinking is that once we register the memory listener, it will iterate > over all the sections and once that is finished the memory_listener_register > returns. So the state we initialize here isn't needed anyelse other than to > create the range and hence we destroy it right away. It was at > container_init() > but it was unnecessary overhead to keep around if it's *only* needed when we > start/stop dirty tracking. > > So the reason I don't add a _fini method is because there's no need to > teardown > the state anywhere else other than this function. > > I would argue that maybe I don't need the mutex at all as this is all > serialized... Right, this is in line with my previous comments that we don't need to keep the listener around since we don't expect changes to memory regions, ex. virtio-mem slots are locked down during migration, and we don't support removal of ranges from the device anyway. We're done with the listener after it's built our min/max ranges. And yes, the mutex seems superfluous. Thanks, Alex