Hi Paolo and Kirti, Many thanks for reply. I am busy today and will reply you tomorrow, Thanks.
Keqian. On 2021/1/28 5:03, Kirti Wankhede wrote: > > > On 1/11/2021 1:04 PM, Keqian Zhu wrote: >> For now the switch of vfio dirty page tracking is integrated into >> the vfio_save_handler, it causes some problems [1]. >> > > Sorry, I missed [1] mail, somehow it didn't landed in my inbox. > >> The object of dirty tracking is guest memory, but the object of >> the vfio_save_handler is device state. This mixed logic produces >> unnecessary coupling and conflicts: >> >> 1. Coupling: Their saving granule is different (perVM vs perDevice). >> vfio will enable dirty_page_tracking for each devices, actually >> once is enough. > > That's correct, enabling dirty page tracking once is enough. But log_start > and log_stop gets called on address space update transaction, region_add() or > region_del(), at this point migration may not be active. We don't want to > allocate bitmap memory in kernel for lifetime of VM, without knowing > migration will be happen or not. vfio_iommu_type1 module should allocate > bitmap memory only while migration is active. > > Paolo's suggestion here to use log_global_start and log_global_stop callbacks > seems correct here. But at this point vfio device state is not yet changed to > |_SAVING as you had identified it in [1]. May be we can start tracking bitmap > in iommu_type1 module while device is not yet _SAVING, but getting dirty > bitmap while device is yet not in _SAVING|_RUNNING state doesn't seem optimal > solution. > > Pasting here your question from [1] > >> Before start dirty tracking, we will check and ensure that the device >> is at _SAVING state and return error otherwise. But the question is >> that what is the rationale? Why does the VFIO_IOMMU_DIRTY_PAGES >> ioctl have something to do with the device state? > > Lets walk through the types of devices we are supporting: > 1. mdev devices without IOMMU backed device > Vendor driver pins pages as and when required during runtime. We can say > that vendor driver is smart which identifies the pages to pin. We are good > here. > > 2. mdev device with IOMMU backed device > This is similar to vfio-pci, direct assigned device, where all pages are > pinned at VM bootup. Vendor driver is not smart, so bitmap query will report > all pages dirty always. If --auto-converge is not set, VM stucks infinitely > in pre-copy phase. This is known to us. > > 3. mdev device with IOMMU backed device with smart vendor driver > In this case as well all pages are pinned at VM bootup, but vendor driver > is smart to identify the pages and pin them explicitly. > Pages can be pinned anytime, i.e. during normal VM runtime or on setting > _SAVING flag (entering pre-copy phase) or while in iterative pre-copy phase. > There is no restriction based on these phases for calling vfio_pin_pages(). > Vendor driver can start pinning pages based on its device state when _SAVING > flag is set. In that case, if dirty bitmap is queried before that then it > will report all sysmem as dirty with an unnecessary copy of sysmem. > As an optimal solution, I think its better to query bitmap only after all > vfio devices are in pre-copy phase, i.e. after _SAVING flag is set. > >> 2. Conflicts: The ram_save_setup() traverses all memory_listeners >> to execute their log_start() and log_sync() hooks to get the >> first round dirty bitmap, which is used by the bulk stage of >> ram saving. However, it can't get dirty bitmap from vfio, as >> @savevm_ram_handlers is registered before @vfio_save_handler. >> > Right, but it can get dirty bitmap from vfio device in it's iterative callback > ram_save_pending -> > migration_bitmap_sync_precopy() .. -> > vfio_listerner_log_sync > > Thanks, > Kirti > >> Move the switch of vfio dirty_page_tracking into vfio_memory_listener >> can solve above problems. Besides, Do not require devices in SAVING >> state for vfio_sync_dirty_bitmap(). >> >> [1] https://www.spinics.net/lists/kvm/msg229967.html >> >> Reported-by: Zenghui Yu <yuzeng...@huawei.com> >> Signed-off-by: Keqian Zhu <zhukeqi...@huawei.com> >> --- >> hw/vfio/common.c | 53 +++++++++++++++++++++++++++++++++++++-------- >> hw/vfio/migration.c | 35 ------------------------------ >> 2 files changed, 44 insertions(+), 44 deletions(-) >> >> diff --git a/hw/vfio/common.c b/hw/vfio/common.c >> index 6ff1daa763..9128cd7ee1 100644 >> --- a/hw/vfio/common.c >> +++ b/hw/vfio/common.c >> @@ -311,7 +311,7 @@ bool vfio_mig_active(void) >> return true; >> } >> -static bool vfio_devices_all_saving(VFIOContainer *container) >> +static bool vfio_devices_all_dirty_tracking(VFIOContainer *container) >> { >> VFIOGroup *group; >> VFIODevice *vbasedev; >> @@ -329,13 +329,8 @@ static bool vfio_devices_all_saving(VFIOContainer >> *container) >> return false; >> } >> - if (migration->device_state & VFIO_DEVICE_STATE_SAVING) { >> - if ((vbasedev->pre_copy_dirty_page_tracking == >> ON_OFF_AUTO_OFF) >> - && (migration->device_state & >> VFIO_DEVICE_STATE_RUNNING)) { >> - return false; >> - } >> - continue; >> - } else { >> + if ((vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) >> + && (migration->device_state & VFIO_DEVICE_STATE_RUNNING)) { >> return false; >> } >> } >> @@ -987,6 +982,44 @@ static void vfio_listener_region_del(MemoryListener >> *listener, >> } >> } >> +static void vfio_set_dirty_page_tracking(VFIOContainer *container, bool >> start) >> +{ >> + int ret; >> + struct vfio_iommu_type1_dirty_bitmap dirty = { >> + .argsz = sizeof(dirty), >> + }; >> + >> + if (start) { >> + dirty.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_START; >> + } else { >> + dirty.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP; >> + } >> + >> + ret = ioctl(container->fd, VFIO_IOMMU_DIRTY_PAGES, &dirty); >> + if (ret) { >> + error_report("Failed to set dirty tracking flag 0x%x errno: %d", >> + dirty.flags, errno); >> + } >> +} >> + >> +static void vfio_listener_log_start(MemoryListener *listener, >> + MemoryRegionSection *section, >> + int old, int new) >> +{ >> + VFIOContainer *container = container_of(listener, VFIOContainer, >> listener); >> + >> + vfio_set_dirty_page_tracking(container, true); >> +} >> + >> +static void vfio_listener_log_stop(MemoryListener *listener, >> + MemoryRegionSection *section, >> + int old, int new) >> +{ >> + VFIOContainer *container = container_of(listener, VFIOContainer, >> listener); >> + >> + vfio_set_dirty_page_tracking(container, false); >> +} >> + >> static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova, >> uint64_t size, ram_addr_t ram_addr) >> { >> @@ -1128,7 +1161,7 @@ static void vfio_listerner_log_sync(MemoryListener >> *listener, >> return; >> } >> - if (vfio_devices_all_saving(container)) { >> + if (vfio_devices_all_dirty_tracking(container)) { >> vfio_sync_dirty_bitmap(container, section); >> } >> } >> @@ -1136,6 +1169,8 @@ static void vfio_listerner_log_sync(MemoryListener >> *listener, >> static const MemoryListener vfio_memory_listener = { >> .region_add = vfio_listener_region_add, >> .region_del = vfio_listener_region_del, >> + .log_start = vfio_listener_log_start, >> + .log_stop = vfio_listener_log_stop, >> .log_sync = vfio_listerner_log_sync, >> }; >> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c >> index 00daa50ed8..c0f646823a 100644 >> --- a/hw/vfio/migration.c >> +++ b/hw/vfio/migration.c >> @@ -395,40 +395,10 @@ static int vfio_load_device_config_state(QEMUFile *f, >> void *opaque) >> return qemu_file_get_error(f); >> } >> -static int vfio_set_dirty_page_tracking(VFIODevice *vbasedev, bool start) >> -{ >> - int ret; >> - VFIOMigration *migration = vbasedev->migration; >> - VFIOContainer *container = vbasedev->group->container; >> - struct vfio_iommu_type1_dirty_bitmap dirty = { >> - .argsz = sizeof(dirty), >> - }; >> - >> - if (start) { >> - if (migration->device_state & VFIO_DEVICE_STATE_SAVING) { >> - dirty.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_START; >> - } else { >> - return -EINVAL; >> - } >> - } else { >> - dirty.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP; >> - } >> - >> - ret = ioctl(container->fd, VFIO_IOMMU_DIRTY_PAGES, &dirty); >> - if (ret) { >> - error_report("Failed to set dirty tracking flag 0x%x errno: %d", >> - dirty.flags, errno); >> - return -errno; >> - } >> - return ret; >> -} >> - >> static void vfio_migration_cleanup(VFIODevice *vbasedev) >> { >> VFIOMigration *migration = vbasedev->migration; >> - vfio_set_dirty_page_tracking(vbasedev, false); >> - >> if (migration->region.mmaps) { >> vfio_region_unmap(&migration->region); >> } >> @@ -469,11 +439,6 @@ static int vfio_save_setup(QEMUFile *f, void *opaque) >> return ret; >> } >> - ret = vfio_set_dirty_page_tracking(vbasedev, true); >> - if (ret) { >> - return ret; >> - } >> - >> qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE); >> ret = qemu_file_get_error(f); >> > . >