Re: [PATCH for 9.0 00/12] Map memory at destination .load_setup in vDPA-net migration
On Mon, Dec 25, 2023 at 5:31 PM Michael S. Tsirkin wrote:
>
> On Fri, Dec 15, 2023 at 06:28:18PM +0100, Eugenio Pérez wrote:
> > Current memory operations like pinning may take a lot of time at the
> > destination. Currently they are done after the source of the migration
> > is stopped, and before the workload is resumed at the destination. This
> > is a period where neither traffic can flow nor the VM workload can
> > continue (downtime).
> >
> > We can do better, as we know the memory layout of the guest RAM at the
> > destination from the moment the migration starts. Moving that operation
> > allows QEMU to communicate the maps to the kernel while the workload is
> > still running in the source, so Linux can start mapping them.
> >
> > Also, the migration of the guest memory may finish before the
> > destination QEMU maps all the memory. In this case, the rest of the
> > memory will be mapped at the same time as before applying this series,
> > when the device is starting. So we're only improving with this series.
> >
> > If the destination has the switchover_ack capability enabled, the
> > destination holds the migration until all the memory is mapped.
> >
> > This needs to be applied on top of [1]. That series performs some code
> > reorganization that allows mapping the guest memory without knowing the
> > queue layout the guest configures on the device.
> >
> > This series reduced the downtime in the stop-and-copy phase of the live
> > migration from 20s~30s to 5s, with a 128G mem guest and two mlx5_vdpa
> > devices, per [2].
>
> I think this is reasonable and could be applied - batching is good.
> Could you rebase on master and repost please?
>

New comments appeared in the meantime [1], but I'll rebase with the needed
changes after they converge.

Thanks!

[1] https://patchwork.kernel.org/comment/25653487/

> > Future directions on top of this series may include:
> > * Iterative migration of virtio-net devices, as it may reduce downtime
> >   per [3]. vhost-vdpa net can apply the configuration through CVQ in
> >   the destination while the source is still migrating.
> > * Move more things ahead of migration time, like DRIVER_OK.
> > * Check that the devices of the destination are valid, and cancel the
> >   migration in case they are not.
> >
> > v1 from RFC v2:
> > * Hold on migration if memory has not been mapped in full with
> >   switchover_ack.
> > * Revert map if the device is not started.
> >
> > RFC v2:
> > * Delegate map to another thread so it does not block QMP.
> > * Fix not allocating iova_tree if x-svq=on at the destination.
> > * Rebased on latest master.
> > * More cleanups of current code, that might be split from this series
> >   too.
> >
> > [1] https://lists.nongnu.org/archive/html/qemu-devel/2023-12/msg01986.html
> > [2] https://lists.nongnu.org/archive/html/qemu-devel/2023-12/msg00909.html
> > [3] https://lore.kernel.org/qemu-devel/6c8ebb97-d546-3f1c-4cdd-54e23a566...@nvidia.com/T/
> >
> > Eugenio Pérez (12):
> >   vdpa: do not set virtio status bits if unneeded
> >   vdpa: make batch_begin_once early return
> >   vdpa: merge _begin_batch into _batch_begin_once
> >   vdpa: extract out _dma_end_batch from _listener_commit
> >   vdpa: factor out stop path of vhost_vdpa_dev_start
> >   vdpa: check for iova tree initialized at net_client_start
> >   vdpa: set backend capabilities at vhost_vdpa_init
> >   vdpa: add vhost_vdpa_load_setup
> >   vdpa: approve switchover after memory map in the migration destination
> >   vdpa: add vhost_vdpa_net_load_setup NetClient callback
> >   vdpa: add vhost_vdpa_net_switchover_ack_needed
> >   virtio_net: register incremental migration handlers
> >
> >  include/hw/virtio/vhost-vdpa.h |  32
> >  include/net/net.h              |   8 +
> >  hw/net/virtio-net.c            |  48 ++
> >  hw/virtio/vhost-vdpa.c         | 274 +++--
> >  net/vhost-vdpa.c               |  43 +-
> >  5 files changed, 357 insertions(+), 48 deletions(-)
> >
> > --
> > 2.39.3
> >
>
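For context on the batching mentioned above: vhost-vdpa maps reach the
kernel as VHOST_IOTLB_UPDATE messages, and when the backend advertises
VHOST_BACKEND_F_IOTLB_BATCH they can be wrapped in
VHOST_IOTLB_BATCH_BEGIN/VHOST_IOTLB_BATCH_END so the device commits them
in one go. The following is only a minimal sketch of that message flow
against a raw vhost-vdpa device fd, using the Linux UAPI definitions; it
is an illustration, not code from this series, and the single mapping and
the reduced error handling are placeholders.

/*
 * Minimal sketch (not code from this series): the vhost-vdpa IOTLB
 * message flow that the series drives earlier, using only the Linux
 * UAPI from <linux/vhost.h>.  The device fd, the single mapping and
 * the reduced error handling are placeholders.
 */
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <linux/vhost.h>        /* pulls in <linux/vhost_types.h> */

static int vdpa_send_iotlb(int fd, const struct vhost_iotlb_msg *iotlb)
{
    struct vhost_msg_v2 msg;

    memset(&msg, 0, sizeof(msg));
    msg.type = VHOST_IOTLB_MSG_V2;
    msg.iotlb = *iotlb;
    /* vhost-vdpa consumes IOTLB messages written to the device fd. */
    return write(fd, &msg, sizeof(msg)) == sizeof(msg) ? 0 : -1;
}

/* Map one chunk of guest RAM inside a batch: BEGIN, UPDATE(s), END. */
static int vdpa_map_batched(int fd, uint64_t iova, void *vaddr, uint64_t size)
{
    struct vhost_iotlb_msg m = { .type = VHOST_IOTLB_BATCH_BEGIN };
    int ret = vdpa_send_iotlb(fd, &m);

    if (ret) {
        return ret;
    }

    m = (struct vhost_iotlb_msg) {
        .type  = VHOST_IOTLB_UPDATE,
        .iova  = iova,
        .size  = size,
        .uaddr = (uint64_t)(uintptr_t)vaddr,
        .perm  = VHOST_ACCESS_RW,
    };
    ret = vdpa_send_iotlb(fd, &m);      /* the kernel pins/maps here */

    m = (struct vhost_iotlb_msg) { .type = VHOST_IOTLB_BATCH_END };
    return ret ? ret : vdpa_send_iotlb(fd, &m);
}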
Re: [PATCH for 9.0 00/12] Map memory at destination .load_setup in vDPA-net migration
On Fri, Dec 15, 2023 at 06:28:18PM +0100, Eugenio Pérez wrote:
> Current memory operations like pinning may take a lot of time at the
> destination. Currently they are done after the source of the migration
> is stopped, and before the workload is resumed at the destination. This
> is a period where neither traffic can flow nor the VM workload can
> continue (downtime).
>
> We can do better, as we know the memory layout of the guest RAM at the
> destination from the moment the migration starts. Moving that operation
> allows QEMU to communicate the maps to the kernel while the workload is
> still running in the source, so Linux can start mapping them.
>
> Also, the migration of the guest memory may finish before the
> destination QEMU maps all the memory. In this case, the rest of the
> memory will be mapped at the same time as before applying this series,
> when the device is starting. So we're only improving with this series.
>
> If the destination has the switchover_ack capability enabled, the
> destination holds the migration until all the memory is mapped.
>
> This needs to be applied on top of [1]. That series performs some code
> reorganization that allows mapping the guest memory without knowing the
> queue layout the guest configures on the device.
>
> This series reduced the downtime in the stop-and-copy phase of the live
> migration from 20s~30s to 5s, with a 128G mem guest and two mlx5_vdpa
> devices, per [2].

I think this is reasonable and could be applied - batching is good.
Could you rebase on master and repost please?

> Future directions on top of this series may include:
> * Iterative migration of virtio-net devices, as it may reduce downtime
>   per [3]. vhost-vdpa net can apply the configuration through CVQ in
>   the destination while the source is still migrating.
> * Move more things ahead of migration time, like DRIVER_OK.
> * Check that the devices of the destination are valid, and cancel the
>   migration in case they are not.
>
> v1 from RFC v2:
> * Hold on migration if memory has not been mapped in full with
>   switchover_ack.
> * Revert map if the device is not started.
>
> RFC v2:
> * Delegate map to another thread so it does not block QMP.
> * Fix not allocating iova_tree if x-svq=on at the destination.
> * Rebased on latest master.
> * More cleanups of current code, that might be split from this series
>   too.
>
> [1] https://lists.nongnu.org/archive/html/qemu-devel/2023-12/msg01986.html
> [2] https://lists.nongnu.org/archive/html/qemu-devel/2023-12/msg00909.html
> [3] https://lore.kernel.org/qemu-devel/6c8ebb97-d546-3f1c-4cdd-54e23a566...@nvidia.com/T/
>
> Eugenio Pérez (12):
>   vdpa: do not set virtio status bits if unneeded
>   vdpa: make batch_begin_once early return
>   vdpa: merge _begin_batch into _batch_begin_once
>   vdpa: extract out _dma_end_batch from _listener_commit
>   vdpa: factor out stop path of vhost_vdpa_dev_start
>   vdpa: check for iova tree initialized at net_client_start
>   vdpa: set backend capabilities at vhost_vdpa_init
>   vdpa: add vhost_vdpa_load_setup
>   vdpa: approve switchover after memory map in the migration destination
>   vdpa: add vhost_vdpa_net_load_setup NetClient callback
>   vdpa: add vhost_vdpa_net_switchover_ack_needed
>   virtio_net: register incremental migration handlers
>
>  include/hw/virtio/vhost-vdpa.h |  32
>  include/net/net.h              |   8 +
>  hw/net/virtio-net.c            |  48 ++
>  hw/virtio/vhost-vdpa.c         | 274 +++--
>  net/vhost-vdpa.c               |  43 +-
>  5 files changed, 357 insertions(+), 48 deletions(-)
>
> --
> 2.39.3
>
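The switchover_ack interaction described in the cover letter is what keeps
the source from completing the switchover before the destination has
finished its maps. Below is only a rough sketch of how a device can wire
that up, assuming the SaveVMHandlers member names and the
register_savevm_live()/qemu_loadvm_approve_switchover() helpers as I read
them in the QEMU migration core of this era; vdpa_net_start_maps() and
vdpa_net_maps_done() are hypothetical stand-ins, not identifiers from this
series.

/*
 * Sketch only, not the code from these patches: rough wiring of a
 * device-side .load_setup / switchover-ack pair.  The SaveVMHandlers
 * member names and helpers are as I read them in the QEMU migration
 * core of this era and may differ; vdpa_net_start_maps() is a
 * hypothetical stand-in for the worker issuing the IOTLB updates.
 */
#include "qemu/osdep.h"
#include "migration/register.h"
#include "migration/savevm.h"

/* Hypothetical: kick off mapping of guest RAM on the destination. */
static int vdpa_net_start_maps(void *opaque)
{
    return 0;
}

/* .load_setup: runs on the destination while the source still runs. */
static int vdpa_net_load_setup(QEMUFile *f, void *opaque)
{
    return vdpa_net_start_maps(opaque);
}

/* Ask the migration core to wait for our ack before switchover. */
static bool vdpa_net_switchover_ack_needed(void *opaque)
{
    return true;
}

/* Hypothetical completion hook, called once every map is acknowledged. */
void vdpa_net_maps_done(void)
{
    qemu_loadvm_approve_switchover();
}

static const SaveVMHandlers vdpa_net_handlers = {
    .load_setup            = vdpa_net_load_setup,
    .switchover_ack_needed = vdpa_net_switchover_ack_needed,
};

/* Registered once per device, e.g. from the virtio-net realize path. */
void vdpa_net_register_migration(void *opaque)
{
    register_savevm_live("vhost-vdpa-net", 0, 1, &vdpa_net_handlers, opaque);
}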
Re: [PATCH for 9.0 00/12] Map memory at destination .load_setup in vDPA-net migration
QE tested this series with regression tests; there are no new regression
issues.

Tested-by: Lei Yang

On Sat, Dec 16, 2023 at 1:28 AM Eugenio Pérez wrote:
>
> Current memory operations like pinning may take a lot of time at the
> destination. Currently they are done after the source of the migration
> is stopped, and before the workload is resumed at the destination. This
> is a period where neither traffic can flow nor the VM workload can
> continue (downtime).
>
> We can do better, as we know the memory layout of the guest RAM at the
> destination from the moment the migration starts. Moving that operation
> allows QEMU to communicate the maps to the kernel while the workload is
> still running in the source, so Linux can start mapping them.
>
> Also, the migration of the guest memory may finish before the
> destination QEMU maps all the memory. In this case, the rest of the
> memory will be mapped at the same time as before applying this series,
> when the device is starting. So we're only improving with this series.
>
> If the destination has the switchover_ack capability enabled, the
> destination holds the migration until all the memory is mapped.
>
> This needs to be applied on top of [1]. That series performs some code
> reorganization that allows mapping the guest memory without knowing the
> queue layout the guest configures on the device.
>
> This series reduced the downtime in the stop-and-copy phase of the live
> migration from 20s~30s to 5s, with a 128G mem guest and two mlx5_vdpa
> devices, per [2].
>
> Future directions on top of this series may include:
> * Iterative migration of virtio-net devices, as it may reduce downtime
>   per [3]. vhost-vdpa net can apply the configuration through CVQ in
>   the destination while the source is still migrating.
> * Move more things ahead of migration time, like DRIVER_OK.
> * Check that the devices of the destination are valid, and cancel the
>   migration in case they are not.
>
> v1 from RFC v2:
> * Hold on migration if memory has not been mapped in full with
>   switchover_ack.
> * Revert map if the device is not started.
>
> RFC v2:
> * Delegate map to another thread so it does not block QMP.
> * Fix not allocating iova_tree if x-svq=on at the destination.
> * Rebased on latest master.
> * More cleanups of current code, that might be split from this series
>   too.
>
> [1] https://lists.nongnu.org/archive/html/qemu-devel/2023-12/msg01986.html
> [2] https://lists.nongnu.org/archive/html/qemu-devel/2023-12/msg00909.html
> [3] https://lore.kernel.org/qemu-devel/6c8ebb97-d546-3f1c-4cdd-54e23a566...@nvidia.com/T/
>
> Eugenio Pérez (12):
>   vdpa: do not set virtio status bits if unneeded
>   vdpa: make batch_begin_once early return
>   vdpa: merge _begin_batch into _batch_begin_once
>   vdpa: extract out _dma_end_batch from _listener_commit
>   vdpa: factor out stop path of vhost_vdpa_dev_start
>   vdpa: check for iova tree initialized at net_client_start
>   vdpa: set backend capabilities at vhost_vdpa_init
>   vdpa: add vhost_vdpa_load_setup
>   vdpa: approve switchover after memory map in the migration destination
>   vdpa: add vhost_vdpa_net_load_setup NetClient callback
>   vdpa: add vhost_vdpa_net_switchover_ack_needed
>   virtio_net: register incremental migration handlers
>
>  include/hw/virtio/vhost-vdpa.h |  32
>  include/net/net.h              |   8 +
>  hw/net/virtio-net.c            |  48 ++
>  hw/virtio/vhost-vdpa.c         | 274 +++--
>  net/vhost-vdpa.c               |  43 +-
>  5 files changed, 357 insertions(+), 48 deletions(-)
>
> --
> 2.39.3
>
>
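One of the RFC v2 changelog items above is delegating the map to another
thread so it does not block QMP. A rough sketch of that pattern follows,
under the assumption that QEMU's qemu_thread_create()/qemu_thread_join()
helpers from qemu/thread.h are used; the MapJob type and map_guest_ram()
are hypothetical placeholders for the per-device state and the loop
issuing the batched DMA maps, not identifiers from this series.

/*
 * Sketch, not code from this series: pushing the guest RAM maps to a
 * joinable worker thread so .load_setup returns quickly and QMP stays
 * responsive.  map_guest_ram() is a hypothetical placeholder for the
 * loop issuing the batched DMA maps.
 */
#include "qemu/osdep.h"
#include "qemu/thread.h"

typedef struct MapJob {
    QemuThread thread;
    void *opaque;               /* device state handed to the worker */
} MapJob;

/* Hypothetical: walk guest RAM and send the batched IOTLB updates. */
static int map_guest_ram(void *opaque)
{
    return 0;
}

static void *map_worker(void *arg)
{
    MapJob *job = arg;

    /* Any error is reported back through the join below. */
    return (void *)(intptr_t)map_guest_ram(job->opaque);
}

/* Called from .load_setup: start mapping without blocking the main loop. */
void map_job_start(MapJob *job, void *opaque)
{
    job->opaque = opaque;
    qemu_thread_create(&job->thread, "vdpa-map", map_worker, job,
                       QEMU_THREAD_JOINABLE);
}

/* Called before the device starts (or on revert): wait for the worker. */
int map_job_join(MapJob *job)
{
    return (intptr_t)qemu_thread_join(&job->thread);
}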