Re: [PATCH for 9.0 00/12] Map memory at destination .load_setup in vDPA-net migration

2024-01-02 Thread Eugenio Perez Martin
On Mon, Dec 25, 2023 at 5:31 PM Michael S. Tsirkin  wrote:
>
> On Fri, Dec 15, 2023 at 06:28:18PM +0100, Eugenio Pérez wrote:
> > Current memory operations like pinning may take a lot of time at the
> > destination.  Currently they are done after the source of the migration is
> > stopped, and before the workload is resumed at the destination.  This is a
> > period where neither traffic can flow nor the VM workload can continue
> > (downtime).
> >
> > We can do better as we know the memory layout of the guest RAM at the
> > destination from the moment the migration starts.  Moving that operation
> > earlier allows QEMU to communicate the maps to the kernel while the
> > workload is still running in the source, so Linux can start mapping them.
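
(A minimal sketch of that idea. GuestRegion, map_region() and
premap_guest_ram() are hypothetical stand-ins for QEMU's memory listener
entries and the vhost-vdpa DMA map path, not the series' actual code.)

/*
 * Sketch only: pre-map all guest RAM at the destination before the
 * device starts.  The names are simplified stand-ins for the real
 * memory listener and DMA map machinery.
 */
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint64_t iova;   /* guest physical address */
    uint64_t size;   /* region length in bytes */
    void *host;      /* host virtual address backing the region */
} GuestRegion;

/* Stand-in for the real map call; just logs here. */
static int map_region(const GuestRegion *r)
{
    printf("map iova 0x%" PRIx64 " size 0x%" PRIx64 "\n", r->iova, r->size);
    return 0;
}

/* Would run from the migration .load_setup hook, before device start. */
int premap_guest_ram(const GuestRegion *regions, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int r = map_region(&regions[i]);
        if (r < 0) {
            return r;  /* fall back to mapping at device start */
        }
    }
    return 0;
}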
> >
> > Also, the migration of the guest memory may finish before the destination
> > QEMU maps all of it.  In this case, the rest of the memory will be mapped
> > at the same time as before applying this series, when the device is
> > starting.  So this series can only improve on the current behaviour.
> >
> > If the destination has the switchover_ack capability enabled, the 
> > destination
> > holds the migration until all the memory is mapped.
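
(Roughly how the hold works; the names below are illustrative, not QEMU's
actual switchover-ack API.  The destination counts outstanding map
operations and only acknowledges the switchover once that count reaches
zero.)

/*
 * Illustrative sketch, not QEMU's real API: the destination defers the
 * switchover ack until every pending memory map has completed.
 */
#include <stdbool.h>

static unsigned pending_maps;
static bool switchover_ack_enabled = true;

static void send_switchover_ack(void)
{
    /* tell the source it may stop the VM and switch over */
}

static void map_started(void)
{
    pending_maps++;
}

static void map_finished(void)
{
    if (--pending_maps == 0 && switchover_ack_enabled) {
        /* all guest memory is mapped: release the migration */
        send_switchover_ack();
    }
}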
> >
> > This needs to be applied on top of [1]. That series performs some code
> > reorganization that allows mapping the guest memory without knowing the
> > queue layout the guest configures on the device.
> >
> > This series reduced the downtime in the stop-and-copy phase of the live
> > migration from 20s~30s to 5s, with a 128G mem guest and two mlx5_vdpa
> > devices, per [2].
>
> I think this is reasonable and could be applied - batching is good.
> Could you rebase on master and repost please?
>

New comments appeared in the meantime [1], but I'll rebase with the
needed changes after they converge.

Thanks!

[1] https://patchwork.kernel.org/comment/25653487/


> > Future directions on top of this series may include:
> > * Iterative migration of virtio-net devices, as it may reduce downtime
> >   per [3].  vhost-vdpa net can apply the configuration through CVQ in
> >   the destination while the source is still migrating.
> > * Move more things ahead of migration time, like DRIVER_OK.
> > * Check that the devices of the destination are valid, and cancel the
> >   migration in case they are not.
> >
> > v1 from RFC v2:
> > * Hold on migration if memory has not been mapped in full with
> >   switchover_ack.
> > * Revert map if the device is not started.
> >
> > RFC v2:
> > * Delegate map to another thread so it does not block QMP (see the
> >   sketch after this list).
> > * Fix not allocating iova_tree if x-svq=on at the destination.
> > * Rebased on latest master.
> > * More cleanups of current code, that might be split from this series too.
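
(The threading mentioned above, sketched with plain pthreads rather than
QEMU's own thread helpers; struct map_job and the function names are
hypothetical.  The expensive pinning/mapping runs in a worker so the
thread servicing QMP never blocks, and the worker is joined later.)

/*
 * Sketch with plain pthreads (QEMU would use its own thread helpers):
 * the slow pinning/mapping work runs in a worker thread so QMP stays
 * responsive; the worker is joined from the cleanup/start path.
 */
#include <pthread.h>

struct map_job {
    /* regions to map, device fd, iova tree, ... */
    int placeholder;
};

static pthread_t map_thread;

static void *map_worker(void *opaque)
{
    struct map_job *job = opaque;
    (void)job;
    /* issue the DMA maps here; may take seconds with a big guest */
    return NULL;
}

/* Returns immediately; mapping continues in the background. */
static int start_async_map(struct map_job *job)
{
    return pthread_create(&map_thread, NULL, map_worker, job);
}

static void wait_async_map(void)
{
    pthread_join(map_thread, NULL);
}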
> >
> > [1] https://lists.nongnu.org/archive/html/qemu-devel/2023-12/msg01986.html
> > [2] https://lists.nongnu.org/archive/html/qemu-devel/2023-12/msg00909.html
> > [3] https://lore.kernel.org/qemu-devel/6c8ebb97-d546-3f1c-4cdd-54e23a566...@nvidia.com/T/
> >
> > Eugenio Pérez (12):
> >   vdpa: do not set virtio status bits if unneeded
> >   vdpa: make batch_begin_once early return
> >   vdpa: merge _begin_batch into _batch_begin_once
> >   vdpa: extract out _dma_end_batch from _listener_commit
> >   vdpa: factor out stop path of vhost_vdpa_dev_start
> >   vdpa: check for iova tree initialized at net_client_start
> >   vdpa: set backend capabilities at vhost_vdpa_init
> >   vdpa: add vhost_vdpa_load_setup
> >   vdpa: approve switchover after memory map in the migration destination
> >   vdpa: add vhost_vdpa_net_load_setup NetClient callback
> >   vdpa: add vhost_vdpa_net_switchover_ack_needed
> >   virtio_net: register incremental migration handlers
> >
> >  include/hw/virtio/vhost-vdpa.h |  32 
> >  include/net/net.h  |   8 +
> >  hw/net/virtio-net.c|  48 ++
> >  hw/virtio/vhost-vdpa.c | 274 +++--
> >  net/vhost-vdpa.c   |  43 +-
> >  5 files changed, 357 insertions(+), 48 deletions(-)
> >
> > --
> > 2.39.3
> >
>




Re: [PATCH for 9.0 00/12] Map memory at destination .load_setup in vDPA-net migration

2023-12-25 Thread Michael S. Tsirkin
On Fri, Dec 15, 2023 at 06:28:18PM +0100, Eugenio Pérez wrote:
> Current memory operations like pinning may take a lot of time at the
> destination.  Currently they are done after the source of the migration is
> stopped, and before the workload is resumed at the destination.  This is a
> period where neither traffic can flow nor the VM workload can continue
> (downtime).
> 
> We can do better as we know the memory layout of the guest RAM at the
> destination from the moment the migration starts.  Moving that operation
> earlier allows QEMU to communicate the maps to the kernel while the
> workload is still running in the source, so Linux can start mapping them.
> 
> Also, the migration of the guest memory may finish before the destination
> QEMU maps all of it.  In this case, the rest of the memory will be mapped
> at the same time as before applying this series, when the device is
> starting.  So this series can only improve on the current behaviour.
> 
> If the destination has the switchover_ack capability enabled, the destination
> holds the migration until all the memory is mapped.
> 
> This needs to be applied on top of [1]. That series performs some code
> reorganization that allows mapping the guest memory without knowing the
> queue layout the guest configures on the device.
> 
> This series reduced the downtime in the stop-and-copy phase of the live
> migration from 20s~30s to 5s, with a 128G mem guest and two mlx5_vdpa
> devices, per [2].

I think this is reasonable and could be applied - batching is good.
Could you rebase on master and repost please?
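
For reference, a rough sketch of the begin-once/end batching pattern the
batch-related patches in this series touch.  The enum values and
send_iotlb_msg() are simplified stand-ins for the real IOTLB message
path to the vhost-vdpa device, not its actual interface.

/*
 * Sketch of the "begin once" batching pattern: open one batch before a
 * burst of IOTLB updates and close it at commit time, so the kernel
 * can apply the whole burst together.
 */
#include <stdbool.h>

enum iotlb_msg { IOTLB_BATCH_BEGIN, IOTLB_UPDATE, IOTLB_BATCH_END };

static bool batch_open;

static void send_iotlb_msg(enum iotlb_msg type)
{
    (void)type;  /* the real code sends a vhost IOTLB message to the device */
}

/* Only the first caller in a burst actually opens the batch. */
static void batch_begin_once(void)
{
    if (batch_open) {
        return;
    }
    send_iotlb_msg(IOTLB_BATCH_BEGIN);
    batch_open = true;
}

static void dma_end_batch(void)
{
    if (!batch_open) {
        return;
    }
    send_iotlb_msg(IOTLB_BATCH_END);
    batch_open = false;
}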

> Future directions on top of this series may include:
> * Iterative migration of virtio-net devices, as it may reduce downtime
>   per [3].  vhost-vdpa net can apply the configuration through CVQ in
>   the destination while the source is still migrating.
> * Move more things ahead of migration time, like DRIVER_OK.
> * Check that the devices of the destination are valid, and cancel the
>   migration in case they are not.
> 
> v1 from RFC v2:
> * Hold on migration if memory has not been mapped in full with switchover_ack.
> * Revert map if the device is not started.
> 
> RFC v2:
> * Delegate map to another thread so it does not block QMP.
> * Fix not allocating iova_tree if x-svq=on at the destination.
> * Rebased on latest master.
> * More cleanups of current code, that might be split from this series too.
> 
> [1] https://lists.nongnu.org/archive/html/qemu-devel/2023-12/msg01986.html
> [2] https://lists.nongnu.org/archive/html/qemu-devel/2023-12/msg00909.html
> [3] https://lore.kernel.org/qemu-devel/6c8ebb97-d546-3f1c-4cdd-54e23a566...@nvidia.com/T/
> 
> Eugenio Pérez (12):
>   vdpa: do not set virtio status bits if unneeded
>   vdpa: make batch_begin_once early return
>   vdpa: merge _begin_batch into _batch_begin_once
>   vdpa: extract out _dma_end_batch from _listener_commit
>   vdpa: factor out stop path of vhost_vdpa_dev_start
>   vdpa: check for iova tree initialized at net_client_start
>   vdpa: set backend capabilities at vhost_vdpa_init
>   vdpa: add vhost_vdpa_load_setup
>   vdpa: approve switchover after memory map in the migration destination
>   vdpa: add vhost_vdpa_net_load_setup NetClient callback
>   vdpa: add vhost_vdpa_net_switchover_ack_needed
>   virtio_net: register incremental migration handlers
> 
>  include/hw/virtio/vhost-vdpa.h |  32 
>  include/net/net.h  |   8 +
>  hw/net/virtio-net.c|  48 ++
>  hw/virtio/vhost-vdpa.c | 274 +++--
>  net/vhost-vdpa.c   |  43 +-
>  5 files changed, 357 insertions(+), 48 deletions(-)
> 
> -- 
> 2.39.3
> 




Re: [PATCH for 9.0 00/12] Map memory at destination .load_setup in vDPA-net migration

2023-12-24 Thread Lei Yang
QE tested this series with regression tests; there are no new regression issues.

Tested-by: Lei Yang 



On Sat, Dec 16, 2023 at 1:28 AM Eugenio Pérez  wrote:
>
> Current memory operations like pinning may take a lot of time at the
> destination.  Currently they are done after the source of the migration is
> stopped, and before the workload is resumed at the destination.  This is a
> period where neither traffic can flow nor the VM workload can continue
> (downtime).
>
> We can do better as we know the memory layout of the guest RAM at the
> destination from the moment the migration starts.  Moving that operation
> earlier allows QEMU to communicate the maps to the kernel while the
> workload is still running in the source, so Linux can start mapping them.
>
> Also, the migration of the guest memory may finish before the destination
> QEMU maps all of it.  In this case, the rest of the memory will be mapped
> at the same time as before applying this series, when the device is
> starting.  So this series can only improve on the current behaviour.
>
> If the destination has the switchover_ack capability enabled, the destination
> holds the migration until all the memory is mapped.
>
> This needs to be applied on top of [1]. That series performs some code
> reorganization that allows mapping the guest memory without knowing the
> queue layout the guest configures on the device.
>
> This series reduced the downtime in the stop-and-copy phase of the live
> migration from 20s~30s to 5s, with a 128G mem guest and two mlx5_vdpa
> devices, per [2].
>
> Future directions on top of this series may include:
> * Iterative migration of virtio-net devices, as it may reduce downtime
>   per [3].  vhost-vdpa net can apply the configuration through CVQ in
>   the destination while the source is still migrating.
> * Move more things ahead of migration time, like DRIVER_OK.
> * Check that the devices of the destination are valid, and cancel the
>   migration in case they are not.
>
> v1 from RFC v2:
> * Hold on migration if memory has not been mapped in full with switchover_ack.
> * Revert map if the device is not started.
>
> RFC v2:
> * Delegate map to another thread so it does not block QMP.
> * Fix not allocating iova_tree if x-svq=on at the destination.
> * Rebased on latest master.
> * More cleanups of current code, that might be split from this series too.
>
> [1] https://lists.nongnu.org/archive/html/qemu-devel/2023-12/msg01986.html
> [2] https://lists.nongnu.org/archive/html/qemu-devel/2023-12/msg00909.html
> [3] https://lore.kernel.org/qemu-devel/6c8ebb97-d546-3f1c-4cdd-54e23a566...@nvidia.com/T/
>
> Eugenio Pérez (12):
>   vdpa: do not set virtio status bits if unneeded
>   vdpa: make batch_begin_once early return
>   vdpa: merge _begin_batch into _batch_begin_once
>   vdpa: extract out _dma_end_batch from _listener_commit
>   vdpa: factor out stop path of vhost_vdpa_dev_start
>   vdpa: check for iova tree initialized at net_client_start
>   vdpa: set backend capabilities at vhost_vdpa_init
>   vdpa: add vhost_vdpa_load_setup
>   vdpa: approve switchover after memory map in the migration destination
>   vdpa: add vhost_vdpa_net_load_setup NetClient callback
>   vdpa: add vhost_vdpa_net_switchover_ack_needed
>   virtio_net: register incremental migration handlers
>
>  include/hw/virtio/vhost-vdpa.h |  32 
>  include/net/net.h  |   8 +
>  hw/net/virtio-net.c|  48 ++
>  hw/virtio/vhost-vdpa.c | 274 +++--
>  net/vhost-vdpa.c   |  43 +-
>  5 files changed, 357 insertions(+), 48 deletions(-)
>
> --
> 2.39.3
>
>