RE: [RFC 0/7] migration patches for VFIO

2022-10-20 Thread Yishai Hadas
> From: Qemu-devel On Behalf Of Juan Quintela
> Sent: Monday, 3 October 2022 6:16
> To: qemu-de...@nongnu.org
> Cc: Alex Williamson; Eric Blake; Stefan Hajnoczi; Fam Zheng;
> qemu-s3...@nongnu.org; Cornelia Huck; Thomas Huth;
> Vladimir Sementsov-Ogievskiy; Laurent Vivier; John Snow;
> Dr. David Alan Gilbert; Christian Borntraeger; Halil Pasic;
> Juan Quintela; Paolo Bonzini; qemu-block@nongnu.org; Eric Farman;
> Richard Henderson; David Hildenbrand
> Subject: [RFC 0/7] migration patches for VFIO
> 
> Hi
> 
> VFIO migration has several requirements:
> - the size of the state is only known when the guest is stopped

As discussed on the conference call, I have just sent a patch to the kernel
mailing list that makes it possible to get the state size in each state.

See:
https://patchwork.kernel.org/project/kvm/patch/20221020132109.112708-1-yish...@nvidia.com/

This removes the need to stop the guest in order to ask for that data.

So I assume you can drop some of the complexity and hacks from your RFC when
you send the next series.

Specifically, there is no need to stop the VM and restart it when the SLA
cannot be met. Just read, while the device is in RUNNING, the estimated data
length that will be required to complete STOP_COPY, and use it.
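
To illustrate the kind of check this enables, here is a minimal sketch in
plain C. It is not taken from QEMU or from the kernel patch: the function
name, the parameters and the way the downtime budget is computed are all
assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a QEMU or kernel API.
 *
 * estimate_bytes:  estimated STOP_COPY data length, read while RUNNING
 * bandwidth_bps:   assumed transfer rate in bytes per second
 * max_downtime_ms: downtime allowed by the SLA, in milliseconds
 *
 * Returns true when the estimated data fits in the downtime budget,
 * so the guest only needs to be stopped once.
 */
static bool can_switchover(uint64_t estimate_bytes,
                           uint64_t bandwidth_bps,
                           uint64_t max_downtime_ms)
{
    /* Divide first to avoid overflow; the budget is rounded down. */
    uint64_t budget_bytes = bandwidth_bps / 1000 * max_downtime_ms;

    return estimate_bytes <= budget_bytes;
}
```

With such a check the decision is made on the estimate alone, so a guest that
cannot meet the SLA yet is simply left running.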

Yishai




[RFC 0/7] migration patches for VFIO

2022-10-02 Thread Juan Quintela
Hi

VFIO migration has several requirements:
- the size of the state is only known when the guest is stopped
- it may need to send large amounts of data.

This series only addresses the first set of problems.

What the patches do:
- The res_compatible parameter was not used anywhere, so just add that
  information to res_postcopy.
- Remove the QEMUFile parameter from save_live_pending.
- Split save_live_pending into
  * save_pending_estimate(): the pending state size without trying too hard
  * save_pending_exact(): the real pending state size; it is called with the
    guest stopped.
- Now save_pending_* doesn't need the threshold parameter.
- HACK a way to stop the guest before moving there.
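
To make the estimate/exact split concrete, here is a minimal sketch in plain
C. It is not the code from the patches: DemoDevice, the demo_* functions,
try_complete() and the guest_running flag are hypothetical stand-ins, and the
real handlers in include/migration/register.h have different signatures.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device model, used only for this example. */
typedef struct DemoDevice {
    uint64_t dirty_bytes;    /* cheap to track while the guest runs */
    uint64_t internal_bytes; /* only known once the guest is stopped */
} DemoDevice;

/* Cheap answer: the pending size "without trying too hard". */
static uint64_t demo_pending_estimate(const DemoDevice *d)
{
    return d->dirty_bytes;
}

/* Real pending size; assumed to be valid only with the guest stopped. */
static uint64_t demo_pending_exact(const DemoDevice *d)
{
    return d->dirty_bytes + d->internal_bytes;
}

/*
 * Returns true when migration can complete. *guest_running is left
 * false in that case, and true whenever iteration has to continue.
 */
static bool try_complete(const DemoDevice *d, uint64_t threshold,
                         bool *guest_running)
{
    if (demo_pending_estimate(d) > threshold) {
        return false;          /* keep iterating, guest untouched */
    }

    *guest_running = false;    /* stop the guest before asking for exact */
    if (demo_pending_exact(d) <= threshold) {
        return true;           /* complete with the guest stopped */
    }

    *guest_running = true;     /* exact size too big: guest starts again */
    return false;
}
```

The last branch, where the guest has to be started again, is the case the
ToDo list would like to get rid of.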

ToDo:
- The autoconverge test is broken. No real clue why, but it is possible that
  the test is wrong.

- Add a mechanism to send massive amounts of data in the save state stage
  (probably more multifd channels).

- Avoid having to start the guest again between checking the pending state
  size and migration_completion().

Please review.

Thanks, Juan.

Juan Quintela (7):
  migration: Remove res_compatible parameter
  migration: No save_live_pending() method uses the QEMUFile parameter
  migration: Block migration comment or code is wrong
  migration: Split save_live_pending() into state_pending_*
  migration: Remove unused threshold_size parameter
  migration: simplify migration_iteration_run()
  migration: call qemu_savevm_state_pending_exact() with the guest stopped

 docs/devel/migration.rst       | 18 ++--
 docs/devel/vfio-migration.rst  |  4 +--
 include/migration/register.h   | 29 ++-
 migration/savevm.h             |  8 +++---
 hw/s390x/s390-stattrib.c       | 11 ---
 hw/vfio/migration.c            | 17 +--
 migration/block-dirty-bitmap.c | 14 -
 migration/block.c              | 17 ++-
 migration/migration.c          | 52 ++
 migration/ram.c                | 35 ---
 migration/savevm.c             | 37 +---
 tests/qtest/migration-test.c   |  3 +-
 hw/vfio/trace-events           |  2 +-
 migration/trace-events         |  7 +++--
 14 files changed, 148 insertions(+), 106 deletions(-)

-- 
2.37.2