Hi,

On Fri, 14 Mar 2025 at 01:40, Peter Xu <[email protected]> wrote:
> +        save_section_header(f, se, QEMU_VM_SECTION_PART);
> +        ram_save_zero_page(f, se->opaque);
>
> I'll stop requesting a why here...

* Earlier in this thread you mentioned 'We need a header'. I took that
to mean a 'RAM page' header, not save_section_header(). A section type
byte (QEMU_VM_COMMAND) is also sent by qemu_savevm_command_send(), as
in the sketch below.
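
  For reference, a minimal sketch of that function (simplified from
memory of migration/savevm.c, not verbatim), showing the section-type
byte it emits first:

    static void qemu_savevm_command_send(QEMUFile *f,
                                         enum qemu_vm_cmd command,
                                         uint16_t len, uint8_t *data)
    {
        qemu_put_byte(f, QEMU_VM_COMMAND);    /* section-type byte */
        qemu_put_be16(f, (uint16_t)command);
        qemu_put_be16(f, len);
        if (len) {
            qemu_put_buffer(f, data, len);
        }
        qemu_fflush(f);
    }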

> but I think this is another example that even if all the tests pass it may 
> not be correct.

* This is also an example of how communication is hard.

> From f9343dfc777ef04168443e86a1fa3922296ea563 Mon Sep 17 00:00:00 2001
> From: Peter Xu <[email protected]>
> Date: Thu, 13 Mar 2025 15:34:10 -0400
> Subject: [PATCH 1/2] migration: Add save_postcopy_prepare() savevm handler
>
> Add a savevm handler for a module to opt-in sending extra sections right
> before postcopy starts, and before VM is stopped.
>
> RAM will start to use this new savevm handler in the next patch to do flush
> and sync for multifd pages.
>
> Note that we choose to do it before the VM is stopped because the only
> current potential user is not sensitive to VM status, so doing it before
> the VM is stopped is preferred, so as not to enlarge the postcopy downtime.
>
> It is still a bit unfortunate that we need to introduce such a new savevm
> handler just for this one use case; however, it's so far the cleanest option.
>
> Signed-off-by: Peter Xu <[email protected]>
> ---
>  include/migration/register.h | 15 +++++++++++++++
>  migration/savevm.h           |  1 +
>  migration/migration.c        |  4 ++++
>  migration/savevm.c           | 33 +++++++++++++++++++++++++++++++++
>  4 files changed, 53 insertions(+)
>
> diff --git a/include/migration/register.h b/include/migration/register.h
> index c041ce32f2..b79dc81b8d 100644
> --- a/include/migration/register.h
> +++ b/include/migration/register.h
> @@ -189,6 +189,21 @@ typedef struct SaveVMHandlers {
>
>      /* This runs outside the BQL!  */
>
> +    /**
> +     * @save_postcopy_prepare
> +     *
> +     * This hook will be invoked on the source side right before switching
> +     * to postcopy (before VM stopped).
> +     *
> +     * @f:      QEMUFile where to send the data
> +     * @opaque: Data pointer passed to register_savevm_live()
> +     * @errp:   Error** used to report error message
> +     *
> +     * Returns: true if succeeded, false if an error occurred.  When false is
> +     * returned, @errp must be set.
> +     */
> +    bool (*save_postcopy_prepare)(QEMUFile *f, void *opaque, Error **errp);
> +
>      /**
>       * @state_pending_estimate
>       *
> diff --git a/migration/savevm.h b/migration/savevm.h
> index 138c39a7f9..2d5e9c7166 100644
> --- a/migration/savevm.h
> +++ b/migration/savevm.h
> @@ -45,6 +45,7 @@ void qemu_savevm_state_pending_exact(uint64_t *must_precopy,
>  void qemu_savevm_state_pending_estimate(uint64_t *must_precopy,
>                                          uint64_t *can_postcopy);
>  int qemu_savevm_state_complete_precopy_iterable(QEMUFile *f, bool in_postcopy);
> +bool qemu_savevm_state_postcopy_prepare(QEMUFile *f, Error **errp);
>  void qemu_savevm_send_ping(QEMUFile *f, uint32_t value);
>  void qemu_savevm_send_open_return_path(QEMUFile *f);
>  int qemu_savevm_send_packaged(QEMUFile *f, const uint8_t *buf, size_t len);
> diff --git a/migration/migration.c b/migration/migration.c
> index d46e776e24..212f6b4145 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2707,6 +2707,10 @@ static int postcopy_start(MigrationState *ms, Error **errp)
>          }
>      }
>
> +    if (!qemu_savevm_state_postcopy_prepare(ms->to_dst_file, errp)) {
> +        return -1;
> +    }
> +
>      trace_postcopy_start();
>      bql_lock();
>      trace_postcopy_start_set_run();
> diff --git a/migration/savevm.c b/migration/savevm.c
> index ce158c3512..23ef4c7dc9 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1523,6 +1523,39 @@ void qemu_savevm_state_complete_postcopy(QEMUFile *f)
>      qemu_fflush(f);
>  }
>
> +bool qemu_savevm_state_postcopy_prepare(QEMUFile *f, Error **errp)
> +{
> +    SaveStateEntry *se;
> +    bool ret;
> +
> +    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
> +        if (!se->ops || !se->ops->save_postcopy_prepare) {
> +            continue;
> +        }
> +
> +        if (se->ops->is_active) {
> +            if (!se->ops->is_active(se->opaque)) {
> +                continue;
> +            }
> +        }
> +
> +        trace_savevm_section_start(se->idstr, se->section_id);
> +
> +        save_section_header(f, se, QEMU_VM_SECTION_PART);
> +        ret = se->ops->save_postcopy_prepare(f, se->opaque, errp);
> +        save_section_footer(f, se);
> +
> +        trace_savevm_section_end(se->idstr, se->section_id, ret);
> +
> +        if (!ret) {
> +            assert(*errp);
> +            return false;
> +        }
> +    }
> +
> +    return true;
> +}
> +
>  int qemu_savevm_state_complete_precopy_iterable(QEMUFile *f, bool in_postcopy)
>  {
>      int64_t start_ts_each, end_ts_each;
> --
> 2.47.0
>
>
> From 299e1cdd9b28802f361ed012673825685e30f965 Mon Sep 17 00:00:00 2001
> From: Peter Xu <[email protected]>
> Date: Thu, 13 Mar 2025 15:56:01 -0400
> Subject: [PATCH 2/2] migration/ram: Implement save_postcopy_prepare()
>
> Implement save_postcopy_prepare(), preparing for the enablement of both
> multifd and postcopy.
>
> Please see the rich comment for the rationale.
>
> Signed-off-by: Peter Xu <[email protected]>
> ---
>  migration/ram.c | 37 +++++++++++++++++++++++++++++++++++++
>  1 file changed, 37 insertions(+)
>
> diff --git a/migration/ram.c b/migration/ram.c
> index 424df6d9f1..119e7d3ac2 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -4420,6 +4420,42 @@ static int ram_resume_prepare(MigrationState *s, void *opaque)
>      return 0;
>  }
>
> +static bool ram_save_postcopy_prepare(QEMUFile *f, void *opaque, Error **errp)
> +{
> +    int ret;
> +
> +    if (migrate_multifd()) {
> +        /*
> +         * When multifd is enabled, source QEMU needs to make sure all the
> +         * pages queued before postcopy starts to be flushed.
> +         *
> +         * Meanwhile, the load of these pages must happen before switching
> +         * to postcopy.  It's because loading of guest pages (so far) in
> +         * multifd recv threads is still non-atomic, so the load cannot
> +         * happen with vCPUs running on destination side.
> +         *
> +         * This flush and sync will guarantee those pages loaded _before_
> +         * postcopy starts on destination. The rationale is, this happens
> +         * before VM stops (and before source QEMU sends all the rest of
> +         * the postcopy messages).  So when the destination QEMU received
> +         * the postcopy messages, it must have received the sync message on
> +         * the main channel (either RAM_SAVE_FLAG_MULTIFD_FLUSH, or
> +         * RAM_SAVE_FLAG_EOS), and such message should have guaranteed all
> +         * previous guest pages queued in the multifd channels to be
> +         * completely loaded.
> +         */
> +        ret = multifd_ram_flush_and_sync(f);
> +        if (ret < 0) {
> +            error_setg(errp, "%s: multifd flush and sync failed", __func__);
> +            return false;
> +        }
> +    }
> +
> +    qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
> +
> +    return true;
> +}
> +
>  void postcopy_preempt_shutdown_file(MigrationState *s)
>  {
>      qemu_put_be64(s->postcopy_qemufile_src, RAM_SAVE_FLAG_EOS);
> @@ -4439,6 +4475,7 @@ static SaveVMHandlers savevm_ram_handlers = {
>      .load_setup = ram_load_setup,
>      .load_cleanup = ram_load_cleanup,
>      .resume_prepare = ram_resume_prepare,
> +    .save_postcopy_prepare = ram_save_postcopy_prepare,
>  };
>
>  static void ram_mig_ram_block_resized(RAMBlockNotifier *n, void *host,
> --
> 2.47.0

* I get that the infrastructural changes help to take section-specific
action before postcopy starts. It's not clear how tying the flush and
sync to a RAM section helps, though, because on the destination side the
section is only used to drive se->ops->load_state() -> ram_load() ->
ram_load_precopy() -> multifd_recv_sync_main(), as in the sketch below.
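
  For context, my understanding of where that sync lands on the
destination (a simplified sketch from memory of the flag handling in
migration/ram.c's ram_load_precopy(), not verbatim):

    switch (flags) {
    case RAM_SAVE_FLAG_MULTIFD_FLUSH:
        /* Wait until all pages queued on the multifd channels are loaded. */
        multifd_recv_sync_main();
        break;
    case RAM_SAVE_FLAG_EOS:
        /* End of section; legacy streams instead sync per section here. */
        if (migrate_multifd() &&
            migrate_multifd_flush_after_each_section()) {
            multifd_recv_sync_main();
        }
        break;
    }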

* To confirm:
    - The benefit of this approach is that the 'flush and sync' works
via the vmstate_load() -> se->ops->load_state() -> ram_load() ->
ram_load_precopy() sequence? (My reading of the resulting wire format
is sketched below.)
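
  What I think the new hook puts on the wire for the RAM section (an
assumption pieced together from the two patches above; please correct
me if the framing is off):

    QEMU_VM_SECTION_PART           (1 byte, save_section_header())
    section_id                     (be32)
    RAM_SAVE_FLAG_MULTIFD_FLUSH    (be64, from multifd_ram_flush_and_sync(),
                                    depending on the flush property)
    RAM_SAVE_FLAG_EOS              (be64, from ram_save_postcopy_prepare())
    QEMU_VM_SECTION_FOOTER         (1 byte, save_section_footer())
    section_id                     (be32)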

* Thank you for the patches; I'll send a revised patchset including them.

Thank you.
---
  - Prasad

