Hi,

On Fri, 14 Mar 2025 at 01:40, Peter Xu <[email protected]> wrote:
> +        save_section_header(f, se, QEMU_VM_SECTION_PART);
> +        ram_save_zero_page(f, se->opaque);
>
> I'll stop requesting a why here...
* Earlier in this thread you mentioned 'We need a header'. I took that to
  mean a 'RAM page' header, not save_section_header(). A section type
  (QEMU_VM_COMMAND) was also sent by qemu_savevm_command_send().

> but I think this is another example that even if all the tests pass it may
> not be correct.

* This is also an example of: communication is hard.

> From f9343dfc777ef04168443e86a1fa3922296ea563 Mon Sep 17 00:00:00 2001
> From: Peter Xu <[email protected]>
> Date: Thu, 13 Mar 2025 15:34:10 -0400
> Subject: [PATCH 1/2] migration: Add save_postcopy_prepare() savevm handler
>
> Add a savevm handler for a module to opt in to sending extra sections right
> before postcopy starts, and before the VM is stopped.
>
> RAM will start to use this new savevm handler in the next patch to do flush
> and sync for multifd pages.
>
> Note that we choose to do it before the VM is stopped because the only
> potential user so far is not sensitive to VM status, so doing it before the
> VM is stopped is preferred, so as not to enlarge the postcopy downtime.
>
> It is still a bit unfortunate that we need to introduce such a new savevm
> handler just for this one use case, however so far it is the cleanest.
>
> Signed-off-by: Peter Xu <[email protected]>
> ---
>  include/migration/register.h | 15 +++++++++++++++
>  migration/savevm.h           |  1 +
>  migration/migration.c        |  4 ++++
>  migration/savevm.c           | 33 +++++++++++++++++++++++++++++++++
>  4 files changed, 53 insertions(+)
>
> diff --git a/include/migration/register.h b/include/migration/register.h
> index c041ce32f2..b79dc81b8d 100644
> --- a/include/migration/register.h
> +++ b/include/migration/register.h
> @@ -189,6 +189,21 @@ typedef struct SaveVMHandlers {
>
>      /* This runs outside the BQL! */
>
> +    /**
> +     * @save_postcopy_prepare
> +     *
> +     * This hook will be invoked on the source side right before switching
> +     * to postcopy (before the VM is stopped).
> +     *
> +     * @f: QEMUFile where to send the data
> +     * @opaque: Data pointer passed to register_savevm_live()
> +     * @errp: Error** used to report the error message
> +     *
> +     * Returns: true if succeeded, false if an error occurred. When false
> +     * is returned, @errp must be set.
> +     */
> +    bool (*save_postcopy_prepare)(QEMUFile *f, void *opaque, Error **errp);
> +
>      /**
>       * @state_pending_estimate
>       *
> diff --git a/migration/savevm.h b/migration/savevm.h
> index 138c39a7f9..2d5e9c7166 100644
> --- a/migration/savevm.h
> +++ b/migration/savevm.h
> @@ -45,6 +45,7 @@ void qemu_savevm_state_pending_exact(uint64_t *must_precopy,
>  void qemu_savevm_state_pending_estimate(uint64_t *must_precopy,
>                                          uint64_t *can_postcopy);
>  int qemu_savevm_state_complete_precopy_iterable(QEMUFile *f, bool in_postcopy);
> +bool qemu_savevm_state_postcopy_prepare(QEMUFile *f, Error **errp);
>  void qemu_savevm_send_ping(QEMUFile *f, uint32_t value);
>  void qemu_savevm_send_open_return_path(QEMUFile *f);
>  int qemu_savevm_send_packaged(QEMUFile *f, const uint8_t *buf, size_t len);
> diff --git a/migration/migration.c b/migration/migration.c
> index d46e776e24..212f6b4145 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2707,6 +2707,10 @@ static int postcopy_start(MigrationState *ms, Error **errp)
>          }
>      }
>
> +    if (!qemu_savevm_state_postcopy_prepare(ms->to_dst_file, errp)) {
> +        return -1;
> +    }
> +
>      trace_postcopy_start();
>      bql_lock();
>      trace_postcopy_start_set_run();
> diff --git a/migration/savevm.c b/migration/savevm.c
> index ce158c3512..23ef4c7dc9 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1523,6 +1523,39 @@ void qemu_savevm_state_complete_postcopy(QEMUFile *f)
>      qemu_fflush(f);
>  }
>
> +bool qemu_savevm_state_postcopy_prepare(QEMUFile *f, Error **errp)
> +{
> +    SaveStateEntry *se;
> +    bool ret;
> +
> +    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
> +        if (!se->ops || !se->ops->save_postcopy_prepare) {
> +            continue;
> +        }
> +
> +        if (se->ops->is_active) {
> +            if (!se->ops->is_active(se->opaque)) {
> +                continue;
> +            }
> +        }
> +
> +        trace_savevm_section_start(se->idstr, se->section_id);
> +
> +        save_section_header(f, se, QEMU_VM_SECTION_PART);
> +        ret = se->ops->save_postcopy_prepare(f, se->opaque, errp);
> +        save_section_footer(f, se);
> +
> +        trace_savevm_section_end(se->idstr, se->section_id, ret);
> +
> +        if (!ret) {
> +            assert(*errp);
> +            return false;
> +        }
> +    }
> +
> +    return true;
> +}
> +
>  int qemu_savevm_state_complete_precopy_iterable(QEMUFile *f, bool in_postcopy)
>  {
>      int64_t start_ts_each, end_ts_each;
> --
> 2.47.0
>
>
> From 299e1cdd9b28802f361ed012673825685e30f965 Mon Sep 17 00:00:00 2001
> From: Peter Xu <[email protected]>
> Date: Thu, 13 Mar 2025 15:56:01 -0400
> Subject: [PATCH 2/2] migration/ram: Implement save_postcopy_prepare()
>
> Implement save_postcopy_prepare(), preparing for the enablement of both
> multifd and postcopy.
>
> Please see the rich comment for the rationale.
>
> Signed-off-by: Peter Xu <[email protected]>
> ---
>  migration/ram.c | 37 +++++++++++++++++++++++++++++++++
>  1 file changed, 37 insertions(+)
>
> diff --git a/migration/ram.c b/migration/ram.c
> index 424df6d9f1..119e7d3ac2 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -4420,6 +4420,42 @@ static int ram_resume_prepare(MigrationState *s, void *opaque)
>      return 0;
>  }
>
> +static bool ram_save_postcopy_prepare(QEMUFile *f, void *opaque, Error **errp)
> +{
> +    int ret;
> +
> +    if (migrate_multifd()) {
> +        /*
> +         * When multifd is enabled, the source QEMU needs to make sure
> +         * all the pages queued before postcopy starts have been flushed.
> +         *
> +         * Meanwhile, the load of these pages must happen before switching
> +         * to postcopy. That is because loading of guest pages (so far) in
> +         * multifd recv threads is still non-atomic, so the load cannot
> +         * happen with vCPUs running on the destination side.
> +         *
> +         * This flush and sync will guarantee that those pages are loaded
> +         * _before_ postcopy starts on the destination. The rationale is
> +         * that this happens before the VM stops (and before the source
> +         * QEMU sends all the rest of the postcopy messages). So when the
> +         * destination QEMU receives the postcopy messages, it must already
> +         * have received the sync message on the main channel (either
> +         * RAM_SAVE_FLAG_MULTIFD_FLUSH or RAM_SAVE_FLAG_EOS), and such a
> +         * message guarantees that all the previous guest pages queued in
> +         * the multifd channels have been completely loaded.
> +         */
> +        ret = multifd_ram_flush_and_sync(f);
> +        if (ret < 0) {
> +            error_setg(errp, "%s: multifd flush and sync failed", __func__);
> +            return false;
> +        }
> +    }
> +
> +    qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
> +
> +    return true;
> +}
> +
>  void postcopy_preempt_shutdown_file(MigrationState *s)
>  {
>      qemu_put_be64(s->postcopy_qemufile_src, RAM_SAVE_FLAG_EOS);
> @@ -4439,6 +4475,7 @@ static SaveVMHandlers savevm_ram_handlers = {
>      .load_setup = ram_load_setup,
>      .load_cleanup = ram_load_cleanup,
>      .resume_prepare = ram_resume_prepare,
> +    .save_postcopy_prepare = ram_save_postcopy_prepare,
>  };
>
>  static void ram_mig_ram_block_resized(RAMBlockNotifier *n, void *host,
> --
> 2.47.0

* I get that the infrastructural changes help take 'section'-specific action
  before postcopy starts. It's not clear how tying flush and sync to a RAM
  section helps, because on the destination side the 'section' is only used
  to call se->ops->load_state() -> ram_load() -> ram_load_precopy() ->
  multifd_recv_sync_main().

* To confirm: the benefit of this approach is that 'flush and sync' works
  via the vmstate_load() -> se->ops->load_state() -> ram_load() ->
  ram_load_precopy() sequence?

* Thank you for the patches, I'll send a revised patchset including them.

Thank you.
---
  - Prasad
