[PULL 2/2] MAINTAINERS: Adjust migration documentation files
From: Avihai Horon

Commit 8cb2f8b172e7 ("docs/migration: Create migration/ directory")
changed migration documentation file structure but forgot to update the
entries in the MAINTAINERS file.

Commit 4c6f8a79ae53 ("docs/migration: Split 'dirty limit'") extracted
dirty limit documentation to a new file without updating dirty limit
section in MAINTAINERS file.

Fix the above.

Fixes: 8cb2f8b172e7 ("docs/migration: Create migration/ directory")
Fixes: 4c6f8a79ae53 ("docs/migration: Split 'dirty limit'")
Signed-off-by: Avihai Horon
Link: https://lore.kernel.org/r/20240407081125.13951-1-avih...@nvidia.com
Signed-off-by: Peter Xu
---
 MAINTAINERS | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index e71183eef9..d3fc2a06e3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2170,7 +2170,7 @@ S: Supported
 F: hw/vfio/*
 F: include/hw/vfio/
 F: docs/igd-assign.txt
-F: docs/devel/vfio-migration.rst
+F: docs/devel/migration/vfio.rst
 
 vfio-ccw
 M: Eric Farman
@@ -2231,6 +2231,7 @@ F: qapi/virtio.json
 F: net/vhost-user.c
 F: include/hw/virtio/
 F: docs/devel/virtio*
+F: docs/devel/migration/virtio.rst
 
 virtio-balloon
 M: Michael S. Tsirkin
@@ -3422,7 +3423,7 @@ F: migration/
 F: scripts/vmstate-static-checker.py
 F: tests/vmstate-static-checker-data/
 F: tests/qtest/migration-test.c
-F: docs/devel/migration.rst
+F: docs/devel/migration/
 F: qapi/migration.json
 F: tests/migration/
 F: util/userfaultfd.c
@@ -3442,6 +3443,7 @@ F: include/sysemu/dirtylimit.h
 F: migration/dirtyrate.c
 F: migration/dirtyrate.h
 F: include/sysemu/dirtyrate.h
+F: docs/devel/migration/dirty-limit.rst
 
 Detached LUKS header
 M: Hyman Huang
--
2.44.0
[PULL 0/2] Migration 20240407 patches
From: Peter Xu

The following changes since commit ce64e6224affb8b4e4b019f76d2950270b391af5:

  Merge tag 'qemu-sparc-20240404' of https://github.com/mcayland/qemu into staging (2024-04-04 15:28:06 +0100)

are available in the Git repository at:

  https://gitlab.com/peterx/qemu.git tags/migration-20240407-pull-request

for you to fetch changes up to 8e0b21e375f0f6e6dbaeaecc1d52e2220f163e40:

  MAINTAINERS: Adjust migration documentation files (2024-04-07 14:40:55 -0400)

Migration pull for 9.0-rc3

- Wei/Lei's fix on a rare postcopy race that can hang the channel (since 8.0)
- Avihai's fix on maintainers file, points to the right doc links

Avihai Horon (1):
      MAINTAINERS: Adjust migration documentation files

Wei Wang (1):
      migration/postcopy: ensure preempt channel is ready before loading states

 MAINTAINERS        |  6 ++++--
 migration/savevm.c | 21 +++++++++++++++++++++
 2 files changed, 25 insertions(+), 2 deletions(-)

--
2.44.0
[PULL 1/2] migration/postcopy: ensure preempt channel is ready before loading states
From: Wei Wang

Before loading the guest states, ensure that the preempt channel has
been ready to use, as some of the states (e.g. via virtio_load) might
trigger page faults that will be handled through the preempt channel.
So yield to the main thread in the case that the channel create event
hasn't been dispatched.

Cc: qemu-stable
Fixes: 9358982744 ("migration: Send requested page directly in rp-return thread")
Originally-by: Lei Wang
Link: https://lore.kernel.org/all/9aa5d1be-7801-40dd-83fd-f7e041ced...@intel.com/T/
Signed-off-by: Lei Wang
Signed-off-by: Wei Wang
Link: https://lore.kernel.org/r/20240405034056.23933-1-wei.w.w...@intel.com
[peterx: add a todo section, add Fixes and copy stable for 8.0+]
Signed-off-by: Peter Xu
---
 migration/savevm.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/migration/savevm.c b/migration/savevm.c
index 388d7af7cd..e7c1215671 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2342,6 +2342,27 @@ static int loadvm_handle_cmd_packaged(MigrationIncomingState *mis)
 
     QEMUFile *packf = qemu_file_new_input(QIO_CHANNEL(bioc));
 
+    /*
+     * Before loading the guest states, ensure that the preempt channel has
+     * been ready to use, as some of the states (e.g. via virtio_load) might
+     * trigger page faults that will be handled through the preempt channel.
+     * So yield to the main thread in the case that the channel create event
+     * hasn't been dispatched.
+     *
+     * TODO: if we can move migration loadvm out of main thread, then we
+     * won't block main thread from polling the accept() fds. We can drop
+     * this as a whole when that is done.
+     */
+    do {
+        if (!migrate_postcopy_preempt() || !qemu_in_coroutine() ||
+            mis->postcopy_qemufile_dst) {
+            break;
+        }
+
+        aio_co_schedule(qemu_get_current_aio_context(), qemu_coroutine_self());
+        qemu_coroutine_yield();
+    } while (1);
+
     ret = qemu_loadvm_state_main(packf, mis);
     trace_loadvm_handle_cmd_packaged_main(ret);
     qemu_fclose(packf);
--
2.44.0
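The waiting loop added above can be modelled outside QEMU as a tiny single-threaded event loop. This is only a sketch, not QEMU code: `EventLoop`, `dispatch_one()` and `wait_for_channel()` are hypothetical names, and a plain function call stands in for rescheduling the coroutine and yielding to the main loop.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the main loop: a count of queued events. */
typedef struct {
    int pending;         /* events queued up to and including "channel created" */
    bool channel_ready;  /* plays the role of mis->postcopy_qemufile_dst */
} EventLoop;

/* Dispatch one queued event; the last one marks the channel as created. */
static void dispatch_one(EventLoop *loop)
{
    assert(loop->pending > 0);
    loop->pending--;
    if (loop->pending == 0) {
        loop->channel_ready = true;
    }
}

/*
 * Model of the loop in the patch: while the feature is enabled and the
 * channel is not ready, let the main loop run once, then re-check.
 * Returns how many times the loader had to give up control.
 */
static int wait_for_channel(EventLoop *loop, bool preempt_enabled)
{
    int yields = 0;

    for (;;) {
        if (!preempt_enabled || loop->channel_ready) {
            break;
        }
        dispatch_one(loop);  /* stands in for reschedule + yield */
        yields++;
    }
    return yields;
}
```

The point of the shape is that the readiness check is repeated after every yield, so the loader proceeds as soon as the pending "channel created" event has been dispatched and never before.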
[PULL 0/2] Migration 20240331 patches
From: Peter Xu

The following changes since commit b9dbf6f9bf533564f6a4277d03906fcd32bb0245:

  Merge tag 'pull-tcg-20240329' of https://gitlab.com/rth7680/qemu into staging (2024-03-30 14:54:57 +0000)

are available in the Git repository at:

  https://gitlab.com/peterx/qemu.git tags/migration-20240331-pull-request

for you to fetch changes up to d0ad271a7613459bd0a3397c8071a4ad06f3f7eb:

  migration/postcopy: Ensure postcopy_start() sets errp if it fails (2024-03-31 14:30:03 -0400)

Migration pull for 9.0-rc2

- Avihai's two fixes on error paths

Avihai Horon (2):
      migration: Set migration error in migration_completion()
      migration/postcopy: Ensure postcopy_start() sets errp if it fails

 migration/migration.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

--
2.44.0
[PULL 2/2] migration/postcopy: Ensure postcopy_start() sets errp if it fails
From: Avihai Horon

There are several places where postcopy_start() fails without setting
errp. This can cause a null pointer de-reference, as in case of error,
the caller of postcopy_start() copies/prints the error set in errp.

Fix it by setting errp in all of postcopy_start() error paths.

Cc: qemu-stable
Fixes: 908927db28ea ("migration: Update error description whenever migration fails")
Signed-off-by: Avihai Horon
Reviewed-by: Cédric Le Goater
Reviewed-by: Peter Xu
Link: https://lore.kernel.org/r/20240328140252.16756-3-avih...@nvidia.com
Signed-off-by: Peter Xu
---
 migration/migration.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index b73ae3a72c..86bf76e925 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2510,6 +2510,8 @@ static int postcopy_start(MigrationState *ms, Error **errp)
         migration_wait_main_channel(ms);
         if (postcopy_preempt_establish_channel(ms)) {
             migrate_set_state(&ms->state, ms->state, MIGRATION_STATUS_FAILED);
+            error_setg(errp, "%s: Failed to establish preempt channel",
+                       __func__);
             return -1;
         }
     }
@@ -2525,17 +2527,22 @@ static int postcopy_start(MigrationState *ms, Error **errp)
 
     ret = migration_stop_vm(ms, RUN_STATE_FINISH_MIGRATE);
     if (ret < 0) {
+        error_setg_errno(errp, -ret, "%s: Failed to stop the VM", __func__);
         goto fail;
     }
 
     ret = migration_maybe_pause(ms, &cur_state,
                                 MIGRATION_STATUS_POSTCOPY_ACTIVE);
     if (ret < 0) {
+        error_setg_errno(errp, -ret, "%s: Failed in migration_maybe_pause()",
+                         __func__);
         goto fail;
     }
 
     ret = bdrv_inactivate_all();
     if (ret < 0) {
+        error_setg_errno(errp, -ret, "%s: Failed in bdrv_inactivate_all()",
+                         __func__);
         goto fail;
     }
     restart_block = true;
@@ -2612,6 +2619,7 @@ static int postcopy_start(MigrationState *ms, Error **errp)
 
     /* Now send that blob */
     if (qemu_savevm_send_packaged(ms->to_dst_file, bioc->data, bioc->usage)) {
+        error_setg(errp, "%s: Failed to send packaged data", __func__);
         goto fail_closefb;
     }
     qemu_fclose(fb);
--
2.44.0
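The rule this patch enforces is that a function taking an error out-parameter must fill it on every failing path, because callers read it whenever the return value signals failure. A minimal standalone sketch of that contract follows; `Error` here is a hypothetical reduced struct, and `set_error()` and `start_sketch()` are illustrative names, not QEMU's API.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, minimal stand-in for an error object. */
typedef struct Error {
    char msg[128];
} Error;

/* Simplified analogue of an error setter: allocate and format into *errp. */
static void set_error(Error **errp, const char *fmt, ...)
{
    va_list ap;

    if (!errp) {
        return;                 /* caller does not want the details */
    }
    assert(*errp == NULL);      /* an existing error must not be overwritten */
    *errp = calloc(1, sizeof(**errp));
    if (!*errp) {
        abort();
    }
    va_start(ap, fmt);
    vsnprintf((*errp)->msg, sizeof((*errp)->msg), fmt, ap);
    va_end(ap);
}

/*
 * Every path that returns a negative value also sets *errp.
 * failing_step selects which step fails (0 means none).
 */
static int start_sketch(int failing_step, Error **errp)
{
    if (failing_step == 1) {
        set_error(errp, "%s: Failed to stop the VM", __func__);
        return -1;
    }
    if (failing_step == 2) {
        set_error(errp, "%s: Failed to send packaged data", __func__);
        return -1;
    }
    return 0;
}
```

With this shape a caller can safely dereference the error after a negative return, which is exactly the assumption that crashed when some paths returned failure with the error left unset.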
[PULL 1/2] migration: Set migration error in migration_completion()
From: Avihai Horon

After commit 9425ef3f990a ("migration: Use migrate_has_error() in
close_return_path_on_source()"), close_return_path_on_source() assumes
that migration error is set if an error occurs during migration.

This may not be true if migration errors in migration_completion(). For
example, if qemu_savevm_state_complete_precopy() errors, migration error
will not be set.

This in turn, will cause a migration hang bug, similar to the bug that
was fixed by commit 22b04245f0d5 ("migration: Join the return path
thread before releasing to_dst_file"), as shutdown() will not be issued
for the return-path channel.

Fix it by ensuring migration error is set in case of error in
migration_completion().

Signed-off-by: Avihai Horon
Reviewed-by: Peter Xu
Fixes: 9425ef3f990a ("migration: Use migrate_has_error() in close_return_path_on_source()")
Acked-by: Cédric Le Goater
Link: https://lore.kernel.org/r/20240328140252.16756-2-avih...@nvidia.com
Signed-off-by: Peter Xu
---
 migration/migration.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 9fe8fd2afd..b73ae3a72c 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2799,6 +2799,7 @@ static void migration_completion(MigrationState *s)
 {
     int ret = 0;
     int current_active_state = s->state;
+    Error *local_err = NULL;
 
     if (s->state == MIGRATION_STATUS_ACTIVE) {
         ret = migration_completion_precopy(s, &current_active_state);
@@ -2832,6 +2833,15 @@ static void migration_completion(MigrationState *s)
     return;
 
 fail:
+    if (qemu_file_get_error_obj(s->to_dst_file, &local_err)) {
+        migrate_set_error(s, local_err);
+        error_free(local_err);
+    } else if (ret) {
+        error_setg_errno(&local_err, -ret, "Error in migration completion");
+        migrate_set_error(s, local_err);
+        error_free(local_err);
+    }
+
     migration_completion_failed(s, current_active_state);
 }
 
--
2.44.0
[PULL 2/3] migration/postcopy: Fix high frequency sync
From: Peter Xu

With the current code base I can observe an extremely high sync count
during precopy, as long as one enables postcopy-ram=on before switchover
to postcopy.

To provide some context on when QEMU decides to do a full sync: it
checks must_precopy (which implies "data must be sent during precopy
phase"), and as long as it is lower than the threshold size we
calculated (out of bandwidth and expected downtime) QEMU will kick off
the slow/exact sync.

However, when postcopy is enabled (even if still during precopy phase),
RAM reports all pages as can_postcopy and reports must_precopy==0. Then
"must_precopy <= threshold_size" almost always triggers and enforces a
slow sync for every call to migration_iteration_run() when postcopy is
enabled, even if not used. That is insane.

It turns out it was a regression introduced in the previous refactoring
in 8.0, as reported by Nina [1]:

  (a) c8df4a7aef ("migration: Split save_live_pending() into state_pending_*")

Then a workaround patch was applied at the end of the release (8.0-rc4)
to fix it:

  (b) 28ef5339c3 ("migration: fix ram_state_pending_exact()")

However that "workaround" was overlooked during the cleanup in this 9.0
release, in this commit:

  (c) b0504edd40 ("migration: Drop unnecessary check in ram's pending_exact()")

Then the issue was re-exposed as reported by Nina [1].

The problem with (b) is that it only fixed the case for RAM, rather than
all the rest of the iterators. Here a slow sync should only be required
if all dirty data (precopy+postcopy) is less than the threshold_size
that QEMU calculated. It is even debatable whether a sync is needed when
switched to postcopy. Currently ram_state_pending_exact() will be mostly
a noop if switched to postcopy, and that logic seems to apply too for
all the rest of the iterators, as syncing the dirty bitmap during a
postcopy doesn't make much sense. However let's leave such a change for
later, as we're in the rc phase.

So rather than reusing commit (b), this patch provides the complete fix
for all iterators. While at it, clean up the surrounding lines a little.

[1] https://gitlab.com/qemu-project/qemu/-/issues/1565

Reported-by: Nina Schoetterl-Glausch
Fixes: b0504edd40 ("migration: Drop unnecessary check in ram's pending_exact()")
Reviewed-by: Fabiano Rosas
Link: https://lore.kernel.org/r/20240320214453.584374-1-pet...@redhat.com
Signed-off-by: Peter Xu
---
 migration/migration.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 047b6b49cf..9fe8fd2afd 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3199,17 +3199,16 @@ typedef enum {
  */
 static MigIterateState migration_iteration_run(MigrationState *s)
 {
-    uint64_t must_precopy, can_postcopy;
+    uint64_t must_precopy, can_postcopy, pending_size;
     Error *local_err = NULL;
     bool in_postcopy = s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE;
     bool can_switchover = migration_can_switchover(s);
 
     qemu_savevm_state_pending_estimate(&must_precopy, &can_postcopy);
-    uint64_t pending_size = must_precopy + can_postcopy;
-
+    pending_size = must_precopy + can_postcopy;
     trace_migrate_pending_estimate(pending_size, must_precopy, can_postcopy);
 
-    if (must_precopy <= s->threshold_size) {
+    if (pending_size < s->threshold_size) {
         qemu_savevm_state_pending_exact(&must_precopy, &can_postcopy);
         pending_size = must_precopy + can_postcopy;
         trace_migrate_pending_exact(pending_size, must_precopy, can_postcopy);
--
2.44.0
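The effect of the one-line condition change can be seen with the two predicates side by side. This is a sketch under stated assumptions: the function names `exact_sync_old()` and `exact_sync_new()` are made up for illustration, and the sizes used below (4 GiB pending, 300 MiB threshold) are arbitrary example values, not measurements.

```c
#include <stdbool.h>
#include <stdint.h>

/* Condition before the fix: only data that must go in precopy is compared. */
static bool exact_sync_old(uint64_t must_precopy, uint64_t can_postcopy,
                           uint64_t threshold)
{
    (void)can_postcopy;  /* ignored, which is the root of the problem */
    return must_precopy <= threshold;
}

/* Condition after the fix: all pending data is compared to the threshold. */
static bool exact_sync_new(uint64_t must_precopy, uint64_t can_postcopy,
                           uint64_t threshold)
{
    uint64_t pending_size = must_precopy + can_postcopy;

    return pending_size < threshold;
}
```

With postcopy enabled, RAM reports `must_precopy == 0`, so the old predicate is true on every iteration regardless of how much data is still pending, while the new one only becomes true once the total drops below the threshold.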
[PULL 3/3] migration/multifd: Fix clearing of mapped-ram zero pages
From: Fabiano Rosas

When the zero page detection is done in the multifd threads, we need to
iterate the second part of the pages->offset array and clear the file
bitmap for each zero page. The piece of code we merged to do that is
wrong.

The reason this has passed all the tests is because the bitmap is
initialized with zeroes already, so clearing the bits only really has an
effect during live migration and when a data page goes from having data
to no data.

Fixes: 303e6f54f9 ("migration/multifd: Implement zero page transmission on the multifd thread.")
Signed-off-by: Fabiano Rosas
Link: https://lore.kernel.org/r/20240321201242.6009-1-faro...@suse.de
Signed-off-by: Peter Xu
---
 migration/multifd.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index d2f0238f70..2802afe79d 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -111,7 +111,6 @@ void multifd_send_channel_created(void)
 static void multifd_set_file_bitmap(MultiFDSendParams *p)
 {
     MultiFDPages_t *pages = p->pages;
-    uint32_t zero_num = p->pages->num - p->pages->normal_num;
 
     assert(pages->block);
 
@@ -119,7 +118,7 @@ static void multifd_set_file_bitmap(MultiFDSendParams *p)
         ramblock_set_file_bmap_atomic(pages->block, pages->offset[i], true);
     }
 
-    for (int i = p->pages->num; i < zero_num; i++) {
+    for (int i = p->pages->normal_num; i < p->pages->num; i++) {
         ramblock_set_file_bmap_atomic(pages->block, pages->offset[i], false);
     }
 }
--
2.44.0
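To see why the original loop bounds never cleared anything, consider a reduced model of the page batch in which data pages occupy the first `normal_num` slots of `offset[]` and zero pages occupy the rest. The struct `Pages`, the plain `bool` bitmap and both function names are assumptions made for this sketch; only the loop bounds mirror the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_PAGES 8

/* Hypothetical, reduced version of the page batch. */
typedef struct {
    uint32_t num;                /* entries used in offset[] */
    uint32_t normal_num;         /* entries [0, normal_num) are data pages */
    uint32_t offset[MAX_PAGES];  /* page indexes; zero pages come last */
} Pages;

/* Bounds as originally merged: starts at num, so the body never runs. */
static int clear_zero_pages_buggy(const Pages *p, bool *bitmap)
{
    uint32_t zero_num = p->num - p->normal_num;
    int cleared = 0;

    for (uint32_t i = p->num; i < zero_num; i++) {
        bitmap[p->offset[i]] = false;
        cleared++;
    }
    return cleared;
}

/* Bounds after the fix: walk the second part of the array. */
static int clear_zero_pages_fixed(const Pages *p, bool *bitmap)
{
    int cleared = 0;

    assert(p->normal_num <= p->num && p->num <= MAX_PAGES);
    for (uint32_t i = p->normal_num; i < p->num; i++) {
        bitmap[p->offset[i]] = false;
        cleared++;
    }
    return cleared;
}
```

Since `zero_num` can never exceed `num`, the buggy loop's start index is already at or past its end index, which matches the commit message's observation that the bug stayed hidden while the bitmap happened to be zero already.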
[PULL 0/3] Migration 20240322 patches
From: Peter Xu

The following changes since commit 853546f8128476eefb701d4a55b2781bb3a46faa:

  Merge tag 'pull-loongarch-20240322' of https://gitlab.com/gaosong/qemu into staging (2024-03-22 10:59:57 +0000)

are available in the Git repository at:

  https://gitlab.com/peterx/qemu.git tags/migration-20240322-pull-request

for you to fetch changes up to 8fa1a21c6edc2bf7de85984944848ab9ac49e937:

  migration/multifd: Fix clearing of mapped-ram zero pages (2024-03-22 12:12:08 -0400)

Migration pull for 9.0-rc1

- Fabiano's patch to revert fd: support on mapped-ram
- Peter's fix on postcopy regression on unnecessary dirty syncs
- Fabiano's fix on mapped-ram rare corrupt on zero page handling

Fabiano Rosas (2):
      migration: Revert mapped-ram multifd support to fd: URI
      migration/multifd: Fix clearing of mapped-ram zero pages

Peter Xu (1):
      migration/postcopy: Fix high frequency sync

 migration/fd.h               |  2 --
 migration/fd.c               | 56
 migration/file.c             | 19 ++--
 migration/migration.c        | 20 ++--
 migration/multifd.c          |  5 +--
 tests/qtest/migration-test.c | 43 --
 6 files changed, 12 insertions(+), 133 deletions(-)

--
2.44.0
[PULL 1/3] migration: Revert mapped-ram multifd support to fd: URI
From: Fabiano Rosas

This reverts commit decdc76772c453ff1444612e910caa0d45cd8eac in full
and also the relevant migration-tests from
7a09f092834641b7a793d50a3a261073bbb404a6.

After the addition of the new QAPI-based migration address API in 8.2
we've been converting an "fd:" URI into a SocketAddress, missing the
fact that the "fd:" syntax could also be used for a plain file instead
of a socket. This is a problem because the SocketAddress is part of the
API, so we're effectively asking users to create a "socket" channel to
pass in a plain file.

The easiest way to fix this situation is to deprecate the usage of both
SocketAddress and "fd:" when used with a plain file for migration.
Since this has been possible since 8.2, we can wait until 9.1 to
deprecate it.

For 9.0, however, we should avoid adding further support to migration
to a plain file using the old "fd:" syntax or the new SocketAddress
API, and instead require the usage of either the old-style "file:" URI
or the FileMigrationArgs::filename field of the new API with the
"/dev/fdset/NN" syntax, both of which are already supported.

Signed-off-by: Fabiano Rosas
Link: https://lore.kernel.org/r/20240319210941.1907-1-faro...@suse.de
Signed-off-by: Peter Xu
---
 migration/fd.h               |  2 --
 migration/fd.c               | 56
 migration/file.c             | 19 ++--
 migration/migration.c        | 13 -
 migration/multifd.c          |  2 --
 tests/qtest/migration-test.c | 43 --
 6 files changed, 8 insertions(+), 127 deletions(-)

diff --git a/migration/fd.h b/migration/fd.h
index 0c0a18d9e7..b901bc014e 100644
--- a/migration/fd.h
+++ b/migration/fd.h
@@ -20,6 +20,4 @@ void fd_start_incoming_migration(const char *fdname, Error **errp);
 
 void fd_start_outgoing_migration(MigrationState *s, const char *fdname,
                                  Error **errp);
-void fd_cleanup_outgoing_migration(void);
-int fd_args_get_fd(void);
 #endif
diff --git a/migration/fd.c b/migration/fd.c
index fe0d096abd..449adaa2de 100644
--- a/migration/fd.c
+++ b/migration/fd.c
@@ -15,42 +15,19 @@
  */
 
 #include "qemu/osdep.h"
-#include "qapi/error.h"
 #include "channel.h"
 #include "fd.h"
 #include "file.h"
 #include "migration.h"
 #include "monitor/monitor.h"
-#include "io/channel-file.h"
-#include "io/channel-socket.h"
 #include "io/channel-util.h"
-#include "options.h"
 #include "trace.h"
 
-static struct FdOutgoingArgs {
-    int fd;
-} outgoing_args;
-
-int fd_args_get_fd(void)
-{
-    return outgoing_args.fd;
-}
-
-void fd_cleanup_outgoing_migration(void)
-{
-    if (outgoing_args.fd > 0) {
-        close(outgoing_args.fd);
-        outgoing_args.fd = -1;
-    }
-}
-
 void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **errp)
 {
     QIOChannel *ioc;
     int fd = monitor_get_fd(monitor_cur(), fdname, errp);
-    int newfd;
-
     if (fd == -1) {
         return;
     }
@@ -62,18 +39,6 @@ void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **
         return;
     }
 
-    /*
-     * This is dup()ed just to avoid referencing an fd that might
-     * be already closed by the iochannel.
-     */
-    newfd = dup(fd);
-    if (newfd == -1) {
-        error_setg_errno(errp, errno, "Could not dup FD %d", fd);
-        object_unref(ioc);
-        return;
-    }
-    outgoing_args.fd = newfd;
-
     qio_channel_set_name(ioc, "migration-fd-outgoing");
     migration_channel_connect(s, ioc, NULL, NULL);
     object_unref(OBJECT(ioc));
@@ -104,20 +69,9 @@ void fd_start_incoming_migration(const char *fdname, Error **errp)
         return;
     }
 
-    if (migrate_multifd()) {
-        if (fd_is_socket(fd)) {
-            error_setg(errp,
-                       "Multifd migration to a socket FD is not supported");
-            object_unref(ioc);
-            return;
-        }
-
-        file_create_incoming_channels(ioc, errp);
-    } else {
-        qio_channel_set_name(ioc, "migration-fd-incoming");
-        qio_channel_add_watch_full(ioc, G_IO_IN,
-                                   fd_accept_incoming_migration,
-                                   NULL, NULL,
-                                   g_main_context_get_thread_default());
-    }
+    qio_channel_set_name(ioc, "migration-fd-incoming");
+    qio_channel_add_watch_full(ioc, G_IO_IN,
+                               fd_accept_incoming_migration,
+                               NULL, NULL,
+                               g_main_context_get_thread_default());
 }
diff --git a/migration/file.c b/migration/file.c
index b6e8ba13f2..ab18ba505a 100644
--- a/migration/file.c
+++ b/migration/file.c
@@ -11,7 +11,6 @@
 #include "qemu/error-report.h"
 #include "qapi/error.h"
 #include "channel.h"
-#include "fd.h"
 #include "file.h"
 #include "migration.h"
 #include "io/channel-file.h"
@@ -55,27 +54,15 @@ bool file_send_channel_create(gpointer opaque, Error **errp)
 {
     QIOChannelFile
[PATCH] migration/postcopy: Fix high frequency sync
From: Peter Xu

On the current code base I can observe an extremely high sync count
during precopy, as long as one enables postcopy-ram=on before switchover
to postcopy.

To provide some context on when we decide to do a full sync: we check
must_precopy (which implies "data must be sent during precopy phase"),
and as long as it is lower than the threshold size we calculated (out of
bandwidth and expected downtime) we will kick off the slow sync.

However, when postcopy is enabled (even if still during precopy phase),
RAM reports all pages as can_postcopy and reports must_precopy==0. Then
"must_precopy <= threshold_size" almost always triggers and enforces a
slow sync for every call to migration_iteration_run() when postcopy is
enabled, even if not used. That is insane.

It turns out it was a regression introduced in the previous refactoring
in QEMU 8.0 in late 2022. Fix this by checking the whole RAM size rather
than must_precopy, like before. Not copying stable yet as many things
changed, and even though this should be a major performance regression,
no functional change has been observed (and that's also probably why
nobody found it). I only noticed this when looking for another bug
reported by Nina.

While at it, clean up the surrounding lines a little.

Cc: Nina Schoetterl-Glausch
Fixes: c8df4a7aef ("migration: Split save_live_pending() into state_pending_*")
Signed-off-by: Peter Xu
---

Nina: I copied you only because this might still be relevant, as this
issue also mysteriously points back to c8df4a7aef.. However I don't
think it should be a fix of your problem, at most it can change the
likelihood of reproducing it.

This is not a regression for this release, but I still want to have it
for 9.0. Fabiano, any opinions / objections?
---
 migration/migration.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 047b6b49cf..9fe8fd2afd 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3199,17 +3199,16 @@ typedef enum {
  */
 static MigIterateState migration_iteration_run(MigrationState *s)
 {
-    uint64_t must_precopy, can_postcopy;
+    uint64_t must_precopy, can_postcopy, pending_size;
     Error *local_err = NULL;
     bool in_postcopy = s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE;
     bool can_switchover = migration_can_switchover(s);
 
     qemu_savevm_state_pending_estimate(&must_precopy, &can_postcopy);
-    uint64_t pending_size = must_precopy + can_postcopy;
-
+    pending_size = must_precopy + can_postcopy;
     trace_migrate_pending_estimate(pending_size, must_precopy, can_postcopy);
 
-    if (must_precopy <= s->threshold_size) {
+    if (pending_size < s->threshold_size) {
         qemu_savevm_state_pending_exact(&must_precopy, &can_postcopy);
         pending_size = must_precopy + can_postcopy;
         trace_migrate_pending_exact(pending_size, must_precopy, can_postcopy);
--
2.44.0
[PULL 05/10] physmem: Fix migration dirty bitmap coherency with TCG memory access
From: Nicholas Piggin

The fastpath in cpu_physical_memory_sync_dirty_bitmap() to test large
aligned ranges forgot to bring the TCG TLB up to date after clearing
some of the dirty memory bitmap bits. This can result in stores through
the TCG TLB not setting the dirty memory bitmap and ultimately causes
memory corruption / lost updates during migration from a TCG host.

Fix this by calling cpu_physical_memory_dirty_bits_cleared() when
dirty bits have been cleared.

Fixes: aa8dc044772 ("migration: synchronize memory bitmap 64bits at a time")
Signed-off-by: Nicholas Piggin
Tested-by: Thomas Huth
Message-ID: <20240219061731.232570-1-npig...@gmail.com>
[PMD: Split patch in 2: part 2/2, slightly adapt description]
Signed-off-by: Philippe Mathieu-Daudé
Reviewed-by: Richard Henderson
Link: https://lore.kernel.org/r/20240312201458.79532-4-phi...@linaro.org
Signed-off-by: Peter Xu
---
 include/exec/ram_addr.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index b060ea9176..de45ba7bc9 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -513,6 +513,9 @@ uint64_t cpu_physical_memory_sync_dirty_bitmap(RAMBlock *rb,
                 idx++;
             }
         }
+        if (num_dirty) {
+            cpu_physical_memory_dirty_bits_cleared(start, length);
+        }
 
         if (rb->clear_bmap) {
             /*
--
2.44.0
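The pattern behind this fix is "whoever clears dirty bits must also tell the party that caches clean state". A rough standalone model follows; it is not the QEMU implementation. The names `sync_dirty_words()`, `dirty_bits_cleared()` and the `tlb_resets` counter are hypothetical, and unlike the real function this sketch counts every moved bit rather than only newly dirty ones.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hook standing in for the "dirty bits were cleared" callback. */
static int tlb_resets;

static void dirty_bits_cleared(size_t start_word, size_t nwords)
{
    (void)start_word;
    (void)nwords;
    tlb_resets++;  /* real code would invalidate cached write permissions */
}

/* Count set bits in one word without compiler builtins. */
static unsigned popcount64(uint64_t w)
{
    unsigned n = 0;

    while (w) {
        w &= w - 1;
        n++;
    }
    return n;
}

/*
 * Word-at-a-time sync: move dirty bits from src into dst and clear src.
 * If anything was cleared, notify so that later stores are tracked again.
 */
static uint64_t sync_dirty_words(uint64_t *src, uint64_t *dst, size_t nwords)
{
    uint64_t num_dirty = 0;

    for (size_t i = 0; i < nwords; i++) {
        if (src[i]) {
            uint64_t bits = src[i];

            src[i] = 0;
            dst[i] |= bits;
            num_dirty += popcount64(bits);
        }
    }
    if (num_dirty) {
        dirty_bits_cleared(0, nwords);
    }
    return num_dirty;
}
```

Skipping the notification when nothing was dirty keeps the fast path cheap, while omitting it when bits were cleared is what allowed later writes to go unrecorded.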
[PULL 10/10] migration/multifd: Duplicate the fd for the outgoing_args
From: Fabiano Rosas

We currently store the file descriptor used during the main outgoing
channel creation to use it again when creating the multifd channels.

Since this fd is used for the first iochannel, there's risk that the
QIOChannel gets freed and the fd closed while outgoing_args.fd still
has it available. This could lead to an fd-reuse bug.

Duplicate the outgoing_args fd to avoid this issue.

Suggested-by: Peter Xu
Signed-off-by: Fabiano Rosas
Reviewed-by: Peter Xu
Link: https://lore.kernel.org/r/20240315032040.7974-3-faro...@suse.de
Signed-off-by: Peter Xu
---
 migration/fd.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/migration/fd.c b/migration/fd.c
index c07030f715..fe0d096abd 100644
--- a/migration/fd.c
+++ b/migration/fd.c
@@ -49,8 +49,7 @@ void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **
 {
     QIOChannel *ioc;
     int fd = monitor_get_fd(monitor_cur(), fdname, errp);
-
-    outgoing_args.fd = -1;
+    int newfd;
 
     if (fd == -1) {
         return;
@@ -63,7 +62,17 @@ void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **
         return;
     }
 
-    outgoing_args.fd = fd;
+    /*
+     * This is dup()ed just to avoid referencing an fd that might
+     * be already closed by the iochannel.
+     */
+    newfd = dup(fd);
+    if (newfd == -1) {
+        error_setg_errno(errp, errno, "Could not dup FD %d", fd);
+        object_unref(ioc);
+        return;
+    }
+    outgoing_args.fd = newfd;
 
     qio_channel_set_name(ioc, "migration-fd-outgoing");
     migration_channel_connect(s, ioc, NULL, NULL);
--
2.44.0
[PULL 03/10] physmem: Expose tlb_reset_dirty_range_all()
From: Philippe Mathieu-Daudé

In order to call tlb_reset_dirty_range_all() outside of
system/physmem.c, expose its prototype.

Signed-off-by: Philippe Mathieu-Daudé
Reviewed-by: Richard Henderson
Link: https://lore.kernel.org/r/20240312201458.79532-2-phi...@linaro.org
Signed-off-by: Peter Xu
---
 include/exec/exec-all.h | 1 +
 system/physmem.c        | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index ce36bb10d4..3e53501691 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -655,6 +655,7 @@ static inline void mmap_unlock(void) {}
 
 void tlb_reset_dirty(CPUState *cpu, ram_addr_t start1, ram_addr_t length);
 void tlb_set_dirty(CPUState *cpu, vaddr addr);
+void tlb_reset_dirty_range_all(ram_addr_t start, ram_addr_t length);
 
 MemoryRegionSection *
 address_space_translate_for_iotlb(CPUState *cpu, int asidx, hwaddr addr,
diff --git a/system/physmem.c b/system/physmem.c
index 6cfb7a80ab..5441480ff0 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -819,7 +819,7 @@ found:
     return block;
 }
 
-static void tlb_reset_dirty_range_all(ram_addr_t start, ram_addr_t length)
+void tlb_reset_dirty_range_all(ram_addr_t start, ram_addr_t length)
 {
     CPUState *cpu;
     ram_addr_t start1;
--
2.44.0
[PULL 01/10] io: Introduce qio_channel_file_new_dupfd
From: Fabiano Rosas

Add a new helper function for creating a QIOChannelFile channel with a
duplicated file descriptor. This saves the calling code from having to
do error checking on the dup() call.

Suggested-by: "Daniel P. Berrangé"
Signed-off-by: Fabiano Rosas
Reviewed-by: "Daniel P. Berrangé"
Link: https://lore.kernel.org/r/2024031125.17299-2-faro...@suse.de
Signed-off-by: Peter Xu
---
 include/io/channel-file.h | 18 ++++++++++++++++++
 io/channel-file.c         | 12 ++++++++++++
 2 files changed, 30 insertions(+)

diff --git a/include/io/channel-file.h b/include/io/channel-file.h
index 50e8eb1138..d373a4e44d 100644
--- a/include/io/channel-file.h
+++ b/include/io/channel-file.h
@@ -68,6 +68,24 @@ struct QIOChannelFile {
 QIOChannelFile *
 qio_channel_file_new_fd(int fd);
 
+/**
+ * qio_channel_file_new_dupfd:
+ * @fd: the file descriptor
+ * @errp: pointer to initialized error object
+ *
+ * Create a new IO channel object for a file represented by the @fd
+ * parameter. Like qio_channel_file_new_fd(), but the @fd is first
+ * duplicated with dup().
+ *
+ * The channel will own the duplicated file descriptor and will take
+ * responsibility for closing it, the original FD is owned by the
+ * caller.
+ *
+ * Returns: the new channel object
+ */
+QIOChannelFile *
+qio_channel_file_new_dupfd(int fd, Error **errp);
+
 /**
  * qio_channel_file_new_path:
  * @path: the file path
diff --git a/io/channel-file.c b/io/channel-file.c
index a6ad7770c6..6436cfb6ae 100644
--- a/io/channel-file.c
+++ b/io/channel-file.c
@@ -45,6 +45,18 @@ qio_channel_file_new_fd(int fd)
     return ioc;
 }
 
+QIOChannelFile *
+qio_channel_file_new_dupfd(int fd, Error **errp)
+{
+    int newfd = dup(fd);
+
+    if (newfd < 0) {
+        error_setg_errno(errp, errno, "Could not dup FD %d", fd);
+        return NULL;
+    }
+
+    return qio_channel_file_new_fd(newfd);
+}
 
 QIOChannelFile *
 qio_channel_file_new_path(const char *path,
--
2.44.0
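The helper's core idea, checking the result of dup() before the new descriptor is handed to anything else, can be shown with plain POSIX calls. This is a sketch, not the QEMU helper: `dup_checked()` and its error-buffer parameters are hypothetical and replace the `Error **errp` mechanism used in the patch.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Duplicate fd. On success return the new descriptor, which the caller
 * now owns and must close. On failure write a message into errbuf and
 * return -1, so an invalid descriptor never reaches later code.
 */
static int dup_checked(int fd, char *errbuf, size_t errlen)
{
    int newfd = dup(fd);

    if (newfd < 0) {
        snprintf(errbuf, errlen, "Could not dup FD %d: %s",
                 fd, strerror(errno));
        return -1;
    }
    return newfd;
}
```

Keeping the check inside one helper means each call site only has to test a single return value, which is the simplification the commit message describes.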
[PULL 04/10] physmem: Factor cpu_physical_memory_dirty_bits_cleared() out
From: Nicholas Piggin

Signed-off-by: Nicholas Piggin
Tested-by: Thomas Huth
Message-ID: <20240219061731.232570-1-npig...@gmail.com>
[PMD: Split patch in 2: part 1/2]
Signed-off-by: Philippe Mathieu-Daudé
Reviewed-by: Richard Henderson
Link: https://lore.kernel.org/r/20240312201458.79532-3-phi...@linaro.org
Signed-off-by: Peter Xu
---
 include/exec/ram_addr.h | 9 +++++++++
 system/physmem.c        | 8 +++-----
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index 90676093f5..b060ea9176 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -25,6 +25,7 @@
 #include "sysemu/tcg.h"
 #include "exec/ramlist.h"
 #include "exec/ramblock.h"
+#include "exec/exec-all.h"
 
 extern uint64_t total_dirty_pages;
 
@@ -443,6 +444,14 @@ uint64_t cpu_physical_memory_set_dirty_lebitmap(unsigned long *bitmap,
 }
 #endif /* not _WIN32 */
 
+static inline void cpu_physical_memory_dirty_bits_cleared(ram_addr_t start,
+                                                          ram_addr_t length)
+{
+    if (tcg_enabled()) {
+        tlb_reset_dirty_range_all(start, length);
+    }
+
+}
 bool cpu_physical_memory_test_and_clear_dirty(ram_addr_t start,
                                               ram_addr_t length,
                                               unsigned client);
diff --git a/system/physmem.c b/system/physmem.c
index 5441480ff0..a4fe3d2bf8 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -881,8 +881,8 @@ bool cpu_physical_memory_test_and_clear_dirty(ram_addr_t start,
         memory_region_clear_dirty_bitmap(ramblock->mr, mr_offset, mr_size);
     }
 
-    if (dirty && tcg_enabled()) {
-        tlb_reset_dirty_range_all(start, length);
+    if (dirty) {
+        cpu_physical_memory_dirty_bits_cleared(start, length);
     }
 
     return dirty;
@@ -929,9 +929,7 @@ DirtyBitmapSnapshot *cpu_physical_memory_snapshot_and_clear_dirty
         }
     }
 
-    if (tcg_enabled()) {
-        tlb_reset_dirty_range_all(start, length);
-    }
+    cpu_physical_memory_dirty_bits_cleared(start, length);
 
     memory_region_clear_dirty_bitmap(mr, offset, length);
 
--
2.44.0
[PULL 02/10] migration: Fix error handling after dup in file migration
From: Fabiano Rosas The file migration code was allowing a possible -1 from a failed call to dup() to propagate into the new QIOFileChannel::fd before checking for validity. Coverity doesn't like that, possibly due to the the lseek(-1, ...) call that would ensue before returning from the channel creation routine. Use the newly introduced qio_channel_file_dupfd() to properly check the return of dup() before proceeding. Fixes: CID 1539961 Fixes: CID 1539965 Fixes: CID 1539960 Fixes: 2dd7ee7a51 ("migration/multifd: Add incoming QIOChannelFile support") Fixes: decdc76772 ("migration/multifd: Add mapped-ram support to fd: URI") Reported-by: Peter Maydell Signed-off-by: Fabiano Rosas Reviewed-by: "Daniel P. Berrangé" Link: https://lore.kernel.org/r/2024031125.17299-3-faro...@suse.de Signed-off-by: Peter Xu --- migration/fd.c | 9 - migration/file.c | 14 +++--- 2 files changed, 11 insertions(+), 12 deletions(-) diff --git a/migration/fd.c b/migration/fd.c index d4ae72d132..4e2a63a73d 100644 --- a/migration/fd.c +++ b/migration/fd.c @@ -80,6 +80,7 @@ static gboolean fd_accept_incoming_migration(QIOChannel *ioc, void fd_start_incoming_migration(const char *fdname, Error **errp) { QIOChannel *ioc; +QIOChannelFile *fioc; int fd = monitor_fd_param(monitor_cur(), fdname, errp); if (fd == -1) { return; @@ -103,15 +104,13 @@ void fd_start_incoming_migration(const char *fdname, Error **errp) int channels = migrate_multifd_channels(); while (channels--) { -ioc = QIO_CHANNEL(qio_channel_file_new_fd(dup(fd))); - -if (QIO_CHANNEL_FILE(ioc)->fd == -1) { -error_setg(errp, "Failed to duplicate fd %d", fd); +fioc = qio_channel_file_new_dupfd(fd, errp); +if (!fioc) { return; } qio_channel_set_name(ioc, "migration-fd-incoming"); -qio_channel_add_watch_full(ioc, G_IO_IN, +qio_channel_add_watch_full(QIO_CHANNEL(fioc), G_IO_IN, fd_accept_incoming_migration, NULL, NULL, g_main_context_get_thread_default()); diff --git a/migration/file.c b/migration/file.c index b0b963e0ce..e56c5eb0a5 100644 --- 
a/migration/file.c +++ b/migration/file.c @@ -58,12 +58,13 @@ bool file_send_channel_create(gpointer opaque, Error **errp) int fd = fd_args_get_fd(); if (fd && fd != -1) { -ioc = qio_channel_file_new_fd(dup(fd)); +ioc = qio_channel_file_new_dupfd(fd, errp); } else { ioc = qio_channel_file_new_path(outgoing_args.fname, flags, 0, errp); -if (!ioc) { -goto out; -} +} + +if (!ioc) { +goto out; } multifd_channel_connect(opaque, QIO_CHANNEL(ioc)); @@ -147,10 +148,9 @@ void file_start_incoming_migration(FileMigrationArgs *file_args, Error **errp) NULL, NULL, g_main_context_get_thread_default()); -fioc = qio_channel_file_new_fd(dup(fioc->fd)); +fioc = qio_channel_file_new_dupfd(fioc->fd, errp); -if (!fioc || fioc->fd == -1) { -error_setg(errp, "Error creating migration incoming channel"); +if (!fioc) { break; } } while (++i < channels); -- 2.44.0
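Outside QEMU, the pattern this patch adopts (check the result of dup() before building anything around the descriptor) can be sketched as below. This is only an illustration in plain POSIX C, not the QEMU code: `struct file_chan` and `file_chan_init_dupfd()` are hypothetical names standing in for QIOChannelFile and qio_channel_file_new_dupfd().

```c
#include <unistd.h>

/* Hypothetical stand-in for a file-backed channel; not a QEMU type. */
struct file_chan {
    int fd;
};

/*
 * Duplicate fd and store the copy in *chan only if dup() succeeded.
 * Returns 0 on success. Returns -1 on failure, leaving *chan untouched
 * and errno as set by dup(), so a -1 descriptor never reaches the channel.
 */
static int file_chan_init_dupfd(struct file_chan *chan, int fd)
{
    int newfd = dup(fd);

    if (newfd < 0) {
        return -1;
    }

    chan->fd = newfd;
    return 0;
}
```

Callers then need a single failure check on the return value, rather than inspecting the fd field of an already constructed object.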
[PULL 06/10] migration: Skip only empty block devices
From: Cédric Le Goater

The block .save_setup() handler calls a helper routine
init_blk_migration() which builds a list of block devices to take into
account for migration. When one device is found to be empty (sectors
== 0), the loop exits and all the remaining devices are ignored. This
is a regression introduced when bdrv_iterate() was removed.

Change that by skipping only empty devices.

Cc: Markus Armbruster
Cc: qemu-stable
Suggested-by: Kevin Wolf
Fixes: fea68bb6e9fa ("block: Eliminate bdrv_iterate(), use bdrv_next()")
Signed-off-by: Cédric Le Goater
Reviewed-by: Stefan Hajnoczi
Reviewed-by: Kevin Wolf
Link: https://lore.kernel.org/r/20240312120431.550054-1-...@redhat.com
[peterx: fix "Suggested-by:"]
Signed-off-by: Peter Xu
---
 migration/block.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/migration/block.c b/migration/block.c
index 8c6ebafacc..2b9054889a 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -402,7 +402,10 @@ static int init_blk_migration(QEMUFile *f)
         }
 
         sectors = bdrv_nb_sectors(bs);
-        if (sectors <= 0) {
+        if (sectors == 0) {
+            continue;
+        }
+        if (sectors < 0) {
             ret = sectors;
             bdrv_next_cleanup(&it);
             goto out;
-- 
2.44.0
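The behavioural difference between leaving the loop on an empty device and skipping it can be shown with a small standalone sketch. It is plain C over an array of sector counts, not the QEMU block layer; `count_migratable()` is a hypothetical helper invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Count the devices that would be migrated, given each device's size in
 * sectors. Empty devices (0) are skipped; a negative size is treated as an
 * error code and returned immediately, mirroring the patched loop.
 */
static int64_t count_migratable(const int64_t *sectors, size_t n)
{
    int64_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (sectors[i] == 0) {
            continue;           /* skip only this device, keep scanning */
        }
        if (sectors[i] < 0) {
            return sectors[i];  /* propagate the error */
        }
        count++;
    }
    return count;
}
```

With sizes {10, 0, 20} the sketch counts two devices. The pre-patch logic stopped at the empty entry, so the third device was never considered.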
[PULL 09/10] migration/multifd: Ensure we're not given a socket for file migration
From: Fabiano Rosas When doing migration using the fd: URI, QEMU will fetch the file descriptor passed in via the monitor at fd_start_outgoing|incoming_migration(), which means the checks at migration_channels_and_transport_compatible() happen too soon and we don't know at that point whether the FD refers to a plain file or a socket. For this reason, we've been allowing a migration channel of type SOCKET_ADDRESS_TYPE_FD to pass the initial verifications in scenarios where the socket migration is not supported, such as with fd + multifd. The commit decdc76772 ("migration/multifd: Add mapped-ram support to fd: URI") was supposed to add a second check prior to starting migration to make sure a socket fd is not passed instead of a file fd, but failed to do so. Add the missing verification and update the comment explaining this situation which is currently incorrect. Fixes: decdc76772 ("migration/multifd: Add mapped-ram support to fd: URI") Signed-off-by: Fabiano Rosas Reviewed-by: Peter Xu Link: https://lore.kernel.org/r/20240315032040.7974-2-faro...@suse.de Signed-off-by: Peter Xu --- migration/fd.c| 8 migration/file.c | 7 +++ migration/migration.c | 6 +++--- 3 files changed, 18 insertions(+), 3 deletions(-) diff --git a/migration/fd.c b/migration/fd.c index 39a52e5c90..c07030f715 100644 --- a/migration/fd.c +++ b/migration/fd.c @@ -22,6 +22,7 @@ #include "migration.h" #include "monitor/monitor.h" #include "io/channel-file.h" +#include "io/channel-socket.h" #include "io/channel-util.h" #include "options.h" #include "trace.h" @@ -95,6 +96,13 @@ void fd_start_incoming_migration(const char *fdname, Error **errp) } if (migrate_multifd()) { +if (fd_is_socket(fd)) { +error_setg(errp, + "Multifd migration to a socket FD is not supported"); +object_unref(ioc); +return; +} + file_create_incoming_channels(ioc, errp); } else { qio_channel_set_name(ioc, "migration-fd-incoming"); diff --git a/migration/file.c b/migration/file.c index ddde0ca818..b6e8ba13f2 100644 --- 
a/migration/file.c +++ b/migration/file.c @@ -15,6 +15,7 @@ #include "file.h" #include "migration.h" #include "io/channel-file.h" +#include "io/channel-socket.h" #include "io/channel-util.h" #include "options.h" #include "trace.h" @@ -58,6 +59,12 @@ bool file_send_channel_create(gpointer opaque, Error **errp) int fd = fd_args_get_fd(); if (fd && fd != -1) { +if (fd_is_socket(fd)) { +error_setg(errp, + "Multifd migration to a socket FD is not supported"); +goto out; +} + ioc = qio_channel_file_new_dupfd(fd, errp); } else { ioc = qio_channel_file_new_path(outgoing_args.fname, flags, 0, errp); diff --git a/migration/migration.c b/migration/migration.c index 644e073b7d..f60bd371e3 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -166,9 +166,9 @@ static bool transport_supports_seeking(MigrationAddress *addr) } /* - * At this point, the user might not yet have passed the file - * descriptor to QEMU, so we cannot know for sure whether it - * refers to a plain file or a socket. Let it through anyway. + * At this point QEMU has not yet fetched the fd passed in by the + * user, so we cannot know for sure whether it refers to a plain + * file or a socket. Let it through anyway and check at fd.c. */ if (addr->transport == MIGRATION_ADDRESS_TYPE_SOCKET) { return addr->u.socket.type == SOCKET_ADDRESS_TYPE_FD; -- 2.44.0
[PULL 07/10] migration: cpr-reboot documentation
From: Steve Sistare Signed-off-by: Steve Sistare Reviewed-by: Cédric Le Goater Reviewed-by: Fabiano Rosas Link: https://lore.kernel.org/r/1710338119-330923-1-git-send-email-steven.sist...@oracle.com Signed-off-by: Peter Xu --- docs/devel/migration/CPR.rst | 147 ++ docs/devel/migration/features.rst | 1 + 2 files changed, 148 insertions(+) create mode 100644 docs/devel/migration/CPR.rst diff --git a/docs/devel/migration/CPR.rst b/docs/devel/migration/CPR.rst new file mode 100644 index 00..63c36470cf --- /dev/null +++ b/docs/devel/migration/CPR.rst @@ -0,0 +1,147 @@ +CheckPoint and Restart (CPR) + + +CPR is the umbrella name for a set of migration modes in which the +VM is migrated to a new QEMU instance on the same host. It is +intended for use when the goal is to update host software components +that run the VM, such as QEMU or even the host kernel. At this time, +cpr-reboot is the only available mode. + +Because QEMU is restarted on the same host, with access to the same +local devices, CPR is allowed in certain cases where normal migration +would be blocked. However, the user must not modify the contents of +guest block devices between quitting old QEMU and starting new QEMU. + +CPR unconditionally stops VM execution before memory is saved, and +thus does not depend on any form of dirty page tracking. + +cpr-reboot mode +--- + +In this mode, QEMU stops the VM, and writes VM state to the migration +URI, which will typically be a file. After quitting QEMU, the user +resumes by running QEMU with the ``-incoming`` option. Because the +old and new QEMU instances are not active concurrently, the URI cannot +be a type that streams data from one instance to the other. + +Guest RAM can be saved in place if backed by shared memory, or can be +copied to a file. The former is more efficient and is therefore +preferred. + +After state and memory are saved, the user may update userland host +software before restarting QEMU and resuming the VM. 
Further, if +the RAM is backed by persistent shared memory, such as a DAX device, +then the user may reboot to a new host kernel before restarting QEMU. + +This mode supports VFIO devices provided the user first puts the +guest in the suspended runstate, such as by issuing the +``guest-suspend-ram`` command to the QEMU guest agent. The agent +must be pre-installed in the guest, and the guest must support +suspend to RAM. Beware that suspension can take a few seconds, so +the user should poll to see the suspended state before proceeding +with the CPR operation. + +Usage +^ + +It is recommended that guest RAM be backed with some type of shared +memory, such as ``memory-backend-file,share=on``, and that the +``x-ignore-shared`` capability be set. This combination allows memory +to be saved in place. Otherwise, after QEMU stops the VM, all guest +RAM is copied to the migration URI. + +Outgoing: + * Set the migration mode parameter to ``cpr-reboot``. + * Set the ``x-ignore-shared`` capability if desired. + * Issue the ``migrate`` command. It is recommended that the URI be a +``file`` type, but one can use other types such as ``exec``, +provided the command captures all the data from the outgoing side, +and provides all the data to the incoming side. + * Quit when QEMU reaches the postmigrate state. + +Incoming: + * Start QEMU with the ``-incoming defer`` option. + * Set the migration mode parameter to ``cpr-reboot``. + * Set the ``x-ignore-shared`` capability if desired. + * Issue the ``migrate-incoming`` command. + * If the VM was running when the outgoing ``migrate`` command was +issued, then QEMU automatically resumes VM execution. + +Example 1 +^ +:: + + # qemu-kvm -monitor stdio + -object memory-backend-file,id=ram0,size=4G,mem-path=/dev/dax0.0,align=2M,share=on -m 4G + ... 
+ + (qemu) info status + VM status: running + (qemu) migrate_set_parameter mode cpr-reboot + (qemu) migrate_set_capability x-ignore-shared on + (qemu) migrate -d file:vm.state + (qemu) info status + VM status: paused (postmigrate) + (qemu) quit + + ### optionally update kernel and reboot + # systemctl kexec + kexec_core: Starting new kernel + ... + + # qemu-kvm ... -incoming defer + (qemu) info status + VM status: paused (inmigrate) + (qemu) migrate_set_parameter mode cpr-reboot + (qemu) migrate_set_capability x-ignore-shared on + (qemu) migrate_incoming file:vm.state + (qemu) info status + VM status: running + +Example 2: VFIO +^^^ +:: + + # qemu-kvm -monitor stdio + -object memory-backend-file,id=ram0,size=4G,mem-path=/dev/dax0.0,align=2M,share=on -m 4G + -device vfio-pci, ... + -chardev socket,id=qga0,path=qga.sock,server=on,wait=off + -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0 + ... + + (qemu) info status + VM status: running + + # echo '{"execute":"guest-suspend-ram"}' | ncat --send-only -U
[PULL 08/10] migration: Fix iocs leaks during file and fd migration
From: Fabiano Rosas The memory for the io channels is being leaked in three different ways during file migration: 1) if the offset check fails we never drop the ioc reference; 2) we allocate an extra channel for no reason; 3) if multifd is enabled but channel creation fails when calling dup(), we leave the previous channels around along with the glib polling; Fix all issues by restructuring the code to first allocate the channels and only register the watches when all channels have been created. For multifd, the file and fd migrations can share code because both are backed by a QIOChannelFile. For the non-multifd case, the fd needs to be separate because it is backed by a QIOChannelSocket. Fixes: 2dd7ee7a51 ("migration/multifd: Add incoming QIOChannelFile support") Fixes: decdc76772 ("migration/multifd: Add mapped-ram support to fd: URI") Reported-by: Peter Xu Signed-off-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240313212824.16974-2-faro...@suse.de Signed-off-by: Peter Xu --- migration/file.h | 1 + migration/fd.c | 29 +++- migration/file.c | 58 ++-- 3 files changed, 46 insertions(+), 42 deletions(-) diff --git a/migration/file.h b/migration/file.h index 9f71e87f74..7699c04677 100644 --- a/migration/file.h +++ b/migration/file.h @@ -20,6 +20,7 @@ void file_start_outgoing_migration(MigrationState *s, int file_parse_offset(char *filespec, uint64_t *offsetp, Error **errp); void file_cleanup_outgoing_migration(void); bool file_send_channel_create(gpointer opaque, Error **errp); +void file_create_incoming_channels(QIOChannel *ioc, Error **errp); int file_write_ramblock_iov(QIOChannel *ioc, const struct iovec *iov, int niov, RAMBlock *block, Error **errp); int multifd_file_recv_data(MultiFDRecvParams *p, Error **errp); diff --git a/migration/fd.c b/migration/fd.c index 4e2a63a73d..39a52e5c90 100644 --- a/migration/fd.c +++ b/migration/fd.c @@ -18,6 +18,7 @@ #include "qapi/error.h" #include "channel.h" #include "fd.h" +#include "file.h" #include "migration.h" 
#include "monitor/monitor.h" #include "io/channel-file.h" @@ -80,7 +81,6 @@ static gboolean fd_accept_incoming_migration(QIOChannel *ioc, void fd_start_incoming_migration(const char *fdname, Error **errp) { QIOChannel *ioc; -QIOChannelFile *fioc; int fd = monitor_fd_param(monitor_cur(), fdname, errp); if (fd == -1) { return; @@ -94,26 +94,13 @@ void fd_start_incoming_migration(const char *fdname, Error **errp) return; } -qio_channel_set_name(ioc, "migration-fd-incoming"); -qio_channel_add_watch_full(ioc, G_IO_IN, - fd_accept_incoming_migration, - NULL, NULL, - g_main_context_get_thread_default()); - if (migrate_multifd()) { -int channels = migrate_multifd_channels(); - -while (channels--) { -fioc = qio_channel_file_new_dupfd(fd, errp); -if (!fioc) { -return; -} - -qio_channel_set_name(ioc, "migration-fd-incoming"); -qio_channel_add_watch_full(QIO_CHANNEL(fioc), G_IO_IN, - fd_accept_incoming_migration, - NULL, NULL, - g_main_context_get_thread_default()); -} +file_create_incoming_channels(ioc, errp); +} else { +qio_channel_set_name(ioc, "migration-fd-incoming"); +qio_channel_add_watch_full(ioc, G_IO_IN, + fd_accept_incoming_migration, + NULL, NULL, + g_main_context_get_thread_default()); } } diff --git a/migration/file.c b/migration/file.c index e56c5eb0a5..ddde0ca818 100644 --- a/migration/file.c +++ b/migration/file.c @@ -115,13 +115,46 @@ static gboolean file_accept_incoming_migration(QIOChannel *ioc, return G_SOURCE_REMOVE; } +void file_create_incoming_channels(QIOChannel *ioc, Error **errp) +{ +int i, fd, channels = 1; +g_autofree QIOChannel **iocs = NULL; + +if (migrate_multifd()) { +channels += migrate_multifd_channels(); +} + +iocs = g_new0(QIOChannel *, channels); +fd = QIO_CHANNEL_FILE(ioc)->fd; +iocs[0] = ioc; + +for (i = 1; i < channels; i++) { +QIOChannelFile *fioc = qio_channel_file_new_dupfd(fd, errp); + +if (!fioc) { +while (i) { +object_unref(iocs[--i]); +} +return; +} + +iocs[i] = QIO_CHANNEL(fioc); +} + +for (i = 0; i < channels; i++) { 
+qio_channel_set_name(iocs[i], "migration-file-incoming"); +qio_channel_add_watch_full(iocs[i], G_IO_IN, + file_accept_incoming_migration, + NULL, NULL, +
[PULL 00/10] Migration 20240317 patches
From: Peter Xu The following changes since commit 35ac6831d98e18e2c78c85c93e3a6ca1f1ae3e58: Merge tag 'net-pull-request' of https://github.com/jasowang/qemu into staging (2024-03-12 13:42:57 +) are available in the Git repository at: https://gitlab.com/peterx/qemu.git tags/migration-20240317-pull-request for you to fetch changes up to 9adfb308c1513562d6acec02aa780c5ef9b0193d: migration/multifd: Duplicate the fd for the outgoing_args (2024-03-15 11:26:33 -0400) Migration pull for 9.0-rc0 - Nicholas/Phil's fix on migration corruption / inconsistent for tcg - Cedric's fix on block migration over n_sectors==0 - Steve's CPR reboot documentation page - Fabiano's misc fixes on mapped-ram (IOC leak, dup() errors, fd checks, fd use race, etc.) Cédric Le Goater (1): migration: Skip only empty block devices Fabiano Rosas (5): io: Introduce qio_channel_file_new_dupfd migration: Fix error handling after dup in file migration migration: Fix iocs leaks during file and fd migration migration/multifd: Ensure we're not given a socket for file migration migration/multifd: Duplicate the fd for the outgoing_args Nicholas Piggin (2): physmem: Factor cpu_physical_memory_dirty_bits_cleared() out physmem: Fix migration dirty bitmap coherency with TCG memory access Philippe Mathieu-Daudé (1): physmem: Expose tlb_reset_dirty_range_all() Steve Sistare (1): migration: cpr-reboot documentation docs/devel/migration/CPR.rst | 147 ++ docs/devel/migration/features.rst | 1 + include/exec/exec-all.h | 1 + include/exec/ram_addr.h | 12 +++ include/io/channel-file.h | 18 migration/file.h | 1 + io/channel-file.c | 12 +++ migration/block.c | 5 +- migration/fd.c| 51 ++- migration/file.c | 75 +-- migration/migration.c | 6 +- system/physmem.c | 10 +- 12 files changed, 279 insertions(+), 60 deletions(-) create mode 100644 docs/devel/migration/CPR.rst -- 2.44.0
[PULL 20/34] migration: export migration_is_running
From: Steve Sistare Delete the MigrationState parameter from migration_is_running and move it to the public API in misc.h. Signed-off-by: Steve Sistare Link: https://lore.kernel.org/r/1710179338-294359-5-git-send-email-steven.sist...@oracle.com Signed-off-by: Peter Xu --- include/migration/misc.h | 1 + migration/migration.h | 2 -- migration/migration.c | 10 ++ migration/options.c| 4 ++-- migration/savevm.c | 2 +- system/dirtylimit.c| 2 +- target/riscv/kvm/kvm-cpu.c | 4 ++-- 7 files changed, 13 insertions(+), 12 deletions(-) diff --git a/include/migration/misc.h b/include/migration/misc.h index e1f1bf853e..7526977de6 100644 --- a/include/migration/misc.h +++ b/include/migration/misc.h @@ -106,6 +106,7 @@ int migration_call_notifiers(MigrationState *s, MigrationEventType type, bool migration_in_setup(MigrationState *); bool migration_has_finished(MigrationState *); bool migration_has_failed(MigrationState *); +bool migration_is_running(void); /* ...and after the device transmission */ /* True if incoming migration entered POSTCOPY_INCOMING_DISCARD */ bool migration_in_incoming_postcopy(void); diff --git a/migration/migration.h b/migration/migration.h index 736460aa8b..e4983db9c9 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -479,8 +479,6 @@ bool migrate_has_error(MigrationState *s); void migrate_fd_connect(MigrationState *s, Error *error_in); -bool migration_is_running(int state); - int migrate_init(MigrationState *s, Error **errp); bool migration_is_blocked(Error **errp); /* True if outgoing migration has entered postcopy phase */ diff --git a/migration/migration.c b/migration/migration.c index 17859cbaee..546ba86c63 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1103,9 +1103,11 @@ bool migration_is_setup_or_active(void) } } -bool migration_is_running(int state) +bool migration_is_running(void) { -switch (state) { +MigrationState *s = current_migration; + +switch (s->state) { case MIGRATION_STATUS_ACTIVE: case 
MIGRATION_STATUS_POSTCOPY_ACTIVE: case MIGRATION_STATUS_POSTCOPY_PAUSED: @@ -1477,7 +1479,7 @@ static void migrate_fd_cancel(MigrationState *s) do { old_state = s->state; -if (!migration_is_running(old_state)) { +if (!migration_is_running()) { break; } /* If the migration is paused, kick it out of the pause */ @@ -1962,7 +1964,7 @@ static bool migrate_prepare(MigrationState *s, bool blk, bool blk_inc, return true; } -if (migration_is_running(s->state)) { +if (migration_is_running()) { error_setg(errp, QERR_MIGRATION_ACTIVE); return false; } diff --git a/migration/options.c b/migration/options.c index 40eb930940..642cfb00a3 100644 --- a/migration/options.c +++ b/migration/options.c @@ -681,7 +681,7 @@ bool migrate_cap_set(int cap, bool value, Error **errp) MigrationState *s = migrate_get_current(); bool new_caps[MIGRATION_CAPABILITY__MAX]; -if (migration_is_running(s->state)) { +if (migration_is_running()) { error_setg(errp, QERR_MIGRATION_ACTIVE); return false; } @@ -725,7 +725,7 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params, MigrationCapabilityStatusList *cap; bool new_caps[MIGRATION_CAPABILITY__MAX]; -if (migration_is_running(s->state) || migration_in_colo_state()) { +if (migration_is_running() || migration_in_colo_state()) { error_setg(errp, QERR_MIGRATION_ACTIVE); return; } diff --git a/migration/savevm.c b/migration/savevm.c index 76b57a9888..388d7af7cd 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -1706,7 +1706,7 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp) MigrationState *ms = migrate_get_current(); MigrationStatus status; -if (migration_is_running(ms->state)) { +if (migration_is_running()) { error_setg(errp, QERR_MIGRATION_ACTIVE); return -EINVAL; } diff --git a/system/dirtylimit.c b/system/dirtylimit.c index 051e0311c1..1622bb7426 100644 --- a/system/dirtylimit.c +++ b/system/dirtylimit.c @@ -451,7 +451,7 @@ static bool dirtylimit_is_allowed(void) { MigrationState *ms = migrate_get_current(); -if 
(migration_is_running(ms->state) && +if (migration_is_running() && (!qemu_thread_is_self(&ms->thread)) && migrate_dirty_limit() && dirtylimit_in_service()) { diff --git a/target/riscv/kvm/kvm-cpu.c b/target/riscv/kvm/kvm-cpu.c index c7afdb1e81..cda7d78a77 100644 --- a/target/riscv/kvm/kvm-cpu.c +++ b/target/riscv/kvm/kvm-cpu.c @@ -44,7 +44,7 @@ #include "kvm_riscv.h" #include "sbi_ecall_interface.h" #include "chardev/char-fe.h" -#include "migration/migration.h" +#include "migration/misc.h" #include "sysemu/runstate.h" #include "hw/riscv/numa.h" @@ -729,7 +729,7 @@ static void kvm_riscv_put_regs_timer(CPUState
[PULL 31/34] migration/multifd: Implement zero page transmission on the multifd thread.
From: Hao Xiang 1. Add zero_pages field in MultiFDPacket_t. 2. Implements the zero page detection and handling on the multifd threads for non-compression, zlib and zstd compression backends. 3. Added a new value 'multifd' in ZeroPageDetection enumeration. 4. Adds zero page counters and updates multifd send/receive tracing format to track the newly added counters. Signed-off-by: Hao Xiang Acked-by: Markus Armbruster Reviewed-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240311180015.3359271-5-hao.xi...@linux.dev Signed-off-by: Peter Xu --- qapi/migration.json | 7 ++- migration/multifd.h | 23 +++- hw/core/qdev-properties-system.c | 2 +- migration/multifd-zero-page.c| 87 ++ migration/multifd-zlib.c | 21 ++-- migration/multifd-zstd.c | 20 +-- migration/multifd.c | 90 +++- migration/ram.c | 1 - migration/meson.build| 1 + migration/trace-events | 8 +-- 10 files changed, 228 insertions(+), 32 deletions(-) create mode 100644 migration/multifd-zero-page.c diff --git a/qapi/migration.json b/qapi/migration.json index 83fdef73b9..2684e4e9ac 100644 --- a/qapi/migration.json +++ b/qapi/migration.json @@ -677,10 +677,15 @@ # # @legacy: Perform zero page checking in main migration thread. # +# @multifd: Perform zero page checking in multifd sender thread if +# multifd migration is enabled, else in the main migration +# thread as for @legacy. 
+# # Since: 9.0 +# ## { 'enum': 'ZeroPageDetection', - 'data': [ 'none', 'legacy' ] } + 'data': [ 'none', 'legacy', 'multifd' ] } ## # @BitmapMigrationBitmapAliasTransform: diff --git a/migration/multifd.h b/migration/multifd.h index 7447c2bea3..c9d9b09239 100644 --- a/migration/multifd.h +++ b/migration/multifd.h @@ -55,14 +55,24 @@ typedef struct { /* size of the next packet that contains pages */ uint32_t next_packet_size; uint64_t packet_num; -uint64_t unused[4];/* Reserved for future use */ +/* zero pages */ +uint32_t zero_pages; +uint32_t unused32[1];/* Reserved for future use */ +uint64_t unused64[3];/* Reserved for future use */ char ramblock[256]; +/* + * This array contains the pointers to: + * - normal pages (initial normal_pages entries) + * - zero pages (following zero_pages entries) + */ uint64_t offset[]; } __attribute__((packed)) MultiFDPacket_t; typedef struct { /* number of used pages */ uint32_t num; +/* number of normal pages */ +uint32_t normal_num; /* number of allocated pages */ uint32_t allocated; /* offset of each page */ @@ -136,6 +146,8 @@ typedef struct { uint64_t packets_sent; /* non zero pages sent through this channel */ uint64_t total_normal_pages; +/* zero pages sent through this channel */ +uint64_t total_zero_pages; /* buffers to send */ struct iovec *iov; /* number of iovs used */ @@ -194,12 +206,18 @@ typedef struct { uint8_t *host; /* non zero pages recv through this channel */ uint64_t total_normal_pages; +/* zero pages recv through this channel */ +uint64_t total_zero_pages; /* buffers to recv */ struct iovec *iov; /* Pages that are not zero */ ram_addr_t *normal; /* num of non zero pages */ uint32_t normal_num; +/* Pages that are zero */ +ram_addr_t *zero; +/* num of zero pages */ +uint32_t zero_num; /* used for de-compression methods */ void *compress_data; } MultiFDRecvParams; @@ -221,6 +239,9 @@ typedef struct { void multifd_register_ops(int method, MultiFDMethods *ops); void multifd_send_fill_packet(MultiFDSendParams 
*p); +bool multifd_send_prepare_common(MultiFDSendParams *p); +void multifd_send_zero_page_detect(MultiFDSendParams *p); +void multifd_recv_zero_page_process(MultiFDRecvParams *p); static inline void multifd_send_prepare_header(MultiFDSendParams *p) { diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c index 71a21bf24e..7eca2f2377 100644 --- a/hw/core/qdev-properties-system.c +++ b/hw/core/qdev-properties-system.c @@ -696,7 +696,7 @@ const PropertyInfo qdev_prop_granule_mode = { const PropertyInfo qdev_prop_zero_page_detection = { .name = "ZeroPageDetection", .description = "zero_page_detection values, " - "none,legacy", + "none,legacy,multifd", .enum_table = &ZeroPageDetection_lookup, .get = qdev_propinfo_get_enum, .set = qdev_propinfo_set_enum, diff --git a/migration/multifd-zero-page.c b/migration/multifd-zero-page.c new file mode 100644 index 00..1ba38be636 --- /dev/null +++ b/migration/multifd-zero-page.c @@ -0,0 +1,87 @@ +/* + * Multifd zero page detection implementation. + * + * Copyright (c) 2024 Bytedance Inc + * + * Authors: + * Hao Xiang + * + * This work is licensed under the terms
[PULL 30/34] migration/multifd: Add new migration option zero-page-detection.
From: Hao Xiang This new parameter controls where the zero page checking is running. 1. If this parameter is set to 'legacy', zero page checking is done in the migration main thread. 2. If this parameter is set to 'none', zero page checking is disabled. Signed-off-by: Hao Xiang Reviewed-by: Peter Xu Acked-by: Markus Armbruster Link: https://lore.kernel.org/r/20240311180015.3359271-4-hao.xi...@linux.dev Signed-off-by: Peter Xu --- qapi/migration.json | 33 ++--- include/hw/qdev-properties-system.h | 4 migration/options.h | 1 + hw/core/qdev-properties-system.c| 10 + migration/migration-hmp-cmds.c | 9 migration/options.c | 21 ++ migration/ram.c | 4 7 files changed, 79 insertions(+), 3 deletions(-) diff --git a/qapi/migration.json b/qapi/migration.json index 51d188b902..83fdef73b9 100644 --- a/qapi/migration.json +++ b/qapi/migration.json @@ -670,6 +670,18 @@ { 'enum': 'MigMode', 'data': [ 'normal', 'cpr-reboot' ] } +## +# @ZeroPageDetection: +# +# @none: Do not perform zero page checking. +# +# @legacy: Perform zero page checking in main migration thread. +# +# Since: 9.0 +## +{ 'enum': 'ZeroPageDetection', + 'data': [ 'none', 'legacy' ] } + ## # @BitmapMigrationBitmapAliasTransform: # @@ -891,6 +903,10 @@ # @mode: Migration mode. See description in @MigMode. Default is 'normal'. #(Since 8.2) # +# @zero-page-detection: Whether and how to detect zero pages. +# See description in @ZeroPageDetection. Default is 'legacy'. +# (since 9.0) +# # Features: # # @deprecated: Member @block-incremental is deprecated. Use @@ -924,7 +940,8 @@ 'block-bitmap-mapping', { 'name': 'x-vcpu-dirty-limit-period', 'features': ['unstable'] }, 'vcpu-dirty-limit', - 'mode'] } + 'mode', + 'zero-page-detection'] } ## # @MigrateSetParameters: @@ -1083,6 +1100,10 @@ # @mode: Migration mode. See description in @MigMode. Default is 'normal'. #(Since 8.2) # +# @zero-page-detection: Whether and how to detect zero pages. +# See description in @ZeroPageDetection. Default is 'legacy'. 
+# (since 9.0) +# # Features: # # @deprecated: Member @block-incremental is deprecated. Use @@ -1136,7 +1157,8 @@ '*x-vcpu-dirty-limit-period': { 'type': 'uint64', 'features': [ 'unstable' ] }, '*vcpu-dirty-limit': 'uint64', -'*mode': 'MigMode'} } +'*mode': 'MigMode', +'*zero-page-detection': 'ZeroPageDetection'} } ## # @migrate-set-parameters: @@ -1311,6 +1333,10 @@ # @mode: Migration mode. See description in @MigMode. Default is 'normal'. #(Since 8.2) # +# @zero-page-detection: Whether and how to detect zero pages. +# See description in @ZeroPageDetection. Default is 'legacy'. +# (since 9.0) +# # Features: # # @deprecated: Member @block-incremental is deprecated. Use @@ -1361,7 +1387,8 @@ '*x-vcpu-dirty-limit-period': { 'type': 'uint64', 'features': [ 'unstable' ] }, '*vcpu-dirty-limit': 'uint64', -'*mode': 'MigMode'} } +'*mode': 'MigMode', +'*zero-page-detection': 'ZeroPageDetection'} } ## # @query-migrate-parameters: diff --git a/include/hw/qdev-properties-system.h b/include/hw/qdev-properties-system.h index 626be87dd3..438f65389f 100644 --- a/include/hw/qdev-properties-system.h +++ b/include/hw/qdev-properties-system.h @@ -9,6 +9,7 @@ extern const PropertyInfo qdev_prop_reserved_region; extern const PropertyInfo qdev_prop_multifd_compression; extern const PropertyInfo qdev_prop_mig_mode; extern const PropertyInfo qdev_prop_granule_mode; +extern const PropertyInfo qdev_prop_zero_page_detection; extern const PropertyInfo qdev_prop_losttickpolicy; extern const PropertyInfo qdev_prop_blockdev_on_error; extern const PropertyInfo qdev_prop_bios_chs_trans; @@ -50,6 +51,9 @@ extern const PropertyInfo qdev_prop_iothread_vq_mapping_list; MigMode) #define DEFINE_PROP_GRANULE_MODE(_n, _s, _f, _d) \ DEFINE_PROP_SIGNED(_n, _s, _f, _d, qdev_prop_granule_mode, GranuleMode) +#define DEFINE_PROP_ZERO_PAGE_DETECTION(_n, _s, _f, _d) \ +DEFINE_PROP_SIGNED(_n, _s, _f, _d, qdev_prop_zero_page_detection, \ + ZeroPageDetection) #define DEFINE_PROP_LOSTTICKPOLICY(_n, _s, _f, _d) \ 
DEFINE_PROP_SIGNED(_n, _s, _f, _d, qdev_prop_losttickpolicy, \ LostTickPolicy) diff --git a/migration/options.h b/migration/options.h index b6b69c2bb7..ab8199e207 100644 --- a/migration/options.h +++ b/migration/options.h @@ -90,6 +90,7 @@ const char *migrate_tls_authz(void); const char *migrate_tls_creds(void); const char *migrate_tls_hostname(void); uint64_t
[PULL 25/34] migration: privatize colo interfaces
From: Steve Sistare Remove private migration interfaces from net/colo-compare.c and push them to migration/colo.c. Signed-off-by: Steve Sistare Link: https://lore.kernel.org/r/1710179338-294359-10-git-send-email-steven.sist...@oracle.com Signed-off-by: Peter Xu --- migration/colo.c | 17 +++-- net/colo-compare.c | 3 +-- stubs/colo.c | 1 - 3 files changed, 12 insertions(+), 9 deletions(-) diff --git a/migration/colo.c b/migration/colo.c index 315e31fe32..84632a603e 100644 --- a/migration/colo.c +++ b/migration/colo.c @@ -63,9 +63,9 @@ static bool colo_runstate_is_stopped(void) return runstate_check(RUN_STATE_COLO) || !runstate_is_running(); } -static void colo_checkpoint_notify(void *opaque) +static void colo_checkpoint_notify(void) { -MigrationState *s = opaque; +MigrationState *s = migrate_get_current(); int64_t next_notify_time; qemu_event_set(&s->colo_checkpoint_event); @@ -74,10 +74,15 @@ static void colo_checkpoint_notify(void *opaque) timer_mod(s->colo_delay_timer, next_notify_time); } +static void colo_checkpoint_notify_timer(void *opaque) +{ +colo_checkpoint_notify(); +} + void colo_checkpoint_delay_set(void) { if (migration_in_colo_state()) { -colo_checkpoint_notify(migrate_get_current()); +colo_checkpoint_notify(); } } @@ -162,7 +167,7 @@ static void primary_vm_do_failover(void) * kick COLO thread which might wait at * qemu_sem_wait(&s->colo_checkpoint_sem). 
 */ -colo_checkpoint_notify(s); +colo_checkpoint_notify(); /* * Wake up COLO thread which may blocked in recv() or send(), @@ -518,7 +523,7 @@ out: static void colo_compare_notify_checkpoint(Notifier *notifier, void *data) { -colo_checkpoint_notify(data); +colo_checkpoint_notify(); } static void colo_process_checkpoint(MigrationState *s) @@ -642,7 +647,7 @@ void migrate_start_colo_process(MigrationState *s) bql_unlock(); qemu_event_init(&s->colo_checkpoint_event, false); s->colo_delay_timer = timer_new_ms(QEMU_CLOCK_HOST, -colo_checkpoint_notify, s); +colo_checkpoint_notify_timer, NULL); qemu_sem_init(&s->colo_exit_sem, 0); colo_process_checkpoint(s); diff --git a/net/colo-compare.c b/net/colo-compare.c index f2dfc0ebdc..c4ad0ab71f 100644 --- a/net/colo-compare.c +++ b/net/colo-compare.c @@ -28,7 +28,6 @@ #include "sysemu/iothread.h" #include "net/colo-compare.h" #include "migration/colo.h" -#include "migration/migration.h" #include "util.h" #include "block/aio-wait.h" @@ -189,7 +188,7 @@ static void colo_compare_inconsistency_notify(CompareState *s) notify_remote_frame(s); } else { notifier_list_notify(&colo_compare_notifiers, - migrate_get_current()); + NULL); } } diff --git a/stubs/colo.c b/stubs/colo.c index 08c9f982d5..f8c069b739 100644 --- a/stubs/colo.c +++ b/stubs/colo.c @@ -2,7 +2,6 @@ #include "qemu/notify.h" #include "net/colo-compare.h" #include "migration/colo.h" -#include "migration/migration.h" #include "qemu/error-report.h" #include "qapi/qapi-commands-migration.h" -- 2.44.0
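The shape of this refactoring is a public entry point that no longer takes the state object, plus a thin adapter for APIs that still expect a `void *opaque` callback. A standalone sketch follows. All names in it (`timer_cb`, `fire_timer()`, `notify_count`) are hypothetical stand-ins and not QEMU APIs.

```c
#include <stddef.h>

/* Hypothetical timer callback type taking an opaque pointer. */
typedef void (*timer_cb)(void *opaque);

static int notify_count;   /* stands in for module-private state */

/* Entry point: callers no longer pass the state object. */
static void checkpoint_notify(void)
{
    notify_count++;
}

/* Adapter with the signature the timer API expects; opaque is unused. */
static void checkpoint_notify_timer(void *opaque)
{
    (void)opaque;
    checkpoint_notify();
}

/* Hypothetical timer: fires the callback once with its opaque pointer. */
static void fire_timer(timer_cb cb, void *opaque)
{
    cb(opaque);
}
```

Keeping the adapter separate means code outside the module never needs the state type, which is what lets net/colo-compare.c drop its include of migration/migration.h.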
[PULL 27/34] migration: purge MigrationState from public interface
From: Steve Sistare

Move remaining MigrationState references from the public file misc.h
to the private file migration.h.

Signed-off-by: Steve Sistare
Link: https://lore.kernel.org/r/1710179338-294359-12-git-send-email-steven.sist...@oracle.com
Signed-off-by: Peter Xu
---
 include/migration/misc.h | 6 ++
 migration/migration.h    | 6 ++
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index d563d2c801..c9e200f4eb 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -64,7 +64,6 @@ bool migration_is_active(void);
 bool migration_is_device(void);
 bool migration_thread_is_self(void);
 bool migration_is_setup_or_active(void);
-bool migrate_mode_is_cpr(MigrationState *);

 typedef enum MigrationEventType {
     MIG_EVENT_PRECOPY_SETUP,
@@ -103,16 +102,15 @@ void migration_add_notifier_mode(NotifierWithReturn *notify,
                                  MigrationNotifyFunc func, MigMode mode);

 void migration_remove_notifier(NotifierWithReturn *notify);
-int migration_call_notifiers(MigrationState *s, MigrationEventType type,
-                             Error **errp);
-bool migration_has_failed(MigrationState *);
 bool migration_is_running(void);
 void migration_file_set_error(int err);

 /* True if incoming migration entered POSTCOPY_INCOMING_DISCARD */
 bool migration_in_incoming_postcopy(void);
+
 /* True if incoming migration entered POSTCOPY_INCOMING_ADVISE */
 bool migration_incoming_postcopy_advised(void);
+
 /* True if background snapshot is active */
 bool migration_in_bg_snapshot(void);

diff --git a/migration/migration.h b/migration/migration.h
index e4983db9c9..8045e39c26 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -26,6 +26,7 @@
 #include "qom/object.h"
 #include "postcopy-ram.h"
 #include "sysemu/runstate.h"
+#include "migration/misc.h"

 struct PostcopyBlocktimeContext;

@@ -479,12 +480,17 @@ bool migrate_has_error(MigrationState *s);

 void migrate_fd_connect(MigrationState *s, Error *error_in);

+int migration_call_notifiers(MigrationState *s, MigrationEventType type,
+                             Error **errp);
+
 int migrate_init(MigrationState *s, Error **errp);
 bool migration_is_blocked(Error **errp);
 /* True if outgoing migration has entered postcopy phase */
 bool migration_in_postcopy(void);
 bool migration_postcopy_is_alive(int state);
 MigrationState *migrate_get_current(void);
+bool migration_has_failed(MigrationState *);
+bool migrate_mode_is_cpr(MigrationState *);

 uint64_t ram_get_total_transferred_pages(void);

--
2.44.0
[PULL 21/34] migration: export vcpu_dirty_limit_period
From: Steve Sistare

Define and export vcpu_dirty_limit_period to eliminate a dependency on
MigrationState.

Signed-off-by: Steve Sistare
Link: https://lore.kernel.org/r/1710179338-294359-6-git-send-email-steven.sist...@oracle.com
Signed-off-by: Peter Xu
---
 include/migration/client-options.h | 1 +
 migration/options.c                | 7 +++
 system/dirtylimit.c                | 3 +--
 3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/include/migration/client-options.h b/include/migration/client-options.h
index 887fea1565..59f4b55cf4 100644
--- a/include/migration/client-options.h
+++ b/include/migration/client-options.h
@@ -20,5 +20,6 @@ bool migrate_switchover_ack(void);
 /* parameters */

 MigMode migrate_mode(void);
+uint64_t migrate_vcpu_dirty_limit_period(void);

 #endif
diff --git a/migration/options.c b/migration/options.c
index 642cfb00a3..09178c6f60 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -924,6 +924,13 @@ const char *migrate_tls_hostname(void)
     return s->parameters.tls_hostname;
 }

+uint64_t migrate_vcpu_dirty_limit_period(void)
+{
+    MigrationState *s = migrate_get_current();
+
+    return s->parameters.x_vcpu_dirty_limit_period;
+}
+
 uint64_t migrate_xbzrle_cache_size(void)
 {
     MigrationState *s = migrate_get_current();
diff --git a/system/dirtylimit.c b/system/dirtylimit.c
index 1622bb7426..b0afaa0776 100644
--- a/system/dirtylimit.c
+++ b/system/dirtylimit.c
@@ -77,14 +77,13 @@ static bool dirtylimit_quit;

 static void vcpu_dirty_rate_stat_collect(void)
 {
-    MigrationState *s = migrate_get_current();
     VcpuStat stat;
     int i = 0;
     int64_t period = DIRTYLIMIT_CALC_TIME_MS;

     if (migrate_dirty_limit() &&
         migration_is_active()) {
-        period = s->parameters.x_vcpu_dirty_limit_period;
+        period = migrate_vcpu_dirty_limit_period();
     }

     /* calculate vcpu dirtyrate */
--
2.44.0
[PULL 15/34] migration: Fix format in error message
From: Anthony PERARD

In file_write_ramblock_iov(), "offset" is "uintptr_t" and not
"ram_addr_t". While usually they are both equivalent, this is not the
case with CONFIG_XEN_BACKEND.

Use the right format. This will fix build on 32-bit.

Fixes: f427d90b9898 ("migration/multifd: Support outgoing mapped-ram stream format")
Signed-off-by: Anthony PERARD
Link: https://lore.kernel.org/r/20240311123439.16844-1-anthony.per...@citrix.com
Signed-off-by: Peter Xu
---
 migration/file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/file.c b/migration/file.c
index 164b079966..5054a60851 100644
--- a/migration/file.c
+++ b/migration/file.c
@@ -191,7 +191,7 @@ int file_write_ramblock_iov(QIOChannel *ioc, const struct iovec *iov,
          */
         offset = (uintptr_t) iov[slice_idx].iov_base - (uintptr_t) block->host;
         if (offset >= block->used_length) {
-            error_setg(errp, "offset " RAM_ADDR_FMT
+            error_setg(errp, "offset %" PRIxPTR
                        "outside of ramblock %s range", offset, block->idstr);
             ret = -1;
             break;
--
2.44.0
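To illustrate why the format macro matters, here is a small standalone sketch of printing a `uintptr_t` portably. It is not taken from QEMU; `format_offset()` and the message text are hypothetical and only show how `PRIxPTR` from `<inttypes.h>` is spliced into a format string.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: formats the distance between two pointers,
 * computed the same way the patched function computes "offset". */
static int format_offset(char *buf, size_t len, const void *base,
                         const void *p)
{
    uintptr_t offset = (uintptr_t)p - (uintptr_t)base;

    /* PRIxPTR expands to the conversion specifier that matches
     * uintptr_t on the current platform, so the same source line is
     * correct on both 32-bit and 64-bit builds. */
    return snprintf(buf, len, "offset %" PRIxPTR " outside of range",
                    offset);
}
```

A format written for a different integer type may compile cleanly on one host and fail or misprint on another, which is the class of problem the patch addresses.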
[PULL 29/34] migration/multifd: Allow clearing of the file_bmap from multifd
From: Fabiano Rosas

We currently only need to clear the mapped-ram file bitmap from the
migration thread during save_zero_page.

We're about to add support for zero page detection on the multifd
thread, so allow ramblock_set_file_bmap_atomic() to also clear the
bits.

Signed-off-by: Fabiano Rosas
Link: https://lore.kernel.org/r/20240311180015.3359271-3-hao.xi...@linux.dev
Signed-off-by: Peter Xu
---
 migration/ram.h     | 3 ++-
 migration/multifd.c | 2 +-
 migration/ram.c     | 8 ++--
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/migration/ram.h b/migration/ram.h
index b9ac0da587..08feecaf51 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -75,7 +75,8 @@ bool ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb, Error **errp);
 bool ramblock_page_is_discarded(RAMBlock *rb, ram_addr_t start);
 void postcopy_preempt_shutdown_file(MigrationState *s);
 void *postcopy_preempt_thread(void *opaque);
-void ramblock_set_file_bmap_atomic(RAMBlock *block, ram_addr_t offset);
+void ramblock_set_file_bmap_atomic(RAMBlock *block, ram_addr_t offset,
+                                   bool set);

 /* ram cache */
 int colo_init_ram_cache(void);
diff --git a/migration/multifd.c b/migration/multifd.c
index bf9d483f7a..3ba922694e 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -115,7 +115,7 @@ static void multifd_set_file_bitmap(MultiFDSendParams *p)
     assert(pages->block);

     for (int i = 0; i < p->pages->num; i++) {
-        ramblock_set_file_bmap_atomic(pages->block, pages->offset[i]);
+        ramblock_set_file_bmap_atomic(pages->block, pages->offset[i], true);
     }
 }

diff --git a/migration/ram.c b/migration/ram.c
index 3ee8cb47d3..dec2e73f8e 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3149,9 +3149,13 @@ static void ram_save_file_bmap(QEMUFile *f)
     }
 }

-void ramblock_set_file_bmap_atomic(RAMBlock *block, ram_addr_t offset)
+void ramblock_set_file_bmap_atomic(RAMBlock *block, ram_addr_t offset, bool set)
 {
-    set_bit_atomic(offset >> TARGET_PAGE_BITS, block->file_bmap);
+    if (set) {
+        set_bit_atomic(offset >> TARGET_PAGE_BITS, block->file_bmap);
+    } else {
+        clear_bit_atomic(offset >> TARGET_PAGE_BITS, block->file_bmap);
+    }
 }

 /**
--
2.44.0
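The set-or-clear helper can be sketched with plain C11 atomics. This is an illustration under assumptions, not QEMU's bitmap code: `toy_bmap_update_atomic()` and `toy_bmap_test()` are hypothetical names, and the bitmap is modelled as an array of `atomic_ulong` words rather than QEMU's own bitmap type.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Atomically set (set == true) or clear (set == false) bit nr. */
static void toy_bmap_update_atomic(atomic_ulong *bmap, size_t nr, bool set)
{
    atomic_ulong *word = &bmap[nr / TOY_BITS_PER_WORD];
    unsigned long mask = 1UL << (nr % TOY_BITS_PER_WORD);

    if (set) {
        atomic_fetch_or(word, mask);    /* turn the bit on */
    } else {
        atomic_fetch_and(word, ~mask);  /* turn the bit off */
    }
}

/* Read back a single bit. */
static bool toy_bmap_test(atomic_ulong *bmap, size_t nr)
{
    unsigned long word = atomic_load(&bmap[nr / TOY_BITS_PER_WORD]);

    return (word >> (nr % TOY_BITS_PER_WORD)) & 1UL;
}
```

Because each update is a single read-modify-write on one word, several threads can flip different bits of the same word without losing each other's changes.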
[PULL 05/34] migration: Report error when shutdown fails
From: Cédric Le Goater

This will help detect issues regarding I/O channels usage.

Reviewed-by: Philippe Mathieu-Daudé
Reviewed-by: Peter Xu
Signed-off-by: Cédric Le Goater
Link: https://lore.kernel.org/r/20240304122844.1888308-7-...@redhat.com
Signed-off-by: Peter Xu
---
 migration/qemu-file.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index b10c882629..a10882d47f 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -63,6 +63,8 @@ struct QEMUFile {
  */
 int qemu_file_shutdown(QEMUFile *f)
 {
+    Error *err = NULL;
+
     /*
      * We must set qemufile error before the real shutdown(), otherwise
      * there can be a race window where we thought IO all went though
@@ -91,7 +93,8 @@ int qemu_file_shutdown(QEMUFile *f)
         return -ENOSYS;
     }

-    if (qio_channel_shutdown(f->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL) < 0) {
+    if (qio_channel_shutdown(f->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, &err) < 0) {
+        error_report_err(err);
         return -EIO;
     }

--
2.44.0
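The change from passing NULL to passing a real error pointer can be shown with a toy model of the out-parameter convention. Everything here is hypothetical (`toy_error`, `toy_channel_shutdown()`, `toy_file_shutdown()`); it only mirrors the shape of the patched code and makes no claim about QEMU's Error API beyond what the diff shows.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, much simplified error object. */
struct toy_error {
    const char *msg;
};

/* Lower layer: fills *errp on failure if the caller provided one. */
static int toy_channel_shutdown(int fd, struct toy_error **errp)
{
    static struct toy_error bad_fd = { "invalid channel" };

    if (fd < 0) {
        if (errp) {
            *errp = &bad_fd;
        }
        return -1;
    }
    return 0;
}

/* Upper layer: passing &err instead of NULL keeps the reason for the
 * failure, which can then be reported before returning -EIO. */
static int toy_file_shutdown(int fd)
{
    struct toy_error *err = NULL;

    if (toy_channel_shutdown(fd, &err) < 0) {
        fprintf(stderr, "shutdown failed: %s\n", err->msg);
        return -EIO;
    }
    return 0;
}
```

With NULL as the last argument the lower layer has nowhere to store the message, so the caller could only report a bare error code.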
[PULL 32/34] migration/multifd: Implement ram_save_target_page_multifd to handle multifd version of MigrationOps::ram_save_target_page.
From: Hao Xiang

1. Add a dedicated handler for MigrationOps::ram_save_target_page in
multifd live migration.
2. Refactor ram_save_target_page_legacy so that the legacy and multifd
handlers don't have internal functions calling into each other.

Signed-off-by: Hao Xiang
Reviewed-by: Fabiano Rosas
Message-Id: <20240226195654.934709-4-hao.xi...@bytedance.com>
Link: https://lore.kernel.org/r/20240311180015.3359271-6-hao.xi...@linux.dev
Signed-off-by: Peter Xu
---
 migration/ram.c | 38 +-
 1 file changed, 29 insertions(+), 9 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index c26435adc7..8deb84984f 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2079,7 +2079,6 @@ static bool save_compress_page(RAMState *rs, PageSearchStatus *pss,
  */
 static int ram_save_target_page_legacy(RAMState *rs, PageSearchStatus *pss)
 {
-    RAMBlock *block = pss->block;
     ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
     int res;

@@ -2095,17 +2094,33 @@ static int ram_save_target_page_legacy(RAMState *rs, PageSearchStatus *pss)
         return 1;
     }

+    return ram_save_page(rs, pss);
+}
+
+/**
+ * ram_save_target_page_multifd: send one target page to multifd workers
+ *
+ * Returns 1 if the page was queued, -1 otherwise.
+ *
+ * @rs: current RAM state
+ * @pss: data about the page we want to send
+ */
+static int ram_save_target_page_multifd(RAMState *rs, PageSearchStatus *pss)
+{
+    RAMBlock *block = pss->block;
+    ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
+
     /*
-     * Do not use multifd in postcopy as one whole host page should be
-     * placed. Meanwhile postcopy requires atomic update of pages, so even
-     * if host page size == guest page size the dest guest during run may
-     * still see partially copied pages which is data corruption.
+     * While using multifd live migration, we still need to handle zero
+     * page checking on the migration main thread.
      */
-    if (migrate_multifd() && !migration_in_postcopy()) {
-        return ram_save_multifd_page(block, offset);
+    if (migrate_zero_page_detection() == ZERO_PAGE_DETECTION_LEGACY) {
+        if (save_zero_page(rs, pss, offset)) {
+            return 1;
+        }
     }

-    return ram_save_page(rs, pss);
+    return ram_save_multifd_page(block, offset);
 }

 /* Should be called before sending a host page */
@@ -3112,7 +3127,12 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     }

     migration_ops = g_malloc0(sizeof(MigrationOps));
-    migration_ops->ram_save_target_page = ram_save_target_page_legacy;
+
+    if (migrate_multifd()) {
+        migration_ops->ram_save_target_page = ram_save_target_page_multifd;
+    } else {
+        migration_ops->ram_save_target_page = ram_save_target_page_legacy;
+    }

     bql_unlock();
     ret = multifd_send_sync_main();
--
2.44.0
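The setup-time handler selection in this patch follows a common ops-table pattern, sketched below in standalone form. All names (`toy_ops`, `toy_save_setup()`, the two handlers and their result codes) are hypothetical and stand in for `MigrationOps`, `ram_save_setup()` and the real page-saving functions.

```c
#include <stdbool.h>

/* Hypothetical result codes, for illustration only. */
enum { TOY_SENT_INLINE = 1, TOY_QUEUED_TO_WORKER = 2 };

typedef int (*toy_save_page_fn)(unsigned long page);

/* Stand-in for MigrationOps: one hook, filled in once at setup. */
struct toy_ops {
    toy_save_page_fn save_target_page;
};

/* Legacy path: the page is sent by the calling thread. */
static int toy_save_page_legacy(unsigned long page)
{
    (void)page;
    return TOY_SENT_INLINE;
}

/* Multifd path: the page is handed to a worker. */
static int toy_save_page_multifd(unsigned long page)
{
    (void)page;
    return TOY_QUEUED_TO_WORKER;
}

/* The mode is checked once here, not on every page. */
static void toy_save_setup(struct toy_ops *ops, bool multifd)
{
    ops->save_target_page = multifd ? toy_save_page_multifd
                                    : toy_save_page_legacy;
}
```

Choosing the handler once keeps the per-page path free of mode checks and lets each handler contain only the logic for its own mode.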
[PULL 07/34] migration: Add documentation for SaveVMHandlers
From: Cédric Le Goater

The SaveVMHandlers structure is still in use for complex subsystems
and devices. Document the handlers since we are going to modify a few
later.

Reviewed-by: Peter Xu
Signed-off-by: Cédric Le Goater
Link: https://lore.kernel.org/r/20240304122844.1888308-9-...@redhat.com
Signed-off-by: Peter Xu
---
 include/migration/register.h | 263 +++
 1 file changed, 237 insertions(+), 26 deletions(-)

diff --git a/include/migration/register.h b/include/migration/register.h
index 2e6a7d766e..d7b70a8be6 100644
--- a/include/migration/register.h
+++ b/include/migration/register.h
@@ -16,30 +16,130 @@

 #include "hw/vmstate-if.h"

+/**
+ * struct SaveVMHandlers: handler structure to finely control
+ * migration of complex subsystems and devices, such as RAM, block and
+ * VFIO.
+ */
 typedef struct SaveVMHandlers {
-    /* This runs inside the BQL. */
+
+    /* The following handlers run inside the BQL. */
+
+    /**
+     * @save_state
+     *
+     * Saves state section on the source using the latest state format
+     * version.
+     *
+     * Legacy method. Should be deprecated when all users are ported
+     * to VMStateDescription.
+     *
+     * @f: QEMUFile where to send the data
+     * @opaque: data pointer passed to register_savevm_live()
+     */
     void (*save_state)(QEMUFile *f, void *opaque);

-    /*
-     * save_prepare is called early, even before migration starts, and can be
-     * used to perform early checks.
+    /**
+     * @save_prepare
+     *
+     * Called early, even before migration starts, and can be used to
+     * perform early checks.
+     *
+     * @opaque: data pointer passed to register_savevm_live()
+     * @errp: pointer to Error*, to store an error if it happens.
+     *
+     * Returns zero to indicate success and negative for error
      */
     int (*save_prepare)(void *opaque, Error **errp);
+
+    /**
+     * @save_setup
+     *
+     * Initializes the data structures on the source and transmits
+     * first section containing information on the device
+     *
+     * @f: QEMUFile where to send the data
+     * @opaque: data pointer passed to register_savevm_live()
+     *
+     * Returns zero to indicate success and negative for error
+     */
     int (*save_setup)(QEMUFile *f, void *opaque);
+
+    /**
+     * @save_cleanup
+     *
+     * Uninitializes the data structures on the source
+     *
+     * @opaque: data pointer passed to register_savevm_live()
+     */
     void (*save_cleanup)(void *opaque);
+
+    /**
+     * @save_live_complete_postcopy
+     *
+     * Called at the end of postcopy for all postcopyable devices.
+     *
+     * @f: QEMUFile where to send the data
+     * @opaque: data pointer passed to register_savevm_live()
+     *
+     * Returns zero to indicate success and negative for error
+     */
     int (*save_live_complete_postcopy)(QEMUFile *f, void *opaque);
+
+    /**
+     * @save_live_complete_precopy
+     *
+     * Transmits the last section for the device containing any
+     * remaining data at the end of a precopy phase. When postcopy is
+     * enabled, devices that support postcopy will skip this step,
+     * where the final data will be flushed at the end of postcopy via
+     * @save_live_complete_postcopy instead.
+     *
+     * @f: QEMUFile where to send the data
+     * @opaque: data pointer passed to register_savevm_live()
+     *
+     * Returns zero to indicate success and negative for error
+     */
     int (*save_live_complete_precopy)(QEMUFile *f, void *opaque);

     /* This runs both outside and inside the BQL. */
+
+    /**
+     * @is_active
+     *
+     * Will skip a state section if not active
+     *
+     * @opaque: data pointer passed to register_savevm_live()
+     *
+     * Returns true if state section is active else false
+     */
     bool (*is_active)(void *opaque);
+
+    /**
+     * @has_postcopy
+     *
+     * Checks if a device supports postcopy
+     *
+     * @opaque: data pointer passed to register_savevm_live()
+     *
+     * Returns true for postcopy support else false
+     */
     bool (*has_postcopy)(void *opaque);

-    /* is_active_iterate
-     * If it is not NULL then qemu_savevm_state_iterate will skip iteration if
-     * it returns false. For example, it is needed for only-postcopy-states,
-     * which needs to be handled by qemu_savevm_state_setup and
-     * qemu_savevm_state_pending, but do not need iterations until not in
-     * postcopy stage.
+    /**
+     * @is_active_iterate
+     *
+     * As #SaveVMHandlers.is_active(), will skip an inactive state
+     * section in qemu_savevm_state_iterate.
+     *
+     * For example, it is needed for only-postcopy-states, which needs
+     * to be handled by qemu_savevm_state_setup() and
+     * qemu_savevm_state_pending(), but do not need iterations until
+     * not in postcopy stage.
+     *
+     * @opaque: data pointer passed to register_savevm_live()
+     *
[PULL 18/34] migration: export migration_is_setup_or_active
From: Steve Sistare

Delete the MigrationState parameter from migration_is_setup_or_active
and move it to the public API in misc.h.

Signed-off-by: Steve Sistare
Reviewed-by: Philippe Mathieu-Daudé
Link: https://lore.kernel.org/r/1710179338-294359-3-git-send-email-steven.sist...@oracle.com
Signed-off-by: Peter Xu
---
 include/migration/misc.h |  1 +
 migration/migration.h    |  1 -
 hw/vfio/common.c         |  2 +-
 migration/migration.c    | 12 ++--
 migration/ram.c          |  5 ++---
 net/vhost-vdpa.c         |  3 +--
 6 files changed, 11 insertions(+), 13 deletions(-)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index 4c226a40bb..79cff6224e 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -61,6 +61,7 @@ void migration_object_init(void);
 void migration_shutdown(void);
 bool migration_is_idle(void);
 bool migration_is_active(MigrationState *);
+bool migration_is_setup_or_active(void);
 bool migrate_mode_is_cpr(MigrationState *);

 typedef enum MigrationEventType {
diff --git a/migration/migration.h b/migration/migration.h
index 65c0b61cbd..736460aa8b 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -479,7 +479,6 @@ bool migrate_has_error(MigrationState *s);

 void migrate_fd_connect(MigrationState *s, Error *error_in);

-bool migration_is_setup_or_active(int state);
 bool migration_is_running(int state);

 int migrate_init(MigrationState *s, Error **errp);
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 059bfdc07a..896eab8103 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -152,7 +152,7 @@ static void vfio_set_migration_error(int err)
 {
     MigrationState *ms = migrate_get_current();

-    if (migration_is_setup_or_active(ms->state)) {
+    if (migration_is_setup_or_active()) {
         WITH_QEMU_LOCK_GUARD(&ms->qemu_file_lock) {
             if (ms->to_dst_file) {
                 qemu_file_set_error(ms->to_dst_file, err);
diff --git a/migration/migration.c b/migration/migration.c
index a49fcd53ee..af21403bad 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1081,9 +1081,11 @@ void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t value)
  * Return true if we're already in the middle of a migration
  * (i.e. any of the active or setup states)
  */
-bool migration_is_setup_or_active(int state)
+bool migration_is_setup_or_active(void)
 {
-    switch (state) {
+    MigrationState *s = current_migration;
+
+    switch (s->state) {
     case MIGRATION_STATUS_ACTIVE:
     case MIGRATION_STATUS_POSTCOPY_ACTIVE:
     case MIGRATION_STATUS_POSTCOPY_PAUSED:
@@ -1601,10 +1603,8 @@ bool migration_incoming_postcopy_advised(void)

 bool migration_in_bg_snapshot(void)
 {
-    MigrationState *s = migrate_get_current();
-
     return migrate_background_snapshot() &&
-            migration_is_setup_or_active(s->state);
+           migration_is_setup_or_active();
 }

 bool migration_is_idle(void)
@@ -2297,7 +2297,7 @@ static void *source_return_path_thread(void *opaque)
     trace_source_return_path_thread_entry();
     rcu_register_thread();

-    while (migration_is_setup_or_active(ms->state)) {
+    while (migration_is_setup_or_active()) {
         trace_source_return_path_thread_loop_top();

         header_type = qemu_get_be16(rp);
diff --git a/migration/ram.c b/migration/ram.c
index 2cd936d9ce..3ee8cb47d3 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2909,10 +2909,9 @@ void qemu_guest_free_page_hint(void *addr, size_t len)
     RAMBlock *block;
     ram_addr_t offset;
     size_t used_len, start, npages;
-    MigrationState *s = migrate_get_current();

     /* This function is currently expected to be used during live migration */
-    if (!migration_is_setup_or_active(s->state)) {
+    if (!migration_is_setup_or_active()) {
         return;
     }

@@ -3263,7 +3262,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)

 out:
     if (ret >= 0
-        && migration_is_setup_or_active(migrate_get_current()->state)) {
+        && migration_is_setup_or_active()) {
         if (migrate_multifd() && migrate_multifd_flush_after_each_section() &&
             !migrate_mapped_ram()) {
             ret = multifd_send_sync_main();
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index e6bdb4562d..8564817073 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -26,7 +26,6 @@
 #include
 #include "standard-headers/linux/virtio_net.h"
 #include "monitor/monitor.h"
-#include "migration/migration.h"
 #include "migration/misc.h"
 #include "hw/virtio/vhost.h"

@@ -355,7 +354,7 @@ static int vhost_vdpa_net_data_start(NetClientState *nc)
     assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);

     if (s->always_svq ||
-        migration_is_setup_or_active(migrate_get_current()->state)) {
+        migration_is_setup_or_active()) {
         v->shadow_vqs_enabled = true;
     } else {
         v->shadow_vqs_enabled = false;
--
2.44.0
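The effect of dropping the state argument can be sketched with a toy predicate that reads a module-private global. The enum values and function names below are hypothetical; only the overall shape (a `switch` over the current status, with callers passing nothing) follows the patch.

```c
#include <stdbool.h>

/* Hypothetical status values, loosely modelled on migration states. */
enum toy_status {
    TOY_STATUS_NONE,
    TOY_STATUS_SETUP,
    TOY_STATUS_ACTIVE,
    TOY_STATUS_COMPLETED,
    TOY_STATUS_FAILED,
};

/* Private to the module: callers never see the state object. */
static enum toy_status toy_current_status = TOY_STATUS_NONE;

static void toy_set_status(enum toy_status status)
{
    toy_current_status = status;
}

/* Public predicate: no parameter, so users need neither the state
 * struct nor the private header that declares it. */
static bool toy_is_setup_or_active(void)
{
    switch (toy_current_status) {
    case TOY_STATUS_SETUP:
    case TOY_STATUS_ACTIVE:
        return true;
    default:
        return false;
    }
}
```

This is why callers such as net/vhost-vdpa.c can drop their include of the private header: the only thing they needed from it was the field they used to pass in.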
[PULL 22/34] migration: migration_thread_is_self
From: Steve Sistare

Define and export migration_thread_is_self to eliminate a dependency
on MigrationState.

Signed-off-by: Steve Sistare
Link: https://lore.kernel.org/r/1710179338-294359-7-git-send-email-steven.sist...@oracle.com
Signed-off-by: Peter Xu
---
 include/migration/misc.h | 1 +
 migration/migration.c    | 7 +++
 system/dirtylimit.c      | 5 +
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index 7526977de6..c4b5416357 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -61,6 +61,7 @@ void migration_object_init(void);
 void migration_shutdown(void);
 bool migration_is_idle(void);
 bool migration_is_active(void);
+bool migration_thread_is_self(void);
 bool migration_is_setup_or_active(void);
 bool migrate_mode_is_cpr(MigrationState *);

diff --git a/migration/migration.c b/migration/migration.c
index 546ba86c63..afe72af0b1 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1647,6 +1647,13 @@ bool migration_is_active(void)
             s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
 }

+bool migration_thread_is_self(void)
+{
+    MigrationState *s = current_migration;
+
+    return qemu_thread_is_self(&s->thread);
+}
+
 bool migrate_mode_is_cpr(MigrationState *s)
 {
     return s->parameters.mode == MIG_MODE_CPR_REBOOT;
diff --git a/system/dirtylimit.c b/system/dirtylimit.c
index b0afaa0776..ab20da34bb 100644
--- a/system/dirtylimit.c
+++ b/system/dirtylimit.c
@@ -25,7 +25,6 @@
 #include "sysemu/kvm.h"
 #include "trace.h"
 #include "migration/misc.h"
-#include "migration/migration.h"

 /*
  * Dirtylimit stop working if dirty page rate error
@@ -448,10 +447,8 @@ static void dirtylimit_cleanup(void)
  */
 static bool dirtylimit_is_allowed(void)
 {
-    MigrationState *ms = migrate_get_current();
-
     if (migration_is_running() &&
-        (!qemu_thread_is_self(&ms->thread)) &&
+        !migration_thread_is_self() &&
         migrate_dirty_limit() &&
         dirtylimit_in_service()) {
         return false;
--
2.44.0
[PULL 10/34] migration/rdma: Fix a memory issue for migration
From: Yu Zhang

In commit 3fa9642ff7 change was made to convert the RDMA backend to
accept MigrateAddress struct. However, the assignment of "host" leads
to data corruption on the target host and the failure of migration.

    isock->host = rdma->host;

By allocating the memory explicitly for it with g_strdup_printf(), the
issue is fixed and the migration doesn't fail any more.

Fixes: 3fa9642ff7 ("migration: convert rdma backend to accept MigrateAddress")
Cc: qemu-stable
Cc: Li Zhijian
Link: https://lore.kernel.org/r/CAHEcVy4L_D6tuhJ8h=xlr4wapaprje3nnxzaeyunotrxq6c...@mail.gmail.com
Signed-off-by: Yu Zhang
[peterx: use g_strdup() instead of g_strdup_printf(), per Zhijian]
Signed-off-by: Peter Xu
---
 migration/rdma.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index a355dcea89..855753c671 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3357,7 +3357,7 @@ static int qemu_rdma_accept(RDMAContext *rdma)
             goto err_rdma_dest_wait;
         }

-        isock->host = rdma->host;
+        isock->host = g_strdup(rdma->host);
         isock->port = g_strdup_printf("%d", rdma->port);

         /*
--
2.44.0
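The difference between sharing a string pointer and owning a copy can be shown with a short standalone sketch. It assumes, without the patch stating it, that the socket address object later frees its `host` member; the structures and helpers (`toy_rdma`, `toy_sock`, `toy_strdup()`) are hypothetical and use plain `malloc()` rather than GLib.

```c
#include <stdlib.h>
#include <string.h>

struct toy_rdma {
    char *host;     /* owned by the RDMA context */
};

struct toy_sock {
    char *host;     /* owned by the socket address */
};

/* Portable duplicate of a NUL-terminated string. */
static char *toy_strdup(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);

    if (copy) {
        memcpy(copy, s, len);
    }
    return copy;
}

/* Each object gets its own allocation, so freeing one side cannot
 * invalidate the other. */
static void toy_sock_init(struct toy_sock *sock, const struct toy_rdma *rdma)
{
    sock->host = toy_strdup(rdma->host);
}

static void toy_sock_cleanup(struct toy_sock *sock)
{
    free(sock->host);
    sock->host = NULL;
}
```

Had `toy_sock_init()` stored `rdma->host` directly, `toy_sock_cleanup()` would release memory the RDMA context still points at, leaving a dangling pointer.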
[PULL 28/34] migration/multifd: Allow zero pages in file migration
From: Fabiano Rosas

Currently, it's an error to have no data pages in the multifd file
migration because zero page detection is done in the migration thread
and zero pages don't reach multifd. This is enforced with the
pages->num assert.

We're about to add zero page detection on the multifd thread. Fix the
file_write_ramblock_iov() to stop considering p->iovs_num=0 an error.

Signed-off-by: Fabiano Rosas
Link: https://lore.kernel.org/r/20240311180015.3359271-2-hao.xi...@linux.dev
Signed-off-by: Peter Xu
---
 migration/file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/file.c b/migration/file.c
index 5054a60851..b0b963e0ce 100644
--- a/migration/file.c
+++ b/migration/file.c
@@ -159,7 +159,7 @@ void file_start_incoming_migration(FileMigrationArgs *file_args, Error **errp)
 int file_write_ramblock_iov(QIOChannel *ioc, const struct iovec *iov,
                             int niov, RAMBlock *block, Error **errp)
 {
-    ssize_t ret = -1;
+    ssize_t ret = 0;
     int i, slice_idx, slice_num;
     uintptr_t base, next, offset;
     size_t len;
--
2.44.0
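Why the initial value of `ret` decides how an empty input is treated can be seen in a reduced sketch. `toy_write_slices()` is a hypothetical stand-in for the real function, and a zero-length slice is used here as an artificial failure condition.

```c
#include <stddef.h>

/* Returns 0 on success, -1 on failure. */
static int toy_write_slices(const size_t *lens, int n)
{
    /* Starting from 0 means "nothing to write" is a success. With -1
     * here, a call with n == 0 would skip the loop and report an
     * error even though nothing went wrong. */
    int ret = 0;

    for (int i = 0; i < n; i++) {
        if (lens[i] == 0) {
            /* Artificial failure for the purposes of this sketch. */
            ret = -1;
            break;
        }
    }
    return ret;
}
```

This matches the situation described in the commit message: once zero pages are detected on the multifd thread, a packet can legitimately carry no data pages at all.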
[PULL 34/34] migration/multifd: Add new migration test cases for legacy zero page checking.
From: Hao Xiang

Now that zero page checking is done on the multifd sender threads by
default, we still provide an option for backward compatibility. This
change adds a qtest migration test case to set the zero-page-detection
option to "legacy" and run multifd migration with zero page checking on the
migration main thread.

Signed-off-by: Hao Xiang
Reviewed-by: Peter Xu
Link: https://lore.kernel.org/r/20240311180015.3359271-8-hao.xi...@linux.dev
Signed-off-by: Peter Xu
---
 tests/qtest/migration-test.c | 52
 1 file changed, 52 insertions(+)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 4023d808f9..71895abb7f 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -2771,6 +2771,24 @@ test_migrate_precopy_tcp_multifd_start(QTestState *from,
     return test_migrate_precopy_tcp_multifd_start_common(from, to, "none");
 }

+static void *
+test_migrate_precopy_tcp_multifd_start_zero_page_legacy(QTestState *from,
+                                                        QTestState *to)
+{
+    test_migrate_precopy_tcp_multifd_start_common(from, to, "none");
+    migrate_set_parameter_str(from, "zero-page-detection", "legacy");
+    return NULL;
+}
+
+static void *
+test_migration_precopy_tcp_multifd_start_no_zero_page(QTestState *from,
+                                                      QTestState *to)
+{
+    test_migrate_precopy_tcp_multifd_start_common(from, to, "none");
+    migrate_set_parameter_str(from, "zero-page-detection", "none");
+    return NULL;
+}
+
 static void *
 test_migrate_precopy_tcp_multifd_zlib_start(QTestState *from,
                                             QTestState *to)
@@ -2812,6 +2830,36 @@ static void test_multifd_tcp_none(void)
     test_precopy_common(&args);
 }

+static void test_multifd_tcp_zero_page_legacy(void)
+{
+    MigrateCommon args = {
+        .listen_uri = "defer",
+        .start_hook = test_migrate_precopy_tcp_multifd_start_zero_page_legacy,
+        /*
+         * Multifd is more complicated than most of the features, it
+         * directly takes guest page buffers when sending, make sure
+         * everything will work alright even if guest page is changing.
+         */
+        .live = true,
+    };
+    test_precopy_common(&args);
+}
+
+static void test_multifd_tcp_no_zero_page(void)
+{
+    MigrateCommon args = {
+        .listen_uri = "defer",
+        .start_hook = test_migration_precopy_tcp_multifd_start_no_zero_page,
+        /*
+         * Multifd is more complicated than most of the features, it
+         * directly takes guest page buffers when sending, make sure
+         * everything will work alright even if guest page is changing.
+         */
+        .live = true,
+    };
+    test_precopy_common(&args);
+}
+
 static void test_multifd_tcp_zlib(void)
 {
     MigrateCommon args = {
@@ -3729,6 +3777,10 @@ int main(int argc, char **argv)
     }
     migration_test_add("/migration/multifd/tcp/plain/none",
                        test_multifd_tcp_none);
+    migration_test_add("/migration/multifd/tcp/plain/zero-page/legacy",
+                       test_multifd_tcp_zero_page_legacy);
+    migration_test_add("/migration/multifd/tcp/plain/zero-page/none",
+                       test_multifd_tcp_no_zero_page);
     migration_test_add("/migration/multifd/tcp/plain/cancel",
                        test_multifd_tcp_cancel);
     migration_test_add("/migration/multifd/tcp/plain/zlib",
--
2.44.0
[PULL 26/34] migration: delete unused accessors
From: Steve Sistare

Signed-off-by: Steve Sistare
Link: https://lore.kernel.org/r/1710179338-294359-11-git-send-email-steven.sist...@oracle.com
Signed-off-by: Peter Xu
---
 include/migration/misc.h |  3 ---
 migration/migration.c    | 10 --
 2 files changed, 13 deletions(-)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index e521cd5229..d563d2c801 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -105,13 +105,10 @@ void migration_add_notifier_mode(NotifierWithReturn *notify,
 void migration_remove_notifier(NotifierWithReturn *notify);
 int migration_call_notifiers(MigrationState *s, MigrationEventType type,
                              Error **errp);
-bool migration_in_setup(MigrationState *);
-bool migration_has_finished(MigrationState *);
 bool migration_has_failed(MigrationState *);
 bool migration_is_running(void);
 void migration_file_set_error(int err);

-/* ...and after the device transmission */
 /* True if incoming migration entered POSTCOPY_INCOMING_DISCARD */
 bool migration_in_incoming_postcopy(void);
 /* True if incoming migration entered POSTCOPY_INCOMING_ADVISE */
diff --git a/migration/migration.c b/migration/migration.c
index 216f63d62b..644e073b7d 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1548,16 +1548,6 @@ int migration_call_notifiers(MigrationState *s, MigrationEventType type,
     return ret;
 }

-bool migration_in_setup(MigrationState *s)
-{
-    return s->state == MIGRATION_STATUS_SETUP;
-}
-
-bool migration_has_finished(MigrationState *s)
-{
-    return s->state == MIGRATION_STATUS_COMPLETED;
-}
-
 bool migration_has_failed(MigrationState *s)
 {
     return (s->state == MIGRATION_STATUS_CANCELLED ||
--
2.44.0
[PULL 09/34] migration/multifd: Don't fsync when closing QIOChannelFile
From: Fabiano Rosas Commit bc38feddeb ("io: fsync before closing a file channel") added a fsync/fdatasync at the closing point of the QIOChannelFile to ensure integrity of the migration stream in case of QEMU crash. The decision to do the sync at qio_channel_close() was not the best since that function runs in the main thread and the fsync can cause QEMU to hang for several minutes, depending on the migration size and disk speed. To fix the hang, remove the fsync from qio_channel_file_close(). At this moment, the migration code is the only user of the fsync and we're taking the tradeoff of not having a sync at all, leaving the responsibility to the upper layers. Fixes: bc38feddeb ("io: fsync before closing a file channel") Reviewed-by: "Daniel P. Berrangé" Signed-off-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240305195629.9922-1-faro...@suse.de Link: https://lore.kernel.org/r/20240305174332.2553-1-faro...@suse.de [peterx: add more comment to the qio_channel_close()] Signed-off-by: Peter Xu --- docs/devel/migration/main.rst | 3 ++- io/channel-file.c | 5 - migration/multifd.c | 28 +++- 3 files changed, 21 insertions(+), 15 deletions(-) diff --git a/docs/devel/migration/main.rst b/docs/devel/migration/main.rst index 8024275d6d..54385a23e5 100644 --- a/docs/devel/migration/main.rst +++ b/docs/devel/migration/main.rst @@ -44,7 +44,8 @@ over any transport. - file migration: do the migration using a file that is passed to QEMU by path. A file offset option is supported to allow a management application to add its own metadata to the start of the file without - QEMU interference. + QEMU interference. Note that QEMU does not flush cached file + data/metadata at the end of migration. 
In addition, support is included for migration using RDMA, which transports the page data using ``RDMA``, where the hardware takes care of diff --git a/io/channel-file.c b/io/channel-file.c index d4706fa592..a6ad7770c6 100644 --- a/io/channel-file.c +++ b/io/channel-file.c @@ -242,11 +242,6 @@ static int qio_channel_file_close(QIOChannel *ioc, { QIOChannelFile *fioc = QIO_CHANNEL_FILE(ioc); -if (qemu_fdatasync(fioc->fd) < 0) { -error_setg_errno(errp, errno, - "Unable to synchronize file data with storage device"); -return -1; -} if (qemu_close(fioc->fd) < 0) { error_setg_errno(errp, errno, "Unable to close file"); diff --git a/migration/multifd.c b/migration/multifd.c index d4a44da559..bf9d483f7a 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -710,16 +710,26 @@ static bool multifd_send_cleanup_channel(MultiFDSendParams *p, Error **errp) if (p->c) { migration_ioc_unregister_yank(p->c); /* - * An explicit close() on the channel here is normally not - * required, but can be helpful for "file:" iochannels, where it - * will include fdatasync() to make sure the data is flushed to the - * disk backend. + * The object_unref() cannot guarantee the fd will always be + * released because finalize() of the iochannel is only + * triggered on the last reference and it's not guaranteed + * that we always hold the last refcount when reaching here. * - * The object_unref() cannot guarantee that because: (1) finalize() - * of the iochannel is only triggered on the last reference, and - * it's not guaranteed that we always hold the last refcount when - * reaching here, and, (2) even if finalize() is invoked, it only - * does a close(fd) without data flush. + * Closing the fd explicitly has the benefit that if there is any + * registered I/O handler callbacks on such fd, that will get a + * POLLNVAL event and will further trigger the cleanup to finally + * release the IOC. 
+ * + * FIXME: It should logically be guaranteed that all multifd + * channels have no I/O handler callback registered when reaching + * here, because migration thread will wait for all multifd channel + * establishments to complete during setup. Since + * migrate_fd_cleanup() will be scheduled in main thread too, all + * previous callbacks should guarantee to be completed when + * reaching here. See multifd_send_state.channels_created and its + * usage. In the future, we could replace this with an assert + * making sure we're the last reference, or simply drop it if above + * is more clear to be justified. */ qio_channel_close(p->c, _abort); object_unref(OBJECT(p->c)); -- 2.44.0
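Since this patch leaves the sync to the upper layers, a management application that needs the migration file to be durable could flush it itself once QEMU reports completion. The following is only a minimal sketch under that assumption, for a POSIX host; flush_migration_file() is a hypothetical helper and not part of QEMU:

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a QEMU API): flush a completed migration
 * file to stable storage. Returns 0 on success, -1 on failure with
 * errno set by the failing call.
 */
int flush_migration_file(const char *path)
{
    /* O_WRONLY without O_TRUNC leaves the existing contents untouched. */
    int fd = open(path, O_WRONLY);

    if (fd < 0) {
        return -1;
    }

    /* fsync() may block for a long time on a large file or slow disk. */
    if (fsync(fd) < 0) {
        int saved_errno = errno;

        close(fd);
        errno = saved_errno;
        return -1;
    }

    return close(fd);
}
```

Running this outside QEMU keeps the potentially long sync off QEMU's main thread, which is the hang the patch is avoiding.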
[PULL 33/34] migration/multifd: Enable multifd zero page checking by default.
From: Hao Xiang

1. Set default "zero-page-detection" option to "multifd". Now
zero page checking can be done in the multifd threads and this
becomes the default configuration.
2. Handle migration QEMU9.0 -> QEMU8.2 compatibility. We provide
backward compatibility where zero page checking is done from the
migration main thread.

Signed-off-by: Hao Xiang
Reviewed-by: Fabiano Rosas
Reviewed-by: Peter Xu
Link: https://lore.kernel.org/r/20240311180015.3359271-7-hao.xi...@linux.dev
Signed-off-by: Peter Xu
---
 qapi/migration.json | 6 +++---
 hw/core/machine.c   | 4 +++-
 migration/options.c | 2 +-
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index 2684e4e9ac..aa1b39bce1 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -909,7 +909,7 @@
 #     (Since 8.2)
 #
 # @zero-page-detection: Whether and how to detect zero pages.
-#     See description in @ZeroPageDetection.  Default is 'legacy'.
+#     See description in @ZeroPageDetection.  Default is 'multifd'.
 #     (since 9.0)
 #
 # Features:
@@ -1106,7 +1106,7 @@
 #     (Since 8.2)
 #
 # @zero-page-detection: Whether and how to detect zero pages.
-#     See description in @ZeroPageDetection.  Default is 'legacy'.
+#     See description in @ZeroPageDetection.  Default is 'multifd'.
 #     (since 9.0)
 #
 # Features:
@@ -1339,7 +1339,7 @@
 #     (Since 8.2)
 #
 # @zero-page-detection: Whether and how to detect zero pages.
-#     See description in @ZeroPageDetection.  Default is 'legacy'.
+#     See description in @ZeroPageDetection.  Default is 'multifd'.
 #     (since 9.0)
 #
 # Features:
diff --git a/hw/core/machine.c b/hw/core/machine.c
index 9ac5d5389a..0e9d646b61 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -32,7 +32,9 @@
 #include "hw/virtio/virtio-net.h"
 #include "audio/audio.h"
 
-GlobalProperty hw_compat_8_2[] = {};
+GlobalProperty hw_compat_8_2[] = {
+    { "migration", "zero-page-detection", "legacy"},
+};
 const size_t hw_compat_8_2_len = G_N_ELEMENTS(hw_compat_8_2);
 
 GlobalProperty hw_compat_8_1[] = {
diff --git a/migration/options.c b/migration/options.c
index 8f2a3a2fa5..9ed2fe4bee 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -181,7 +181,7 @@ Property migration_properties[] = {
                        MIG_MODE_NORMAL),
     DEFINE_PROP_ZERO_PAGE_DETECTION("zero-page-detection", MigrationState,
                        parameters.zero_page_detection,
-                       ZERO_PAGE_DETECTION_LEGACY),
+                       ZERO_PAGE_DETECTION_MULTIFD),
 
     /* Migration capabilities */
     DEFINE_PROP_MIG_CAP("x-xbzrle", MIGRATION_CAPABILITY_XBZRLE),
--
2.44.0
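The compatibility half of this patch relies on a per-machine-version table overriding the new default. The idea can be sketched with a toy lookup; this is an illustration only, assuming a simplified model, and CompatProp, toy_compat_8_2 and effective_value() are hypothetical names rather than QEMU's GlobalProperty machinery:

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a compat entry: driver, property, forced value. */
typedef struct {
    const char *driver;
    const char *property;
    const char *value;
} CompatProp;

/* What an 8.2 machine type would pin, mirroring the entry in the patch. */
const CompatProp toy_compat_8_2[] = {
    { "migration", "zero-page-detection", "legacy" },
};

/*
 * Return the value forced by the compat table if one matches,
 * otherwise the built-in default.
 */
const char *effective_value(const CompatProp *compat, size_t n,
                            const char *driver, const char *property,
                            const char *dflt)
{
    for (size_t i = 0; i < n; i++) {
        if (!strcmp(compat[i].driver, driver) &&
            !strcmp(compat[i].property, property)) {
            return compat[i].value;
        }
    }
    return dflt;
}
```

With the 8.2 table the lookup yields "legacy"; with an empty table (standing in for a 9.0 machine type) it falls through to the new "multifd" default.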
[PULL 06/34] migration: Remove SaveStateHandler and LoadStateHandler typedefs
From: Cédric Le Goater

They are only used once.

Reviewed-by: Fabiano Rosas
Reviewed-by: Peter Xu
Signed-off-by: Cédric Le Goater
Link: https://lore.kernel.org/r/20240304122844.1888308-8-...@redhat.com
Signed-off-by: Peter Xu
---
 include/migration/register.h | 4 ++--
 include/qemu/typedefs.h      | 2 --
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/include/migration/register.h b/include/migration/register.h
index 9ab1f79512..2e6a7d766e 100644
--- a/include/migration/register.h
+++ b/include/migration/register.h
@@ -18,7 +18,7 @@
 
 typedef struct SaveVMHandlers {
     /* This runs inside the BQL.  */
-    SaveStateHandler *save_state;
+    void (*save_state)(QEMUFile *f, void *opaque);
 
     /*
      * save_prepare is called early, even before migration starts, and can be
@@ -71,7 +71,7 @@ typedef struct SaveVMHandlers {
     /* This calculate the exact remaining data to transfer */
     void (*state_pending_exact)(void *opaque, uint64_t *must_precopy,
                                 uint64_t *can_postcopy);
-    LoadStateHandler *load_state;
+    int (*load_state)(QEMUFile *f, void *opaque, int version_id);
     int (*load_setup)(QEMUFile *f, void *opaque);
     int (*load_cleanup)(void *opaque);
     /* Called when postcopy migration wants to resume from failure */
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index a028dba4d0..50c277cf0b 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -151,8 +151,6 @@ typedef struct IRQState *qemu_irq;
 /*
  * Function types
  */
-typedef void SaveStateHandler(QEMUFile *f, void *opaque);
-typedef int LoadStateHandler(QEMUFile *f, void *opaque, int version_id);
 typedef void (*qemu_irq_handler)(void *opaque, int n, int level);
 
 #endif /* QEMU_TYPEDEFS_H */
--
2.44.0
[PULL 24/34] migration: migration_file_set_error
From: Steve Sistare

Define and export migration_file_set_error to eliminate a dependency
on MigrationState.

Signed-off-by: Steve Sistare
Link: https://lore.kernel.org/r/1710179338-294359-9-git-send-email-steven.sist...@oracle.com
Signed-off-by: Peter Xu
---
 include/migration/misc.h |  2 ++
 hw/vfio/common.c         |  9 +--------
 hw/vfio/migration.c      | 11 +++--------
 migration/migration.c    | 11 +++++++++++
 4 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index 28cfaed2c7..e521cd5229 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -109,6 +109,8 @@ bool migration_in_setup(MigrationState *);
 bool migration_has_finished(MigrationState *);
 bool migration_has_failed(MigrationState *);
 bool migration_is_running(void);
+void migration_file_set_error(int err);
+
 /* ...and after the device transmission */
 /* True if incoming migration entered POSTCOPY_INCOMING_DISCARD */
 bool migration_in_incoming_postcopy(void);
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index de010680ff..b44204eade 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -39,7 +39,6 @@
 #include "sysemu/runstate.h"
 #include "trace.h"
 #include "qapi/error.h"
-#include "migration/migration.h"
 #include "migration/misc.h"
 #include "migration/blocker.h"
 #include "migration/qemu-file.h"
@@ -150,14 +149,8 @@ bool vfio_viommu_preset(VFIODevice *vbasedev)
 
 static void vfio_set_migration_error(int err)
 {
-    MigrationState *ms = migrate_get_current();
-
     if (migration_is_setup_or_active()) {
-        WITH_QEMU_LOCK_GUARD(&ms->qemu_file_lock) {
-            if (ms->to_dst_file) {
-                qemu_file_set_error(ms->to_dst_file, err);
-            }
-        }
+        migration_file_set_error(err);
     }
 }
 
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 49c0016add..a15fd486c6 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -17,13 +17,12 @@
 
 #include "sysemu/runstate.h"
 #include "hw/vfio/vfio-common.h"
-#include "migration/migration.h"
+#include "migration/misc.h"
 #include "migration/savevm.h"
 #include "migration/vmstate.h"
 #include "migration/qemu-file.h"
 #include "migration/register.h"
 #include "migration/blocker.h"
-#include "migration/misc.h"
 #include "qapi/error.h"
 #include "exec/ramlist.h"
 #include "exec/ram_addr.h"
@@ -714,9 +713,7 @@ static void vfio_vmstate_change_prepare(void *opaque, bool running,
          * Migration should be aborted in this case, but vm_state_notify()
          * currently does not support reporting failures.
          */
-        if (migrate_get_current()->to_dst_file) {
-            qemu_file_set_error(migrate_get_current()->to_dst_file, ret);
-        }
+        migration_file_set_error(ret);
     }
 
     trace_vfio_vmstate_change_prepare(vbasedev->name, running,
@@ -746,9 +743,7 @@ static void vfio_vmstate_change(void *opaque, bool running, RunState state)
          * Migration should be aborted in this case, but vm_state_notify()
          * currently does not support reporting failures.
          */
-        if (migrate_get_current()->to_dst_file) {
-            qemu_file_set_error(migrate_get_current()->to_dst_file, ret);
-        }
+        migration_file_set_error(ret);
     }
 
     trace_vfio_vmstate_change(vbasedev->name, running, RunState_str(state),
diff --git a/migration/migration.c b/migration/migration.c
index db1e627848..216f63d62b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3038,6 +3038,17 @@ static MigThrError postcopy_pause(MigrationState *s)
     }
 }
 
+void migration_file_set_error(int err)
+{
+    MigrationState *s = current_migration;
+
+    WITH_QEMU_LOCK_GUARD(&s->qemu_file_lock) {
+        if (s->to_dst_file) {
+            qemu_file_set_error(s->to_dst_file, err);
+        }
+    }
+}
+
 static MigThrError migration_detect_error(MigrationState *s)
 {
     int ret;
--
2.44.0
[PULL 12/34] physmem: Reduce local variable scope in flatview_read/write_continue()
From: Jonathan Cameron Precursor to factoring out the inner loops for reuse. Reviewed-by: Peter Xu Signed-off-by: Jonathan Cameron Reviewed-by: David Hildenbrand Reviewed-by: Philippe Mathieu-Daudé Link: https://lore.kernel.org/r/20240307153710.30907-3-jonathan.came...@huawei.com Signed-off-by: Peter Xu --- system/physmem.c | 40 1 file changed, 20 insertions(+), 20 deletions(-) diff --git a/system/physmem.c b/system/physmem.c index e92bed50a6..e35aa29343 100644 --- a/system/physmem.c +++ b/system/physmem.c @@ -2688,10 +2688,7 @@ static MemTxResult flatview_write_continue(FlatView *fv, hwaddr addr, hwaddr len, hwaddr mr_addr, hwaddr l, MemoryRegion *mr) { -uint8_t *ram_ptr; -uint64_t val; MemTxResult result = MEMTX_OK; -bool release_lock = false; const uint8_t *buf = ptr; for (;;) { @@ -2699,7 +2696,9 @@ static MemTxResult flatview_write_continue(FlatView *fv, hwaddr addr, result |= MEMTX_ACCESS_ERROR; /* Keep going. */ } else if (!memory_access_is_direct(mr, true)) { -release_lock |= prepare_mmio_access(mr); +uint64_t val; +bool release_lock = prepare_mmio_access(mr); + l = memory_access_size(mr, l, mr_addr); /* XXX: could force current_cpu to NULL to avoid potential bugs */ @@ -2717,18 +2716,21 @@ static MemTxResult flatview_write_continue(FlatView *fv, hwaddr addr, val = ldn_he_p(buf, l); result |= memory_region_dispatch_write(mr, mr_addr, val, size_memop(l), attrs); +if (release_lock) { +bql_unlock(); +} + + } else { /* RAM case */ -ram_ptr = qemu_ram_ptr_length(mr->ram_block, mr_addr, , false); + +uint8_t *ram_ptr = qemu_ram_ptr_length(mr->ram_block, mr_addr, , + false); + memmove(ram_ptr, buf, l); invalidate_and_set_dirty(mr, mr_addr, l); } -if (release_lock) { -bql_unlock(); -release_lock = false; -} - len -= l; buf += l; addr += l; @@ -2767,10 +2769,7 @@ MemTxResult flatview_read_continue(FlatView *fv, hwaddr addr, hwaddr len, hwaddr mr_addr, hwaddr l, MemoryRegion *mr) { -uint8_t *ram_ptr; -uint64_t val; MemTxResult result = MEMTX_OK; -bool release_lock = 
false; uint8_t *buf = ptr; fuzz_dma_read_cb(addr, len, mr); @@ -2780,7 +2779,9 @@ MemTxResult flatview_read_continue(FlatView *fv, hwaddr addr, /* Keep going. */ } else if (!memory_access_is_direct(mr, false)) { /* I/O case */ -release_lock |= prepare_mmio_access(mr); +uint64_t val; +bool release_lock = prepare_mmio_access(mr); + l = memory_access_size(mr, l, mr_addr); result |= memory_region_dispatch_read(mr, mr_addr, , size_memop(l), attrs); @@ -2796,17 +2797,16 @@ MemTxResult flatview_read_continue(FlatView *fv, hwaddr addr, (l == 8 && len >= 8)); #endif stn_he_p(buf, l, val); +if (release_lock) { +bql_unlock(); +} } else { /* RAM case */ -ram_ptr = qemu_ram_ptr_length(mr->ram_block, mr_addr, , false); +uint8_t *ram_ptr = qemu_ram_ptr_length(mr->ram_block, mr_addr, , + false); memcpy(buf, ram_ptr, l); } -if (release_lock) { -bql_unlock(); -release_lock = false; -} - len -= l; buf += l; addr += l; -- 2.44.0
[PULL 01/34] migration: Don't serialize devices in qemu_savevm_state_iterate()
From: Avihai Horon

Commit 90697be8896c ("live migration: Serialize vmstate saving in stage
2") introduced device serialization in qemu_savevm_state_iterate(). The
rationale behind it was to first complete migration of slower changing
block devices and only then migrate the RAM, to avoid sending fast
changing RAM pages over and over.

This commit was added a long time ago, and while it was useful back
then, it is not the case anymore:
1. Block migration is deprecated, see commit 66db46ca83b8 ("migration:
   Deprecate block migration").
2. Today there are other iterative devices besides RAM and block, such
   as VFIO, which are registered for migration after RAM. With current
   serialization behavior, a fast changing device can block other
   devices from sending their data, which may prevent migration from
   converging in some cases.

The issue described in item 2 was observed in several VFIO migration
scenarios with switchover-ack capability enabled, where some workload on
the VM prevented RAM from ever reaching a hard zero, thus blocking VFIO
initial pre-copy data from being sent. Hence, destination could not ack
switchover and migration could not converge.

Fix that by not serializing iterative devices in
qemu_savevm_state_iterate().

Note that this still doesn't fully prevent device starvation. As
correctly pointed out by Peter [1], a fast changing device might
constantly consume all allocated bandwidth and block the following
devices. However, this scenario is more likely to happen only if
max-bandwidth is low.

[1] https://lore.kernel.org/qemu-devel/Zd6iw9dBhW6wKNxx@x1n/

Signed-off-by: Avihai Horon
Reviewed-by: Fabiano Rosas
Link: https://lore.kernel.org/r/20240304105339.20713-2-avih...@nvidia.com
Signed-off-by: Peter Xu
---
 migration/savevm.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index dc1fb9c0d3..e84b26e1c8 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1390,7 +1390,8 @@ int qemu_savevm_state_resume_prepare(MigrationState *s)
 int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy)
 {
     SaveStateEntry *se;
-    int ret = 1;
+    bool all_finished = true;
+    int ret;
 
     trace_savevm_state_iterate();
     QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
@@ -1431,16 +1432,12 @@ int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy)
                          "%d(%s): %d",
                          se->section_id, se->idstr, ret);
             qemu_file_set_error(f, ret);
-        }
-        if (ret <= 0) {
-            /* Do not proceed to the next vmstate before this one reported
-               completion of the current stage. This serializes the migration
-               and reduces the probability that a faster changing state is
-               synchronized over and over again. */
-            break;
+            return ret;
+        } else if (!ret) {
+            all_finished = false;
         }
     }
-    return ret;
+    return all_finished;
 }
 
 static bool should_send_vmdesc(void)
--
2.44.0
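The control-flow change in this patch can be illustrated with a stripped-down model. This is only a sketch, assuming handlers that return a negative value on error, 0 when more data is pending and a positive value when done; ToyHandler, ToyDevice, toy_iterate() and toy_state_iterate() are hypothetical names, and the real function also skips inactive handlers and writes section headers and footers:

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy handler: <0 error, 0 more data pending, >0 finished this stage. */
typedef int (*IterateFn)(void *opaque);

typedef struct {
    IterateFn iterate;
    void *opaque;
} ToyHandler;

/* Example device: counts calls and reports whatever ->result holds. */
typedef struct {
    int calls;
    int result;
} ToyDevice;

int toy_iterate(void *opaque)
{
    ToyDevice *d = opaque;

    d->calls++;
    return d->result;
}

/*
 * Non-serialized iteration: every handler gets a turn in each pass.
 * An error aborts the pass; otherwise the result says whether all
 * handlers reported completion.
 */
int toy_state_iterate(const ToyHandler *h, size_t n)
{
    bool all_finished = true;

    for (size_t i = 0; i < n; i++) {
        int ret = h[i].iterate(h[i].opaque);

        if (ret < 0) {
            return ret;
        } else if (!ret) {
            all_finished = false;
        }
    }
    return all_finished;
}
```

In this model a first device that never reports completion no longer stops the second one from being called, which is the starvation case described in item 2.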
[PULL 23/34] migration: migration_is_device
From: Steve Sistare

Define and export migration_is_device to eliminate a dependency
on MigrationState.

Signed-off-by: Steve Sistare
Link: https://lore.kernel.org/r/1710179338-294359-8-git-send-email-steven.sist...@oracle.com
Signed-off-by: Peter Xu
---
 include/migration/misc.h | 1 +
 hw/vfio/common.c         | 4 +---
 migration/migration.c    | 7 +++++++
 3 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index c4b5416357..28cfaed2c7 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -61,6 +61,7 @@ void migration_object_init(void);
 void migration_shutdown(void);
 bool migration_is_idle(void);
 bool migration_is_active(void);
+bool migration_is_device(void);
 bool migration_thread_is_self(void);
 bool migration_is_setup_or_active(void);
 bool migrate_mode_is_cpr(MigrationState *);
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 2dbbf62e15..de010680ff 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -180,10 +180,8 @@ bool vfio_device_state_is_precopy(VFIODevice *vbasedev)
 static bool vfio_devices_all_dirty_tracking(VFIOContainerBase *bcontainer)
 {
     VFIODevice *vbasedev;
-    MigrationState *ms = migrate_get_current();
 
-    if (!migration_is_active() &&
-        ms->state != MIGRATION_STATUS_DEVICE) {
+    if (!migration_is_active() && !migration_is_device()) {
         return false;
     }
 
diff --git a/migration/migration.c b/migration/migration.c
index afe72af0b1..db1e627848 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1647,6 +1647,13 @@ bool migration_is_active(void)
             s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
 }
 
+bool migration_is_device(void)
+{
+    MigrationState *s = current_migration;
+
+    return s->state == MIGRATION_STATUS_DEVICE;
+}
+
 bool migration_thread_is_self(void)
 {
     MigrationState *s = current_migration;
--
2.44.0
[PULL 17/34] migration: remove migration.h references
From: Steve Sistare Remove migration.h from files that no longer need it due to previous commits. Signed-off-by: Steve Sistare Link: https://lore.kernel.org/r/1710179338-294359-2-git-send-email-steven.sist...@oracle.com Signed-off-by: Peter Xu --- hw/vfio/container.c| 1 - hw/virtio/vhost-user.c | 1 - hw/virtio/virtio-balloon.c | 1 - system/qdev-monitor.c | 1 - target/loongarch/kvm/kvm.c | 1 - tests/unit/test-vmstate.c | 1 - 6 files changed, 6 deletions(-) diff --git a/hw/vfio/container.c b/hw/vfio/container.c index bd25b9fbad..ff081a12c2 100644 --- a/hw/vfio/container.c +++ b/hw/vfio/container.c @@ -32,7 +32,6 @@ #include "sysemu/reset.h" #include "trace.h" #include "qapi/error.h" -#include "migration/migration.h" #include "pci.h" VFIOGroupList vfio_group_list = diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index a1eea8547e..1af8621481 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -26,7 +26,6 @@ #include "qemu/sockets.h" #include "sysemu/runstate.h" #include "sysemu/cryptodev.h" -#include "migration/migration.h" #include "migration/postcopy-ram.h" #include "trace.h" #include "exec/ramblock.h" diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c index a59ff172bd..609e39a821 100644 --- a/hw/virtio/virtio-balloon.c +++ b/hw/virtio/virtio-balloon.c @@ -31,7 +31,6 @@ #include "trace.h" #include "qemu/error-report.h" #include "migration/misc.h" -#include "migration/migration.h" #include "hw/virtio/virtio-bus.h" #include "hw/virtio/virtio-access.h" diff --git a/system/qdev-monitor.c b/system/qdev-monitor.c index 09e07cab9b..c1243891c3 100644 --- a/system/qdev-monitor.c +++ b/system/qdev-monitor.c @@ -38,7 +38,6 @@ #include "qemu/option_int.h" #include "sysemu/block-backend.h" #include "migration/misc.h" -#include "migration/migration.h" #include "qemu/cutils.h" #include "hw/qdev-properties.h" #include "hw/clock.h" diff --git a/target/loongarch/kvm/kvm.c b/target/loongarch/kvm/kvm.c index c19978a970..11a69a3b4e 100644 
--- a/target/loongarch/kvm/kvm.c +++ b/target/loongarch/kvm/kvm.c @@ -22,7 +22,6 @@ #include "hw/irq.h" #include "qemu/log.h" #include "hw/loader.h" -#include "migration/migration.h" #include "sysemu/runstate.h" #include "cpu-csr.h" #include "kvm_loongarch.h" diff --git a/tests/unit/test-vmstate.c b/tests/unit/test-vmstate.c index c4f9faa273..63f28f26f4 100644 --- a/tests/unit/test-vmstate.c +++ b/tests/unit/test-vmstate.c @@ -24,7 +24,6 @@ #include "qemu/osdep.h" -#include "../migration/migration.h" #include "migration/vmstate.h" #include "migration/qemu-file-types.h" #include "../migration/qemu-file.h" -- 2.44.0
[PULL 00/34] Migration 20240311 patches
From: Peter Xu The following changes since commit 7489f7f3f81dcb776df8c1b9a9db281fc21bf05f: Merge tag 'hw-misc-20240309' of https://github.com/philmd/qemu into staging (2024-03-09 20:12:21 +) are available in the Git repository at: https://gitlab.com/peterx/qemu.git tags/migration-20240311-pull-request for you to fetch changes up to 1815338df00fd0a3fe25085564c6966f74c8f43d: migration/multifd: Add new migration test cases for legacy zero page checking. (2024-03-11 16:57:09 -0400) Migration pull request - Avihai's fix to allow vmstate iterators to not starve for VFIO - Maksim's fix on additional check on precopy load error - Fabiano's fix on fdatasync() hang in mapped-ram - Jonathan's fix on vring cached access over MMIO regions - Cedric's cleanup patches 1-4 out of his error report series - Yu's fix for RDMA migration (which used to be broken even for 8.2) - Anthony's small cleanup/fix on err message - Steve's patches on privatize migration.h - Xiang's patchset to enable zero page detections in multifd threads Anthony PERARD (1): migration: Fix format in error message Avihai Horon (3): migration: Don't serialize devices in qemu_savevm_state_iterate() vfio/migration: Refactor vfio_save_state() return value vfio/migration: Add a note about migration rate limiting Cédric Le Goater (4): migration: Report error when shutdown fails migration: Remove SaveStateHandler and LoadStateHandler typedefs migration: Add documentation for SaveVMHandlers migration: Do not call PRECOPY_NOTIFY_SETUP notifiers in case of error Fabiano Rosas (3): migration/multifd: Don't fsync when closing QIOChannelFile migration/multifd: Allow zero pages in file migration migration/multifd: Allow clearing of the file_bmap from multifd Hao Xiang (5): migration/multifd: Add new migration option zero-page-detection. migration/multifd: Implement zero page transmission on the multifd thread. 
migration/multifd: Implement ram_save_target_page_multifd to handle multifd version of MigrationOps::ram_save_target_page. migration/multifd: Enable multifd zero page checking by default. migration/multifd: Add new migration test cases for legacy zero page checking. Jonathan Cameron (4): physmem: Rename addr1 to more informative mr_addr in flatview_read/write() and similar physmem: Reduce local variable scope in flatview_read/write_continue() physmem: Factor out body of flatview_read/write_continue() loop physmem: Fix wrong address in large address_space_read/write_cached_slow() Maksim Davydov (1): migration/ram: add additional check Steve Sistare (12): migration: export fewer options migration: remove migration.h references migration: export migration_is_setup_or_active migration: export migration_is_active migration: export migration_is_running migration: export vcpu_dirty_limit_period migration: migration_thread_is_self migration: migration_is_device migration: migration_file_set_error migration: privatize colo interfaces migration: delete unused accessors migration: purge MigrationState from public interface Yu Zhang (1): migration/rdma: Fix a memory issue for migration docs/devel/migration/main.rst | 3 +- qapi/migration.json | 38 +++- include/hw/qdev-properties-system.h | 4 + include/migration/client-options.h | 25 +++ include/migration/misc.h| 18 +- include/migration/register.h| 267 +--- include/qemu/typedefs.h | 2 - migration/migration.h | 7 +- migration/multifd.h | 23 ++- migration/options.h | 7 +- migration/ram.h | 3 +- hw/core/machine.c | 4 +- hw/core/qdev-properties-system.c| 10 ++ hw/vfio/common.c| 17 +- hw/vfio/container.c | 1 - hw/vfio/migration.c | 24 ++- hw/virtio/vhost-user.c | 1 - hw/virtio/virtio-balloon.c | 2 - io/channel-file.c | 5 - migration/colo.c| 17 +- migration/file.c| 4 +- migration/migration-hmp-cmds.c | 9 + migration/migration.c | 67 --- migration/multifd-zero-page.c | 87 + migration/multifd-zlib.c| 21 ++- migration/multifd-zstd.c| 20 
++- migration/multifd.c | 120 ++--- migration/options.c | 32 +++- migration/qemu-file.c | 5 +- migration/ram.c | 62 +-- migration/rdma.c| 2 +- migration/savevm.c | 23 +-- net/colo-compare.c | 3 +- net/vhost-vdpa.c| 3 +- stubs/colo.c| 1 - system
[PULL 08/34] migration: Do not call PRECOPY_NOTIFY_SETUP notifiers in case of error
From: Cédric Le Goater

When commit bd2270608fa0 ("migration/ram.c: add a notifier chain for
precopy") added PRECOPY_NOTIFY_SETUP notifiers at the end of
qemu_savevm_state_setup(), it didn't take into account a possible
error in the loop calling vmstate_save() or .save_setup() handlers.

Check ret value before calling the notifiers.

Reviewed-by: Peter Xu
Signed-off-by: Cédric Le Goater
Link: https://lore.kernel.org/r/20240304122844.1888308-10-...@redhat.com
Signed-off-by: Peter Xu
---
 migration/savevm.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index e84b26e1c8..76b57a9888 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1317,7 +1317,7 @@ void qemu_savevm_state_setup(QEMUFile *f)
     MigrationState *ms = migrate_get_current();
     SaveStateEntry *se;
     Error *local_err = NULL;
-    int ret;
+    int ret = 0;
 
     json_writer_int64(ms->vmdesc, "page_size", qemu_target_page_size());
     json_writer_start_array(ms->vmdesc, "devices");
@@ -1351,6 +1351,10 @@ void qemu_savevm_state_setup(QEMUFile *f)
         }
     }
 
+    if (ret) {
+        return;
+    }
+
     if (precopy_notify(PRECOPY_NOTIFY_SETUP, &local_err)) {
         error_report_err(local_err);
     }
--
2.44.0
[PULL 16/34] migration: export fewer options
From: Steve Sistare A small number of migration options are accessed by migration clients, but to see them clients must include all of options.h, which is mostly for migration core code. migrate_mode() in particular will be needed by multiple clients. Refactor the option declarations so clients can see the necessary few via misc.h, which already exports a portion of the client API. Signed-off-by: Steve Sistare Link: https://lore.kernel.org/r/1710179319-294320-1-git-send-email-steven.sist...@oracle.com Signed-off-by: Peter Xu --- include/migration/client-options.h | 24 include/migration/misc.h | 1 + migration/options.h| 6 +- hw/vfio/migration.c| 1 - hw/virtio/virtio-balloon.c | 1 - system/dirtylimit.c| 1 - 6 files changed, 26 insertions(+), 8 deletions(-) create mode 100644 include/migration/client-options.h diff --git a/include/migration/client-options.h b/include/migration/client-options.h new file mode 100644 index 00..887fea1565 --- /dev/null +++ b/include/migration/client-options.h @@ -0,0 +1,24 @@ +/* + * QEMU public migration capabilities + * + * Copyright (c) 2012-2023 Red Hat Inc + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. 
+ */ + +#ifndef QEMU_MIGRATION_CLIENT_OPTIONS_H +#define QEMU_MIGRATION_CLIENT_OPTIONS_H + +/* capabilities */ + +bool migrate_background_snapshot(void); +bool migrate_dirty_limit(void); +bool migrate_postcopy_ram(void); +bool migrate_switchover_ack(void); + +/* parameters */ + +MigMode migrate_mode(void); + +#endif diff --git a/include/migration/misc.h b/include/migration/misc.h index 5d1aa593ed..4c226a40bb 100644 --- a/include/migration/misc.h +++ b/include/migration/misc.h @@ -17,6 +17,7 @@ #include "qemu/notify.h" #include "qapi/qapi-types-migration.h" #include "qapi/qapi-types-net.h" +#include "migration/client-options.h" /* migration/ram.c */ diff --git a/migration/options.h b/migration/options.h index 6ddd8dad9b..b6b69c2bb7 100644 --- a/migration/options.h +++ b/migration/options.h @@ -16,6 +16,7 @@ #include "hw/qdev-properties.h" #include "hw/qdev-properties-system.h" +#include "migration/client-options.h" /* migration properties */ @@ -24,12 +25,10 @@ extern Property migration_properties[]; /* capabilities */ bool migrate_auto_converge(void); -bool migrate_background_snapshot(void); bool migrate_block(void); bool migrate_colo(void); bool migrate_compress(void); bool migrate_dirty_bitmaps(void); -bool migrate_dirty_limit(void); bool migrate_events(void); bool migrate_mapped_ram(void); bool migrate_ignore_shared(void); @@ -38,11 +37,9 @@ bool migrate_multifd(void); bool migrate_pause_before_switchover(void); bool migrate_postcopy_blocktime(void); bool migrate_postcopy_preempt(void); -bool migrate_postcopy_ram(void); bool migrate_rdma_pin_all(void); bool migrate_release_ram(void); bool migrate_return_path(void); -bool migrate_switchover_ack(void); bool migrate_validate_uuid(void); bool migrate_xbzrle(void); bool migrate_zero_blocks(void); @@ -84,7 +81,6 @@ uint8_t migrate_max_cpu_throttle(void); uint64_t migrate_max_bandwidth(void); uint64_t migrate_avail_switchover_bandwidth(void); uint64_t migrate_max_postcopy_bandwidth(void); -MigMode migrate_mode(void); 
int migrate_multifd_channels(void); MultiFDCompression migrate_multifd_compression(void); int migrate_multifd_zlib_level(void); diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c index f82dcabc49..49c0016add 100644 --- a/hw/vfio/migration.c +++ b/hw/vfio/migration.c @@ -18,7 +18,6 @@ #include "sysemu/runstate.h" #include "hw/vfio/vfio-common.h" #include "migration/migration.h" -#include "migration/options.h" #include "migration/savevm.h" #include "migration/vmstate.h" #include "migration/qemu-file.h" diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c index 89f853fa9e..a59ff172bd 100644 --- a/hw/virtio/virtio-balloon.c +++ b/hw/virtio/virtio-balloon.c @@ -32,7 +32,6 @@ #include "qemu/error-report.h" #include "migration/misc.h" #include "migration/migration.h" -#include "migration/options.h" #include "hw/virtio/virtio-bus.h" #include "hw/virtio/virtio-access.h" diff --git a/system/dirtylimit.c b/system/dirtylimit.c index b5607eb8c2..774ff44f79 100644 --- a/system/dirtylimit.c +++ b/system/dirtylimit.c @@ -26,7 +26,6 @@ #include "trace.h" #include "migration/misc.h" #include "migration/migration.h" -#include "migration/options.h" /* * Dirtylimit stop working if dirty page rate error -- 2.44.0
[PULL 03/34] vfio/migration: Add a note about migration rate limiting
From: Avihai Horon

VFIO migration buffer size is currently limited to 1MB. Therefore, there
is no need to check if migration rate exceeded, as in the worst case it
will exceed by only 1MB.

However, if the buffer size is later changed to a bigger value,
vfio_save_iterate() should enforce migration rate (similar to migration
RAM code).

Add a note about this in vfio_save_iterate() to serve as a reminder.

Suggested-by: Peter Xu
Signed-off-by: Avihai Horon
Reviewed-by: Fabiano Rosas
Link: https://lore.kernel.org/r/20240304105339.20713-4-avih...@nvidia.com
Signed-off-by: Peter Xu
---
 hw/vfio/migration.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 0af783a589..f82dcabc49 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -505,6 +505,12 @@ static bool vfio_is_active_iterate(void *opaque)
     return vfio_device_state_is_precopy(vbasedev);
 }
 
+/*
+ * Note about migration rate limiting: VFIO migration buffer size is currently
+ * limited to 1MB, so there is no need to check if migration rate exceeded (as
+ * in the worst case it will exceed by 1MB). However, if the buffer size is
+ * later changed to a bigger value, migration rate should be enforced here.
+ */
 static int vfio_save_iterate(QEMUFile *f, void *opaque)
 {
     VFIODevice *vbasedev = opaque;
--
2.44.0
[PULL 04/34] migration/ram: add additional check
From: Maksim Davydov

If a migration stream is broken, the address and flag reading can return
zero. Thus, an irrelevant flag error will be returned instead of EIO.
It can be fixed by additional check after the reading.

Signed-off-by: Maksim Davydov
Link: https://lore.kernel.org/r/20240304144203.158477-1-davydov-...@yandex-team.ru
Signed-off-by: Peter Xu
---
 migration/ram.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index 003c28e133..2cd936d9ce 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -4214,6 +4214,12 @@ static int ram_load_precopy(QEMUFile *f)
         i++;
 
         addr = qemu_get_be64(f);
+        ret = qemu_file_get_error(f);
+        if (ret) {
+            error_report("Getting RAM address failed");
+            break;
+        }
+
         flags = addr & ~TARGET_PAGE_MASK;
         addr &= TARGET_PAGE_MASK;
 
--
2.44.0
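The failure mode this patch addresses can be shown with a toy stream reader. The sketch below assumes a reader that returns zero and latches a sticky error on a short read; ToyStream, toy_get_be64(), toy_read_page_header() and TOY_PAGE_MASK are hypothetical stand-ins, not the QEMUFile API:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed 4 KiB pages for the illustration. */
#define TOY_PAGE_MASK (~(uint64_t)0xfff)

typedef struct {
    const uint8_t *buf;
    size_t len;
    size_t pos;
    int error;          /* sticky: 0, or a negative errno value */
} ToyStream;

/* Read a big-endian u64; on a short read, latch -EIO and return 0. */
uint64_t toy_get_be64(ToyStream *s)
{
    uint64_t v = 0;

    if (s->error || s->len - s->pos < 8) {
        if (!s->error) {
            s->error = -EIO;
        }
        return 0;
    }
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | s->buf[s->pos++];
    }
    return v;
}

/*
 * Decode a page header. Checking the stream error before looking at
 * the value avoids treating the zero from a failed read as "no flags".
 */
int toy_read_page_header(ToyStream *s, uint64_t *addr, uint64_t *flags)
{
    uint64_t v = toy_get_be64(s);

    if (s->error) {
        return s->error;
    }
    *flags = v & ~TOY_PAGE_MASK;
    *addr = v & TOY_PAGE_MASK;
    return 0;
}
```

Without the error check, the caller would go on to decode flags from the zero value and report a misleading unknown-flags error, which is the behaviour the commit message describes.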
[PULL 02/34] vfio/migration: Refactor vfio_save_state() return value
From: Avihai Horon

Currently, vfio_save_state() returns 1 regardless of whether there is
more data to send or not. This was done to prevent a fast changing VFIO
device from potentially blocking other devices from sending their data,
as qemu_savevm_state_iterate() serialized devices.

Now that qemu_savevm_state_iterate() no longer serializes devices, there
is no need for that.

Refactor vfio_save_state() to return 0 if more data is available and 1
if no more data is available.

Signed-off-by: Avihai Horon
Reviewed-by: Fabiano Rosas
Link: https://lore.kernel.org/r/20240304105339.20713-3-avih...@nvidia.com
Signed-off-by: Peter Xu
---
 hw/vfio/migration.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 50140eda87..0af783a589 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -529,11 +529,7 @@ static int vfio_save_iterate(QEMUFile *f, void *opaque)
     trace_vfio_save_iterate(vbasedev->name, migration->precopy_init_size,
                             migration->precopy_dirty_size);
 
-    /*
-     * A VFIO device's pre-copy dirty_bytes is not guaranteed to reach zero.
-     * Return 1 so following handlers will not be potentially blocked.
-     */
-    return 1;
+    return !migration->precopy_init_size && !migration->precopy_dirty_size;
 }
 
 static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
--
2.44.0
[PULL 19/34] migration: export migration_is_active
From: Steve Sistare Delete the MigrationState parameter from migration_is_active so it can be exported and used without including migration.h. Signed-off-by: Steve Sistare Link: https://lore.kernel.org/r/1710179338-294359-4-git-send-email-steven.sist...@oracle.com Signed-off-by: Peter Xu --- include/migration/misc.h | 2 +- hw/vfio/common.c | 4 ++-- migration/migration.c | 10 ++ system/dirtylimit.c | 2 +- 4 files changed, 10 insertions(+), 8 deletions(-) diff --git a/include/migration/misc.h b/include/migration/misc.h index 79cff6224e..e1f1bf853e 100644 --- a/include/migration/misc.h +++ b/include/migration/misc.h @@ -60,7 +60,7 @@ void dump_vmstate_json_to_file(FILE *out_fp); void migration_object_init(void); void migration_shutdown(void); bool migration_is_idle(void); -bool migration_is_active(MigrationState *); +bool migration_is_active(void); bool migration_is_setup_or_active(void); bool migrate_mode_is_cpr(MigrationState *); diff --git a/hw/vfio/common.c b/hw/vfio/common.c index 896eab8103..2dbbf62e15 100644 --- a/hw/vfio/common.c +++ b/hw/vfio/common.c @@ -182,7 +182,7 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainerBase *bcontainer) VFIODevice *vbasedev; MigrationState *ms = migrate_get_current(); -if (ms->state != MIGRATION_STATUS_ACTIVE && +if (!migration_is_active() && ms->state != MIGRATION_STATUS_DEVICE) { return false; } @@ -225,7 +225,7 @@ vfio_devices_all_running_and_mig_active(const VFIOContainerBase *bcontainer) { VFIODevice *vbasedev; -if (!migration_is_active(migrate_get_current())) { +if (!migration_is_active()) { return false; } diff --git a/migration/migration.c b/migration/migration.c index af21403bad..17859cbaee 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1406,7 +1406,7 @@ static void migrate_fd_cleanup(MigrationState *s) qemu_fclose(tmp); } -assert(!migration_is_active(s)); +assert(!migration_is_active()); if (s->state == MIGRATION_STATUS_CANCELLING) { migrate_set_state(&s->state, MIGRATION_STATUS_CANCELLING, 
@@ -1637,8 +1637,10 @@ bool migration_is_idle(void) return false; } -bool migration_is_active(MigrationState *s) +bool migration_is_active(void) { +MigrationState *s = current_migration; + return (s->state == MIGRATION_STATUS_ACTIVE || s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE); } @@ -3461,7 +3463,7 @@ static void *migration_thread(void *opaque) trace_migration_thread_setup_complete(); -while (migration_is_active(s)) { +while (migration_is_active()) { if (urgent || !migration_rate_exceeded(s->to_dst_file)) { MigIterateState iter_state = migration_iteration_run(s); if (iter_state == MIG_ITERATE_SKIP) { @@ -3607,7 +3609,7 @@ static void *bg_migration_thread(void *opaque) migration_bh_schedule(bg_migration_vm_start_bh, s); bql_unlock(); -while (migration_is_active(s)) { +while (migration_is_active()) { MigIterateState iter_state = bg_migration_iteration_run(s); if (iter_state == MIG_ITERATE_SKIP) { continue; diff --git a/system/dirtylimit.c b/system/dirtylimit.c index 774ff44f79..051e0311c1 100644 --- a/system/dirtylimit.c +++ b/system/dirtylimit.c @@ -83,7 +83,7 @@ static void vcpu_dirty_rate_stat_collect(void) int64_t period = DIRTYLIMIT_CALC_TIME_MS; if (migrate_dirty_limit() && -migration_is_active(s)) { +migration_is_active()) { period = s->parameters.x_vcpu_dirty_limit_period; } -- 2.44.0
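Note: the shape of the change is a predicate that reads the singleton state itself instead of taking it as a parameter, so callers outside migration/ do not need the struct definition. A minimal sketch under that assumption; DemoStatus, DemoMigrationState and demo_current are hypothetical and only imitate the real types:

```c
#include <stdbool.h>

/* Hypothetical subset of the migration status values. */
typedef enum {
    DEMO_STATUS_NONE,
    DEMO_STATUS_SETUP,
    DEMO_STATUS_ACTIVE,
    DEMO_STATUS_POSTCOPY_ACTIVE,
    DEMO_STATUS_COMPLETED,
} DemoStatus;

/* In the real code this struct would stay private to the migration code. */
typedef struct {
    DemoStatus state;
} DemoMigrationState;

/* Singleton, playing the role of current_migration. */
static DemoMigrationState demo_current = { DEMO_STATUS_NONE };

/* Exported predicate: no parameter, so no need for the struct in headers. */
bool demo_migration_is_active(void)
{
    const DemoMigrationState *s = &demo_current;

    return s->state == DEMO_STATUS_ACTIVE ||
           s->state == DEMO_STATUS_POSTCOPY_ACTIVE;
}
```

A caller such as the vfio or dirtylimit code then only needs the one-line prototype, which is what lets the declaration live in a lightweight public header.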
[PULL 14/34] physmem: Fix wrong address in large address_space_read/write_cached_slow()
From: Jonathan Cameron If the access is bigger than the MemoryRegion supports, flatview_read/write_continue() will attempt to update the Memory Region, but the address passed to flatview_translate() is relative to the cache, not to the FlatView. On arm/virt with interleaved CXL memory emulation and virtio-blk-pci this led to the first part of the descriptor being read from the CXL memory and the second part from PA 0x8 which happens to be a blank region of a flash chip and all ffs on this particular configuration. Note this test requires the out of tree ARM support for CXL, but the problem is more general. Avoid this by adding new address_space_read_continue_cached() and address_space_write_continue_cached() which share all the logic with the flatview versions except for the MemoryRegion lookup, which is unnecessary as the MemoryRegionCache only covers one MemoryRegion. Signed-off-by: Jonathan Cameron Link: https://lore.kernel.org/r/20240307153710.30907-5-jonathan.came...@huawei.com Signed-off-by: Peter Xu --- system/physmem.c | 63 +++- 1 file changed, 57 insertions(+), 6 deletions(-) diff --git a/system/physmem.c b/system/physmem.c index 737869a3f5..6cfb7a80ab 100644 --- a/system/physmem.c +++ b/system/physmem.c @@ -3370,6 +3370,59 @@ static inline MemoryRegion *address_space_translate_cached( return section.mr; } +/* Called within RCU critical section. */ +static MemTxResult address_space_write_continue_cached(MemTxAttrs attrs, + const void *ptr, + hwaddr len, + hwaddr mr_addr, + hwaddr l, + MemoryRegion *mr) +{ +MemTxResult result = MEMTX_OK; +const uint8_t *buf = ptr; + +for (;;) { +result |= flatview_write_continue_step(attrs, buf, len, mr_addr, &l, + mr); + +len -= l; +buf += l; +mr_addr += l; + +if (!len) { +break; +} + +l = len; +} + +return result; +} + +/* Called within RCU critical section. 
*/ +static MemTxResult address_space_read_continue_cached(MemTxAttrs attrs, + void *ptr, hwaddr len, + hwaddr mr_addr, hwaddr l, + MemoryRegion *mr) +{ +MemTxResult result = MEMTX_OK; +uint8_t *buf = ptr; + +for (;;) { +result |= flatview_read_continue_step(attrs, buf, len, mr_addr, &l, mr); +len -= l; +buf += l; +mr_addr += l; + +if (!len) { +break; +} +l = len; +} + +return result; +} + /* Called from RCU critical section. address_space_read_cached uses this * out of line function when the target is an MMIO or IOMMU region. */ @@ -3383,9 +3436,8 @@ address_space_read_cached_slow(MemoryRegionCache *cache, hwaddr addr, l = len; mr = address_space_translate_cached(cache, addr, &mr_addr, &l, false, MEMTXATTRS_UNSPECIFIED); -return flatview_read_continue(cache->fv, - addr, MEMTXATTRS_UNSPECIFIED, buf, len, - mr_addr, l, mr); +return address_space_read_continue_cached(MEMTXATTRS_UNSPECIFIED, + buf, len, mr_addr, l, mr); } /* Called from RCU critical section. address_space_write_cached uses this @@ -3401,9 +3453,8 @@ address_space_write_cached_slow(MemoryRegionCache *cache, hwaddr addr, l = len; mr = address_space_translate_cached(cache, addr, &mr_addr, &l, true, MEMTXATTRS_UNSPECIFIED); -return flatview_write_continue(cache->fv, - addr, MEMTXATTRS_UNSPECIFIED, buf, len, - mr_addr, l, mr); +return address_space_write_continue_cached(MEMTXATTRS_UNSPECIFIED, + buf, len, mr_addr, l, mr); } #define ARG1_DECL MemoryRegionCache *cache -- 2.44.0
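Note: the cached variants keep the chunking loop but never re-translate; they only advance the region-relative offset, since a cache covers a single region. The loop shape can be sketched as follows. Everything here is a hypothetical model (DemoRegion, demo_read_step(), demo_read_continue_cached()), not the physmem code itself; the caller is assumed to pass 0 < l <= len, as in the diff.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical single region with a cap on how much one step may access. */
typedef struct {
    const uint8_t *backing;
    size_t size;
    size_t max_access;
} DemoRegion;

/*
 * One step: may shrink *l to what the region supports, like the
 * flatview_read_continue_step() helper does for MMIO accesses.
 * Returns 0 on success and 1 on an out-of-range access.
 */
static int demo_read_step(const DemoRegion *r, uint8_t *buf,
                          size_t off, size_t *l)
{
    if (off >= r->size || *l > r->size - off) {
        return 1;
    }
    if (*l > r->max_access) {
        *l = r->max_access;
    }
    memcpy(buf, r->backing + off, *l);
    return 0;
}

/*
 * Cached-style loop: the offset is relative to the one region, so it is
 * simply advanced by the amount each step handled; no lookup is redone.
 */
static int demo_read_continue_cached(const DemoRegion *r, void *ptr,
                                     size_t len, size_t off, size_t l)
{
    int result = 0;
    uint8_t *buf = ptr;

    for (;;) {
        result |= demo_read_step(r, buf, off, &l);

        len -= l;
        buf += l;
        off += l;

        if (!len) {
            break;
        }
        l = len;
    }
    return result;
}
```

The bug being fixed arises exactly when a step shrinks l: the flatview loop would then translate a cache-relative address as if it were FlatView-relative, whereas the loop above has no translation to get wrong.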
[PULL 13/34] physmem: Factor out body of flatview_read/write_continue() loop
From: Jonathan Cameron This code will be reused for the address_space_cached accessors shortly. Also reduce scope of result variable now we aren't directly calling this in the loop. Signed-off-by: Jonathan Cameron Reviewed-by: David Hildenbrand Link: https://lore.kernel.org/r/20240307153710.30907-4-jonathan.came...@huawei.com Signed-off-by: Peter Xu --- system/physmem.c | 169 +++ 1 file changed, 99 insertions(+), 70 deletions(-) diff --git a/system/physmem.c b/system/physmem.c index e35aa29343..737869a3f5 100644 --- a/system/physmem.c +++ b/system/physmem.c @@ -2681,6 +2681,56 @@ static bool flatview_access_allowed(MemoryRegion *mr, MemTxAttrs attrs, return false; } +static MemTxResult flatview_write_continue_step(MemTxAttrs attrs, +const uint8_t *buf, +hwaddr len, hwaddr mr_addr, +hwaddr *l, MemoryRegion *mr) +{ +if (!flatview_access_allowed(mr, attrs, mr_addr, *l)) { +return MEMTX_ACCESS_ERROR; +} + +if (!memory_access_is_direct(mr, true)) { +uint64_t val; +MemTxResult result; +bool release_lock = prepare_mmio_access(mr); + +*l = memory_access_size(mr, *l, mr_addr); +/* + * XXX: could force current_cpu to NULL to avoid + * potential bugs + */ + +/* + * Assure Coverity (and ourselves) that we are not going to OVERRUN + * the buffer by following ldn_he_p(). + */ +#ifdef QEMU_STATIC_ANALYSIS +assert((*l == 1 && len >= 1) || + (*l == 2 && len >= 2) || + (*l == 4 && len >= 4) || + (*l == 8 && len >= 8)); +#endif +val = ldn_he_p(buf, *l); +result = memory_region_dispatch_write(mr, mr_addr, val, + size_memop(*l), attrs); +if (release_lock) { +bql_unlock(); +} + +return result; +} else { +/* RAM case */ +uint8_t *ram_ptr = qemu_ram_ptr_length(mr->ram_block, mr_addr, l, + false); + +memmove(ram_ptr, buf, *l); +invalidate_and_set_dirty(mr, mr_addr, *l); + +return MEMTX_OK; +} +} + /* Called within RCU critical section. 
*/ static MemTxResult flatview_write_continue(FlatView *fv, hwaddr addr, MemTxAttrs attrs, @@ -2692,44 +2742,8 @@ static MemTxResult flatview_write_continue(FlatView *fv, hwaddr addr, const uint8_t *buf = ptr; for (;;) { -if (!flatview_access_allowed(mr, attrs, mr_addr, l)) { -result |= MEMTX_ACCESS_ERROR; -/* Keep going. */ -} else if (!memory_access_is_direct(mr, true)) { -uint64_t val; -bool release_lock = prepare_mmio_access(mr); - -l = memory_access_size(mr, l, mr_addr); -/* XXX: could force current_cpu to NULL to avoid - potential bugs */ - -/* - * Assure Coverity (and ourselves) that we are not going to OVERRUN - * the buffer by following ldn_he_p(). - */ -#ifdef QEMU_STATIC_ANALYSIS -assert((l == 1 && len >= 1) || - (l == 2 && len >= 2) || - (l == 4 && len >= 4) || - (l == 8 && len >= 8)); -#endif -val = ldn_he_p(buf, l); -result |= memory_region_dispatch_write(mr, mr_addr, val, - size_memop(l), attrs); -if (release_lock) { -bql_unlock(); -} - - -} else { -/* RAM case */ - -uint8_t *ram_ptr = qemu_ram_ptr_length(mr->ram_block, mr_addr, &l, - false); - -memmove(ram_ptr, buf, l); -invalidate_and_set_dirty(mr, mr_addr, l); -} +result |= flatview_write_continue_step(attrs, buf, len, mr_addr, &l, + mr); len -= l; buf += l; @@ -2763,6 +2777,52 @@ static MemTxResult flatview_write(FlatView *fv, hwaddr addr, MemTxAttrs attrs, mr_addr, l, mr); } +static MemTxResult flatview_read_continue_step(MemTxAttrs attrs, uint8_t *buf, + hwaddr len, hwaddr mr_addr, + hwaddr *l, + MemoryRegion *mr) +{ +if (!flatview_access_allowed(mr, attrs, mr_addr, *l)) { +return MEMTX_ACCESS_ERROR; +} + +if (!memory_access_is_direct(mr, false)) { +/* I/O case */ +uint64_t val; +MemTxResult result; +bool release_lock = prepare_mmio_access(mr); + +*l = memory_access_size(mr,
[PULL 11/34] physmem: Rename addr1 to more informative mr_addr in flatview_read/write() and similar
From: Jonathan Cameron The calls to flatview_read/write[_continue]() have parameters addr and addr1 but the names give no indication of what they are addresses of. Rename addr1 to mr_addr to reflect that it is the translated address offset within the MemoryRegion returned by flatview_translate(). Similarly rename the parameter in address_space_read/write_cached_slow() Suggested-by: Peter Xu Signed-off-by: Jonathan Cameron Reviewed-by: David Hildenbrand Link: https://lore.kernel.org/r/20240307153710.30907-2-jonathan.came...@huawei.com Signed-off-by: Peter Xu --- system/physmem.c | 50 1 file changed, 25 insertions(+), 25 deletions(-) diff --git a/system/physmem.c b/system/physmem.c index 6e9ed97597..e92bed50a6 100644 --- a/system/physmem.c +++ b/system/physmem.c @@ -2685,7 +2685,7 @@ static bool flatview_access_allowed(MemoryRegion *mr, MemTxAttrs attrs, static MemTxResult flatview_write_continue(FlatView *fv, hwaddr addr, MemTxAttrs attrs, const void *ptr, - hwaddr len, hwaddr addr1, + hwaddr len, hwaddr mr_addr, hwaddr l, MemoryRegion *mr) { uint8_t *ram_ptr; @@ -2695,12 +2695,12 @@ static MemTxResult flatview_write_continue(FlatView *fv, hwaddr addr, const uint8_t *buf = ptr; for (;;) { -if (!flatview_access_allowed(mr, attrs, addr1, l)) { +if (!flatview_access_allowed(mr, attrs, mr_addr, l)) { result |= MEMTX_ACCESS_ERROR; /* Keep going. 
*/ } else if (!memory_access_is_direct(mr, true)) { release_lock |= prepare_mmio_access(mr); -l = memory_access_size(mr, l, addr1); +l = memory_access_size(mr, l, mr_addr); /* XXX: could force current_cpu to NULL to avoid potential bugs */ @@ -2715,13 +2715,13 @@ static MemTxResult flatview_write_continue(FlatView *fv, hwaddr addr, (l == 8 && len >= 8)); #endif val = ldn_he_p(buf, l); -result |= memory_region_dispatch_write(mr, addr1, val, +result |= memory_region_dispatch_write(mr, mr_addr, val, size_memop(l), attrs); } else { /* RAM case */ -ram_ptr = qemu_ram_ptr_length(mr->ram_block, addr1, &l, false); +ram_ptr = qemu_ram_ptr_length(mr->ram_block, mr_addr, &l, false); memmove(ram_ptr, buf, l); -invalidate_and_set_dirty(mr, addr1, l); +invalidate_and_set_dirty(mr, mr_addr, l); } if (release_lock) { @@ -2738,7 +2738,7 @@ static MemTxResult flatview_write_continue(FlatView *fv, hwaddr addr, } l = len; -mr = flatview_translate(fv, addr, &addr1, &l, true, attrs); +mr = flatview_translate(fv, addr, &mr_addr, &l, true, attrs); } return result; @@ -2749,22 +2749,22 @@ static MemTxResult flatview_write(FlatView *fv, hwaddr addr, MemTxAttrs attrs, const void *buf, hwaddr len) { hwaddr l; -hwaddr addr1; +hwaddr mr_addr; MemoryRegion *mr; l = len; -mr = flatview_translate(fv, addr, &addr1, &l, true, attrs); +mr = flatview_translate(fv, addr, &mr_addr, &l, true, attrs); if (!flatview_access_allowed(mr, attrs, addr, len)) { return MEMTX_ACCESS_ERROR; } return flatview_write_continue(fv, addr, attrs, buf, len, - addr1, l, mr); + mr_addr, l, mr); } /* Called within RCU critical section. 
*/ MemTxResult flatview_read_continue(FlatView *fv, hwaddr addr, MemTxAttrs attrs, void *ptr, - hwaddr len, hwaddr addr1, hwaddr l, + hwaddr len, hwaddr mr_addr, hwaddr l, MemoryRegion *mr) { uint8_t *ram_ptr; @@ -2775,14 +2775,14 @@ MemTxResult flatview_read_continue(FlatView *fv, hwaddr addr, fuzz_dma_read_cb(addr, len, mr); for (;;) { -if (!flatview_access_allowed(mr, attrs, addr1, l)) { +if (!flatview_access_allowed(mr, attrs, mr_addr, l)) { result |= MEMTX_ACCESS_ERROR; /* Keep going. */ } else if (!memory_access_is_direct(mr, false)) { /* I/O case */ release_lock |= prepare_mmio_access(mr); -l = memory_access_size(mr, l, addr1); -result |= memory_region_dispatch_read(mr, addr1, &val, +l = memory_access_size(mr, l, mr_addr); +result |= memory_region_dispatch_read(mr, mr_addr, &val, size_memop(l), attrs); /* @@ -2798,7 +2798,7 @@ MemTxResult
[PULL 24/27] migration/multifd: Support incoming mapped-ram stream format
From: Fabiano Rosas For the incoming mapped-ram migration we need to read the ramblock headers, get the pages bitmap and send the host address of each non-zero page to the multifd channel thread for writing. Usage on HMP is: (qemu) migrate_set_capability multifd on (qemu) migrate_set_capability mapped-ram on (qemu) migrate_incoming file:migfile (the ram.h include needs to move because we've been previously relying on it being included from migration.c. Now file.h will start including multifd.h before migration.o is processed) Reviewed-by: Peter Xu Signed-off-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240229153017.2221-22-faro...@suse.de Signed-off-by: Peter Xu --- migration/file.h| 2 ++ migration/multifd.h | 2 ++ migration/file.c| 18 +- migration/multifd.c | 31 --- migration/ram.c | 26 -- 5 files changed, 73 insertions(+), 6 deletions(-) diff --git a/migration/file.h b/migration/file.h index 01a338cac7..9f71e87f74 100644 --- a/migration/file.h +++ b/migration/file.h @@ -11,6 +11,7 @@ #include "qapi/qapi-types-migration.h" #include "io/task.h" #include "channel.h" +#include "multifd.h" void file_start_incoming_migration(FileMigrationArgs *file_args, Error **errp); @@ -21,4 +22,5 @@ void file_cleanup_outgoing_migration(void); bool file_send_channel_create(gpointer opaque, Error **errp); int file_write_ramblock_iov(QIOChannel *ioc, const struct iovec *iov, int niov, RAMBlock *block, Error **errp); +int multifd_file_recv_data(MultiFDRecvParams *p, Error **errp); #endif diff --git a/migration/multifd.h b/migration/multifd.h index db8887f088..7447c2bea3 100644 --- a/migration/multifd.h +++ b/migration/multifd.h @@ -13,6 +13,8 @@ #ifndef QEMU_MIGRATION_MULTIFD_H #define QEMU_MIGRATION_MULTIFD_H +#include "ram.h" + typedef struct MultiFDRecvData MultiFDRecvData; bool multifd_send_setup(void); diff --git a/migration/file.c b/migration/file.c index d949a941d0..499d2782fe 100644 --- a/migration/file.c +++ b/migration/file.c @@ -13,7 +13,6 @@ #include "channel.h" 
#include "file.h" #include "migration.h" -#include "multifd.h" #include "io/channel-file.h" #include "io/channel-util.h" #include "options.h" @@ -204,3 +203,20 @@ int file_write_ramblock_iov(QIOChannel *ioc, const struct iovec *iov, return (ret < 0) ? ret : 0; } + +int multifd_file_recv_data(MultiFDRecvParams *p, Error **errp) +{ +MultiFDRecvData *data = p->data; +size_t ret; + +ret = qio_channel_pread(p->c, (char *) data->opaque, +data->size, data->file_offset, errp); +if (ret != data->size) { +error_prepend(errp, + "multifd recv (%u): read 0x%zx, expected 0x%zx", + p->id, ret, data->size); +return -1; +} + +return 0; +} diff --git a/migration/multifd.c b/migration/multifd.c index 8118145428..419feb7df1 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -18,7 +18,6 @@ #include "qemu/error-report.h" #include "qapi/error.h" #include "file.h" -#include "ram.h" #include "migration.h" #include "migration-stats.h" #include "socket.h" @@ -251,7 +250,7 @@ static int nocomp_recv(MultiFDRecvParams *p, Error **errp) uint32_t flags; if (!multifd_use_packets()) { -return 0; +return multifd_file_recv_data(p, errp); } flags = p->flags & MULTIFD_FLAG_COMPRESSION_MASK; @@ -1331,22 +1330,48 @@ void multifd_recv_cleanup(void) void multifd_recv_sync_main(void) { int thread_count = migrate_multifd_channels(); +bool file_based = !multifd_use_packets(); int i; -if (!migrate_multifd() || !multifd_use_packets()) { +if (!migrate_multifd()) { return; } +/* + * File-based channels don't use packets and therefore need to + * wait for more work. Release them to start the sync. + */ +if (file_based) { +for (i = 0; i < thread_count; i++) { +MultiFDRecvParams *p = &multifd_recv_state->params[i]; + +trace_multifd_recv_sync_main_signal(p->id); +qemu_sem_post(&p->sem); +} +} + /* * Initiate the synchronization by waiting for all channels. + * * For socket-based migration this means each channel has received * the SYNC packet on the stream. 
+ * + * For file-based migration this means each channel is done with + * the work (pending_job=false). */ for (i = 0; i < thread_count; i++) { trace_multifd_recv_sync_main_wait(i); qemu_sem_wait(&multifd_recv_state->sem_sync); } +if (file_based) { +/* + * For file-based loading is done in one iteration. We're + * done. + */ +return; +} + /* * Sync done. Release the channels for the next iteration. */ diff --git a/migration/ram.c b/migration/ram.c index 87cb73fd76..1f1b5297cf 100644 ---
[PULL 18/27] migration/multifd: Allow receiving pages without packets
From: Fabiano Rosas Currently multifd does not need to have knowledge of pages on the receiving side because all the information needed is within the packets that come in the stream. We're about to add support to mapped-ram migration, which cannot use packets because it expects the ramblock section in the migration file to contain only the guest pages data. Add a data structure to transfer pages between the ram migration code and the multifd receiving threads. We don't want to reuse MultiFDPages_t for two reasons: a) multifd threads don't really need to know about the data they're receiving. b) the receiving side has to be stopped to load the pages, which means we can experiment with larger granularities than page size when transferring data. Reviewed-by: Peter Xu Signed-off-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240229153017.2221-16-faro...@suse.de Signed-off-by: Peter Xu --- migration/multifd.h | 15 ++ migration/file.c| 1 + migration/multifd.c | 129 +--- 3 files changed, 138 insertions(+), 7 deletions(-) diff --git a/migration/multifd.h b/migration/multifd.h index 6a54377cc1..1be985978e 100644 --- a/migration/multifd.h +++ b/migration/multifd.h @@ -13,6 +13,8 @@ #ifndef QEMU_MIGRATION_MULTIFD_H #define QEMU_MIGRATION_MULTIFD_H +typedef struct MultiFDRecvData MultiFDRecvData; + bool multifd_send_setup(void); void multifd_send_shutdown(void); int multifd_recv_setup(Error **errp); @@ -23,6 +25,8 @@ void multifd_recv_new_channel(QIOChannel *ioc, Error **errp); void multifd_recv_sync_main(void); int multifd_send_sync_main(void); bool multifd_queue_page(RAMBlock *block, ram_addr_t offset); +bool multifd_recv(void); +MultiFDRecvData *multifd_get_recv_data(void); /* Multifd Compression flags */ #define MULTIFD_FLAG_SYNC (1 << 0) @@ -63,6 +67,13 @@ typedef struct { RAMBlock *block; } MultiFDPages_t; +struct MultiFDRecvData { +void *opaque; +size_t size; +/* for preadv */ +off_t file_offset; +}; + typedef struct { /* Fields are only written at 
creating/deletion time */ /* No lock required for them, they are read only */ @@ -152,6 +163,8 @@ typedef struct { /* syncs main thread and channels */ QemuSemaphore sem_sync; +/* sem where to wait for more work */ +QemuSemaphore sem; /* this mutex protects the following parameters */ QemuMutex mutex; @@ -161,6 +174,8 @@ typedef struct { uint32_t flags; /* global number of generated multifd packets */ uint64_t packet_num; +int pending_job; +MultiFDRecvData *data; /* thread local variables. No locking required */ diff --git a/migration/file.c b/migration/file.c index 5d4975f43e..22d052a71f 100644 --- a/migration/file.c +++ b/migration/file.c @@ -6,6 +6,7 @@ */ #include "qemu/osdep.h" +#include "exec/ramblock.h" #include "qemu/cutils.h" #include "qapi/error.h" #include "channel.h" diff --git a/migration/multifd.c b/migration/multifd.c index 8c43424c81..d470af73ba 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -81,9 +81,13 @@ struct { struct { MultiFDRecvParams *params; +MultiFDRecvData *data; /* number of created threads */ int count; -/* syncs main thread and channels */ +/* + * This is always posted by the recv threads, the migration thread + * uses it to wait for recv threads to finish assigned tasks. + */ QemuSemaphore sem_sync; /* global number of generated multifd packets */ uint64_t packet_num; @@ -1119,6 +1123,57 @@ bool multifd_send_setup(void) return true; } +bool multifd_recv(void) +{ +int i; +static int next_recv_channel; +MultiFDRecvParams *p = NULL; +MultiFDRecvData *data = multifd_recv_state->data; + +/* + * next_channel can remain from a previous migration that was + * using more channels, so ensure it doesn't overflow if the + * limit is lower now. 
+ */ +next_recv_channel %= migrate_multifd_channels(); +for (i = next_recv_channel;; i = (i + 1) % migrate_multifd_channels()) { +if (multifd_recv_should_exit()) { +return false; +} + +p = &multifd_recv_state->params[i]; + +if (qatomic_read(&p->pending_job) == false) { +next_recv_channel = (i + 1) % migrate_multifd_channels(); +break; +} +} + +/* + * Order pending_job read before manipulating p->data below. Pairs + * with qatomic_store_release() at multifd_recv_thread(). + */ +smp_mb_acquire(); + +assert(!p->data->size); +multifd_recv_state->data = p->data; +p->data = data; + +/* + * Order p->data update before setting pending_job. Pairs with + * qatomic_load_acquire() at multifd_recv_thread(). + */ +qatomic_store_release(&p->pending_job, true); +qemu_sem_post(&p->sem); + +return true; +} + +MultiFDRecvData *multifd_get_recv_data(void) +{ +return
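Note: the hand-off in multifd_recv() is a buffer swap guarded by a pending flag: the producer finds an idle channel, exchanges its filled descriptor for the channel's empty one, then publishes with a release store. Below is a minimal sketch with C11 atomics standing in for QEMU's qatomic helpers. All names are hypothetical, the semaphore wake-up is omitted, and unlike the patch this version gives up after one pass over the channels instead of spinning until one is idle.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical work descriptor, similar in spirit to MultiFDRecvData. */
typedef struct {
    void *opaque;
    size_t size;
} DemoRecvData;

/* Hypothetical per-channel state. */
typedef struct {
    atomic_bool pending_job;
    DemoRecvData *data;
} DemoChannel;

/*
 * Producer side. *spare points at the descriptor the caller has filled.
 * On success the channel owns it, *spare points at the channel's old
 * (empty) descriptor, and the channel index is returned.
 * Returns -1 if every channel is still busy.
 */
static int demo_dispatch(DemoChannel *ch, int n, int *next,
                         DemoRecvData **spare)
{
    *next %= n;     /* the channel count may have shrunk since last use */

    for (int k = 0, i = *next; k < n; k++, i = (i + 1) % n) {
        /* Acquire: see the consumer's writes before touching ch[i].data. */
        if (!atomic_load_explicit(&ch[i].pending_job, memory_order_acquire)) {
            DemoRecvData *filled = *spare;

            *spare = ch[i].data;
            ch[i].data = filled;

            /* Release: publish the swapped pointer before the flag. */
            atomic_store_explicit(&ch[i].pending_job, true,
                                  memory_order_release);
            *next = (i + 1) % n;
            return i;
        }
    }
    return -1;
}

/* Consumer side: finish the job and mark the channel idle again. */
static bool demo_complete(DemoChannel *c)
{
    if (!atomic_load_explicit(&c->pending_job, memory_order_acquire)) {
        return false;
    }
    c->data->size = 0;      /* descriptor is empty and reusable */
    atomic_store_explicit(&c->pending_job, false, memory_order_release);
    return true;
}
```

Swapping pointers rather than copying means the producer always holds exactly one empty descriptor to fill next, which is the invariant the patch's assert(!p->data->size) relies on.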
[PULL 07/27] io: implement io_pwritev/preadv for QIOChannelFile
From: Nikolay Borisov The upcoming 'mapped-ram' feature will require qemu to write data to (and restore from) specific offsets of the migration file. Add a minimal implementation of pwritev/preadv and expose them via the io_pwritev and io_preadv interfaces. Signed-off-by: Nikolay Borisov Reviewed-by: "Daniel P. Berrangé" Reviewed-by: Peter Xu Signed-off-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240229153017.2221-5-faro...@suse.de Signed-off-by: Peter Xu --- io/channel-file.c | 56 +++ 1 file changed, 56 insertions(+) diff --git a/io/channel-file.c b/io/channel-file.c index f91bf6db1c..a6ad7770c6 100644 --- a/io/channel-file.c +++ b/io/channel-file.c @@ -146,6 +146,58 @@ static ssize_t qio_channel_file_writev(QIOChannel *ioc, return ret; } +#ifdef CONFIG_PREADV +static ssize_t qio_channel_file_preadv(QIOChannel *ioc, + const struct iovec *iov, + size_t niov, + off_t offset, + Error **errp) +{ +QIOChannelFile *fioc = QIO_CHANNEL_FILE(ioc); +ssize_t ret; + + retry: +ret = preadv(fioc->fd, iov, niov, offset); +if (ret < 0) { +if (errno == EAGAIN) { +return QIO_CHANNEL_ERR_BLOCK; +} +if (errno == EINTR) { +goto retry; +} + +error_setg_errno(errp, errno, "Unable to read from file"); +return -1; +} + +return ret; +} + +static ssize_t qio_channel_file_pwritev(QIOChannel *ioc, +const struct iovec *iov, +size_t niov, +off_t offset, +Error **errp) +{ +QIOChannelFile *fioc = QIO_CHANNEL_FILE(ioc); +ssize_t ret; + + retry: +ret = pwritev(fioc->fd, iov, niov, offset); +if (ret <= 0) { +if (errno == EAGAIN) { +return QIO_CHANNEL_ERR_BLOCK; +} +if (errno == EINTR) { +goto retry; +} +error_setg_errno(errp, errno, "Unable to write to file"); +return -1; +} +return ret; +} +#endif /* CONFIG_PREADV */ + static int qio_channel_file_set_blocking(QIOChannel *ioc, bool enabled, Error **errp) @@ -231,6 +283,10 @@ static void qio_channel_file_class_init(ObjectClass *klass, ioc_klass->io_writev = qio_channel_file_writev; ioc_klass->io_readv = qio_channel_file_readv; 
ioc_klass->io_set_blocking = qio_channel_file_set_blocking; +#ifdef CONFIG_PREADV +ioc_klass->io_pwritev = qio_channel_file_pwritev; +ioc_klass->io_preadv = qio_channel_file_preadv; +#endif ioc_klass->io_seek = qio_channel_file_seek; ioc_klass->io_close = qio_channel_file_close; ioc_klass->io_create_watch = qio_channel_file_create_watch; -- 2.44.0
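Note: the core of both new callbacks is a positioned vectored I/O call that is retried on EINTR. A standalone sketch of that pattern, assuming a Linux/glibc system where preadv() and pwritev() are declared in <sys/uio.h>; the demo_* wrappers are hypothetical, leave out the EAGAIN mapping to QIO_CHANNEL_ERR_BLOCK, and report errors on stderr instead of through an Error object.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE     /* for preadv()/pwritev() on glibc */
#endif
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Read into iov at an absolute file offset, retrying if interrupted. */
static ssize_t demo_preadv_retry(int fd, const struct iovec *iov, int niov,
                                 off_t offset)
{
    ssize_t ret;

    do {
        ret = preadv(fd, iov, niov, offset);
    } while (ret < 0 && errno == EINTR);

    if (ret < 0) {
        fprintf(stderr, "Unable to read from file: %s\n", strerror(errno));
    }
    return ret;
}

/* Write iov at an absolute file offset, retrying if interrupted. */
static ssize_t demo_pwritev_retry(int fd, const struct iovec *iov, int niov,
                                  off_t offset)
{
    ssize_t ret;

    do {
        ret = pwritev(fd, iov, niov, offset);
    } while (ret < 0 && errno == EINTR);

    if (ret < 0) {
        fprintf(stderr, "Unable to write to file: %s\n", strerror(errno));
    }
    return ret;
}
```

Because the offset is passed explicitly, neither call moves the file position, which is what allows several threads to read or write different regions of the same migration file without coordinating seeks.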
[PULL 12/27] migration/ram: Add outgoing 'mapped-ram' migration
From: Fabiano Rosas

Implement the outgoing migration side for the 'mapped-ram' capability.

A bitmap is introduced to track which pages have been written in the
migration file. Pages are written at a fixed location for every
ramblock. Zero pages are ignored as they'd be zero in the destination
migration as well.

The migration stream is altered to put the dirty pages for a ramblock
after its header instead of having a sequential stream of pages that
follow the ramblock headers.

Without mapped-ram (current):        With mapped-ram (new):

 ---------------------               --------------------------------
 | ramblock 1 header |               | ramblock 1 header            |
 ---------------------               --------------------------------
 | ramblock 2 header |               | ramblock 1 mapped-ram header |
 ---------------------               --------------------------------
 | ...               |               | padding to next 1MB boundary |
 ---------------------               | ...                          |
 | ramblock n header |               --------------------------------
 ---------------------               | ramblock 1 pages             |
 | RAM_SAVE_FLAG_EOS |               | ...                          |
 ---------------------               --------------------------------
 | stream of pages   |               | ramblock 2 header            |
 | (iter 1)          |               --------------------------------
 | ...               |               | ramblock 2 mapped-ram header |
 ---------------------               --------------------------------
 | RAM_SAVE_FLAG_EOS |               | padding to next 1MB boundary |
 ---------------------               | ...                          |
 | stream of pages   |               --------------------------------
 | (iter 2)          |               | ramblock 2 pages             |
 | ...               |               | ...                          |
 ---------------------               --------------------------------
 | ...               |               | ...                          |
 ---------------------               --------------------------------
                                     | RAM_SAVE_FLAG_EOS            |
                                     --------------------------------
                                     | ...                          |
                                     --------------------------------

where:
 - ramblock header: the generic information for a ramblock, such as
   idstr, used_len, etc.

 - ramblock mapped-ram header: the new information added by this
   feature: bitmap of pages written, bitmap size and offset of pages
   in the migration file.

Signed-off-by: Nikolay Borisov Reviewed-by: Peter Xu Signed-off-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240229153017.2221-10-faro...@suse.de Signed-off-by: Peter Xu --- include/exec/ramblock.h | 13 migration/ram.c | 131 +--- 2 files changed, 135 insertions(+), 9 deletions(-) diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h index 3eb79723c6..848915ea5b 100644 --- a/include/exec/ramblock.h +++ b/include/exec/ramblock.h @@ -44,6 +44,19 @@ struct RAMBlock { size_t page_size; /* dirty bitmap used during migration */ unsigned long *bmap; + +/* + * Below fields are only used by mapped-ram migration + */ +/* bitmap of pages present in the migration file */ +unsigned long *file_bmap; +/* + * offset in the file pages belonging to this ramblock are saved, + * used only during migration to a file. + */ +off_t bitmap_offset; +uint64_t pages_offset; + /* bitmap of already received pages in postcopy */ unsigned long *receivedmap; diff --git a/migration/ram.c b/migration/ram.c index 45a00b45ed..f807824d49 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -94,6 +94,18 @@ #define RAM_SAVE_FLAG_MULTIFD_FLUSH 0x200 /* We can't use any flag that is bigger than 0x200 */ +/* + * mapped-ram migration supports O_DIRECT, so we need to make sure the + * userspace buffer, the IO operation size and the file offset are + * aligned according to the underlying device's block size. The first + * two are already aligned to page size, but we need to add padding to + * the file to align the offset. We cannot read the block size + * dynamically because the migration file can be moved between + * different systems, so use 1M to cover most block sizes and to keep + * the file offset aligned at page size as well. 
+ */ +#define MAPPED_RAM_FILE_OFFSET_ALIGNMENT 0x100000 + XBZRLECacheStats xbzrle_counters; /* used by the search for pages to send */ @@ -1126,12 +1138,18 @@ static int save_zero_page(RAMState *rs, PageSearchStatus *pss, return 0; } +stat64_add(&mig_stats.zero_pages, 1); + +if (migrate_mapped_ram()) { +/* zero pages are not transferred with mapped-ram */ +clear_bit(offset >>
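Note: the padding rule described in the comment (pad so that each ramblock's pages start on a 1M boundary) is a round-up of the end of the header and bitmap. A minimal sketch of that arithmetic only; demo_round_up() and demo_pages_offset() are hypothetical helpers, the 1 MiB constant follows the "use 1M" wording above, and how the patch itself combines the header and bitmap sizes is not shown in this excerpt.

```c
#include <stdint.h>

/* 1 MiB, following the "use 1M" wording of the comment in the patch. */
#define DEMO_FILE_OFFSET_ALIGNMENT UINT64_C(0x100000)

/* Round v up to the next multiple of align (align must be non-zero). */
static uint64_t demo_round_up(uint64_t v, uint64_t align)
{
    return (v + align - 1) / align * align;
}

/*
 * Hypothetical layout helper: given where the per-ramblock bitmap starts
 * in the file and how many bytes it occupies, return the aligned file
 * offset at which that ramblock's pages would begin.
 */
static uint64_t demo_pages_offset(uint64_t bitmap_offset,
                                  uint64_t bitmap_bytes)
{
    return demo_round_up(bitmap_offset + bitmap_bytes,
                         DEMO_FILE_OFFSET_ALIGNMENT);
}
```

Since 1 MiB is a multiple of every common page size and block size, a page at region offset N then lands at an aligned file offset of pages_offset + N, which is what O_DIRECT needs.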
[PULL 03/27] tests/migration: Set compression level in migration tests
From: Bryan Zhang

Adds calls to set compression level for `zstd` and `zlib` migration
tests, just to make sure that the calls work.

Signed-off-by: Bryan Zhang
Link: https://lore.kernel.org/r/20240301035901.4006936-3-bryan.zh...@bytedance.com
Signed-off-by: Peter Xu
---
 tests/qtest/migration-test.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 83512bce85..8c35f3457b 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -2664,6 +2664,13 @@ static void *
 test_migrate_precopy_tcp_multifd_zlib_start(QTestState *from,
                                             QTestState *to)
 {
+    /*
+     * Overloading this test to also check that set_parameter does not error.
+     * This is also done in the tests for the other compression methods.
+     */
+    migrate_set_parameter_int(from, "multifd-zlib-level", 2);
+    migrate_set_parameter_int(to, "multifd-zlib-level", 2);
+
     return test_migrate_precopy_tcp_multifd_start_common(from, to, "zlib");
 }
 
@@ -2672,6 +2679,9 @@ static void *
 test_migrate_precopy_tcp_multifd_zstd_start(QTestState *from,
                                             QTestState *to)
 {
+    migrate_set_parameter_int(from, "multifd-zstd-level", 2);
+    migrate_set_parameter_int(to, "multifd-zstd-level", 2);
+
     return test_migrate_precopy_tcp_multifd_start_common(from, to, "zstd");
 }
 #endif /* CONFIG_ZSTD */
-- 
2.44.0
[PULL 13/27] migration/ram: Add incoming 'mapped-ram' migration
From: Fabiano Rosas Add the necessary code to parse the format changes for the 'mapped-ram' capability. One of the more notable changes in behavior is that in the 'mapped-ram' case ram pages are restored in one go rather than constantly looping through the migration stream. Signed-off-by: Nikolay Borisov Reviewed-by: Peter Xu Signed-off-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240229153017.2221-11-faro...@suse.de Signed-off-by: Peter Xu --- migration/ram.c | 143 +++- 1 file changed, 141 insertions(+), 2 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index f807824d49..18620784c6 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -106,6 +106,12 @@ */ #define MAPPED_RAM_FILE_OFFSET_ALIGNMENT 0x100000 +/* + * When doing mapped-ram migration, this is the amount we read from + * the pages region in the migration file at a time. + */ +#define MAPPED_RAM_LOAD_BUF_SIZE 0x100000 + XBZRLECacheStats xbzrle_counters; /* used by the search for pages to send */ @@ -2998,6 +3004,35 @@ static void mapped_ram_setup_ramblock(QEMUFile *file, RAMBlock *block) qemu_set_offset(file, block->pages_offset + block->used_length, SEEK_SET); } +static bool mapped_ram_read_header(QEMUFile *file, MappedRamHeader *header, + Error **errp) +{ +size_t ret, header_size = sizeof(MappedRamHeader); + +ret = qemu_get_buffer(file, (uint8_t *)header, header_size); +if (ret != header_size) { +error_setg(errp, "Could not read whole mapped-ram migration header " + "(expected %zd, got %zd bytes)", header_size, ret); +return false; +} + +/* migration stream is big-endian */ +header->version = be32_to_cpu(header->version); + +if (header->version > MAPPED_RAM_HDR_VERSION) { +error_setg(errp, "Migration mapped-ram capability version not " + "supported (expected <= %d, got %d)", MAPPED_RAM_HDR_VERSION, + header->version); +return false; +} + +header->page_size = be64_to_cpu(header->page_size); +header->bitmap_offset = be64_to_cpu(header->bitmap_offset); +header->pages_offset = 
be64_to_cpu(header->pages_offset); + +return true; +} + /* * Each of ram_save_setup, ram_save_iterate and ram_save_complete has * long-running RCU critical section. When rcu-reclaims in the code @@ -3899,22 +3934,126 @@ void colo_flush_ram_cache(void) trace_colo_flush_ram_cache_end(); } +static bool read_ramblock_mapped_ram(QEMUFile *f, RAMBlock *block, + long num_pages, unsigned long *bitmap, + Error **errp) +{ +ERRP_GUARD(); +unsigned long set_bit_idx, clear_bit_idx; +ram_addr_t offset; +void *host; +size_t read, unread, size; + +for (set_bit_idx = find_first_bit(bitmap, num_pages); + set_bit_idx < num_pages; + set_bit_idx = find_next_bit(bitmap, num_pages, clear_bit_idx + 1)) { + +clear_bit_idx = find_next_zero_bit(bitmap, num_pages, set_bit_idx + 1); + +unread = TARGET_PAGE_SIZE * (clear_bit_idx - set_bit_idx); +offset = set_bit_idx << TARGET_PAGE_BITS; + +while (unread > 0) { +host = host_from_ram_block_offset(block, offset); +if (!host) { +error_setg(errp, "page outside of ramblock %s range", + block->idstr); +return false; +} + +size = MIN(unread, MAPPED_RAM_LOAD_BUF_SIZE); + +read = qemu_get_buffer_at(f, host, size, + block->pages_offset + offset); +if (!read) { +goto err; +} +offset += read; +unread -= read; +} +} + +return true; + +err: +qemu_file_get_error_obj(f, errp); +error_prepend(errp, "(%s) failed to read page " RAM_ADDR_FMT + "from file offset %" PRIx64 ": ", block->idstr, offset, + block->pages_offset + offset); +return false; +} + +static void parse_ramblock_mapped_ram(QEMUFile *f, RAMBlock *block, + ram_addr_t length, Error **errp) +{ +g_autofree unsigned long *bitmap = NULL; +MappedRamHeader header; +size_t bitmap_size; +long num_pages; + +if (!mapped_ram_read_header(f, &header, errp)) { +return; +} + +block->pages_offset = header.pages_offset; + +/* + * Check the alignment of the file region that contains pages. We + * don't enforce MAPPED_RAM_FILE_OFFSET_ALIGNMENT to allow that + * value to change in the future. 
Do only a sanity check with page + * size alignment. + */ +if (!QEMU_IS_ALIGNED(block->pages_offset, TARGET_PAGE_SIZE)) { +error_setg(errp, + "Error reading ramblock %s pages, region has bad alignment", + block->idstr); +
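The restore loop in read_ramblock_mapped_ram() above walks the bitmap one run of set bits at a time, so each contiguous group of pages can be read with as few calls as possible. A minimal standalone sketch of that run scan follows. The names struct page_run, bit_is_set(), next_bit_with() and find_set_runs() are made up for illustration and are not QEMU APIs; the helpers are naive bit-by-bit loops standing in for find_next_bit() and find_next_zero_bit().

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* One contiguous run of pages present in the file (hypothetical type). */
struct page_run {
    size_t first;   /* index of the first page in the run */
    size_t count;   /* number of consecutive pages */
};

static int bit_is_set(const unsigned long *bitmap, size_t idx)
{
    return (bitmap[idx / SKETCH_BITS_PER_LONG] >>
            (idx % SKETCH_BITS_PER_LONG)) & 1UL;
}

/* Return the first index >= start whose bit equals value, or nbits. */
static size_t next_bit_with(const unsigned long *bitmap, size_t nbits,
                            size_t start, int value)
{
    for (size_t i = start; i < nbits; i++) {
        if (bit_is_set(bitmap, i) == value) {
            return i;
        }
    }
    return nbits;
}

/* Collect up to max_runs runs of set bits; returns how many were stored. */
static size_t find_set_runs(const unsigned long *bitmap, size_t nbits,
                            struct page_run *runs, size_t max_runs)
{
    size_t n = 0;
    size_t set;

    assert(bitmap != NULL && runs != NULL);

    set = next_bit_with(bitmap, nbits, 0, 1);
    while (set < nbits && n < max_runs) {
        /* The run ends at the next clear bit, or at nbits. */
        size_t clear = next_bit_with(bitmap, nbits, set + 1, 0);

        runs[n].first = set;
        runs[n].count = clear - set;
        n++;

        /* The bit at 'clear' is known to be zero, so resume after it. */
        set = next_bit_with(bitmap, nbits, clear + 1, 1);
    }
    return n;
}
```

In the patch, each run is then consumed in chunks of at most MAPPED_RAM_LOAD_BUF_SIZE bytes; the sketch stops at identifying the runs.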
[PULL 21/27] migration/multifd: Add incoming QIOChannelFile support
From: Fabiano Rosas On the receiving side we don't need to differentiate between main channel and threads, so whichever channel is defined first gets to be the main one. And since there are no packets, use the atomic channel count to index into the params array. Reviewed-by: Peter Xu Signed-off-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240229153017.2221-19-faro...@suse.de Signed-off-by: Peter Xu --- migration/file.c | 35 +++ migration/migration.c | 3 ++- migration/multifd.c | 3 +-- 3 files changed, 30 insertions(+), 11 deletions(-) diff --git a/migration/file.c b/migration/file.c index a350dd61f0..2f8b626b27 100644 --- a/migration/file.c +++ b/migration/file.c @@ -8,6 +8,7 @@ #include "qemu/osdep.h" #include "exec/ramblock.h" #include "qemu/cutils.h" +#include "qemu/error-report.h" #include "qapi/error.h" #include "channel.h" #include "file.h" @@ -15,6 +16,7 @@ #include "multifd.h" #include "io/channel-file.h" #include "io/channel-util.h" +#include "options.h" #include "trace.h" #define OFFSET_OPTION ",offset=" @@ -112,7 +114,8 @@ void file_start_incoming_migration(FileMigrationArgs *file_args, Error **errp) g_autofree char *filename = g_strdup(file_args->filename); QIOChannelFile *fioc = NULL; uint64_t offset = file_args->offset; -QIOChannel *ioc; +int channels = 1; +int i = 0; trace_migration_file_incoming(filename); @@ -121,13 +124,29 @@ void file_start_incoming_migration(FileMigrationArgs *file_args, Error **errp) return; } -ioc = QIO_CHANNEL(fioc); -if (offset && qio_channel_io_seek(ioc, offset, SEEK_SET, errp) < 0) { +if (offset && +qio_channel_io_seek(QIO_CHANNEL(fioc), offset, SEEK_SET, errp) < 0) { return; } -qio_channel_set_name(QIO_CHANNEL(ioc), "migration-file-incoming"); -qio_channel_add_watch_full(ioc, G_IO_IN, - file_accept_incoming_migration, - NULL, NULL, - g_main_context_get_thread_default()); + +if (migrate_multifd()) { +channels += migrate_multifd_channels(); +} + +do { +QIOChannel *ioc = QIO_CHANNEL(fioc); + +qio_channel_set_name(ioc, 
"migration-file-incoming"); +qio_channel_add_watch_full(ioc, G_IO_IN, + file_accept_incoming_migration, + NULL, NULL, + g_main_context_get_thread_default()); + +fioc = qio_channel_file_new_fd(dup(fioc->fd)); + +if (!fioc || fioc->fd == -1) { +error_setg(errp, "Error creating migration incoming channel"); +break; +} +} while (++i < channels); } diff --git a/migration/migration.c b/migration/migration.c index 2669600d25..faeb75a59b 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -910,7 +910,8 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp) uint32_t channel_magic = 0; int ret = 0; -if (migrate_multifd() && !migrate_postcopy_ram() && +if (migrate_multifd() && !migrate_mapped_ram() && +!migrate_postcopy_ram() && qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_READ_MSG_PEEK)) { /* * With multiple channels, it is possible that we receive channels diff --git a/migration/multifd.c b/migration/multifd.c index caef1076ca..ea08f1aa9e 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -1545,8 +1545,7 @@ void multifd_recv_new_channel(QIOChannel *ioc, Error **errp) } trace_multifd_recv_new_channel(id); } else { -/* next patch gives this a meaningful value */ -id = 0; +id = qatomic_read(&multifd_recv_state->count); } p = &multifd_recv_state->params[id]; -- 2.44.0
[PULL 27/27] migration/multifd: Document two places for mapped-ram
From: Peter Xu Add documentation for mapped-ram migration in two spots that may not be entirely clear. Reviewed-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240301091524.39900-1-pet...@redhat.com Cc: Prasad Pandit [peterx: fix two English errors per Prasad] Signed-off-by: Peter Xu --- migration/multifd.c | 12 migration/ram.c | 8 +++- 2 files changed, 19 insertions(+), 1 deletion(-) diff --git a/migration/multifd.c b/migration/multifd.c index b4e5a9dfcc..d4a44da559 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -709,6 +709,18 @@ static bool multifd_send_cleanup_channel(MultiFDSendParams *p, Error **errp) { if (p->c) { migration_ioc_unregister_yank(p->c); +/* + * An explicit close() on the channel here is normally not + * required, but can be helpful for "file:" iochannels, where it + * will include fdatasync() to make sure the data is flushed to the + * disk backend. + * + * The object_unref() cannot guarantee that because: (1) finalize() + * of the iochannel is only triggered on the last reference, and + * it's not guaranteed that we always hold the last refcount when + * reaching here, and, (2) even if finalize() is invoked, it only + * does a close(fd) without data flush. + */ qio_channel_close(p->c, &error_abort); object_unref(OBJECT(p->c)); p->c = NULL; diff --git a/migration/ram.c b/migration/ram.c index 1f1b5297cf..c79e3de521 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -4258,7 +4258,13 @@ static int ram_load_precopy(QEMUFile *f) switch (flags & ~RAM_SAVE_FLAG_CONTINUE) { case RAM_SAVE_FLAG_MEM_SIZE: ret = parse_ramblocks(f, addr); - +/* + * For mapped-ram migration (to a file) using multifd, we sync + * once and for all here to make sure all tasks we queued to + * multifd threads are completed, so that all the ramblocks + * (including all the guest memory pages within) are fully + * loaded after this sync returns. + */ if (migrate_mapped_ram()) { multifd_recv_sync_main(); } -- 2.44.0
[PULL 22/27] migration/multifd: Prepare multifd sync for mapped-ram migration
From: Fabiano Rosas The mapped-ram migration can be performed live or non-live, but it is always asynchronous, i.e. the source machine and the destination machine are not migrating at the same time. We only need some pieces of the multifd sync operations. multifd_send_sync_main() Issued by the ram migration code on the migration thread, causes the multifd send channels to synchronize with the migration thread and makes the sending side emit a packet with the MULTIFD_FLUSH flag. With mapped-ram we want to maintain the sync on the sending side because that provides ordering between the rounds of dirty pages when migrating live. MULTIFD_FLUSH - On the receiving side, the presence of the MULTIFD_FLUSH flag on a packet causes the receiving channels to start synchronizing with the main thread. We're not using packets with mapped-ram, so there's no MULTIFD_FLUSH flag and therefore no channel sync on the receiving side. multifd_recv_sync_main() Issued by the migration thread when the ram migration flag RAM_SAVE_FLAG_MULTIFD_FLUSH is received, causes the migration thread on the receiving side to start synchronizing with the recv channels. Due to compatibility, this is also issued when RAM_SAVE_FLAG_EOS is received. For mapped-ram we only need to synchronize the channels at the end of migration to avoid doing cleanup before the channels have finished their IO. Make sure the multifd syncs are only issued at the appropriate times. Note that due to pre-existing backward compatibility issues, we have the multifd_flush_after_each_section property that can cause a sync to happen at EOS. Since the EOS flag is needed on the stream, allow mapped-ram to just ignore it. Also emit an error if any other unexpected flags are found on the stream. 
Signed-off-by: Fabiano Rosas Reviewed-by: Peter Xu Link: https://lore.kernel.org/r/20240229153017.2221-20-faro...@suse.de Signed-off-by: Peter Xu --- migration/ram.c | 38 +++--- 1 file changed, 31 insertions(+), 7 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index 18620784c6..329153d97d 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -1362,14 +1362,18 @@ static int find_dirty_block(RAMState *rs, PageSearchStatus *pss) pss->block = QLIST_NEXT_RCU(pss->block, next); if (!pss->block) { if (migrate_multifd() && -!migrate_multifd_flush_after_each_section()) { +(!migrate_multifd_flush_after_each_section() || + migrate_mapped_ram())) { QEMUFile *f = rs->pss[RAM_CHANNEL_PRECOPY].pss_channel; int ret = multifd_send_sync_main(); if (ret < 0) { return ret; } -qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH); -qemu_fflush(f); + +if (!migrate_mapped_ram()) { +qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH); +qemu_fflush(f); +} } /* * If memory migration starts over, we will meet a dirtied page @@ -3111,7 +3115,8 @@ static int ram_save_setup(QEMUFile *f, void *opaque) return ret; } -if (migrate_multifd() && !migrate_multifd_flush_after_each_section()) { +if (migrate_multifd() && !migrate_multifd_flush_after_each_section() +&& !migrate_mapped_ram()) { qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH); } @@ -3242,7 +3247,8 @@ static int ram_save_iterate(QEMUFile *f, void *opaque) out: if (ret >= 0 && migration_is_setup_or_active(migrate_get_current()->state)) { -if (migrate_multifd() && migrate_multifd_flush_after_each_section()) { +if (migrate_multifd() && migrate_multifd_flush_after_each_section() && +!migrate_mapped_ram()) { ret = multifd_send_sync_main(); if (ret < 0) { return ret; @@ -3334,7 +3340,8 @@ static int ram_save_complete(QEMUFile *f, void *opaque) } } -if (migrate_multifd() && !migrate_multifd_flush_after_each_section()) { +if (migrate_multifd() && !migrate_multifd_flush_after_each_section() && +!migrate_mapped_ram()) { qemu_put_be64(f, 
RAM_SAVE_FLAG_MULTIFD_FLUSH); } qemu_put_be64(f, RAM_SAVE_FLAG_EOS); @@ -4137,6 +4144,12 @@ static int ram_load_precopy(QEMUFile *f) invalid_flags |= RAM_SAVE_FLAG_COMPRESS_PAGE; } +if (migrate_mapped_ram()) { +invalid_flags |= (RAM_SAVE_FLAG_HOOK | RAM_SAVE_FLAG_MULTIFD_FLUSH | + RAM_SAVE_FLAG_PAGE | RAM_SAVE_FLAG_XBZRLE | + RAM_SAVE_FLAG_ZERO); +} + while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) { ram_addr_t addr; void *host = NULL, *host_bak = NULL; @@ -4158,6 +4171,8 @@ static int
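The last hunk above makes ram_load_precopy() reject stream flags that have no meaning for mapped-ram while still accepting EOS. The check can be sketched in isolation as below. The SKETCH_FLAG_* constants and the two helper functions are hypothetical and exist only for this example; the numeric values are placeholders and are not taken from the QEMU headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder flag bits for this sketch; not copied from QEMU. */
#define SKETCH_FLAG_ZERO          0x02u
#define SKETCH_FLAG_MEM_SIZE      0x04u
#define SKETCH_FLAG_PAGE          0x08u
#define SKETCH_FLAG_EOS           0x10u
#define SKETCH_FLAG_XBZRLE        0x40u
#define SKETCH_FLAG_HOOK          0x80u
#define SKETCH_FLAG_MULTIFD_FLUSH 0x200u

/* Build the mask of flags that must not appear on the stream. */
static uint64_t sketch_invalid_flags(bool mapped_ram)
{
    uint64_t invalid = 0;

    if (mapped_ram) {
        /* Pages live at fixed file offsets, so per-page flags are bogus. */
        invalid |= SKETCH_FLAG_HOOK | SKETCH_FLAG_MULTIFD_FLUSH |
                   SKETCH_FLAG_PAGE | SKETCH_FLAG_XBZRLE |
                   SKETCH_FLAG_ZERO;
    }

    /* EOS is still needed on the stream, so it must never be masked. */
    assert((invalid & SKETCH_FLAG_EOS) == 0);
    return invalid;
}

/* True if none of the received flags are invalid for this mode. */
static bool sketch_flags_ok(uint64_t flags, bool mapped_ram)
{
    return (flags & sketch_invalid_flags(mapped_ram)) == 0;
}
```

Keeping the mask in one place means a new incompatible flag only has to be added once, which is the same reason the patch extends the existing invalid_flags variable instead of adding separate checks.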
[PULL 23/27] migration/multifd: Support outgoing mapped-ram stream format
From: Fabiano Rosas The new mapped-ram stream format uses a file transport and puts ram pages in the migration file at their respective offsets. The writes can be done in parallel by using the pwritev system call, which takes iovecs and an offset. Add support for enabling the new format along with multifd to make use of the threading and page handling already in place. This requires multifd to stop sending headers and leave the stream format to the mapped-ram code. When it comes time to write the data, we need to call a version of qio_channel_write that can take an offset. Usage on HMP is: (qemu) stop (qemu) migrate_set_capability multifd on (qemu) migrate_set_capability mapped-ram on (qemu) migrate_set_parameter max-bandwidth 0 (qemu) migrate_set_parameter multifd-channels 8 (qemu) migrate file:migfile Reviewed-by: Peter Xu Signed-off-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240229153017.2221-21-faro...@suse.de Signed-off-by: Peter Xu --- include/qemu/bitops.h | 13 +++ migration/file.h | 2 ++ migration/ram.h | 1 + migration/file.c | 54 +++ migration/migration.c | 17 ++ migration/multifd.c | 24 +-- migration/options.c | 13 ++- migration/ram.c | 17 +++--- 8 files changed, 125 insertions(+), 16 deletions(-) diff --git a/include/qemu/bitops.h b/include/qemu/bitops.h index cb3526d1f4..2c0a2fe751 100644 --- a/include/qemu/bitops.h +++ b/include/qemu/bitops.h @@ -67,6 +67,19 @@ static inline void clear_bit(long nr, unsigned long *addr) *p &= ~mask; } +/** + * clear_bit_atomic - Clears a bit in memory atomically + * @nr: Bit to clear + * @addr: Address to start counting from + */ +static inline void clear_bit_atomic(long nr, unsigned long *addr) +{ +unsigned long mask = BIT_MASK(nr); +unsigned long *p = addr + BIT_WORD(nr); + +return qatomic_and(p, ~mask); +} + /** * change_bit - Toggle a bit in memory * @nr: Bit to change diff --git a/migration/file.h b/migration/file.h index 4577f9efdd..01a338cac7 100644 --- a/migration/file.h +++ b/migration/file.h @@ -19,4 +19,6 @@ 
void file_start_outgoing_migration(MigrationState *s, int file_parse_offset(char *filespec, uint64_t *offsetp, Error **errp); void file_cleanup_outgoing_migration(void); bool file_send_channel_create(gpointer opaque, Error **errp); +int file_write_ramblock_iov(QIOChannel *ioc, const struct iovec *iov, +int niov, RAMBlock *block, Error **errp); #endif diff --git a/migration/ram.h b/migration/ram.h index 9b937a446b..b9ac0da587 100644 --- a/migration/ram.h +++ b/migration/ram.h @@ -75,6 +75,7 @@ bool ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb, Error **errp); bool ramblock_page_is_discarded(RAMBlock *rb, ram_addr_t start); void postcopy_preempt_shutdown_file(MigrationState *s); void *postcopy_preempt_thread(void *opaque); +void ramblock_set_file_bmap_atomic(RAMBlock *block, ram_addr_t offset); /* ram cache */ int colo_init_ram_cache(void); diff --git a/migration/file.c b/migration/file.c index 2f8b626b27..d949a941d0 100644 --- a/migration/file.c +++ b/migration/file.c @@ -150,3 +150,57 @@ void file_start_incoming_migration(FileMigrationArgs *file_args, Error **errp) } } while (++i < channels); } + +int file_write_ramblock_iov(QIOChannel *ioc, const struct iovec *iov, +int niov, RAMBlock *block, Error **errp) +{ +ssize_t ret = -1; +int i, slice_idx, slice_num; +uintptr_t base, next, offset; +size_t len; + +slice_idx = 0; +slice_num = 1; + +/* + * If the iov array doesn't have contiguous elements, we need to + * split it in slices because we only have one file offset for the + * whole iov. Do this here so callers don't need to break the iov + * array themselves. + */ +for (i = 0; i < niov; i++, slice_num++) { +base = (uintptr_t) iov[i].iov_base; + +if (i != niov - 1) { +len = iov[i].iov_len; +next = (uintptr_t) iov[i + 1].iov_base; + +if (base + len == next) { +continue; +} +} + +/* + * Use the offset of the first element of the segment that + * we're sending. 
+ */ +offset = (uintptr_t) iov[slice_idx].iov_base - (uintptr_t) block->host; +if (offset >= block->used_length) { +error_setg(errp, "offset " RAM_ADDR_FMT + "outside of ramblock %s range", offset, block->idstr); +ret = -1; +break; +} + +ret = qio_channel_pwritev(ioc, &iov[slice_idx], slice_num, + block->pages_offset + offset, errp); +if (ret < 0) { +break; +} + +slice_idx += slice_num; +slice_num = 0; +} + +return (ret < 0) ? ret : 0; +} diff --git a/migration/migration.c b/migration/migration.c
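file_write_ramblock_iov() above splits the iov array into slices of memory-contiguous elements, because a single pwritev call carries only one file offset. The grouping step alone can be sketched as follows. struct iov_slice and split_iov_slices() are hypothetical names for this example, and the sketch only records the slices rather than issuing any I/O.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* A group of adjacent iov elements that is contiguous in memory. */
struct iov_slice {
    int first;   /* index of the first iov element in the slice */
    int count;   /* number of iov elements in the slice */
};

/*
 * Group iov[0..niov) into slices of memory-contiguous elements.
 * Returns the number of slices stored in out, or -1 if max_out is
 * too small.
 */
static int split_iov_slices(const struct iovec *iov, int niov,
                            struct iov_slice *out, int max_out)
{
    int nslices = 0;
    int first = 0;

    assert(iov != NULL && out != NULL);

    for (int i = 0; i < niov; i++) {
        if (i != niov - 1) {
            uintptr_t end = (uintptr_t)iov[i].iov_base + iov[i].iov_len;

            /* The next element starts where this one ends: keep going. */
            if (end == (uintptr_t)iov[i + 1].iov_base) {
                continue;
            }
        }

        /* Gap found (or last element reached): close the slice. */
        if (nslices == max_out) {
            return -1;
        }
        out[nslices].first = first;
        out[nslices].count = i - first + 1;
        nslices++;
        first = i + 1;
    }
    return nslices;
}
```

Each resulting slice would map to one positional write whose file offset is derived from the first element of the slice, which is what the patch does with block->pages_offset + offset.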
[PULL 25/27] migration/multifd: Add mapped-ram support to fd: URI
From: Fabiano Rosas If we receive a file descriptor that points to a regular file, there's nothing stopping us from doing multifd migration with mapped-ram to that file. Enable the fd: URI to work with multifd + mapped-ram. Note that the fds passed into multifd are duplicated because we want to avoid cross-thread effects when doing cleanup (i.e. close(fd)). The original fd doesn't need to be duplicated because monitor_get_fd() transfers ownership to the caller. Signed-off-by: Fabiano Rosas Reviewed-by: Peter Xu Link: https://lore.kernel.org/r/20240229153017.2221-23-faro...@suse.de Signed-off-by: Peter Xu --- migration/fd.h| 2 ++ migration/fd.c| 44 +++ migration/file.c | 18 -- migration/migration.c | 4 migration/multifd.c | 2 ++ 5 files changed, 64 insertions(+), 6 deletions(-) diff --git a/migration/fd.h b/migration/fd.h index b901bc014e..0c0a18d9e7 100644 --- a/migration/fd.h +++ b/migration/fd.h @@ -20,4 +20,6 @@ void fd_start_incoming_migration(const char *fdname, Error **errp); void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **errp); +void fd_cleanup_outgoing_migration(void); +int fd_args_get_fd(void); #endif diff --git a/migration/fd.c b/migration/fd.c index 0eb677dcae..d4ae72d132 100644 --- a/migration/fd.c +++ b/migration/fd.c @@ -15,18 +15,41 @@ */ #include "qemu/osdep.h" +#include "qapi/error.h" #include "channel.h" #include "fd.h" #include "migration.h" #include "monitor/monitor.h" +#include "io/channel-file.h" #include "io/channel-util.h" +#include "options.h" #include "trace.h" +static struct FdOutgoingArgs { +int fd; +} outgoing_args; + +int fd_args_get_fd(void) +{ +return outgoing_args.fd; +} + +void fd_cleanup_outgoing_migration(void) +{ +if (outgoing_args.fd > 0) { +close(outgoing_args.fd); +outgoing_args.fd = -1; +} +} + void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **errp) { QIOChannel *ioc; int fd = monitor_get_fd(monitor_cur(), fdname, errp); + +outgoing_args.fd = -1; + if (fd == -1) 
{ return; } @@ -38,6 +61,8 @@ void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error ** return; } +outgoing_args.fd = fd; + qio_channel_set_name(ioc, "migration-fd-outgoing"); migration_channel_connect(s, ioc, NULL, NULL); object_unref(OBJECT(ioc)); @@ -73,4 +98,23 @@ void fd_start_incoming_migration(const char *fdname, Error **errp) fd_accept_incoming_migration, NULL, NULL, g_main_context_get_thread_default()); + +if (migrate_multifd()) { +int channels = migrate_multifd_channels(); + +while (channels--) { +ioc = QIO_CHANNEL(qio_channel_file_new_fd(dup(fd))); + +if (QIO_CHANNEL_FILE(ioc)->fd == -1) { +error_setg(errp, "Failed to duplicate fd %d", fd); +return; +} + +qio_channel_set_name(ioc, "migration-fd-incoming"); +qio_channel_add_watch_full(ioc, G_IO_IN, + fd_accept_incoming_migration, + NULL, NULL, + g_main_context_get_thread_default()); +} +} } diff --git a/migration/file.c b/migration/file.c index 499d2782fe..164b079966 100644 --- a/migration/file.c +++ b/migration/file.c @@ -11,6 +11,7 @@ #include "qemu/error-report.h" #include "qapi/error.h" #include "channel.h" +#include "fd.h" #include "file.h" #include "migration.h" #include "io/channel-file.h" @@ -53,15 +54,20 @@ bool file_send_channel_create(gpointer opaque, Error **errp) { QIOChannelFile *ioc; int flags = O_WRONLY; -bool ret = true; - -ioc = qio_channel_file_new_path(outgoing_args.fname, flags, 0, errp); -if (!ioc) { -ret = false; -goto out; +bool ret = false; +int fd = fd_args_get_fd(); + +if (fd && fd != -1) { +ioc = qio_channel_file_new_fd(dup(fd)); +} else { +ioc = qio_channel_file_new_path(outgoing_args.fname, flags, 0, errp); +if (!ioc) { +goto out; +} } multifd_channel_connect(opaque, QIO_CHANNEL(ioc)); +ret = true; out: /* diff --git a/migration/migration.c b/migration/migration.c index b9baab543a..a49fcd53ee 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -140,6 +140,10 @@ static bool transport_supports_multi_channels(MigrationAddress *addr) if 
(addr->transport == MIGRATION_ADDRESS_TYPE_SOCKET) { SocketAddress *saddr = &addr->u.socket; +if (saddr->type == SOCKET_ADDRESS_TYPE_FD) { +return migrate_mapped_ram(); +} + return (saddr->type == SOCKET_ADDRESS_TYPE_INET ||
[PULL 10/27] migration/ram: Introduce 'mapped-ram' migration capability
From: Fabiano Rosas Add a new migration capability 'mapped-ram'. The core of the feature is to ensure that RAM pages are mapped directly to offsets in the resulting migration file instead of being streamed at arbitrary points. The reasons why we'd want such behavior are: - The resulting file will have a bounded size, since pages which are dirtied multiple times will always go to a fixed location in the file, rather than constantly being added to a sequential stream. This eliminates cases where a VM with, say, 1G of RAM can result in a migration file that's 10s of GBs, provided that the workload constantly redirties memory. - It paves the way to implement O_DIRECT-enabled save/restore of the migration stream as the pages are ensured to be written at aligned offsets. - It allows the usage of multifd so we can write RAM pages to the migration file in parallel. For now, enabling the capability has no effect. The next couple of patches implement the core functionality. Acked-by: Markus Armbruster Reviewed-by: Peter Xu Signed-off-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240229153017.2221-8-faro...@suse.de Signed-off-by: Peter Xu --- docs/devel/migration/features.rst | 1 + docs/devel/migration/mapped-ram.rst | 138 qapi/migration.json | 6 +- migration/options.h | 1 + migration/migration.c | 7 ++ migration/options.c | 34 +++ migration/savevm.c | 1 + 7 files changed, 187 insertions(+), 1 deletion(-) create mode 100644 docs/devel/migration/mapped-ram.rst diff --git a/docs/devel/migration/features.rst b/docs/devel/migration/features.rst index a9acaf618e..9d1abd2587 100644 --- a/docs/devel/migration/features.rst +++ b/docs/devel/migration/features.rst @@ -10,3 +10,4 @@ Migration has plenty of features to support different use cases. 
dirty-limit vfio virtio + mapped-ram diff --git a/docs/devel/migration/mapped-ram.rst b/docs/devel/migration/mapped-ram.rst new file mode 100644 index 00..fa4cefd9fc --- /dev/null +++ b/docs/devel/migration/mapped-ram.rst @@ -0,0 +1,138 @@ +Mapped-ram +== + +Mapped-ram is a new stream format for the RAM section designed to +supplement the existing ``file:`` migration and make it compatible +with ``multifd``. This enables parallel migration of a guest's RAM to +a file. + +The core of the feature is to ensure that RAM pages are mapped +directly to offsets in the resulting migration file. This enables the +``multifd`` threads to write exclusively to those offsets even if the +guest is constantly dirtying pages (i.e. live migration). Another +benefit is that the resulting file will have a bounded size, since +pages which are dirtied multiple times will always go to a fixed +location in the file, rather than constantly being added to a +sequential stream. Having the pages at fixed offsets also allows the +usage of O_DIRECT for save/restore of the migration stream as the +pages are ensured to be written respecting O_DIRECT alignment +restrictions (direct-io support not yet implemented). + +Usage +- + +On both source and destination, enable the ``multifd`` and +``mapped-ram`` capabilities: + +``migrate_set_capability multifd on`` + +``migrate_set_capability mapped-ram on`` + +Use a ``file:`` URL for migration: + +``migrate file:/path/to/migration/file`` + +Mapped-ram migration is best done non-live, i.e. by stopping the VM on +the source side before migrating. + +Use-cases +- + +The mapped-ram feature was designed for use cases where the migration +stream will be directed to a file in the filesystem and not +immediately restored on the destination VM [#]_. These could be +thought of as snapshots. We can further categorize them into live and +non-live. 
+ +- Non-live snapshot + +If the use case requires a VM to be stopped before taking a snapshot, +that's the ideal scenario for mapped-ram migration. Not having to +track dirty pages, the migration will write the RAM pages to the disk +as fast as it can. + +Note: if a snapshot is taken of a running VM, but the VM will be +stopped after the snapshot by the admin, then consider stopping it +right before the snapshot to take benefit of the performance gains +mentioned above. + +- Live snapshot + +If the use case requires that the VM keeps running during and after +the snapshot operation, then mapped-ram migration can still be used, +but will be less performant. Other strategies such as +background-snapshot should be evaluated as well. One benefit of +mapped-ram in this scenario is portability since background-snapshot +depends on async dirty tracking (KVM_GET_DIRTY_LOG) which is not +supported outside of Linux. + +.. [#] While this same effect could be obtained with the usage of + snapshots or the ``file:`` migration alone, mapped-ram provides + a performance increase for VMs with larger RAM sizes (10s
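The bounded-size argument in the commit message and documentation above comes down to simple offset arithmetic: a page always lands at the same place in the file, however often it is rewritten. A small sketch of that arithmetic follows. The function names are hypothetical, and the pages region offset used in any caller is an arbitrary value chosen for illustration, not the one QEMU computes.

```c
#include <assert.h>
#include <stdint.h>

/* File offset of a page: fixed for the lifetime of the migration. */
static uint64_t sketch_page_file_offset(uint64_t pages_offset,
                                        uint64_t page_index,
                                        uint64_t page_size)
{
    assert(page_size != 0);
    return pages_offset + page_index * page_size;
}

/*
 * End of the pages region. It depends only on the amount of guest RAM,
 * not on how many times pages were dirtied and written again.
 */
static uint64_t sketch_pages_region_end(uint64_t pages_offset,
                                        uint64_t num_pages,
                                        uint64_t page_size)
{
    return pages_offset + num_pages * page_size;
}

/* Bytes a purely sequential stream needs for the same number of writes. */
static uint64_t sketch_stream_bytes(uint64_t page_writes,
                                    uint64_t page_size)
{
    return page_writes * page_size;
}
```

For a guest with 1G of RAM and 4K pages (262144 pages), the pages region ends 1G past its start no matter how many rounds of dirty pages are written, whereas a sequential stream that writes every page ten times needs 10G for the page data alone.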
[PULL 17/27] migration/multifd: Allow multifd without packets
From: Fabiano Rosas For the upcoming support to the new 'mapped-ram' migration stream format, we cannot use multifd packets because each write into the ramblock section in the migration file is expected to contain only the guest pages. They are written at their respective offsets relative to the ramblock section header. There is no space for the packet information and the expected gains from the new approach come partly from being able to write the pages sequentially without extraneous data in between. The new format also simply doesn't need the packets and all necessary information can be taken from the standard migration headers with some (future) changes to multifd code. Use the presence of the mapped-ram capability to decide whether to send packets. This only moves code under multifd_use_packets(), it has no effect for now as mapped-ram cannot yet be enabled with multifd. Reviewed-by: Peter Xu Signed-off-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240229153017.2221-15-faro...@suse.de Signed-off-by: Peter Xu --- migration/multifd.c | 175 +--- 1 file changed, 114 insertions(+), 61 deletions(-) diff --git a/migration/multifd.c b/migration/multifd.c index 3a8520097b..8c43424c81 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -92,6 +92,11 @@ struct { MultiFDMethods *ops; } *multifd_recv_state; +static bool multifd_use_packets(void) +{ +return !migrate_mapped_ram(); +} + /* Multifd without compression */ /** @@ -122,6 +127,19 @@ static void nocomp_send_cleanup(MultiFDSendParams *p, Error **errp) return; } +static void multifd_send_prepare_iovs(MultiFDSendParams *p) +{ +MultiFDPages_t *pages = p->pages; + +for (int i = 0; i < pages->num; i++) { +p->iov[p->iovs_num].iov_base = pages->block->host + pages->offset[i]; +p->iov[p->iovs_num].iov_len = p->page_size; +p->iovs_num++; +} + +p->next_packet_size = pages->num * p->page_size; +} + /** * nocomp_send_prepare: prepare date to be able to send * @@ -136,9 +154,13 @@ static void 
nocomp_send_cleanup(MultiFDSendParams *p, Error **errp) static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp) { bool use_zero_copy_send = migrate_zero_copy_send(); -MultiFDPages_t *pages = p->pages; int ret; +if (!multifd_use_packets()) { +multifd_send_prepare_iovs(p); +return 0; +} + if (!use_zero_copy_send) { /* * Only !zerocopy needs the header in IOV; zerocopy will @@ -147,13 +169,7 @@ static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp) multifd_send_prepare_header(p); } -for (int i = 0; i < pages->num; i++) { -p->iov[p->iovs_num].iov_base = pages->block->host + pages->offset[i]; -p->iov[p->iovs_num].iov_len = p->page_size; -p->iovs_num++; -} - -p->next_packet_size = pages->num * p->page_size; +multifd_send_prepare_iovs(p); p->flags |= MULTIFD_FLAG_NOCOMP; multifd_send_fill_packet(p); @@ -208,7 +224,13 @@ static void nocomp_recv_cleanup(MultiFDRecvParams *p) */ static int nocomp_recv(MultiFDRecvParams *p, Error **errp) { -uint32_t flags = p->flags & MULTIFD_FLAG_COMPRESSION_MASK; +uint32_t flags; + +if (!multifd_use_packets()) { +return 0; +} + +flags = p->flags & MULTIFD_FLAG_COMPRESSION_MASK; if (flags != MULTIFD_FLAG_NOCOMP) { error_setg(errp, "multifd %u: flags received %x flags expected %x", @@ -795,15 +817,18 @@ static void *multifd_send_thread(void *opaque) MigrationThread *thread = NULL; Error *local_err = NULL; int ret = 0; +bool use_packets = multifd_use_packets(); thread = migration_threads_add(p->name, qemu_get_thread_id()); trace_multifd_send_thread_start(p->id); rcu_register_thread(); -if (multifd_send_initial_packet(p, &local_err) < 0) { -ret = -1; -goto out; +if (use_packets) { +if (multifd_send_initial_packet(p, &local_err) < 0) { +ret = -1; +goto out; +} } while (true) { @@ -854,16 +879,20 @@ static void *multifd_send_thread(void *opaque) * it doesn't require explicit memory barriers. 
*/ assert(qatomic_read(&p->pending_sync)); -p->flags = MULTIFD_FLAG_SYNC; -multifd_send_fill_packet(p); -ret = qio_channel_write_all(p->c, (void *)p->packet, -p->packet_len, &local_err); -if (ret != 0) { -break; + +if (use_packets) { +p->flags = MULTIFD_FLAG_SYNC; +multifd_send_fill_packet(p); +ret = qio_channel_write_all(p->c, (void *)p->packet, +p->packet_len, &local_err); +if (ret != 0) { +break; +} +/* p->next_packet_size will always
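The multifd_send_prepare_iovs() helper factored out above turns a list of page offsets within a ramblock into iovec entries that point straight at guest memory, with no packet header in front. A standalone sketch of the same idea is shown below. sketch_prepare_iovs() and its parameters are hypothetical and stand in for the MultiFDSendParams and MultiFDPages_t fields used by the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/*
 * Fill iov[0..num_pages) so that each entry covers one page located at
 * host + offsets[i]. Returns the total payload size in bytes, which is
 * what the patch stores in p->next_packet_size.
 */
static size_t sketch_prepare_iovs(struct iovec *iov, size_t max_iov,
                                  uint8_t *host, const uint64_t *offsets,
                                  size_t num_pages, size_t page_size)
{
    assert(num_pages <= max_iov);

    for (size_t i = 0; i < num_pages; i++) {
        iov[i].iov_base = host + offsets[i];
        iov[i].iov_len = page_size;
    }
    return num_pages * page_size;
}
```

Because the entries carry only page data, the same array can be handed either to a socket write after a header entry (the packet case) or to a positional file write (the mapped-ram case).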
[PULL 20/27] migration/multifd: Add outgoing QIOChannelFile support
From: Fabiano Rosas Allow multifd to open file-backed channels. This will be used when enabling the mapped-ram migration stream format which expects a seekable transport. The QIOChannel read and write methods will use the preadv/pwritev versions which don't update the file offset at each call so we can reuse the fd without re-opening for every channel. Contrary to the socket migration, the file migration doesn't need an asynchronous channel creation process, so expose multifd_channel_connect() and call it directly. Note that this is just setup code and multifd cannot yet make use of the file channels. Signed-off-by: Fabiano Rosas Reviewed-by: Peter Xu Link: https://lore.kernel.org/r/20240229153017.2221-18-faro...@suse.de Signed-off-by: Peter Xu --- migration/file.h| 4 migration/multifd.h | 1 + migration/file.c| 37 + migration/multifd.c | 18 +++--- 4 files changed, 57 insertions(+), 3 deletions(-) diff --git a/migration/file.h b/migration/file.h index 37d6a08bfc..4577f9efdd 100644 --- a/migration/file.h +++ b/migration/file.h @@ -9,10 +9,14 @@ #define QEMU_MIGRATION_FILE_H #include "qapi/qapi-types-migration.h" +#include "io/task.h" +#include "channel.h" void file_start_incoming_migration(FileMigrationArgs *file_args, Error **errp); void file_start_outgoing_migration(MigrationState *s, FileMigrationArgs *file_args, Error **errp); int file_parse_offset(char *filespec, uint64_t *offsetp, Error **errp); +void file_cleanup_outgoing_migration(void); +bool file_send_channel_create(gpointer opaque, Error **errp); #endif diff --git a/migration/multifd.h b/migration/multifd.h index 1d8bbaf96b..db8887f088 100644 --- a/migration/multifd.h +++ b/migration/multifd.h @@ -227,5 +227,6 @@ static inline void multifd_send_prepare_header(MultiFDSendParams *p) p->iovs_num++; } +void multifd_channel_connect(MultiFDSendParams *p, QIOChannel *ioc); #endif diff --git a/migration/file.c b/migration/file.c index 22d052a71f..a350dd61f0 100644 --- a/migration/file.c +++ b/migration/file.c @@ 
-12,12 +12,17 @@ #include "channel.h" #include "file.h" #include "migration.h" +#include "multifd.h" #include "io/channel-file.h" #include "io/channel-util.h" #include "trace.h" #define OFFSET_OPTION ",offset=" +static struct FileOutgoingArgs { +char *fname; +} outgoing_args; + /* Remove the offset option from @filespec and return it in @offsetp. */ int file_parse_offset(char *filespec, uint64_t *offsetp, Error **errp) @@ -37,6 +42,36 @@ int file_parse_offset(char *filespec, uint64_t *offsetp, Error **errp) return 0; } +void file_cleanup_outgoing_migration(void) +{ +g_free(outgoing_args.fname); +outgoing_args.fname = NULL; +} + +bool file_send_channel_create(gpointer opaque, Error **errp) +{ +QIOChannelFile *ioc; +int flags = O_WRONLY; +bool ret = true; + +ioc = qio_channel_file_new_path(outgoing_args.fname, flags, 0, errp); +if (!ioc) { +ret = false; +goto out; +} + +multifd_channel_connect(opaque, QIO_CHANNEL(ioc)); + +out: +/* + * File channel creation is synchronous. However posting this + * semaphore here is simpler than adding a special case. 
+ */ +multifd_send_channel_created(); + +return ret; +} + void file_start_outgoing_migration(MigrationState *s, FileMigrationArgs *file_args, Error **errp) { @@ -53,6 +88,8 @@ void file_start_outgoing_migration(MigrationState *s, return; } +outgoing_args.fname = g_strdup(filename); + ioc = QIO_CHANNEL(fioc); if (offset && qio_channel_io_seek(ioc, offset, SEEK_SET, errp) < 0) { return; diff --git a/migration/multifd.c b/migration/multifd.c index 3574fd3953..caef1076ca 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -17,6 +17,7 @@ #include "exec/ramblock.h" #include "qemu/error-report.h" #include "qapi/error.h" +#include "file.h" #include "ram.h" #include "migration.h" #include "migration-stats.h" @@ -28,6 +29,7 @@ #include "threadinfo.h" #include "options.h" #include "qemu/yank.h" +#include "io/channel-file.h" #include "io/channel-socket.h" #include "yank_functions.h" @@ -694,6 +696,7 @@ static bool multifd_send_cleanup_channel(MultiFDSendParams *p, Error **errp) { if (p->c) { migration_ioc_unregister_yank(p->c); +qio_channel_close(p->c, &error_abort); object_unref(OBJECT(p->c)); p->c = NULL; } @@ -715,6 +718,7 @@ static bool multifd_send_cleanup_channel(MultiFDSendParams *p, Error **errp) static void multifd_send_cleanup_state(void) { +file_cleanup_outgoing_migration(); socket_cleanup_outgoing_migration(); qemu_sem_destroy(&multifd_send_state->channels_created); qemu_sem_destroy(&multifd_send_state->channels_ready); @@ -977,7 +981,7 @@ static bool multifd_tls_channel_connect(MultiFDSendParams *p, return true;
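Illustration (not part of the patch): the commit message relies on positional I/O leaving the file offset untouched, which is what allows several writers to target one file without coordinating a shared cursor. Below is a minimal POSIX sketch of that property, using plain pwrite()/pread() instead of the QIOChannel wrappers. The helper name positional_io_keeps_offset() and the temporary file path are made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Write two chunks at fixed offsets through one descriptor, read one
 * back, and check that the descriptor's own file offset never moved.
 * Returns 0 if the property holds, -1 on any error.
 */
int positional_io_keeps_offset(void)
{
    char path[] = "/tmp/pwrite-demo-XXXXXX";
    char buf[4] = { 0 };
    int ret = -1;
    int fd = mkstemp(path);

    if (fd < 0) {
        return -1;
    }
    unlink(path); /* the file stays usable through fd until close() */

    /* two "channels" writing to their own regions, in any order */
    if (pwrite(fd, "BBBB", 4, 4096) != 4 || pwrite(fd, "AAAA", 4, 0) != 4) {
        goto out;
    }
    /* positional writes did not advance the file offset */
    if (lseek(fd, 0, SEEK_CUR) != 0) {
        goto out;
    }
    if (pread(fd, buf, 4, 4096) != 4 || memcmp(buf, "BBBB", 4) != 0) {
        goto out;
    }
    /* positional reads did not advance it either */
    ret = (lseek(fd, 0, SEEK_CUR) == 0) ? 0 : -1;
out:
    close(fd);
    return ret;
}
```

Because every call names its own offset, the order in which the two writes land does not matter.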
[PULL 26/27] tests/qtest/migration: Add a multifd + mapped-ram migration test
From: Fabiano Rosas Reviewed-by: Peter Xu Signed-off-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240229153017.2221-24-faro...@suse.de Signed-off-by: Peter Xu --- tests/qtest/migration-test.c | 68 1 file changed, 68 insertions(+) diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c index 4c5551f7d0..4023d808f9 100644 --- a/tests/qtest/migration-test.c +++ b/tests/qtest/migration-test.c @@ -2248,6 +2248,46 @@ static void test_precopy_file_mapped_ram(void) test_file_common(&args, true); } +static void *migrate_multifd_mapped_ram_start(QTestState *from, QTestState *to) +{ +migrate_mapped_ram_start(from, to); + +migrate_set_parameter_int(from, "multifd-channels", 4); +migrate_set_parameter_int(to, "multifd-channels", 4); + +migrate_set_capability(from, "multifd", true); +migrate_set_capability(to, "multifd", true); + +return NULL; +} + +static void test_multifd_file_mapped_ram_live(void) +{ +g_autofree char *uri = g_strdup_printf("file:%s/%s", tmpfs, + FILE_TEST_FILENAME); +MigrateCommon args = { +.connect_uri = uri, +.listen_uri = "defer", +.start_hook = migrate_multifd_mapped_ram_start, +}; + +test_file_common(&args, false); +} + +static void test_multifd_file_mapped_ram(void) +{ +g_autofree char *uri = g_strdup_printf("file:%s/%s", tmpfs, + FILE_TEST_FILENAME); +MigrateCommon args = { +.connect_uri = uri, +.listen_uri = "defer", +.start_hook = migrate_multifd_mapped_ram_start, +}; + +test_file_common(&args, true); +} + + static void test_precopy_tcp_plain(void) { MigrateCommon args = { @@ -2524,6 +2564,25 @@ static void test_migrate_precopy_fd_file_mapped_ram(void) }; test_file_common(&args, true); } + +static void *migrate_multifd_fd_mapped_ram_start(QTestState *from, +QTestState *to) +{ +migrate_multifd_mapped_ram_start(from, to); +return migrate_precopy_fd_file_start(from, to); +} + +static void test_multifd_fd_mapped_ram(void) +{ +MigrateCommon args = { +.connect_uri = "fd:fd-mig", +.listen_uri = "defer", +.start_hook = 
migrate_multifd_fd_mapped_ram_start, +.finish_hook = test_migrate_fd_finish_hook +}; + +test_file_common(&args, true); +} #endif /* _WIN32 */ static void do_test_validate_uuid(MigrateStart *args, bool should_fail) @@ -3576,6 +3635,15 @@ int main(int argc, char **argv) migration_test_add("/migration/precopy/file/mapped-ram/live", test_precopy_file_mapped_ram_live); +migration_test_add("/migration/multifd/file/mapped-ram", + test_multifd_file_mapped_ram); +migration_test_add("/migration/multifd/file/mapped-ram/live", + test_multifd_file_mapped_ram_live); +#ifndef _WIN32 +migration_test_add("/migration/multifd/fd/mapped-ram", + test_multifd_fd_mapped_ram); +#endif + #ifdef CONFIG_GNUTLS migration_test_add("/migration/precopy/unix/tls/psk", test_precopy_unix_tls_psk); -- 2.44.0
[PULL 19/27] migration/multifd: Add a wrapper for channels_created
From: Fabiano Rosas We'll need to access multifd_send_state->channels_created from outside multifd.c, so introduce a helper for that. Reviewed-by: Peter Xu Signed-off-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240229153017.2221-17-faro...@suse.de Signed-off-by: Peter Xu --- migration/multifd.h | 1 + migration/multifd.c | 7 ++- 2 files changed, 7 insertions(+), 1 deletion(-) diff --git a/migration/multifd.h b/migration/multifd.h index 1be985978e..1d8bbaf96b 100644 --- a/migration/multifd.h +++ b/migration/multifd.h @@ -17,6 +17,7 @@ typedef struct MultiFDRecvData MultiFDRecvData; bool multifd_send_setup(void); void multifd_send_shutdown(void); +void multifd_send_channel_created(void); int multifd_recv_setup(Error **errp); void multifd_recv_cleanup(void); void multifd_recv_shutdown(void); diff --git a/migration/multifd.c b/migration/multifd.c index d470af73ba..3574fd3953 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -101,6 +101,11 @@ static bool multifd_use_packets(void) return !migrate_mapped_ram(); } +void multifd_send_channel_created(void) +{ +qemu_sem_post(&multifd_send_state->channels_created); +} + /* Multifd without compression */ /** @@ -1023,7 +1028,7 @@ out: * Here we're not interested whether creation succeeded, only that * it happened at all. */ -qemu_sem_post(&multifd_send_state->channels_created); +multifd_send_channel_created(); if (ret) { return; -- 2.44.0
[PULL 14/27] tests/qtest/migration: Add tests for mapped-ram file-based migration
From: Fabiano Rosas Reviewed-by: Peter Xu Signed-off-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240229153017.2221-12-faro...@suse.de Signed-off-by: Peter Xu --- tests/qtest/migration-test.c | 59 1 file changed, 59 insertions(+) diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c index 8c35f3457b..4c5551f7d0 100644 --- a/tests/qtest/migration-test.c +++ b/tests/qtest/migration-test.c @@ -2200,6 +2200,14 @@ static void *test_mode_reboot_start(QTestState *from, QTestState *to) return NULL; } +static void *migrate_mapped_ram_start(QTestState *from, QTestState *to) +{ +migrate_set_capability(from, "mapped-ram", true); +migrate_set_capability(to, "mapped-ram", true); + +return NULL; +} + static void test_mode_reboot(void) { g_autofree char *uri = g_strdup_printf("file:%s/%s", tmpfs, @@ -2214,6 +2222,32 @@ static void test_mode_reboot(void) test_file_common(&args, true); } +static void test_precopy_file_mapped_ram_live(void) +{ +g_autofree char *uri = g_strdup_printf("file:%s/%s", tmpfs, + FILE_TEST_FILENAME); +MigrateCommon args = { +.connect_uri = uri, +.listen_uri = "defer", +.start_hook = migrate_mapped_ram_start, +}; + +test_file_common(&args, false); +} + +static void test_precopy_file_mapped_ram(void) +{ +g_autofree char *uri = g_strdup_printf("file:%s/%s", tmpfs, + FILE_TEST_FILENAME); +MigrateCommon args = { +.connect_uri = uri, +.listen_uri = "defer", +.start_hook = migrate_mapped_ram_start, +}; + +test_file_common(&args, true); +} + static void test_precopy_tcp_plain(void) { MigrateCommon args = { @@ -2462,6 +2496,13 @@ static void *migrate_precopy_fd_file_start(QTestState *from, QTestState *to) return NULL; } +static void *migrate_fd_file_mapped_ram_start(QTestState *from, QTestState *to) +{ +migrate_mapped_ram_start(from, to); + +return migrate_precopy_fd_file_start(from, to); +} + static void test_migrate_precopy_fd_file(void) { MigrateCommon args = { @@ -2472,6 +2513,17 @@ static void test_migrate_precopy_fd_file(void) }; test_file_common(&args, 
true); } + +static void test_migrate_precopy_fd_file_mapped_ram(void) +{ +MigrateCommon args = { +.listen_uri = "defer", +.connect_uri = "fd:fd-mig", +.start_hook = migrate_fd_file_mapped_ram_start, +.finish_hook = test_migrate_fd_finish_hook +}; +test_file_common(&args, true); +} #endif /* _WIN32 */ static void do_test_validate_uuid(MigrateStart *args, bool should_fail) @@ -3519,6 +3571,11 @@ int main(int argc, char **argv) migration_test_add("/migration/mode/reboot", test_mode_reboot); } +migration_test_add("/migration/precopy/file/mapped-ram", + test_precopy_file_mapped_ram); +migration_test_add("/migration/precopy/file/mapped-ram/live", + test_precopy_file_mapped_ram_live); + #ifdef CONFIG_GNUTLS migration_test_add("/migration/precopy/unix/tls/psk", test_precopy_unix_tls_psk); @@ -3580,6 +3637,8 @@ int main(int argc, char **argv) test_migrate_precopy_fd_socket); migration_test_add("/migration/precopy/fd/file", test_migrate_precopy_fd_file); +migration_test_add("/migration/precopy/fd/file/mapped-ram", + test_migrate_precopy_fd_file_mapped_ram); #endif migration_test_add("/migration/validate_uuid", test_validate_uuid); migration_test_add("/migration/validate_uuid_error", -- 2.44.0
[PULL 04/27] migration/multifd: Cleanup multifd_recv_sync_main
From: Fabiano Rosas Some minor cleanups and documentation for multifd_recv_sync_main. Use thread_count as done in other parts of the code. Remove p->id from the multifd_recv_state sync, since that is global and not tied to a channel. Add documentation for the sync steps. Reviewed-by: Peter Xu Signed-off-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240229153017.2221-2-faro...@suse.de Signed-off-by: Peter Xu --- migration/multifd.c| 17 + migration/trace-events | 2 +- 2 files changed, 14 insertions(+), 5 deletions(-) diff --git a/migration/multifd.c b/migration/multifd.c index 6c07f19af1..c7389bf833 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -1182,18 +1182,27 @@ void multifd_recv_cleanup(void) void multifd_recv_sync_main(void) { +int thread_count = migrate_multifd_channels(); int i; if (!migrate_multifd()) { return; } -for (i = 0; i < migrate_multifd_channels(); i++) { -MultiFDRecvParams *p = &multifd_recv_state->params[i]; -trace_multifd_recv_sync_main_wait(p->id); +/* + * Initiate the synchronization by waiting for all channels. + * For socket-based migration this means each channel has received + * the SYNC packet on the stream. + */ +for (i = 0; i < thread_count; i++) { +trace_multifd_recv_sync_main_wait(i); qemu_sem_wait(&multifd_recv_state->sem_sync); } -for (i = 0; i < migrate_multifd_channels(); i++) { + +/* + * Sync done. Release the channels for the next iteration. 
+ */ +for (i = 0; i < thread_count; i++) { MultiFDRecvParams *p = &multifd_recv_state->params[i]; WITH_QEMU_LOCK_GUARD(&p->mutex) { diff --git a/migration/trace-events b/migration/trace-events index 298ad2b0dd..bf1a069632 100644 --- a/migration/trace-events +++ b/migration/trace-events @@ -132,7 +132,7 @@ multifd_recv(uint8_t id, uint64_t packet_num, uint32_t used, uint32_t flags, uin multifd_recv_new_channel(uint8_t id) "channel %u" multifd_recv_sync_main(long packet_num) "packet num %ld" multifd_recv_sync_main_signal(uint8_t id) "channel %u" -multifd_recv_sync_main_wait(uint8_t id) "channel %u" +multifd_recv_sync_main_wait(uint8_t id) "iter %u" multifd_recv_terminate_threads(bool error) "error %d" multifd_recv_thread_end(uint8_t id, uint64_t packets, uint64_t pages) "channel %u packets %" PRIu64 " pages %" PRIu64 multifd_recv_thread_start(uint8_t id) "%u" -- 2.44.0
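Illustration (not part of the patch): the two documented loops form a simple barrier. The main thread first collects one post per channel on a global semaphore, then releases every channel for the next iteration. Below is a minimal sketch with POSIX threads and semaphores, assuming a fixed channel count and a per-channel release semaphore. struct channel, channel_thread() and sync_main() are hypothetical names, not QEMU code.

```c
#include <pthread.h>
#include <semaphore.h>

#define NUM_CHANNELS 4

struct channel {
    pthread_t thread;
    sem_t sem_sync;   /* main -> channel: "sync done, continue" */
    int released;     /* set by the channel after it was released */
};

static sem_t main_sem_sync; /* channel -> main: "I reached the sync point" */
static struct channel channels[NUM_CHANNELS];

static void *channel_thread(void *opaque)
{
    struct channel *c = opaque;

    sem_post(&main_sem_sync); /* report arrival on the global semaphore */
    sem_wait(&c->sem_sync);   /* block until main releases this channel */
    c->released = 1;
    return NULL;
}

/* Run one sync round; returns the number of channels that were released. */
int sync_main(void)
{
    int i, started = 0, released = 0;

    sem_init(&main_sem_sync, 0, 0);
    for (i = 0; i < NUM_CHANNELS; i++) {
        sem_init(&channels[i].sem_sync, 0, 0);
        channels[i].released = 0;
    }
    for (i = 0; i < NUM_CHANNELS; i++) {
        if (pthread_create(&channels[i].thread, NULL,
                           channel_thread, &channels[i]) != 0) {
            break;
        }
        started++;
    }

    /*
     * Step 1: wait once per channel. The global semaphore does not say
     * which channel posted, so the loop index is only an iteration count.
     */
    for (i = 0; i < started; i++) {
        sem_wait(&main_sem_sync);
    }
    /* Step 2: sync done, release each channel. */
    for (i = 0; i < started; i++) {
        sem_post(&channels[i].sem_sync);
    }

    for (i = 0; i < started; i++) {
        pthread_join(channels[i].thread, NULL);
        released += channels[i].released;
    }
    for (i = 0; i < NUM_CHANNELS; i++) {
        sem_destroy(&channels[i].sem_sync);
    }
    sem_destroy(&main_sem_sync);
    return released;
}
```

The first loop is the reason the trace point now prints an iteration number rather than a channel id: a wait on the shared semaphore cannot be attributed to a particular channel.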
[PULL 15/27] migration/multifd: Rename MultiFDSend|RecvParams::data to compress_data
From: Fabiano Rosas Use a more specific name for the compression data so we can use the generic for the multifd core code. Reviewed-by: Peter Xu Signed-off-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240229153017.2221-13-faro...@suse.de Signed-off-by: Peter Xu --- migration/multifd.h | 4 ++-- migration/multifd-zlib.c | 20 ++-- migration/multifd-zstd.c | 20 ++-- 3 files changed, 22 insertions(+), 22 deletions(-) diff --git a/migration/multifd.h b/migration/multifd.h index b3fe27ae93..adccd3532f 100644 --- a/migration/multifd.h +++ b/migration/multifd.h @@ -127,7 +127,7 @@ typedef struct { /* number of iovs used */ uint32_t iovs_num; /* used for compression methods */ -void *data; +void *compress_data; } MultiFDSendParams; typedef struct { @@ -183,7 +183,7 @@ typedef struct { /* num of non zero pages */ uint32_t normal_num; /* used for de-compression methods */ -void *data; +void *compress_data; } MultiFDRecvParams; typedef struct { diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c index 012e3bdea1..2a8f5fc9a6 100644 --- a/migration/multifd-zlib.c +++ b/migration/multifd-zlib.c @@ -69,7 +69,7 @@ static int zlib_send_setup(MultiFDSendParams *p, Error **errp) err_msg = "out of memory for buf"; goto err_free_zbuff; } -p->data = z; +p->compress_data = z; return 0; err_free_zbuff: @@ -92,15 +92,15 @@ err_free_z: */ static void zlib_send_cleanup(MultiFDSendParams *p, Error **errp) { -struct zlib_data *z = p->data; +struct zlib_data *z = p->compress_data; deflateEnd(&z->zs); g_free(z->zbuff); z->zbuff = NULL; g_free(z->buf); z->buf = NULL; -g_free(p->data); -p->data = NULL; +g_free(p->compress_data); +p->compress_data = NULL; } /** @@ -117,7 +117,7 @@ static void zlib_send_cleanup(MultiFDSendParams *p, Error **errp) static int zlib_send_prepare(MultiFDSendParams *p, Error **errp) { MultiFDPages_t *pages = p->pages; -struct zlib_data *z = p->data; +struct zlib_data *z = p->compress_data; z_stream *zs = &z->zs; uint32_t out_size = 0; int ret; @@ -194,7 +194,7 
@@ static int zlib_recv_setup(MultiFDRecvParams *p, Error **errp) struct zlib_data *z = g_new0(struct zlib_data, 1); z_stream *zs = &z->zs; -p->data = z; +p->compress_data = z; zs->zalloc = Z_NULL; zs->zfree = Z_NULL; zs->opaque = Z_NULL; @@ -224,13 +224,13 @@ static int zlib_recv_setup(MultiFDRecvParams *p, Error **errp) */ static void zlib_recv_cleanup(MultiFDRecvParams *p) { -struct zlib_data *z = p->data; +struct zlib_data *z = p->compress_data; inflateEnd(&z->zs); g_free(z->zbuff); z->zbuff = NULL; -g_free(p->data); -p->data = NULL; +g_free(p->compress_data); +p->compress_data = NULL; } /** @@ -246,7 +246,7 @@ static void zlib_recv_cleanup(MultiFDRecvParams *p) */ static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp) { -struct zlib_data *z = p->data; +struct zlib_data *z = p->compress_data; z_stream *zs = &z->zs; uint32_t in_size = p->next_packet_size; /* we measure the change of total_out */ diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c index dc8fe43e94..593cf290ad 100644 --- a/migration/multifd-zstd.c +++ b/migration/multifd-zstd.c @@ -52,7 +52,7 @@ static int zstd_send_setup(MultiFDSendParams *p, Error **errp) struct zstd_data *z = g_new0(struct zstd_data, 1); int res; -p->data = z; +p->compress_data = z; z->zcs = ZSTD_createCStream(); if (!z->zcs) { g_free(z); @@ -90,14 +90,14 @@ static int zstd_send_setup(MultiFDSendParams *p, Error **errp) */ static void zstd_send_cleanup(MultiFDSendParams *p, Error **errp) { -struct zstd_data *z = p->data; +struct zstd_data *z = p->compress_data; ZSTD_freeCStream(z->zcs); z->zcs = NULL; g_free(z->zbuff); z->zbuff = NULL; -g_free(p->data); -p->data = NULL; +g_free(p->compress_data); +p->compress_data = NULL; } /** @@ -114,7 +114,7 @@ static void zstd_send_cleanup(MultiFDSendParams *p, Error **errp) static int zstd_send_prepare(MultiFDSendParams *p, Error **errp) { MultiFDPages_t *pages = p->pages; -struct zstd_data *z = p->data; +struct zstd_data *z = p->compress_data; int ret; uint32_t i; @@ -183,7 
+183,7 @@ static int zstd_recv_setup(MultiFDRecvParams *p, Error **errp) struct zstd_data *z = g_new0(struct zstd_data, 1); int ret; -p->data = z; +p->compress_data = z; z->zds = ZSTD_createDStream(); if (!z->zds) { g_free(z); @@ -221,14 +221,14 @@ static int zstd_recv_setup(MultiFDRecvParams *p, Error **errp) */ static void zstd_recv_cleanup(MultiFDRecvParams *p) { -struct zstd_data *z = p->data; +struct zstd_data *z = p->compress_data; ZSTD_freeDStream(z->zds);
[PULL 16/27] migration/multifd: Decouple recv method from pages
From: Fabiano Rosas Next patches will abstract the type of data being received by the channels, so do some cleanup now to remove references to pages and dependency on 'normal_num'. Reviewed-by: Peter Xu Signed-off-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240229153017.2221-14-faro...@suse.de Signed-off-by: Peter Xu --- migration/multifd.h | 4 ++-- migration/multifd-zlib.c | 6 +++--- migration/multifd-zstd.c | 6 +++--- migration/multifd.c | 13 - 4 files changed, 16 insertions(+), 13 deletions(-) diff --git a/migration/multifd.h b/migration/multifd.h index adccd3532f..6a54377cc1 100644 --- a/migration/multifd.h +++ b/migration/multifd.h @@ -197,8 +197,8 @@ typedef struct { int (*recv_setup)(MultiFDRecvParams *p, Error **errp); /* Cleanup for receiving side */ void (*recv_cleanup)(MultiFDRecvParams *p); -/* Read all pages */ -int (*recv_pages)(MultiFDRecvParams *p, Error **errp); +/* Read all data */ +int (*recv)(MultiFDRecvParams *p, Error **errp); } MultiFDMethods; void multifd_register_ops(int method, MultiFDMethods *ops); diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c index 2a8f5fc9a6..6120faad65 100644 --- a/migration/multifd-zlib.c +++ b/migration/multifd-zlib.c @@ -234,7 +234,7 @@ static void zlib_recv_cleanup(MultiFDRecvParams *p) } /** - * zlib_recv_pages: read the data from the channel into actual pages + * zlib_recv: read the data from the channel into actual pages * * Read the compressed buffer, and uncompress it into the actual * pages. 
@@ -244,7 +244,7 @@ static void zlib_recv_cleanup(MultiFDRecvParams *p) * @p: Params for the channel that we are using * @errp: pointer to an error */ -static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp) +static int zlib_recv(MultiFDRecvParams *p, Error **errp) { struct zlib_data *z = p->compress_data; z_stream *zs = &z->zs; @@ -319,7 +319,7 @@ static MultiFDMethods multifd_zlib_ops = { .send_prepare = zlib_send_prepare, .recv_setup = zlib_recv_setup, .recv_cleanup = zlib_recv_cleanup, -.recv_pages = zlib_recv_pages +.recv = zlib_recv }; static void multifd_zlib_register(void) diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c index 593cf290ad..cac236833d 100644 --- a/migration/multifd-zstd.c +++ b/migration/multifd-zstd.c @@ -232,7 +232,7 @@ static void zstd_recv_cleanup(MultiFDRecvParams *p) } /** - * zstd_recv_pages: read the data from the channel into actual pages + * zstd_recv: read the data from the channel into actual pages * * Read the compressed buffer, and uncompress it into the actual * pages. 
@@ -242,7 +242,7 @@ static void zstd_recv_cleanup(MultiFDRecvParams *p) * @p: Params for the channel that we are using * @errp: pointer to an error */ -static int zstd_recv_pages(MultiFDRecvParams *p, Error **errp) +static int zstd_recv(MultiFDRecvParams *p, Error **errp) { uint32_t in_size = p->next_packet_size; uint32_t out_size = 0; @@ -310,7 +310,7 @@ static MultiFDMethods multifd_zstd_ops = { .send_prepare = zstd_send_prepare, .recv_setup = zstd_recv_setup, .recv_cleanup = zstd_recv_cleanup, -.recv_pages = zstd_recv_pages +.recv = zstd_recv }; static void multifd_zstd_register(void) diff --git a/migration/multifd.c b/migration/multifd.c index c7389bf833..3a8520097b 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -197,7 +197,7 @@ static void nocomp_recv_cleanup(MultiFDRecvParams *p) } /** - * nocomp_recv_pages: read the data from the channel into actual pages + * nocomp_recv: read the data from the channel * * For no compression we just need to read things into the correct place. 
* @@ -206,7 +206,7 @@ static void nocomp_recv_cleanup(MultiFDRecvParams *p) * @p: Params for the channel that we are using * @errp: pointer to an error */ -static int nocomp_recv_pages(MultiFDRecvParams *p, Error **errp) +static int nocomp_recv(MultiFDRecvParams *p, Error **errp) { uint32_t flags = p->flags & MULTIFD_FLAG_COMPRESSION_MASK; @@ -228,7 +228,7 @@ static MultiFDMethods multifd_nocomp_ops = { .send_prepare = nocomp_send_prepare, .recv_setup = nocomp_recv_setup, .recv_cleanup = nocomp_recv_cleanup, -.recv_pages = nocomp_recv_pages +.recv = nocomp_recv }; static MultiFDMethods *multifd_ops[MULTIFD_COMPRESSION__MAX] = { @@ -1227,6 +1227,8 @@ static void *multifd_recv_thread(void *opaque) while (true) { uint32_t flags; +bool has_data = false; +p->normal_num = 0; if (multifd_recv_should_exit()) { break; @@ -1248,10 +1250,11 @@ static void *multifd_recv_thread(void *opaque) flags = p->flags; /* recv methods don't know how to handle the SYNC flag */ p->flags &= ~MULTIFD_FLAG_SYNC; +has_data = !!p->normal_num; qemu_mutex_unlock(&p->mutex); -if (p->normal_num) { -ret =
[PULL 00/27] Migration next patches
From: Peter Xu The following changes since commit c0c6a0e3528b88aaad0b9d333e295707a195587b: Merge tag 'migration-next-pull-request' of https://gitlab.com/peterx/qemu into staging (2024-02-28 17:27:10 +) are available in the Git repository at: https://gitlab.com/peterx/qemu.git tags/migration-next-pull-request for you to fetch changes up to 1a6e217c35b6dbab10fdc1e02640b8d60b2dc663: migration/multifd: Document two places for mapped-ram (2024-03-04 08:31:11 +0800) Migration pull request for 20240304 - Bryan's fix on multifd compression level API - Fabiano's mapped-ram series (base + multifd only) - Steve's amend on cpr document in qapi/ Bryan Zhang (2): migration: Properly apply migration compression level parameters tests/migration: Set compression level in migration tests Fabiano Rosas (20): migration/multifd: Cleanup multifd_recv_sync_main io: fsync before closing a file channel migration/qemu-file: add utility methods for working with seekable channels migration/ram: Introduce 'mapped-ram' migration capability migration: Add mapped-ram URI compatibility check migration/ram: Add outgoing 'mapped-ram' migration migration/ram: Add incoming 'mapped-ram' migration tests/qtest/migration: Add tests for mapped-ram file-based migration migration/multifd: Rename MultiFDSend|RecvParams::data to compress_data migration/multifd: Decouple recv method from pages migration/multifd: Allow multifd without packets migration/multifd: Allow receiving pages without packets migration/multifd: Add a wrapper for channels_created migration/multifd: Add outgoing QIOChannelFile support migration/multifd: Add incoming QIOChannelFile support migration/multifd: Prepare multifd sync for mapped-ram migration migration/multifd: Support outgoing mapped-ram stream format migration/multifd: Support incoming mapped-ram stream format migration/multifd: Add mapped-ram support to fd: URI tests/qtest/migration: Add a multifd + mapped-ram migration test Nikolay Borisov (3): io: add and implement 
QIO_CHANNEL_FEATURE_SEEKABLE for channel file io: Add generic pwritev/preadv interface io: implement io_pwritev/preadv for QIOChannelFile Peter Xu (1): migration/multifd: Document two places for mapped-ram Steve Sistare (1): migration: massage cpr-reboot documentation docs/devel/migration/features.rst | 1 + docs/devel/migration/mapped-ram.rst | 138 + qapi/migration.json | 42 +-- include/exec/ramblock.h | 13 + include/io/channel.h| 83 ++ include/migration/qemu-file-types.h | 2 + include/qemu/bitops.h | 13 + migration/fd.h | 2 + migration/file.h| 8 + migration/multifd.h | 27 +- migration/options.h | 1 + migration/qemu-file.h | 6 + migration/ram.h | 1 + io/channel-file.c | 69 + io/channel.c| 58 migration/fd.c | 44 +++ migration/file.c| 149 +- migration/migration.c | 56 +++- migration/multifd-zlib.c| 26 +- migration/multifd-zstd.c| 26 +- migration/multifd.c | 417 ++-- migration/options.c | 47 migration/qemu-file.c | 106 +++ migration/ram.c | 351 +-- migration/savevm.c | 1 + tests/qtest/migration-test.c| 137 + migration/trace-events | 2 +- 27 files changed, 1666 insertions(+), 160 deletions(-) create mode 100644 docs/devel/migration/mapped-ram.rst -- 2.44.0
[PULL 08/27] io: fsync before closing a file channel
From: Fabiano Rosas Make sure the data is flushed to disk before closing file channels. This is to ensure data is on disk and not lost in the event of a host crash. This is currently being implemented to affect the migration code when migrating to a file, but all QIOChannelFile users should benefit from the change. Reviewed-by: "Daniel P. Berrangé" Acked-by: "Daniel P. Berrangé" Signed-off-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240229153017.2221-6-faro...@suse.de Signed-off-by: Peter Xu --- io/channel-file.c | 5 + 1 file changed, 5 insertions(+) diff --git a/io/channel-file.c b/io/channel-file.c index a6ad7770c6..d4706fa592 100644 --- a/io/channel-file.c +++ b/io/channel-file.c @@ -242,6 +242,11 @@ static int qio_channel_file_close(QIOChannel *ioc, { QIOChannelFile *fioc = QIO_CHANNEL_FILE(ioc); +if (qemu_fdatasync(fioc->fd) < 0) { +error_setg_errno(errp, errno, + "Unable to synchronize file data with storage device"); +return -1; +} if (qemu_close(fioc->fd) < 0) { error_setg_errno(errp, errno, "Unable to close file"); -- 2.44.0
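Illustration (not part of the patch): the hunk flushes the file's data to storage before the descriptor is closed, and reports a flush failure as a close failure. Below is a minimal sketch of the same ordering in plain POSIX, with fdatasync() standing in for qemu_fdatasync() and stderr messages standing in for error_setg_errno(). sync_and_close() is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Flush file data to the storage device, then close the descriptor.
 * Returns 0 on success, -1 on failure. As in the hunk, a failed sync
 * returns early and leaves the descriptor open.
 */
int sync_and_close(int fd)
{
    if (fdatasync(fd) < 0) {
        fprintf(stderr,
                "Unable to synchronize file data with storage device: %s\n",
                strerror(errno));
        return -1;
    }
    if (close(fd) < 0) {
        fprintf(stderr, "Unable to close file: %s\n", strerror(errno));
        return -1;
    }
    return 0;
}
```

Without the flush, a successful close() only means the data reached the page cache, which is what the commit message means by data being lost in a host crash.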
[PULL 02/27] migration: Properly apply migration compression level parameters
From: Bryan Zhang Some glue code was missing, so that using `qmp_migrate_set_parameters` to set `multifd-zstd-level` or `multifd-zlib-level` did not work. This commit adds the glue code to fix that. Signed-off-by: Bryan Zhang Link: https://lore.kernel.org/r/20240301035901.4006936-2-bryan.zh...@bytedance.com Signed-off-by: Peter Xu --- migration/options.c | 12 1 file changed, 12 insertions(+) diff --git a/migration/options.c b/migration/options.c index 3e3e0b93b4..1cd3cc7c33 100644 --- a/migration/options.c +++ b/migration/options.c @@ -1312,6 +1312,12 @@ static void migrate_params_test_apply(MigrateSetParameters *params, if (params->has_multifd_compression) { dest->multifd_compression = params->multifd_compression; } +if (params->has_multifd_zlib_level) { +dest->multifd_zlib_level = params->multifd_zlib_level; +} +if (params->has_multifd_zstd_level) { +dest->multifd_zstd_level = params->multifd_zstd_level; +} if (params->has_xbzrle_cache_size) { dest->xbzrle_cache_size = params->xbzrle_cache_size; } @@ -1447,6 +1453,12 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp) if (params->has_multifd_compression) { s->parameters.multifd_compression = params->multifd_compression; } +if (params->has_multifd_zlib_level) { +s->parameters.multifd_zlib_level = params->multifd_zlib_level; +} +if (params->has_multifd_zstd_level) { +s->parameters.multifd_zstd_level = params->multifd_zstd_level; +} if (params->has_xbzrle_cache_size) { s->parameters.xbzrle_cache_size = params->xbzrle_cache_size; xbzrle_cache_resize(params->xbzrle_cache_size, errp); -- 2.44.0
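Illustration (not part of the patch): the missing glue follows the usual pattern for optional parameters, where each field carries a has_ flag and only flagged fields overwrite the current value. Below is a minimal sketch of that pattern. struct set_params, struct params and params_apply() are simplified, hypothetical stand-ins for MigrateSetParameters, MigrationParameters and migrate_params_apply().

```c
#include <stdbool.h>
#include <stdint.h>

/* A request: every optional field comes with a has_ flag. */
struct set_params {
    bool has_zlib_level;
    uint8_t zlib_level;
    bool has_zstd_level;
    uint8_t zstd_level;
};

/* The current, fully populated parameter set. */
struct params {
    uint8_t zlib_level;
    uint8_t zstd_level;
};

/* Copy only the fields the caller actually supplied. */
void params_apply(const struct set_params *req, struct params *dest)
{
    if (req->has_zlib_level) {
        dest->zlib_level = req->zlib_level;
    }
    if (req->has_zstd_level) {
        dest->zstd_level = req->zstd_level;
    }
}
```

If the per-field branch is missing, as it was for the two level parameters, the request is accepted but the stored value never changes.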
[PULL 06/27] io: Add generic pwritev/preadv interface
From: Nikolay Borisov Introduce basic pwritev/preadv support in the generic channel layer. Specific implementation will follow for the file channel as this is required in order to support migration streams with fixed location of each ram page. Signed-off-by: Nikolay Borisov Reviewed-by: "Daniel P. Berrangé" Reviewed-by: Peter Xu Signed-off-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240229153017.2221-4-faro...@suse.de Signed-off-by: Peter Xu --- include/io/channel.h | 82 io/channel.c | 58 +++ 2 files changed, 140 insertions(+) diff --git a/include/io/channel.h b/include/io/channel.h index fcb19fd672..7986c49c71 100644 --- a/include/io/channel.h +++ b/include/io/channel.h @@ -131,6 +131,16 @@ struct QIOChannelClass { Error **errp); /* Optional callbacks */ +ssize_t (*io_pwritev)(QIOChannel *ioc, + const struct iovec *iov, + size_t niov, + off_t offset, + Error **errp); +ssize_t (*io_preadv)(QIOChannel *ioc, + const struct iovec *iov, + size_t niov, + off_t offset, + Error **errp); int (*io_shutdown)(QIOChannel *ioc, QIOChannelShutdown how, Error **errp); @@ -529,6 +539,78 @@ void qio_channel_set_follow_coroutine_ctx(QIOChannel *ioc, bool enabled); int qio_channel_close(QIOChannel *ioc, Error **errp); +/** + * qio_channel_pwritev + * @ioc: the channel object + * @iov: the array of memory regions to write data from + * @niov: the length of the @iov array + * @offset: offset in the channel where writes should begin + * @errp: pointer to a NULL-initialized error object + * + * Not all implementations will support this facility, so may report + * an error. To avoid errors, the caller may check for the feature + * flag QIO_CHANNEL_FEATURE_SEEKABLE prior to calling this method. 
+ * + * Behaves as qio_channel_writev_full, apart from not supporting + * sending of file handles as well as beginning the write at the + * passed @offset + * + */ +ssize_t qio_channel_pwritev(QIOChannel *ioc, const struct iovec *iov, +size_t niov, off_t offset, Error **errp); + +/** + * qio_channel_pwrite + * @ioc: the channel object + * @buf: the memory region to write data into + * @buflen: the number of bytes to @buf + * @offset: offset in the channel where writes should begin + * @errp: pointer to a NULL-initialized error object + * + * Not all implementations will support this facility, so may report + * an error. To avoid errors, the caller may check for the feature + * flag QIO_CHANNEL_FEATURE_SEEKABLE prior to calling this method. + * + */ +ssize_t qio_channel_pwrite(QIOChannel *ioc, char *buf, size_t buflen, + off_t offset, Error **errp); + +/** + * qio_channel_preadv + * @ioc: the channel object + * @iov: the array of memory regions to read data into + * @niov: the length of the @iov array + * @offset: offset in the channel where writes should begin + * @errp: pointer to a NULL-initialized error object + * + * Not all implementations will support this facility, so may report + * an error. To avoid errors, the caller may check for the feature + * flag QIO_CHANNEL_FEATURE_SEEKABLE prior to calling this method. + * + * Behaves as qio_channel_readv_full, apart from not supporting + * receiving of file handles as well as beginning the read at the + * passed @offset + * + */ +ssize_t qio_channel_preadv(QIOChannel *ioc, const struct iovec *iov, + size_t niov, off_t offset, Error **errp); + +/** + * qio_channel_pread + * @ioc: the channel object + * @buf: the memory region to write data into + * @buflen: the number of bytes to @buf + * @offset: offset in the channel where writes should begin + * @errp: pointer to a NULL-initialized error object + * + * Not all implementations will support this facility, so may report + * an error. 
To avoid errors, the caller may check for the feature + * flag QIO_CHANNEL_FEATURE_SEEKABLE prior to calling this method. + * + */ +ssize_t qio_channel_pread(QIOChannel *ioc, char *buf, size_t buflen, + off_t offset, Error **errp); + /** * qio_channel_shutdown: * @ioc: the channel object diff --git a/io/channel.c b/io/channel.c index 86c5834510..a1f12f8e90 100644 --- a/io/channel.c +++ b/io/channel.c @@ -454,6 +454,64 @@ GSource *qio_channel_add_watch_source(QIOChannel *ioc, } +ssize_t qio_channel_pwritev(QIOChannel *ioc, const struct iovec *iov, +size_t niov, off_t offset, Error **errp) +{ +QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc); + +if (!klass->io_pwritev) { +error_setg(errp, "Channel does not support pwritev"); +
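Illustration (not part of the patch): the generic layer treats io_pwritev as an optional callback and reports an error when a channel type does not provide it. Below is a minimal sketch of that dispatch. struct chan, struct chan_ops, chan_pwritev(), file_ops and socket_ops are hypothetical names, and the file backend assumes a system that provides pwritev() (Linux or BSD).

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for pwritev() on glibc */
#endif
#include <stdio.h>
#include <sys/types.h>
#include <sys/uio.h>

struct chan;

struct chan_ops {
    /* optional: NULL when the channel type cannot do positional writes */
    ssize_t (*io_pwritev)(struct chan *c, const struct iovec *iov,
                          size_t niov, off_t offset);
};

struct chan {
    const struct chan_ops *ops;
    int fd;
};

/* Generic entry point: fail cleanly if the callback is missing. */
ssize_t chan_pwritev(struct chan *c, const struct iovec *iov,
                     size_t niov, off_t offset)
{
    if (!c->ops->io_pwritev) {
        fprintf(stderr, "Channel does not support pwritev\n");
        return -1;
    }
    return c->ops->io_pwritev(c, iov, niov, offset);
}

/* File backend: forwards to the system call. */
static ssize_t file_pwritev(struct chan *c, const struct iovec *iov,
                            size_t niov, off_t offset)
{
    return pwritev(c->fd, iov, (int)niov, offset);
}

const struct chan_ops file_ops = { .io_pwritev = file_pwritev };
const struct chan_ops socket_ops = { .io_pwritev = NULL };
```

A caller that wants to avoid the error path can test for the capability first, which is the role the doc comments give to QIO_CHANNEL_FEATURE_SEEKABLE.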
[PULL 11/27] migration: Add mapped-ram URI compatibility check
From: Fabiano Rosas The mapped-ram migration format needs a channel that supports seeking to be able to write each page to an arbitrary offset in the migration stream. Reviewed-by: "Daniel P. Berrangé" Reviewed-by: Peter Xu Signed-off-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240229153017.2221-9-faro...@suse.de Signed-off-by: Peter Xu --- migration/migration.c | 29 + 1 file changed, 29 insertions(+) diff --git a/migration/migration.c b/migration/migration.c index 69f68f940d..2669600d25 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -148,10 +148,39 @@ static bool transport_supports_multi_channels(MigrationAddress *addr) return false; } +static bool migration_needs_seekable_channel(void) +{ +return migrate_mapped_ram(); +} + +static bool transport_supports_seeking(MigrationAddress *addr) +{ +if (addr->transport == MIGRATION_ADDRESS_TYPE_FILE) { +return true; +} + +/* + * At this point, the user might not yet have passed the file + * descriptor to QEMU, so we cannot know for sure whether it + * refers to a plain file or a socket. Let it through anyway. + */ +if (addr->transport == MIGRATION_ADDRESS_TYPE_SOCKET) { +return addr->u.socket.type == SOCKET_ADDRESS_TYPE_FD; +} + +return false; +} + static bool migration_channels_and_transport_compatible(MigrationAddress *addr, Error **errp) { +if (migration_needs_seekable_channel() && +!transport_supports_seeking(addr)) { +error_setg(errp, "Migration requires seekable transport (e.g. file)"); +return false; +} + if (migration_needs_multiple_sockets() && !transport_supports_multi_channels(addr)) { error_setg(errp, "Migration requires multi-channel URIs (e.g. tcp)"); -- 2.44.0
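Illustration (not part of the patch): the comment in transport_supports_seeking() notes that an fd: URI cannot be classified before the descriptor has been passed in. Once a descriptor is available, one common POSIX way to tell a plain file from a pipe or socket is to probe it with lseek(), which fails with ESPIPE on non-seekable descriptors. Below is a minimal sketch of such a probe. fd_is_seekable() is a hypothetical helper, and this is not necessarily how QEMU performs the check.

```c
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Return true if fd supports seeking. lseek() with offset 0 and
 * SEEK_CUR only queries the current position, so the probe does not
 * disturb the descriptor.
 */
bool fd_is_seekable(int fd)
{
    return lseek(fd, 0, SEEK_CUR) != (off_t)-1;
}
```

This check needs a real descriptor, which is why the URI-level check in the patch has to let fd: addresses through.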
[PULL 01/27] migration: massage cpr-reboot documentation
From: Steve Sistare Re-wrap the cpr-reboot documentation to 70 columns, use '@' for cpr-reboot references, capitalize COLO and VFIO, and tweak the wording. Suggested-by: Markus Armbruster Signed-off-by: Steve Sistare Link: https://lore.kernel.org/r/1709218462-3640-1-git-send-email-steven.sist...@oracle.com [peterx: s/qemu/QEMU per Markus's suggestion] Reviewed-by: Markus Armbruster Signed-off-by: Peter Xu --- qapi/migration.json | 46 +++-- 1 file changed, 24 insertions(+), 22 deletions(-) diff --git a/qapi/migration.json b/qapi/migration.json index 0b33a71ab4..b603aa6f25 100644 --- a/qapi/migration.json +++ b/qapi/migration.json @@ -636,28 +636,30 @@ # # @normal: the original form of migration. (since 8.2) # -# @cpr-reboot: The migrate command stops the VM and saves state to the URI. -# After quitting qemu, the user resumes by running qemu -incoming. -# -# This mode allows the user to quit qemu, and restart an updated version -# of qemu. The user may even update and reboot the OS before restarting, -# as long as the URI persists across a reboot. -# -# Unlike normal mode, the use of certain local storage options does not -# block the migration, but the user must not modify guest block devices -# between the quit and restart. -# -# This mode supports vfio devices provided the user first puts the guest -# in the suspended runstate, such as by issuing guest-suspend-ram to the -# qemu guest agent. -# -# Best performance is achieved when the memory backend is shared and the -# @x-ignore-shared migration capability is set, but this is not required. -# Further, if the user reboots before restarting such a configuration, the -# shared backend must be be non-volatile across reboot, such as by backing -# it with a dax device. -# -# cpr-reboot may not be used with postcopy, colo, or background-snapshot. +# @cpr-reboot: The migrate command stops the VM and saves state to +# the URI. After quitting QEMU, the user resumes by running +# QEMU -incoming. 
+# +# This mode allows the user to quit QEMU, optionally update and +# reboot the OS, and restart QEMU. If the user reboots, the URI +# must persist across the reboot, such as by using a file. +# +# Unlike normal mode, the use of certain local storage options +# does not block the migration, but the user must not modify the +# contents of guest block devices between the quit and restart. +# +# This mode supports VFIO devices provided the user first puts +# the guest in the suspended runstate, such as by issuing +# guest-suspend-ram to the QEMU guest agent. +# +# Best performance is achieved when the memory backend is shared +# and the @x-ignore-shared migration capability is set, but this +# is not required. Further, if the user reboots before restarting +# such a configuration, the shared memory must persist across the +# reboot, such as by backing it with a dax device. +# +# @cpr-reboot may not be used with postcopy, background-snapshot, +# or COLO. # # (since 8.2) ## -- 2.44.0
[PULL 05/27] io: add and implement QIO_CHANNEL_FEATURE_SEEKABLE for channel file
From: Nikolay Borisov Add a generic QIOChannel feature SEEKABLE which would be used by the qemu_file* apis. For the time being this will be only implemented for file channels. Signed-off-by: Nikolay Borisov Reviewed-by: "Daniel P. Berrangé" Reviewed-by: Peter Xu Signed-off-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240229153017.2221-3-faro...@suse.de Signed-off-by: Peter Xu --- include/io/channel.h | 1 + io/channel-file.c| 8 2 files changed, 9 insertions(+) diff --git a/include/io/channel.h b/include/io/channel.h index 5f9dbaab65..fcb19fd672 100644 --- a/include/io/channel.h +++ b/include/io/channel.h @@ -44,6 +44,7 @@ enum QIOChannelFeature { QIO_CHANNEL_FEATURE_LISTEN, QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY, QIO_CHANNEL_FEATURE_READ_MSG_PEEK, +QIO_CHANNEL_FEATURE_SEEKABLE, }; diff --git a/io/channel-file.c b/io/channel-file.c index 4a12c61886..f91bf6db1c 100644 --- a/io/channel-file.c +++ b/io/channel-file.c @@ -36,6 +36,10 @@ qio_channel_file_new_fd(int fd) ioc->fd = fd; +if (lseek(fd, 0, SEEK_CUR) != (off_t)-1) { +qio_channel_set_feature(QIO_CHANNEL(ioc), QIO_CHANNEL_FEATURE_SEEKABLE); +} + trace_qio_channel_file_new_fd(ioc, fd); return ioc; @@ -60,6 +64,10 @@ qio_channel_file_new_path(const char *path, return NULL; } +if (lseek(ioc->fd, 0, SEEK_CUR) != (off_t)-1) { +qio_channel_set_feature(QIO_CHANNEL(ioc), QIO_CHANNEL_FEATURE_SEEKABLE); +} + trace_qio_channel_file_new_path(ioc, path, flags, mode, ioc->fd); return ioc; -- 2.44.0
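The `lseek(fd, 0, SEEK_CUR)` probe used in this patch can be tried on its own. This is a standalone sketch, not QEMU code; `fd_is_seekable()` and `report_fd()` are illustrative names. On POSIX systems, pipes, FIFOs and sockets fail the probe with `ESPIPE`, while regular files pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* True if the descriptor accepts lseek(); the file position is not moved. */
static bool fd_is_seekable(int fd)
{
    assert(fd >= 0);
    return lseek(fd, 0, SEEK_CUR) != (off_t)-1;
}

/* Print a one-line verdict for a descriptor. */
static void report_fd(const char *name, int fd)
{
    printf("%s: %s\n", name, fd_is_seekable(fd) ? "seekable" : "not seekable");
}
```

Calling `report_fd("stdin", 0)` in a shell pipeline versus with stdin redirected from a file shows the difference.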
[PULL 09/27] migration/qemu-file: add utility methods for working with seekable channels
From: Fabiano Rosas Add utility methods that will be needed when implementing 'mapped-ram' migration capability. Signed-off-by: Fabiano Rosas Reviewed-by: "Daniel P. Berrangé" Link: https://lore.kernel.org/r/20240229153017.2221-7-faro...@suse.de Signed-off-by: Peter Xu --- include/migration/qemu-file-types.h | 2 + migration/qemu-file.h | 6 ++ migration/qemu-file.c | 106 3 files changed, 114 insertions(+) diff --git a/include/migration/qemu-file-types.h b/include/migration/qemu-file-types.h index 9ba163f333..adec5abc07 100644 --- a/include/migration/qemu-file-types.h +++ b/include/migration/qemu-file-types.h @@ -50,6 +50,8 @@ unsigned int qemu_get_be16(QEMUFile *f); unsigned int qemu_get_be32(QEMUFile *f); uint64_t qemu_get_be64(QEMUFile *f); +bool qemu_file_is_seekable(QEMUFile *f); + static inline void qemu_put_be64s(QEMUFile *f, const uint64_t *pv) { qemu_put_be64(f, *pv); diff --git a/migration/qemu-file.h b/migration/qemu-file.h index 8aec9fabf7..32fd4a34fd 100644 --- a/migration/qemu-file.h +++ b/migration/qemu-file.h @@ -75,6 +75,12 @@ QEMUFile *qemu_file_get_return_path(QEMUFile *f); int qemu_fflush(QEMUFile *f); void qemu_file_set_blocking(QEMUFile *f, bool block); int qemu_file_get_to_fd(QEMUFile *f, int fd, size_t size); +void qemu_set_offset(QEMUFile *f, off_t off, int whence); +off_t qemu_get_offset(QEMUFile *f); +void qemu_put_buffer_at(QEMUFile *f, const uint8_t *buf, size_t buflen, +off_t pos); +size_t qemu_get_buffer_at(QEMUFile *f, const uint8_t *buf, size_t buflen, + off_t pos); QIOChannel *qemu_file_get_ioc(QEMUFile *file); diff --git a/migration/qemu-file.c b/migration/qemu-file.c index 94231ff295..b10c882629 100644 --- a/migration/qemu-file.c +++ b/migration/qemu-file.c @@ -33,6 +33,7 @@ #include "options.h" #include "qapi/error.h" #include "rdma.h" +#include "io/channel-file.h" #define IO_BUF_SIZE 32768 #define MAX_IOV_SIZE MIN_CONST(IOV_MAX, 64) @@ -255,6 +256,10 @@ static void qemu_iovec_release_ram(QEMUFile *f) memset(f->may_free, 0, 
sizeof(f->may_free)); } +bool qemu_file_is_seekable(QEMUFile *f) +{ +return qio_channel_has_feature(f->ioc, QIO_CHANNEL_FEATURE_SEEKABLE); +} /** * Flushes QEMUFile buffer @@ -447,6 +452,107 @@ void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, size_t size) } } +void qemu_put_buffer_at(QEMUFile *f, const uint8_t *buf, size_t buflen, +off_t pos) +{ +Error *err = NULL; +size_t ret; + +if (f->last_error) { +return; +} + +qemu_fflush(f); +ret = qio_channel_pwrite(f->ioc, (char *)buf, buflen, pos, &err); + +if (err) { +qemu_file_set_error_obj(f, -EIO, err); +return; +} + +if ((ssize_t)ret == QIO_CHANNEL_ERR_BLOCK) { +qemu_file_set_error_obj(f, -EAGAIN, NULL); +return; +} + +if (ret != buflen) { +error_setg(&err, "Partial write of size %zu, expected %zu", ret, + buflen); +qemu_file_set_error_obj(f, -EIO, err); +return; +} + +stat64_add(&mig_stats.qemu_file_transferred, buflen); + +return; +} + + +size_t qemu_get_buffer_at(QEMUFile *f, const uint8_t *buf, size_t buflen, + off_t pos) +{ +Error *err = NULL; +size_t ret; + +if (f->last_error) { +return 0; +} + +ret = qio_channel_pread(f->ioc, (char *)buf, buflen, pos, &err); + +if ((ssize_t)ret == -1 || err) { +qemu_file_set_error_obj(f, -EIO, err); +return 0; +} + +if ((ssize_t)ret == QIO_CHANNEL_ERR_BLOCK) { +qemu_file_set_error_obj(f, -EAGAIN, NULL); +return 0; +} + +if (ret != buflen) { +error_setg(&err, "Partial read of size %zu, expected %zu", ret, buflen); +qemu_file_set_error_obj(f, -EIO, err); +return 0; +} + +return ret; +} + +void qemu_set_offset(QEMUFile *f, off_t off, int whence) +{ +Error *err = NULL; +off_t ret; + +if (qemu_file_is_writable(f)) { +qemu_fflush(f); +} else { +/* Drop all cached buffers if existed; will trigger a re-fill later */ +f->buf_index = 0; +f->buf_size = 0; +} + +ret = qio_channel_io_seek(f->ioc, off, whence, &err); +if (ret == (off_t)-1) { +qemu_file_set_error_obj(f, -EIO, err); +} +} + +off_t qemu_get_offset(QEMUFile *f) +{ +Error *err = NULL; +off_t ret; + +qemu_fflush(f); + +ret =
qio_channel_io_seek(f->ioc, 0, SEEK_CUR, &err); +if (ret == (off_t)-1) { +qemu_file_set_error_obj(f, -EIO, err); +} +return ret; +} + + void qemu_put_byte(QEMUFile *f, int v) { if (f->last_error) { -- 2.44.0
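The positional helpers in this patch treat a short transfer as an error rather than retrying. A rough standalone equivalent using plain POSIX `pwrite()`/`pread()` might look like the sketch below. The names `put_buffer_at()`, `get_buffer_at()` and `demo_roundtrip()` are illustrative only, and mapping every failure to `-EIO` or `0` is a simplification of what the QEMU code does with `Error` objects.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write exactly buflen bytes at absolute offset pos without moving the
 * file position. Returns 0 on success, -EIO on error or short write.
 */
static int put_buffer_at(int fd, const void *buf, size_t buflen, off_t pos)
{
    assert(buf != NULL);
    ssize_t ret = pwrite(fd, buf, buflen, pos);

    if (ret < 0 || (size_t)ret != buflen) {
        return -EIO; /* a partial write counts as a failure */
    }
    return 0;
}

/* Read exactly buflen bytes at pos. Returns buflen on success, 0 otherwise. */
static size_t get_buffer_at(int fd, void *buf, size_t buflen, off_t pos)
{
    assert(buf != NULL);
    ssize_t ret = pread(fd, buf, buflen, pos);

    if (ret < 0 || (size_t)ret != buflen) {
        return 0; /* error, EOF, or partial read */
    }
    return (size_t)ret;
}

/* Round trip on an anonymous temporary file; returns 0 on success. */
static int demo_roundtrip(void)
{
    FILE *tmp = tmpfile();
    char out[4];
    int ok;

    if (!tmp) {
        return -1;
    }
    int fd = fileno(tmp);
    ok = put_buffer_at(fd, "page", 4, 4096) == 0 &&
         get_buffer_at(fd, out, sizeof(out), 4096) == sizeof(out) &&
         memcmp(out, "page", 4) == 0 &&
         lseek(fd, 0, SEEK_CUR) == 0; /* file position was never moved */
    fclose(tmp);
    return ok ? 0 : -1;
}
```

Because the offset is passed explicitly, several writers can target disjoint regions of the same file without coordinating a shared file position, which is the property mapped-ram relies on.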
[PATCH] migration/multifd: Document two places for mapped-ram
From: Peter Xu Document two spots in the mapped-ram migration code that may not be entirely clear. Signed-off-by: Peter Xu --- Based-on: <20240229153017.2221-1-faro...@suse.de> --- migration/multifd.c | 12 migration/ram.c | 8 +++- 2 files changed, 19 insertions(+), 1 deletion(-) diff --git a/migration/multifd.c b/migration/multifd.c index b4e5a9dfcc..2942395ce2 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -709,6 +709,18 @@ static bool multifd_send_cleanup_channel(MultiFDSendParams *p, Error **errp) { if (p->c) { migration_ioc_unregister_yank(p->c); +/* + * An explicit close() on the channel here is normally not + * required, but can be helpful for "file:" iochannels, where it + * will include an fdatasync() to make sure the data is flushed to + * the disk backend. + * + * The object_unref() cannot guarantee that because: (1) finalize() + * of the iochannel is only triggered on the last reference, and + * it's not guaranteed that we always hold the last refcount when + * reaching here, and, (2) even if finalize() is invoked, it only + * does a close(fd) without data flush. + */ qio_channel_close(p->c, &error_abort); object_unref(OBJECT(p->c)); p->c = NULL; diff --git a/migration/ram.c b/migration/ram.c index 1f1b5297cf..c79e3de521 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -4258,7 +4258,13 @@ static int ram_load_precopy(QEMUFile *f) switch (flags & ~RAM_SAVE_FLAG_CONTINUE) { case RAM_SAVE_FLAG_MEM_SIZE: ret = parse_ramblocks(f, addr); - +/* + * For mapped-ram migration (to a file) using multifd, we sync + * once and for all here to make sure all tasks we queued to + * multifd threads are completed, so that all the ramblocks + * (including all the guest memory pages within) are fully + * loaded after this sync returns. + */ if (migrate_mapped_ram()) { multifd_recv_sync_main(); } -- 2.44.0
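The first comment distinguishes a bare `close(fd)` from a close that flushes data first. As a hedged illustration in plain POSIX terms (this is not the QEMU channel code, and `flush_and_close()` is a made-up name), the flushing variant could be sketched as:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Flush file data to stable storage, then close the descriptor.
 * Returns 0 on success or the negated errno of the first failure.
 * The descriptor is closed even when the flush fails.
 */
static int flush_and_close(int fd)
{
    int ret = 0;

    if (fdatasync(fd) < 0) {
        ret = -errno;
    }
    if (close(fd) < 0 && ret == 0) {
        ret = -errno;
    }
    return ret;
}
```

A reference-counted wrapper that only calls `close()` in its finalizer gives neither guarantee: the finalizer may run later than expected, and `close()` alone does not force data to disk.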
[PULL 21/25] migration: update cpr-reboot description
From: Steve Sistare Clarify qapi for cpr-reboot migration mode, and add vfio support. Signed-off-by: Steve Sistare Reviewed-by: Peter Xu Link: https://lore.kernel.org/r/1708622920-68779-14-git-send-email-steven.sist...@oracle.com Signed-off-by: Peter Xu --- qapi/migration.json | 35 ++- 1 file changed, 22 insertions(+), 13 deletions(-) diff --git a/qapi/migration.json b/qapi/migration.json index 7303e57e8e..bee5e71fe3 100644 --- a/qapi/migration.json +++ b/qapi/migration.json @@ -636,19 +636,28 @@ # # @normal: the original form of migration. (since 8.2) # -# @cpr-reboot: The migrate command saves state to a file, allowing one to -# quit qemu, reboot to an updated kernel, and restart an updated -# version of qemu. The caller must specify a migration URI -# that writes to and reads from a file. Unlike normal mode, -# the use of certain local storage options does not block the -# migration, but the caller must not modify guest block devices -# between the quit and restart. To avoid saving guest RAM to the -# file, the memory backend must be shared, and the @x-ignore-shared -# migration capability must be set. Guest RAM must be non-volatile -# across reboot, such as by backing it with a dax device, but this -# is not enforced. The restarted qemu arguments must match those -# used to initially start qemu, plus the -incoming option. -# (since 8.2) +# @cpr-reboot: The migrate command stops the VM and saves state to the URI. +# After quitting qemu, the user resumes by running qemu -incoming. +# +# This mode allows the user to quit qemu, and restart an updated version +# of qemu. The user may even update and reboot the OS before restarting, +# as long as the URI persists across a reboot. +# +# Unlike normal mode, the use of certain local storage options does not +# block the migration, but the user must not modify guest block devices +# between the quit and restart. 
+# +# This mode supports vfio devices provided the user first puts the guest +# in the suspended runstate, such as by issuing guest-suspend-ram to the +# qemu guest agent. +# +# Best performance is achieved when the memory backend is shared and the +# @x-ignore-shared migration capability is set, but this is not required. +# Further, if the user reboots before restarting such a configuration, the +# shared backend must be be non-volatile across reboot, such as by backing +# it with a dax device. +# +# (since 8.2) ## { 'enum': 'MigMode', 'data': [ 'normal', 'cpr-reboot' ] } -- 2.43.0
[PULL 17/25] migration: per-mode notifiers
From: Steve Sistare Keep a separate list of migration notifiers for each migration mode. Suggested-by: Peter Xu Signed-off-by: Steve Sistare Reviewed-by: Peter Xu Reviewed-by: David Hildenbrand Link: https://lore.kernel.org/r/1708622920-68779-8-git-send-email-steven.sist...@oracle.com Signed-off-by: Peter Xu --- include/migration/misc.h | 6 ++ migration/migration.c| 22 +- 2 files changed, 23 insertions(+), 5 deletions(-) diff --git a/include/migration/misc.h b/include/migration/misc.h index e36a1f3ec4..4dc06a92b7 100644 --- a/include/migration/misc.h +++ b/include/migration/misc.h @@ -86,6 +86,12 @@ typedef int (*MigrationNotifyFunc)(NotifierWithReturn *notify, void migration_add_notifier(NotifierWithReturn *notify, MigrationNotifyFunc func); +/* + * Same as migration_add_notifier, but applies to the specified @mode. + */ +void migration_add_notifier_mode(NotifierWithReturn *notify, + MigrationNotifyFunc func, MigMode mode); + void migration_remove_notifier(NotifierWithReturn *notify); void migration_call_notifiers(MigrationState *s, MigrationEventType type); bool migration_in_setup(MigrationState *); diff --git a/migration/migration.c b/migration/migration.c index 33149c462c..925103b61a 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -69,8 +69,13 @@ #include "qemu/sockets.h" #include "sysemu/kvm.h" -static NotifierWithReturnList migration_state_notifiers = -NOTIFIER_WITH_RETURN_LIST_INITIALIZER(migration_state_notifiers); +#define NOTIFIER_ELEM_INIT(array, elem)\ +[elem] = NOTIFIER_WITH_RETURN_LIST_INITIALIZER((array)[elem]) + +static NotifierWithReturnList migration_state_notifiers[] = { +NOTIFIER_ELEM_INIT(migration_state_notifiers, MIG_MODE_NORMAL), +NOTIFIER_ELEM_INIT(migration_state_notifiers, MIG_MODE_CPR_REBOOT), +}; /* Messages sent on the return path from destination to source */ enum mig_rp_message_type { @@ -1463,11 +1468,17 @@ static void migrate_fd_cancel(MigrationState *s) } } +void migration_add_notifier_mode(NotifierWithReturn
*notify, + MigrationNotifyFunc func, MigMode mode) +{ +notify->notify = (NotifierWithReturnFunc)func; +notifier_with_return_list_add(&migration_state_notifiers[mode], notify); +} + void migration_add_notifier(NotifierWithReturn *notify, MigrationNotifyFunc func) { -notify->notify = (NotifierWithReturnFunc)func; -notifier_with_return_list_add(&migration_state_notifiers, notify); +migration_add_notifier_mode(notify, func, MIG_MODE_NORMAL); } void migration_remove_notifier(NotifierWithReturn *notify) @@ -1480,10 +1491,11 @@ void migration_remove_notifier(NotifierWithReturn *notify) void migration_call_notifiers(MigrationState *s, MigrationEventType type) { +MigMode mode = s->parameters.mode; MigrationEvent e; e.type = type; -notifier_with_return_list_notify(&migration_state_notifiers, &e, 0); +notifier_with_return_list_notify(&migration_state_notifiers[mode], &e, 0); } bool migration_in_setup(MigrationState *s) -- 2.43.0
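The idea of one notifier list per mode, with the mode-less registration defaulting to normal mode, can be sketched without QEMU's list macros. Everything below (`Mode`, `Notifier`, `CountingNotifier`, the function names) is a hypothetical simplification using a plain singly linked list, not the NotifierWithReturnList API.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { MODE_NORMAL, MODE_CPR_REBOOT, MODE_COUNT } Mode;

typedef struct Notifier Notifier;
struct Notifier {
    int (*notify)(Notifier *n, int event);
    Notifier *next;
};

/* One list head per mode; static storage starts out as all NULL. */
static Notifier *notifier_lists[MODE_COUNT];

static void add_notifier_mode(Notifier *n, int (*func)(Notifier *, int),
                              Mode mode)
{
    assert(mode < MODE_COUNT);
    n->notify = func;
    n->next = notifier_lists[mode];
    notifier_lists[mode] = n;
}

/* The mode-less variant registers for normal mode only. */
static void add_notifier(Notifier *n, int (*func)(Notifier *, int))
{
    add_notifier_mode(n, func, MODE_NORMAL);
}

/* Invoke only the notifiers of the active mode; returns how many ran. */
static int call_notifiers(Mode mode, int event)
{
    int called = 0;

    for (Notifier *n = notifier_lists[mode]; n; n = n->next) {
        n->notify(n, event);
        called++;
    }
    return called;
}

/* Example notifier that counts how often it was invoked. */
typedef struct {
    Notifier base; /* must stay first so the cast below is valid */
    int hits;
} CountingNotifier;

static int count_hit(Notifier *n, int event)
{
    (void)event;
    ((CountingNotifier *)n)->hits++;
    return 0;
}
```

Existing callers that use the mode-less function keep their behavior, and a notifier registered for one mode is never called during a migration in another mode.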
[PULL 13/25] migration: convert to NotifierWithReturn
From: Steve Sistare Change all migration notifiers to type NotifierWithReturn, so notifiers can return an error status in a future patch. For now, pass NULL for the notifier error parameter, and do not check the return value. Signed-off-by: Steve Sistare Reviewed-by: Peter Xu Reviewed-by: David Hildenbrand Link: https://lore.kernel.org/r/1708622920-68779-4-git-send-email-steven.sist...@oracle.com [peterx: dropped unexpected update to roms/seabios-hppa] Signed-off-by: Peter Xu --- include/hw/vfio/vfio-common.h | 2 +- include/hw/virtio/virtio-net.h | 2 +- include/migration/misc.h | 6 +++--- include/qemu/notify.h | 1 + hw/net/virtio-net.c| 4 +++- hw/vfio/migration.c| 4 +++- migration/migration.c | 16 net/vhost-vdpa.c | 6 -- ui/spice-core.c| 8 +--- 9 files changed, 29 insertions(+), 20 deletions(-) diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h index 9b7ef7d02b..4a6c262f77 100644 --- a/include/hw/vfio/vfio-common.h +++ b/include/hw/vfio/vfio-common.h @@ -62,7 +62,7 @@ typedef struct VFIORegion { typedef struct VFIOMigration { struct VFIODevice *vbasedev; VMChangeStateEntry *vm_state; -Notifier migration_state; +NotifierWithReturn migration_state; uint32_t device_state; int data_fd; void *data_buffer; diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h index 55977f01f0..eaee8f4243 100644 --- a/include/hw/virtio/virtio-net.h +++ b/include/hw/virtio/virtio-net.h @@ -221,7 +221,7 @@ struct VirtIONet { DeviceListener primary_listener; QDict *primary_opts; bool primary_opts_from_json; -Notifier migration_state; +NotifierWithReturn migration_state; VirtioNetRssData rss_data; struct NetRxPkt *rx_pkt; struct EBPFRSSContext ebpf_rss; diff --git a/include/migration/misc.h b/include/migration/misc.h index 5e65c18f1a..b62e351d96 100644 --- a/include/migration/misc.h +++ b/include/migration/misc.h @@ -60,9 +60,9 @@ void migration_object_init(void); void migration_shutdown(void); bool migration_is_idle(void); bool 
migration_is_active(MigrationState *); -void migration_add_notifier(Notifier *notify, -void (*func)(Notifier *notifier, void *data)); -void migration_remove_notifier(Notifier *notify); +void migration_add_notifier(NotifierWithReturn *notify, +NotifierWithReturnFunc func); +void migration_remove_notifier(NotifierWithReturn *notify); void migration_call_notifiers(MigrationState *s); bool migration_in_setup(MigrationState *); bool migration_has_finished(MigrationState *); diff --git a/include/qemu/notify.h b/include/qemu/notify.h index 9a85631864..abf18dbf59 100644 --- a/include/qemu/notify.h +++ b/include/qemu/notify.h @@ -45,6 +45,7 @@ bool notifier_list_empty(NotifierList *list); /* Same as Notifier but allows .notify() to return errors */ typedef struct NotifierWithReturn NotifierWithReturn; +/* Return int to allow for different failure modes and recovery actions */ typedef int (*NotifierWithReturnFunc)(NotifierWithReturn *notifier, void *data, Error **errp); diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index 5a79bc3a3a..75f4e8664d 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -3534,11 +3534,13 @@ static void virtio_net_handle_migration_primary(VirtIONet *n, MigrationState *s) } } -static void virtio_net_migration_state_notifier(Notifier *notifier, void *data) +static int virtio_net_migration_state_notifier(NotifierWithReturn *notifier, + void *data, Error **errp) { MigrationState *s = data; VirtIONet *n = container_of(notifier, VirtIONet, migration_state); virtio_net_handle_migration_primary(n, s); +return 0; } static bool failover_hide_primary_device(DeviceListener *listener, diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c index 70e6b1a709..6b6acc4140 100644 --- a/hw/vfio/migration.c +++ b/hw/vfio/migration.c @@ -754,7 +754,8 @@ static void vfio_vmstate_change(void *opaque, bool running, RunState state) mig_state_to_str(new_state)); } -static void vfio_migration_state_notifier(Notifier *notifier, void *data) +static int 
vfio_migration_state_notifier(NotifierWithReturn *notifier, + void *data, Error **errp) { MigrationState *s = data; VFIOMigration *migration = container_of(notifier, VFIOMigration, @@ -770,6 +771,7 @@ static void vfio_migration_state_notifier(Notifier *notifier, void *data) case MIGRATION_STATUS_FAILED: vfio_migration_set_state_or_reset(vbasedev, VFIO_DEVICE_STATE_RUNNING); } +return 0; } static void vfio_migration_free(VFIODevice *vbasedev) diff --git a/migration/migration.c b/migration/migration.c index ab21de2cad
[PULL 25/25] migration: Use migrate_has_error() in close_return_path_on_source()
From: Cédric Le Goater close_return_path_on_source() retrieves the migration error from the QEMUFile '->to_dst_file' to know if a shutdown is required. This shutdown is required to exit the return-path thread. Avoid relying on '->to_dst_file' and use migrate_has_error() instead. (using to_dst_file is a heuristic to infer whether rp_state.from_dst_file might be stuck on a recvmsg(). Using a generic method for detecting errors is more reliable. We also want to reduce dependency on QEMUFile::last_error) Suggested-by: Peter Xu Signed-off-by: Cédric Le Goater Reviewed-by: Peter Xu [added some words about the motivation for this patch] Signed-off-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240226203122.22894-3-faro...@suse.de Signed-off-by: Peter Xu --- migration/migration.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 7ba2b60e46..bab68bcbef 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -2429,8 +2429,7 @@ static bool close_return_path_on_source(MigrationState *ms) * cause it to unblock if it's stuck waiting for the destination. */ WITH_QEMU_LOCK_GUARD(&ms->qemu_file_lock) { -if (ms->to_dst_file && ms->rp_state.from_dst_file && -qemu_file_get_error(ms->to_dst_file)) { +if (migrate_has_error(ms) && ms->rp_state.from_dst_file) { qemu_file_shutdown(ms->rp_state.from_dst_file); } } -- 2.43.0
[PULL 16/25] migration: MigrationNotifyFunc
From: Steve Sistare Define MigrationNotifyFunc to improve type safety and simplify migration notifiers. Signed-off-by: Steve Sistare Reviewed-by: Peter Xu Reviewed-by: David Hildenbrand Link: https://lore.kernel.org/r/1708622920-68779-7-git-send-email-steven.sist...@oracle.com Signed-off-by: Peter Xu --- include/migration/misc.h | 5 - hw/net/virtio-net.c | 4 +--- hw/vfio/migration.c | 3 +-- migration/migration.c| 4 ++-- net/vhost-vdpa.c | 6 ++ ui/spice-core.c | 4 +--- 6 files changed, 11 insertions(+), 15 deletions(-) diff --git a/include/migration/misc.h b/include/migration/misc.h index e6150009e0..e36a1f3ec4 100644 --- a/include/migration/misc.h +++ b/include/migration/misc.h @@ -72,6 +72,9 @@ typedef struct MigrationEvent { MigrationEventType type; } MigrationEvent; +typedef int (*MigrationNotifyFunc)(NotifierWithReturn *notify, + MigrationEvent *e, Error **errp); + /* * Register the notifier @notify to be called when a migration event occurs * for MIG_MODE_NORMAL, as specified by the MigrationEvent passed to @func. 
@@ -81,7 +84,7 @@ typedef struct MigrationEvent { *- MIG_EVENT_PRECOPY_FAILED */ void migration_add_notifier(NotifierWithReturn *notify, -NotifierWithReturnFunc func); +MigrationNotifyFunc func); void migration_remove_notifier(NotifierWithReturn *notify); void migration_call_notifiers(MigrationState *s, MigrationEventType type); diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index e803f98c3a..a3c711b56d 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -3535,10 +3535,8 @@ static void virtio_net_handle_migration_primary(VirtIONet *n, MigrationEvent *e) } static int virtio_net_migration_state_notifier(NotifierWithReturn *notifier, - void *data, Error **errp) + MigrationEvent *e, Error **errp) { -MigrationEvent *e = data; - VirtIONet *n = container_of(notifier, VirtIONet, migration_state); virtio_net_handle_migration_primary(n, e); return 0; diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c index 869d8417d6..50140eda87 100644 --- a/hw/vfio/migration.c +++ b/hw/vfio/migration.c @@ -755,9 +755,8 @@ static void vfio_vmstate_change(void *opaque, bool running, RunState state) } static int vfio_migration_state_notifier(NotifierWithReturn *notifier, - void *data, Error **errp) + MigrationEvent *e, Error **errp) { -MigrationEvent *e = data; VFIOMigration *migration = container_of(notifier, VFIOMigration, migration_state); VFIODevice *vbasedev = migration->vbasedev; diff --git a/migration/migration.c b/migration/migration.c index 8f7f2d92f4..33149c462c 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1464,9 +1464,9 @@ static void migrate_fd_cancel(MigrationState *s) } void migration_add_notifier(NotifierWithReturn *notify, -NotifierWithReturnFunc func) +MigrationNotifyFunc func) { -notify->notify = func; +notify->notify = (NotifierWithReturnFunc)func; notifier_with_return_list_add(&migration_state_notifiers, notify); } diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c index a29d18a9ef..e6bdb4562d 100644 --- a/net/vhost-vdpa.c +++
b/net/vhost-vdpa.c @@ -323,11 +323,9 @@ static void vhost_vdpa_net_log_global_enable(VhostVDPAState *s, bool enable) } static int vdpa_net_migration_state_notifier(NotifierWithReturn *notifier, - void *data, Error **errp) + MigrationEvent *e, Error **errp) { -MigrationEvent *e = data; -VhostVDPAState *s = container_of(notifier, VhostVDPAState, - migration_state); +VhostVDPAState *s = container_of(notifier, VhostVDPAState, migration_state); if (e->type == MIG_EVENT_PRECOPY_SETUP) { vhost_vdpa_net_log_global_enable(s, true); diff --git a/ui/spice-core.c b/ui/spice-core.c index 0a59876da2..15be640286 100644 --- a/ui/spice-core.c +++ b/ui/spice-core.c @@ -569,10 +569,8 @@ static SpiceInfo *qmp_query_spice_real(Error **errp) } static int migration_state_notifier(NotifierWithReturn *notifier, -void *data, Error **errp) +MigrationEvent *e, Error **errp) { -MigrationEvent *e = data; - if (!spice_have_target_host) { return 0; } -- 2.43.0
[PULL 22/25] migration: options incompatible with cpr
From: Steve Sistare Fail the migration request if options are set that are incompatible with cpr. Signed-off-by: Steve Sistare Reviewed-by: Peter Xu Link: https://lore.kernel.org/r/1708622920-68779-15-git-send-email-steven.sist...@oracle.com Signed-off-by: Peter Xu --- qapi/migration.json | 2 ++ migration/migration.c | 17 + 2 files changed, 19 insertions(+) diff --git a/qapi/migration.json b/qapi/migration.json index bee5e71fe3..0b33a71ab4 100644 --- a/qapi/migration.json +++ b/qapi/migration.json @@ -657,6 +657,8 @@ # shared backend must be be non-volatile across reboot, such as by backing # it with a dax device. # +# cpr-reboot may not be used with postcopy, colo, or background-snapshot. +# # (since 8.2) ## { 'enum': 'MigMode', diff --git a/migration/migration.c b/migration/migration.c index 90a90947fb..7652fd4d14 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1953,6 +1953,23 @@ static bool migrate_prepare(MigrationState *s, bool blk, bool blk_inc, return false; } +if (migrate_mode_is_cpr(s)) { +const char *conflict = NULL; + +if (migrate_postcopy()) { +conflict = "postcopy"; +} else if (migrate_background_snapshot()) { +conflict = "background snapshot"; +} else if (migrate_colo()) { +conflict = "COLO"; +} + +if (conflict) { +error_setg(errp, "Cannot use %s with CPR", conflict); +return false; +} +} + if (blk || blk_inc) { if (migrate_colo()) { error_setg(errp, "No disk migration is required in COLO mode"); -- 2.43.0
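The conflict check reports the first incompatible option in a fixed order (postcopy, then background snapshot, then COLO). A minimal standalone sketch of that pattern follows; the `Caps` struct and both helpers are hypothetical and only mirror the shape of the code in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical snapshot of the capabilities that conflict with CPR. */
typedef struct {
    bool postcopy;
    bool background_snapshot;
    bool colo;
} Caps;

/* Name of the first conflicting option, or NULL if there is none. */
static const char *cpr_conflict(const Caps *caps)
{
    assert(caps != NULL);
    if (caps->postcopy) {
        return "postcopy";
    } else if (caps->background_snapshot) {
        return "background snapshot";
    } else if (caps->colo) {
        return "COLO";
    }
    return NULL;
}

/* Returns true if CPR may proceed; otherwise formats a message into err. */
static bool check_cpr(const Caps *caps, char *err, size_t errlen)
{
    const char *conflict = cpr_conflict(caps);

    if (conflict) {
        snprintf(err, errlen, "Cannot use %s with CPR", conflict);
        return false;
    }
    return true;
}
```

When several conflicting options are set at once, only the first one in the chain is named in the message.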
[PULL 19/25] migration: notifier error checking
From: Steve Sistare Check the status returned by migration notifiers for event type MIG_EVENT_PRECOPY_SETUP, and report errors. None of the notifiers return an error status at this time. Signed-off-by: Steve Sistare Reviewed-by: Peter Xu Link: https://lore.kernel.org/r/1708622920-68779-10-git-send-email-steven.sist...@oracle.com Signed-off-by: Peter Xu --- include/migration/misc.h | 8 +++- migration/migration.c| 25 - 2 files changed, 23 insertions(+), 10 deletions(-) diff --git a/include/migration/misc.h b/include/migration/misc.h index 4dc06a92b7..e4933b815b 100644 --- a/include/migration/misc.h +++ b/include/migration/misc.h @@ -72,6 +72,11 @@ typedef struct MigrationEvent { MigrationEventType type; } MigrationEvent; +/* + * A MigrationNotifyFunc may return an error code and an Error object, + * but only when @e->type is MIG_EVENT_PRECOPY_SETUP. The code is an int + * to allow for different failure modes and recovery actions. + */ typedef int (*MigrationNotifyFunc)(NotifierWithReturn *notify, MigrationEvent *e, Error **errp); @@ -93,7 +98,8 @@ void migration_add_notifier_mode(NotifierWithReturn *notify, MigrationNotifyFunc func, MigMode mode); void migration_remove_notifier(NotifierWithReturn *notify); -void migration_call_notifiers(MigrationState *s, MigrationEventType type); +int migration_call_notifiers(MigrationState *s, MigrationEventType type, + Error **errp); bool migration_in_setup(MigrationState *); bool migration_has_finished(MigrationState *); bool migration_has_failed(MigrationState *); diff --git a/migration/migration.c b/migration/migration.c index 6a115d28b8..37c836b0b0 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1376,7 +1376,7 @@ static void migrate_fd_cleanup(MigrationState *s) } type = migration_has_failed(s) ? 
MIG_EVENT_PRECOPY_FAILED : MIG_EVENT_PRECOPY_DONE; -migration_call_notifiers(s, type); +migration_call_notifiers(s, type, NULL); block_cleanup_parameters(); yank_unregister_instance(MIGRATION_YANK_INSTANCE); } @@ -1489,13 +1489,18 @@ void migration_remove_notifier(NotifierWithReturn *notify) } } -void migration_call_notifiers(MigrationState *s, MigrationEventType type) +int migration_call_notifiers(MigrationState *s, MigrationEventType type, + Error **errp) { MigMode mode = s->parameters.mode; MigrationEvent e; +int ret; e.type = type; -notifier_with_return_list_notify(&migration_state_notifiers[mode], &e, 0); +ret = notifier_with_return_list_notify(&migration_state_notifiers[mode], + &e, errp); +assert(!ret || type == MIG_EVENT_PRECOPY_SETUP); +return ret; } bool migration_in_setup(MigrationState *s) @@ -2549,7 +2554,7 @@ static int postcopy_start(MigrationState *ms, Error **errp) * at the transition to postcopy and after the device state; in particular * spice needs to trigger a transition now */ -migration_call_notifiers(ms, MIG_EVENT_PRECOPY_DONE); +migration_call_notifiers(ms, MIG_EVENT_PRECOPY_DONE, NULL); migration_downtime_end(ms); @@ -2569,11 +2574,10 @@ static int postcopy_start(MigrationState *ms, Error **errp) ret = qemu_file_get_error(ms->to_dst_file); if (ret) { -error_setg(errp, "postcopy_start: Migration stream errored"); -migrate_set_state(&ms->state, MIGRATION_STATUS_POSTCOPY_ACTIVE, - MIGRATION_STATUS_FAILED); +error_setg_errno(errp, -ret, "postcopy_start: Migration stream error"); +bql_lock(); +goto fail; } - trace_postcopy_preempt_enabled(migrate_postcopy_preempt()); return ret; @@ -2594,6 +2598,7 @@ fail: error_report_err(local_err); } } +migration_call_notifiers(ms, MIG_EVENT_PRECOPY_FAILED, NULL); bql_unlock(); return -1; } @@ -3613,7 +3618,9 @@ void migrate_fd_connect(MigrationState *s, Error *error_in) rate_limit = migrate_max_bandwidth(); /* Notify before starting migration thread */ -migration_call_notifiers(s, MIG_EVENT_PRECOPY_SETUP); +if
(migration_call_notifiers(s, MIG_EVENT_PRECOPY_SETUP, &local_err)) { +goto fail; +} } migration_rate_set(rate_limit); -- 2.43.0
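The contract introduced here is that a notifier may veto only the setup event, and the caller asserts that no other event ever fails. The sketch below models that contract with a plain array of callbacks. All names are hypothetical, the error is reduced to a string pointer, and stopping the walk at the first non-zero return is an assumption about the list helper, not something shown in this patch.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { EV_PRECOPY_SETUP, EV_PRECOPY_DONE, EV_PRECOPY_FAILED } EventType;
typedef int (*NotifyFn)(EventType type, const char **errp);

static int ok_calls; /* how many times ok_notifier() ran */

/* A notifier that never fails. */
static int ok_notifier(EventType type, const char **errp)
{
    (void)type;
    (void)errp;
    ok_calls++;
    return 0;
}

/* A notifier that rejects setup but accepts every other event. */
static int setup_veto(EventType type, const char **errp)
{
    if (type != EV_PRECOPY_SETUP) {
        return 0;
    }
    if (errp) {
        *errp = "device not ready";
    }
    return -1;
}

/*
 * Call notifiers in order and stop at the first failure (assumed
 * behavior). Only the setup event is allowed to fail.
 */
static int call_notifiers(NotifyFn *fns, size_t n, EventType type,
                          const char **errp)
{
    int ret = 0;

    for (size_t i = 0; i < n && ret == 0; i++) {
        ret = fns[i](type, errp);
    }
    assert(!ret || type == EV_PRECOPY_SETUP);
    return ret;
}
```

Callers of the done and failed events can pass NULL for the error pointer, as the patch does, because a failure there would trip the assertion instead of being reported.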
[PULL 24/25] migration: Join the return path thread before releasing to_dst_file
From: Fabiano Rosas The return path thread might hang at a blocking system call. Before joining the thread we might need to issue a shutdown() on the socket file descriptor to release it. To determine whether the shutdown() is necessary we look at the QEMUFile error. Make sure we only clean up the QEMUFile after the return path has been waited for. This fixes a hang when qemu_savevm_state_setup() produced an error that was detected by migration_detect_error(). That skips migration_completion() so close_return_path_on_source() would get stuck waiting for the RP thread to terminate. Reported-by: Cédric Le Goater Tested-by: Cédric Le Goater Signed-off-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240226203122.22894-2-faro...@suse.de Signed-off-by: Peter Xu --- migration/migration.c | 22 +- 1 file changed, 9 insertions(+), 13 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index ccb13fa94a..7ba2b60e46 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1342,6 +1342,8 @@ static void migrate_fd_cleanup(MigrationState *s) qemu_savevm_state_cleanup(); +close_return_path_on_source(s); + if (s->to_dst_file) { QEMUFile *tmp; @@ -1366,12 +1368,6 @@ static void migrate_fd_cleanup(MigrationState *s) qemu_fclose(tmp); } -/* - * We already cleaned up to_dst_file, so errors from the return - * path might be due to that, ignore them. - */ -close_return_path_on_source(s); - assert(!migration_is_active(s)); if (s->state == MIGRATION_STATUS_CANCELLING) { @@ -2914,6 +2910,13 @@ static MigThrError postcopy_pause(MigrationState *s) while (true) { QEMUFile *file; +/* + * We're already pausing, so ignore any errors on the return + * path and just wait for the thread to finish. It will be + * re-created when we resume. + */ +close_return_path_on_source(s); + /* * Current channel is possibly broken. Release it. 
Note that this is * guaranteed even without lock because to_dst_file should only be @@ -2933,13 +2936,6 @@ static MigThrError postcopy_pause(MigrationState *s) qemu_file_shutdown(file); qemu_fclose(file); -/* - * We're already pausing, so ignore any errors on the return - * path and just wait for the thread to finish. It will be - * re-created when we resume. - */ -close_return_path_on_source(s); - migrate_set_state(&s->state, s->state, MIGRATION_STATUS_POSTCOPY_PAUSED); -- 2.43.0
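The ordering this patch establishes (release the blocked thread, join it, and only then clean up the file) can be illustrated outside QEMU. The following is a minimal sketch using POSIX threads and a socketpair, assuming a Linux-like system where shutdown() wakes a thread blocked in recv(). The names reader_thread and join_then_close_demo are hypothetical and are not QEMU functions, and unlike the real code, which consults the QEMUFile error to decide whether shutdown() is needed, the sketch always calls it.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical stand-in for the return path thread: it blocks in recv(). */
static void *reader_thread(void *opaque)
{
    int fd = *(int *)opaque;
    char buf[16];

    /* recv() returns 0 once the socket has been shut down. */
    while (recv(fd, buf, sizeof(buf), 0) > 0) {
        /* Discard whatever arrives; the sketch only cares about exit. */
    }
    return NULL;
}

/* Returns 0 if the thread was released, joined, and the fds closed. */
static int join_then_close_demo(void)
{
    int sv[2];
    pthread_t th;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }
    if (pthread_create(&th, NULL, reader_thread, &sv[0]) != 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* 1. Release the blocked reader; the descriptor itself stays valid. */
    shutdown(sv[0], SHUT_RDWR);

    /* 2. Wait for the thread while the descriptor is still open. */
    if (pthread_join(th, NULL) != 0) {
        return -1;
    }

    /* 3. Only now release the descriptors. */
    close(sv[0]);
    close(sv[1]);
    return 0;
}
```

Closing the descriptor before the join would leave the thread operating on a descriptor that may already have been reused, which is the class of problem the reordering avoids.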
[PULL 15/25] migration: remove postcopy_after_devices
From: Steve Sistare postcopy_after_devices and migration_in_postcopy_after_devices are no longer used, so delete them. Signed-off-by: Steve Sistare Reviewed-by: Peter Xu Link: https://lore.kernel.org/r/1708622920-68779-6-git-send-email-steven.sist...@oracle.com Signed-off-by: Peter Xu --- include/migration/misc.h | 1 - migration/migration.h| 2 -- migration/migration.c| 7 --- 3 files changed, 10 deletions(-) diff --git a/include/migration/misc.h b/include/migration/misc.h index 9e4abae97f..e6150009e0 100644 --- a/include/migration/misc.h +++ b/include/migration/misc.h @@ -89,7 +89,6 @@ bool migration_in_setup(MigrationState *); bool migration_has_finished(MigrationState *); bool migration_has_failed(MigrationState *); /* ...and after the device transmission */ -bool migration_in_postcopy_after_devices(MigrationState *); /* True if incoming migration entered POSTCOPY_INCOMING_DISCARD */ bool migration_in_incoming_postcopy(void); /* True if incoming migration entered POSTCOPY_INCOMING_ADVISE */ diff --git a/migration/migration.h b/migration/migration.h index f2c8b8f286..aef8afbe1f 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -348,8 +348,6 @@ struct MigrationState { /* Flag set once the migration has been asked to enter postcopy */ bool start_postcopy; -/* Flag set after postcopy has sent the device state */ -bool postcopy_after_devices; /* Flag set once the migration thread is running (and needs joining) */ bool migration_thread_running; diff --git a/migration/migration.c b/migration/migration.c index 4650c21f67..8f7f2d92f4 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1527,11 +1527,6 @@ bool migration_postcopy_is_alive(int state) } } -bool migration_in_postcopy_after_devices(MigrationState *s) -{ -return migration_in_postcopy() && s->postcopy_after_devices; -} - bool migration_in_incoming_postcopy(void) { PostcopyState ps = postcopy_state_get(); @@ -1613,7 +1608,6 @@ int migrate_init(MigrationState *s, Error **errp) 
s->expected_downtime = 0; s->setup_time = 0; s->start_postcopy = false; -s->postcopy_after_devices = false; s->migration_thread_running = false; error_free(s->error); s->error = NULL; @@ -2543,7 +2537,6 @@ static int postcopy_start(MigrationState *ms, Error **errp) * at the transition to postcopy and after the device state; in particular * spice needs to trigger a transition now */ -ms->postcopy_after_devices = true; migration_call_notifiers(ms, MIG_EVENT_PRECOPY_DONE); migration_downtime_end(ms); -- 2.43.0
[PULL 20/25] migration: stop vm for cpr
From: Steve Sistare When migration for cpr is initiated, stop the vm and set state RUN_STATE_FINISH_MIGRATE before ram is saved. This eliminates the possibility of ram and device state being out of sync, and guarantees that a guest in the suspended state remains suspended, because qmp_cont rejects a cont command in the RUN_STATE_FINISH_MIGRATE state. Signed-off-by: Steve Sistare Reviewed-by: Peter Xu Link: https://lore.kernel.org/r/1708622920-68779-11-git-send-email-steven.sist...@oracle.com Signed-off-by: Peter Xu --- include/migration/misc.h | 1 + migration/migration.h| 2 -- migration/migration.c| 51 3 files changed, 32 insertions(+), 22 deletions(-) diff --git a/include/migration/misc.h b/include/migration/misc.h index e4933b815b..5d1aa593ed 100644 --- a/include/migration/misc.h +++ b/include/migration/misc.h @@ -60,6 +60,7 @@ void migration_object_init(void); void migration_shutdown(void); bool migration_is_idle(void); bool migration_is_active(MigrationState *); +bool migrate_mode_is_cpr(MigrationState *); typedef enum MigrationEventType { MIG_EVENT_PRECOPY_SETUP, diff --git a/migration/migration.h b/migration/migration.h index aef8afbe1f..65c0b61cbd 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -541,6 +541,4 @@ int migration_rp_wait(MigrationState *s); */ void migration_rp_kick(MigrationState *s); -int migration_stop_vm(RunState state); - #endif diff --git a/migration/migration.c b/migration/migration.c index 37c836b0b0..90a90947fb 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -167,11 +167,19 @@ static gint page_request_addr_cmp(gconstpointer ap, gconstpointer bp) return (a > b) - (a < b); } -int migration_stop_vm(RunState state) +static int migration_stop_vm(MigrationState *s, RunState state) { -int ret = vm_stop_force_state(state); +int ret; + +migration_downtime_start(s); + +s->vm_old_state = runstate_get(); +global_state_store(); + +ret = vm_stop_force_state(state); 
trace_vmstate_downtime_checkpoint("src-vm-stopped"); +trace_migration_completion_vm_stop(ret); return ret; } @@ -1602,6 +1610,11 @@ bool migration_is_active(MigrationState *s) s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE); } +bool migrate_mode_is_cpr(MigrationState *s) +{ +return s->parameters.mode == MIG_MODE_CPR_REBOOT; +} + int migrate_init(MigrationState *s, Error **errp) { int ret; @@ -2454,10 +2467,7 @@ static int postcopy_start(MigrationState *ms, Error **errp) bql_lock(); trace_postcopy_start_set_run(); -migration_downtime_start(ms); - -global_state_store(); -ret = migration_stop_vm(RUN_STATE_FINISH_MIGRATE); +ret = migration_stop_vm(ms, RUN_STATE_FINISH_MIGRATE); if (ret < 0) { goto fail; } @@ -2652,15 +2662,12 @@ static int migration_completion_precopy(MigrationState *s, int ret; bql_lock(); -migration_downtime_start(s); - -s->vm_old_state = runstate_get(); -global_state_store(); -ret = migration_stop_vm(RUN_STATE_FINISH_MIGRATE); -trace_migration_completion_vm_stop(ret); -if (ret < 0) { -goto out_unlock; +if (!migrate_mode_is_cpr(s)) { +ret = migration_stop_vm(s, RUN_STATE_FINISH_MIGRATE); +if (ret < 0) { +goto out_unlock; +} } ret = migration_maybe_pause(s, current_active_state, @@ -3500,15 +3507,10 @@ static void *bg_migration_thread(void *opaque) s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start; trace_migration_thread_setup_complete(); -migration_downtime_start(s); bql_lock(); -s->vm_old_state = runstate_get(); - -global_state_store(); -/* Forcibly stop VM before saving state of vCPUs and devices */ -if (migration_stop_vm(RUN_STATE_PAUSED)) { +if (migration_stop_vm(s, RUN_STATE_PAUSED)) { goto fail; } /* @@ -3584,6 +3586,7 @@ void migrate_fd_connect(MigrationState *s, Error *error_in) Error *local_err = NULL; uint64_t rate_limit; bool resume = s->state == MIGRATION_STATUS_POSTCOPY_PAUSED; +int ret; /* * If there's a previous error, free it and prepare for another one. 
@@ -3655,6 +3658,14 @@ void migrate_fd_connect(MigrationState *s, Error *error_in) return; } +if (migrate_mode_is_cpr(s)) { +ret = migration_stop_vm(s, RUN_STATE_FINISH_MIGRATE); +if (ret < 0) { +error_setg(&local_err, "migration_stop_vm failed, error %d", -ret); +goto fail; +} +} + if (migrate_background_snapshot()) { qemu_thread_create(&s->thread, "bg_snapshot", bg_migration_thread, s, QEMU_THREAD_JOINABLE); -- 2.43.0
[PULL 23/25] migration: Fix qmp_query_migrate mbps value
From: Fabiano Rosas The QMP command query_migrate might see incorrect throughput numbers if it runs after we've set the migration completion status but before migration_calculate_complete() has updated s->total_time and s->mbps. The migration status would show COMPLETED, but the throughput value would be the one from the last iteration and not the one from the whole migration. This will usually be a larger value due to the time period being smaller (one iteration). Move migration_calculate_complete() earlier so that the status MIGRATION_STATUS_COMPLETED is only emitted after the final counters update. Keep everything under the BQL so the QMP thread sees the updates as atomic. Rename migration_calculate_complete to migration_completion_end to reflect its new purpose of also updating s->state. Signed-off-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240226143335.14282-1-faro...@suse.de Signed-off-by: Peter Xu --- migration/migration.c | 23 ++- 1 file changed, 14 insertions(+), 9 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 7652fd4d14..ccb13fa94a 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -107,6 +107,7 @@ static int migration_maybe_pause(MigrationState *s, int new_state); static void migrate_fd_cancel(MigrationState *s); static bool close_return_path_on_source(MigrationState *s); +static void migration_completion_end(MigrationState *s); static void migration_downtime_start(MigrationState *s) { @@ -2787,8 +2788,7 @@ static void migration_completion(MigrationState *s) migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE, MIGRATION_STATUS_COLO); } else { -migrate_set_state(&s->state, current_active_state, - MIGRATION_STATUS_COMPLETED); +migration_completion_end(s); } return; @@ -2825,8 +2825,7 @@ static void bg_migration_completion(MigrationState *s) goto fail; } -migrate_set_state(&s->state, current_active_state, - MIGRATION_STATUS_COMPLETED); +migration_completion_end(s); return; fail: @@ -3028,18 +3027,28 @@ static
MigThrError migration_detect_error(MigrationState *s) } } -static void migration_calculate_complete(MigrationState *s) +static void migration_completion_end(MigrationState *s) { uint64_t bytes = migration_transferred_bytes(); int64_t end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME); int64_t transfer_time; +/* + * Take the BQL here so that query-migrate on the QMP thread sees: + * - atomic update of s->total_time and s->mbps; + * - correct ordering of s->mbps update vs. s->state; + */ +bql_lock(); migration_downtime_end(s); s->total_time = end_time - s->start_time; transfer_time = s->total_time - s->setup_time; if (transfer_time) { s->mbps = ((double) bytes * 8.0) / transfer_time / 1000; } + +migrate_set_state(&s->state, s->state, + MIGRATION_STATUS_COMPLETED); +bql_unlock(); } static void update_iteration_initial_status(MigrationState *s) @@ -3186,7 +3195,6 @@ static void migration_iteration_finish(MigrationState *s) bql_lock(); switch (s->state) { case MIGRATION_STATUS_COMPLETED: -migration_calculate_complete(s); runstate_set(RUN_STATE_POSTMIGRATE); break; case MIGRATION_STATUS_COLO: @@ -3230,9 +3238,6 @@ static void bg_migration_iteration_finish(MigrationState *s) bql_lock(); switch (s->state) { case MIGRATION_STATUS_COMPLETED: -migration_calculate_complete(s); -break; - case MIGRATION_STATUS_ACTIVE: case MIGRATION_STATUS_FAILED: case MIGRATION_STATUS_CANCELLED: -- 2.43.0
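To make the throughput arithmetic in this patch concrete, here is a small worked sketch of the same formula, with bytes converted to bits and milliseconds to megabits per second. The helper name mbps_from and the sample figures are hypothetical, chosen only to show why a single-iteration value can exceed the whole-migration value. Returning 0.0 for a zero interval is an assumption of the sketch; the patch instead leaves s->mbps untouched in that case.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Same arithmetic as the patch: bits / milliseconds / 1000 = Mbit/s.
 * Hypothetical helper, not part of QEMU.
 */
static double mbps_from(uint64_t bytes, int64_t transfer_time_ms)
{
    if (transfer_time_ms == 0) {
        return 0.0; /* assumption: the patch skips the update instead */
    }
    return ((double)bytes * 8.0) / transfer_time_ms / 1000;
}

/* Whole migration: 1.25 GB over 10 s, i.e. 1000 Mbit/s. */
static double whole_migration_mbps(void)
{
    return mbps_from(1250000000ULL, 10000);
}

/* One short final iteration: 25 MB over 100 ms, i.e. 2000 Mbit/s. */
static double last_iteration_mbps(void)
{
    return mbps_from(25000000ULL, 100);
}
```

With these made-up numbers, a query that observed the COMPLETED status before the final update would report 2000 rather than 1000, which is the discrepancy the commit message describes.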
[PULL 18/25] migration: refactor migrate_fd_connect failures
From: Steve Sistare Move common code for the error path in migrate_fd_connect to a shared fail label. No functional change. Signed-off-by: Steve Sistare Reviewed-by: Peter Xu Reviewed-by: David Hildenbrand Link: https://lore.kernel.org/r/1708622920-68779-9-git-send-email-steven.sist...@oracle.com Signed-off-by: Peter Xu --- migration/migration.c | 13 - 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 925103b61a..6a115d28b8 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -3627,11 +3627,7 @@ void migrate_fd_connect(MigrationState *s, Error *error_in) if (migrate_postcopy_ram() || migrate_return_path()) { if (open_return_path_on_source(s)) { error_setg(&local_err, "Unable to open return-path for postcopy"); -migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED); -migrate_set_error(s, local_err); -error_report_err(local_err); -migrate_fd_cleanup(s); -return; +goto fail; } } @@ -3660,6 +3656,13 @@ void migrate_fd_connect(MigrationState *s, Error *error_in) migration_thread, s, QEMU_THREAD_JOINABLE); } s->migration_thread_running = true; +return; + +fail: +migrate_set_error(s, local_err); +migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED); +error_report_err(local_err); +migrate_fd_cleanup(s); } static void migration_class_init(ObjectClass *klass, void *data) -- 2.43.0
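The shared fail label used in this refactoring is a common C idiom: each failing step records its error and jumps to one place that performs the state change and cleanup. Below is a minimal standalone sketch of that shape. The struct conn type, the connect_steps function, and the second error message are hypothetical and only mirror the structure of migrate_fd_connect, not its behaviour.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum conn_state { CONN_SETUP, CONN_ACTIVE, CONN_FAILED };

/* Hypothetical stand-in for the migration state. */
struct conn {
    enum conn_state state;
    char error[64];
    int cleaned_up;
};

/* Each step that fails sets an error and jumps to the one fail label. */
static int connect_steps(struct conn *c, int open_rp_ok, int notify_ok)
{
    const char *err = NULL;

    if (!open_rp_ok) {
        err = "Unable to open return-path";
        goto fail;
    }
    if (!notify_ok) {
        err = "notifier failed";
        goto fail;
    }

    c->state = CONN_ACTIVE;
    return 0;

fail:
    /* Common error path: record the error, mark failure, clean up. */
    snprintf(c->error, sizeof(c->error), "%s", err);
    c->state = CONN_FAILED;
    c->cleaned_up = 1;
    return -1;
}
```

Adding a further failing step then costs one error assignment and one goto, instead of another copy of the cleanup sequence.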
[PULL 12/25] migration: remove error from notifier data
From: Steve Sistare Remove the error object from opaque data passed to notifiers. Use the new error parameter passed to the notifier instead. Signed-off-by: Steve Sistare Reviewed-by: Peter Xu Reviewed-by: David Hildenbrand Link: https://lore.kernel.org/r/1708622920-68779-3-git-send-email-steven.sist...@oracle.com Signed-off-by: Peter Xu --- include/migration/misc.h | 1 - migration/postcopy-ram.h | 1 - hw/virtio/vhost-user.c | 8 migration/postcopy-ram.c | 1 - migration/ram.c | 1 - 5 files changed, 4 insertions(+), 8 deletions(-) diff --git a/include/migration/misc.h b/include/migration/misc.h index 1bc8902e6d..5e65c18f1a 100644 --- a/include/migration/misc.h +++ b/include/migration/misc.h @@ -31,7 +31,6 @@ typedef enum PrecopyNotifyReason { typedef struct PrecopyNotifyData { enum PrecopyNotifyReason reason; -Error **errp; } PrecopyNotifyData; void precopy_infrastructure_init(void); diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h index 442ab89752..ecae941211 100644 --- a/migration/postcopy-ram.h +++ b/migration/postcopy-ram.h @@ -128,7 +128,6 @@ enum PostcopyNotifyReason { struct PostcopyNotifyData { enum PostcopyNotifyReason reason; -Error **errp; }; void postcopy_add_notifier(NotifierWithReturn *nn); diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index f502345f37..a1eea8547e 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -2096,20 +2096,20 @@ static int vhost_user_postcopy_notifier(NotifierWithReturn *notifier, if (!virtio_has_feature(dev->protocol_features, VHOST_USER_PROTOCOL_F_PAGEFAULT)) { /* TODO: Get the device name into this error somehow */ -error_setg(pnd->errp, +error_setg(errp, "vhost-user backend not capable of postcopy"); return -ENOENT; } break; case POSTCOPY_NOTIFY_INBOUND_ADVISE: -return vhost_user_postcopy_advise(dev, pnd->errp); +return vhost_user_postcopy_advise(dev, errp); case POSTCOPY_NOTIFY_INBOUND_LISTEN: -return vhost_user_postcopy_listen(dev, pnd->errp); +return 
vhost_user_postcopy_listen(dev, errp); case POSTCOPY_NOTIFY_INBOUND_END: -return vhost_user_postcopy_end(dev, pnd->errp); +return vhost_user_postcopy_end(dev, errp); default: /* We ignore notifications we don't know */ diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c index 3ab2f6b8fd..0273dc6a94 100644 --- a/migration/postcopy-ram.c +++ b/migration/postcopy-ram.c @@ -77,7 +77,6 @@ int postcopy_notify(enum PostcopyNotifyReason reason, Error **errp) { struct PostcopyNotifyData pnd; pnd.reason = reason; -pnd.errp = errp; return notifier_with_return_list_notify(&postcopy_notifier_list, &pnd, errp); diff --git a/migration/ram.c b/migration/ram.c index 5b6b09edd9..45a00b45ed 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -426,7 +426,6 @@ int precopy_notify(PrecopyNotifyReason reason, Error **errp) { PrecopyNotifyData pnd; pnd.reason = reason; -pnd.errp = errp; return notifier_with_return_list_notify(&precopy_notifier_list, &pnd, errp); } -- 2.43.0
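The design change in this patch, passing the error destination as an explicit callback parameter instead of tucking it into the opaque payload, can be sketched in a few lines of plain C. All names below (notify_data, notify_fn, backend_notifier, notify_all) are hypothetical, and a const char ** stands in for QEMU's Error **. The sketch shows the calling convention only, not QEMU's notifier list implementation.

```c
#include <assert.h>
#include <stddef.h>

enum reason { REASON_PROBE, REASON_END };

/* The payload carries only the event; no error pointer lives inside it. */
struct notify_data {
    enum reason reason;
};

/* The error destination is an explicit parameter of every callback. */
typedef int (*notify_fn)(void *data, const char **errp);

/* Hypothetical notifier that refuses the PROBE event. */
static int backend_notifier(void *data, const char **errp)
{
    struct notify_data *nd = data;

    if (nd->reason == REASON_PROBE) {
        if (errp) {
            *errp = "backend not capable of postcopy";
        }
        return -1;
    }
    return 0;
}

/* Calls each notifier in turn and stops at the first failure. */
static int notify_all(notify_fn *fns, size_t n, enum reason r,
                      const char **errp)
{
    struct notify_data nd = { .reason = r };

    for (size_t i = 0; i < n; i++) {
        int ret = fns[i](&nd, errp);
        if (ret) {
            return ret;
        }
    }
    return 0;
}
```

Keeping the error pointer out of the payload means the payload type describes the event alone, and every notifier receives errors through the same argument regardless of which event structure it is handed.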
[PULL 05/25] migration/multifd: Release recv sem_sync earlier
From: Fabiano Rosas Now that multifd_recv_terminate_threads() is called only once, release the recv side sem_sync earlier like we do for the send side. Signed-off-by: Fabiano Rosas Reviewed-by: Peter Xu Link: https://lore.kernel.org/r/20240220224138.24759-6-faro...@suse.de Signed-off-by: Peter Xu --- migration/multifd.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/migration/multifd.c b/migration/multifd.c index fba00b9e8f..43f0820996 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -1104,6 +1104,12 @@ static void multifd_recv_terminate_threads(Error *err) for (i = 0; i < migrate_multifd_channels(); i++) { MultiFDRecvParams *p = &multifd_recv_state->params[i]; +/* + * multifd_recv_thread may hung at MULTIFD_FLAG_SYNC handle code, + * however try to wakeup it without harm in cleanup phase. + */ +qemu_sem_post(&p->sem_sync); + /* * We could arrive here for two reasons: * - normal quit, i.e. everything went fine, just finished @@ -1162,12 +1168,6 @@ void multifd_recv_cleanup(void) for (i = 0; i < migrate_multifd_channels(); i++) { MultiFDRecvParams *p = &multifd_recv_state->params[i]; -/* - * multifd_recv_thread may hung at MULTIFD_FLAG_SYNC handle code, - * however try to wakeup it without harm in cleanup phase. - */ -qemu_sem_post(&p->sem_sync); - if (p->thread_created) { qemu_thread_join(&p->thread); } -- 2.43.0
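The split this patch settles on, waking threads that may be parked on a semaphore during the terminate step and joining them in the later cleanup step, can be shown with POSIX semaphores and threads. This is a rough sketch under the assumption of a Linux-like system with unnamed semaphores; struct worker, terminate_workers, cleanup_workers, and run_workers_demo are hypothetical names, not the multifd code.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-channel state, loosely modelled on a recv channel. */
struct worker {
    pthread_t thread;
    sem_t sem_sync;
    atomic_bool quit;
    bool thread_created;
};

/* Waits on sem_sync until it is told to quit. */
static void *worker_fn(void *opaque)
{
    struct worker *w = opaque;

    for (;;) {
        sem_wait(&w->sem_sync);
        if (atomic_load(&w->quit)) {
            break;
        }
    }
    return NULL;
}

/* Terminate step: set the quit flag and wake any thread parked on the sem. */
static void terminate_workers(struct worker *w, int n)
{
    for (int i = 0; i < n; i++) {
        atomic_store(&w[i].quit, true);
        sem_post(&w[i].sem_sync);
    }
}

/* Cleanup step: only joins; the wakeup has already happened. */
static int cleanup_workers(struct worker *w, int n)
{
    int joined = 0;

    for (int i = 0; i < n; i++) {
        if (w[i].thread_created && pthread_join(w[i].thread, NULL) == 0) {
            joined++;
        }
        sem_destroy(&w[i].sem_sync);
    }
    return joined;
}

/* Starts two workers, terminates them, and returns how many were joined. */
static int run_workers_demo(void)
{
    struct worker w[2];

    for (int i = 0; i < 2; i++) {
        sem_init(&w[i].sem_sync, 0, 0);
        atomic_init(&w[i].quit, false);
        w[i].thread_created =
            pthread_create(&w[i].thread, NULL, worker_fn, &w[i]) == 0;
    }
    terminate_workers(w, 2);
    return cleanup_workers(w, 2);
}
```

Posting in the terminate step means a thread blocked on the semaphore starts exiting as soon as termination is requested, so the later join has nothing left to wait on but the thread's own exit path.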