[PULL 2/2] MAINTAINERS: Adjust migration documentation files

2024-04-07 Thread peterx

From: Avihai Horon 

Commit 8cb2f8b172e7 ("docs/migration: Create migration/ directory")
changed migration documentation file structure but forgot to update the
entries in the MAINTAINERS file.

Commit 4c6f8a79ae53 ("docs/migration: Split 'dirty limit'") extracted the
dirty limit documentation into a new file without updating the dirty limit
section in the MAINTAINERS file.

Fix the above.

Fixes: 8cb2f8b172e7 ("docs/migration: Create migration/ directory")
Fixes: 4c6f8a79ae53 ("docs/migration: Split 'dirty limit'")
Signed-off-by: Avihai Horon 
Link: https://lore.kernel.org/r/20240407081125.13951-1-avih...@nvidia.com
Signed-off-by: Peter Xu 
---
 MAINTAINERS | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index e71183eef9..d3fc2a06e3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2170,7 +2170,7 @@ S: Supported
 F: hw/vfio/*
 F: include/hw/vfio/
 F: docs/igd-assign.txt
-F: docs/devel/vfio-migration.rst
+F: docs/devel/migration/vfio.rst
 
 vfio-ccw
 M: Eric Farman 
@@ -2231,6 +2231,7 @@ F: qapi/virtio.json
 F: net/vhost-user.c
 F: include/hw/virtio/
 F: docs/devel/virtio*
+F: docs/devel/migration/virtio.rst
 
 virtio-balloon
 M: Michael S. Tsirkin 
@@ -3422,7 +3423,7 @@ F: migration/
 F: scripts/vmstate-static-checker.py
 F: tests/vmstate-static-checker-data/
 F: tests/qtest/migration-test.c
-F: docs/devel/migration.rst
+F: docs/devel/migration/
 F: qapi/migration.json
 F: tests/migration/
 F: util/userfaultfd.c
@@ -3442,6 +3443,7 @@ F: include/sysemu/dirtylimit.h
 F: migration/dirtyrate.c
 F: migration/dirtyrate.h
 F: include/sysemu/dirtyrate.h
+F: docs/devel/migration/dirty-limit.rst
 
 Detached LUKS header
 M: Hyman Huang 
-- 
2.44.0
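
The MAINTAINERS hunks above rely on how F: entries are matched. Assuming the usual get_maintainer.pl semantics, an entry with a trailing slash such as docs/devel/migration/ covers every file in and below that directory, while a plain path covers only that one file; that is why the stale docs/devel/migration.rst entry stopped matching anything after the move. The sketch below models only those two forms. The helper name is hypothetical, and glob patterns such as docs/devel/virtio* are deliberately not handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: does a MAINTAINERS "F:" pattern cover `path`?
 * Only two forms are modelled:
 *   "dir/"    trailing slash: every file in and below dir/
 *   "a/b.rst" plain path:     exactly that file
 * Wildcards such as "docs/devel/virtio*" are not handled.
 */
static bool f_pattern_covers(const char *pattern, const char *path)
{
    size_t len;

    assert(pattern != NULL && path != NULL);
    len = strlen(pattern);

    if (len > 0 && pattern[len - 1] == '/') {
        /* Directory entry: a prefix match covers the whole subtree. */
        return strncmp(pattern, path, len) == 0;
    }
    /* Plain file entry: it must name the file exactly. */
    return strcmp(pattern, path) == 0;
}
```

Under this model, f_pattern_covers("docs/devel/migration/", "docs/devel/migration/dirty-limit.rst") is true, whereas the old entry docs/devel/vfio-migration.rst no longer matches the moved docs/devel/migration/vfio.rst.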

[PULL 0/2] Migration 20240407 patches

2024-04-07 Thread peterx

From: Peter Xu 

The following changes since commit ce64e6224affb8b4e4b019f76d2950270b391af5:

  Merge tag 'qemu-sparc-20240404' of https://github.com/mcayland/qemu into staging (2024-04-04 15:28:06 +0100)

are available in the Git repository at:

  https://gitlab.com/peterx/qemu.git tags/migration-20240407-pull-request

for you to fetch changes up to 8e0b21e375f0f6e6dbaeaecc1d52e2220f163e40:

  MAINTAINERS: Adjust migration documentation files (2024-04-07 14:40:55 -0400)


Migration pull for 9.0-rc3

- Wei/Lei's fix for a rare postcopy race that can hang the channel (since 8.0)
- Avihai's fix to the MAINTAINERS file so that it points to the right doc paths



Avihai Horon (1):
  MAINTAINERS: Adjust migration documentation files

Wei Wang (1):
  migration/postcopy: ensure preempt channel is ready before loading states

 MAINTAINERS        |  6 ++++--
 migration/savevm.c | 21 +++++++++++++++++++++
 2 files changed, 25 insertions(+), 2 deletions(-)

-- 
2.44.0

[PULL 1/2] migration/postcopy: ensure preempt channel is ready before loading states

2024-04-07 Thread peterx

From: Wei Wang 

Before loading the guest states, ensure that the preempt channel is
ready to use, as some of the states (e.g. via virtio_load) might trigger
page faults that will be handled through the preempt channel. So yield to
the main thread in case the channel-creation event hasn't been
dispatched yet.

Cc: qemu-stable 
Fixes: 9358982744 ("migration: Send requested page directly in rp-return thread")
Originally-by: Lei Wang 
Link: https://lore.kernel.org/all/9aa5d1be-7801-40dd-83fd-f7e041ced...@intel.com/T/
Signed-off-by: Lei Wang 
Signed-off-by: Wei Wang 
Link: https://lore.kernel.org/r/20240405034056.23933-1-wei.w.w...@intel.com
[peterx: add a todo section, add Fixes and copy stable for 8.0+]
Signed-off-by: Peter Xu 
---
 migration/savevm.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/migration/savevm.c b/migration/savevm.c
index 388d7af7cd..e7c1215671 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2342,6 +2342,27 @@ static int loadvm_handle_cmd_packaged(MigrationIncomingState *mis)
 
 QEMUFile *packf = qemu_file_new_input(QIO_CHANNEL(bioc));
 
+/*
+ * Before loading the guest states, ensure that the preempt channel is
+ * ready to use, as some of the states (e.g. via virtio_load) might
+ * trigger page faults that will be handled through the preempt channel.
+ * So yield to the main thread in the case that the channel create event
+ * hasn't been dispatched.
+ *
+ * TODO: if we can move migration loadvm out of main thread, then we
+ * won't block main thread from polling the accept() fds.  We can drop
+ * this as a whole when that is done.
+ */
+do {
+if (!migrate_postcopy_preempt() || !qemu_in_coroutine() ||
+mis->postcopy_qemufile_dst) {
+break;
+}
+
+aio_co_schedule(qemu_get_current_aio_context(), qemu_coroutine_self());
+qemu_coroutine_yield();
+} while (1);
+
 ret = qemu_loadvm_state_main(packf, mis);
 trace_loadvm_handle_cmd_packaged_main(ret);
 qemu_fclose(packf);
-- 
2.44.0
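
The loop added to loadvm_handle_cmd_packaged() is a wait-until-ready pattern: while postcopy_qemufile_dst is still NULL, the loading coroutine reschedules itself and yields, so the main loop gets a chance to run the accept() handler that creates the preempt channel. The single-threaded model below stands in for that with a countdown. The struct, its fields and yield_to_main_loop() are hypothetical; in the patch the yield is aio_co_schedule() followed by qemu_coroutine_yield().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant MigrationIncomingState bits. */
struct incoming_model {
    bool postcopy_preempt;    /* migrate_postcopy_preempt() */
    bool in_coroutine;        /* qemu_in_coroutine() */
    void *preempt_file;       /* mis->postcopy_qemufile_dst */
    int ticks_until_accept;   /* main-loop iterations before accept() runs */
};

/*
 * Stand-in for yielding to the main loop: one iteration runs, and once
 * the countdown expires the "channel created" event has been dispatched.
 */
static void yield_to_main_loop(struct incoming_model *mis)
{
    static int dummy_file;

    if (--mis->ticks_until_accept <= 0) {
        mis->preempt_file = &dummy_file;
    }
}

/* Same exit conditions as the do/while loop in the patch. */
static int wait_for_preempt_channel(struct incoming_model *mis)
{
    int yields = 0;

    assert(mis != NULL);
    while (mis->postcopy_preempt && mis->in_coroutine && !mis->preempt_file) {
        yield_to_main_loop(mis);
        yields++;
    }
    return yields;
}
```

As in the patch, nothing waits when preempt mode is off or when the code is not running in a coroutine; the TODO in the comment notes that moving loadvm off the main thread would make the whole loop unnecessary.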

[PULL 0/2] Migration 20240331 patches

2024-03-31 Thread peterx

From: Peter Xu 

The following changes since commit b9dbf6f9bf533564f6a4277d03906fcd32bb0245:

  Merge tag 'pull-tcg-20240329' of https://gitlab.com/rth7680/qemu into staging (2024-03-30 14:54:57 +0000)

are available in the Git repository at:

  https://gitlab.com/peterx/qemu.git tags/migration-20240331-pull-request

for you to fetch changes up to d0ad271a7613459bd0a3397c8071a4ad06f3f7eb:

  migration/postcopy: Ensure postcopy_start() sets errp if it fails (2024-03-31 14:30:03 -0400)


Migration pull for 9.0-rc2

- Avihai's two fixes for error paths



Avihai Horon (2):
  migration: Set migration error in migration_completion()
  migration/postcopy: Ensure postcopy_start() sets errp if it fails

 migration/migration.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

-- 
2.44.0

[PULL 2/2] migration/postcopy: Ensure postcopy_start() sets errp if it fails

2024-03-31 Thread peterx

From: Avihai Horon 

There are several places where postcopy_start() fails without setting
errp. This can cause a NULL pointer dereference, because on error the
caller of postcopy_start() copies/prints the error stored in errp.

Fix it by setting errp in all of postcopy_start()'s error paths.

Cc: qemu-stable 
Fixes: 908927db28ea ("migration: Update error description whenever migration fails")
Signed-off-by: Avihai Horon 
Reviewed-by: Cédric Le Goater 
Reviewed-by: Peter Xu 
Link: https://lore.kernel.org/r/20240328140252.16756-3-avih...@nvidia.com
Signed-off-by: Peter Xu 
---
 migration/migration.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index b73ae3a72c..86bf76e925 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2510,6 +2510,8 @@ static int postcopy_start(MigrationState *ms, Error **errp)
 migration_wait_main_channel(ms);
 if (postcopy_preempt_establish_channel(ms)) {
 migrate_set_state(&ms->state, ms->state, MIGRATION_STATUS_FAILED);
+error_setg(errp, "%s: Failed to establish preempt channel",
+   __func__);
 return -1;
 }
 }
@@ -2525,17 +2527,22 @@ static int postcopy_start(MigrationState *ms, Error **errp)
 
 ret = migration_stop_vm(ms, RUN_STATE_FINISH_MIGRATE);
 if (ret < 0) {
+error_setg_errno(errp, -ret, "%s: Failed to stop the VM", __func__);
 goto fail;
 }
 
 ret = migration_maybe_pause(ms, &cur_state,
 MIGRATION_STATUS_POSTCOPY_ACTIVE);
 if (ret < 0) {
+error_setg_errno(errp, -ret, "%s: Failed in migration_maybe_pause()",
+ __func__);
 goto fail;
 }
 
 ret = bdrv_inactivate_all();
 if (ret < 0) {
+error_setg_errno(errp, -ret, "%s: Failed in bdrv_inactivate_all()",
+ __func__);
 goto fail;
 }
 restart_block = true;
@@ -2612,6 +2619,7 @@ static int postcopy_start(MigrationState *ms, Error **errp)
 
 /* Now send that blob */
 if (qemu_savevm_send_packaged(ms->to_dst_file, bioc->data, bioc->usage)) {
+error_setg(errp, "%s: Failed to send packaged data", __func__);
 goto fail_closefb;
 }
 qemu_fclose(fb);
-- 
2.44.0
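
The bug fixed here is a violation of the Error **errp convention: a function that reports failure through its return value must also fill in *errp, because the caller uses the error object unconditionally once it sees the failure. The reduced model below illustrates that contract. Error, set_error() and start_step() are hypothetical stand-ins, not the qapi/error.h API; set_error() mirrors QEMU's rule that an already-set error must not be overwritten.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, much-reduced stand-in for QEMU's Error object. */
typedef struct Error {
    char msg[128];
} Error;

/* Simplified error_setg(): allocate an Error and hand it to *errp. */
static void set_error(Error **errp, const char *func, const char *what)
{
    Error *err;

    if (!errp) {
        return;                 /* caller does not want the details */
    }
    assert(*errp == NULL);      /* never overwrite an existing error */
    err = malloc(sizeof(*err));
    if (!err) {
        abort();
    }
    snprintf(err->msg, sizeof(err->msg), "%s: %s", func, what);
    *errp = err;
}

/* postcopy_start()-style routine: every "return -1" also sets *errp. */
static int start_step(int fail_at, Error **errp)
{
    if (fail_at == 1) {
        set_error(errp, __func__, "Failed to stop the VM");
        return -1;
    }
    if (fail_at == 2) {
        set_error(errp, __func__, "Failed to send packaged data");
        return -1;
    }
    return 0;
}
```

A caller written as `if (start_step(n, &err) < 0) { puts(err->msg); }` is only safe because each failing branch sets the error; one missing set_error() call reproduces the NULL dereference described above.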

[PULL 1/2] migration: Set migration error in migration_completion()

2024-03-31 Thread peterx

From: Avihai Horon 

After commit 9425ef3f990a ("migration: Use migrate_has_error() in
close_return_path_on_source()"), close_return_path_on_source() assumes
that the migration error is set if an error occurs during migration.

This may not be true if the error occurs in migration_completion(). For
example, if qemu_savevm_state_complete_precopy() fails, the migration
error will not be set.

This, in turn, will cause a migration hang, similar to the bug that
was fixed by commit 22b04245f0d5 ("migration: Join the return path
thread before releasing to_dst_file"), as shutdown() will not be issued
for the return-path channel.

Fix it by ensuring the migration error is set whenever
migration_completion() fails.

Signed-off-by: Avihai Horon 
Reviewed-by: Peter Xu 
Fixes: 9425ef3f990a ("migration: Use migrate_has_error() in close_return_path_on_source()")
Acked-by: Cédric Le Goater 
Link: https://lore.kernel.org/r/20240328140252.16756-2-avih...@nvidia.com
Signed-off-by: Peter Xu 
---
 migration/migration.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 9fe8fd2afd..b73ae3a72c 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2799,6 +2799,7 @@ static void migration_completion(MigrationState *s)
 {
 int ret = 0;
 int current_active_state = s->state;
+Error *local_err = NULL;
 
 if (s->state == MIGRATION_STATUS_ACTIVE) {
 ret = migration_completion_precopy(s, &current_active_state);
@@ -2832,6 +2833,15 @@ static void migration_completion(MigrationState *s)
 return;
 
 fail:
+if (qemu_file_get_error_obj(s->to_dst_file, &local_err)) {
+migrate_set_error(s, local_err);
+error_free(local_err);
+} else if (ret) {
+error_setg_errno(&local_err, -ret, "Error in migration completion");
+migrate_set_error(s, local_err);
+error_free(local_err);
+}
+
 migration_completion_failed(s, current_active_state);
 }
 
-- 
2.44.0
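
The new fail: path in migration_completion() picks its error in a fixed order: an error already latched on the outgoing QEMUFile wins, otherwise one is synthesized from the negative errno in ret, and nothing is recorded when neither is available. A minimal model of that decision, assuming errors are plain strings; the struct and function names are hypothetical, and the real code records the error through migrate_set_error().

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins; an empty string means "no error". */
struct completion_model {
    char file_error[96];   /* what qemu_file_get_error_obj() would return */
    char mig_error[96];    /* what migrate_set_error() would record */
};

/*
 * Models the fail: path: prefer the stream's own error, otherwise build
 * one from -ret. (The real migrate_set_error() also keeps only the first
 * error ever set; this model ignores that.)
 */
static void record_completion_error(struct completion_model *s, int ret)
{
    assert(s != NULL);
    if (s->file_error[0] != '\0') {
        snprintf(s->mig_error, sizeof(s->mig_error), "%s", s->file_error);
    } else if (ret) {
        snprintf(s->mig_error, sizeof(s->mig_error),
                 "Error in migration completion: %s", strerror(-ret));
    }
}
```

With the error recorded on every failure, close_return_path_on_source() sees migrate_has_error() return true and shuts down the return-path channel instead of hanging.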

[PULL 2/3] migration/postcopy: Fix high frequency sync

2024-03-22 Thread peterx

From: Peter Xu 

With the current code base I can observe an extremely high sync count
during precopy, as long as postcopy-ram=on is enabled before switching
over to postcopy.

To provide some context on when QEMU decides to do a full sync: it checks
must_precopy (which means "data that must be sent during the precopy
phase"), and as long as it is lower than the threshold size calculated
from the bandwidth and expected downtime, QEMU kicks off the slow/exact
sync.

However, when postcopy is enabled (even while still in the precopy phase),
RAM reports all of its pages as can_postcopy and reports must_precopy==0.
Then "must_precopy <= threshold_size" almost always holds, which forces a
slow sync on every call to migration_iteration_run() whenever postcopy is
enabled, even if it is never used.  That is insane.

It turns out this is a regression introduced by a refactoring in 8.0, as
reported by Nina [1]:

  (a) c8df4a7aef ("migration: Split save_live_pending() into state_pending_*")

A workaround patch was then applied at the end of that release cycle
(8.0-rc4) to fix it:

  (b) 28ef5339c3 ("migration: fix ram_state_pending_exact()")

However, that workaround was overlooked during a cleanup in this 9.0
release, in this commit:

  (c) b0504edd40 ("migration: Drop unnecessary check in ram's pending_exact()")

The issue was thus re-exposed, as reported by Nina [1].

The problem with (b) is that it only fixed the RAM case rather than all
the other iterators.  A slow sync should only be required if all dirty
data (precopy+postcopy) is less than the threshold_size that QEMU
calculated.  It is even debatable whether a sync is needed at all after
switching to postcopy.  Currently ram_state_pending_exact() is mostly a
no-op once switched to postcopy, and the same logic seems to apply to all
the other iterators, as syncing the dirty bitmap during postcopy doesn't
make much sense.  However, let's leave such a change for later, as we're
in the rc phase.

So rather than reapplying commit (b), this patch provides the complete
fix for all iterators.  While at it, clean up the surrounding lines a
little.

[1] https://gitlab.com/qemu-project/qemu/-/issues/1565

Reported-by: Nina Schoetterl-Glausch 
Fixes: b0504edd40 ("migration: Drop unnecessary check in ram's pending_exact()")
Reviewed-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240320214453.584374-1-pet...@redhat.com
Signed-off-by: Peter Xu 
---
 migration/migration.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 047b6b49cf..9fe8fd2afd 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3199,17 +3199,16 @@ typedef enum {
  */
 static MigIterateState migration_iteration_run(MigrationState *s)
 {
-uint64_t must_precopy, can_postcopy;
+uint64_t must_precopy, can_postcopy, pending_size;
 Error *local_err = NULL;
 bool in_postcopy = s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE;
 bool can_switchover = migration_can_switchover(s);
 
 qemu_savevm_state_pending_estimate(&must_precopy, &can_postcopy);
-uint64_t pending_size = must_precopy + can_postcopy;
-
+pending_size = must_precopy + can_postcopy;
 trace_migrate_pending_estimate(pending_size, must_precopy, can_postcopy);
 
-if (must_precopy <= s->threshold_size) {
+if (pending_size < s->threshold_size) {
 qemu_savevm_state_pending_exact(&must_precopy, &can_postcopy);
 pending_size = must_precopy + can_postcopy;
 trace_migrate_pending_exact(pending_size, must_precopy, can_postcopy);
-- 
2.44.0
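
To see why the old check fired on every iteration once postcopy-ram=on was set, it helps to evaluate both conditions with RAM reporting everything as can_postcopy. The two predicates below are hypothetical, side-effect-free restatements of the if conditions in migration_iteration_run() before and after this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Before the fix: only data that must be sent during precopy counts. */
static bool needs_exact_sync_old(uint64_t must_precopy, uint64_t can_postcopy,
                                 uint64_t threshold_size)
{
    (void)can_postcopy;   /* ignored, which is exactly the bug */
    return must_precopy <= threshold_size;
}

/* After the fix: all pending data (precopy + postcopy) counts. */
static bool needs_exact_sync_new(uint64_t must_precopy, uint64_t can_postcopy,
                                 uint64_t threshold_size)
{
    uint64_t pending_size = must_precopy + can_postcopy;

    return pending_size < threshold_size;
}
```

With illustrative numbers must_precopy = 0, can_postcopy = 4 GiB and threshold_size = 100 MiB, the old predicate is true on every call, while the new one stays false until the total pending size drops below the threshold.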

[PULL 3/3] migration/multifd: Fix clearing of mapped-ram zero pages

2024-03-22 Thread peterx

From: Fabiano Rosas 

When the zero page detection is done in the multifd threads, we need
to iterate over the second part of the pages->offset array and clear the
file bitmap bit for each zero page. The piece of code we merged to do
that is wrong: the loop started at pages->num and stopped at
num - normal_num, which is never larger than num, so it never cleared
anything.

This passed all the tests because the bitmap is already initialized
with zeroes, so clearing the bits only really has an effect during live
migration and when a page goes from having data to being a zero page.

Fixes: 303e6f54f9 ("migration/multifd: Implement zero page transmission on the multifd thread.")
Signed-off-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240321201242.6009-1-faro...@suse.de
Signed-off-by: Peter Xu 
---
 migration/multifd.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index d2f0238f70..2802afe79d 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -111,7 +111,6 @@ void multifd_send_channel_created(void)
 static void multifd_set_file_bitmap(MultiFDSendParams *p)
 {
 MultiFDPages_t *pages = p->pages;
-uint32_t zero_num = p->pages->num - p->pages->normal_num;
 
 assert(pages->block);
 
@@ -119,7 +118,7 @@ static void multifd_set_file_bitmap(MultiFDSendParams *p)
 ramblock_set_file_bmap_atomic(pages->block, pages->offset[i], true);
 }
 
-for (int i = p->pages->num; i < zero_num; i++) {
+for (int i = p->pages->normal_num; i < p->pages->num; i++) {
 ramblock_set_file_bmap_atomic(pages->block, pages->offset[i], false);
 }
 }
-- 
2.44.0
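
The offset array is split in two: entries [0, normal_num) are pages with data and entries [normal_num, num) are zero pages. The model below sets and clears bits over exactly those ranges, and also counts how many iterations the old zero-page loop would have run. The struct and function names are hypothetical, offsets are plain page indexes rather than byte offsets, and ramblock_set_file_bmap_atomic() is replaced by a bool array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGES 8

/* Hypothetical reduced MultiFDPages_t. */
struct pages_model {
    uint32_t num;                  /* pages in this batch */
    uint32_t normal_num;           /* offset[0..normal_num) hold data */
    uint32_t offset[MODEL_PAGES];  /* offset[normal_num..num) are zero pages */
};

/* Fixed bounds; bmap must have room for every index stored in offset[]. */
static void set_file_bitmap(const struct pages_model *p, bool bmap[])
{
    assert(p->normal_num <= p->num && p->num <= MODEL_PAGES);
    for (uint32_t i = 0; i < p->normal_num; i++) {
        bmap[p->offset[i]] = true;     /* page has data in the file */
    }
    for (uint32_t i = p->normal_num; i < p->num; i++) {
        bmap[p->offset[i]] = false;    /* page is now a zero page */
    }
}

/* Old loop bounds: i starts at num and must stay below num - normal_num. */
static uint32_t old_zero_loop_iterations(const struct pages_model *p)
{
    uint32_t zero_num = p->num - p->normal_num;
    uint32_t n = 0;

    for (uint32_t i = p->num; i < zero_num; i++) {
        n++;
    }
    return n;
}
```

Because zero_num can never exceed num, old_zero_loop_iterations() always returns 0, so a page that previously held data kept its bit set after turning into a zero page.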

[PULL 0/3] Migration 20240322 patches

2024-03-22 Thread peterx

From: Peter Xu 

The following changes since commit 853546f8128476eefb701d4a55b2781bb3a46faa:

  Merge tag 'pull-loongarch-20240322' of https://gitlab.com/gaosong/qemu into staging (2024-03-22 10:59:57 +0000)

are available in the Git repository at:

  https://gitlab.com/peterx/qemu.git tags/migration-20240322-pull-request

for you to fetch changes up to 8fa1a21c6edc2bf7de85984944848ab9ac49e937:

  migration/multifd: Fix clearing of mapped-ram zero pages (2024-03-22 12:12:08 -0400)


Migration pull for 9.0-rc1

- Fabiano's patch to revert mapped-ram multifd support for the fd: URI
- Peter's fix for a postcopy regression causing unnecessary dirty syncs
- Fabiano's fix for a rare mapped-ram corruption in zero page handling



Fabiano Rosas (2):
  migration: Revert mapped-ram multifd support to fd: URI
  migration/multifd: Fix clearing of mapped-ram zero pages

Peter Xu (1):
  migration/postcopy: Fix high frequency sync

 migration/fd.h   |  2 --
 migration/fd.c   | 56 
 migration/file.c | 19 ++--
 migration/migration.c| 20 ++---
 migration/multifd.c  |  5 +---
 tests/qtest/migration-test.c | 43 ---
 6 files changed, 12 insertions(+), 133 deletions(-)

-- 
2.44.0

[PULL 1/3] migration: Revert mapped-ram multifd support to fd: URI

2024-03-22 Thread peterx

From: Fabiano Rosas 

This reverts commit decdc76772c453ff1444612e910caa0d45cd8eac in full
and also the relevant migration-tests from
7a09f092834641b7a793d50a3a261073bbb404a6.

After the addition of the new QAPI-based migration address API in 8.2
we've been converting an "fd:" URI into a SocketAddress, missing the
fact that the "fd:" syntax could also be used for a plain file instead
of a socket. This is a problem because the SocketAddress is part of
the API, so we're effectively asking users to create a "socket"
channel to pass in a plain file.

The easiest way to fix this situation is to deprecate the usage of
both SocketAddress and "fd:" when used with a plain file for
migration. Since this has been possible since 8.2, we can wait until
9.1 to deprecate it.

For 9.0, however, we should avoid adding further support to migration
to a plain file using the old "fd:" syntax or the new SocketAddress
API, and instead require the usage of either the old-style "file:" URI
or the FileMigrationArgs::filename field of the new API with the
"/dev/fdset/NN" syntax, both of which are already supported.

Signed-off-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240319210941.1907-1-faro...@suse.de
Signed-off-by: Peter Xu 
---
 migration/fd.h   |  2 --
 migration/fd.c   | 56 
 migration/file.c | 19 ++--
 migration/migration.c| 13 -
 migration/multifd.c  |  2 --
 tests/qtest/migration-test.c | 43 ---
 6 files changed, 8 insertions(+), 127 deletions(-)

diff --git a/migration/fd.h b/migration/fd.h
index 0c0a18d9e7..b901bc014e 100644
--- a/migration/fd.h
+++ b/migration/fd.h
@@ -20,6 +20,4 @@ void fd_start_incoming_migration(const char *fdname, Error **errp);
 
 void fd_start_outgoing_migration(MigrationState *s, const char *fdname,
  Error **errp);
-void fd_cleanup_outgoing_migration(void);
-int fd_args_get_fd(void);
 #endif
diff --git a/migration/fd.c b/migration/fd.c
index fe0d096abd..449adaa2de 100644
--- a/migration/fd.c
+++ b/migration/fd.c
@@ -15,42 +15,19 @@
  */
 
 #include "qemu/osdep.h"
-#include "qapi/error.h"
 #include "channel.h"
 #include "fd.h"
 #include "file.h"
 #include "migration.h"
 #include "monitor/monitor.h"
-#include "io/channel-file.h"
-#include "io/channel-socket.h"
 #include "io/channel-util.h"
-#include "options.h"
 #include "trace.h"
 
 
-static struct FdOutgoingArgs {
-int fd;
-} outgoing_args;
-
-int fd_args_get_fd(void)
-{
-return outgoing_args.fd;
-}
-
-void fd_cleanup_outgoing_migration(void)
-{
-if (outgoing_args.fd > 0) {
-close(outgoing_args.fd);
-outgoing_args.fd = -1;
-}
-}
-
 void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **errp)
 {
 QIOChannel *ioc;
 int fd = monitor_get_fd(monitor_cur(), fdname, errp);
-int newfd;
-
 if (fd == -1) {
 return;
 }
@@ -62,18 +39,6 @@ void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **
 return;
 }
 
-/*
- * This is dup()ed just to avoid referencing an fd that might
- * be already closed by the iochannel.
- */
-newfd = dup(fd);
-if (newfd == -1) {
-error_setg_errno(errp, errno, "Could not dup FD %d", fd);
-object_unref(ioc);
-return;
-}
-outgoing_args.fd = newfd;
-
 qio_channel_set_name(ioc, "migration-fd-outgoing");
 migration_channel_connect(s, ioc, NULL, NULL);
 object_unref(OBJECT(ioc));
@@ -104,20 +69,9 @@ void fd_start_incoming_migration(const char *fdname, Error **errp)
 return;
 }
 
-if (migrate_multifd()) {
-if (fd_is_socket(fd)) {
-error_setg(errp,
-   "Multifd migration to a socket FD is not supported");
-object_unref(ioc);
-return;
-}
-
-file_create_incoming_channels(ioc, errp);
-} else {
-qio_channel_set_name(ioc, "migration-fd-incoming");
-qio_channel_add_watch_full(ioc, G_IO_IN,
-   fd_accept_incoming_migration,
-   NULL, NULL,
-   g_main_context_get_thread_default());
-}
+qio_channel_set_name(ioc, "migration-fd-incoming");
+qio_channel_add_watch_full(ioc, G_IO_IN,
+   fd_accept_incoming_migration,
+   NULL, NULL,
+   g_main_context_get_thread_default());
 }
diff --git a/migration/file.c b/migration/file.c
index b6e8ba13f2..ab18ba505a 100644
--- a/migration/file.c
+++ b/migration/file.c
@@ -11,7 +11,6 @@
 #include "qemu/error-report.h"
 #include "qapi/error.h"
 #include "channel.h"
-#include "fd.h"
 #include "file.h"
 #include "migration.h"
 #include "io/channel-file.h"
@@ -55,27 +54,15 @@ bool file_send_channel_create(gpointer opaque, Error **errp)
 {
 QIOChannelFile
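
The revert hinges on the fact that an "fd:" descriptor can refer to either a socket or a plain file, which cannot be told from the URI alone. One way to find out at runtime is fstat() with S_ISSOCK()/S_ISREG(), sketched below. This is an illustrative assumption, not necessarily how QEMU's fd_is_socket() is implemented, and the enum and function names are hypothetical.

```c
#define _XOPEN_SOURCE 700   /* exposes S_ISSOCK() and fileno() */
#include <assert.h>
#include <stdio.h>          /* tmpfile(), fileno(): regular-file fds */
#include <sys/socket.h>     /* socketpair(): socket fds */
#include <sys/stat.h>
#include <unistd.h>         /* pipe(), close() */

/* What an inherited "fd:" descriptor actually refers to. */
enum fd_kind {
    FD_KIND_INVALID,   /* not an open descriptor */
    FD_KIND_SOCKET,
    FD_KIND_REGULAR,   /* plain file, as used for mapped-ram */
    FD_KIND_OTHER,     /* pipes, character devices, ... */
};

static enum fd_kind classify_fd(int fd)
{
    struct stat st;

    if (fstat(fd, &st) < 0) {
        return FD_KIND_INVALID;
    }
    if (S_ISSOCK(st.st_mode)) {
        return FD_KIND_SOCKET;
    }
    if (S_ISREG(st.st_mode)) {
        return FD_KIND_REGULAR;
    }
    return FD_KIND_OTHER;
}
```

A descriptor from tmpfile() classifies as FD_KIND_REGULAR, one end of a socketpair() as FD_KIND_SOCKET, and -1 as FD_KIND_INVALID. The commit message instead steers plain-file users towards "file:" URIs or "/dev/fdset/NN", where no such guessing is needed.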

[PATCH] migration/postcopy: Fix high frequency sync

2024-03-20 Thread peterx

From: Peter Xu 

With the current code base I can observe an extremely high sync count
during precopy, as long as postcopy-ram=on is enabled before switching
over to postcopy.

To provide some context on when we decide to do a full sync: we check
must_precopy (which means "data that must be sent during the precopy
phase"), and as long as it is lower than the threshold size we calculated
(from the bandwidth and expected downtime) we kick off the slow sync.

However, when postcopy is enabled (even while still in the precopy phase),
RAM reports all of its pages as can_postcopy and reports must_precopy==0.
Then "must_precopy <= threshold_size" almost always holds, which forces a
slow sync on every call to migration_iteration_run() whenever postcopy is
enabled, even if it is never used.  That is insane.

It turns out this is a regression introduced by a refactoring in QEMU 8.0
in late 2022. Fix this by checking the total pending size
(must_precopy + can_postcopy) rather than must_precopy alone, as before.
Not copying stable yet as many things have changed, and even though this
should be a major performance regression, no functional change has been
observed (which is probably also why nobody found it).  I only noticed
this while looking into another bug reported by Nina.

While at it, clean up the surrounding lines a little.

Cc: Nina Schoetterl-Glausch 
Fixes: c8df4a7aef ("migration: Split save_live_pending() into state_pending_*")
Signed-off-by: Peter Xu 
---

Nina: I copied you only because this might still be relevant, as this issue
also mysteriously points back to c8df4a7aef..  However, I don't think it
fixes your problem; at most it can change the likelihood of reproducing it.

This is not a regression in this release, but I still want to have it for
9.0.  Fabiano, any opinions / objections?
---
 migration/migration.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 047b6b49cf..9fe8fd2afd 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3199,17 +3199,16 @@ typedef enum {
  */
 static MigIterateState migration_iteration_run(MigrationState *s)
 {
-uint64_t must_precopy, can_postcopy;
+uint64_t must_precopy, can_postcopy, pending_size;
 Error *local_err = NULL;
 bool in_postcopy = s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE;
 bool can_switchover = migration_can_switchover(s);
 
 qemu_savevm_state_pending_estimate(&must_precopy, &can_postcopy);
-uint64_t pending_size = must_precopy + can_postcopy;
-
+pending_size = must_precopy + can_postcopy;
 trace_migrate_pending_estimate(pending_size, must_precopy, can_postcopy);
 
-if (must_precopy <= s->threshold_size) {
+if (pending_size < s->threshold_size) {
 qemu_savevm_state_pending_exact(&must_precopy, &can_postcopy);
 pending_size = must_precopy + can_postcopy;
 trace_migrate_pending_exact(pending_size, must_precopy, can_postcopy);
-- 
2.44.0

[PULL 05/10] physmem: Fix migration dirty bitmap coherency with TCG memory access

2024-03-17 Thread peterx

From: Nicholas Piggin 

The fastpath in cpu_physical_memory_sync_dirty_bitmap() to test large
aligned ranges forgot to bring the TCG TLB up to date after clearing
some of the dirty memory bitmap bits. This can result in stores through
the TCG TLB not setting the dirty memory bitmap, ultimately causing
memory corruption / lost updates during migration from a TCG host.

Fix this by calling cpu_physical_memory_dirty_bits_cleared() when
dirty bits have been cleared.

Fixes: aa8dc044772 ("migration: synchronize memory bitmap 64bits at a time")
Signed-off-by: Nicholas Piggin 
Tested-by: Thomas Huth 
Message-ID: <20240219061731.232570-1-npig...@gmail.com>
[PMD: Split patch in 2: part 2/2, slightly adapt description]
Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Richard Henderson 
Link: https://lore.kernel.org/r/20240312201458.79532-4-phi...@linaro.org
Signed-off-by: Peter Xu 
---
 include/exec/ram_addr.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index b060ea9176..de45ba7bc9 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -513,6 +513,9 @@ uint64_t cpu_physical_memory_sync_dirty_bitmap(RAMBlock *rb,
 idx++;
 }
 }
+if (num_dirty) {
+cpu_physical_memory_dirty_bits_cleared(start, length);
+}
 
 if (rb->clear_bmap) {
 /*
-- 
2.44.0
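
Why clearing bitmap bits is not enough under TCG: once a page has been dirtied, its TLB entry lets later stores take a fast path that no longer touches the dirty bitmap, so the TLB has to be reset whenever bits are cleared. The toy model below reproduces the lost update and its fix. The structure, fields and functions are hypothetical simplifications of TCG's dirty tracking, and unlike the real fix, which calls cpu_physical_memory_dirty_bits_cleared() for the synced range only, it resets the whole toy TLB.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PAGES 4

/* Hypothetical toy model: one RAM block of TOY_PAGES pages. */
struct toy_vm {
    bool dirty[TOY_PAGES];           /* migration dirty bitmap */
    bool tlb_fast_write[TOY_PAGES];  /* TLB lets stores skip dirty tracking */
};

/* A guest store as TCG would perform it. */
static void guest_store(struct toy_vm *vm, int page)
{
    assert(page >= 0 && page < TOY_PAGES);
    if (!vm->tlb_fast_write[page]) {
        /* Slow path: mark the page dirty, then enable the fast path. */
        vm->dirty[page] = true;
        vm->tlb_fast_write[page] = true;
    }
    /* Fast path: the data is written without touching the bitmap. */
}

/* Stand-in for tlb_reset_dirty_range_all(): force the slow path again. */
static void reset_tlb_dirty(struct toy_vm *vm, int start, int len)
{
    for (int i = start; i < start + len; i++) {
        vm->tlb_fast_write[i] = false;
    }
}

/* Migration sync: collect and clear dirty bits, optionally fixing the TLB. */
static int sync_dirty_bitmap(struct toy_vm *vm, bool reset_tlb)
{
    int num_dirty = 0;

    for (int i = 0; i < TOY_PAGES; i++) {
        if (vm->dirty[i]) {
            vm->dirty[i] = false;
            num_dirty++;
        }
    }
    if (reset_tlb && num_dirty) {
        reset_tlb_dirty(vm, 0, TOY_PAGES);
    }
    return num_dirty;
}
```

Without the reset, a second store to an already-synced page goes through the fast path and the next sync reports zero dirty pages, which is the lost update the patch fixes.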

[PULL 10/10] migration/multifd: Duplicate the fd for the outgoing_args

2024-03-17 Thread peterx

From: Fabiano Rosas 

We currently store the file descriptor used during the main outgoing
channel creation to use it again when creating the multifd
channels.

Since this fd is used for the first iochannel, there's a risk that the
QIOChannel gets freed and the fd closed while outgoing_args.fd still
holds it. This could lead to an fd-reuse bug.

Duplicate the outgoing_args fd to avoid this issue.

Suggested-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
Reviewed-by: Peter Xu 
Link: https://lore.kernel.org/r/20240315032040.7974-3-faro...@suse.de
Signed-off-by: Peter Xu 
---
 migration/fd.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/migration/fd.c b/migration/fd.c
index c07030f715..fe0d096abd 100644
--- a/migration/fd.c
+++ b/migration/fd.c
@@ -49,8 +49,7 @@ void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **
 {
 QIOChannel *ioc;
 int fd = monitor_get_fd(monitor_cur(), fdname, errp);
-
-outgoing_args.fd = -1;
+int newfd;
 
 if (fd == -1) {
 return;
@@ -63,7 +62,17 @@ void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **
 return;
 }
 
-outgoing_args.fd = fd;
+/*
+ * This is dup()ed just to avoid referencing an fd that might
+ * be already closed by the iochannel.
+ */
+newfd = dup(fd);
+if (newfd == -1) {
+error_setg_errno(errp, errno, "Could not dup FD %d", fd);
+object_unref(ioc);
+return;
+}
+outgoing_args.fd = newfd;
 
 qio_channel_set_name(ioc, "migration-fd-outgoing");
 migration_channel_connect(s, ioc, NULL, NULL);
-- 
2.44.0
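
The dup() gives outgoing_args its own descriptor for the same open file, so closing the channel's descriptor later does not invalidate it, and the saved number cannot silently end up pointing at an unrelated file that reuses the same fd value. A minimal POSIX sketch of that pattern; saved_fd, save_outgoing_fd() and cleanup_outgoing_fd() are hypothetical names loosely modelled on outgoing_args and fd_cleanup_outgoing_migration().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical mirror of outgoing_args.fd: a private duplicate. */
static int saved_fd = -1;

/* Returns 0 on success or -errno if dup() fails. */
static int save_outgoing_fd(int fd)
{
    int newfd;

    assert(saved_fd < 0);   /* only one saved descriptor at a time */
    newfd = dup(fd);
    if (newfd < 0) {
        return -errno;
    }
    saved_fd = newfd;       /* unaffected by whoever later closes fd */
    return 0;
}

static void cleanup_outgoing_fd(void)
{
    if (saved_fd >= 0) {
        close(saved_fd);
        saved_fd = -1;
    }
}
```

For example, after save_outgoing_fd(p[1]) on a pipe, closing p[1] (as the iochannel would) still leaves saved_fd writable, and cleanup_outgoing_fd() releases the duplicate exactly once.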

[PULL 03/10] physmem: Expose tlb_reset_dirty_range_all()

2024-03-17 Thread peterx

From: Philippe Mathieu-Daudé 

In order to call tlb_reset_dirty_range_all() outside of
system/physmem.c, expose its prototype.

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Richard Henderson 
Link: https://lore.kernel.org/r/20240312201458.79532-2-phi...@linaro.org
Signed-off-by: Peter Xu 
---
 include/exec/exec-all.h | 1 +
 system/physmem.c| 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index ce36bb10d4..3e53501691 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -655,6 +655,7 @@ static inline void mmap_unlock(void) {}
 
 void tlb_reset_dirty(CPUState *cpu, ram_addr_t start1, ram_addr_t length);
 void tlb_set_dirty(CPUState *cpu, vaddr addr);
+void tlb_reset_dirty_range_all(ram_addr_t start, ram_addr_t length);
 
 MemoryRegionSection *
 address_space_translate_for_iotlb(CPUState *cpu, int asidx, hwaddr addr,
diff --git a/system/physmem.c b/system/physmem.c
index 6cfb7a80ab..5441480ff0 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -819,7 +819,7 @@ found:
 return block;
 }
 
-static void tlb_reset_dirty_range_all(ram_addr_t start, ram_addr_t length)
+void tlb_reset_dirty_range_all(ram_addr_t start, ram_addr_t length)
 {
 CPUState *cpu;
 ram_addr_t start1;
-- 
2.44.0

[PULL 01/10] io: Introduce qio_channel_file_new_dupfd

2024-03-17 Thread peterx

From: Fabiano Rosas 

Add a new helper function for creating a QIOChannelFile channel with a
duplicated file descriptor. This saves the calling code from having to
do error checking on the dup() call.

Suggested-by: "Daniel P. Berrangé" 
Signed-off-by: Fabiano Rosas 
Reviewed-by: "Daniel P. Berrangé" 
Link: https://lore.kernel.org/r/2024031125.17299-2-faro...@suse.de
Signed-off-by: Peter Xu 
---
 include/io/channel-file.h | 18 ++++++++++++++++++
 io/channel-file.c         | 12 ++++++++++++
 2 files changed, 30 insertions(+)

diff --git a/include/io/channel-file.h b/include/io/channel-file.h
index 50e8eb1138..d373a4e44d 100644
--- a/include/io/channel-file.h
+++ b/include/io/channel-file.h
@@ -68,6 +68,24 @@ struct QIOChannelFile {
 QIOChannelFile *
 qio_channel_file_new_fd(int fd);
 
+/**
+ * qio_channel_file_new_dupfd:
+ * @fd: the file descriptor
+ * @errp: pointer to initialized error object
+ *
+ * Create a new IO channel object for a file represented by the @fd
+ * parameter. Like qio_channel_file_new_fd(), but the @fd is first
+ * duplicated with dup().
+ *
+ * The channel will own the duplicated file descriptor and will take
+ * responsibility for closing it, the original FD is owned by the
+ * caller.
+ *
+ * Returns: the new channel object
+ */
+QIOChannelFile *
+qio_channel_file_new_dupfd(int fd, Error **errp);
+
 /**
  * qio_channel_file_new_path:
  * @path: the file path
diff --git a/io/channel-file.c b/io/channel-file.c
index a6ad7770c6..6436cfb6ae 100644
--- a/io/channel-file.c
+++ b/io/channel-file.c
@@ -45,6 +45,18 @@ qio_channel_file_new_fd(int fd)
 return ioc;
 }
 
+QIOChannelFile *
+qio_channel_file_new_dupfd(int fd, Error **errp)
+{
+int newfd = dup(fd);
+
+if (newfd < 0) {
+error_setg_errno(errp, errno, "Could not dup FD %d", fd);
+return NULL;
+}
+
+return qio_channel_file_new_fd(newfd);
+}
 
 QIOChannelFile *
 qio_channel_file_new_path(const char *path,
-- 
2.44.0
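
The helper's value is that dup() is checked before a channel object exists, so callers never see a channel whose fd is -1 (the situation Coverity flagged in the file migration code). The sketch below shows the same shape with a hypothetical struct file_chan and an error string in place of Error **; it is not the QIOChannel API.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical, much-reduced stand-in for QIOChannelFile. */
struct file_chan {
    int fd;   /* owned by the channel, closed by file_chan_free() */
};

/* Like qio_channel_file_new_fd(): takes ownership of fd as given. */
static struct file_chan *file_chan_new_fd(int fd)
{
    struct file_chan *c = malloc(sizeof(*c));

    if (c) {
        c->fd = fd;
    }
    return c;
}

/*
 * Like qio_channel_file_new_dupfd(): duplicate first and check the
 * result, so failure yields NULL plus a message, never a channel with
 * fd == -1. The caller keeps ownership of the original fd.
 */
static struct file_chan *file_chan_new_dupfd(int fd, char *err, size_t errlen)
{
    struct file_chan *c;
    int newfd;

    assert(err != NULL && errlen > 0);
    newfd = dup(fd);
    if (newfd < 0) {
        snprintf(err, errlen, "Could not dup FD %d: %s", fd, strerror(errno));
        return NULL;
    }
    c = file_chan_new_fd(newfd);
    if (!c) {
        close(newfd);   /* do not leak the duplicate */
        snprintf(err, errlen, "Out of memory");
    }
    return c;
}

static void file_chan_free(struct file_chan *c)
{
    if (c) {
        close(c->fd);
        free(c);
    }
}
```

Compared with file_chan_new_fd(dup(fd)), the caller now has a single NULL check, and no operation (such as an lseek()) can ever run on a -1 descriptor.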

[PULL 04/10] physmem: Factor cpu_physical_memory_dirty_bits_cleared() out

2024-03-17 Thread peterx

From: Nicholas Piggin 

Signed-off-by: Nicholas Piggin 
Tested-by: Thomas Huth 
Message-ID: <20240219061731.232570-1-npig...@gmail.com>
[PMD: Split patch in 2: part 1/2]
Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Richard Henderson 
Link: https://lore.kernel.org/r/20240312201458.79532-3-phi...@linaro.org
Signed-off-by: Peter Xu 
---
 include/exec/ram_addr.h | 9 +++++++++
 system/physmem.c        | 8 +++-----
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index 90676093f5..b060ea9176 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -25,6 +25,7 @@
 #include "sysemu/tcg.h"
 #include "exec/ramlist.h"
 #include "exec/ramblock.h"
+#include "exec/exec-all.h"
 
 extern uint64_t total_dirty_pages;
 
@@ -443,6 +444,14 @@ uint64_t cpu_physical_memory_set_dirty_lebitmap(unsigned long *bitmap,
 }
 #endif /* not _WIN32 */
 
+static inline void cpu_physical_memory_dirty_bits_cleared(ram_addr_t start,
+  ram_addr_t length)
+{
+if (tcg_enabled()) {
+tlb_reset_dirty_range_all(start, length);
+}
+
+}
 bool cpu_physical_memory_test_and_clear_dirty(ram_addr_t start,
   ram_addr_t length,
   unsigned client);
diff --git a/system/physmem.c b/system/physmem.c
index 5441480ff0..a4fe3d2bf8 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -881,8 +881,8 @@ bool cpu_physical_memory_test_and_clear_dirty(ram_addr_t start,
 memory_region_clear_dirty_bitmap(ramblock->mr, mr_offset, mr_size);
 }
 
-if (dirty && tcg_enabled()) {
-tlb_reset_dirty_range_all(start, length);
+if (dirty) {
+cpu_physical_memory_dirty_bits_cleared(start, length);
 }
 
 return dirty;
@@ -929,9 +929,7 @@ DirtyBitmapSnapshot *cpu_physical_memory_snapshot_and_clear_dirty
 }
 }
 
-if (tcg_enabled()) {
-tlb_reset_dirty_range_all(start, length);
-}
+cpu_physical_memory_dirty_bits_cleared(start, length);
 
 memory_region_clear_dirty_bitmap(mr, offset, length);
 
-- 
2.44.0

[PULL 02/10] migration: Fix error handling after dup in file migration

2024-03-17 Thread peterx

From: Fabiano Rosas 

The file migration code was allowing a possible -1 from a failed call
to dup() to propagate into the new QIOChannelFile::fd before checking
it for validity. Coverity flags this, possibly because of the
lseek(-1, ...) call that would follow before returning from the
channel creation routine.

Use the newly introduced qio_channel_file_new_dupfd() to check the
return value of dup() before the channel is created.
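A minimal sketch of the check-before-wrap pattern in plain POSIX C. `FileChannel` and `file_channel_new_dupfd()` are hypothetical stand-ins for `QIOChannelFile` and `qio_channel_file_new_dupfd()`; the real helper reports failures through `Error **errp`, whereas this sketch hands back the `errno` value through an out parameter.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical stand-in for QIOChannelFile: it only owns a descriptor. */
typedef struct {
    int fd;
} FileChannel;

/*
 * Duplicate @fd and wrap the copy in a new channel. The dup() result is
 * checked before the channel exists, so a channel never holds -1.
 * On failure, return NULL and store the errno value in *@err.
 */
static FileChannel *file_channel_new_dupfd(int fd, int *err)
{
    FileChannel *ch;
    int newfd;

    assert(err != NULL);

    newfd = dup(fd);
    if (newfd < 0) {
        *err = errno;
        return NULL;
    }

    ch = malloc(sizeof(*ch));
    if (!ch) {
        *err = ENOMEM;
        close(newfd);
        return NULL;
    }
    ch->fd = newfd;
    return ch;
}
```

Callers then only need a single NULL check, which is the shape the patch gives to the fd.c and file.c call sites.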

Fixes: CID 1539961
Fixes: CID 1539965
Fixes: CID 1539960
Fixes: 2dd7ee7a51 ("migration/multifd: Add incoming QIOChannelFile support")
Fixes: decdc76772 ("migration/multifd: Add mapped-ram support to fd: URI")
Reported-by: Peter Maydell 
Signed-off-by: Fabiano Rosas 
Reviewed-by: "Daniel P. Berrangé" 
Link: https://lore.kernel.org/r/2024031125.17299-3-faro...@suse.de
Signed-off-by: Peter Xu 
---
 migration/fd.c   |  9 -
 migration/file.c | 14 +++---
 2 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/migration/fd.c b/migration/fd.c
index d4ae72d132..4e2a63a73d 100644
--- a/migration/fd.c
+++ b/migration/fd.c
@@ -80,6 +80,7 @@ static gboolean fd_accept_incoming_migration(QIOChannel *ioc,
 void fd_start_incoming_migration(const char *fdname, Error **errp)
 {
 QIOChannel *ioc;
+QIOChannelFile *fioc;
 int fd = monitor_fd_param(monitor_cur(), fdname, errp);
 if (fd == -1) {
 return;
@@ -103,15 +104,13 @@ void fd_start_incoming_migration(const char *fdname, Error **errp)
 int channels = migrate_multifd_channels();
 
 while (channels--) {
-ioc = QIO_CHANNEL(qio_channel_file_new_fd(dup(fd)));
-
-if (QIO_CHANNEL_FILE(ioc)->fd == -1) {
-error_setg(errp, "Failed to duplicate fd %d", fd);
+fioc = qio_channel_file_new_dupfd(fd, errp);
+if (!fioc) {
 return;
 }
 
 qio_channel_set_name(ioc, "migration-fd-incoming");
-qio_channel_add_watch_full(ioc, G_IO_IN,
+qio_channel_add_watch_full(QIO_CHANNEL(fioc), G_IO_IN,
fd_accept_incoming_migration,
NULL, NULL,
g_main_context_get_thread_default());
diff --git a/migration/file.c b/migration/file.c
index b0b963e0ce..e56c5eb0a5 100644
--- a/migration/file.c
+++ b/migration/file.c
@@ -58,12 +58,13 @@ bool file_send_channel_create(gpointer opaque, Error **errp)
 int fd = fd_args_get_fd();
 
 if (fd && fd != -1) {
-ioc = qio_channel_file_new_fd(dup(fd));
+ioc = qio_channel_file_new_dupfd(fd, errp);
 } else {
 ioc = qio_channel_file_new_path(outgoing_args.fname, flags, 0, errp);
-if (!ioc) {
-goto out;
-}
+}
+
+if (!ioc) {
+goto out;
 }
 
 multifd_channel_connect(opaque, QIO_CHANNEL(ioc));
@@ -147,10 +148,9 @@ void file_start_incoming_migration(FileMigrationArgs *file_args, Error **errp)
NULL, NULL,
g_main_context_get_thread_default());
 
-fioc = qio_channel_file_new_fd(dup(fioc->fd));
+fioc = qio_channel_file_new_dupfd(fioc->fd, errp);
 
-if (!fioc || fioc->fd == -1) {
-error_setg(errp, "Error creating migration incoming channel");
+if (!fioc) {
 break;
 }
 } while (++i < channels);
-- 
2.44.0

[PULL 06/10] migration: Skip only empty block devices

2024-03-17 Thread peterx

From: Cédric Le Goater 

The block .save_setup() handler calls a helper routine,
init_blk_migration(), which builds the list of block devices to
include in the migration. When a device is found to be empty
(sectors == 0), the loop exits and all the remaining devices are
ignored. This is a regression introduced when bdrv_iterate() was
removed.

Fix this by skipping only the empty devices and continuing with the
rest. A negative sector count is still treated as an error.
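The intended control flow can be modelled in isolation. In this hypothetical sketch an array of per-device sector counts stands in for the `bdrv_next()` walk and `bdrv_nb_sectors()`; `count_migratable_devices()` is not QEMU code and only counts the devices that would be set up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of the fixed loop: sectors[i] plays the role of
 * bdrv_nb_sectors() for the i-th device. Returns how many devices would be
 * set up for migration, or the first negative value (an error) seen.
 */
static int64_t count_migratable_devices(const int64_t *sectors, size_t n)
{
    int64_t selected = 0;

    assert(sectors != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (sectors[i] == 0) {
            continue;            /* empty device: skip it, keep going */
        }
        if (sectors[i] < 0) {
            return sectors[i];   /* size query failed: report the error */
        }
        selected++;
    }
    return selected;
}
```

With `{100, 0, 200}` the sketch selects two devices; the old `sectors <= 0` test would have stopped at the empty second device after setting up only the first.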

Cc: Markus Armbruster 
Cc: qemu-stable 
Suggested-by: Kevin Wolf 
Fixes: fea68bb6e9fa ("block: Eliminate bdrv_iterate(), use bdrv_next()")
Signed-off-by: Cédric Le Goater 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Kevin Wolf 
Link: https://lore.kernel.org/r/20240312120431.550054-1-...@redhat.com
[peterx: fix "Suggested-by:"]
Signed-off-by: Peter Xu 
---
 migration/block.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/migration/block.c b/migration/block.c
index 8c6ebafacc..2b9054889a 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -402,7 +402,10 @@ static int init_blk_migration(QEMUFile *f)
 }
 
 sectors = bdrv_nb_sectors(bs);
-if (sectors <= 0) {
+if (sectors == 0) {
+continue;
+}
+if (sectors < 0) {
 ret = sectors;
 bdrv_next_cleanup();
 goto out;
-- 
2.44.0

[PULL 09/10] migration/multifd: Ensure we're not given a socket for file migration

2024-03-17 Thread peterx

From: Fabiano Rosas 

When doing migration using the fd: URI, QEMU will fetch the file
descriptor passed in via the monitor at
fd_start_outgoing|incoming_migration(), which means the checks at
migration_channels_and_transport_compatible() happen too soon and we
don't know at that point whether the FD refers to a plain file or a
socket.

For this reason, we've been allowing a migration channel of type
SOCKET_ADDRESS_TYPE_FD to pass the initial verifications in scenarios
where socket migration is not supported, such as fd + multifd.

The commit decdc76772 ("migration/multifd: Add mapped-ram support to
fd: URI") was supposed to add a second check prior to starting
migration to make sure a socket fd is not passed instead of a file fd,
but failed to do so.

Add the missing verification and correct the comment explaining this
situation, which is currently inaccurate.
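The patch relies on QEMU's existing `fd_is_socket()` helper, whose implementation is not shown here. As a hypothetical sketch of one way such a check can be written on POSIX systems, `fd_refers_to_socket()` (not a QEMU function) uses `fstat()` and `S_ISSOCK()`:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <sys/socket.h>  /* socketpair(): for quick manual checks */
#include <sys/stat.h>
#include <unistd.h>      /* pipe(): for quick manual checks */

/*
 * Return true if @fd refers to a socket rather than a plain file, pipe,
 * etc. An fd that cannot be stat'ed is reported as "not a socket".
 */
static bool fd_refers_to_socket(int fd)
{
    struct stat st;

    assert(fd >= 0);
    if (fstat(fd, &st) != 0) {
        return false;
    }
    return S_ISSOCK(st.st_mode);
}
```

For example, either end of `socketpair(AF_UNIX, SOCK_STREAM, 0, sv)` is reported as a socket, while either end of a `pipe()` is not.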

Fixes: decdc76772 ("migration/multifd: Add mapped-ram support to fd: URI")
Signed-off-by: Fabiano Rosas 
Reviewed-by: Peter Xu 
Link: https://lore.kernel.org/r/20240315032040.7974-2-faro...@suse.de
Signed-off-by: Peter Xu 
---
 migration/fd.c| 8 
 migration/file.c  | 7 +++
 migration/migration.c | 6 +++---
 3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/migration/fd.c b/migration/fd.c
index 39a52e5c90..c07030f715 100644
--- a/migration/fd.c
+++ b/migration/fd.c
@@ -22,6 +22,7 @@
 #include "migration.h"
 #include "monitor/monitor.h"
 #include "io/channel-file.h"
+#include "io/channel-socket.h"
 #include "io/channel-util.h"
 #include "options.h"
 #include "trace.h"
@@ -95,6 +96,13 @@ void fd_start_incoming_migration(const char *fdname, Error **errp)
 }
 
 if (migrate_multifd()) {
+if (fd_is_socket(fd)) {
+error_setg(errp,
+   "Multifd migration to a socket FD is not supported");
+object_unref(ioc);
+return;
+}
+
 file_create_incoming_channels(ioc, errp);
 } else {
 qio_channel_set_name(ioc, "migration-fd-incoming");
diff --git a/migration/file.c b/migration/file.c
index ddde0ca818..b6e8ba13f2 100644
--- a/migration/file.c
+++ b/migration/file.c
@@ -15,6 +15,7 @@
 #include "file.h"
 #include "migration.h"
 #include "io/channel-file.h"
+#include "io/channel-socket.h"
 #include "io/channel-util.h"
 #include "options.h"
 #include "trace.h"
@@ -58,6 +59,12 @@ bool file_send_channel_create(gpointer opaque, Error **errp)
 int fd = fd_args_get_fd();
 
 if (fd && fd != -1) {
+if (fd_is_socket(fd)) {
+error_setg(errp,
+   "Multifd migration to a socket FD is not supported");
+goto out;
+}
+
 ioc = qio_channel_file_new_dupfd(fd, errp);
 } else {
 ioc = qio_channel_file_new_path(outgoing_args.fname, flags, 0, errp);
diff --git a/migration/migration.c b/migration/migration.c
index 644e073b7d..f60bd371e3 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -166,9 +166,9 @@ static bool transport_supports_seeking(MigrationAddress *addr)
 }
 
 /*
- * At this point, the user might not yet have passed the file
- * descriptor to QEMU, so we cannot know for sure whether it
- * refers to a plain file or a socket. Let it through anyway.
+ * At this point QEMU has not yet fetched the fd passed in by the
+ * user, so we cannot know for sure whether it refers to a plain
+ * file or a socket. Let it through anyway and check at fd.c.
  */
 if (addr->transport == MIGRATION_ADDRESS_TYPE_SOCKET) {
 return addr->u.socket.type == SOCKET_ADDRESS_TYPE_FD;
-- 
2.44.0

[PULL 07/10] migration: cpr-reboot documentation

2024-03-17 Thread peterx

From: Steve Sistare 

Signed-off-by: Steve Sistare 
Reviewed-by: Cédric Le Goater 
Reviewed-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/1710338119-330923-1-git-send-email-steven.sist...@oracle.com
Signed-off-by: Peter Xu 
---
 docs/devel/migration/CPR.rst  | 147 ++
 docs/devel/migration/features.rst |   1 +
 2 files changed, 148 insertions(+)
 create mode 100644 docs/devel/migration/CPR.rst

diff --git a/docs/devel/migration/CPR.rst b/docs/devel/migration/CPR.rst
new file mode 100644
index 00..63c36470cf
--- /dev/null
+++ b/docs/devel/migration/CPR.rst
@@ -0,0 +1,147 @@
+CheckPoint and Restart (CPR)
+============================
+
+CPR is the umbrella name for a set of migration modes in which the
+VM is migrated to a new QEMU instance on the same host.  It is
+intended for use when the goal is to update host software components
+that run the VM, such as QEMU or even the host kernel.  At this time,
+cpr-reboot is the only available mode.
+
+Because QEMU is restarted on the same host, with access to the same
+local devices, CPR is allowed in certain cases where normal migration
+would be blocked.  However, the user must not modify the contents of
+guest block devices between quitting old QEMU and starting new QEMU.
+
+CPR unconditionally stops VM execution before memory is saved, and
+thus does not depend on any form of dirty page tracking.
+
+cpr-reboot mode
+---------------
+
+In this mode, QEMU stops the VM, and writes VM state to the migration
+URI, which will typically be a file.  After quitting QEMU, the user
+resumes by running QEMU with the ``-incoming`` option.  Because the
+old and new QEMU instances are not active concurrently, the URI cannot
+be a type that streams data from one instance to the other.
+
+Guest RAM can be saved in place if backed by shared memory, or can be
+copied to a file.  The former is more efficient and is therefore
+preferred.
+
+After state and memory are saved, the user may update userland host
+software before restarting QEMU and resuming the VM.  Further, if
+the RAM is backed by persistent shared memory, such as a DAX device,
+then the user may reboot to a new host kernel before restarting QEMU.
+
+This mode supports VFIO devices provided the user first puts the
+guest in the suspended runstate, such as by issuing the
+``guest-suspend-ram`` command to the QEMU guest agent.  The agent
+must be pre-installed in the guest, and the guest must support
+suspend to RAM.  Beware that suspension can take a few seconds, so
+the user should poll until the guest reports the suspended runstate
+before proceeding with the CPR operation.
+
+Usage
+^^^^^
+
+It is recommended that guest RAM be backed with some type of shared
+memory, such as ``memory-backend-file,share=on``, and that the
+``x-ignore-shared`` capability be set.  This combination allows memory
+to be saved in place.  Otherwise, after QEMU stops the VM, all guest
+RAM is copied to the migration URI.
+
+Outgoing:
+  * Set the migration mode parameter to ``cpr-reboot``.
+  * Set the ``x-ignore-shared`` capability if desired.
+  * Issue the ``migrate`` command.  It is recommended that the URI be a
+    ``file`` type, but one can use other types such as ``exec``,
+    provided the command captures all the data from the outgoing side,
+    and provides all the data to the incoming side.
+  * Quit when QEMU reaches the postmigrate state.
+
+Incoming:
+  * Start QEMU with the ``-incoming defer`` option.
+  * Set the migration mode parameter to ``cpr-reboot``.
+  * Set the ``x-ignore-shared`` capability if desired.
+  * Issue the ``migrate-incoming`` command.
+  * If the VM was running when the outgoing ``migrate`` command was
+    issued, then QEMU automatically resumes VM execution.
+
+Example 1
+^^^^^^^^^
+::
+
+  # qemu-kvm -monitor stdio
+  -object memory-backend-file,id=ram0,size=4G,mem-path=/dev/dax0.0,align=2M,share=on -m 4G
+  ...
+
+  (qemu) info status
+  VM status: running
+  (qemu) migrate_set_parameter mode cpr-reboot
+  (qemu) migrate_set_capability x-ignore-shared on
+  (qemu) migrate -d file:vm.state
+  (qemu) info status
+  VM status: paused (postmigrate)
+  (qemu) quit
+
+  ### optionally update kernel and reboot
+  # systemctl kexec
+  kexec_core: Starting new kernel
+  ...
+
+  # qemu-kvm ... -incoming defer
+  (qemu) info status
+  VM status: paused (inmigrate)
+  (qemu) migrate_set_parameter mode cpr-reboot
+  (qemu) migrate_set_capability x-ignore-shared on
+  (qemu) migrate_incoming file:vm.state
+  (qemu) info status
+  VM status: running
+
+Example 2: VFIO
+^^^^^^^^^^^^^^^
+::
+
+  # qemu-kvm -monitor stdio
+  -object memory-backend-file,id=ram0,size=4G,mem-path=/dev/dax0.0,align=2M,share=on -m 4G
+  -device vfio-pci, ...
+  -chardev socket,id=qga0,path=qga.sock,server=on,wait=off
+  -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0
+  ...
+
+  (qemu) info status
+  VM status: running
+
+  # echo '{"execute":"guest-suspend-ram"}' | ncat --send-only -U
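
Why shared-memory backing lets guest RAM be "saved in place" can be shown outside QEMU. The following is a hypothetical POSIX C sketch, not QEMU code: it writes into a `MAP_SHARED` mapping of a file, drops the mapping (standing in for the old QEMU quitting), then maps the same file again (standing in for the new QEMU) and checks that the data is still there without any copy to a separate state file. It does not model `x-ignore-shared`, DAX devices, or a kernel reboot; surviving a reboot additionally requires persistent backing such as the DAX device used in the examples.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map @size bytes of @path shared and read/write; NULL on failure. */
static char *map_shared(const char *path, size_t size, int flags)
{
    int fd = open(path, O_RDWR | flags, 0600);
    void *p;

    if (fd < 0) {
        return NULL;
    }
    if ((flags & O_CREAT) && ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return NULL;
    }
    p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping stays valid after close() */
    return p == MAP_FAILED ? NULL : p;
}

/*
 * "Old instance": write a marker through a shared mapping, then unmap.
 * "New instance": map the same file again and compare, with no copy
 * to a separate state file in between.
 */
static bool shared_mapping_survives_restart(const char *path, size_t size)
{
    static const char marker[] = "guest-ram";
    char *ram;
    bool ok;

    assert(size >= sizeof(marker));

    ram = map_shared(path, size, O_CREAT | O_TRUNC);
    if (!ram) {
        return false;
    }
    memcpy(ram, marker, sizeof(marker));
    munmap(ram, size);

    ram = map_shared(path, size, 0);
    if (!ram) {
        return false;
    }
    ok = memcmp(ram, marker, sizeof(marker)) == 0;
    munmap(ram, size);
    return ok;
}
```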

[PULL 08/10] migration: Fix iocs leaks during file and fd migration

2024-03-17 Thread peterx

From: Fabiano Rosas 

The memory for the I/O channels is being leaked in three different ways
during file migration:

1) if the offset check fails, we never drop the ioc reference;

2) we allocate an extra channel for no reason;

3) if multifd is enabled but channel creation fails when calling
   dup(), we leave the previous channels around along with their glib
   watches.

Fix all issues by restructuring the code to first allocate the
channels and only register the watches when all channels have been
created.

For multifd, the file and fd migrations can share code because both
are backed by a QIOChannelFile. For the non-multifd case, the fd needs
to be separate because it is backed by a QIOChannelSocket.
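The restructuring follows an all-or-nothing pattern: create every channel first, roll back on the first failure, and register watches only once everything exists. A hypothetical POSIX C sketch of that pattern, using bare `dup()` descriptors instead of `QIOChannel` objects (`dup_all_or_nothing()` is not a QEMU function):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Create @n duplicates of @fd in @fds. Either all of them are created,
 * or none are left open: on failure, every descriptor created so far is
 * closed before returning false. Only after this succeeds would a caller
 * register watches, so a partial failure never leaves watches behind.
 */
static bool dup_all_or_nothing(int fd, int *fds, size_t n)
{
    assert(fds != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        fds[i] = dup(fd);
        if (fds[i] < 0) {
            /* Roll back: close everything created before the failure. */
            while (i > 0) {
                close(fds[--i]);
            }
            return false;
        }
    }
    return true;
}
```

Because nothing is registered until the loop completes, the failure path only has to release what the loop itself created.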

Fixes: 2dd7ee7a51 ("migration/multifd: Add incoming QIOChannelFile support")
Fixes: decdc76772 ("migration/multifd: Add mapped-ram support to fd: URI")
Reported-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240313212824.16974-2-faro...@suse.de
Signed-off-by: Peter Xu 
---
 migration/file.h |  1 +
 migration/fd.c   | 29 +++-
 migration/file.c | 58 ++--
 3 files changed, 46 insertions(+), 42 deletions(-)

diff --git a/migration/file.h b/migration/file.h
index 9f71e87f74..7699c04677 100644
--- a/migration/file.h
+++ b/migration/file.h
@@ -20,6 +20,7 @@ void file_start_outgoing_migration(MigrationState *s,
 int file_parse_offset(char *filespec, uint64_t *offsetp, Error **errp);
 void file_cleanup_outgoing_migration(void);
 bool file_send_channel_create(gpointer opaque, Error **errp);
+void file_create_incoming_channels(QIOChannel *ioc, Error **errp);
 int file_write_ramblock_iov(QIOChannel *ioc, const struct iovec *iov,
 int niov, RAMBlock *block, Error **errp);
 int multifd_file_recv_data(MultiFDRecvParams *p, Error **errp);
diff --git a/migration/fd.c b/migration/fd.c
index 4e2a63a73d..39a52e5c90 100644
--- a/migration/fd.c
+++ b/migration/fd.c
@@ -18,6 +18,7 @@
 #include "qapi/error.h"
 #include "channel.h"
 #include "fd.h"
+#include "file.h"
 #include "migration.h"
 #include "monitor/monitor.h"
 #include "io/channel-file.h"
@@ -80,7 +81,6 @@ static gboolean fd_accept_incoming_migration(QIOChannel *ioc,
 void fd_start_incoming_migration(const char *fdname, Error **errp)
 {
 QIOChannel *ioc;
-QIOChannelFile *fioc;
 int fd = monitor_fd_param(monitor_cur(), fdname, errp);
 if (fd == -1) {
 return;
@@ -94,26 +94,13 @@ void fd_start_incoming_migration(const char *fdname, Error **errp)
 return;
 }
 
-qio_channel_set_name(ioc, "migration-fd-incoming");
-qio_channel_add_watch_full(ioc, G_IO_IN,
-   fd_accept_incoming_migration,
-   NULL, NULL,
-   g_main_context_get_thread_default());
-
 if (migrate_multifd()) {
-int channels = migrate_multifd_channels();
-
-while (channels--) {
-fioc = qio_channel_file_new_dupfd(fd, errp);
-if (!fioc) {
-return;
-}
-
-qio_channel_set_name(ioc, "migration-fd-incoming");
-qio_channel_add_watch_full(QIO_CHANNEL(fioc), G_IO_IN,
-   fd_accept_incoming_migration,
-   NULL, NULL,
-   g_main_context_get_thread_default());
-}
+file_create_incoming_channels(ioc, errp);
+} else {
+qio_channel_set_name(ioc, "migration-fd-incoming");
+qio_channel_add_watch_full(ioc, G_IO_IN,
+   fd_accept_incoming_migration,
+   NULL, NULL,
+   g_main_context_get_thread_default());
 }
 }
diff --git a/migration/file.c b/migration/file.c
index e56c5eb0a5..ddde0ca818 100644
--- a/migration/file.c
+++ b/migration/file.c
@@ -115,13 +115,46 @@ static gboolean file_accept_incoming_migration(QIOChannel *ioc,
 return G_SOURCE_REMOVE;
 }
 
+void file_create_incoming_channels(QIOChannel *ioc, Error **errp)
+{
+int i, fd, channels = 1;
+g_autofree QIOChannel **iocs = NULL;
+
+if (migrate_multifd()) {
+channels += migrate_multifd_channels();
+}
+
+iocs = g_new0(QIOChannel *, channels);
+fd = QIO_CHANNEL_FILE(ioc)->fd;
+iocs[0] = ioc;
+
+for (i = 1; i < channels; i++) {
+QIOChannelFile *fioc = qio_channel_file_new_dupfd(fd, errp);
+
+if (!fioc) {
+while (i) {
+object_unref(iocs[--i]);
+}
+return;
+}
+
+iocs[i] = QIO_CHANNEL(fioc);
+}
+
+for (i = 0; i < channels; i++) {
+qio_channel_set_name(iocs[i], "migration-file-incoming");
+qio_channel_add_watch_full(iocs[i], G_IO_IN,
+   file_accept_incoming_migration,
+   NULL, NULL,
+

[PULL 00/10] Migration 20240317 patches

2024-03-17 Thread peterx

From: Peter Xu 

The following changes since commit 35ac6831d98e18e2c78c85c93e3a6ca1f1ae3e58:

  Merge tag 'net-pull-request' of https://github.com/jasowang/qemu into staging (2024-03-12 13:42:57 +)

are available in the Git repository at:

  https://gitlab.com/peterx/qemu.git tags/migration-20240317-pull-request

for you to fetch changes up to 9adfb308c1513562d6acec02aa780c5ef9b0193d:

  migration/multifd: Duplicate the fd for the outgoing_args (2024-03-15 11:26:33 -0400)


Migration pull for 9.0-rc0

- Nicholas/Phil's fix for migration corruption/inconsistency with TCG
- Cedric's fix for block migration when n_sectors == 0
- Steve's CPR reboot documentation page
- Fabiano's misc fixes for mapped-ram (IOC leak, dup() errors, fd checks,
  fd use race, etc.)



Cédric Le Goater (1):
  migration: Skip only empty block devices

Fabiano Rosas (5):
  io: Introduce qio_channel_file_new_dupfd
  migration: Fix error handling after dup in file migration
  migration: Fix iocs leaks during file and fd migration
  migration/multifd: Ensure we're not given a socket for file migration
  migration/multifd: Duplicate the fd for the outgoing_args

Nicholas Piggin (2):
  physmem: Factor cpu_physical_memory_dirty_bits_cleared() out
  physmem: Fix migration dirty bitmap coherency with TCG memory access

Philippe Mathieu-Daudé (1):
  physmem: Expose tlb_reset_dirty_range_all()

Steve Sistare (1):
  migration: cpr-reboot documentation

 docs/devel/migration/CPR.rst  | 147 ++
 docs/devel/migration/features.rst |   1 +
 include/exec/exec-all.h   |   1 +
 include/exec/ram_addr.h   |  12 +++
 include/io/channel-file.h |  18 
 migration/file.h  |   1 +
 io/channel-file.c |  12 +++
 migration/block.c |   5 +-
 migration/fd.c|  51 ++-
 migration/file.c  |  75 +--
 migration/migration.c |   6 +-
 system/physmem.c  |  10 +-
 12 files changed, 279 insertions(+), 60 deletions(-)
 create mode 100644 docs/devel/migration/CPR.rst

-- 
2.44.0

[PULL 20/34] migration: export migration_is_running

2024-03-11 Thread peterx

From: Steve Sistare 

Delete the state parameter from migration_is_running(), which now reads
the state of the current migration itself, and move the function to the
public API in misc.h.

Signed-off-by: Steve Sistare 
Link: https://lore.kernel.org/r/1710179338-294359-5-git-send-email-steven.sist...@oracle.com
Signed-off-by: Peter Xu 
---
 include/migration/misc.h   |  1 +
 migration/migration.h  |  2 --
 migration/migration.c  | 10 ++
 migration/options.c|  4 ++--
 migration/savevm.c |  2 +-
 system/dirtylimit.c|  2 +-
 target/riscv/kvm/kvm-cpu.c |  4 ++--
 7 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index e1f1bf853e..7526977de6 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -106,6 +106,7 @@ int migration_call_notifiers(MigrationState *s, MigrationEventType type,
 bool migration_in_setup(MigrationState *);
 bool migration_has_finished(MigrationState *);
 bool migration_has_failed(MigrationState *);
+bool migration_is_running(void);
 /* ...and after the device transmission */
 /* True if incoming migration entered POSTCOPY_INCOMING_DISCARD */
 bool migration_in_incoming_postcopy(void);
diff --git a/migration/migration.h b/migration/migration.h
index 736460aa8b..e4983db9c9 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -479,8 +479,6 @@ bool migrate_has_error(MigrationState *s);
 
 void migrate_fd_connect(MigrationState *s, Error *error_in);
 
-bool migration_is_running(int state);
-
 int migrate_init(MigrationState *s, Error **errp);
 bool migration_is_blocked(Error **errp);
 /* True if outgoing migration has entered postcopy phase */
diff --git a/migration/migration.c b/migration/migration.c
index 17859cbaee..546ba86c63 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1103,9 +1103,11 @@ bool migration_is_setup_or_active(void)
 }
 }
 
-bool migration_is_running(int state)
+bool migration_is_running(void)
 {
-switch (state) {
+MigrationState *s = current_migration;
+
+switch (s->state) {
 case MIGRATION_STATUS_ACTIVE:
 case MIGRATION_STATUS_POSTCOPY_ACTIVE:
 case MIGRATION_STATUS_POSTCOPY_PAUSED:
@@ -1477,7 +1479,7 @@ static void migrate_fd_cancel(MigrationState *s)
 
 do {
 old_state = s->state;
-if (!migration_is_running(old_state)) {
+if (!migration_is_running()) {
 break;
 }
 /* If the migration is paused, kick it out of the pause */
@@ -1962,7 +1964,7 @@ static bool migrate_prepare(MigrationState *s, bool blk, bool blk_inc,
 return true;
 }
 
-if (migration_is_running(s->state)) {
+if (migration_is_running()) {
 error_setg(errp, QERR_MIGRATION_ACTIVE);
 return false;
 }
diff --git a/migration/options.c b/migration/options.c
index 40eb930940..642cfb00a3 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -681,7 +681,7 @@ bool migrate_cap_set(int cap, bool value, Error **errp)
 MigrationState *s = migrate_get_current();
 bool new_caps[MIGRATION_CAPABILITY__MAX];
 
-if (migration_is_running(s->state)) {
+if (migration_is_running()) {
 error_setg(errp, QERR_MIGRATION_ACTIVE);
 return false;
 }
@@ -725,7 +725,7 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
 MigrationCapabilityStatusList *cap;
 bool new_caps[MIGRATION_CAPABILITY__MAX];
 
-if (migration_is_running(s->state) || migration_in_colo_state()) {
+if (migration_is_running() || migration_in_colo_state()) {
 error_setg(errp, QERR_MIGRATION_ACTIVE);
 return;
 }
diff --git a/migration/savevm.c b/migration/savevm.c
index 76b57a9888..388d7af7cd 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1706,7 +1706,7 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
 MigrationState *ms = migrate_get_current();
 MigrationStatus status;
 
-if (migration_is_running(ms->state)) {
+if (migration_is_running()) {
 error_setg(errp, QERR_MIGRATION_ACTIVE);
 return -EINVAL;
 }
diff --git a/system/dirtylimit.c b/system/dirtylimit.c
index 051e0311c1..1622bb7426 100644
--- a/system/dirtylimit.c
+++ b/system/dirtylimit.c
@@ -451,7 +451,7 @@ static bool dirtylimit_is_allowed(void)
 {
 MigrationState *ms = migrate_get_current();
 
-if (migration_is_running(ms->state) &&
+if (migration_is_running() &&
 (!qemu_thread_is_self(&ms->thread)) &&
 migrate_dirty_limit() &&
 dirtylimit_in_service()) {
diff --git a/target/riscv/kvm/kvm-cpu.c b/target/riscv/kvm/kvm-cpu.c
index c7afdb1e81..cda7d78a77 100644
--- a/target/riscv/kvm/kvm-cpu.c
+++ b/target/riscv/kvm/kvm-cpu.c
@@ -44,7 +44,7 @@
 #include "kvm_riscv.h"
 #include "sbi_ecall_interface.h"
 #include "chardev/char-fe.h"
-#include "migration/migration.h"
+#include "migration/misc.h"
 #include "sysemu/runstate.h"
 #include "hw/riscv/numa.h"
 
@@ -729,7 +729,7 @@ static void kvm_riscv_put_regs_timer(CPUState

[PULL 31/34] migration/multifd: Implement zero page transmission on the multifd thread.

2024-03-11 Thread peterx

From: Hao Xiang 

1. Add a zero_pages field to MultiFDPacket_t.
2. Implement zero page detection and handling on the multifd
threads for the non-compression, zlib and zstd compression backends.
3. Add a new value 'multifd' to the ZeroPageDetection enumeration.
4. Add zero page counters and update the multifd send/receive tracing
format to track the newly added counters.
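The `MultiFDPacket_t` comment in this patch describes the offset array as holding the normal pages first, followed by the zero pages. A hypothetical sketch of how a sender could produce that ordering in place; `page_is_zero()` is a naive byte loop standing in for QEMU's optimized `buffer_is_zero()`, and `partition_zero_pages()` is not the function this patch adds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Naive all-zero check; QEMU uses an optimized buffer_is_zero(). */
static bool page_is_zero(const uint8_t *page, size_t page_size)
{
    for (size_t i = 0; i < page_size; i++) {
        if (page[i] != 0) {
            return false;
        }
    }
    return true;
}

/*
 * Reorder offset[0..num) in place so that non-zero ("normal") pages come
 * first and zero pages follow. Returns the number of normal pages; the
 * remaining num - normal_num entries are the zero pages.
 */
static uint32_t partition_zero_pages(const uint8_t *block, size_t page_size,
                                     uint64_t *offset, uint32_t num)
{
    uint32_t normal_num = 0;

    assert(page_size > 0);

    for (uint32_t i = 0; i < num; i++) {
        if (!page_is_zero(block + offset[i], page_size)) {
            /* Move this normal page into the next slot at the front. */
            uint64_t tmp = offset[normal_num];

            offset[normal_num] = offset[i];
            offset[i] = tmp;
            normal_num++;
        }
    }
    return normal_num;
}
```

The swap does not preserve the original order of the zero pages, which is harmless here because each entry carries its own offset.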

Signed-off-by: Hao Xiang 
Acked-by: Markus Armbruster 
Reviewed-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240311180015.3359271-5-hao.xi...@linux.dev
Signed-off-by: Peter Xu 
---
 qapi/migration.json  |  7 ++-
 migration/multifd.h  | 23 +++-
 hw/core/qdev-properties-system.c |  2 +-
 migration/multifd-zero-page.c| 87 ++
 migration/multifd-zlib.c | 21 ++--
 migration/multifd-zstd.c | 20 +--
 migration/multifd.c  | 90 +++-
 migration/ram.c  |  1 -
 migration/meson.build|  1 +
 migration/trace-events   |  8 +--
 10 files changed, 228 insertions(+), 32 deletions(-)
 create mode 100644 migration/multifd-zero-page.c

diff --git a/qapi/migration.json b/qapi/migration.json
index 83fdef73b9..2684e4e9ac 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -677,10 +677,15 @@
 #
 # @legacy: Perform zero page checking in main migration thread.
 #
+# @multifd: Perform zero page checking in multifd sender thread if
+# multifd migration is enabled, else in the main migration
+# thread as for @legacy.
+#
 # Since: 9.0
+#
 ##
 { 'enum': 'ZeroPageDetection',
-  'data': [ 'none', 'legacy' ] }
+  'data': [ 'none', 'legacy', 'multifd' ] }
 
 ##
 # @BitmapMigrationBitmapAliasTransform:
diff --git a/migration/multifd.h b/migration/multifd.h
index 7447c2bea3..c9d9b09239 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -55,14 +55,24 @@ typedef struct {
 /* size of the next packet that contains pages */
 uint32_t next_packet_size;
 uint64_t packet_num;
-uint64_t unused[4];/* Reserved for future use */
+/* zero pages */
+uint32_t zero_pages;
+uint32_t unused32[1];/* Reserved for future use */
+uint64_t unused64[3];/* Reserved for future use */
 char ramblock[256];
+/*
+ * This array contains the pointers to:
+ *  - normal pages (initial normal_pages entries)
+ *  - zero pages (following zero_pages entries)
+ */
 uint64_t offset[];
 } __attribute__((packed)) MultiFDPacket_t;
 
 typedef struct {
 /* number of used pages */
 uint32_t num;
+/* number of normal pages */
+uint32_t normal_num;
 /* number of allocated pages */
 uint32_t allocated;
 /* offset of each page */
@@ -136,6 +146,8 @@ typedef struct {
 uint64_t packets_sent;
 /* non zero pages sent through this channel */
 uint64_t total_normal_pages;
+/* zero pages sent through this channel */
+uint64_t total_zero_pages;
 /* buffers to send */
 struct iovec *iov;
 /* number of iovs used */
@@ -194,12 +206,18 @@ typedef struct {
 uint8_t *host;
 /* non zero pages recv through this channel */
 uint64_t total_normal_pages;
+/* zero pages recv through this channel */
+uint64_t total_zero_pages;
 /* buffers to recv */
 struct iovec *iov;
 /* Pages that are not zero */
 ram_addr_t *normal;
 /* num of non zero pages */
 uint32_t normal_num;
+/* Pages that are zero */
+ram_addr_t *zero;
+/* num of zero pages */
+uint32_t zero_num;
 /* used for de-compression methods */
 void *compress_data;
 } MultiFDRecvParams;
@@ -221,6 +239,9 @@ typedef struct {
 
 void multifd_register_ops(int method, MultiFDMethods *ops);
 void multifd_send_fill_packet(MultiFDSendParams *p);
+bool multifd_send_prepare_common(MultiFDSendParams *p);
+void multifd_send_zero_page_detect(MultiFDSendParams *p);
+void multifd_recv_zero_page_process(MultiFDRecvParams *p);
 
 static inline void multifd_send_prepare_header(MultiFDSendParams *p)
 {
diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c
index 71a21bf24e..7eca2f2377 100644
--- a/hw/core/qdev-properties-system.c
+++ b/hw/core/qdev-properties-system.c
@@ -696,7 +696,7 @@ const PropertyInfo qdev_prop_granule_mode = {
 const PropertyInfo qdev_prop_zero_page_detection = {
 .name = "ZeroPageDetection",
 .description = "zero_page_detection values, "
-   "none,legacy",
+   "none,legacy,multifd",
 .enum_table = &ZeroPageDetection_lookup,
 .get = qdev_propinfo_get_enum,
 .set = qdev_propinfo_set_enum,
diff --git a/migration/multifd-zero-page.c b/migration/multifd-zero-page.c
new file mode 100644
index 00..1ba38be636
--- /dev/null
+++ b/migration/multifd-zero-page.c
@@ -0,0 +1,87 @@
+/*
+ * Multifd zero page detection implementation.
+ *
+ * Copyright (c) 2024 Bytedance Inc
+ *
+ * Authors:
+ *  Hao Xiang 
+ *
+ * This work is licensed under the terms

[PULL 30/34] migration/multifd: Add new migration option zero-page-detection.

2024-03-11 Thread peterx

From: Hao Xiang 

This new parameter controls where zero page checking is performed:
1. If this parameter is set to 'legacy', zero page checking is
done in the main migration thread.
2. If this parameter is set to 'none', zero page checking is disabled.
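A hypothetical sketch of what the two settings mean for a single page; `ZeroPageMode`, `buf_is_zero()` and `send_as_zero_page()` are illustrative names, not QEMU's. With `none` every page is sent with its contents; with `legacy` an all-zero page can be sent as a compact zero-page marker instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative mirror of the two ZeroPageDetection values in this patch. */
typedef enum {
    ZPD_NONE,    /* do not check: always send the page contents */
    ZPD_LEGACY,  /* check in the main migration thread */
} ZeroPageMode;

static bool buf_is_zero(const uint8_t *buf, size_t len)
{
    assert(len > 0);
    for (size_t i = 0; i < len; i++) {
        if (buf[i]) {
            return false;
        }
    }
    return true;
}

/* True if the page may be sent as a zero-page marker instead of its data. */
static bool send_as_zero_page(ZeroPageMode mode, const uint8_t *page,
                              size_t len)
{
    switch (mode) {
    case ZPD_LEGACY:
        return buf_is_zero(page, len);
    case ZPD_NONE:
    default:
        return false;
    }
}
```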

Signed-off-by: Hao Xiang 
Reviewed-by: Peter Xu 
Acked-by: Markus Armbruster 
Link: https://lore.kernel.org/r/20240311180015.3359271-4-hao.xi...@linux.dev
Signed-off-by: Peter Xu 
---
 qapi/migration.json | 33 ++---
 include/hw/qdev-properties-system.h |  4 
 migration/options.h |  1 +
 hw/core/qdev-properties-system.c| 10 +
 migration/migration-hmp-cmds.c  |  9 
 migration/options.c | 21 ++
 migration/ram.c |  4 
 7 files changed, 79 insertions(+), 3 deletions(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index 51d188b902..83fdef73b9 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -670,6 +670,18 @@
 { 'enum': 'MigMode',
   'data': [ 'normal', 'cpr-reboot' ] }
 
+##
+# @ZeroPageDetection:
+#
+# @none: Do not perform zero page checking.
+#
+# @legacy: Perform zero page checking in main migration thread.
+#
+# Since: 9.0
+##
+{ 'enum': 'ZeroPageDetection',
+  'data': [ 'none', 'legacy' ] }
+
 ##
 # @BitmapMigrationBitmapAliasTransform:
 #
@@ -891,6 +903,10 @@
 # @mode: Migration mode. See description in @MigMode. Default is 'normal'.
 #(Since 8.2)
 #
+# @zero-page-detection: Whether and how to detect zero pages.
+# See description in @ZeroPageDetection.  Default is 'legacy'.
+# (since 9.0)
+#
 # Features:
 #
 # @deprecated: Member @block-incremental is deprecated.  Use
@@ -924,7 +940,8 @@
'block-bitmap-mapping',
{ 'name': 'x-vcpu-dirty-limit-period', 'features': ['unstable'] },
'vcpu-dirty-limit',
-   'mode'] }
+   'mode',
+   'zero-page-detection'] }
 
 ##
 # @MigrateSetParameters:
@@ -1083,6 +1100,10 @@
 # @mode: Migration mode. See description in @MigMode. Default is 'normal'.
 #(Since 8.2)
 #
+# @zero-page-detection: Whether and how to detect zero pages.
+# See description in @ZeroPageDetection.  Default is 'legacy'.
+# (since 9.0)
+#
 # Features:
 #
 # @deprecated: Member @block-incremental is deprecated.  Use
@@ -1136,7 +1157,8 @@
 '*x-vcpu-dirty-limit-period': { 'type': 'uint64',
 'features': [ 'unstable' ] },
 '*vcpu-dirty-limit': 'uint64',
-'*mode': 'MigMode'} }
+'*mode': 'MigMode',
+'*zero-page-detection': 'ZeroPageDetection'} }
 
 ##
 # @migrate-set-parameters:
@@ -1311,6 +1333,10 @@
 # @mode: Migration mode. See description in @MigMode. Default is 'normal'.
 #(Since 8.2)
 #
+# @zero-page-detection: Whether and how to detect zero pages.
+# See description in @ZeroPageDetection.  Default is 'legacy'.
+# (since 9.0)
+#
 # Features:
 #
 # @deprecated: Member @block-incremental is deprecated.  Use
@@ -1361,7 +1387,8 @@
 '*x-vcpu-dirty-limit-period': { 'type': 'uint64',
 'features': [ 'unstable' ] },
 '*vcpu-dirty-limit': 'uint64',
-'*mode': 'MigMode'} }
+'*mode': 'MigMode',
+'*zero-page-detection': 'ZeroPageDetection'} }
 
 ##
 # @query-migrate-parameters:
diff --git a/include/hw/qdev-properties-system.h b/include/hw/qdev-properties-system.h
index 626be87dd3..438f65389f 100644
--- a/include/hw/qdev-properties-system.h
+++ b/include/hw/qdev-properties-system.h
@@ -9,6 +9,7 @@ extern const PropertyInfo qdev_prop_reserved_region;
 extern const PropertyInfo qdev_prop_multifd_compression;
 extern const PropertyInfo qdev_prop_mig_mode;
 extern const PropertyInfo qdev_prop_granule_mode;
+extern const PropertyInfo qdev_prop_zero_page_detection;
 extern const PropertyInfo qdev_prop_losttickpolicy;
 extern const PropertyInfo qdev_prop_blockdev_on_error;
 extern const PropertyInfo qdev_prop_bios_chs_trans;
@@ -50,6 +51,9 @@ extern const PropertyInfo qdev_prop_iothread_vq_mapping_list;
MigMode)
 #define DEFINE_PROP_GRANULE_MODE(_n, _s, _f, _d) \
 DEFINE_PROP_SIGNED(_n, _s, _f, _d, qdev_prop_granule_mode, GranuleMode)
+#define DEFINE_PROP_ZERO_PAGE_DETECTION(_n, _s, _f, _d) \
+DEFINE_PROP_SIGNED(_n, _s, _f, _d, qdev_prop_zero_page_detection, \
+   ZeroPageDetection)
 #define DEFINE_PROP_LOSTTICKPOLICY(_n, _s, _f, _d) \
 DEFINE_PROP_SIGNED(_n, _s, _f, _d, qdev_prop_losttickpolicy, \
 LostTickPolicy)
diff --git a/migration/options.h b/migration/options.h
index b6b69c2bb7..ab8199e207 100644
--- a/migration/options.h
+++ b/migration/options.h
@@ -90,6 +90,7 @@ const char *migrate_tls_authz(void);
 const char *migrate_tls_creds(void);
 const char *migrate_tls_hostname(void);
 uint64_t

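The `zero-page-detection` parameter documented above selects where zero pages are detected. The sketch below is illustrative only: the enum is a hypothetical stand-in for the QAPI-generated `ZeroPageDetection` type (only the 'none' and 'legacy' values exercised by this series' tests are modeled), and `page_is_zero()` is a naive byte loop, not QEMU's optimized buffer check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the QAPI-generated ZeroPageDetection enum. */
typedef enum {
    ZPD_NONE,   /* never look for zero pages */
    ZPD_LEGACY, /* check on the main migration thread */
} ZeroPageDetectionSketch;

/* Naive zero-page test: true when every byte in the page is zero. */
static bool page_is_zero(const unsigned char *page, size_t len)
{
    assert(page != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (page[i] != 0) {
            return false;
        }
    }
    return true;
}

/*
 * Modeled on the check in ram_save_target_page_multifd(): only in
 * 'legacy' mode does the main thread test a page before queueing it.
 */
static bool main_thread_sends_as_zero(ZeroPageDetectionSketch mode,
                                      const unsigned char *page, size_t len)
{
    return mode == ZPD_LEGACY && page_is_zero(page, len);
}
```

With 'none', even an all-zero page is queued as ordinary data.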
[PULL 25/34] migration: privatize colo interfaces

2024-03-11 Thread peterx

From: Steve Sistare 

Remove private migration interfaces from net/colo-compare.c and push them
to migration/colo.c.

Signed-off-by: Steve Sistare 
Link: https://lore.kernel.org/r/1710179338-294359-10-git-send-email-steven.sist...@oracle.com
Signed-off-by: Peter Xu 
---
 migration/colo.c   | 17 +++--
 net/colo-compare.c |  3 +--
 stubs/colo.c   |  1 -
 3 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 315e31fe32..84632a603e 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -63,9 +63,9 @@ static bool colo_runstate_is_stopped(void)
 return runstate_check(RUN_STATE_COLO) || !runstate_is_running();
 }
 
-static void colo_checkpoint_notify(void *opaque)
+static void colo_checkpoint_notify(void)
 {
-MigrationState *s = opaque;
+MigrationState *s = migrate_get_current();
 int64_t next_notify_time;
 
 qemu_event_set(&s->colo_checkpoint_event);
@@ -74,10 +74,15 @@ static void colo_checkpoint_notify(void *opaque)
 timer_mod(s->colo_delay_timer, next_notify_time);
 }
 
+static void colo_checkpoint_notify_timer(void *opaque)
+{
+colo_checkpoint_notify();
+}
+
 void colo_checkpoint_delay_set(void)
 {
 if (migration_in_colo_state()) {
-colo_checkpoint_notify(migrate_get_current());
+colo_checkpoint_notify();
 }
 }
 
@@ -162,7 +167,7 @@ static void primary_vm_do_failover(void)
  * kick COLO thread which might wait at
  * qemu_sem_wait(&s->colo_checkpoint_sem).
  */
-colo_checkpoint_notify(s);
+colo_checkpoint_notify();
 
 /*
  * Wake up COLO thread which may blocked in recv() or send(),
@@ -518,7 +523,7 @@ out:
 
 static void colo_compare_notify_checkpoint(Notifier *notifier, void *data)
 {
-colo_checkpoint_notify(data);
+colo_checkpoint_notify();
 }
 
 static void colo_process_checkpoint(MigrationState *s)
@@ -642,7 +647,7 @@ void migrate_start_colo_process(MigrationState *s)
 bql_unlock();
 qemu_event_init(&s->colo_checkpoint_event, false);
 s->colo_delay_timer =  timer_new_ms(QEMU_CLOCK_HOST,
-colo_checkpoint_notify, s);
+colo_checkpoint_notify_timer, NULL);
 
 qemu_sem_init(&s->colo_exit_sem, 0);
 colo_process_checkpoint(s);
diff --git a/net/colo-compare.c b/net/colo-compare.c
index f2dfc0ebdc..c4ad0ab71f 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -28,7 +28,6 @@
 #include "sysemu/iothread.h"
 #include "net/colo-compare.h"
 #include "migration/colo.h"
-#include "migration/migration.h"
 #include "util.h"
 
 #include "block/aio-wait.h"
@@ -189,7 +188,7 @@ static void colo_compare_inconsistency_notify(CompareState *s)
 notify_remote_frame(s);
 } else {
 notifier_list_notify(&colo_compare_notifiers,
- migrate_get_current());
+ NULL);
 }
 }
 
diff --git a/stubs/colo.c b/stubs/colo.c
index 08c9f982d5..f8c069b739 100644
--- a/stubs/colo.c
+++ b/stubs/colo.c
@@ -2,7 +2,6 @@
 #include "qemu/notify.h"
 #include "net/colo-compare.h"
 #include "migration/colo.h"
-#include "migration/migration.h"
 #include "qemu/error-report.h"
 #include "qapi/qapi-commands-migration.h"
 
-- 
2.44.0

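The colo.c change drops the `void *opaque` argument from the core routine and keeps a thin wrapper for the timer, whose callback type still takes an opaque pointer. A self-contained sketch of that adapter pattern; all names are hypothetical, and `TimerCallback` is only modeled loosely on QEMU's `QEMUTimerCB`:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type, shaped like QEMU's QEMUTimerCB. */
typedef void (*TimerCallback)(void *opaque);

typedef struct {
    int checkpoint_requests;
} MigrationStateSketch;

static MigrationStateSketch current_state;

/* Stand-in for migrate_get_current(): the single global instance. */
static MigrationStateSketch *get_current(void)
{
    return &current_state;
}

/* Core routine: looks up the global state instead of taking opaque. */
static void checkpoint_notify(void)
{
    MigrationStateSketch *s = get_current();

    assert(s != NULL);
    s->checkpoint_requests++;
}

/* Adapter that satisfies the timer signature and ignores opaque. */
static void checkpoint_notify_timer(void *opaque)
{
    (void)opaque;
    checkpoint_notify();
}
```

Direct callers use the argument-free routine, while only the timer registration needs the wrapper.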
[PULL 27/34] migration: purge MigrationState from public interface

2024-03-11 Thread peterx

From: Steve Sistare 

Move remaining MigrationState references from the public file
misc.h to the private file migration.h.

Signed-off-by: Steve Sistare 
Link: https://lore.kernel.org/r/1710179338-294359-12-git-send-email-steven.sist...@oracle.com
Signed-off-by: Peter Xu 
---
 include/migration/misc.h | 6 ++
 migration/migration.h| 6 ++
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index d563d2c801..c9e200f4eb 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -64,7 +64,6 @@ bool migration_is_active(void);
 bool migration_is_device(void);
 bool migration_thread_is_self(void);
 bool migration_is_setup_or_active(void);
-bool migrate_mode_is_cpr(MigrationState *);
 
 typedef enum MigrationEventType {
 MIG_EVENT_PRECOPY_SETUP,
@@ -103,16 +102,15 @@ void migration_add_notifier_mode(NotifierWithReturn *notify,
  MigrationNotifyFunc func, MigMode mode);
 
 void migration_remove_notifier(NotifierWithReturn *notify);
-int migration_call_notifiers(MigrationState *s, MigrationEventType type,
- Error **errp);
-bool migration_has_failed(MigrationState *);
 bool migration_is_running(void);
 void migration_file_set_error(int err);
 
 /* True if incoming migration entered POSTCOPY_INCOMING_DISCARD */
 bool migration_in_incoming_postcopy(void);
+
 /* True if incoming migration entered POSTCOPY_INCOMING_ADVISE */
 bool migration_incoming_postcopy_advised(void);
+
 /* True if background snapshot is active */
 bool migration_in_bg_snapshot(void);
 
diff --git a/migration/migration.h b/migration/migration.h
index e4983db9c9..8045e39c26 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -26,6 +26,7 @@
 #include "qom/object.h"
 #include "postcopy-ram.h"
 #include "sysemu/runstate.h"
+#include "migration/misc.h"
 
 struct PostcopyBlocktimeContext;
 
@@ -479,12 +480,17 @@ bool migrate_has_error(MigrationState *s);
 
 void migrate_fd_connect(MigrationState *s, Error *error_in);
 
+int migration_call_notifiers(MigrationState *s, MigrationEventType type,
+ Error **errp);
+
 int migrate_init(MigrationState *s, Error **errp);
 bool migration_is_blocked(Error **errp);
 /* True if outgoing migration has entered postcopy phase */
 bool migration_in_postcopy(void);
 bool migration_postcopy_is_alive(int state);
 MigrationState *migrate_get_current(void);
+bool migration_has_failed(MigrationState *);
+bool migrate_mode_is_cpr(MigrationState *);
 
 uint64_t ram_get_total_transferred_pages(void);
 
-- 
2.44.0

[PULL 21/34] migration: export vcpu_dirty_limit_period

2024-03-11 Thread peterx

From: Steve Sistare 

Define and export vcpu_dirty_limit_period to eliminate a dependency
on MigrationState.

Signed-off-by: Steve Sistare 
Link: https://lore.kernel.org/r/1710179338-294359-6-git-send-email-steven.sist...@oracle.com
Signed-off-by: Peter Xu 
---
 include/migration/client-options.h | 1 +
 migration/options.c| 7 +++
 system/dirtylimit.c| 3 +--
 3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/include/migration/client-options.h b/include/migration/client-options.h
index 887fea1565..59f4b55cf4 100644
--- a/include/migration/client-options.h
+++ b/include/migration/client-options.h
@@ -20,5 +20,6 @@ bool migrate_switchover_ack(void);
 /* parameters */
 
 MigMode migrate_mode(void);
+uint64_t migrate_vcpu_dirty_limit_period(void);
 
 #endif
diff --git a/migration/options.c b/migration/options.c
index 642cfb00a3..09178c6f60 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -924,6 +924,13 @@ const char *migrate_tls_hostname(void)
 return s->parameters.tls_hostname;
 }
 
+uint64_t migrate_vcpu_dirty_limit_period(void)
+{
+MigrationState *s = migrate_get_current();
+
+return s->parameters.x_vcpu_dirty_limit_period;
+}
+
 uint64_t migrate_xbzrle_cache_size(void)
 {
 MigrationState *s = migrate_get_current();
diff --git a/system/dirtylimit.c b/system/dirtylimit.c
index 1622bb7426..b0afaa0776 100644
--- a/system/dirtylimit.c
+++ b/system/dirtylimit.c
@@ -77,14 +77,13 @@ static bool dirtylimit_quit;
 
 static void vcpu_dirty_rate_stat_collect(void)
 {
-MigrationState *s = migrate_get_current();
 VcpuStat stat;
 int i = 0;
 int64_t period = DIRTYLIMIT_CALC_TIME_MS;
 
 if (migrate_dirty_limit() &&
 migration_is_active()) {
-period = s->parameters.x_vcpu_dirty_limit_period;
+period = migrate_vcpu_dirty_limit_period();
 }
 
 /* calculate vcpu dirtyrate */
-- 
2.44.0

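Exporting an accessor lets `system/dirtylimit.c` read the period without seeing `MigrationState`. A minimal sketch of the same idea; the struct layout, the 1000 ms default and every name ending in `_sketch` are assumptions for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DIRTYLIMIT_CALC_TIME_MS_SKETCH 1000 /* assumed default period */

/* Would live in a public header such as client-options.h. */
uint64_t migrate_vcpu_dirty_limit_period_sketch(void);

/* Private state: in QEMU this sits behind migration/migration.h. */
typedef struct {
    struct {
        uint64_t x_vcpu_dirty_limit_period;
    } parameters;
} MigrationStateSketch;

static MigrationStateSketch state = { .parameters = { 500 } }; /* example value */

/* Public accessor: callers never touch MigrationStateSketch directly. */
uint64_t migrate_vcpu_dirty_limit_period_sketch(void)
{
    return state.parameters.x_vcpu_dirty_limit_period;
}

/* Caller-side logic modeled on vcpu_dirty_rate_stat_collect(). */
static int64_t stat_period_ms(bool dirty_limit_enabled, bool migration_active)
{
    int64_t period = DIRTYLIMIT_CALC_TIME_MS_SKETCH;

    if (dirty_limit_enabled && migration_active) {
        period = (int64_t)migrate_vcpu_dirty_limit_period_sketch();
    }
    assert(period > 0);
    return period;
}
```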
[PULL 15/34] migration: Fix format in error message

2024-03-11 Thread peterx

From: Anthony PERARD 

In file_write_ramblock_iov(), "offset" is "uintptr_t" and not
"ram_addr_t". While usually they are both equivalent, this is not the
case with CONFIG_XEN_BACKEND.

Use the matching format. This fixes the build on 32-bit hosts.

Fixes: f427d90b9898 ("migration/multifd: Support outgoing mapped-ram stream format")
Signed-off-by: Anthony PERARD 
Link: https://lore.kernel.org/r/20240311123439.16844-1-anthony.per...@citrix.com
Signed-off-by: Peter Xu 
---
 migration/file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/file.c b/migration/file.c
index 164b079966..5054a60851 100644
--- a/migration/file.c
+++ b/migration/file.c
@@ -191,7 +191,7 @@ int file_write_ramblock_iov(QIOChannel *ioc, const struct iovec *iov,
  */
 offset = (uintptr_t) iov[slice_idx].iov_base - (uintptr_t) block->host;
 if (offset >= block->used_length) {
-error_setg(errp, "offset " RAM_ADDR_FMT
+error_setg(errp, "offset %" PRIxPTR
"outside of ramblock %s range", offset, block->idstr);
 ret = -1;
 break;
-- 
2.44.0

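A short sketch of the portable way to print a `uintptr_t`: the `PRIxPTR` macro from `<inttypes.h>` expands to the correct length modifier for the host, so the same format works on 32-bit and 64-bit builds. The helper name is hypothetical; only the message wording is modeled on the fixed `error_setg()` call.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Format an out-of-range offset with the conversion that matches uintptr_t. */
static int format_offset_error(char *buf, size_t size, uintptr_t offset,
                               const char *idstr)
{
    assert(buf != NULL && idstr != NULL);
    return snprintf(buf, size,
                    "offset %" PRIxPTR " outside of ramblock %s range",
                    offset, idstr);
}
```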
[PULL 29/34] migration/multifd: Allow clearing of the file_bmap from multifd

2024-03-11 Thread peterx

From: Fabiano Rosas 

We currently only need to clear the mapped-ram file bitmap from the
migration thread during save_zero_page.

We're about to add support for zero page detection on the multifd
thread, so allow ramblock_set_file_bmap_atomic() to also clear the
bits.

Signed-off-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240311180015.3359271-3-hao.xi...@linux.dev
Signed-off-by: Peter Xu 
---
 migration/ram.h | 3 ++-
 migration/multifd.c | 2 +-
 migration/ram.c | 8 ++--
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/migration/ram.h b/migration/ram.h
index b9ac0da587..08feecaf51 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -75,7 +75,8 @@ bool ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb, Error **errp);
 bool ramblock_page_is_discarded(RAMBlock *rb, ram_addr_t start);
 void postcopy_preempt_shutdown_file(MigrationState *s);
 void *postcopy_preempt_thread(void *opaque);
-void ramblock_set_file_bmap_atomic(RAMBlock *block, ram_addr_t offset);
+void ramblock_set_file_bmap_atomic(RAMBlock *block, ram_addr_t offset,
+   bool set);
 
 /* ram cache */
 int colo_init_ram_cache(void);
diff --git a/migration/multifd.c b/migration/multifd.c
index bf9d483f7a..3ba922694e 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -115,7 +115,7 @@ static void multifd_set_file_bitmap(MultiFDSendParams *p)
 assert(pages->block);
 
 for (int i = 0; i < p->pages->num; i++) {
-ramblock_set_file_bmap_atomic(pages->block, pages->offset[i]);
+ramblock_set_file_bmap_atomic(pages->block, pages->offset[i], true);
 }
 }
 
diff --git a/migration/ram.c b/migration/ram.c
index 3ee8cb47d3..dec2e73f8e 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3149,9 +3149,13 @@ static void ram_save_file_bmap(QEMUFile *f)
 }
 }
 
-void ramblock_set_file_bmap_atomic(RAMBlock *block, ram_addr_t offset)
+void ramblock_set_file_bmap_atomic(RAMBlock *block, ram_addr_t offset, bool set)
 {
-set_bit_atomic(offset >> TARGET_PAGE_BITS, block->file_bmap);
+if (set) {
+set_bit_atomic(offset >> TARGET_PAGE_BITS, block->file_bmap);
+} else {
+clear_bit_atomic(offset >> TARGET_PAGE_BITS, block->file_bmap);
+}
 }
 
 /**
-- 
2.44.0

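The new `set` argument lets one helper both set and clear a page's bit from any thread. A C11-atomics sketch of that pattern; the 4 KiB page shift, the word-sized bitmap layout and the function names are assumptions, and QEMU itself uses its own `set_bit_atomic()`/`clear_bit_atomic()` helpers:

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BITS_SKETCH 12 /* assumed 4 KiB target pages */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Atomically set or clear the bit for the page containing @offset. */
static void bmap_update_atomic(_Atomic unsigned long *bmap, uint64_t offset,
                               bool set)
{
    uint64_t page = offset >> PAGE_BITS_SKETCH;
    unsigned long mask = 1UL << (page % BITS_PER_WORD);
    _Atomic unsigned long *word = &bmap[page / BITS_PER_WORD];

    assert(bmap != NULL);
    if (set) {
        atomic_fetch_or(word, mask);
    } else {
        atomic_fetch_and(word, ~mask);
    }
}

/* Read back the bit for the page containing @offset. */
static bool bmap_test(_Atomic unsigned long *bmap, uint64_t offset)
{
    uint64_t page = offset >> PAGE_BITS_SKETCH;

    return (atomic_load(&bmap[page / BITS_PER_WORD]) >>
            (page % BITS_PER_WORD)) & 1UL;
}
```

Read-modify-write atomics matter here because neighboring pages share a word, so concurrent threads touching different pages could otherwise overwrite each other's bits.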
[PULL 05/34] migration: Report error when shutdown fails

2024-03-11 Thread peterx

From: Cédric Le Goater 

This helps detect issues with I/O channel usage.

Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Peter Xu 
Signed-off-by: Cédric Le Goater 
Link: https://lore.kernel.org/r/20240304122844.1888308-7-...@redhat.com
Signed-off-by: Peter Xu 
---
 migration/qemu-file.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index b10c882629..a10882d47f 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -63,6 +63,8 @@ struct QEMUFile {
  */
 int qemu_file_shutdown(QEMUFile *f)
 {
+Error *err = NULL;
+
 /*
  * We must set qemufile error before the real shutdown(), otherwise
  * there can be a race window where we thought IO all went though
@@ -91,7 +93,8 @@ int qemu_file_shutdown(QEMUFile *f)
 return -ENOSYS;
 }
 
-if (qio_channel_shutdown(f->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL) < 0) {
+if (qio_channel_shutdown(f->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, &err) < 0) {
+error_report_err(err);
 return -EIO;
 }
 
-- 
2.44.0

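The point of the change is to stop discarding the failure reason. A plain POSIX analogue, not QEMU's `Error` API: call `shutdown(2)` and, on failure, record the `strerror()` text before returning `-EIO`. The function and its message buffer are hypothetical; the buffer stands in for the reported `Error`.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Shut down both directions of @fd. On failure, write a readable reason
 * into @msg instead of silently dropping it, then return -EIO.
 */
static int shutdown_and_report(int fd, char *msg, size_t msg_size)
{
    assert(msg != NULL && msg_size > 0);
    msg[0] = '\0';
    if (shutdown(fd, SHUT_RDWR) < 0) {
        snprintf(msg, msg_size, "shutdown failed: %s", strerror(errno));
        return -EIO;
    }
    return 0;
}
```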
[PULL 32/34] migration/multifd: Implement ram_save_target_page_multifd to handle multifd version of MigrationOps::ram_save_target_page.

2024-03-11 Thread peterx

From: Hao Xiang 

1. Add a dedicated handler for MigrationOps::ram_save_target_page in
multifd live migration.
2. Refactor ram_save_target_page_legacy so that the legacy and multifd
handlers don't have internal functions calling into each other.

Signed-off-by: Hao Xiang 
Reviewed-by: Fabiano Rosas 
Message-Id: <20240226195654.934709-4-hao.xi...@bytedance.com>
Link: https://lore.kernel.org/r/20240311180015.3359271-6-hao.xi...@linux.dev
Signed-off-by: Peter Xu 
---
 migration/ram.c | 38 +-
 1 file changed, 29 insertions(+), 9 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index c26435adc7..8deb84984f 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2079,7 +2079,6 @@ static bool save_compress_page(RAMState *rs, PageSearchStatus *pss,
  */
 static int ram_save_target_page_legacy(RAMState *rs, PageSearchStatus *pss)
 {
-RAMBlock *block = pss->block;
 ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
 int res;
 
@@ -2095,17 +2094,33 @@ static int ram_save_target_page_legacy(RAMState *rs, PageSearchStatus *pss)
 return 1;
 }
 
+return ram_save_page(rs, pss);
+}
+
+/**
+ * ram_save_target_page_multifd: send one target page to multifd workers
+ *
+ * Returns 1 if the page was queued, -1 otherwise.
+ *
+ * @rs: current RAM state
+ * @pss: data about the page we want to send
+ */
+static int ram_save_target_page_multifd(RAMState *rs, PageSearchStatus *pss)
+{
+RAMBlock *block = pss->block;
+ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
+
 /*
- * Do not use multifd in postcopy as one whole host page should be
- * placed.  Meanwhile postcopy requires atomic update of pages, so even
- * if host page size == guest page size the dest guest during run may
- * still see partially copied pages which is data corruption.
+ * While using multifd live migration, we still need to handle zero
+ * page checking on the migration main thread.
  */
-if (migrate_multifd() && !migration_in_postcopy()) {
-return ram_save_multifd_page(block, offset);
+if (migrate_zero_page_detection() == ZERO_PAGE_DETECTION_LEGACY) {
+if (save_zero_page(rs, pss, offset)) {
+return 1;
+}
 }
 
-return ram_save_page(rs, pss);
+return ram_save_multifd_page(block, offset);
 }
 
 /* Should be called before sending a host page */
@@ -3112,7 +3127,12 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 }
 
 migration_ops = g_malloc0(sizeof(MigrationOps));
-migration_ops->ram_save_target_page = ram_save_target_page_legacy;
+
+if (migrate_multifd()) {
+migration_ops->ram_save_target_page = ram_save_target_page_multifd;
+} else {
+migration_ops->ram_save_target_page = ram_save_target_page_legacy;
+}
 
 bql_unlock();
 ret = multifd_send_sync_main();
-- 
2.44.0

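`ram_save_setup()` now picks the per-page handler once, so the per-page path calls through a function pointer instead of checking `migrate_multifd()` for every page. A stripped-down sketch of that dispatch; the types and handlers are hypothetical stand-ins that only count which path ran:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct {
    int legacy_pages;
    int multifd_pages;
} RAMStateSketch;

typedef struct {
    /* Returns 1 when the page was handled, as the documented handlers do. */
    int (*ram_save_target_page)(RAMStateSketch *rs);
} MigrationOpsSketch;

static int save_target_page_legacy(RAMStateSketch *rs)
{
    rs->legacy_pages++;
    return 1;
}

static int save_target_page_multifd(RAMStateSketch *rs)
{
    rs->multifd_pages++;
    return 1;
}

/* Chosen once at setup time, mirroring the branch in ram_save_setup(). */
static MigrationOpsSketch *ops_setup(bool multifd)
{
    MigrationOpsSketch *ops = calloc(1, sizeof(*ops));

    assert(ops != NULL);
    ops->ram_save_target_page = multifd ? save_target_page_multifd
                                        : save_target_page_legacy;
    return ops;
}
```

Keeping the two handlers independent, as the commit message says, means neither path has to know about the other's special cases.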
[PULL 07/34] migration: Add documentation for SaveVMHandlers

2024-03-11 Thread peterx

From: Cédric Le Goater 

The SaveVMHandlers structure is still in use for complex subsystems
and devices. Document the handlers since we are going to modify a few
later.

Reviewed-by: Peter Xu 
Signed-off-by: Cédric Le Goater 
Link: https://lore.kernel.org/r/20240304122844.1888308-9-...@redhat.com
Signed-off-by: Peter Xu 
---
 include/migration/register.h | 263 +++
 1 file changed, 237 insertions(+), 26 deletions(-)

diff --git a/include/migration/register.h b/include/migration/register.h
index 2e6a7d766e..d7b70a8be6 100644
--- a/include/migration/register.h
+++ b/include/migration/register.h
@@ -16,30 +16,130 @@
 
 #include "hw/vmstate-if.h"
 
+/**
+ * struct SaveVMHandlers: handler structure to finely control
+ * migration of complex subsystems and devices, such as RAM, block and
+ * VFIO.
+ */
 typedef struct SaveVMHandlers {
-/* This runs inside the BQL.  */
+
+/* The following handlers run inside the BQL. */
+
+/**
+ * @save_state
+ *
+ * Saves state section on the source using the latest state format
+ * version.
+ *
+ * Legacy method. Should be deprecated when all users are ported
+ * to VMStateDescription.
+ *
+ * @f: QEMUFile where to send the data
+ * @opaque: data pointer passed to register_savevm_live()
+ */
 void (*save_state)(QEMUFile *f, void *opaque);
 
-/*
- * save_prepare is called early, even before migration starts, and can be
- * used to perform early checks.
+/**
+ * @save_prepare
+ *
+ * Called early, even before migration starts, and can be used to
+ * perform early checks.
+ *
+ * @opaque: data pointer passed to register_savevm_live()
+ * @errp: pointer to Error*, to store an error if it happens.
+ *
+ * Returns zero to indicate success and negative for error
  */
 int (*save_prepare)(void *opaque, Error **errp);
+
+/**
+ * @save_setup
+ *
+ * Initializes the data structures on the source and transmits
+ * first section containing information on the device
+ *
+ * @f: QEMUFile where to send the data
+ * @opaque: data pointer passed to register_savevm_live()
+ *
+ * Returns zero to indicate success and negative for error
+ */
 int (*save_setup)(QEMUFile *f, void *opaque);
+
+/**
+ * @save_cleanup
+ *
+ * Uninitializes the data structures on the source
+ *
+ * @opaque: data pointer passed to register_savevm_live()
+ */
 void (*save_cleanup)(void *opaque);
+
+/**
+ * @save_live_complete_postcopy
+ *
+ * Called at the end of postcopy for all postcopyable devices.
+ *
+ * @f: QEMUFile where to send the data
+ * @opaque: data pointer passed to register_savevm_live()
+ *
+ * Returns zero to indicate success and negative for error
+ */
 int (*save_live_complete_postcopy)(QEMUFile *f, void *opaque);
+
+/**
+ * @save_live_complete_precopy
+ *
+ * Transmits the last section for the device containing any
+ * remaining data at the end of a precopy phase. When postcopy is
+ * enabled, devices that support postcopy will skip this step,
+ * where the final data will be flushed at the end of postcopy via
+ * @save_live_complete_postcopy instead.
+ *
+ * @f: QEMUFile where to send the data
+ * @opaque: data pointer passed to register_savevm_live()
+ *
+ * Returns zero to indicate success and negative for error
+ */
 int (*save_live_complete_precopy)(QEMUFile *f, void *opaque);
 
 /* This runs both outside and inside the BQL.  */
+
+/**
+ * @is_active
+ *
+ * Will skip a state section if not active
+ *
+ * @opaque: data pointer passed to register_savevm_live()
+ *
+ * Returns true if state section is active else false
+ */
 bool (*is_active)(void *opaque);
+
+/**
+ * @has_postcopy
+ *
+ * Checks if a device supports postcopy
+ *
+ * @opaque: data pointer passed to register_savevm_live()
+ *
+ * Returns true for postcopy support else false
+ */
 bool (*has_postcopy)(void *opaque);
 
-/* is_active_iterate
- * If it is not NULL then qemu_savevm_state_iterate will skip iteration if
- * it returns false. For example, it is needed for only-postcopy-states,
- * which needs to be handled by qemu_savevm_state_setup and
- * qemu_savevm_state_pending, but do not need iterations until not in
- * postcopy stage.
+/**
+ * @is_active_iterate
+ *
+ * As #SaveVMHandlers.is_active(), will skip an inactive state
+ * section in qemu_savevm_state_iterate.
+ *
+ * For example, it is needed for only-postcopy-states, which needs
+ * to be handled by qemu_savevm_state_setup() and
+ * qemu_savevm_state_pending(), but do not need iterations until
+ * not in postcopy stage.
+ *
+ * @opaque: data pointer passed to register_savevm_live()
+ *

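The documented handlers are optional callbacks, and the savevm core skips a section when `is_active`, or during iteration `is_active_iterate`, returns false. A simplified sketch of that skip rule, modeled on how `qemu_savevm_state_iterate` is described above; the struct, the example callbacks and `should_iterate()` are hypothetical, and `QEMUFile` is omitted:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool (*is_active)(void *opaque);
    bool (*is_active_iterate)(void *opaque);
    int (*save_live_iterate)(void *opaque);
} HandlersSketch;

/* Example callbacks for exercising the rule. */
static bool always_active(void *opaque) { (void)opaque; return true; }
static bool never_active(void *opaque) { (void)opaque; return false; }
static int iterate_noop(void *opaque) { (void)opaque; return 0; }

/* Decide whether one iteration pass should call this section. */
static bool should_iterate(const HandlersSketch *h, void *opaque)
{
    assert(h != NULL);
    if (!h->save_live_iterate) {
        return false; /* nothing to iterate */
    }
    if (h->is_active && !h->is_active(opaque)) {
        return false; /* inactive sections are skipped entirely */
    }
    if (h->is_active_iterate && !h->is_active_iterate(opaque)) {
        return false; /* e.g. a postcopy-only section before postcopy */
    }
    return true;
}
```

A NULL predicate counts as "active", which is why the checks are guarded by a pointer test first.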
[PULL 18/34] migration: export migration_is_setup_or_active

2024-03-11 Thread peterx

From: Steve Sistare 

Delete the MigrationState parameter from migration_is_setup_or_active
and move it to the public API in misc.h.

Signed-off-by: Steve Sistare 
Reviewed-by: Philippe Mathieu-Daudé 
Link: https://lore.kernel.org/r/1710179338-294359-3-git-send-email-steven.sist...@oracle.com
Signed-off-by: Peter Xu 
---
 include/migration/misc.h |  1 +
 migration/migration.h|  1 -
 hw/vfio/common.c |  2 +-
 migration/migration.c| 12 ++--
 migration/ram.c  |  5 ++---
 net/vhost-vdpa.c |  3 +--
 6 files changed, 11 insertions(+), 13 deletions(-)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index 4c226a40bb..79cff6224e 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -61,6 +61,7 @@ void migration_object_init(void);
 void migration_shutdown(void);
 bool migration_is_idle(void);
 bool migration_is_active(MigrationState *);
+bool migration_is_setup_or_active(void);
 bool migrate_mode_is_cpr(MigrationState *);
 
 typedef enum MigrationEventType {
diff --git a/migration/migration.h b/migration/migration.h
index 65c0b61cbd..736460aa8b 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -479,7 +479,6 @@ bool migrate_has_error(MigrationState *s);
 
 void migrate_fd_connect(MigrationState *s, Error *error_in);
 
-bool migration_is_setup_or_active(int state);
 bool migration_is_running(int state);
 
 int migrate_init(MigrationState *s, Error **errp);
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 059bfdc07a..896eab8103 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -152,7 +152,7 @@ static void vfio_set_migration_error(int err)
 {
 MigrationState *ms = migrate_get_current();
 
-if (migration_is_setup_or_active(ms->state)) {
+if (migration_is_setup_or_active()) {
 WITH_QEMU_LOCK_GUARD(&ms->qemu_file_lock) {
 if (ms->to_dst_file) {
 qemu_file_set_error(ms->to_dst_file, err);
diff --git a/migration/migration.c b/migration/migration.c
index a49fcd53ee..af21403bad 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1081,9 +1081,11 @@ void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t value)
  * Return true if we're already in the middle of a migration
  * (i.e. any of the active or setup states)
  */
-bool migration_is_setup_or_active(int state)
+bool migration_is_setup_or_active(void)
 {
-switch (state) {
+MigrationState *s = current_migration;
+
+switch (s->state) {
 case MIGRATION_STATUS_ACTIVE:
 case MIGRATION_STATUS_POSTCOPY_ACTIVE:
 case MIGRATION_STATUS_POSTCOPY_PAUSED:
@@ -1601,10 +1603,8 @@ bool migration_incoming_postcopy_advised(void)
 
 bool migration_in_bg_snapshot(void)
 {
-MigrationState *s = migrate_get_current();
-
 return migrate_background_snapshot() &&
-migration_is_setup_or_active(s->state);
+   migration_is_setup_or_active();
 }
 
 bool migration_is_idle(void)
@@ -2297,7 +2297,7 @@ static void *source_return_path_thread(void *opaque)
 trace_source_return_path_thread_entry();
 rcu_register_thread();
 
-while (migration_is_setup_or_active(ms->state)) {
+while (migration_is_setup_or_active()) {
 trace_source_return_path_thread_loop_top();
 
 header_type = qemu_get_be16(rp);
diff --git a/migration/ram.c b/migration/ram.c
index 2cd936d9ce..3ee8cb47d3 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2909,10 +2909,9 @@ void qemu_guest_free_page_hint(void *addr, size_t len)
 RAMBlock *block;
 ram_addr_t offset;
 size_t used_len, start, npages;
-MigrationState *s = migrate_get_current();
 
 /* This function is currently expected to be used during live migration */
-if (!migration_is_setup_or_active(s->state)) {
+if (!migration_is_setup_or_active()) {
 return;
 }
 
@@ -3263,7 +3262,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
 
 out:
 if (ret >= 0
-&& migration_is_setup_or_active(migrate_get_current()->state)) {
+&& migration_is_setup_or_active()) {
 if (migrate_multifd() && migrate_multifd_flush_after_each_section() &&
 !migrate_mapped_ram()) {
 ret = multifd_send_sync_main();
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index e6bdb4562d..8564817073 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -26,7 +26,6 @@
 #include 
 #include "standard-headers/linux/virtio_net.h"
 #include "monitor/monitor.h"
-#include "migration/migration.h"
 #include "migration/misc.h"
 #include "hw/virtio/vhost.h"
 
@@ -355,7 +354,7 @@ static int vhost_vdpa_net_data_start(NetClientState *nc)
 assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
 
 if (s->always_svq ||
-migration_is_setup_or_active(migrate_get_current()->state)) {
+migration_is_setup_or_active()) {
 v->shadow_vqs_enabled = true;
 } else {
 v->shadow_vqs_enabled = false;
-- 
2.44.0

[PULL 22/34] migration: migration_thread_is_self

2024-03-11 Thread peterx

From: Steve Sistare 

Define and export migration_thread_is_self to eliminate a dependency
on MigrationState.

Signed-off-by: Steve Sistare 
Link: https://lore.kernel.org/r/1710179338-294359-7-git-send-email-steven.sist...@oracle.com
Signed-off-by: Peter Xu 
---
 include/migration/misc.h | 1 +
 migration/migration.c| 7 +++
 system/dirtylimit.c  | 5 +
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index 7526977de6..c4b5416357 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -61,6 +61,7 @@ void migration_object_init(void);
 void migration_shutdown(void);
 bool migration_is_idle(void);
 bool migration_is_active(void);
+bool migration_thread_is_self(void);
 bool migration_is_setup_or_active(void);
 bool migrate_mode_is_cpr(MigrationState *);
 
diff --git a/migration/migration.c b/migration/migration.c
index 546ba86c63..afe72af0b1 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1647,6 +1647,13 @@ bool migration_is_active(void)
 s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
 }
 
+bool migration_thread_is_self(void)
+{
+MigrationState *s = current_migration;
+
+return qemu_thread_is_self(&s->thread);
+}
+
 bool migrate_mode_is_cpr(MigrationState *s)
 {
 return s->parameters.mode == MIG_MODE_CPR_REBOOT;
diff --git a/system/dirtylimit.c b/system/dirtylimit.c
index b0afaa0776..ab20da34bb 100644
--- a/system/dirtylimit.c
+++ b/system/dirtylimit.c
@@ -25,7 +25,6 @@
 #include "sysemu/kvm.h"
 #include "trace.h"
 #include "migration/misc.h"
-#include "migration/migration.h"
 
 /*
  * Dirtylimit stop working if dirty page rate error
@@ -448,10 +447,8 @@ static void dirtylimit_cleanup(void)
  */
 static bool dirtylimit_is_allowed(void)
 {
-MigrationState *ms = migrate_get_current();
-
 if (migration_is_running() &&
-(!qemu_thread_is_self(&ms->thread)) &&
+!migration_thread_is_self() &&
 migrate_dirty_limit() &&
 dirtylimit_in_service()) {
 return false;
-- 
2.44.0

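`migration_thread_is_self()` wraps the thread-identity check so callers need not reach into `MigrationState`. A POSIX-threads sketch of the same check; it uses `pthread_self()` and `pthread_equal()` directly rather than QEMU's `QemuThread` wrapper, and the registration function and global are hypothetical:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

static pthread_t migration_thread_sketch;

/* Record the calling thread as the migration thread. */
static void register_migration_thread(void)
{
    migration_thread_sketch = pthread_self();
}

/* True only when called from the registered migration thread. */
static bool migration_thread_is_self_sketch(void)
{
    return pthread_equal(pthread_self(), migration_thread_sketch) != 0;
}

/* Runs on a second thread and reports what the check returned there. */
static void *check_from_other_thread(void *arg)
{
    bool *result = arg;

    assert(result != NULL);
    *result = migration_thread_is_self_sketch();
    return NULL;
}
```

`pthread_equal()` is used instead of `==` because `pthread_t` is an opaque type that POSIX does not guarantee to be comparable with `==`.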
[PULL 10/34] migration/rdma: Fix a memory issue for migration

2024-03-11 Thread peterx

From: Yu Zhang 

Commit 3fa9642ff7 converted the RDMA backend to accept a MigrateAddress
struct. However, its assignment of "host" leads to data corruption on
the target host and migration failure.

isock->host = rdma->host;

Allocating the memory for it explicitly with g_strdup_printf() fixes the
issue, and migration no longer fails.

Fixes: 3fa9642ff7 ("migration: convert rdma backend to accept MigrateAddress")
Cc: qemu-stable 
Cc: Li Zhijian 
Link: https://lore.kernel.org/r/CAHEcVy4L_D6tuhJ8h=xlr4wapaprje3nnxzaeyunotrxq6c...@mail.gmail.com
Signed-off-by: Yu Zhang 
[peterx: use g_strdup() instead of g_strdup_printf(), per Zhijian]
Signed-off-by: Peter Xu 
---
 migration/rdma.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index a355dcea89..855753c671 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3357,7 +3357,7 @@ static int qemu_rdma_accept(RDMAContext *rdma)
 goto err_rdma_dest_wait;
 }
 
-isock->host = rdma->host;
+isock->host = g_strdup(rdma->host);
 isock->port = g_strdup_printf("%d", rdma->port);
 
 /*
-- 
2.44.0

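Assigning the pointer made `isock->host` and `rdma->host` share one buffer, so whichever owner freed it first left the other pointing at released memory. A plain-C sketch of the fix, giving the copy its own allocation; `strdup()` stands in for GLib's `g_strdup()`, and the struct and helpers are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *host; /* owned: released by inet_addr_free_sketch() */
} InetAddrSketch;

/* Give @addr its own copy instead of aliasing the caller's buffer. */
static int inet_addr_set_host(InetAddrSketch *addr, const char *host)
{
    char *copy;

    assert(addr != NULL && host != NULL);
    copy = strdup(host);
    if (!copy) {
        return -1;
    }
    addr->host = copy;
    return 0;
}

static void inet_addr_free_sketch(InetAddrSketch *addr)
{
    free(addr->host);
    addr->host = NULL;
}
```

After the copy, each owner can free its own string independently.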
[PULL 28/34] migration/multifd: Allow zero pages in file migration

2024-03-11 Thread peterx

From: Fabiano Rosas 

Currently, it's an error to have no data pages in the multifd file
migration because zero page detection is done in the migration thread
and zero pages don't reach multifd. This is enforced with the
pages->num assert.

We're about to add zero page detection on the multifd thread. Fix
file_write_ramblock_iov() so that it no longer treats p->iovs_num == 0
as an error.

Signed-off-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240311180015.3359271-2-hao.xi...@linux.dev
Signed-off-by: Peter Xu 
---
 migration/file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/file.c b/migration/file.c
index 5054a60851..b0b963e0ce 100644
--- a/migration/file.c
+++ b/migration/file.c
@@ -159,7 +159,7 @@ void file_start_incoming_migration(FileMigrationArgs *file_args, Error **errp)
 int file_write_ramblock_iov(QIOChannel *ioc, const struct iovec *iov,
 int niov, RAMBlock *block, Error **errp)
 {
-ssize_t ret = -1;
+ssize_t ret = 0;
 int i, slice_idx, slice_num;
 uintptr_t base, next, offset;
 size_t len;
-- 
2.44.0

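Why the initial value matters: when there is nothing to write, the loop body never runs and the function returns whatever `ret` started as. A minimal sketch with a hypothetical writer that only sums slice lengths and performs no I/O; only the "empty input is success" rule mirrors the change:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical writer: fails on a NULL slice, otherwise tallies bytes. */
static ssize_t write_slices(const struct iovec *iov, int niov, size_t *written)
{
    ssize_t ret = 0; /* starting at -1 would report niov == 0 as failure */

    assert(written != NULL);
    *written = 0;
    for (int i = 0; i < niov; i++) {
        if (iov[i].iov_base == NULL) {
            ret = -1;
            break;
        }
        *written += iov[i].iov_len;
    }
    return ret;
}
```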
[PULL 34/34] migration/multifd: Add new migration test cases for legacy zero page checking.

2024-03-11 Thread peterx

From: Hao Xiang 

Now that zero page checking is done on the multifd sender threads by
default, we still provide an option for backward compatibility. This
change adds qtest migration test cases that run multifd migration with
the zero-page-detection option set to "legacy" (zero page checking on
the migration main thread) and to "none".

Signed-off-by: Hao Xiang 
Reviewed-by: Peter Xu 
Link: https://lore.kernel.org/r/20240311180015.3359271-8-hao.xi...@linux.dev
Signed-off-by: Peter Xu 
---
 tests/qtest/migration-test.c | 52 
 1 file changed, 52 insertions(+)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 4023d808f9..71895abb7f 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -2771,6 +2771,24 @@ test_migrate_precopy_tcp_multifd_start(QTestState *from,
 return test_migrate_precopy_tcp_multifd_start_common(from, to, "none");
 }
 
+static void *
+test_migrate_precopy_tcp_multifd_start_zero_page_legacy(QTestState *from,
+QTestState *to)
+{
+test_migrate_precopy_tcp_multifd_start_common(from, to, "none");
+migrate_set_parameter_str(from, "zero-page-detection", "legacy");
+return NULL;
+}
+
+static void *
+test_migration_precopy_tcp_multifd_start_no_zero_page(QTestState *from,
+  QTestState *to)
+{
+test_migrate_precopy_tcp_multifd_start_common(from, to, "none");
+migrate_set_parameter_str(from, "zero-page-detection", "none");
+return NULL;
+}
+
 static void *
 test_migrate_precopy_tcp_multifd_zlib_start(QTestState *from,
 QTestState *to)
@@ -2812,6 +2830,36 @@ static void test_multifd_tcp_none(void)
 test_precopy_common(&args);
 }
 
+static void test_multifd_tcp_zero_page_legacy(void)
+{
+MigrateCommon args = {
+.listen_uri = "defer",
+.start_hook = test_migrate_precopy_tcp_multifd_start_zero_page_legacy,
+/*
+ * Multifd is more complicated than most of the features, it
+ * directly takes guest page buffers when sending, make sure
+ * everything will work alright even if guest page is changing.
+ */
+.live = true,
+};
+test_precopy_common(&args);
+}
+
+static void test_multifd_tcp_no_zero_page(void)
+{
+MigrateCommon args = {
+.listen_uri = "defer",
+.start_hook = test_migration_precopy_tcp_multifd_start_no_zero_page,
+/*
+ * Multifd is more complicated than most of the features, it
+ * directly takes guest page buffers when sending, make sure
+ * everything will work alright even if guest page is changing.
+ */
+.live = true,
+};
+test_precopy_common();
+}
+
 static void test_multifd_tcp_zlib(void)
 {
 MigrateCommon args = {
@@ -3729,6 +3777,10 @@ int main(int argc, char **argv)
 }
 migration_test_add("/migration/multifd/tcp/plain/none",
test_multifd_tcp_none);
+migration_test_add("/migration/multifd/tcp/plain/zero-page/legacy",
+   test_multifd_tcp_zero_page_legacy);
+migration_test_add("/migration/multifd/tcp/plain/zero-page/none",
+   test_multifd_tcp_no_zero_page);
 migration_test_add("/migration/multifd/tcp/plain/cancel",
test_multifd_tcp_cancel);
 migration_test_add("/migration/multifd/tcp/plain/zlib",
-- 
2.44.0

[PULL 26/34] migration: delete unused accessors

2024-03-11 Thread peterx

From: Steve Sistare 

Signed-off-by: Steve Sistare 
Link: https://lore.kernel.org/r/1710179338-294359-11-git-send-email-steven.sist...@oracle.com
Signed-off-by: Peter Xu 
---
 include/migration/misc.h |  3 ---
 migration/migration.c| 10 --
 2 files changed, 13 deletions(-)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index e521cd5229..d563d2c801 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -105,13 +105,10 @@ void migration_add_notifier_mode(NotifierWithReturn *notify,
 void migration_remove_notifier(NotifierWithReturn *notify);
 int migration_call_notifiers(MigrationState *s, MigrationEventType type,
  Error **errp);
-bool migration_in_setup(MigrationState *);
-bool migration_has_finished(MigrationState *);
 bool migration_has_failed(MigrationState *);
 bool migration_is_running(void);
 void migration_file_set_error(int err);
 
-/* ...and after the device transmission */
 /* True if incoming migration entered POSTCOPY_INCOMING_DISCARD */
 bool migration_in_incoming_postcopy(void);
 /* True if incoming migration entered POSTCOPY_INCOMING_ADVISE */
diff --git a/migration/migration.c b/migration/migration.c
index 216f63d62b..644e073b7d 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1548,16 +1548,6 @@ int migration_call_notifiers(MigrationState *s, MigrationEventType type,
 return ret;
 }
 
-bool migration_in_setup(MigrationState *s)
-{
-return s->state == MIGRATION_STATUS_SETUP;
-}
-
-bool migration_has_finished(MigrationState *s)
-{
-return s->state == MIGRATION_STATUS_COMPLETED;
-}
-
 bool migration_has_failed(MigrationState *s)
 {
 return (s->state == MIGRATION_STATUS_CANCELLED ||
-- 
2.44.0

[PULL 09/34] migration/multifd: Don't fsync when closing QIOChannelFile

2024-03-11 Thread peterx

From: Fabiano Rosas 

Commit bc38feddeb ("io: fsync before closing a file channel") added a
fsync/fdatasync at the closing point of the QIOChannelFile to ensure
integrity of the migration stream in case of QEMU crash.

The decision to do the sync at qio_channel_close() was not the best
since that function runs in the main thread and the fsync can cause
QEMU to hang for several minutes, depending on the migration size and
disk speed.

To fix the hang, remove the fsync from qio_channel_file_close().

At this moment, the migration code is the only user of the fsync and
we're taking the tradeoff of not having a sync at all, leaving the
responsibility to the upper layers.
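With the fdatasync() removed from qio_channel_file_close(), an application that needs the migration file to be durable has to flush it itself once QEMU has finished writing. The following is a minimal POSIX sketch of that upper-layer responsibility; the helper name `sync_and_close()` is hypothetical and this is not QEMU code. It flushes file data before closing and always releases the descriptor, even when the flush fails.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Flush file data to the storage device, then close the descriptor.
 * Returns 0 on success or -1 with errno set; the descriptor is always
 * closed, even if the flush fails.
 */
static int sync_and_close(int fd)
{
    int ret = 0;
    int saved_errno = 0;

    assert(fd >= 0);
    if (fdatasync(fd) < 0) {
        ret = -1;
        saved_errno = errno;
    }
    if (close(fd) < 0 && ret == 0) {
        ret = -1;
        saved_errno = errno;
    }
    if (ret < 0) {
        errno = saved_errno;
    }
    return ret;
}
```

Keeping the flush out of the close path is the design point of the patch: a caller that can afford to block on the disk chooses when to do so, rather than the main thread stalling inside qio_channel_close().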

Fixes: bc38feddeb ("io: fsync before closing a file channel")
Reviewed-by: "Daniel P. Berrangé" 
Signed-off-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240305195629.9922-1-faro...@suse.de
Link: https://lore.kernel.org/r/20240305174332.2553-1-faro...@suse.de
[peterx: add more comment to the qio_channel_close()]
Signed-off-by: Peter Xu 
---
 docs/devel/migration/main.rst |  3 ++-
 io/channel-file.c |  5 -
 migration/multifd.c   | 28 +++-
 3 files changed, 21 insertions(+), 15 deletions(-)

diff --git a/docs/devel/migration/main.rst b/docs/devel/migration/main.rst
index 8024275d6d..54385a23e5 100644
--- a/docs/devel/migration/main.rst
+++ b/docs/devel/migration/main.rst
@@ -44,7 +44,8 @@ over any transport.
 - file migration: do the migration using a file that is passed to QEMU
   by path. A file offset option is supported to allow a management
   application to add its own metadata to the start of the file without
-  QEMU interference.
+  QEMU interference. Note that QEMU does not flush cached file
+  data/metadata at the end of migration.
 
 In addition, support is included for migration using RDMA, which
 transports the page data using ``RDMA``, where the hardware takes care of
diff --git a/io/channel-file.c b/io/channel-file.c
index d4706fa592..a6ad7770c6 100644
--- a/io/channel-file.c
+++ b/io/channel-file.c
@@ -242,11 +242,6 @@ static int qio_channel_file_close(QIOChannel *ioc,
 {
 QIOChannelFile *fioc = QIO_CHANNEL_FILE(ioc);
 
-if (qemu_fdatasync(fioc->fd) < 0) {
-error_setg_errno(errp, errno,
- "Unable to synchronize file data with storage device");
-return -1;
-}
 if (qemu_close(fioc->fd) < 0) {
 error_setg_errno(errp, errno,
  "Unable to close file");
diff --git a/migration/multifd.c b/migration/multifd.c
index d4a44da559..bf9d483f7a 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -710,16 +710,26 @@ static bool multifd_send_cleanup_channel(MultiFDSendParams *p, Error **errp)
 if (p->c) {
 migration_ioc_unregister_yank(p->c);
 /*
- * An explicit close() on the channel here is normally not
- * required, but can be helpful for "file:" iochannels, where it
- * will include fdatasync() to make sure the data is flushed to the
- * disk backend.
+ * The object_unref() cannot guarantee the fd will always be
+ * released because finalize() of the iochannel is only
+ * triggered on the last reference and it's not guaranteed
+ * that we always hold the last refcount when reaching here.
  *
- * The object_unref() cannot guarantee that because: (1) finalize()
- * of the iochannel is only triggered on the last reference, and
- * it's not guaranteed that we always hold the last refcount when
- * reaching here, and, (2) even if finalize() is invoked, it only
- * does a close(fd) without data flush.
+ * Closing the fd explicitly has the benefit that if there is any
+ * registered I/O handler callbacks on such fd, that will get a
+ * POLLNVAL event and will further trigger the cleanup to finally
+ * release the IOC.
+ *
+ * FIXME: It should logically be guaranteed that all multifd
+ * channels have no I/O handler callback registered when reaching
+ * here, because migration thread will wait for all multifd channel
+ * establishments to complete during setup.  Since
+ * migrate_fd_cleanup() will be scheduled in main thread too, all
+ * previous callbacks should guarantee to be completed when
+ * reaching here.  See multifd_send_state.channels_created and its
+ * usage.  In the future, we could replace this with an assert
+ * making sure we're the last reference, or simply drop it if above
+ * is more clear to be justified.
  */
 qio_channel_close(p->c, &error_abort);
 object_unref(OBJECT(p->c));
-- 
2.44.0

[PULL 33/34] migration/multifd: Enable multifd zero page checking by default.

2024-03-11 Thread peterx

From: Hao Xiang 

1. Set the default "zero-page-detection" option to "multifd". Zero page
checking can now be done in the multifd threads, and this becomes the
default configuration.
2. Handle QEMU 9.0 -> QEMU 8.2 migration compatibility: for 8.2 machine
types, zero page checking is still done from the migration main thread
("legacy"), for backward compatibility.
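The two changes combine as a precedence rule: the property's built-in default becomes "multifd", while the hw_compat_8_2[] entry pins older machine types to "legacy". The sketch below is a simplified, hypothetical model of that precedence, not QEMU's GlobalProperty machinery; the struct, `compat_8_2[]` and `effective_value()` are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for a compat entry: driver, property, value. */
struct compat_prop {
    const char *driver;
    const char *property;
    const char *value;
};

/* Mirrors the hw_compat_8_2[] entry added by this patch. */
static const struct compat_prop compat_8_2[] = {
    { "migration", "zero-page-detection", "legacy" },
};

/*
 * Return the value a property ends up with: if a compat entry matches
 * the driver and property, its value overrides the built-in default.
 */
static const char *effective_value(const struct compat_prop *compat, size_t n,
                                   const char *driver, const char *property,
                                   const char *builtin_default)
{
    assert(builtin_default != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(compat[i].driver, driver) == 0 &&
            strcmp(compat[i].property, property) == 0) {
            return compat[i].value;
        }
    }
    return builtin_default;
}
```

Passing `compat_8_2` models an 8.2 machine type and yields "legacy"; passing an empty table models a 9.0 machine type and yields the new "multifd" default.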

Signed-off-by: Hao Xiang 
Reviewed-by: Fabiano Rosas 
Reviewed-by: Peter Xu 
Link: https://lore.kernel.org/r/20240311180015.3359271-7-hao.xi...@linux.dev
Signed-off-by: Peter Xu 
---
 qapi/migration.json | 6 +++---
 hw/core/machine.c   | 4 +++-
 migration/options.c | 2 +-
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index 2684e4e9ac..aa1b39bce1 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -909,7 +909,7 @@
 #(Since 8.2)
 #
 # @zero-page-detection: Whether and how to detect zero pages.
-# See description in @ZeroPageDetection.  Default is 'legacy'.
+# See description in @ZeroPageDetection.  Default is 'multifd'.
 # (since 9.0)
 #
 # Features:
@@ -1106,7 +1106,7 @@
 #(Since 8.2)
 #
 # @zero-page-detection: Whether and how to detect zero pages.
-# See description in @ZeroPageDetection.  Default is 'legacy'.
+# See description in @ZeroPageDetection.  Default is 'multifd'.
 # (since 9.0)
 #
 # Features:
@@ -1339,7 +1339,7 @@
 #(Since 8.2)
 #
 # @zero-page-detection: Whether and how to detect zero pages.
-# See description in @ZeroPageDetection.  Default is 'legacy'.
+# See description in @ZeroPageDetection.  Default is 'multifd'.
 # (since 9.0)
 #
 # Features:
diff --git a/hw/core/machine.c b/hw/core/machine.c
index 9ac5d5389a..0e9d646b61 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -32,7 +32,9 @@
 #include "hw/virtio/virtio-net.h"
 #include "audio/audio.h"
 
-GlobalProperty hw_compat_8_2[] = {};
+GlobalProperty hw_compat_8_2[] = {
+{ "migration", "zero-page-detection", "legacy"},
+};
 const size_t hw_compat_8_2_len = G_N_ELEMENTS(hw_compat_8_2);
 
 GlobalProperty hw_compat_8_1[] = {
diff --git a/migration/options.c b/migration/options.c
index 8f2a3a2fa5..9ed2fe4bee 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -181,7 +181,7 @@ Property migration_properties[] = {
   MIG_MODE_NORMAL),
 DEFINE_PROP_ZERO_PAGE_DETECTION("zero-page-detection", MigrationState,
parameters.zero_page_detection,
-   ZERO_PAGE_DETECTION_LEGACY),
+   ZERO_PAGE_DETECTION_MULTIFD),
 
 /* Migration capabilities */
 DEFINE_PROP_MIG_CAP("x-xbzrle", MIGRATION_CAPABILITY_XBZRLE),
-- 
2.44.0

[PULL 06/34] migration: Remove SaveStateHandler and LoadStateHandler typedefs

2024-03-11 Thread peterx

From: Cédric Le Goater 

They are only used once.

Reviewed-by: Fabiano Rosas 
Reviewed-by: Peter Xu 
Signed-off-by: Cédric Le Goater 
Link: https://lore.kernel.org/r/20240304122844.1888308-8-...@redhat.com
Signed-off-by: Peter Xu 
---
 include/migration/register.h | 4 ++--
 include/qemu/typedefs.h  | 2 --
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/include/migration/register.h b/include/migration/register.h
index 9ab1f79512..2e6a7d766e 100644
--- a/include/migration/register.h
+++ b/include/migration/register.h
@@ -18,7 +18,7 @@
 
 typedef struct SaveVMHandlers {
 /* This runs inside the BQL.  */
-SaveStateHandler *save_state;
+void (*save_state)(QEMUFile *f, void *opaque);
 
 /*
  * save_prepare is called early, even before migration starts, and can be
@@ -71,7 +71,7 @@ typedef struct SaveVMHandlers {
 /* This calculate the exact remaining data to transfer */
 void (*state_pending_exact)(void *opaque, uint64_t *must_precopy,
 uint64_t *can_postcopy);
-LoadStateHandler *load_state;
+int (*load_state)(QEMUFile *f, void *opaque, int version_id);
 int (*load_setup)(QEMUFile *f, void *opaque);
 int (*load_cleanup)(void *opaque);
 /* Called when postcopy migration wants to resume from failure */
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index a028dba4d0..50c277cf0b 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -151,8 +151,6 @@ typedef struct IRQState *qemu_irq;
 /*
  * Function types
  */
-typedef void SaveStateHandler(QEMUFile *f, void *opaque);
-typedef int LoadStateHandler(QEMUFile *f, void *opaque, int version_id);
 typedef void (*qemu_irq_handler)(void *opaque, int n, int level);
 
 #endif /* QEMU_TYPEDEFS_H */
-- 
2.44.0

[PULL 24/34] migration: migration_file_set_error

2024-03-11 Thread peterx

From: Steve Sistare 

Define and export migration_file_set_error to eliminate a dependency
on MigrationState.
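The accessor means callers such as VFIO no longer take MigrationState's lock and test the file pointer themselves. Below is a minimal sketch of the same encapsulation with POSIX threads; every name in it (`struct mig_state`, `struct out_file`, `file_set_error()`, the `error` field) is a hypothetical stand-in rather than a QEMU type, and keeping only the first error is a choice made for this sketch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins; only this file knows the layout. */
struct out_file {
    int error;                  /* first error recorded on the stream */
};

struct mig_state {
    pthread_mutex_t file_lock;  /* protects to_dst_file */
    struct out_file *to_dst_file;
};

static struct mig_state current = {
    .file_lock = PTHREAD_MUTEX_INITIALIZER,
    .to_dst_file = NULL,
};

/* Exported helper: record err on the outgoing stream, if one exists. */
void file_set_error(int err)
{
    assert(err < 0);
    pthread_mutex_lock(&current.file_lock);
    if (current.to_dst_file && current.to_dst_file->error == 0) {
        current.to_dst_file->error = err;
    }
    pthread_mutex_unlock(&current.file_lock);
}
```

Callers only need the `file_set_error()` prototype; the lock and the NULL check stay private to the migration side.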

Signed-off-by: Steve Sistare 
Link: https://lore.kernel.org/r/1710179338-294359-9-git-send-email-steven.sist...@oracle.com
Signed-off-by: Peter Xu 
---
 include/migration/misc.h |  2 ++
 hw/vfio/common.c |  9 +
 hw/vfio/migration.c  | 11 +++
 migration/migration.c| 11 +++
 4 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index 28cfaed2c7..e521cd5229 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -109,6 +109,8 @@ bool migration_in_setup(MigrationState *);
 bool migration_has_finished(MigrationState *);
 bool migration_has_failed(MigrationState *);
 bool migration_is_running(void);
+void migration_file_set_error(int err);
+
 /* ...and after the device transmission */
 /* True if incoming migration entered POSTCOPY_INCOMING_DISCARD */
 bool migration_in_incoming_postcopy(void);
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index de010680ff..b44204eade 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -39,7 +39,6 @@
 #include "sysemu/runstate.h"
 #include "trace.h"
 #include "qapi/error.h"
-#include "migration/migration.h"
 #include "migration/misc.h"
 #include "migration/blocker.h"
 #include "migration/qemu-file.h"
@@ -150,14 +149,8 @@ bool vfio_viommu_preset(VFIODevice *vbasedev)
 
 static void vfio_set_migration_error(int err)
 {
-MigrationState *ms = migrate_get_current();
-
 if (migration_is_setup_or_active()) {
-WITH_QEMU_LOCK_GUARD(&ms->qemu_file_lock) {
-if (ms->to_dst_file) {
-qemu_file_set_error(ms->to_dst_file, err);
-}
-}
+migration_file_set_error(err);
 }
 }
 
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 49c0016add..a15fd486c6 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -17,13 +17,12 @@
 
 #include "sysemu/runstate.h"
 #include "hw/vfio/vfio-common.h"
-#include "migration/migration.h"
+#include "migration/misc.h"
 #include "migration/savevm.h"
 #include "migration/vmstate.h"
 #include "migration/qemu-file.h"
 #include "migration/register.h"
 #include "migration/blocker.h"
-#include "migration/misc.h"
 #include "qapi/error.h"
 #include "exec/ramlist.h"
 #include "exec/ram_addr.h"
@@ -714,9 +713,7 @@ static void vfio_vmstate_change_prepare(void *opaque, bool running,
  * Migration should be aborted in this case, but vm_state_notify()
  * currently does not support reporting failures.
  */
-if (migrate_get_current()->to_dst_file) {
-qemu_file_set_error(migrate_get_current()->to_dst_file, ret);
-}
+migration_file_set_error(ret);
 }
 
 trace_vfio_vmstate_change_prepare(vbasedev->name, running,
@@ -746,9 +743,7 @@ static void vfio_vmstate_change(void *opaque, bool running, RunState state)
  * Migration should be aborted in this case, but vm_state_notify()
  * currently does not support reporting failures.
  */
-if (migrate_get_current()->to_dst_file) {
-qemu_file_set_error(migrate_get_current()->to_dst_file, ret);
-}
+migration_file_set_error(ret);
 }
 
 trace_vfio_vmstate_change(vbasedev->name, running, RunState_str(state),
diff --git a/migration/migration.c b/migration/migration.c
index db1e627848..216f63d62b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3038,6 +3038,17 @@ static MigThrError postcopy_pause(MigrationState *s)
 }
 }
 
+void migration_file_set_error(int err)
+{
+MigrationState *s = current_migration;
+
+WITH_QEMU_LOCK_GUARD(&s->qemu_file_lock) {
+if (s->to_dst_file) {
+qemu_file_set_error(s->to_dst_file, err);
+}
+}
+}
+
 static MigThrError migration_detect_error(MigrationState *s)
 {
 int ret;
-- 
2.44.0

[PULL 12/34] physmem: Reduce local variable scope in flatview_read/write_continue()

2024-03-11 Thread peterx

From: Jonathan Cameron 

Precursor to factoring out the inner loops for reuse.

Reviewed-by: Peter Xu 
Signed-off-by: Jonathan Cameron 
Reviewed-by: David Hildenbrand 
Reviewed-by: Philippe Mathieu-Daudé 
Link: https://lore.kernel.org/r/20240307153710.30907-3-jonathan.came...@huawei.com
Signed-off-by: Peter Xu 
---
 system/physmem.c | 40 
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/system/physmem.c b/system/physmem.c
index e92bed50a6..e35aa29343 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -2688,10 +2688,7 @@ static MemTxResult flatview_write_continue(FlatView *fv, hwaddr addr,
hwaddr len, hwaddr mr_addr,
hwaddr l, MemoryRegion *mr)
 {
-uint8_t *ram_ptr;
-uint64_t val;
 MemTxResult result = MEMTX_OK;
-bool release_lock = false;
 const uint8_t *buf = ptr;
 
 for (;;) {
@@ -2699,7 +2696,9 @@ static MemTxResult flatview_write_continue(FlatView *fv, hwaddr addr,
 result |= MEMTX_ACCESS_ERROR;
 /* Keep going. */
 } else if (!memory_access_is_direct(mr, true)) {
-release_lock |= prepare_mmio_access(mr);
+uint64_t val;
+bool release_lock = prepare_mmio_access(mr);
+
 l = memory_access_size(mr, l, mr_addr);
 /* XXX: could force current_cpu to NULL to avoid
potential bugs */
@@ -2717,18 +2716,21 @@ static MemTxResult flatview_write_continue(FlatView *fv, hwaddr addr,
 val = ldn_he_p(buf, l);
 result |= memory_region_dispatch_write(mr, mr_addr, val,
size_memop(l), attrs);
+if (release_lock) {
+bql_unlock();
+}
+
+
 } else {
 /* RAM case */
-ram_ptr = qemu_ram_ptr_length(mr->ram_block, mr_addr, &l, false);
+
+uint8_t *ram_ptr = qemu_ram_ptr_length(mr->ram_block, mr_addr, &l,
+   false);
+
 memmove(ram_ptr, buf, l);
 invalidate_and_set_dirty(mr, mr_addr, l);
 }
 
-if (release_lock) {
-bql_unlock();
-release_lock = false;
-}
-
 len -= l;
 buf += l;
 addr += l;
@@ -2767,10 +2769,7 @@ MemTxResult flatview_read_continue(FlatView *fv, hwaddr addr,
hwaddr len, hwaddr mr_addr, hwaddr l,
MemoryRegion *mr)
 {
-uint8_t *ram_ptr;
-uint64_t val;
 MemTxResult result = MEMTX_OK;
-bool release_lock = false;
 uint8_t *buf = ptr;
 
 fuzz_dma_read_cb(addr, len, mr);
@@ -2780,7 +2779,9 @@ MemTxResult flatview_read_continue(FlatView *fv, hwaddr addr,
 /* Keep going. */
 } else if (!memory_access_is_direct(mr, false)) {
 /* I/O case */
-release_lock |= prepare_mmio_access(mr);
+uint64_t val;
+bool release_lock = prepare_mmio_access(mr);
+
 l = memory_access_size(mr, l, mr_addr);
 result |= memory_region_dispatch_read(mr, mr_addr, &val,
   size_memop(l), attrs);
@@ -2796,17 +2797,16 @@ MemTxResult flatview_read_continue(FlatView *fv, hwaddr addr,
(l == 8 && len >= 8));
 #endif
 stn_he_p(buf, l, val);
+if (release_lock) {
+bql_unlock();
+}
 } else {
 /* RAM case */
-ram_ptr = qemu_ram_ptr_length(mr->ram_block, mr_addr, &l, false);
+uint8_t *ram_ptr = qemu_ram_ptr_length(mr->ram_block, mr_addr, &l,
+   false);
 memcpy(buf, ram_ptr, l);
 }
 
-if (release_lock) {
-bql_unlock();
-release_lock = false;
-}
-
 len -= l;
 buf += l;
 addr += l;
-- 
2.44.0

[PULL 01/34] migration: Don't serialize devices in qemu_savevm_state_iterate()

2024-03-11 Thread peterx

From: Avihai Horon 

Commit 90697be8896c ("live migration: Serialize vmstate saving in stage
2") introduced device serialization in qemu_savevm_state_iterate(). The
rationale behind it was to first complete migration of slower changing
block devices and only then migrate the RAM, to avoid sending fast
changing RAM pages over and over.

That commit was added a long time ago, and while it was useful back
then, this is no longer the case:
1. Block migration is deprecated, see commit 66db46ca83b8 ("migration:
   Deprecate block migration").
2. Today there are other iterative devices besides RAM and block, such
   as VFIO, which are registered for migration after RAM. With current
   serialization behavior, a fast changing device can block other
   devices from sending their data, which may prevent migration from
   converging in some cases.

The issue described in item 2 was observed in several VFIO migration
scenarios with switchover-ack capability enabled, where some workload on
the VM prevented RAM from ever reaching a hard zero, thus blocking VFIO
initial pre-copy data from being sent. Hence, the destination could not
ack switchover and migration could not converge.

Fix that by not serializing iterative devices in
qemu_savevm_state_iterate().
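The new return convention can be modelled in isolation: every iterable handler is called on each pass, a negative value aborts immediately, and the function returns 1 only if every handler returned a positive value (finished), otherwise 0. The sketch below is a simplified, hypothetical model that ignores the handlers QEMU skips; `iterate_fn`, `iterate_all()`, `struct fake_dev` and `fake_iterate()` are assumptions, not QEMU code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical handler: <0 error, 0 more data pending, >0 finished. */
typedef int (*iterate_fn)(void *opaque);

static int iterate_all(iterate_fn *handlers, void **opaques, size_t n)
{
    bool all_finished = true;

    assert(handlers != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int ret = handlers[i](opaques[i]);

        if (ret < 0) {
            return ret;             /* error: stop this pass */
        } else if (ret == 0) {
            all_finished = false;   /* not done, but later handlers still run */
        }
    }
    return all_finished;
}

/* Example device: counts calls and returns a preset result. */
struct fake_dev {
    int result;   /* value returned from each call */
    int calls;    /* how many times the handler ran */
};

static int fake_iterate(void *opaque)
{
    struct fake_dev *d = opaque;

    d->calls++;
    return d->result;
}
```

In the old loop, the first handler returning 0 ended the pass, so a device registered after RAM (such as VFIO) could go a whole pass without being called.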

Note that this still doesn't fully prevent device starvation. As
correctly pointed out by Peter [1], a fast changing device might
constantly consume all allocated bandwidth and block the following
devices. However, this scenario is likely to happen only if
max-bandwidth is low.

[1] https://lore.kernel.org/qemu-devel/Zd6iw9dBhW6wKNxx@x1n/

Signed-off-by: Avihai Horon 
Reviewed-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240304105339.20713-2-avih...@nvidia.com
Signed-off-by: Peter Xu 
---
 migration/savevm.c | 15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index dc1fb9c0d3..e84b26e1c8 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1390,7 +1390,8 @@ int qemu_savevm_state_resume_prepare(MigrationState *s)
 int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy)
 {
 SaveStateEntry *se;
-int ret = 1;
+bool all_finished = true;
+int ret;
 
 trace_savevm_state_iterate();
 QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
@@ -1431,16 +1432,12 @@ int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy)
  "%d(%s): %d",
  se->section_id, se->idstr, ret);
 qemu_file_set_error(f, ret);
-}
-if (ret <= 0) {
-/* Do not proceed to the next vmstate before this one reported
-   completion of the current stage. This serializes the migration
-   and reduces the probability that a faster changing state is
-   synchronized over and over again. */
-break;
+return ret;
+} else if (!ret) {
+all_finished = false;
 }
 }
-return ret;
+return all_finished;
 }
 
 static bool should_send_vmdesc(void)
-- 
2.44.0

[PULL 23/34] migration: migration_is_device

2024-03-11 Thread peterx

From: Steve Sistare 

Define and export migration_is_device to eliminate a dependency
on MigrationState.
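Exporting a predicate instead of the state field lets VFIO test the migration state without seeing MigrationState. Below is a hypothetical, self-contained sketch of the pattern and of the check VFIO now performs. The enum, `set_state()`, `mig_is_active()`, `mig_is_device()` and `dirty_tracking_relevant()` are simplified stand-ins, and `mig_is_active()` covers only the two states visible in this series.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified subset of migration states (stand-ins, not QEMU's enum). */
enum mig_status {
    MIG_STATUS_NONE,
    MIG_STATUS_ACTIVE,
    MIG_STATUS_POSTCOPY_ACTIVE,
    MIG_STATUS_DEVICE,
    MIG_STATUS_COMPLETED,
};

/* Private to the migration side; callers only see the predicates. */
static enum mig_status state = MIG_STATUS_NONE;

static void set_state(enum mig_status s)
{
    assert(s <= MIG_STATUS_COMPLETED);
    state = s;
}

static bool mig_is_active(void)
{
    return state == MIG_STATUS_ACTIVE || state == MIG_STATUS_POSTCOPY_ACTIVE;
}

static bool mig_is_device(void)
{
    return state == MIG_STATUS_DEVICE;
}

/* Caller-side check in the spirit of vfio_devices_all_dirty_tracking(). */
static bool dirty_tracking_relevant(void)
{
    return mig_is_active() || mig_is_device();
}
```

`dirty_tracking_relevant()` is the negation of the `!migration_is_active() && !migration_is_device()` early-return condition in the patch.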

Signed-off-by: Steve Sistare 
Link: https://lore.kernel.org/r/1710179338-294359-8-git-send-email-steven.sist...@oracle.com
Signed-off-by: Peter Xu 
---
 include/migration/misc.h | 1 +
 hw/vfio/common.c | 4 +---
 migration/migration.c| 7 +++
 3 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index c4b5416357..28cfaed2c7 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -61,6 +61,7 @@ void migration_object_init(void);
 void migration_shutdown(void);
 bool migration_is_idle(void);
 bool migration_is_active(void);
+bool migration_is_device(void);
 bool migration_thread_is_self(void);
 bool migration_is_setup_or_active(void);
 bool migrate_mode_is_cpr(MigrationState *);
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 2dbbf62e15..de010680ff 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -180,10 +180,8 @@ bool vfio_device_state_is_precopy(VFIODevice *vbasedev)
 static bool vfio_devices_all_dirty_tracking(VFIOContainerBase *bcontainer)
 {
 VFIODevice *vbasedev;
-MigrationState *ms = migrate_get_current();
 
-if (!migration_is_active() &&
-ms->state != MIGRATION_STATUS_DEVICE) {
+if (!migration_is_active() && !migration_is_device()) {
 return false;
 }
 
diff --git a/migration/migration.c b/migration/migration.c
index afe72af0b1..db1e627848 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1647,6 +1647,13 @@ bool migration_is_active(void)
 s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
 }
 
+bool migration_is_device(void)
+{
+MigrationState *s = current_migration;
+
+return s->state == MIGRATION_STATUS_DEVICE;
+}
+
 bool migration_thread_is_self(void)
 {
 MigrationState *s = current_migration;
-- 
2.44.0

[PULL 17/34] migration: remove migration.h references

2024-03-11 Thread peterx

From: Steve Sistare 

Remove migration.h from files that no longer need it due to
previous commits.

Signed-off-by: Steve Sistare 
Link: https://lore.kernel.org/r/1710179338-294359-2-git-send-email-steven.sist...@oracle.com
Signed-off-by: Peter Xu 
---
 hw/vfio/container.c| 1 -
 hw/virtio/vhost-user.c | 1 -
 hw/virtio/virtio-balloon.c | 1 -
 system/qdev-monitor.c  | 1 -
 target/loongarch/kvm/kvm.c | 1 -
 tests/unit/test-vmstate.c  | 1 -
 6 files changed, 6 deletions(-)

diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index bd25b9fbad..ff081a12c2 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -32,7 +32,6 @@
 #include "sysemu/reset.h"
 #include "trace.h"
 #include "qapi/error.h"
-#include "migration/migration.h"
 #include "pci.h"
 
 VFIOGroupList vfio_group_list =
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index a1eea8547e..1af8621481 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -26,7 +26,6 @@
 #include "qemu/sockets.h"
 #include "sysemu/runstate.h"
 #include "sysemu/cryptodev.h"
-#include "migration/migration.h"
 #include "migration/postcopy-ram.h"
 #include "trace.h"
 #include "exec/ramblock.h"
diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index a59ff172bd..609e39a821 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -31,7 +31,6 @@
 #include "trace.h"
 #include "qemu/error-report.h"
 #include "migration/misc.h"
-#include "migration/migration.h"
 
 #include "hw/virtio/virtio-bus.h"
 #include "hw/virtio/virtio-access.h"
diff --git a/system/qdev-monitor.c b/system/qdev-monitor.c
index 09e07cab9b..c1243891c3 100644
--- a/system/qdev-monitor.c
+++ b/system/qdev-monitor.c
@@ -38,7 +38,6 @@
 #include "qemu/option_int.h"
 #include "sysemu/block-backend.h"
 #include "migration/misc.h"
-#include "migration/migration.h"
 #include "qemu/cutils.h"
 #include "hw/qdev-properties.h"
 #include "hw/clock.h"
diff --git a/target/loongarch/kvm/kvm.c b/target/loongarch/kvm/kvm.c
index c19978a970..11a69a3b4e 100644
--- a/target/loongarch/kvm/kvm.c
+++ b/target/loongarch/kvm/kvm.c
@@ -22,7 +22,6 @@
 #include "hw/irq.h"
 #include "qemu/log.h"
 #include "hw/loader.h"
-#include "migration/migration.h"
 #include "sysemu/runstate.h"
 #include "cpu-csr.h"
 #include "kvm_loongarch.h"
diff --git a/tests/unit/test-vmstate.c b/tests/unit/test-vmstate.c
index c4f9faa273..63f28f26f4 100644
--- a/tests/unit/test-vmstate.c
+++ b/tests/unit/test-vmstate.c
@@ -24,7 +24,6 @@
 
 #include "qemu/osdep.h"
 
-#include "../migration/migration.h"
 #include "migration/vmstate.h"
 #include "migration/qemu-file-types.h"
 #include "../migration/qemu-file.h"
-- 
2.44.0

[PULL 00/34] Migration 20240311 patches

2024-03-11 Thread peterx

From: Peter Xu 

The following changes since commit 7489f7f3f81dcb776df8c1b9a9db281fc21bf05f:

  Merge tag 'hw-misc-20240309' of https://github.com/philmd/qemu into staging (2024-03-09 20:12:21 +)

are available in the Git repository at:

  https://gitlab.com/peterx/qemu.git tags/migration-20240311-pull-request

for you to fetch changes up to 1815338df00fd0a3fe25085564c6966f74c8f43d:

  migration/multifd: Add new migration test cases for legacy zero page checking. (2024-03-11 16:57:09 -0400)


Migration pull request

- Avihai's fix to keep vmstate iteration from starving VFIO
- Maksim's fix adding an additional check on precopy load error
- Fabiano's fix for the fdatasync() hang in mapped-ram
- Jonathan's fix for vring cached access over MMIO regions
- Cédric's cleanup patches 1-4 from his error report series
- Yu's fix for RDMA migration (which was already broken in 8.2)
- Anthony's small cleanup/fix of an error message
- Steve's patches to privatize migration.h
- Xiang's patchset to enable zero page detection in multifd threads



Anthony PERARD (1):
  migration: Fix format in error message

Avihai Horon (3):
  migration: Don't serialize devices in qemu_savevm_state_iterate()
  vfio/migration: Refactor vfio_save_state() return value
  vfio/migration: Add a note about migration rate limiting

Cédric Le Goater (4):
  migration: Report error when shutdown fails
  migration: Remove SaveStateHandler and LoadStateHandler typedefs
  migration: Add documentation for SaveVMHandlers
  migration: Do not call PRECOPY_NOTIFY_SETUP notifiers in case of error

Fabiano Rosas (3):
  migration/multifd: Don't fsync when closing QIOChannelFile
  migration/multifd: Allow zero pages in file migration
  migration/multifd: Allow clearing of the file_bmap from multifd

Hao Xiang (5):
  migration/multifd: Add new migration option zero-page-detection.
  migration/multifd: Implement zero page transmission on the multifd
thread.
  migration/multifd: Implement ram_save_target_page_multifd to handle
multifd version of MigrationOps::ram_save_target_page.
  migration/multifd: Enable multifd zero page checking by default.
  migration/multifd: Add new migration test cases for legacy zero page
checking.

Jonathan Cameron (4):
  physmem: Rename addr1 to more informative mr_addr in
flatview_read/write() and similar
  physmem: Reduce local variable scope in flatview_read/write_continue()
  physmem: Factor out body of flatview_read/write_continue() loop
  physmem: Fix wrong address in large
address_space_read/write_cached_slow()

Maksim Davydov (1):
  migration/ram: add additional check

Steve Sistare (12):
  migration: export fewer options
  migration: remove migration.h references
  migration: export migration_is_setup_or_active
  migration: export migration_is_active
  migration: export migration_is_running
  migration: export vcpu_dirty_limit_period
  migration: migration_thread_is_self
  migration: migration_is_device
  migration: migration_file_set_error
  migration: privatize colo interfaces
  migration: delete unused accessors
  migration: purge MigrationState from public interface

Yu Zhang (1):
  migration/rdma: Fix a memory issue for migration

 docs/devel/migration/main.rst   |   3 +-
 qapi/migration.json |  38 +++-
 include/hw/qdev-properties-system.h |   4 +
 include/migration/client-options.h  |  25 +++
 include/migration/misc.h|  18 +-
 include/migration/register.h| 267 +---
 include/qemu/typedefs.h |   2 -
 migration/migration.h   |   7 +-
 migration/multifd.h |  23 ++-
 migration/options.h |   7 +-
 migration/ram.h |   3 +-
 hw/core/machine.c   |   4 +-
 hw/core/qdev-properties-system.c|  10 ++
 hw/vfio/common.c|  17 +-
 hw/vfio/container.c |   1 -
 hw/vfio/migration.c |  24 ++-
 hw/virtio/vhost-user.c  |   1 -
 hw/virtio/virtio-balloon.c  |   2 -
 io/channel-file.c   |   5 -
 migration/colo.c|  17 +-
 migration/file.c|   4 +-
 migration/migration-hmp-cmds.c  |   9 +
 migration/migration.c   |  67 ---
 migration/multifd-zero-page.c   |  87 +
 migration/multifd-zlib.c|  21 ++-
 migration/multifd-zstd.c|  20 ++-
 migration/multifd.c | 120 ++---
 migration/options.c |  32 +++-
 migration/qemu-file.c   |   5 +-
 migration/ram.c |  62 +--
 migration/rdma.c|   2 +-
 migration/savevm.c  |  23 +--
 net/colo-compare.c  |   3 +-
 net/vhost-vdpa.c|   3 +-
 stubs/colo.c|   1 -
 system

[PULL 08/34] migration: Do not call PRECOPY_NOTIFY_SETUP notifiers in case of error

2024-03-11 Thread peterx

From: Cédric Le Goater 

When commit bd2270608fa0 ("migration/ram.c: add a notifier chain for
precopy") added PRECOPY_NOTIFY_SETUP notifiers at the end of
qemu_savevm_state_setup(), it didn't take into account a possible
error in the loop calling vmstate_save() or .save_setup() handlers.

Check ret value before calling the notifiers.
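The resulting control flow: run the setup handlers, stop at the first failure, and fire the setup notifiers only when no handler failed. Initializing ret to 0 matters because the loop may never assign it, for example when no handler has a setup hook. A simplified, hypothetical model follows; `setup_fn`, `run_setup()`, `notify_setup()` and the notifier counter are assumptions, not QEMU code.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*setup_fn)(void);

static int notifier_calls;  /* stands in for the PRECOPY_NOTIFY_SETUP chain */

static void notify_setup(void)
{
    notifier_calls++;
}

static void run_setup(const setup_fn *handlers, size_t n)
{
    int ret = 0;   /* stays 0 if no handler runs */

    assert(handlers != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!handlers[i]) {
            continue;           /* entry without a setup hook */
        }
        ret = handlers[i]();
        if (ret < 0) {
            break;              /* first failure stops the setup phase */
        }
    }

    if (ret) {
        return;                 /* do not notify after an error */
    }
    notify_setup();
}

static int setup_ok(void) { return 0; }
static int setup_fail(void) { return -1; }
```

Without the `if (ret)` check, the notifiers would run even though setup had already failed, which is the bug this patch addresses.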

Reviewed-by: Peter Xu 
Signed-off-by: Cédric Le Goater 
Link: https://lore.kernel.org/r/20240304122844.1888308-10-...@redhat.com
Signed-off-by: Peter Xu 
---
 migration/savevm.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index e84b26e1c8..76b57a9888 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1317,7 +1317,7 @@ void qemu_savevm_state_setup(QEMUFile *f)
 MigrationState *ms = migrate_get_current();
 SaveStateEntry *se;
 Error *local_err = NULL;
-int ret;
+int ret = 0;
 
 json_writer_int64(ms->vmdesc, "page_size", qemu_target_page_size());
 json_writer_start_array(ms->vmdesc, "devices");
@@ -1351,6 +1351,10 @@ void qemu_savevm_state_setup(QEMUFile *f)
 }
 }
 
+if (ret) {
+return;
+}
+
 if (precopy_notify(PRECOPY_NOTIFY_SETUP, &local_err)) {
 error_report_err(local_err);
 }
-- 
2.44.0

[PULL 16/34] migration: export fewer options

2024-03-11 Thread peterx

From: Steve Sistare 

A small number of migration options are accessed by migration clients,
but to see them clients must include all of options.h, which is mostly
for migration core code.  migrate_mode() in particular will be needed by
multiple clients.

Refactor the option declarations so clients can see the necessary few via
misc.h, which already exports a portion of the client API.

Signed-off-by: Steve Sistare 
Link: https://lore.kernel.org/r/1710179319-294320-1-git-send-email-steven.sist...@oracle.com
Signed-off-by: Peter Xu 
---
 include/migration/client-options.h | 24 
 include/migration/misc.h   |  1 +
 migration/options.h|  6 +-
 hw/vfio/migration.c|  1 -
 hw/virtio/virtio-balloon.c |  1 -
 system/dirtylimit.c|  1 -
 6 files changed, 26 insertions(+), 8 deletions(-)
 create mode 100644 include/migration/client-options.h

diff --git a/include/migration/client-options.h b/include/migration/client-options.h
new file mode 100644
index 00..887fea1565
--- /dev/null
+++ b/include/migration/client-options.h
@@ -0,0 +1,24 @@
+/*
+ * QEMU public migration capabilities
+ *
+ * Copyright (c) 2012-2023 Red Hat Inc
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_MIGRATION_CLIENT_OPTIONS_H
+#define QEMU_MIGRATION_CLIENT_OPTIONS_H
+
+/* capabilities */
+
+bool migrate_background_snapshot(void);
+bool migrate_dirty_limit(void);
+bool migrate_postcopy_ram(void);
+bool migrate_switchover_ack(void);
+
+/* parameters */
+
+MigMode migrate_mode(void);
+
+#endif
diff --git a/include/migration/misc.h b/include/migration/misc.h
index 5d1aa593ed..4c226a40bb 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -17,6 +17,7 @@
 #include "qemu/notify.h"
 #include "qapi/qapi-types-migration.h"
 #include "qapi/qapi-types-net.h"
+#include "migration/client-options.h"
 
 /* migration/ram.c */
 
diff --git a/migration/options.h b/migration/options.h
index 6ddd8dad9b..b6b69c2bb7 100644
--- a/migration/options.h
+++ b/migration/options.h
@@ -16,6 +16,7 @@
 
 #include "hw/qdev-properties.h"
 #include "hw/qdev-properties-system.h"
+#include "migration/client-options.h"
 
 /* migration properties */
 
@@ -24,12 +25,10 @@ extern Property migration_properties[];
 /* capabilities */
 
 bool migrate_auto_converge(void);
-bool migrate_background_snapshot(void);
 bool migrate_block(void);
 bool migrate_colo(void);
 bool migrate_compress(void);
 bool migrate_dirty_bitmaps(void);
-bool migrate_dirty_limit(void);
 bool migrate_events(void);
 bool migrate_mapped_ram(void);
 bool migrate_ignore_shared(void);
@@ -38,11 +37,9 @@ bool migrate_multifd(void);
 bool migrate_pause_before_switchover(void);
 bool migrate_postcopy_blocktime(void);
 bool migrate_postcopy_preempt(void);
-bool migrate_postcopy_ram(void);
 bool migrate_rdma_pin_all(void);
 bool migrate_release_ram(void);
 bool migrate_return_path(void);
-bool migrate_switchover_ack(void);
 bool migrate_validate_uuid(void);
 bool migrate_xbzrle(void);
 bool migrate_zero_blocks(void);
@@ -84,7 +81,6 @@ uint8_t migrate_max_cpu_throttle(void);
 uint64_t migrate_max_bandwidth(void);
 uint64_t migrate_avail_switchover_bandwidth(void);
 uint64_t migrate_max_postcopy_bandwidth(void);
-MigMode migrate_mode(void);
 int migrate_multifd_channels(void);
 MultiFDCompression migrate_multifd_compression(void);
 int migrate_multifd_zlib_level(void);
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index f82dcabc49..49c0016add 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -18,7 +18,6 @@
 #include "sysemu/runstate.h"
 #include "hw/vfio/vfio-common.h"
 #include "migration/migration.h"
-#include "migration/options.h"
 #include "migration/savevm.h"
 #include "migration/vmstate.h"
 #include "migration/qemu-file.h"
diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 89f853fa9e..a59ff172bd 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -32,7 +32,6 @@
 #include "qemu/error-report.h"
 #include "migration/misc.h"
 #include "migration/migration.h"
-#include "migration/options.h"
 
 #include "hw/virtio/virtio-bus.h"
 #include "hw/virtio/virtio-access.h"
diff --git a/system/dirtylimit.c b/system/dirtylimit.c
index b5607eb8c2..774ff44f79 100644
--- a/system/dirtylimit.c
+++ b/system/dirtylimit.c
@@ -26,7 +26,6 @@
 #include "trace.h"
 #include "migration/misc.h"
 #include "migration/migration.h"
-#include "migration/options.h"
 
 /*
  * Dirtylimit stop working if dirty page rate error
-- 
2.44.0

[PULL 03/34] vfio/migration: Add a note about migration rate limiting

2024-03-11 Thread peterx

From: Avihai Horon 

The VFIO migration buffer size is currently limited to 1MB. Therefore, there
is no need to check whether the migration rate was exceeded, as in the worst
case it will be exceeded by only 1MB.

However, if the buffer size is later changed to a bigger value,
vfio_save_iterate() should enforce the migration rate (similar to the RAM
migration code).

Add a note about this in vfio_save_iterate() to serve as a reminder.
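
If the buffer ever grows past 1MB, the enforcement this note asks for could look roughly like the following sketch. It is loosely modelled on the migration_rate_exceeded() check used by the migration thread. RateLimit, rate_exceeded() and save_iterate_limited() are hypothetical names, not VFIO or QEMU functions. The sketch checks the limit before sending each chunk, so the budget can be overshot by at most one chunk. With a 1MB buffer, that bound is why no check is needed today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-period bandwidth budget. */
typedef struct {
    uint64_t budget; /* bytes allowed in the current period */
    uint64_t used;   /* bytes already sent in the current period */
} RateLimit;

static bool rate_exceeded(const RateLimit *rl)
{
    return rl->used >= rl->budget;
}

/*
 * Send device data in chunks of at most chunk_size bytes, checking the
 * limit before each chunk.  Returns the number of bytes sent and
 * decrements *remaining accordingly.
 */
static uint64_t save_iterate_limited(uint64_t *remaining, uint64_t chunk_size,
                                     RateLimit *rl)
{
    uint64_t sent = 0;

    assert(chunk_size > 0);
    while (*remaining > 0 && !rate_exceeded(rl)) {
        uint64_t n = *remaining < chunk_size ? *remaining : chunk_size;

        /* ...the real code would write n bytes to the stream here... */
        *remaining -= n;
        rl->used += n;
        sent += n;
    }
    return sent;
}
```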

Suggested-by: Peter Xu 
Signed-off-by: Avihai Horon 
Reviewed-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240304105339.20713-4-avih...@nvidia.com
Signed-off-by: Peter Xu 
---
 hw/vfio/migration.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 0af783a589..f82dcabc49 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -505,6 +505,12 @@ static bool vfio_is_active_iterate(void *opaque)
 return vfio_device_state_is_precopy(vbasedev);
 }
 
+/*
+ * Note about migration rate limiting: VFIO migration buffer size is currently
+ * limited to 1MB, so there is no need to check if migration rate exceeded (as
+ * in the worst case it will exceed by 1MB). However, if the buffer size is
+ * later changed to a bigger value, migration rate should be enforced here.
+ */
 static int vfio_save_iterate(QEMUFile *f, void *opaque)
 {
 VFIODevice *vbasedev = opaque;
-- 
2.44.0

[PULL 04/34] migration/ram: add additional check

2024-03-11 Thread peterx

From: Maksim Davydov 

If a migration stream is broken, reading the address and flags can return
zero, so an irrelevant flag error is reported instead of EIO. Fix this by
checking for a stream error right after the read.
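
The following sketch shows why the error must be checked before the value is decoded. It uses a hypothetical in-memory Stream whose reads past the end return 0 and latch -EIO; this is not QEMU's QEMUFile. It also assumes 4KiB pages, with the low bits of the 64-bit word carrying the flags as in ram_load_precopy(). Without the check, a truncated stream would be decoded as address 0 with flags 0.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BITS 12
#define PAGE_MASK (~(((uint64_t)1 << PAGE_BITS) - 1))

/* Hypothetical stream: reading past the end yields 0 and sets error. */
typedef struct {
    const uint8_t *buf;
    size_t len;
    size_t pos;   /* invariant: pos <= len */
    int error;    /* 0 or a negative errno value */
} Stream;

static uint64_t get_be64(Stream *s)
{
    uint64_t v = 0;

    if (s->len - s->pos < 8) {
        s->error = -EIO;
        return 0;
    }
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | s->buf[s->pos++];
    }
    return v;
}

/* Check the stream error right after the read, before splitting the
 * value into a page address and flags. */
static int read_page_header(Stream *s, uint64_t *addr, uint64_t *flags)
{
    uint64_t v = get_be64(s);

    if (s->error) {
        return s->error;
    }
    *flags = v & ~PAGE_MASK;
    *addr = v & PAGE_MASK;
    return 0;
}
```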

Signed-off-by: Maksim Davydov 
Link: https://lore.kernel.org/r/20240304144203.158477-1-davydov-...@yandex-team.ru
Signed-off-by: Peter Xu 
---
 migration/ram.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index 003c28e133..2cd936d9ce 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -4214,6 +4214,12 @@ static int ram_load_precopy(QEMUFile *f)
 i++;
 
 addr = qemu_get_be64(f);
+ret = qemu_file_get_error(f);
+if (ret) {
+error_report("Getting RAM address failed");
+break;
+}
+
 flags = addr & ~TARGET_PAGE_MASK;
 addr &= TARGET_PAGE_MASK;
 
-- 
2.44.0

[PULL 02/34] vfio/migration: Refactor vfio_save_state() return value

2024-03-11 Thread peterx

From: Avihai Horon 

Currently, vfio_save_state() returns 1 regardless of whether there is
more data to send. This was done to prevent a fast-changing VFIO
device from potentially blocking other devices from sending their data,
as qemu_savevm_state_iterate() serialized devices.

Now that qemu_savevm_state_iterate() no longer serializes devices, there
is no need for that.

Refactor vfio_save_state() to return 0 if more data is available and 1
if no more data is available.
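
The following sketch illustrates the new return convention. It assumes an iteration loop that gives every device a turn each round and finishes only when all devices report "done". The sketch is an assumption about how a non-serializing caller behaves, not a copy of qemu_savevm_state_iterate(). PrecopyState, save_iterate() and iterate_round() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-device pre-copy bookkeeping. */
typedef struct {
    uint64_t init_size;  /* initial data not yet sent */
    uint64_t dirty_size; /* data dirtied since it was last sent */
} PrecopyState;

/* Send at most `chunk` bytes; return 1 when nothing is left, else 0,
 * mirroring `!precopy_init_size && !precopy_dirty_size`. */
static int save_iterate(PrecopyState *st, uint64_t chunk)
{
    uint64_t n;

    n = st->init_size < chunk ? st->init_size : chunk;
    st->init_size -= n;
    chunk -= n;
    n = st->dirty_size < chunk ? st->dirty_size : chunk;
    st->dirty_size -= n;

    return !st->init_size && !st->dirty_size;
}

/* One round: every device is called, whatever earlier devices returned,
 * so a busy device cannot starve the others.  1 means all are done. */
static int iterate_round(PrecopyState *devs, size_t n, uint64_t chunk)
{
    int all_done = 1;

    for (size_t i = 0; i < n; i++) {
        if (!save_iterate(&devs[i], chunk)) {
            all_done = 0;
        }
    }
    return all_done;
}
```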

Signed-off-by: Avihai Horon 
Reviewed-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240304105339.20713-3-avih...@nvidia.com
Signed-off-by: Peter Xu 
---
 hw/vfio/migration.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 50140eda87..0af783a589 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -529,11 +529,7 @@ static int vfio_save_iterate(QEMUFile *f, void *opaque)
 trace_vfio_save_iterate(vbasedev->name, migration->precopy_init_size,
 migration->precopy_dirty_size);
 
-/*
- * A VFIO device's pre-copy dirty_bytes is not guaranteed to reach zero.
- * Return 1 so following handlers will not be potentially blocked.
- */
-return 1;
+return !migration->precopy_init_size && !migration->precopy_dirty_size;
 }
 
 static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
-- 
2.44.0

[PULL 19/34] migration: export migration_is_active

2024-03-11 Thread peterx

From: Steve Sistare 

Delete the MigrationState parameter from migration_is_active so it
can be exported and used without including migration.h.
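
The following is a minimal sketch of the pattern this patch applies. The accessor reads a module-private singleton (playing the role of current_migration), so callers need only a `bool (void)` prototype and never the MigrationState type. MigStatus, MigState, mig_is_active() and mig_set_state() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of migration states. */
typedef enum {
    MIG_NONE,
    MIG_SETUP,
    MIG_ACTIVE,
    MIG_POSTCOPY_ACTIVE,
    MIG_COMPLETED,
} MigStatus;

typedef struct {
    MigStatus state;
} MigState;

/* Module-private singleton; only the accessors below are "exported". */
static MigState current_state = { MIG_NONE };

static bool mig_is_active(void)
{
    const MigState *s = &current_state;

    return s->state == MIG_ACTIVE || s->state == MIG_POSTCOPY_ACTIVE;
}

/* Helper to drive the state in this sketch. */
static void mig_set_state(MigStatus st)
{
    current_state.state = st;
}
```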

Signed-off-by: Steve Sistare 
Link: https://lore.kernel.org/r/1710179338-294359-4-git-send-email-steven.sist...@oracle.com
Signed-off-by: Peter Xu 
---
 include/migration/misc.h |  2 +-
 hw/vfio/common.c |  4 ++--
 migration/migration.c| 10 ++
 system/dirtylimit.c  |  2 +-
 4 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index 79cff6224e..e1f1bf853e 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -60,7 +60,7 @@ void dump_vmstate_json_to_file(FILE *out_fp);
 void migration_object_init(void);
 void migration_shutdown(void);
 bool migration_is_idle(void);
-bool migration_is_active(MigrationState *);
+bool migration_is_active(void);
 bool migration_is_setup_or_active(void);
 bool migrate_mode_is_cpr(MigrationState *);
 
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 896eab8103..2dbbf62e15 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -182,7 +182,7 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainerBase *bcontainer)
 VFIODevice *vbasedev;
 MigrationState *ms = migrate_get_current();
 
-if (ms->state != MIGRATION_STATUS_ACTIVE &&
+if (!migration_is_active() &&
 ms->state != MIGRATION_STATUS_DEVICE) {
 return false;
 }
@@ -225,7 +225,7 @@ vfio_devices_all_running_and_mig_active(const VFIOContainerBase *bcontainer)
 {
 VFIODevice *vbasedev;
 
-if (!migration_is_active(migrate_get_current())) {
+if (!migration_is_active()) {
 return false;
 }
 
diff --git a/migration/migration.c b/migration/migration.c
index af21403bad..17859cbaee 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1406,7 +1406,7 @@ static void migrate_fd_cleanup(MigrationState *s)
 qemu_fclose(tmp);
 }
 
-assert(!migration_is_active(s));
+assert(!migration_is_active());
 
 if (s->state == MIGRATION_STATUS_CANCELLING) {
 migrate_set_state(&s->state, MIGRATION_STATUS_CANCELLING,
@@ -1637,8 +1637,10 @@ bool migration_is_idle(void)
 return false;
 }
 
-bool migration_is_active(MigrationState *s)
+bool migration_is_active(void)
 {
+MigrationState *s = current_migration;
+
 return (s->state == MIGRATION_STATUS_ACTIVE ||
 s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
 }
@@ -3461,7 +3463,7 @@ static void *migration_thread(void *opaque)
 
 trace_migration_thread_setup_complete();
 
-while (migration_is_active(s)) {
+while (migration_is_active()) {
 if (urgent || !migration_rate_exceeded(s->to_dst_file)) {
 MigIterateState iter_state = migration_iteration_run(s);
 if (iter_state == MIG_ITERATE_SKIP) {
@@ -3607,7 +3609,7 @@ static void *bg_migration_thread(void *opaque)
 migration_bh_schedule(bg_migration_vm_start_bh, s);
 bql_unlock();
 
-while (migration_is_active(s)) {
+while (migration_is_active()) {
 MigIterateState iter_state = bg_migration_iteration_run(s);
 if (iter_state == MIG_ITERATE_SKIP) {
 continue;
diff --git a/system/dirtylimit.c b/system/dirtylimit.c
index 774ff44f79..051e0311c1 100644
--- a/system/dirtylimit.c
+++ b/system/dirtylimit.c
@@ -83,7 +83,7 @@ static void vcpu_dirty_rate_stat_collect(void)
 int64_t period = DIRTYLIMIT_CALC_TIME_MS;
 
 if (migrate_dirty_limit() &&
-migration_is_active(s)) {
+migration_is_active()) {
 period = s->parameters.x_vcpu_dirty_limit_period;
 }
 
-- 
2.44.0

[PULL 14/34] physmem: Fix wrong address in large address_space_read/write_cached_slow()

2024-03-11 Thread peterx

From: Jonathan Cameron 

If the access is bigger than the MemoryRegion supports,
flatview_read/write_continue() will attempt to update the MemoryRegion,
but the address passed to flatview_translate() is relative to the cache, not
to the FlatView.

On arm/virt with interleaved CXL memory emulation and virtio-blk-pci this
led to the first part of a descriptor being read from the CXL memory and the
second part from PA 0x8, which happens to be a blank region
of a flash chip that reads as all 0xff on this particular configuration.
Note that reproducing this requires the out-of-tree ARM support for CXL, but
the problem is more general.
Avoid this by adding new address_space_read_continue_cached()
and address_space_write_continue_cached() which share all the logic
with the flatview versions except for the MemoryRegion lookup which
is unnecessary as the MemoryRegionCache only covers one MemoryRegion.
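
The following is a simplified arithmetic model of the bug, not QEMU code. It assumes a cache that starts at region offset cache_xlat inside one MemoryRegion mapped at guest-physical region_base, and an access split into 8-byte chunks. The fixed path translates once and then only advances mr_addr. The old path re-translated the cache-relative addr as if it were a guest-physical address. With an 8-byte first chunk at cache offset 0, that lookup would land on PA 0x8, which would match the symptom described above. CacheModel and the chunk_* helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one cached MemoryRegion. */
typedef struct {
    uint64_t region_base; /* guest-physical address of the region */
    uint64_t cache_xlat;  /* region offset where the cache starts */
} CacheModel;

/* Fixed behaviour: translate the cache-relative addr once, then
 * advance within the region by l for each further chunk k. */
static uint64_t chunk_mr_addr_fixed(const CacheModel *c, uint64_t addr,
                                    uint64_t l, unsigned k)
{
    uint64_t mr_addr = c->cache_xlat + addr;

    return mr_addr + (uint64_t)k * l;
}

/* Guest-physical address chunk k should have touched. */
static uint64_t chunk_gpa_expected(const CacheModel *c, uint64_t addr,
                                   uint64_t l, unsigned k)
{
    return c->region_base + chunk_mr_addr_fixed(c, addr, l, k);
}

/* Old behaviour for k >= 1: the cache-relative addr, advanced by l,
 * was looked up in the FlatView as a guest-physical address. */
static uint64_t chunk_gpa_buggy(uint64_t addr, uint64_t l, unsigned k)
{
    return addr + (uint64_t)k * l;
}
```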

Signed-off-by: Jonathan Cameron 
Link: https://lore.kernel.org/r/20240307153710.30907-5-jonathan.came...@huawei.com
Signed-off-by: Peter Xu 
---
 system/physmem.c | 63 +++-
 1 file changed, 57 insertions(+), 6 deletions(-)

diff --git a/system/physmem.c b/system/physmem.c
index 737869a3f5..6cfb7a80ab 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -3370,6 +3370,59 @@ static inline MemoryRegion *address_space_translate_cached(
 return section.mr;
 }
 
+/* Called within RCU critical section.  */
+static MemTxResult address_space_write_continue_cached(MemTxAttrs attrs,
+   const void *ptr,
+   hwaddr len,
+   hwaddr mr_addr,
+   hwaddr l,
+   MemoryRegion *mr)
+{
+MemTxResult result = MEMTX_OK;
+const uint8_t *buf = ptr;
+
+for (;;) {
+result |= flatview_write_continue_step(attrs, buf, len, mr_addr, &l,
+   mr);
+
+len -= l;
+buf += l;
+mr_addr += l;
+
+if (!len) {
+break;
+}
+
+l = len;
+}
+
+return result;
+}
+
+/* Called within RCU critical section.  */
+static MemTxResult address_space_read_continue_cached(MemTxAttrs attrs,
+  void *ptr, hwaddr len,
+  hwaddr mr_addr, hwaddr l,
+  MemoryRegion *mr)
+{
+MemTxResult result = MEMTX_OK;
+uint8_t *buf = ptr;
+
+for (;;) {
+result |= flatview_read_continue_step(attrs, buf, len, mr_addr, &l, mr);
+len -= l;
+buf += l;
+mr_addr += l;
+
+if (!len) {
+break;
+}
+l = len;
+}
+
+return result;
+}
+
 /* Called from RCU critical section. address_space_read_cached uses this
  * out of line function when the target is an MMIO or IOMMU region.
  */
@@ -3383,9 +3436,8 @@ address_space_read_cached_slow(MemoryRegionCache *cache, hwaddr addr,
 l = len;
 mr = address_space_translate_cached(cache, addr, &mr_addr, &l, false,
 MEMTXATTRS_UNSPECIFIED);
-return flatview_read_continue(cache->fv,
-  addr, MEMTXATTRS_UNSPECIFIED, buf, len,
-  mr_addr, l, mr);
+return address_space_read_continue_cached(MEMTXATTRS_UNSPECIFIED,
+  buf, len, mr_addr, l, mr);
 }
 
 /* Called from RCU critical section. address_space_write_cached uses this
@@ -3401,9 +3453,8 @@ address_space_write_cached_slow(MemoryRegionCache *cache, hwaddr addr,
 l = len;
 mr = address_space_translate_cached(cache, addr, &mr_addr, &l, true,
 MEMTXATTRS_UNSPECIFIED);
-return flatview_write_continue(cache->fv,
-   addr, MEMTXATTRS_UNSPECIFIED, buf, len,
-   mr_addr, l, mr);
+return address_space_write_continue_cached(MEMTXATTRS_UNSPECIFIED,
+   buf, len, mr_addr, l, mr);
 }
 
 #define ARG1_DECLMemoryRegionCache *cache
-- 
2.44.0

[PULL 13/34] physmem: Factor out body of flatview_read/write_continue() loop

2024-03-11 Thread peterx

From: Jonathan Cameron 

This code will be reused for the address_space_cached accessors
shortly.

Also reduce the scope of the result variable, now that we aren't directly
calling this in the loop.
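
The following sketch shows the loop/step split with a plain memcpy() target instead of a MemoryRegion. Like the factored-out *_continue_step() helpers, the step takes the chunk length by pointer so an MMIO-like target can shrink it to its maximum access size. The loop then only accumulates results with |= and advances by however much the step handled. copy_step(), copy_continue() and TX_OK are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result code in the spirit of MemTxResult. */
enum { TX_OK = 0 };

/* One step: may shrink *l (like memory_access_size()) before copying. */
static int copy_step(uint8_t *dst, const uint8_t *src, size_t *l,
                     size_t max_access)
{
    if (*l > max_access) {
        *l = max_access;
    }
    memcpy(dst, src, *l);
    return TX_OK;
}

/* The loop keeps only the bookkeeping; *steps counts the iterations. */
static int copy_continue(uint8_t *dst, const uint8_t *src, size_t len,
                         size_t max_access, unsigned *steps)
{
    int result = TX_OK;
    size_t l = len;

    assert(max_access > 0);
    *steps = 0;
    while (len) {
        result |= copy_step(dst, src, &l, max_access);
        (*steps)++;
        len -= l;
        dst += l;
        src += l;
        l = len;
    }
    return result;
}
```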

Signed-off-by: Jonathan Cameron 
Reviewed-by: David Hildenbrand 
Link: https://lore.kernel.org/r/20240307153710.30907-4-jonathan.came...@huawei.com
Signed-off-by: Peter Xu 
---
 system/physmem.c | 169 +++
 1 file changed, 99 insertions(+), 70 deletions(-)

diff --git a/system/physmem.c b/system/physmem.c
index e35aa29343..737869a3f5 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -2681,6 +2681,56 @@ static bool flatview_access_allowed(MemoryRegion *mr, MemTxAttrs attrs,
 return false;
 }
 
+static MemTxResult flatview_write_continue_step(MemTxAttrs attrs,
+const uint8_t *buf,
+hwaddr len, hwaddr mr_addr,
+hwaddr *l, MemoryRegion *mr)
+{
+if (!flatview_access_allowed(mr, attrs, mr_addr, *l)) {
+return MEMTX_ACCESS_ERROR;
+}
+
+if (!memory_access_is_direct(mr, true)) {
+uint64_t val;
+MemTxResult result;
+bool release_lock = prepare_mmio_access(mr);
+
+*l = memory_access_size(mr, *l, mr_addr);
+/*
+ * XXX: could force current_cpu to NULL to avoid
+ * potential bugs
+ */
+
+/*
+ * Assure Coverity (and ourselves) that we are not going to OVERRUN
+ * the buffer by following ldn_he_p().
+ */
+#ifdef QEMU_STATIC_ANALYSIS
+assert((*l == 1 && len >= 1) ||
+   (*l == 2 && len >= 2) ||
+   (*l == 4 && len >= 4) ||
+   (*l == 8 && len >= 8));
+#endif
+val = ldn_he_p(buf, *l);
+result = memory_region_dispatch_write(mr, mr_addr, val,
+  size_memop(*l), attrs);
+if (release_lock) {
+bql_unlock();
+}
+
+return result;
+} else {
+/* RAM case */
+uint8_t *ram_ptr = qemu_ram_ptr_length(mr->ram_block, mr_addr, l,
+   false);
+
+memmove(ram_ptr, buf, *l);
+invalidate_and_set_dirty(mr, mr_addr, *l);
+
+return MEMTX_OK;
+}
+}
+
 /* Called within RCU critical section.  */
 static MemTxResult flatview_write_continue(FlatView *fv, hwaddr addr,
MemTxAttrs attrs,
@@ -2692,44 +2742,8 @@ static MemTxResult flatview_write_continue(FlatView *fv, hwaddr addr,
 const uint8_t *buf = ptr;
 
 for (;;) {
-if (!flatview_access_allowed(mr, attrs, mr_addr, l)) {
-result |= MEMTX_ACCESS_ERROR;
-/* Keep going. */
-} else if (!memory_access_is_direct(mr, true)) {
-uint64_t val;
-bool release_lock = prepare_mmio_access(mr);
-
-l = memory_access_size(mr, l, mr_addr);
-/* XXX: could force current_cpu to NULL to avoid
-   potential bugs */
-
-/*
- * Assure Coverity (and ourselves) that we are not going to OVERRUN
- * the buffer by following ldn_he_p().
- */
-#ifdef QEMU_STATIC_ANALYSIS
-assert((l == 1 && len >= 1) ||
-   (l == 2 && len >= 2) ||
-   (l == 4 && len >= 4) ||
-   (l == 8 && len >= 8));
-#endif
-val = ldn_he_p(buf, l);
-result |= memory_region_dispatch_write(mr, mr_addr, val,
-   size_memop(l), attrs);
-if (release_lock) {
-bql_unlock();
-}
-
-
-} else {
-/* RAM case */
-
-uint8_t *ram_ptr = qemu_ram_ptr_length(mr->ram_block, mr_addr, &l,
-   false);
-
-memmove(ram_ptr, buf, l);
-invalidate_and_set_dirty(mr, mr_addr, l);
-}
+result |= flatview_write_continue_step(attrs, buf, len, mr_addr, &l,
+   mr);
 
 len -= l;
 buf += l;
@@ -2763,6 +2777,52 @@ static MemTxResult flatview_write(FlatView *fv, hwaddr addr, MemTxAttrs attrs,
mr_addr, l, mr);
 }
 
+static MemTxResult flatview_read_continue_step(MemTxAttrs attrs, uint8_t *buf,
+   hwaddr len, hwaddr mr_addr,
+   hwaddr *l,
+   MemoryRegion *mr)
+{
+if (!flatview_access_allowed(mr, attrs, mr_addr, *l)) {
+return MEMTX_ACCESS_ERROR;
+}
+
+if (!memory_access_is_direct(mr, false)) {
+/* I/O case */
+uint64_t val;
+MemTxResult result;
+bool release_lock = prepare_mmio_access(mr);
+
+*l = memory_access_size(mr,

[PULL 11/34] physmem: Rename addr1 to more informative mr_addr in flatview_read/write() and similar

2024-03-11 Thread peterx

From: Jonathan Cameron 

The calls to flatview_read/write[_continue]() have parameters addr and
addr1 but the names give no indication of what they are addresses of.
Rename addr1 to mr_addr to reflect that it is the translated address
offset within the MemoryRegion returned by flatview_translate().
Similarly, rename the parameter in address_space_read/write_cached_slow().

Suggested-by: Peter Xu 
Signed-off-by: Jonathan Cameron 
Reviewed-by: David Hildenbrand 
Link: https://lore.kernel.org/r/20240307153710.30907-2-jonathan.came...@huawei.com
Signed-off-by: Peter Xu 
---
 system/physmem.c | 50 
 1 file changed, 25 insertions(+), 25 deletions(-)

diff --git a/system/physmem.c b/system/physmem.c
index 6e9ed97597..e92bed50a6 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -2685,7 +2685,7 @@ static bool flatview_access_allowed(MemoryRegion *mr, MemTxAttrs attrs,
 static MemTxResult flatview_write_continue(FlatView *fv, hwaddr addr,
MemTxAttrs attrs,
const void *ptr,
-   hwaddr len, hwaddr addr1,
+   hwaddr len, hwaddr mr_addr,
hwaddr l, MemoryRegion *mr)
 {
 uint8_t *ram_ptr;
@@ -2695,12 +2695,12 @@ static MemTxResult flatview_write_continue(FlatView *fv, hwaddr addr,
 const uint8_t *buf = ptr;
 
 for (;;) {
-if (!flatview_access_allowed(mr, attrs, addr1, l)) {
+if (!flatview_access_allowed(mr, attrs, mr_addr, l)) {
 result |= MEMTX_ACCESS_ERROR;
 /* Keep going. */
 } else if (!memory_access_is_direct(mr, true)) {
 release_lock |= prepare_mmio_access(mr);
-l = memory_access_size(mr, l, addr1);
+l = memory_access_size(mr, l, mr_addr);
 /* XXX: could force current_cpu to NULL to avoid
potential bugs */
 
@@ -2715,13 +2715,13 @@ static MemTxResult flatview_write_continue(FlatView *fv, hwaddr addr,
(l == 8 && len >= 8));
 #endif
 val = ldn_he_p(buf, l);
-result |= memory_region_dispatch_write(mr, addr1, val,
+result |= memory_region_dispatch_write(mr, mr_addr, val,
size_memop(l), attrs);
 } else {
 /* RAM case */
-ram_ptr = qemu_ram_ptr_length(mr->ram_block, addr1, &l, false);
+ram_ptr = qemu_ram_ptr_length(mr->ram_block, mr_addr, &l, false);
 memmove(ram_ptr, buf, l);
-invalidate_and_set_dirty(mr, addr1, l);
+invalidate_and_set_dirty(mr, mr_addr, l);
 }
 
 if (release_lock) {
@@ -2738,7 +2738,7 @@ static MemTxResult flatview_write_continue(FlatView *fv, hwaddr addr,
 }
 
 l = len;
-mr = flatview_translate(fv, addr, &addr1, &l, true, attrs);
+mr = flatview_translate(fv, addr, &mr_addr, &l, true, attrs);
 }
 
 return result;
@@ -2749,22 +2749,22 @@ static MemTxResult flatview_write(FlatView *fv, hwaddr addr, MemTxAttrs attrs,
   const void *buf, hwaddr len)
 {
 hwaddr l;
-hwaddr addr1;
+hwaddr mr_addr;
 MemoryRegion *mr;
 
 l = len;
-mr = flatview_translate(fv, addr, &addr1, &l, true, attrs);
+mr = flatview_translate(fv, addr, &mr_addr, &l, true, attrs);
 if (!flatview_access_allowed(mr, attrs, addr, len)) {
 return MEMTX_ACCESS_ERROR;
 }
 return flatview_write_continue(fv, addr, attrs, buf, len,
-   addr1, l, mr);
+   mr_addr, l, mr);
 }
 
 /* Called within RCU critical section.  */
 MemTxResult flatview_read_continue(FlatView *fv, hwaddr addr,
MemTxAttrs attrs, void *ptr,
-   hwaddr len, hwaddr addr1, hwaddr l,
+   hwaddr len, hwaddr mr_addr, hwaddr l,
MemoryRegion *mr)
 {
 uint8_t *ram_ptr;
@@ -2775,14 +2775,14 @@ MemTxResult flatview_read_continue(FlatView *fv, hwaddr addr,
 
 fuzz_dma_read_cb(addr, len, mr);
 for (;;) {
-if (!flatview_access_allowed(mr, attrs, addr1, l)) {
+if (!flatview_access_allowed(mr, attrs, mr_addr, l)) {
 result |= MEMTX_ACCESS_ERROR;
 /* Keep going. */
 } else if (!memory_access_is_direct(mr, false)) {
 /* I/O case */
 release_lock |= prepare_mmio_access(mr);
-l = memory_access_size(mr, l, addr1);
-result |= memory_region_dispatch_read(mr, addr1, &val,
+l = memory_access_size(mr, l, mr_addr);
+result |= memory_region_dispatch_read(mr, mr_addr, &val,
   size_memop(l), attrs);
 
 /*
@@ -2798,7 +2798,7 @@ MemTxResult

[PULL 24/27] migration/multifd: Support incoming mapped-ram stream format

2024-03-03 Thread peterx

From: Fabiano Rosas 

For the incoming mapped-ram migration we need to read the ramblock
headers, get the pages bitmap and send the host address of each
non-zero page to the multifd channel thread for writing.

Usage on HMP is:

(qemu) migrate_set_capability multifd on
(qemu) migrate_set_capability mapped-ram on
(qemu) migrate_incoming file:migfile

(The ram.h include needs to move because we've previously been relying
on it being included from migration.c. Now file.h will start including
multifd.h before migration.o is processed.)
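
The following sketch shows how a per-ramblock "page present" bitmap can be turned into read requests for the receiving channels. It merges runs of set bits into one request each. It assumes a fixed file layout in which page i of a ramblock is stored at pages_offset + i * page_size, and that zero pages simply have their bit clear. ReadReq, test_page() and bitmap_to_requests() are hypothetical names, and QEMU's actual loader may batch requests differently.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical request handed to a receiving channel. */
typedef struct {
    uint64_t file_offset; /* where the data lives in the migration file */
    uint64_t host_offset; /* offset into the ramblock's host memory */
    uint64_t size;        /* bytes to read */
} ReadReq;

static int test_page(const unsigned long *bitmap, size_t page)
{
    return (bitmap[page / BITS_PER_WORD] >> (page % BITS_PER_WORD)) & 1;
}

/* Returns the number of requests written (at most max_reqs). */
static size_t bitmap_to_requests(const unsigned long *bitmap, size_t npages,
                                 uint64_t page_size, uint64_t pages_offset,
                                 ReadReq *reqs, size_t max_reqs)
{
    size_t n = 0;
    size_t i = 0;

    while (i < npages && n < max_reqs) {
        size_t start;

        if (!test_page(bitmap, i)) {
            i++;            /* page not present in the file: skip it */
            continue;
        }
        start = i;
        while (i < npages && test_page(bitmap, i)) {
            i++;            /* extend the run of present pages */
        }
        reqs[n].file_offset = pages_offset + start * page_size;
        reqs[n].host_offset = start * page_size;
        reqs[n].size = (i - start) * page_size;
        n++;
    }
    return n;
}
```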

Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240229153017.2221-22-faro...@suse.de
Signed-off-by: Peter Xu 
---
 migration/file.h|  2 ++
 migration/multifd.h |  2 ++
 migration/file.c| 18 +-
 migration/multifd.c | 31 ---
 migration/ram.c | 26 --
 5 files changed, 73 insertions(+), 6 deletions(-)

diff --git a/migration/file.h b/migration/file.h
index 01a338cac7..9f71e87f74 100644
--- a/migration/file.h
+++ b/migration/file.h
@@ -11,6 +11,7 @@
 #include "qapi/qapi-types-migration.h"
 #include "io/task.h"
 #include "channel.h"
+#include "multifd.h"
 
 void file_start_incoming_migration(FileMigrationArgs *file_args, Error **errp);
 
@@ -21,4 +22,5 @@ void file_cleanup_outgoing_migration(void);
 bool file_send_channel_create(gpointer opaque, Error **errp);
 int file_write_ramblock_iov(QIOChannel *ioc, const struct iovec *iov,
 int niov, RAMBlock *block, Error **errp);
+int multifd_file_recv_data(MultiFDRecvParams *p, Error **errp);
 #endif
diff --git a/migration/multifd.h b/migration/multifd.h
index db8887f088..7447c2bea3 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -13,6 +13,8 @@
 #ifndef QEMU_MIGRATION_MULTIFD_H
 #define QEMU_MIGRATION_MULTIFD_H
 
+#include "ram.h"
+
 typedef struct MultiFDRecvData MultiFDRecvData;
 
 bool multifd_send_setup(void);
diff --git a/migration/file.c b/migration/file.c
index d949a941d0..499d2782fe 100644
--- a/migration/file.c
+++ b/migration/file.c
@@ -13,7 +13,6 @@
 #include "channel.h"
 #include "file.h"
 #include "migration.h"
-#include "multifd.h"
 #include "io/channel-file.h"
 #include "io/channel-util.h"
 #include "options.h"
@@ -204,3 +203,20 @@ int file_write_ramblock_iov(QIOChannel *ioc, const struct iovec *iov,
 
 return (ret < 0) ? ret : 0;
 }
+
+int multifd_file_recv_data(MultiFDRecvParams *p, Error **errp)
+{
+MultiFDRecvData *data = p->data;
+size_t ret;
+
+ret = qio_channel_pread(p->c, (char *) data->opaque,
+data->size, data->file_offset, errp);
+if (ret != data->size) {
+error_prepend(errp,
+  "multifd recv (%u): read 0x%zx, expected 0x%zx",
+  p->id, ret, data->size);
+return -1;
+}
+
+return 0;
+}
diff --git a/migration/multifd.c b/migration/multifd.c
index 8118145428..419feb7df1 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -18,7 +18,6 @@
 #include "qemu/error-report.h"
 #include "qapi/error.h"
 #include "file.h"
-#include "ram.h"
 #include "migration.h"
 #include "migration-stats.h"
 #include "socket.h"
@@ -251,7 +250,7 @@ static int nocomp_recv(MultiFDRecvParams *p, Error **errp)
 uint32_t flags;
 
 if (!multifd_use_packets()) {
-return 0;
+return multifd_file_recv_data(p, errp);
 }
 
 flags = p->flags & MULTIFD_FLAG_COMPRESSION_MASK;
@@ -1331,22 +1330,48 @@ void multifd_recv_cleanup(void)
 void multifd_recv_sync_main(void)
 {
 int thread_count = migrate_multifd_channels();
+bool file_based = !multifd_use_packets();
 int i;
 
-if (!migrate_multifd() || !multifd_use_packets()) {
+if (!migrate_multifd()) {
 return;
 }
 
+/*
+ * File-based channels don't use packets and therefore need to
+ * wait for more work. Release them to start the sync.
+ */
+if (file_based) {
+for (i = 0; i < thread_count; i++) {
+MultiFDRecvParams *p = &multifd_recv_state->params[i];
+
+trace_multifd_recv_sync_main_signal(p->id);
+qemu_sem_post(&p->sem);
+}
+}
+
 /*
  * Initiate the synchronization by waiting for all channels.
+ *
  * For socket-based migration this means each channel has received
  * the SYNC packet on the stream.
+ *
+ * For file-based migration this means each channel is done with
+ * the work (pending_job=false).
  */
 for (i = 0; i < thread_count; i++) {
 trace_multifd_recv_sync_main_wait(i);
 qemu_sem_wait(&multifd_recv_state->sem_sync);
 }
 
+if (file_based) {
+/*
+ * For file-based loading is done in one iteration. We're
+ * done.
+ */
+return;
+}
+
 /*
  * Sync done. Release the channels for the next iteration.
  */
diff --git a/migration/ram.c b/migration/ram.c
index 87cb73fd76..1f1b5297cf 100644
---

[PULL 18/27] migration/multifd: Allow receiving pages without packets

2024-03-03 Thread peterx

From: Fabiano Rosas 

Currently multifd does not need to have knowledge of pages on the
receiving side because all the information needed is within the
packets that come in the stream.

We're about to add support for mapped-ram migration, which cannot use
packets because it expects the ramblock section in the migration file
to contain only the guest pages data.

Add a data structure to transfer pages between the ram migration code
and the multifd receiving threads.

We don't want to reuse MultiFDPages_t for two reasons:

a) multifd threads don't really need to know about the data they're
   receiving.

b) the receiving side has to be stopped to load the pages, which means
   we can experiment with larger granularities than page size when
   transferring data.
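
The following single-threaded sketch shows the ownership handoff that a structure like MultiFDRecvData enables. The main thread fills its own slot, finds the next idle channel in round-robin order, and swaps pointers with it. The channel gets the work, and the main thread gets back an empty slot. It then publishes the job with a release store, mirroring the acquire/release pairing in multifd_recv(). RecvData, Channel, RecvState, dispatch() and channel_complete() are hypothetical names. The semaphore post that wakes the thread in the real code is omitted because nothing runs concurrently here, and channel_complete() plays the receiving thread's part.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

#define NCHAN 3

typedef struct {
    void *opaque;       /* destination buffer */
    size_t size;        /* 0 means the slot is empty */
    off_t file_offset;  /* where to read from */
} RecvData;

typedef struct {
    atomic_bool pending_job;
    RecvData *data;     /* owned by the channel while pending_job is set */
} Channel;

typedef struct {
    Channel chan[NCHAN];
    RecvData *data;     /* slot the main thread is filling */
    int next;           /* round-robin cursor */
} RecvState;

/* Returns the channel used, or -1 if all are busy (real code retries). */
static int dispatch(RecvState *s)
{
    for (int k = 0; k < NCHAN; k++) {
        int i = (s->next + k) % NCHAN;
        Channel *c = &s->chan[i];

        if (!atomic_load_explicit(&c->pending_job, memory_order_acquire)) {
            RecvData *tmp = c->data;

            assert(tmp->size == 0);
            c->data = s->data;      /* channel takes the filled slot */
            s->data = tmp;          /* main thread reuses the empty one */
            /* Publish c->data before the channel can see the job. */
            atomic_store_explicit(&c->pending_job, true,
                                  memory_order_release);
            s->next = (i + 1) % NCHAN;
            return i;
        }
    }
    return -1;
}

/* Receiving-thread side: mark the slot empty, then release the job. */
static void channel_complete(Channel *c)
{
    c->data->size = 0;
    atomic_store_explicit(&c->pending_job, false, memory_order_release);
}
```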

Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240229153017.2221-16-faro...@suse.de
Signed-off-by: Peter Xu 
---
 migration/multifd.h |  15 ++
 migration/file.c|   1 +
 migration/multifd.c | 129 +---
 3 files changed, 138 insertions(+), 7 deletions(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index 6a54377cc1..1be985978e 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -13,6 +13,8 @@
 #ifndef QEMU_MIGRATION_MULTIFD_H
 #define QEMU_MIGRATION_MULTIFD_H
 
+typedef struct MultiFDRecvData MultiFDRecvData;
+
 bool multifd_send_setup(void);
 void multifd_send_shutdown(void);
 int multifd_recv_setup(Error **errp);
@@ -23,6 +25,8 @@ void multifd_recv_new_channel(QIOChannel *ioc, Error **errp);
 void multifd_recv_sync_main(void);
 int multifd_send_sync_main(void);
 bool multifd_queue_page(RAMBlock *block, ram_addr_t offset);
+bool multifd_recv(void);
+MultiFDRecvData *multifd_get_recv_data(void);
 
 /* Multifd Compression flags */
 #define MULTIFD_FLAG_SYNC (1 << 0)
@@ -63,6 +67,13 @@ typedef struct {
 RAMBlock *block;
 } MultiFDPages_t;
 
+struct MultiFDRecvData {
+void *opaque;
+size_t size;
+/* for preadv */
+off_t file_offset;
+};
+
 typedef struct {
 /* Fields are only written at creating/deletion time */
 /* No lock required for them, they are read only */
@@ -152,6 +163,8 @@ typedef struct {
 
 /* syncs main thread and channels */
 QemuSemaphore sem_sync;
+/* sem where to wait for more work */
+QemuSemaphore sem;
 
 /* this mutex protects the following parameters */
 QemuMutex mutex;
@@ -161,6 +174,8 @@ typedef struct {
 uint32_t flags;
 /* global number of generated multifd packets */
 uint64_t packet_num;
+int pending_job;
+MultiFDRecvData *data;
 
 /* thread local variables. No locking required */
 
diff --git a/migration/file.c b/migration/file.c
index 5d4975f43e..22d052a71f 100644
--- a/migration/file.c
+++ b/migration/file.c
@@ -6,6 +6,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "exec/ramblock.h"
 #include "qemu/cutils.h"
 #include "qapi/error.h"
 #include "channel.h"
diff --git a/migration/multifd.c b/migration/multifd.c
index 8c43424c81..d470af73ba 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -81,9 +81,13 @@ struct {
 
 struct {
 MultiFDRecvParams *params;
+MultiFDRecvData *data;
 /* number of created threads */
 int count;
-/* syncs main thread and channels */
+/*
+ * This is always posted by the recv threads, the migration thread
+ * uses it to wait for recv threads to finish assigned tasks.
+ */
 QemuSemaphore sem_sync;
 /* global number of generated multifd packets */
 uint64_t packet_num;
@@ -1119,6 +1123,57 @@ bool multifd_send_setup(void)
 return true;
 }
 
+bool multifd_recv(void)
+{
+int i;
+static int next_recv_channel;
+MultiFDRecvParams *p = NULL;
+MultiFDRecvData *data = multifd_recv_state->data;
+
+/*
+ * next_channel can remain from a previous migration that was
+ * using more channels, so ensure it doesn't overflow if the
+ * limit is lower now.
+ */
+next_recv_channel %= migrate_multifd_channels();
+for (i = next_recv_channel;; i = (i + 1) % migrate_multifd_channels()) {
+if (multifd_recv_should_exit()) {
+return false;
+}
+
+p = &multifd_recv_state->params[i];
+
+if (qatomic_read(&p->pending_job) == false) {
+next_recv_channel = (i + 1) % migrate_multifd_channels();
+break;
+}
+}
+
+/*
+ * Order pending_job read before manipulating p->data below. Pairs
+ * with qatomic_store_release() at multifd_recv_thread().
+ */
+smp_mb_acquire();
+
+assert(!p->data->size);
+multifd_recv_state->data = p->data;
+p->data = data;
+
+/*
+ * Order p->data update before setting pending_job. Pairs with
+ * qatomic_load_acquire() at multifd_recv_thread().
+ */
+qatomic_store_release(&p->pending_job, true);
+qemu_sem_post(&p->sem);
+
+return true;
+}
+
+MultiFDRecvData *multifd_get_recv_data(void)
+{
+return

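The hand-off in multifd_recv() above pairs an acquire on `pending_job` with a release store, so a recv thread can only observe `pending_job == true` after the swapped `p->data` pointer is visible to it. The following is a minimal single-threaded C11 sketch of that protocol. `RecvData`, `RecvChannel`, `dispatch()` and `complete()` are hypothetical stand-ins. The sketch uses `atomic_load_explicit(..., memory_order_acquire)` where the patch uses `qatomic_read()` followed by `smp_mb_acquire()`, and it returns -1 when every channel is busy instead of looping until one frees up.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for MultiFDRecvData: size 0 means "empty". */
typedef struct {
    size_t size;
} RecvData;

/* Hypothetical stand-in for the relevant MultiFDRecvParams fields. */
typedef struct {
    atomic_bool pending_job;
    RecvData *data;
} RecvChannel;

/*
 * Dispatcher side: starting at *next, find an idle channel, hand it the
 * caller's filled buffer, take its empty buffer back in *spare, then
 * publish the job. Returns the channel index, or -1 if all are busy.
 */
static int dispatch(RecvChannel *ch, int nch, int *next, RecvData **spare)
{
    for (int n = 0; n < nch; n++) {
        int i = (*next + n) % nch;

        /* Acquire pairs with the worker's release when it went idle. */
        if (!atomic_load_explicit(&ch[i].pending_job, memory_order_acquire)) {
            RecvData *filled = *spare;

            assert(ch[i].data->size == 0);  /* idle slot must be empty */
            *spare = ch[i].data;
            ch[i].data = filled;
            /* Release: the pointer swap is visible before the flag. */
            atomic_store_explicit(&ch[i].pending_job, true,
                                  memory_order_release);
            *next = (i + 1) % nch;
            return i;
        }
    }
    return -1;
}

/* Worker side: consume the job, empty the buffer, mark the channel idle. */
static void complete(RecvChannel *c)
{
    if (atomic_load_explicit(&c->pending_job, memory_order_acquire)) {
        c->data->size = 0;
        atomic_store_explicit(&c->pending_job, false, memory_order_release);
    }
}
```

The buffer swap means the dispatcher never allocates per job: it always gets an empty buffer back in exchange for the filled one it hands out.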
[PULL 07/27] io: implement io_pwritev/preadv for QIOChannelFile

2024-03-03 Thread peterx

From: Nikolay Borisov 

The upcoming 'mapped-ram' feature will require qemu to write data to
(and restore from) specific offsets of the migration file.

Add a minimal implementation of pwritev/preadv and expose them via the
io_pwritev and io_preadv interfaces.
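A minimal sketch of the same pattern with plain POSIX calls, outside of QIOChannel: positioned vectored I/O that retries on EINTR and reports EAGAIN separately. The function names and the `SKETCH_ERR_BLOCK` value are hypothetical; the patch returns QIO_CHANNEL_ERR_BLOCK instead. preadv()/pwritev() do not move the descriptor's file position, which is what allows data to be placed at fixed offsets independently of any sequential stream.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* glibc: exposes preadv()/pwritev() */
#endif
#include <assert.h>         /* assert() in the usage example */
#include <errno.h>
#include <stdlib.h>         /* mkstemp() in the usage example */
#include <string.h>         /* memcmp() in the usage example */
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define SKETCH_ERR_BLOCK (-2)   /* hypothetical "would block" result */

/* Read into iov at the given file offset, retrying on EINTR. */
static ssize_t sketch_preadv(int fd, const struct iovec *iov, int niov,
                             off_t offset)
{
    for (;;) {
        ssize_t ret = preadv(fd, iov, niov, offset);
        if (ret >= 0) {
            return ret;                 /* bytes read (0 at EOF) */
        }
        if (errno == EINTR) {
            continue;                   /* interrupted by a signal: retry */
        }
        return errno == EAGAIN ? SKETCH_ERR_BLOCK : -1;
    }
}

/* Write iov at the given file offset, retrying on EINTR. */
static ssize_t sketch_pwritev(int fd, const struct iovec *iov, int niov,
                              off_t offset)
{
    for (;;) {
        ssize_t ret = pwritev(fd, iov, niov, offset);
        if (ret >= 0) {
            return ret;                 /* bytes written */
        }
        if (errno == EINTR) {
            continue;
        }
        return errno == EAGAIN ? SKETCH_ERR_BLOCK : -1;
    }
}
```

For example, writing a two-element iovec ("abc", "def") at offset 4096 and reading six bytes back at the same offset returns "abcdef", while the descriptor's file position stays at 0.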

Signed-off-by: Nikolay Borisov 
Reviewed-by: "Daniel P. Berrangé" 
Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240229153017.2221-5-faro...@suse.de
Signed-off-by: Peter Xu 
---
 io/channel-file.c | 56 +++
 1 file changed, 56 insertions(+)

diff --git a/io/channel-file.c b/io/channel-file.c
index f91bf6db1c..a6ad7770c6 100644
--- a/io/channel-file.c
+++ b/io/channel-file.c
@@ -146,6 +146,58 @@ static ssize_t qio_channel_file_writev(QIOChannel *ioc,
 return ret;
 }
 
+#ifdef CONFIG_PREADV
+static ssize_t qio_channel_file_preadv(QIOChannel *ioc,
+   const struct iovec *iov,
+   size_t niov,
+   off_t offset,
+   Error **errp)
+{
+QIOChannelFile *fioc = QIO_CHANNEL_FILE(ioc);
+ssize_t ret;
+
+ retry:
+ret = preadv(fioc->fd, iov, niov, offset);
+if (ret < 0) {
+if (errno == EAGAIN) {
+return QIO_CHANNEL_ERR_BLOCK;
+}
+if (errno == EINTR) {
+goto retry;
+}
+
+error_setg_errno(errp, errno, "Unable to read from file");
+return -1;
+}
+
+return ret;
+}
+
+static ssize_t qio_channel_file_pwritev(QIOChannel *ioc,
+const struct iovec *iov,
+size_t niov,
+off_t offset,
+Error **errp)
+{
+QIOChannelFile *fioc = QIO_CHANNEL_FILE(ioc);
+ssize_t ret;
+
+ retry:
+ret = pwritev(fioc->fd, iov, niov, offset);
+if (ret <= 0) {
+if (errno == EAGAIN) {
+return QIO_CHANNEL_ERR_BLOCK;
+}
+if (errno == EINTR) {
+goto retry;
+}
+error_setg_errno(errp, errno, "Unable to write to file");
+return -1;
+}
+return ret;
+}
+#endif /* CONFIG_PREADV */
+
 static int qio_channel_file_set_blocking(QIOChannel *ioc,
  bool enabled,
  Error **errp)
@@ -231,6 +283,10 @@ static void qio_channel_file_class_init(ObjectClass *klass,
 ioc_klass->io_writev = qio_channel_file_writev;
 ioc_klass->io_readv = qio_channel_file_readv;
 ioc_klass->io_set_blocking = qio_channel_file_set_blocking;
+#ifdef CONFIG_PREADV
+ioc_klass->io_pwritev = qio_channel_file_pwritev;
+ioc_klass->io_preadv = qio_channel_file_preadv;
+#endif
 ioc_klass->io_seek = qio_channel_file_seek;
 ioc_klass->io_close = qio_channel_file_close;
 ioc_klass->io_create_watch = qio_channel_file_create_watch;
-- 
2.44.0

[PULL 12/27] migration/ram: Add outgoing 'mapped-ram' migration

2024-03-03 Thread peterx

From: Fabiano Rosas 

Implement the outgoing migration side for the 'mapped-ram' capability.

A bitmap is introduced to track which pages have been written in the
migration file. Pages are written at a fixed location for every
ramblock. Zero pages are ignored as they'd be zero in the destination
migration as well.

The migration stream is altered to put the dirty pages for a ramblock
after its header instead of having a sequential stream of pages that
follow the ramblock headers.

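Without mapped-ram (current):      With mapped-ram (new):

 -------------------------         --------------------------------
 | ramblock 1 header     |         | ramblock 1 header            |
 -------------------------         --------------------------------
 | ramblock 2 header     |         | ramblock 1 mapped-ram header |
 -------------------------         --------------------------------
 | ...                   |         | padding to next 1MB boundary |
 -------------------------         | ...                          |
 | ramblock n header     |         --------------------------------
 -------------------------         | ramblock 1 pages             |
 | RAM_SAVE_FLAG_EOS     |         | ...                          |
 -------------------------         --------------------------------
 | stream of pages       |         | ramblock 2 header            |
 | (iter 1)              |         --------------------------------
 | ...                   |         | ramblock 2 mapped-ram header |
 -------------------------         --------------------------------
 | RAM_SAVE_FLAG_EOS     |         | padding to next 1MB boundary |
 -------------------------         | ...                          |
 | stream of pages       |         --------------------------------
 | (iter 2)              |         | ramblock 2 pages             |
 | ...                   |         | ...                          |
 -------------------------         --------------------------------
 | ...                   |         | ...                          |
 -------------------------         --------------------------------
                                   | RAM_SAVE_FLAG_EOS            |
                                   --------------------------------
                                   | ...                          |
                                   --------------------------------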

where:
 - ramblock header: the generic information for a ramblock, such as
   idstr, used_len, etc.

 - ramblock mapped-ram header: the new information added by this
   feature: bitmap of pages written, bitmap size and offset of pages
   in the migration file.
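The fixed-location rule can be expressed as simple arithmetic. The following is a sketch, assuming that everything before a ramblock's pages region (its mapped-ram header and bitmap) ends at `hdr_end`, and that the pages region starts at the next 1 MiB boundary, as the layout's "padding to next 1MB boundary" suggests. `align_up()`, `page_file_offset()` and `SKETCH_ALIGN` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* 1 MiB: the padding boundary shown in the layout. */
#define SKETCH_ALIGN 0x100000ULL

/* Round x up to a multiple of a, which must be a power of two. */
static uint64_t align_up(uint64_t x, uint64_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (x + a - 1) & ~(a - 1);
}

/*
 * File offset of page page_index of a ramblock whose header and bitmap
 * end at hdr_end: the pages region starts at the next 1 MiB boundary and
 * every page has one fixed slot inside it, however often it is dirtied.
 */
static uint64_t page_file_offset(uint64_t hdr_end, uint64_t page_index,
                                 uint64_t page_size)
{
    uint64_t pages_offset = align_up(hdr_end, SKETCH_ALIGN);
    return pages_offset + page_index * page_size;
}
```

With 4 KiB pages and headers ending at byte 5000, page 3 lands at 0x100000 + 3 * 4096. Rewriting that page later targets the same offset again.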

Signed-off-by: Nikolay Borisov 
Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240229153017.2221-10-faro...@suse.de
Signed-off-by: Peter Xu 
---
 include/exec/ramblock.h |  13 
 migration/ram.c | 131 +---
 2 files changed, 135 insertions(+), 9 deletions(-)

diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
index 3eb79723c6..848915ea5b 100644
--- a/include/exec/ramblock.h
+++ b/include/exec/ramblock.h
@@ -44,6 +44,19 @@ struct RAMBlock {
 size_t page_size;
 /* dirty bitmap used during migration */
 unsigned long *bmap;
+
+/*
+ * Below fields are only used by mapped-ram migration
+ */
+/* bitmap of pages present in the migration file */
+unsigned long *file_bmap;
+/*
+ * offset in the file where pages belonging to this ramblock are saved,
+ * used only during migration to a file.
+ */
+off_t bitmap_offset;
+uint64_t pages_offset;
+
 /* bitmap of already received pages in postcopy */
 unsigned long *receivedmap;
 
diff --git a/migration/ram.c b/migration/ram.c
index 45a00b45ed..f807824d49 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -94,6 +94,18 @@
 #define RAM_SAVE_FLAG_MULTIFD_FLUSH0x200
 /* We can't use any flag that is bigger than 0x200 */
 
+/*
+ * mapped-ram migration supports O_DIRECT, so we need to make sure the
+ * userspace buffer, the IO operation size and the file offset are
+ * aligned according to the underlying device's block size. The first
+ * two are already aligned to page size, but we need to add padding to
+ * the file to align the offset.  We cannot read the block size
+ * dynamically because the migration file can be moved between
+ * different systems, so use 1M to cover most block sizes and to keep
+ * the file offset aligned at page size as well.
+ */
+#define MAPPED_RAM_FILE_OFFSET_ALIGNMENT 0x100000
+
 XBZRLECacheStats xbzrle_counters;
 
 /* used by the search for pages to send */
@@ -1126,12 +1138,18 @@ static int save_zero_page(RAMState *rs, PageSearchStatus *pss,
 return 0;
 }
 
+stat64_add(&mig_stats.zero_pages, 1);
+
+if (migrate_mapped_ram()) {
+/* zero pages are not transferred with mapped-ram */
+clear_bit(offset >>

[PULL 03/27] tests/migration: Set compression level in migration tests

2024-03-03 Thread peterx

From: Bryan Zhang 

Add calls to set the compression level for the `zstd` and `zlib`
migration tests, to make sure that those calls work.

Signed-off-by: Bryan Zhang 
Link: https://lore.kernel.org/r/20240301035901.4006936-3-bryan.zh...@bytedance.com
Signed-off-by: Peter Xu 
---
 tests/qtest/migration-test.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 83512bce85..8c35f3457b 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -2664,6 +2664,13 @@ static void *
 test_migrate_precopy_tcp_multifd_zlib_start(QTestState *from,
 QTestState *to)
 {
+/*
+ * Overloading this test to also check that set_parameter does not error.
+ * This is also done in the tests for the other compression methods.
+ */
+migrate_set_parameter_int(from, "multifd-zlib-level", 2);
+migrate_set_parameter_int(to, "multifd-zlib-level", 2);
+
 return test_migrate_precopy_tcp_multifd_start_common(from, to, "zlib");
 }
 
@@ -2672,6 +2679,9 @@ static void *
 test_migrate_precopy_tcp_multifd_zstd_start(QTestState *from,
 QTestState *to)
 {
+migrate_set_parameter_int(from, "multifd-zstd-level", 2);
+migrate_set_parameter_int(to, "multifd-zstd-level", 2);
+
 return test_migrate_precopy_tcp_multifd_start_common(from, to, "zstd");
 }
 #endif /* CONFIG_ZSTD */
-- 
2.44.0

[PULL 13/27] migration/ram: Add incoming 'mapped-ram' migration

2024-03-03 Thread peterx

From: Fabiano Rosas 

Add the necessary code to parse the format changes for the
'mapped-ram' capability.

One of the more notable changes in behavior is that in the
'mapped-ram' case ram pages are restored in one go rather than
constantly looping through the migration stream.
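Restoring "in one go" means walking the bitmap of pages present in the file and reading each run of consecutive set bits with as few requests as possible, in chunks no larger than the load buffer. The sketch below covers only that counting logic. The helper names are hypothetical, and a plain loop stands in for QEMU's find_next_bit()/find_next_zero_bit().

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Return bit n of the bitmap (0 or 1). */
static int sketch_test_bit(const unsigned long *map, size_t n)
{
    return (map[n / SKETCH_BITS_PER_LONG] >> (n % SKETCH_BITS_PER_LONG)) & 1UL;
}

/*
 * Count the read requests a loader would issue: one per run of set bits,
 * with each run split into chunks of at most max_pages pages (playing the
 * role of MAPPED_RAM_LOAD_BUF_SIZE / page size).
 */
static size_t count_reads(const unsigned long *map, size_t nbits,
                          size_t max_pages)
{
    size_t reads = 0;
    size_t i = 0;

    assert(max_pages > 0);
    while (i < nbits) {
        if (!sketch_test_bit(map, i)) {
            i++;                        /* page not in the file: skip */
            continue;
        }
        size_t start = i;
        while (i < nbits && sketch_test_bit(map, i)) {
            i++;                        /* extend the run [start, i) */
        }
        reads += (i - start + max_pages - 1) / max_pages;
    }
    return reads;
}
```

For a bitmap with pages 0-3 and 8-11 present, this gives two reads with a large buffer and four reads if each read is capped at three pages.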

Signed-off-by: Nikolay Borisov 
Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240229153017.2221-11-faro...@suse.de
Signed-off-by: Peter Xu 
---
 migration/ram.c | 143 +++-
 1 file changed, 141 insertions(+), 2 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index f807824d49..18620784c6 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -106,6 +106,12 @@
  */
 #define MAPPED_RAM_FILE_OFFSET_ALIGNMENT 0x100000
 
+/*
+ * When doing mapped-ram migration, this is the amount we read from
+ * the pages region in the migration file at a time.
+ */
+#define MAPPED_RAM_LOAD_BUF_SIZE 0x100000
+
 XBZRLECacheStats xbzrle_counters;
 
 /* used by the search for pages to send */
@@ -2998,6 +3004,35 @@ static void mapped_ram_setup_ramblock(QEMUFile *file, RAMBlock *block)
 qemu_set_offset(file, block->pages_offset + block->used_length, SEEK_SET);
 }
 
+static bool mapped_ram_read_header(QEMUFile *file, MappedRamHeader *header,
+   Error **errp)
+{
+size_t ret, header_size = sizeof(MappedRamHeader);
+
+ret = qemu_get_buffer(file, (uint8_t *)header, header_size);
+if (ret != header_size) {
+error_setg(errp, "Could not read whole mapped-ram migration header "
+   "(expected %zd, got %zd bytes)", header_size, ret);
+return false;
+}
+
+/* migration stream is big-endian */
+header->version = be32_to_cpu(header->version);
+
+if (header->version > MAPPED_RAM_HDR_VERSION) {
+error_setg(errp, "Migration mapped-ram capability version not "
+   "supported (expected <= %d, got %d)", MAPPED_RAM_HDR_VERSION,
+   header->version);
+return false;
+}
+
+header->page_size = be64_to_cpu(header->page_size);
+header->bitmap_offset = be64_to_cpu(header->bitmap_offset);
+header->pages_offset = be64_to_cpu(header->pages_offset);
+
+return true;
+}
+
 /*
  * Each of ram_save_setup, ram_save_iterate and ram_save_complete has
  * long-running RCU critical section.  When rcu-reclaims in the code
@@ -3899,22 +3934,126 @@ void colo_flush_ram_cache(void)
 trace_colo_flush_ram_cache_end();
 }
 
+static bool read_ramblock_mapped_ram(QEMUFile *f, RAMBlock *block,
+ long num_pages, unsigned long *bitmap,
+ Error **errp)
+{
+ERRP_GUARD();
+unsigned long set_bit_idx, clear_bit_idx;
+ram_addr_t offset;
+void *host;
+size_t read, unread, size;
+
+for (set_bit_idx = find_first_bit(bitmap, num_pages);
+ set_bit_idx < num_pages;
+ set_bit_idx = find_next_bit(bitmap, num_pages, clear_bit_idx + 1)) {
+
+clear_bit_idx = find_next_zero_bit(bitmap, num_pages, set_bit_idx + 1);
+
+unread = TARGET_PAGE_SIZE * (clear_bit_idx - set_bit_idx);
+offset = set_bit_idx << TARGET_PAGE_BITS;
+
+while (unread > 0) {
+host = host_from_ram_block_offset(block, offset);
+if (!host) {
+error_setg(errp, "page outside of ramblock %s range",
+   block->idstr);
+return false;
+}
+
+size = MIN(unread, MAPPED_RAM_LOAD_BUF_SIZE);
+
+read = qemu_get_buffer_at(f, host, size,
+  block->pages_offset + offset);
+if (!read) {
+goto err;
+}
+offset += read;
+unread -= read;
+}
+}
+
+return true;
+
+err:
+qemu_file_get_error_obj(f, errp);
+error_prepend(errp, "(%s) failed to read page " RAM_ADDR_FMT
+  " from file offset %" PRIx64 ": ", block->idstr, offset,
+  block->pages_offset + offset);
+return false;
+}
+
+static void parse_ramblock_mapped_ram(QEMUFile *f, RAMBlock *block,
+  ram_addr_t length, Error **errp)
+{
+g_autofree unsigned long *bitmap = NULL;
+MappedRamHeader header;
+size_t bitmap_size;
+long num_pages;
+
+if (!mapped_ram_read_header(f, &header, errp)) {
+return;
+}
+
+block->pages_offset = header.pages_offset;
+
+/*
+ * Check the alignment of the file region that contains pages. We
+ * don't enforce MAPPED_RAM_FILE_OFFSET_ALIGNMENT to allow that
+ * value to change in the future. Do only a sanity check with page
+ * size alignment.
+ */
+if (!QEMU_IS_ALIGNED(block->pages_offset, TARGET_PAGE_SIZE)) {
+error_setg(errp,
+   "Error reading ramblock %s pages, region has bad alignment",
+   block->idstr);
+

[PULL 21/27] migration/multifd: Add incoming QIOChannelFile support

2024-03-03 Thread peterx

From: Fabiano Rosas 

On the receiving side we don't need to differentiate between main
channel and threads, so whichever channel is defined first gets to be
the main one. And since there are no packets, use the atomic channel
count to index into the params array.
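Every incoming channel here is a dup() of the same file descriptor. Duplicated descriptors share one open file description, and therefore one file position, so mapped-ram reads and writes at explicit offsets instead of relying on that position. The POSIX sketch below demonstrates this; `dup_channels()` is a hypothetical helper.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* pread(), mkstemp() */
#endif
#include <assert.h>              /* assert() in the usage example */
#include <stdlib.h>              /* mkstemp() in the usage example */
#include <unistd.h>

/*
 * Create n extra descriptors for the file behind fd with dup().
 * On failure, close the ones already created and return -1.
 */
static int dup_channels(int fd, int *out, int n)
{
    for (int i = 0; i < n; i++) {
        out[i] = dup(fd);
        if (out[i] == -1) {
            while (i-- > 0) {
                close(out[i]);
            }
            return -1;
        }
    }
    return 0;
}
```

For example, a write() through one duplicate advances the position seen by all of them, whereas pread() on another duplicate reads at its own offset and leaves the shared position untouched.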

Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240229153017.2221-19-faro...@suse.de
Signed-off-by: Peter Xu 
---
 migration/file.c  | 35 +++
 migration/migration.c |  3 ++-
 migration/multifd.c   |  3 +--
 3 files changed, 30 insertions(+), 11 deletions(-)

diff --git a/migration/file.c b/migration/file.c
index a350dd61f0..2f8b626b27 100644
--- a/migration/file.c
+++ b/migration/file.c
@@ -8,6 +8,7 @@
 #include "qemu/osdep.h"
 #include "exec/ramblock.h"
 #include "qemu/cutils.h"
+#include "qemu/error-report.h"
 #include "qapi/error.h"
 #include "channel.h"
 #include "file.h"
@@ -15,6 +16,7 @@
 #include "multifd.h"
 #include "io/channel-file.h"
 #include "io/channel-util.h"
+#include "options.h"
 #include "trace.h"
 
 #define OFFSET_OPTION ",offset="
@@ -112,7 +114,8 @@ void file_start_incoming_migration(FileMigrationArgs *file_args, Error **errp)
 g_autofree char *filename = g_strdup(file_args->filename);
 QIOChannelFile *fioc = NULL;
 uint64_t offset = file_args->offset;
-QIOChannel *ioc;
+int channels = 1;
+int i = 0;
 
 trace_migration_file_incoming(filename);
 
@@ -121,13 +124,29 @@ void file_start_incoming_migration(FileMigrationArgs *file_args, Error **errp)
 return;
 }
 
-ioc = QIO_CHANNEL(fioc);
-if (offset && qio_channel_io_seek(ioc, offset, SEEK_SET, errp) < 0) {
+if (offset &&
+qio_channel_io_seek(QIO_CHANNEL(fioc), offset, SEEK_SET, errp) < 0) {
 return;
 }
-qio_channel_set_name(QIO_CHANNEL(ioc), "migration-file-incoming");
-qio_channel_add_watch_full(ioc, G_IO_IN,
-   file_accept_incoming_migration,
-   NULL, NULL,
-   g_main_context_get_thread_default());
+
+if (migrate_multifd()) {
+channels += migrate_multifd_channels();
+}
+
+do {
+QIOChannel *ioc = QIO_CHANNEL(fioc);
+
+qio_channel_set_name(ioc, "migration-file-incoming");
+qio_channel_add_watch_full(ioc, G_IO_IN,
+   file_accept_incoming_migration,
+   NULL, NULL,
+   g_main_context_get_thread_default());
+
+fioc = qio_channel_file_new_fd(dup(fioc->fd));
+
+if (!fioc || fioc->fd == -1) {
+error_setg(errp, "Error creating migration incoming channel");
+break;
+}
+} while (++i < channels);
 }
diff --git a/migration/migration.c b/migration/migration.c
index 2669600d25..faeb75a59b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -910,7 +910,8 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
 uint32_t channel_magic = 0;
 int ret = 0;
 
-if (migrate_multifd() && !migrate_postcopy_ram() &&
+if (migrate_multifd() && !migrate_mapped_ram() &&
+!migrate_postcopy_ram() &&
 qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_READ_MSG_PEEK)) {
 /*
  * With multiple channels, it is possible that we receive channels
diff --git a/migration/multifd.c b/migration/multifd.c
index caef1076ca..ea08f1aa9e 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -1545,8 +1545,7 @@ void multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
 }
 trace_multifd_recv_new_channel(id);
 } else {
-/* next patch gives this a meaningful value */
-id = 0;
+id = qatomic_read(&multifd_recv_state->count);
 }
 
 p = &multifd_recv_state->params[id];
-- 
2.44.0

[PULL 27/27] migration/multifd: Document two places for mapped-ram

2024-03-03 Thread peterx

From: Peter Xu 

Add documentation for mapped-ram migration at two spots where the
behavior may not be obvious.

Reviewed-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240301091524.39900-1-pet...@redhat.com
Cc: Prasad Pandit 
[peterx: fix two English errors per Prasad]
Signed-off-by: Peter Xu 
---
 migration/multifd.c | 12 
 migration/ram.c |  8 +++-
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index b4e5a9dfcc..d4a44da559 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -709,6 +709,18 @@ static bool multifd_send_cleanup_channel(MultiFDSendParams *p, Error **errp)
 {
 if (p->c) {
 migration_ioc_unregister_yank(p->c);
+/*
+ * An explicit close() on the channel here is normally not
+ * required, but can be helpful for "file:" iochannels, where it
+ * will include fdatasync() to make sure the data is flushed to the
+ * disk backend.
+ *
+ * The object_unref() cannot guarantee that because: (1) finalize()
+ * of the iochannel is only triggered on the last reference, and
+ * it's not guaranteed that we always hold the last refcount when
+ * reaching here, and, (2) even if finalize() is invoked, it only
+ * does a close(fd) without data flush.
+ */
 qio_channel_close(p->c, &error_abort);
 object_unref(OBJECT(p->c));
 p->c = NULL;
diff --git a/migration/ram.c b/migration/ram.c
index 1f1b5297cf..c79e3de521 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -4258,7 +4258,13 @@ static int ram_load_precopy(QEMUFile *f)
 switch (flags & ~RAM_SAVE_FLAG_CONTINUE) {
 case RAM_SAVE_FLAG_MEM_SIZE:
 ret = parse_ramblocks(f, addr);
-
+/*
+ * For mapped-ram migration (to a file) using multifd, we sync
+ * once and for all here to make sure all tasks we queued to
+ * multifd threads are completed, so that all the ramblocks
+ * (including all the guest memory pages within) are fully
+ * loaded after this sync returns.
+ */
 if (migrate_mapped_ram()) {
 multifd_recv_sync_main();
 }
-- 
2.44.0

[PULL 22/27] migration/multifd: Prepare multifd sync for mapped-ram migration

2024-03-03 Thread peterx

From: Fabiano Rosas 

The mapped-ram migration can be performed live or non-live, but it is
always asynchronous, i.e. the source machine and the destination
machine are not migrating at the same time. We only need some pieces
of the multifd sync operations.

multifd_send_sync_main()

  Issued by the ram migration code on the migration thread, causes the
  multifd send channels to synchronize with the migration thread and
  makes the sending side emit a packet with the MULTIFD_FLUSH flag.

  With mapped-ram we want to maintain the sync on the sending side
  because that provides ordering between the rounds of dirty pages when
  migrating live.

MULTIFD_FLUSH

  On the receiving side, the presence of the MULTIFD_FLUSH flag on a
  packet causes the receiving channels to start synchronizing with the
  main thread.

  We're not using packets with mapped-ram, so there's no MULTIFD_FLUSH
  flag and therefore no channel sync on the receiving side.

multifd_recv_sync_main()

  Issued by the migration thread when the ram migration flag
  RAM_SAVE_FLAG_MULTIFD_FLUSH is received, causes the migration thread
  on the receiving side to start synchronizing with the recv
  channels. Due to compatibility, this is also issued when
  RAM_SAVE_FLAG_EOS is received.

  For mapped-ram we only need to synchronize the channels at the end of
  migration to avoid doing cleanup before the channels have finished
  their IO.

Make sure the multifd syncs are only issued at the appropriate times.

Note that due to pre-existing backward compatibility issues, we have
the multifd_flush_after_each_section property that can cause a sync to
happen at EOS. Since the EOS flag is needed on the stream, allow
mapped-ram to just ignore it.

Also emit an error if any other unexpected flags are found on the
stream.
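The send-side conditions that the hunks below arrive at can be summarized as three predicates. The model below is hypothetical: the `Caps` struct and the function names are not QEMU's. It only restates the boolean expressions used in find_dirty_block() and ram_save_iterate().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bundle of the three settings the patch consults. */
typedef struct {
    bool multifd;
    bool flush_after_each_section;
    bool mapped_ram;
} Caps;

/* End of a full pass over the dirty bitmap: call multifd_send_sync_main()? */
static bool sync_at_pass_end(Caps c)
{
    return c.multifd && (!c.flush_after_each_section || c.mapped_ram);
}

/* At that point, also write RAM_SAVE_FLAG_MULTIFD_FLUSH to the stream? */
static bool emit_flush_flag_at_pass_end(Caps c)
{
    return sync_at_pass_end(c) && !c.mapped_ram;
}

/* End of each ram_save_iterate() section: sync the send channels? */
static bool sync_at_section_end(Caps c)
{
    return c.multifd && c.flush_after_each_section && !c.mapped_ram;
}
```

With mapped-ram, the sender therefore syncs once per dirty-bitmap pass, regardless of the compatibility property, and never puts the flush flag on the stream.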

Signed-off-by: Fabiano Rosas 
Reviewed-by: Peter Xu 
Link: https://lore.kernel.org/r/20240229153017.2221-20-faro...@suse.de
Signed-off-by: Peter Xu 
---
 migration/ram.c | 38 +++---
 1 file changed, 31 insertions(+), 7 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 18620784c6..329153d97d 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1362,14 +1362,18 @@ static int find_dirty_block(RAMState *rs, PageSearchStatus *pss)
 pss->block = QLIST_NEXT_RCU(pss->block, next);
 if (!pss->block) {
 if (migrate_multifd() &&
-!migrate_multifd_flush_after_each_section()) {
+(!migrate_multifd_flush_after_each_section() ||
+ migrate_mapped_ram())) {
 QEMUFile *f = rs->pss[RAM_CHANNEL_PRECOPY].pss_channel;
 int ret = multifd_send_sync_main();
 if (ret < 0) {
 return ret;
 }
-qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH);
-qemu_fflush(f);
+
+if (!migrate_mapped_ram()) {
+qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH);
+qemu_fflush(f);
+}
 }
 /*
  * If memory migration starts over, we will meet a dirtied page
@@ -3111,7 +3115,8 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 return ret;
 }
 
-if (migrate_multifd() && !migrate_multifd_flush_after_each_section()) {
+if (migrate_multifd() && !migrate_multifd_flush_after_each_section()
+&& !migrate_mapped_ram()) {
 qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH);
 }
 
@@ -3242,7 +3247,8 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
 out:
 if (ret >= 0
 && migration_is_setup_or_active(migrate_get_current()->state)) {
-if (migrate_multifd() && migrate_multifd_flush_after_each_section()) {
+if (migrate_multifd() && migrate_multifd_flush_after_each_section() &&
+!migrate_mapped_ram()) {
 ret = multifd_send_sync_main();
 if (ret < 0) {
 return ret;
@@ -3334,7 +3340,8 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
 }
 }
 
-if (migrate_multifd() && !migrate_multifd_flush_after_each_section()) {
+if (migrate_multifd() && !migrate_multifd_flush_after_each_section() &&
+!migrate_mapped_ram()) {
 qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH);
 }
 qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
@@ -4137,6 +4144,12 @@ static int ram_load_precopy(QEMUFile *f)
 invalid_flags |= RAM_SAVE_FLAG_COMPRESS_PAGE;
 }
 
+if (migrate_mapped_ram()) {
+invalid_flags |= (RAM_SAVE_FLAG_HOOK | RAM_SAVE_FLAG_MULTIFD_FLUSH |
+  RAM_SAVE_FLAG_PAGE | RAM_SAVE_FLAG_XBZRLE |
+  RAM_SAVE_FLAG_ZERO);
+}
+
 while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
 ram_addr_t addr;
 void *host = NULL, *host_bak = NULL;
@@ -4158,6 +4171,8 @@ static int

[PULL 23/27] migration/multifd: Support outgoing mapped-ram stream format

2024-03-03 Thread peterx

From: Fabiano Rosas 

The new mapped-ram stream format uses a file transport and puts RAM
pages in the migration file at their respective offsets. The writes
can therefore be done in parallel using the pwritev system call, which
takes an array of iovecs and a file offset.

Add support for enabling the new format together with multifd, to make
use of the threading and page handling already in place.

This requires multifd to stop sending headers and to leave the stream
format to the mapped-ram code. When it is time to write the data, we
need to call a variant of qio_channel_write that can take an offset.

Usage on HMP is:

(qemu) stop
(qemu) migrate_set_capability multifd on
(qemu) migrate_set_capability mapped-ram on
(qemu) migrate_set_parameter max-bandwidth 0
(qemu) migrate_set_parameter multifd-channels 8
(qemu) migrate file:migfile
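Because pwritev() takes a single file offset, an iovec array has to be cut wherever two consecutive elements are not adjacent in memory. file_write_ramblock_iov() in the diff below does this internally. The standalone sketch here shows only the splitting step, using a hypothetical `split_contiguous()` helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/*
 * Group consecutive iovec elements whose buffers are back-to-back in
 * memory. Each group can be written with one pwritev() at one offset.
 * start[k] and count[k] receive the first index and length of group k;
 * the return value is the number of groups.
 */
static int split_contiguous(const struct iovec *iov, int niov,
                            int *start, int *count)
{
    int n = 0;

    for (int i = 0; i < niov; i++) {
        bool adjacent = i > 0 &&
            (uintptr_t)iov[i].iov_base ==
            (uintptr_t)iov[i - 1].iov_base + iov[i - 1].iov_len;

        if (!adjacent) {
            start[n] = i;               /* open a new group at element i */
            count[n] = 0;
            n++;
        }
        count[n - 1]++;
    }
    return n;
}
```

For buffers at `buf`, `buf + 4` and `buf + 12`, each 4 bytes long, the first two elements form one group and the third forms a second group, so two pwritev() calls are needed.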

Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240229153017.2221-21-faro...@suse.de
Signed-off-by: Peter Xu 
---
 include/qemu/bitops.h | 13 +++
 migration/file.h  |  2 ++
 migration/ram.h   |  1 +
 migration/file.c  | 54 +++
 migration/migration.c | 17 ++
 migration/multifd.c   | 24 +--
 migration/options.c   | 13 ++-
 migration/ram.c   | 17 +++---
 8 files changed, 125 insertions(+), 16 deletions(-)

diff --git a/include/qemu/bitops.h b/include/qemu/bitops.h
index cb3526d1f4..2c0a2fe751 100644
--- a/include/qemu/bitops.h
+++ b/include/qemu/bitops.h
@@ -67,6 +67,19 @@ static inline void clear_bit(long nr, unsigned long *addr)
 *p &= ~mask;
 }
 
+/**
+ * clear_bit_atomic - Clears a bit in memory atomically
+ * @nr: Bit to clear
+ * @addr: Address to start counting from
+ */
+static inline void clear_bit_atomic(long nr, unsigned long *addr)
+{
+unsigned long mask = BIT_MASK(nr);
+unsigned long *p = addr + BIT_WORD(nr);
+
+qatomic_and(p, ~mask);
+}
+
 /**
  * change_bit - Toggle a bit in memory
  * @nr: Bit to change
diff --git a/migration/file.h b/migration/file.h
index 4577f9efdd..01a338cac7 100644
--- a/migration/file.h
+++ b/migration/file.h
@@ -19,4 +19,6 @@ void file_start_outgoing_migration(MigrationState *s,
 int file_parse_offset(char *filespec, uint64_t *offsetp, Error **errp);
 void file_cleanup_outgoing_migration(void);
 bool file_send_channel_create(gpointer opaque, Error **errp);
+int file_write_ramblock_iov(QIOChannel *ioc, const struct iovec *iov,
+int niov, RAMBlock *block, Error **errp);
 #endif
diff --git a/migration/ram.h b/migration/ram.h
index 9b937a446b..b9ac0da587 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -75,6 +75,7 @@ bool ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb, Error **errp);
 bool ramblock_page_is_discarded(RAMBlock *rb, ram_addr_t start);
 void postcopy_preempt_shutdown_file(MigrationState *s);
 void *postcopy_preempt_thread(void *opaque);
+void ramblock_set_file_bmap_atomic(RAMBlock *block, ram_addr_t offset);
 
 /* ram cache */
 int colo_init_ram_cache(void);
diff --git a/migration/file.c b/migration/file.c
index 2f8b626b27..d949a941d0 100644
--- a/migration/file.c
+++ b/migration/file.c
@@ -150,3 +150,57 @@ void file_start_incoming_migration(FileMigrationArgs *file_args, Error **errp)
 }
 } while (++i < channels);
 }
+
+int file_write_ramblock_iov(QIOChannel *ioc, const struct iovec *iov,
+int niov, RAMBlock *block, Error **errp)
+{
+ssize_t ret = -1;
+int i, slice_idx, slice_num;
+uintptr_t base, next, offset;
+size_t len;
+
+slice_idx = 0;
+slice_num = 1;
+
+/*
+ * If the iov array doesn't have contiguous elements, we need to
+ * split it in slices because we only have one file offset for the
+ * whole iov. Do this here so callers don't need to break the iov
+ * array themselves.
+ */
+for (i = 0; i < niov; i++, slice_num++) {
+base = (uintptr_t) iov[i].iov_base;
+
+if (i != niov - 1) {
+len = iov[i].iov_len;
+next = (uintptr_t) iov[i + 1].iov_base;
+
+if (base + len == next) {
+continue;
+}
+}
+
+/*
+ * Use the offset of the first element of the segment that
+ * we're sending.
+ */
+offset = (uintptr_t) iov[slice_idx].iov_base - (uintptr_t) block->host;
+if (offset >= block->used_length) {
+error_setg(errp, "offset " RAM_ADDR_FMT
+   " outside of ramblock %s range", offset, block->idstr);
+ret = -1;
+break;
+}
+
+ret = qio_channel_pwritev(ioc, &iov[slice_idx], slice_num,
+  block->pages_offset + offset, errp);
+if (ret < 0) {
+break;
+}
+
+slice_idx += slice_num;
+slice_num = 0;
+}
+
+return (ret < 0) ? ret : 0;
+}
diff --git a/migration/migration.c b/migration/migration.c

[PULL 25/27] migration/multifd: Add mapped-ram support to fd: URI

2024-03-03 Thread peterx

From: Fabiano Rosas 

If we receive a file descriptor that points to a regular file, there's
nothing stopping us from doing multifd migration with mapped-ram to
that file.

Enable the fd: URI to work with multifd + mapped-ram.

Note that the fds passed into multifd are duplicated because we want
to avoid cross-thread effects when doing cleanup (i.e. close(fd)). The
original fd doesn't need to be duplicated because monitor_get_fd()
transfers ownership to the caller.
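The commit relies on the received descriptor pointing to a regular file. This excerpt does not show how, or whether, QEMU verifies that. Purely as an illustration, a POSIX check with fstat() could look like the following, where `fd_is_regular_file()` is a hypothetical helper.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* fstat(), mkstemp(), pipe() */
#endif
#include <assert.h>              /* assert() in the usage example */
#include <stdbool.h>
#include <stdlib.h>              /* mkstemp() in the usage example */
#include <sys/stat.h>
#include <unistd.h>

/* True if fd is open and refers to a regular file (not a pipe or socket). */
static bool fd_is_regular_file(int fd)
{
    struct stat st;

    return fstat(fd, &st) == 0 && S_ISREG(st.st_mode);
}
```

A descriptor from mkstemp() passes this check, while the read end of a pipe() and an invalid descriptor such as -1 both fail it.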

Signed-off-by: Fabiano Rosas 
Reviewed-by: Peter Xu 
Link: https://lore.kernel.org/r/20240229153017.2221-23-faro...@suse.de
Signed-off-by: Peter Xu 
---
 migration/fd.h|  2 ++
 migration/fd.c| 44 +++
 migration/file.c  | 18 --
 migration/migration.c |  4 
 migration/multifd.c   |  2 ++
 5 files changed, 64 insertions(+), 6 deletions(-)

diff --git a/migration/fd.h b/migration/fd.h
index b901bc014e..0c0a18d9e7 100644
--- a/migration/fd.h
+++ b/migration/fd.h
@@ -20,4 +20,6 @@ void fd_start_incoming_migration(const char *fdname, Error **errp);
 
 void fd_start_outgoing_migration(MigrationState *s, const char *fdname,
  Error **errp);
+void fd_cleanup_outgoing_migration(void);
+int fd_args_get_fd(void);
 #endif
diff --git a/migration/fd.c b/migration/fd.c
index 0eb677dcae..d4ae72d132 100644
--- a/migration/fd.c
+++ b/migration/fd.c
@@ -15,18 +15,41 @@
  */
 
 #include "qemu/osdep.h"
+#include "qapi/error.h"
 #include "channel.h"
 #include "fd.h"
 #include "migration.h"
 #include "monitor/monitor.h"
+#include "io/channel-file.h"
 #include "io/channel-util.h"
+#include "options.h"
 #include "trace.h"
 
 
+static struct FdOutgoingArgs {
+int fd;
+} outgoing_args;
+
+int fd_args_get_fd(void)
+{
+return outgoing_args.fd;
+}
+
+void fd_cleanup_outgoing_migration(void)
+{
+if (outgoing_args.fd > 0) {
+close(outgoing_args.fd);
+outgoing_args.fd = -1;
+}
+}
+
 void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **errp)
 {
 QIOChannel *ioc;
 int fd = monitor_get_fd(monitor_cur(), fdname, errp);
+
+outgoing_args.fd = -1;
+
 if (fd == -1) {
 return;
 }
@@ -38,6 +61,8 @@ void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **
 return;
 }
 
+outgoing_args.fd = fd;
+
 qio_channel_set_name(ioc, "migration-fd-outgoing");
 migration_channel_connect(s, ioc, NULL, NULL);
 object_unref(OBJECT(ioc));
@@ -73,4 +98,23 @@ void fd_start_incoming_migration(const char *fdname, Error **errp)
fd_accept_incoming_migration,
NULL, NULL,
g_main_context_get_thread_default());
+
+if (migrate_multifd()) {
+int channels = migrate_multifd_channels();
+
+while (channels--) {
+ioc = QIO_CHANNEL(qio_channel_file_new_fd(dup(fd)));
+
+if (QIO_CHANNEL_FILE(ioc)->fd == -1) {
+error_setg(errp, "Failed to duplicate fd %d", fd);
+return;
+}
+
+qio_channel_set_name(ioc, "migration-fd-incoming");
+qio_channel_add_watch_full(ioc, G_IO_IN,
+   fd_accept_incoming_migration,
+   NULL, NULL,
+   g_main_context_get_thread_default());
+}
+}
 }
diff --git a/migration/file.c b/migration/file.c
index 499d2782fe..164b079966 100644
--- a/migration/file.c
+++ b/migration/file.c
@@ -11,6 +11,7 @@
 #include "qemu/error-report.h"
 #include "qapi/error.h"
 #include "channel.h"
+#include "fd.h"
 #include "file.h"
 #include "migration.h"
 #include "io/channel-file.h"
@@ -53,15 +54,20 @@ bool file_send_channel_create(gpointer opaque, Error **errp)
 {
 QIOChannelFile *ioc;
 int flags = O_WRONLY;
-bool ret = true;
-
-ioc = qio_channel_file_new_path(outgoing_args.fname, flags, 0, errp);
-if (!ioc) {
-ret = false;
-goto out;
+bool ret = false;
+int fd = fd_args_get_fd();
+
+if (fd && fd != -1) {
+ioc = qio_channel_file_new_fd(dup(fd));
+} else {
+ioc = qio_channel_file_new_path(outgoing_args.fname, flags, 0, errp);
+if (!ioc) {
+goto out;
+}
 }
 
 multifd_channel_connect(opaque, QIO_CHANNEL(ioc));
+ret = true;
 
 out:
 /*
diff --git a/migration/migration.c b/migration/migration.c
index b9baab543a..a49fcd53ee 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -140,6 +140,10 @@ static bool transport_supports_multi_channels(MigrationAddress *addr)
 if (addr->transport == MIGRATION_ADDRESS_TYPE_SOCKET) {
 SocketAddress *saddr = &addr->u.socket;
 
+if (saddr->type == SOCKET_ADDRESS_TYPE_FD) {
+return migrate_mapped_ram();
+}
+
 return (saddr->type == SOCKET_ADDRESS_TYPE_INET ||

[PULL 10/27] migration/ram: Introduce 'mapped-ram' migration capability

2024-03-03 Thread peterx

From: Fabiano Rosas 

Add a new migration capability 'mapped-ram'.

The core of the feature is to ensure that RAM pages are mapped
directly to offsets in the resulting migration file instead of being
streamed at arbitrary points.

The reasons why we'd want such behavior are:

 - The resulting file will have a bounded size, since pages which are
   dirtied multiple times will always go to a fixed location in the
   file, rather than constantly being added to a sequential
   stream. This eliminates cases where a VM with, say, 1G of RAM can
   result in a migration file that's 10s of GBs, provided that the
   workload constantly redirties memory.

 - It paves the way to implement O_DIRECT-enabled save/restore of the
   migration stream as the pages are ensured to be written at aligned
   offsets.

 - It allows the usage of multifd so we can write RAM pages to the
   migration file in parallel.
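The bounded-size argument in the first point is plain arithmetic. The toy model below is worst-case and hypothetical: it counts page data only, ignores headers, and assumes every page is redirtied and resent on every pass.

```c
#include <assert.h>
#include <stdint.h>

/* Streaming format: every resend of a page is appended to the file. */
static uint64_t streamed_bytes(uint64_t num_pages, uint64_t page_size,
                               uint64_t passes)
{
    return num_pages * page_size * passes;
}

/* mapped-ram: a resend overwrites the page's fixed slot in place. */
static uint64_t mapped_bytes(uint64_t num_pages, uint64_t page_size,
                             uint64_t passes)
{
    (void)passes;               /* the number of passes does not matter */
    return num_pages * page_size;
}
```

With 4 KiB pages, a 1 GiB guest (262144 pages) resent over 10 passes produces 10 GiB of appended page data in the streaming format, compared with 1 GiB of fixed slots with mapped-ram.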

For now, enabling the capability has no effect. The next couple of
patches implement the core functionality.

Acked-by: Markus Armbruster 
Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240229153017.2221-8-faro...@suse.de
Signed-off-by: Peter Xu 
---
 docs/devel/migration/features.rst   |   1 +
 docs/devel/migration/mapped-ram.rst | 138 
 qapi/migration.json |   6 +-
 migration/options.h |   1 +
 migration/migration.c   |   7 ++
 migration/options.c |  34 +++
 migration/savevm.c  |   1 +
 7 files changed, 187 insertions(+), 1 deletion(-)
 create mode 100644 docs/devel/migration/mapped-ram.rst

diff --git a/docs/devel/migration/features.rst b/docs/devel/migration/features.rst
index a9acaf618e..9d1abd2587 100644
--- a/docs/devel/migration/features.rst
+++ b/docs/devel/migration/features.rst
@@ -10,3 +10,4 @@ Migration has plenty of features to support different use cases.
dirty-limit
vfio
virtio
+   mapped-ram
diff --git a/docs/devel/migration/mapped-ram.rst b/docs/devel/migration/mapped-ram.rst
new file mode 100644
index 0000000000..fa4cefd9fc
--- /dev/null
+++ b/docs/devel/migration/mapped-ram.rst
@@ -0,0 +1,138 @@
+Mapped-ram
+==========
+
+Mapped-ram is a new stream format for the RAM section designed to
+supplement the existing ``file:`` migration and make it compatible
+with ``multifd``. This enables parallel migration of a guest's RAM to
+a file.
+
+The core of the feature is to ensure that RAM pages are mapped
+directly to offsets in the resulting migration file. This enables the
+``multifd`` threads to write exclusively to those offsets even if the
+guest is constantly dirtying pages (i.e. live migration). Another
+benefit is that the resulting file will have a bounded size, since
+pages which are dirtied multiple times will always go to a fixed
+location in the file, rather than constantly being added to a
+sequential stream. Having the pages at fixed offsets also allows the
+usage of O_DIRECT for save/restore of the migration stream as the
+pages are ensured to be written respecting O_DIRECT alignment
+restrictions (direct-io support not yet implemented).
+
+Usage
+-----
+
+On both source and destination, enable the ``multifd`` and
+``mapped-ram`` capabilities:
+
+``migrate_set_capability multifd on``
+
+``migrate_set_capability mapped-ram on``
+
+Use a ``file:`` URL for migration:
+
+``migrate file:/path/to/migration/file``
+
+Mapped-ram migration is best done non-live, i.e. by stopping the VM on
+the source side before migrating.
+
+Use-cases
+---------
+
+The mapped-ram feature was designed for use cases where the migration
+stream will be directed to a file in the filesystem and not
+immediately restored on the destination VM [#]_. These could be
+thought of as snapshots. We can further categorize them into live and
+non-live.
+
+- Non-live snapshot
+
+If the use case requires a VM to be stopped before taking a snapshot,
+that's the ideal scenario for mapped-ram migration. With no dirty
+pages to track, the migration will write the RAM pages to the disk
+as fast as it can.
+
+Note: if a snapshot is taken of a running VM, but the VM will be
+stopped after the snapshot by the admin, then consider stopping it
+right before the snapshot to take advantage of the performance gains
+mentioned above.
+
+- Live snapshot
+
+If the use case requires that the VM keeps running during and after
+the snapshot operation, then mapped-ram migration can still be used,
+but will be less performant. Other strategies such as
+background-snapshot should be evaluated as well. One benefit of
+mapped-ram in this scenario is portability since background-snapshot
+depends on async dirty tracking (KVM_GET_DIRTY_LOG) which is not
+supported outside of Linux.
+
+.. [#] While this same effect could be obtained with the usage of
+   snapshots or the ``file:`` migration alone, mapped-ram provides
+   a performance increase for VMs with larger RAM sizes (10s

[PULL 17/27] migration/multifd: Allow multifd without packets

2024-03-03 Thread peterx

From: Fabiano Rosas 

To support the upcoming 'mapped-ram' migration stream
format, we cannot use multifd packets because each write into the
ramblock section in the migration file is expected to contain only the
guest pages. They are written at their respective offsets relative to
the ramblock section header.

There is no space for the packet information and the expected gains
from the new approach come partly from being able to write the pages
sequentially without extraneous data in between.

The new format also simply doesn't need the packets and all necessary
information can be taken from the standard migration headers with some
(future) changes to multifd code.

Use the presence of the mapped-ram capability to decide whether to
send packets.

This only moves code under multifd_use_packets(); it has no effect for
now, as mapped-ram cannot yet be enabled with multifd.
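
To make the difference concrete, here is a simplified model of what
goes into the I/O vector in each mode. It is a hypothetical sketch:
struct sketch_pages and the sketch_* functions only loosely mirror
MultiFDPages_t and multifd_send_prepare_iovs(), and the packet header
is treated as an opaque buffer. With packets, slot 0 carries the
header; without them, the vector holds guest pages only, so it can be
written verbatim at the pages' file offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Hypothetical stand-in for the few MultiFDPages_t fields used here. */
struct sketch_pages {
    uint8_t *host;          /* base address of the RAM block */
    const size_t *offset;   /* offset of each queued page in the block */
    size_t num;             /* number of queued pages */
};

/*
 * Append one iovec per queued page starting at slot @n; return the new
 * slot count and store the page payload size in @payload.
 */
static size_t sketch_prepare_iovs(struct iovec *iov, size_t n,
                                  const struct sketch_pages *pages,
                                  size_t page_size, size_t *payload)
{
    assert(page_size > 0);
    for (size_t i = 0; i < pages->num; i++) {
        iov[n].iov_base = pages->host + pages->offset[i];
        iov[n].iov_len = page_size;
        n++;
    }
    *payload = pages->num * page_size;
    return n;
}

/*
 * With packets, slot 0 holds the header and the pages follow. Without
 * packets (mapped-ram), the vector contains guest pages only.
 */
static size_t sketch_build_iov(struct iovec *iov, int use_packets,
                               void *hdr, size_t hdr_len,
                               const struct sketch_pages *pages,
                               size_t page_size, size_t *payload)
{
    size_t n = 0;

    if (use_packets) {
        iov[n].iov_base = hdr;
        iov[n].iov_len = hdr_len;
        n++;
    }
    return sketch_prepare_iovs(iov, n, pages, page_size, payload);
}
```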

Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240229153017.2221-15-faro...@suse.de
Signed-off-by: Peter Xu 
---
 migration/multifd.c | 175 +---
 1 file changed, 114 insertions(+), 61 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index 3a8520097b..8c43424c81 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -92,6 +92,11 @@ struct {
 MultiFDMethods *ops;
 } *multifd_recv_state;
 
+static bool multifd_use_packets(void)
+{
+return !migrate_mapped_ram();
+}
+
 /* Multifd without compression */
 
 /**
@@ -122,6 +127,19 @@ static void nocomp_send_cleanup(MultiFDSendParams *p, Error **errp)
 return;
 }
 
+static void multifd_send_prepare_iovs(MultiFDSendParams *p)
+{
+MultiFDPages_t *pages = p->pages;
+
+for (int i = 0; i < pages->num; i++) {
+p->iov[p->iovs_num].iov_base = pages->block->host + pages->offset[i];
+p->iov[p->iovs_num].iov_len = p->page_size;
+p->iovs_num++;
+}
+
+p->next_packet_size = pages->num * p->page_size;
+}
+
 /**
  * nocomp_send_prepare: prepare date to be able to send
  *
@@ -136,9 +154,13 @@ static void nocomp_send_cleanup(MultiFDSendParams *p, Error **errp)
 static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp)
 {
 bool use_zero_copy_send = migrate_zero_copy_send();
-MultiFDPages_t *pages = p->pages;
 int ret;
 
+if (!multifd_use_packets()) {
+multifd_send_prepare_iovs(p);
+return 0;
+}
+
 if (!use_zero_copy_send) {
 /*
  * Only !zerocopy needs the header in IOV; zerocopy will
@@ -147,13 +169,7 @@ static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp)
 multifd_send_prepare_header(p);
 }
 
-for (int i = 0; i < pages->num; i++) {
-p->iov[p->iovs_num].iov_base = pages->block->host + pages->offset[i];
-p->iov[p->iovs_num].iov_len = p->page_size;
-p->iovs_num++;
-}
-
-p->next_packet_size = pages->num * p->page_size;
+multifd_send_prepare_iovs(p);
 p->flags |= MULTIFD_FLAG_NOCOMP;
 
 multifd_send_fill_packet(p);
@@ -208,7 +224,13 @@ static void nocomp_recv_cleanup(MultiFDRecvParams *p)
  */
 static int nocomp_recv(MultiFDRecvParams *p, Error **errp)
 {
-uint32_t flags = p->flags & MULTIFD_FLAG_COMPRESSION_MASK;
+uint32_t flags;
+
+if (!multifd_use_packets()) {
+return 0;
+}
+
+flags = p->flags & MULTIFD_FLAG_COMPRESSION_MASK;
 
 if (flags != MULTIFD_FLAG_NOCOMP) {
 error_setg(errp, "multifd %u: flags received %x flags expected %x",
@@ -795,15 +817,18 @@ static void *multifd_send_thread(void *opaque)
 MigrationThread *thread = NULL;
 Error *local_err = NULL;
 int ret = 0;
+bool use_packets = multifd_use_packets();
 
 thread = migration_threads_add(p->name, qemu_get_thread_id());
 
 trace_multifd_send_thread_start(p->id);
 rcu_register_thread();
 
-if (multifd_send_initial_packet(p, &local_err) < 0) {
-ret = -1;
-goto out;
+if (use_packets) {
+if (multifd_send_initial_packet(p, &local_err) < 0) {
+ret = -1;
+goto out;
+}
 }
 
 while (true) {
@@ -854,16 +879,20 @@ static void *multifd_send_thread(void *opaque)
  * it doesn't require explicit memory barriers.
  */
 assert(qatomic_read(&p->pending_sync));
-p->flags = MULTIFD_FLAG_SYNC;
-multifd_send_fill_packet(p);
-ret = qio_channel_write_all(p->c, (void *)p->packet,
-p->packet_len, &local_err);
-if (ret != 0) {
-break;
+
+if (use_packets) {
+p->flags = MULTIFD_FLAG_SYNC;
+multifd_send_fill_packet(p);
+ret = qio_channel_write_all(p->c, (void *)p->packet,
+p->packet_len, &local_err);
+if (ret != 0) {
+break;
+}
+/* p->next_packet_size will always

[PULL 20/27] migration/multifd: Add outgoing QIOChannelFile support

2024-03-03 Thread peterx

From: Fabiano Rosas 

Allow multifd to open file-backed channels. This will be used when
enabling the mapped-ram migration stream format which expects a
seekable transport.

The QIOChannel read and write methods will use the preadv/pwritev
versions which don't update the file offset at each call so we can
reuse the fd without re-opening for every channel.
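
The property relied upon here is standard POSIX behaviour: pread() and
pwrite() take an explicit offset and leave the file offset untouched,
even when several descriptors share one open file description through
dup() (preadv()/pwritev() are the vectored equivalents on Linux and the
BSDs). A minimal sketch under that assumption, with hypothetical
sketch_* helpers and arbitrary 128-byte regions per "channel":

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Create an unlinked temporary file; returns an fd or -1. */
static int sketch_open_tmp(void)
{
    char path[] = "/tmp/multifd-file-sketch-XXXXXX";
    int fd = mkstemp(path);

    if (fd >= 0) {
        unlink(path);   /* data stays reachable until the fd is closed */
    }
    return fd;
}

/* pwrite() @len bytes of @fill at @off without moving the file offset. */
static int sketch_channel_write(int fd, off_t off, char fill, size_t len)
{
    char buf[256];

    assert(len <= sizeof(buf));
    memset(buf, fill, len);
    return pwrite(fd, buf, len, off) == (ssize_t)len ? 0 : -1;
}

/*
 * Two "channels" share one open file: @fd and a dup() of it, which share
 * a single file offset. Each writes its own region with pwrite(). Returns
 * the shared file offset afterwards (still 0), or -1 on error.
 */
static off_t sketch_two_channels(int fd)
{
    int fd2 = dup(fd);
    off_t pos = -1;

    if (fd2 < 0) {
        return -1;
    }
    if (sketch_channel_write(fd, 0, 'A', 128) == 0 &&
        sketch_channel_write(fd2, 128, 'B', 128) == 0) {
        pos = lseek(fd2, 0, SEEK_CUR);   /* same value as for @fd */
    }
    close(fd2);
    return pos;
}
```

With plain write() the two descriptors would race on the shared offset;
with pwrite() each channel only needs to know where its data belongs.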

Contrary to the socket migration, the file migration doesn't need an
asynchronous channel creation process, so expose
multifd_channel_connect() and call it directly.

Note that this is just setup code and multifd cannot yet make use of
the file channels.

Signed-off-by: Fabiano Rosas 
Reviewed-by: Peter Xu 
Link: https://lore.kernel.org/r/20240229153017.2221-18-faro...@suse.de
Signed-off-by: Peter Xu 
---
 migration/file.h    |  4 ++++
 migration/multifd.h |  1 +
 migration/file.c    | 37 +++++++++++++++++++++++++++++++++++++
 migration/multifd.c | 18 +++++++++++++++---
 4 files changed, 57 insertions(+), 3 deletions(-)

diff --git a/migration/file.h b/migration/file.h
index 37d6a08bfc..4577f9efdd 100644
--- a/migration/file.h
+++ b/migration/file.h
@@ -9,10 +9,14 @@
 #define QEMU_MIGRATION_FILE_H
 
 #include "qapi/qapi-types-migration.h"
+#include "io/task.h"
+#include "channel.h"
 
 void file_start_incoming_migration(FileMigrationArgs *file_args, Error **errp);
 
 void file_start_outgoing_migration(MigrationState *s,
FileMigrationArgs *file_args, Error **errp);
 int file_parse_offset(char *filespec, uint64_t *offsetp, Error **errp);
+void file_cleanup_outgoing_migration(void);
+bool file_send_channel_create(gpointer opaque, Error **errp);
 #endif
diff --git a/migration/multifd.h b/migration/multifd.h
index 1d8bbaf96b..db8887f088 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -227,5 +227,6 @@ static inline void multifd_send_prepare_header(MultiFDSendParams *p)
 p->iovs_num++;
 }
 
+void multifd_channel_connect(MultiFDSendParams *p, QIOChannel *ioc);
 
 #endif
diff --git a/migration/file.c b/migration/file.c
index 22d052a71f..a350dd61f0 100644
--- a/migration/file.c
+++ b/migration/file.c
@@ -12,12 +12,17 @@
 #include "channel.h"
 #include "file.h"
 #include "migration.h"
+#include "multifd.h"
 #include "io/channel-file.h"
 #include "io/channel-util.h"
 #include "trace.h"
 
 #define OFFSET_OPTION ",offset="
 
+static struct FileOutgoingArgs {
+char *fname;
+} outgoing_args;
+
 /* Remove the offset option from @filespec and return it in @offsetp. */
 
 int file_parse_offset(char *filespec, uint64_t *offsetp, Error **errp)
@@ -37,6 +42,36 @@ int file_parse_offset(char *filespec, uint64_t *offsetp, Error **errp)
 return 0;
 }
 
+void file_cleanup_outgoing_migration(void)
+{
+g_free(outgoing_args.fname);
+outgoing_args.fname = NULL;
+}
+
+bool file_send_channel_create(gpointer opaque, Error **errp)
+{
+QIOChannelFile *ioc;
+int flags = O_WRONLY;
+bool ret = true;
+
+ioc = qio_channel_file_new_path(outgoing_args.fname, flags, 0, errp);
+if (!ioc) {
+ret = false;
+goto out;
+}
+
+multifd_channel_connect(opaque, QIO_CHANNEL(ioc));
+
+out:
+/*
+ * File channel creation is synchronous. However posting this
+ * semaphore here is simpler than adding a special case.
+ */
+multifd_send_channel_created();
+
+return ret;
+}
+
 void file_start_outgoing_migration(MigrationState *s,
FileMigrationArgs *file_args, Error **errp)
 {
@@ -53,6 +88,8 @@ void file_start_outgoing_migration(MigrationState *s,
 return;
 }
 
+outgoing_args.fname = g_strdup(filename);
+
 ioc = QIO_CHANNEL(fioc);
 if (offset && qio_channel_io_seek(ioc, offset, SEEK_SET, errp) < 0) {
 return;
diff --git a/migration/multifd.c b/migration/multifd.c
index 3574fd3953..caef1076ca 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -17,6 +17,7 @@
 #include "exec/ramblock.h"
 #include "qemu/error-report.h"
 #include "qapi/error.h"
+#include "file.h"
 #include "ram.h"
 #include "migration.h"
 #include "migration-stats.h"
@@ -28,6 +29,7 @@
 #include "threadinfo.h"
 #include "options.h"
 #include "qemu/yank.h"
+#include "io/channel-file.h"
 #include "io/channel-socket.h"
 #include "yank_functions.h"
 
@@ -694,6 +696,7 @@ static bool multifd_send_cleanup_channel(MultiFDSendParams *p, Error **errp)
 {
 if (p->c) {
 migration_ioc_unregister_yank(p->c);
+qio_channel_close(p->c, &error_abort);
 object_unref(OBJECT(p->c));
 p->c = NULL;
 }
@@ -715,6 +718,7 @@ static bool multifd_send_cleanup_channel(MultiFDSendParams *p, Error **errp)
 
 static void multifd_send_cleanup_state(void)
 {
+file_cleanup_outgoing_migration();
 socket_cleanup_outgoing_migration();
 qemu_sem_destroy(&multifd_send_state->channels_created);
 qemu_sem_destroy(&multifd_send_state->channels_ready);
@@ -977,7 +981,7 @@ static bool multifd_tls_channel_connect(MultiFDSendParams *p,
 return true;

[PULL 26/27] tests/qtest/migration: Add a multifd + mapped-ram migration test

2024-03-03 Thread peterx

From: Fabiano Rosas 

Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240229153017.2221-24-faro...@suse.de
Signed-off-by: Peter Xu 
---
 tests/qtest/migration-test.c | 68 
 1 file changed, 68 insertions(+)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 4c5551f7d0..4023d808f9 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -2248,6 +2248,46 @@ static void test_precopy_file_mapped_ram(void)
 test_file_common(&args, true);
 }
 
+static void *migrate_multifd_mapped_ram_start(QTestState *from, QTestState *to)
+{
+migrate_mapped_ram_start(from, to);
+
+migrate_set_parameter_int(from, "multifd-channels", 4);
+migrate_set_parameter_int(to, "multifd-channels", 4);
+
+migrate_set_capability(from, "multifd", true);
+migrate_set_capability(to, "multifd", true);
+
+return NULL;
+}
+
+static void test_multifd_file_mapped_ram_live(void)
+{
+g_autofree char *uri = g_strdup_printf("file:%s/%s", tmpfs,
+FILE_TEST_FILENAME);
+MigrateCommon args = {
+.connect_uri = uri,
+.listen_uri = "defer",
+.start_hook = migrate_multifd_mapped_ram_start,
+};
+
+test_file_common(&args, false);
+}
+
+static void test_multifd_file_mapped_ram(void)
+{
+g_autofree char *uri = g_strdup_printf("file:%s/%s", tmpfs,
+FILE_TEST_FILENAME);
+MigrateCommon args = {
+.connect_uri = uri,
+.listen_uri = "defer",
+.start_hook = migrate_multifd_mapped_ram_start,
+};
+
+test_file_common(&args, true);
+}
+
+
 static void test_precopy_tcp_plain(void)
 {
 MigrateCommon args = {
@@ -2524,6 +2564,25 @@ static void test_migrate_precopy_fd_file_mapped_ram(void)
 };
 test_file_common(&args, true);
 }
+
+static void *migrate_multifd_fd_mapped_ram_start(QTestState *from,
+QTestState *to)
+{
+migrate_multifd_mapped_ram_start(from, to);
+return migrate_precopy_fd_file_start(from, to);
+}
+
+static void test_multifd_fd_mapped_ram(void)
+{
+MigrateCommon args = {
+.connect_uri = "fd:fd-mig",
+.listen_uri = "defer",
+.start_hook = migrate_multifd_fd_mapped_ram_start,
+.finish_hook = test_migrate_fd_finish_hook
+};
+
+test_file_common(&args, true);
+}
 #endif /* _WIN32 */
 
 static void do_test_validate_uuid(MigrateStart *args, bool should_fail)
@@ -3576,6 +3635,15 @@ int main(int argc, char **argv)
 migration_test_add("/migration/precopy/file/mapped-ram/live",
test_precopy_file_mapped_ram_live);
 
+migration_test_add("/migration/multifd/file/mapped-ram",
+test_multifd_file_mapped_ram);
+migration_test_add("/migration/multifd/file/mapped-ram/live",
+test_multifd_file_mapped_ram_live);
+#ifndef _WIN32
+migration_test_add("/migration/multifd/fd/mapped-ram",
+test_multifd_fd_mapped_ram);
+#endif
+
 #ifdef CONFIG_GNUTLS
 migration_test_add("/migration/precopy/unix/tls/psk",
test_precopy_unix_tls_psk);
-- 
2.44.0

[PULL 19/27] migration/multifd: Add a wrapper for channels_created

2024-03-03 Thread peterx

From: Fabiano Rosas 

We'll need to access multifd_send_state->channels_created from outside
multifd.c, so introduce a helper for that.

Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240229153017.2221-17-faro...@suse.de
Signed-off-by: Peter Xu 
---
 migration/multifd.h | 1 +
 migration/multifd.c | 7 ++++++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index 1be985978e..1d8bbaf96b 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -17,6 +17,7 @@ typedef struct MultiFDRecvData MultiFDRecvData;
 
 bool multifd_send_setup(void);
 void multifd_send_shutdown(void);
+void multifd_send_channel_created(void);
 int multifd_recv_setup(Error **errp);
 void multifd_recv_cleanup(void);
 void multifd_recv_shutdown(void);
diff --git a/migration/multifd.c b/migration/multifd.c
index d470af73ba..3574fd3953 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -101,6 +101,11 @@ static bool multifd_use_packets(void)
 return !migrate_mapped_ram();
 }
 
+void multifd_send_channel_created(void)
+{
+qemu_sem_post(&multifd_send_state->channels_created);
+}
+
 /* Multifd without compression */
 
 /**
@@ -1023,7 +1028,7 @@ out:
  * Here we're not interested whether creation succeeded, only that
  * it happened at all.
  */
-qemu_sem_post(&multifd_send_state->channels_created);
+multifd_send_channel_created();
 
 if (ret) {
 return;
-- 
2.44.0

[PULL 14/27] tests/qtest/migration: Add tests for mapped-ram file-based migration

2024-03-03 Thread peterx

From: Fabiano Rosas 

Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240229153017.2221-12-faro...@suse.de
Signed-off-by: Peter Xu 
---
 tests/qtest/migration-test.c | 59 
 1 file changed, 59 insertions(+)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 8c35f3457b..4c5551f7d0 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -2200,6 +2200,14 @@ static void *test_mode_reboot_start(QTestState *from, QTestState *to)
 return NULL;
 }
 
+static void *migrate_mapped_ram_start(QTestState *from, QTestState *to)
+{
+migrate_set_capability(from, "mapped-ram", true);
+migrate_set_capability(to, "mapped-ram", true);
+
+return NULL;
+}
+
 static void test_mode_reboot(void)
 {
 g_autofree char *uri = g_strdup_printf("file:%s/%s", tmpfs,
@@ -2214,6 +2222,32 @@ static void test_mode_reboot(void)
 test_file_common(&args, true);
 }
 
+static void test_precopy_file_mapped_ram_live(void)
+{
+g_autofree char *uri = g_strdup_printf("file:%s/%s", tmpfs,
+FILE_TEST_FILENAME);
+MigrateCommon args = {
+.connect_uri = uri,
+.listen_uri = "defer",
+.start_hook = migrate_mapped_ram_start,
+};
+
+test_file_common(&args, false);
+}
+
+static void test_precopy_file_mapped_ram(void)
+{
+g_autofree char *uri = g_strdup_printf("file:%s/%s", tmpfs,
+FILE_TEST_FILENAME);
+MigrateCommon args = {
+.connect_uri = uri,
+.listen_uri = "defer",
+.start_hook = migrate_mapped_ram_start,
+};
+
+test_file_common(&args, true);
+}
+
 static void test_precopy_tcp_plain(void)
 {
 MigrateCommon args = {
@@ -2462,6 +2496,13 @@ static void *migrate_precopy_fd_file_start(QTestState *from, QTestState *to)
 return NULL;
 }
 
+static void *migrate_fd_file_mapped_ram_start(QTestState *from, QTestState *to)
+{
+migrate_mapped_ram_start(from, to);
+
+return migrate_precopy_fd_file_start(from, to);
+}
+
 static void test_migrate_precopy_fd_file(void)
 {
 MigrateCommon args = {
@@ -2472,6 +2513,17 @@ static void test_migrate_precopy_fd_file(void)
 };
 test_file_common(&args, true);
 }
+
+static void test_migrate_precopy_fd_file_mapped_ram(void)
+{
+MigrateCommon args = {
+.listen_uri = "defer",
+.connect_uri = "fd:fd-mig",
+.start_hook = migrate_fd_file_mapped_ram_start,
+.finish_hook = test_migrate_fd_finish_hook
+};
+test_file_common(&args, true);
+}
 #endif /* _WIN32 */
 
 static void do_test_validate_uuid(MigrateStart *args, bool should_fail)
@@ -3519,6 +3571,11 @@ int main(int argc, char **argv)
 migration_test_add("/migration/mode/reboot", test_mode_reboot);
 }
 
+migration_test_add("/migration/precopy/file/mapped-ram",
+test_precopy_file_mapped_ram);
+migration_test_add("/migration/precopy/file/mapped-ram/live",
+test_precopy_file_mapped_ram_live);
+
 #ifdef CONFIG_GNUTLS
 migration_test_add("/migration/precopy/unix/tls/psk",
test_precopy_unix_tls_psk);
@@ -3580,6 +3637,8 @@ int main(int argc, char **argv)
test_migrate_precopy_fd_socket);
 migration_test_add("/migration/precopy/fd/file",
test_migrate_precopy_fd_file);
+migration_test_add("/migration/precopy/fd/file/mapped-ram",
+test_migrate_precopy_fd_file_mapped_ram);
 #endif
 migration_test_add("/migration/validate_uuid", test_validate_uuid);
 migration_test_add("/migration/validate_uuid_error",
-- 
2.44.0

[PULL 04/27] migration/multifd: Cleanup multifd_recv_sync_main

2024-03-03 Thread peterx

From: Fabiano Rosas 

Some minor cleanups and documentation for multifd_recv_sync_main.

Use thread_count as done in other parts of the code. Remove p->id from
the multifd_recv_state sync, since that is global and not tied to a
channel. Add documentation for the sync steps.
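
The two sync steps can be modelled with POSIX threads and semaphores.
This is a sketch, not QEMU code: SKETCH_CHANNELS and all sketch_* names
are hypothetical, and each channel thread simply pretends it has just
seen a SYNC packet. Because any channel may post the shared semaphore
first, the index in the first loop only counts iterations, which is why
the trace point now prints "iter" rather than a channel id.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CHANNELS 4

static sem_t sketch_main_sync;                  /* posted once per channel */
static sem_t sketch_chan_sync[SKETCH_CHANNELS]; /* posted by main to release */
static int sketch_released[SKETCH_CHANNELS];

/* Channel thread: signal "SYNC seen", then block until main releases it. */
static void *sketch_channel(void *opaque)
{
    int id = (int)(intptr_t)opaque;

    sem_post(&sketch_main_sync);
    sem_wait(&sketch_chan_sync[id]);
    sketch_released[id] = 1;
    return NULL;
}

/*
 * Main side: wait once per channel, then release every channel.
 * Returns how many channels made it past the sync point.
 */
static int sketch_sync_main(void)
{
    pthread_t th[SKETCH_CHANNELS];
    int released = 0;

    sem_init(&sketch_main_sync, 0, 0);
    for (int i = 0; i < SKETCH_CHANNELS; i++) {
        sem_init(&sketch_chan_sync[i], 0, 0);
        sketch_released[i] = 0;
    }
    for (int i = 0; i < SKETCH_CHANNELS; i++) {
        int rc = pthread_create(&th[i], NULL, sketch_channel,
                                (void *)(intptr_t)i);
        assert(rc == 0);
        (void)rc;
    }

    /* Step 1: i counts iterations; it does not identify who posted. */
    for (int i = 0; i < SKETCH_CHANNELS; i++) {
        sem_wait(&sketch_main_sync);
    }
    /* Step 2: release each channel for the next iteration. */
    for (int i = 0; i < SKETCH_CHANNELS; i++) {
        sem_post(&sketch_chan_sync[i]);
    }

    /* pthread_join() makes each thread's sketch_released[] store visible. */
    for (int i = 0; i < SKETCH_CHANNELS; i++) {
        pthread_join(th[i], NULL);
        released += sketch_released[i];
        sem_destroy(&sketch_chan_sync[i]);
    }
    sem_destroy(&sketch_main_sync);
    return released;
}
```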

Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240229153017.2221-2-faro...@suse.de
Signed-off-by: Peter Xu 
---
 migration/multifd.c    | 17 +++++++++++++----
 migration/trace-events |  2 +-
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index 6c07f19af1..c7389bf833 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -1182,18 +1182,27 @@ void multifd_recv_cleanup(void)
 
 void multifd_recv_sync_main(void)
 {
+int thread_count = migrate_multifd_channels();
 int i;
 
 if (!migrate_multifd()) {
 return;
 }
-for (i = 0; i < migrate_multifd_channels(); i++) {
-MultiFDRecvParams *p = &multifd_recv_state->params[i];
 
-trace_multifd_recv_sync_main_wait(p->id);
+/*
+ * Initiate the synchronization by waiting for all channels.
+ * For socket-based migration this means each channel has received
+ * the SYNC packet on the stream.
+ */
+for (i = 0; i < thread_count; i++) {
+trace_multifd_recv_sync_main_wait(i);
 qemu_sem_wait(&multifd_recv_state->sem_sync);
 }
-for (i = 0; i < migrate_multifd_channels(); i++) {
+
+/*
+ * Sync done. Release the channels for the next iteration.
+ */
+for (i = 0; i < thread_count; i++) {
 MultiFDRecvParams *p = &multifd_recv_state->params[i];
 
 WITH_QEMU_LOCK_GUARD(&p->mutex) {
diff --git a/migration/trace-events b/migration/trace-events
index 298ad2b0dd..bf1a069632 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -132,7 +132,7 @@ multifd_recv(uint8_t id, uint64_t packet_num, uint32_t used, uint32_t flags, uin
 multifd_recv_new_channel(uint8_t id) "channel %u"
 multifd_recv_sync_main(long packet_num) "packet num %ld"
 multifd_recv_sync_main_signal(uint8_t id) "channel %u"
-multifd_recv_sync_main_wait(uint8_t id) "channel %u"
+multifd_recv_sync_main_wait(uint8_t id) "iter %u"
 multifd_recv_terminate_threads(bool error) "error %d"
 multifd_recv_thread_end(uint8_t id, uint64_t packets, uint64_t pages) "channel %u packets %" PRIu64 " pages %" PRIu64
 multifd_recv_thread_start(uint8_t id) "%u"
-- 
2.44.0

[PULL 15/27] migration/multifd: Rename MultiFDSend|RecvParams::data to compress_data

2024-03-03 Thread peterx

From: Fabiano Rosas 

Use a more specific name for the compression data so we can use the
generic name for the multifd core code.

Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240229153017.2221-13-faro...@suse.de
Signed-off-by: Peter Xu 
---
 migration/multifd.h      |  4 ++--
 migration/multifd-zlib.c | 20 ++++++++++----------
 migration/multifd-zstd.c | 20 ++++++++++----------
 3 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index b3fe27ae93..adccd3532f 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -127,7 +127,7 @@ typedef struct {
 /* number of iovs used */
 uint32_t iovs_num;
 /* used for compression methods */
-void *data;
+void *compress_data;
 }  MultiFDSendParams;
 
 typedef struct {
@@ -183,7 +183,7 @@ typedef struct {
 /* num of non zero pages */
 uint32_t normal_num;
 /* used for de-compression methods */
-void *data;
+void *compress_data;
 } MultiFDRecvParams;
 
 typedef struct {
diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
index 012e3bdea1..2a8f5fc9a6 100644
--- a/migration/multifd-zlib.c
+++ b/migration/multifd-zlib.c
@@ -69,7 +69,7 @@ static int zlib_send_setup(MultiFDSendParams *p, Error **errp)
 err_msg = "out of memory for buf";
 goto err_free_zbuff;
 }
-p->data = z;
+p->compress_data = z;
 return 0;
 
 err_free_zbuff:
@@ -92,15 +92,15 @@ err_free_z:
  */
 static void zlib_send_cleanup(MultiFDSendParams *p, Error **errp)
 {
-struct zlib_data *z = p->data;
+struct zlib_data *z = p->compress_data;
 
 deflateEnd(&z->zs);
 g_free(z->zbuff);
 z->zbuff = NULL;
 g_free(z->buf);
 z->buf = NULL;
-g_free(p->data);
-p->data = NULL;
+g_free(p->compress_data);
+p->compress_data = NULL;
 }
 
 /**
@@ -117,7 +117,7 @@ static void zlib_send_cleanup(MultiFDSendParams *p, Error **errp)
 static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
 {
 MultiFDPages_t *pages = p->pages;
-struct zlib_data *z = p->data;
+struct zlib_data *z = p->compress_data;
 z_stream *zs = &z->zs;
 uint32_t out_size = 0;
 int ret;
@@ -194,7 +194,7 @@ static int zlib_recv_setup(MultiFDRecvParams *p, Error **errp)
 struct zlib_data *z = g_new0(struct zlib_data, 1);
 z_stream *zs = &z->zs;
 
-p->data = z;
+p->compress_data = z;
 zs->zalloc = Z_NULL;
 zs->zfree = Z_NULL;
 zs->opaque = Z_NULL;
@@ -224,13 +224,13 @@ static int zlib_recv_setup(MultiFDRecvParams *p, Error **errp)
  */
 static void zlib_recv_cleanup(MultiFDRecvParams *p)
 {
-struct zlib_data *z = p->data;
+struct zlib_data *z = p->compress_data;
 
 inflateEnd(&z->zs);
 g_free(z->zbuff);
 z->zbuff = NULL;
-g_free(p->data);
-p->data = NULL;
+g_free(p->compress_data);
+p->compress_data = NULL;
 }
 
 /**
@@ -246,7 +246,7 @@ static void zlib_recv_cleanup(MultiFDRecvParams *p)
  */
 static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp)
 {
-struct zlib_data *z = p->data;
+struct zlib_data *z = p->compress_data;
 z_stream *zs = &z->zs;
 uint32_t in_size = p->next_packet_size;
 /* we measure the change of total_out */
diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c
index dc8fe43e94..593cf290ad 100644
--- a/migration/multifd-zstd.c
+++ b/migration/multifd-zstd.c
@@ -52,7 +52,7 @@ static int zstd_send_setup(MultiFDSendParams *p, Error **errp)
 struct zstd_data *z = g_new0(struct zstd_data, 1);
 int res;
 
-p->data = z;
+p->compress_data = z;
 z->zcs = ZSTD_createCStream();
 if (!z->zcs) {
 g_free(z);
@@ -90,14 +90,14 @@ static int zstd_send_setup(MultiFDSendParams *p, Error **errp)
  */
 static void zstd_send_cleanup(MultiFDSendParams *p, Error **errp)
 {
-struct zstd_data *z = p->data;
+struct zstd_data *z = p->compress_data;
 
 ZSTD_freeCStream(z->zcs);
 z->zcs = NULL;
 g_free(z->zbuff);
 z->zbuff = NULL;
-g_free(p->data);
-p->data = NULL;
+g_free(p->compress_data);
+p->compress_data = NULL;
 }
 
 /**
@@ -114,7 +114,7 @@ static void zstd_send_cleanup(MultiFDSendParams *p, Error **errp)
 static int zstd_send_prepare(MultiFDSendParams *p, Error **errp)
 {
 MultiFDPages_t *pages = p->pages;
-struct zstd_data *z = p->data;
+struct zstd_data *z = p->compress_data;
 int ret;
 uint32_t i;
 
@@ -183,7 +183,7 @@ static int zstd_recv_setup(MultiFDRecvParams *p, Error **errp)
 struct zstd_data *z = g_new0(struct zstd_data, 1);
 int ret;
 
-p->data = z;
+p->compress_data = z;
 z->zds = ZSTD_createDStream();
 if (!z->zds) {
 g_free(z);
@@ -221,14 +221,14 @@ static int zstd_recv_setup(MultiFDRecvParams *p, Error **errp)
  */
 static void zstd_recv_cleanup(MultiFDRecvParams *p)
 {
-struct zstd_data *z = p->data;
+struct zstd_data *z = p->compress_data;
 
 ZSTD_freeDStream(z->zds);

[PULL 16/27] migration/multifd: Decouple recv method from pages

2024-03-03 Thread peterx

From: Fabiano Rosas 

Next patches will abstract the type of data being received by the
channels, so do some cleanup now to remove references to pages and
dependency on 'normal_num'.

Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240229153017.2221-14-faro...@suse.de
Signed-off-by: Peter Xu 
---
 migration/multifd.h      |  4 ++--
 migration/multifd-zlib.c |  6 +++---
 migration/multifd-zstd.c |  6 +++---
 migration/multifd.c      | 13 ++++++++-----
 4 files changed, 16 insertions(+), 13 deletions(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index adccd3532f..6a54377cc1 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -197,8 +197,8 @@ typedef struct {
 int (*recv_setup)(MultiFDRecvParams *p, Error **errp);
 /* Cleanup for receiving side */
 void (*recv_cleanup)(MultiFDRecvParams *p);
-/* Read all pages */
-int (*recv_pages)(MultiFDRecvParams *p, Error **errp);
+/* Read all data */
+int (*recv)(MultiFDRecvParams *p, Error **errp);
 } MultiFDMethods;
 
 void multifd_register_ops(int method, MultiFDMethods *ops);
diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
index 2a8f5fc9a6..6120faad65 100644
--- a/migration/multifd-zlib.c
+++ b/migration/multifd-zlib.c
@@ -234,7 +234,7 @@ static void zlib_recv_cleanup(MultiFDRecvParams *p)
 }
 
 /**
- * zlib_recv_pages: read the data from the channel into actual pages
+ * zlib_recv: read the data from the channel into actual pages
  *
  * Read the compressed buffer, and uncompress it into the actual
  * pages.
@@ -244,7 +244,7 @@ static void zlib_recv_cleanup(MultiFDRecvParams *p)
  * @p: Params for the channel that we are using
  * @errp: pointer to an error
  */
-static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp)
+static int zlib_recv(MultiFDRecvParams *p, Error **errp)
 {
 struct zlib_data *z = p->compress_data;
 z_stream *zs = &z->zs;
@@ -319,7 +319,7 @@ static MultiFDMethods multifd_zlib_ops = {
 .send_prepare = zlib_send_prepare,
 .recv_setup = zlib_recv_setup,
 .recv_cleanup = zlib_recv_cleanup,
-.recv_pages = zlib_recv_pages
+.recv = zlib_recv
 };
 
 static void multifd_zlib_register(void)
diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c
index 593cf290ad..cac236833d 100644
--- a/migration/multifd-zstd.c
+++ b/migration/multifd-zstd.c
@@ -232,7 +232,7 @@ static void zstd_recv_cleanup(MultiFDRecvParams *p)
 }
 
 /**
- * zstd_recv_pages: read the data from the channel into actual pages
+ * zstd_recv: read the data from the channel into actual pages
  *
  * Read the compressed buffer, and uncompress it into the actual
  * pages.
@@ -242,7 +242,7 @@ static void zstd_recv_cleanup(MultiFDRecvParams *p)
  * @p: Params for the channel that we are using
  * @errp: pointer to an error
  */
-static int zstd_recv_pages(MultiFDRecvParams *p, Error **errp)
+static int zstd_recv(MultiFDRecvParams *p, Error **errp)
 {
 uint32_t in_size = p->next_packet_size;
 uint32_t out_size = 0;
@@ -310,7 +310,7 @@ static MultiFDMethods multifd_zstd_ops = {
 .send_prepare = zstd_send_prepare,
 .recv_setup = zstd_recv_setup,
 .recv_cleanup = zstd_recv_cleanup,
-.recv_pages = zstd_recv_pages
+.recv = zstd_recv
 };
 
 static void multifd_zstd_register(void)
diff --git a/migration/multifd.c b/migration/multifd.c
index c7389bf833..3a8520097b 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -197,7 +197,7 @@ static void nocomp_recv_cleanup(MultiFDRecvParams *p)
 }
 
 /**
- * nocomp_recv_pages: read the data from the channel into actual pages
+ * nocomp_recv: read the data from the channel
  *
  * For no compression we just need to read things into the correct place.
  *
@@ -206,7 +206,7 @@ static void nocomp_recv_cleanup(MultiFDRecvParams *p)
  * @p: Params for the channel that we are using
  * @errp: pointer to an error
  */
-static int nocomp_recv_pages(MultiFDRecvParams *p, Error **errp)
+static int nocomp_recv(MultiFDRecvParams *p, Error **errp)
 {
 uint32_t flags = p->flags & MULTIFD_FLAG_COMPRESSION_MASK;
 
@@ -228,7 +228,7 @@ static MultiFDMethods multifd_nocomp_ops = {
 .send_prepare = nocomp_send_prepare,
 .recv_setup = nocomp_recv_setup,
 .recv_cleanup = nocomp_recv_cleanup,
-.recv_pages = nocomp_recv_pages
+.recv = nocomp_recv
 };
 
 static MultiFDMethods *multifd_ops[MULTIFD_COMPRESSION__MAX] = {
@@ -1227,6 +1227,8 @@ static void *multifd_recv_thread(void *opaque)
 
 while (true) {
 uint32_t flags;
+bool has_data = false;
+p->normal_num = 0;
 
 if (multifd_recv_should_exit()) {
 break;
@@ -1248,10 +1250,11 @@ static void *multifd_recv_thread(void *opaque)
 flags = p->flags;
 /* recv methods don't know how to handle the SYNC flag */
 p->flags &= ~MULTIFD_FLAG_SYNC;
+has_data = !!p->normal_num;
 qemu_mutex_unlock(&p->mutex);
 
-if (p->normal_num) {
-ret =

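The hunks above rename the per-compression receive hook from `.recv_pages` to `.recv` in each `MultiFDMethods` table, so the hook is no longer tied to pages. The C below is a rough sketch of the dispatch pattern involved: a table of method structs indexed by a compression enum, similar to `multifd_ops[]`. `Methods`, `Params`, `COMP_*`, and `channel_recv` are hypothetical names, and errors go to a string buffer instead of QEMU's `Error **`. It is not QEMU's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for MultiFDRecvParams and the compression enum. */
typedef struct {
    unsigned normal_num;   /* pages announced by the packet header */
    unsigned received;     /* what the recv hook handled */
} Params;

typedef enum { COMP_NONE, COMP_ZLIB, COMP__MAX } Compression;

typedef struct {
    /* Generic receive hook: reads whatever the channel carries. */
    int (*recv)(Params *p, char *err, size_t errlen);
} Methods;

static int nocomp_recv(Params *p, char *err, size_t errlen)
{
    (void)err;
    (void)errlen;
    p->received = p->normal_num;   /* "read things into the correct place" */
    return 0;
}

static int zlib_recv(Params *p, char *err, size_t errlen)
{
    (void)p;
    snprintf(err, errlen, "zlib is not implemented in this sketch");
    return -1;
}

static const Methods nocomp_ops = { .recv = nocomp_recv };
static const Methods zlib_ops = { .recv = zlib_recv };

/* One entry per compression method, indexed by the enum value. */
static const Methods *ops_table[COMP__MAX] = {
    [COMP_NONE] = &nocomp_ops,
    [COMP_ZLIB] = &zlib_ops,
};

/* The receive path only knows the selected table, not the method. */
static int channel_recv(Compression c, Params *p, char *err, size_t errlen)
{
    return ops_table[c]->recv(p, err, errlen);
}
```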
[PULL 00/27] Migration next patches

2024-03-03 Thread peterx

From: Peter Xu 

The following changes since commit c0c6a0e3528b88aaad0b9d333e295707a195587b:

  Merge tag 'migration-next-pull-request' of https://gitlab.com/peterx/qemu into staging (2024-02-28 17:27:10 +)

are available in the Git repository at:

  https://gitlab.com/peterx/qemu.git tags/migration-next-pull-request

for you to fetch changes up to 1a6e217c35b6dbab10fdc1e02640b8d60b2dc663:

  migration/multifd: Document two places for mapped-ram (2024-03-04 08:31:11 +0800)


Migration pull request for 20240304

- Bryan's fix on multifd compression level API
- Fabiano's mapped-ram series (base + multifd only)
- Steve's amend on cpr document in qapi/



Bryan Zhang (2):
  migration: Properly apply migration compression level parameters
  tests/migration: Set compression level in migration tests

Fabiano Rosas (20):
  migration/multifd: Cleanup multifd_recv_sync_main
  io: fsync before closing a file channel
  migration/qemu-file: add utility methods for working with seekable
channels
  migration/ram: Introduce 'mapped-ram' migration capability
  migration: Add mapped-ram URI compatibility check
  migration/ram: Add outgoing 'mapped-ram' migration
  migration/ram: Add incoming 'mapped-ram' migration
  tests/qtest/migration: Add tests for mapped-ram file-based migration
  migration/multifd: Rename MultiFDSend|RecvParams::data to
compress_data
  migration/multifd: Decouple recv method from pages
  migration/multifd: Allow multifd without packets
  migration/multifd: Allow receiving pages without packets
  migration/multifd: Add a wrapper for channels_created
  migration/multifd: Add outgoing QIOChannelFile support
  migration/multifd: Add incoming QIOChannelFile support
  migration/multifd: Prepare multifd sync for mapped-ram migration
  migration/multifd: Support outgoing mapped-ram stream format
  migration/multifd: Support incoming mapped-ram stream format
  migration/multifd: Add mapped-ram support to fd: URI
  tests/qtest/migration: Add a multifd + mapped-ram migration test

Nikolay Borisov (3):
  io: add and implement QIO_CHANNEL_FEATURE_SEEKABLE for channel file
  io: Add generic pwritev/preadv interface
  io: implement io_pwritev/preadv for QIOChannelFile

Peter Xu (1):
  migration/multifd: Document two places for mapped-ram

Steve Sistare (1):
  migration: massage cpr-reboot documentation

 docs/devel/migration/features.rst   |   1 +
 docs/devel/migration/mapped-ram.rst | 138 +
 qapi/migration.json |  42 +--
 include/exec/ramblock.h |  13 +
 include/io/channel.h|  83 ++
 include/migration/qemu-file-types.h |   2 +
 include/qemu/bitops.h   |  13 +
 migration/fd.h  |   2 +
 migration/file.h|   8 +
 migration/multifd.h |  27 +-
 migration/options.h |   1 +
 migration/qemu-file.h   |   6 +
 migration/ram.h |   1 +
 io/channel-file.c   |  69 +
 io/channel.c|  58 
 migration/fd.c  |  44 +++
 migration/file.c| 149 +-
 migration/migration.c   |  56 +++-
 migration/multifd-zlib.c|  26 +-
 migration/multifd-zstd.c|  26 +-
 migration/multifd.c | 417 ++--
 migration/options.c |  47 
 migration/qemu-file.c   | 106 +++
 migration/ram.c | 351 +--
 migration/savevm.c  |   1 +
 tests/qtest/migration-test.c| 137 +
 migration/trace-events  |   2 +-
 27 files changed, 1666 insertions(+), 160 deletions(-)
 create mode 100644 docs/devel/migration/mapped-ram.rst

-- 
2.44.0

[PULL 08/27] io: fsync before closing a file channel

2024-03-03 Thread peterx

From: Fabiano Rosas 

Make sure the data is flushed to disk before closing file
channels. This is to ensure data is on disk and not lost in the event
of a host crash.

This is currently being implemented to affect the migration code when
migrating to a file, but all QIOChannelFile users should benefit from
the change.

Reviewed-by: "Daniel P. Berrangé" 
Acked-by: "Daniel P. Berrangé" 
Signed-off-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240229153017.2221-6-faro...@suse.de
Signed-off-by: Peter Xu 
---
 io/channel-file.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/io/channel-file.c b/io/channel-file.c
index a6ad7770c6..d4706fa592 100644
--- a/io/channel-file.c
+++ b/io/channel-file.c
@@ -242,6 +242,11 @@ static int qio_channel_file_close(QIOChannel *ioc,
 {
 QIOChannelFile *fioc = QIO_CHANNEL_FILE(ioc);
 
+if (qemu_fdatasync(fioc->fd) < 0) {
+error_setg_errno(errp, errno,
+ "Unable to synchronize file data with storage device");
+return -1;
+}
 if (qemu_close(fioc->fd) < 0) {
 error_setg_errno(errp, errno,
  "Unable to close file");
-- 
2.44.0
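A minimal POSIX sketch of the ordering this patch introduces, assuming a plain file descriptor: `fdatasync()` first, then `close()`. `close_with_datasync` is a hypothetical helper, not part of QEMU, and it reports errors through a caller-supplied buffer rather than `Error **`. Like the patch, it returns without closing the descriptor when the sync fails.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: flush file data to the storage device, then close.
 * Returns 0 on success, -1 on failure with a message in err.
 * As in the patch, fd is left open if fdatasync() fails.
 */
static int close_with_datasync(int fd, char *err, size_t errlen)
{
    if (fdatasync(fd) < 0) {
        snprintf(err, errlen,
                 "Unable to synchronize file data with storage device: %s",
                 strerror(errno));
        return -1;
    }
    if (close(fd) < 0) {
        snprintf(err, errlen, "Unable to close file: %s", strerror(errno));
        return -1;
    }
    return 0;
}
```

On Linux, `fdatasync()` fails with `EINVAL` when the descriptor refers to a pipe, FIFO, or socket, so with this ordering, closing such a descriptor reports an error.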

[PULL 02/27] migration: Properly apply migration compression level parameters

2024-03-03 Thread peterx

From: Bryan Zhang 

Some glue code was missing, so setting `multifd-zstd-level` or
`multifd-zlib-level` via `qmp_migrate_set_parameters` had no effect.
This commit adds the glue code to fix that.

Signed-off-by: Bryan Zhang 
Link: https://lore.kernel.org/r/20240301035901.4006936-2-bryan.zh...@bytedance.com
Signed-off-by: Peter Xu 
---
 migration/options.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/migration/options.c b/migration/options.c
index 3e3e0b93b4..1cd3cc7c33 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -1312,6 +1312,12 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
 if (params->has_multifd_compression) {
 dest->multifd_compression = params->multifd_compression;
 }
+if (params->has_multifd_zlib_level) {
+dest->multifd_zlib_level = params->multifd_zlib_level;
+}
+if (params->has_multifd_zstd_level) {
+dest->multifd_zstd_level = params->multifd_zstd_level;
+}
 if (params->has_xbzrle_cache_size) {
 dest->xbzrle_cache_size = params->xbzrle_cache_size;
 }
@@ -1447,6 +1453,12 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
 if (params->has_multifd_compression) {
 s->parameters.multifd_compression = params->multifd_compression;
 }
+if (params->has_multifd_zlib_level) {
+s->parameters.multifd_zlib_level = params->multifd_zlib_level;
+}
+if (params->has_multifd_zstd_level) {
+s->parameters.multifd_zstd_level = params->multifd_zstd_level;
+}
 if (params->has_xbzrle_cache_size) {
 s->parameters.xbzrle_cache_size = params->xbzrle_cache_size;
 xbzrle_cache_resize(params->xbzrle_cache_size, errp);
-- 
2.44.0
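The bug pattern here is that each optional QAPI member arrives with a `has_*` flag, and a member without a matching copy branch is accepted but never takes effect. Below is a hypothetical sketch of that glue, using simplified stand-in structs `SetParams` and `Params` rather than QEMU's generated QAPI types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical request: each optional member has a has_* flag. */
typedef struct {
    bool has_zlib_level;
    uint8_t zlib_level;
    bool has_zstd_level;
    uint8_t zstd_level;
} SetParams;

/* Hypothetical current parameter state. */
typedef struct {
    uint8_t zlib_level;
    uint8_t zstd_level;
} Params;

/*
 * Copy only the members the caller supplied. Dropping one of these
 * branches reproduces the bug: the request is accepted, the value ignored.
 */
static void params_apply(const SetParams *req, Params *dest)
{
    if (req->has_zlib_level) {
        dest->zlib_level = req->zlib_level;
    }
    if (req->has_zstd_level) {
        dest->zstd_level = req->zstd_level;
    }
}
```

The patch adds the same branches in two places, `migrate_params_test_apply()` and `migrate_params_apply()`; the sketch models only the apply step.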

[PULL 06/27] io: Add generic pwritev/preadv interface

2024-03-03 Thread peterx

From: Nikolay Borisov 

Introduce basic pwritev/preadv support in the generic channel layer.
A specific implementation for the file channel will follow; it is
required to support migration streams in which each RAM page has a
fixed location.

Signed-off-by: Nikolay Borisov 
Reviewed-by: "Daniel P. Berrangé" 
Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240229153017.2221-4-faro...@suse.de
Signed-off-by: Peter Xu 
---
 include/io/channel.h | 82 
 io/channel.c | 58 +++
 2 files changed, 140 insertions(+)

diff --git a/include/io/channel.h b/include/io/channel.h
index fcb19fd672..7986c49c71 100644
--- a/include/io/channel.h
+++ b/include/io/channel.h
@@ -131,6 +131,16 @@ struct QIOChannelClass {
Error **errp);
 
 /* Optional callbacks */
+ssize_t (*io_pwritev)(QIOChannel *ioc,
+  const struct iovec *iov,
+  size_t niov,
+  off_t offset,
+  Error **errp);
+ssize_t (*io_preadv)(QIOChannel *ioc,
+ const struct iovec *iov,
+ size_t niov,
+ off_t offset,
+ Error **errp);
 int (*io_shutdown)(QIOChannel *ioc,
QIOChannelShutdown how,
Error **errp);
@@ -529,6 +539,78 @@ void qio_channel_set_follow_coroutine_ctx(QIOChannel *ioc, bool enabled);
 int qio_channel_close(QIOChannel *ioc,
   Error **errp);
 
+/**
+ * qio_channel_pwritev
+ * @ioc: the channel object
+ * @iov: the array of memory regions to write data from
+ * @niov: the length of the @iov array
+ * @offset: offset in the channel where writes should begin
+ * @errp: pointer to a NULL-initialized error object
+ *
+ * Not all implementations will support this facility, so may report
+ * an error. To avoid errors, the caller may check for the feature
+ * flag QIO_CHANNEL_FEATURE_SEEKABLE prior to calling this method.
+ *
+ * Behaves as qio_channel_writev_full, apart from not supporting
+ * sending of file handles as well as beginning the write at the
+ * passed @offset
+ *
+ */
+ssize_t qio_channel_pwritev(QIOChannel *ioc, const struct iovec *iov,
+size_t niov, off_t offset, Error **errp);
+
+/**
+ * qio_channel_pwrite
+ * @ioc: the channel object
+ * @buf: the memory region to write data from
+ * @buflen: the length of @buf in bytes
+ * @offset: offset in the channel where writes should begin
+ * @errp: pointer to a NULL-initialized error object
+ *
+ * Not all implementations will support this facility, so may report
+ * an error. To avoid errors, the caller may check for the feature
+ * flag QIO_CHANNEL_FEATURE_SEEKABLE prior to calling this method.
+ *
+ */
+ssize_t qio_channel_pwrite(QIOChannel *ioc, char *buf, size_t buflen,
+   off_t offset, Error **errp);
+
+/**
+ * qio_channel_preadv
+ * @ioc: the channel object
+ * @iov: the array of memory regions to read data into
+ * @niov: the length of the @iov array
+ * @offset: offset in the channel where reads should begin
+ * @errp: pointer to a NULL-initialized error object
+ *
+ * Not all implementations will support this facility, so may report
+ * an error.  To avoid errors, the caller may check for the feature
+ * flag QIO_CHANNEL_FEATURE_SEEKABLE prior to calling this method.
+ *
+ * Behaves as qio_channel_readv_full, apart from not supporting
+ * receiving of file handles as well as beginning the read at the
+ * passed @offset
+ *
+ */
+ssize_t qio_channel_preadv(QIOChannel *ioc, const struct iovec *iov,
+   size_t niov, off_t offset, Error **errp);
+
+/**
+ * qio_channel_pread
+ * @ioc: the channel object
+ * @buf: the memory region to read data into
+ * @buflen: the length of @buf in bytes
+ * @offset: offset in the channel where reads should begin
+ * @errp: pointer to a NULL-initialized error object
+ *
+ * Not all implementations will support this facility, so may report
+ * an error.  To avoid errors, the caller may check for the feature
+ * flag QIO_CHANNEL_FEATURE_SEEKABLE prior to calling this method.
+ *
+ */
+ssize_t qio_channel_pread(QIOChannel *ioc, char *buf, size_t buflen,
+  off_t offset, Error **errp);
+
 /**
  * qio_channel_shutdown:
  * @ioc: the channel object
diff --git a/io/channel.c b/io/channel.c
index 86c5834510..a1f12f8e90 100644
--- a/io/channel.c
+++ b/io/channel.c
@@ -454,6 +454,64 @@ GSource *qio_channel_add_watch_source(QIOChannel *ioc,
 }
 
 
+ssize_t qio_channel_pwritev(QIOChannel *ioc, const struct iovec *iov,
+size_t niov, off_t offset, Error **errp)
+{
+QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc);
+
+if (!klass->io_pwritev) {
+error_setg(errp, "Channel does not support pwritev");
+

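The generic wrappers treat `io_pwritev`/`io_preadv` as optional class callbacks and fail with an error when a channel class does not provide them. The sketch below illustrates that dispatch with a hypothetical in-memory channel; `Channel`, `ChannelClass`, `channel_pwritev`, and `mem_pwritev` are illustrative names, and errors go to a string buffer instead of `Error **`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

typedef struct Channel Channel;

/* Hypothetical class vtable: positional writes are an optional callback. */
typedef struct {
    ssize_t (*io_pwritev)(Channel *c, const struct iovec *iov, size_t niov,
                          off_t offset, char *err, size_t errlen);
} ChannelClass;

struct Channel {
    const ChannelClass *klass;
    unsigned char storage[64];   /* backing store for the in-memory class */
};

/* Generic entry point: report an error if the class lacks the callback. */
static ssize_t channel_pwritev(Channel *c, const struct iovec *iov,
                               size_t niov, off_t offset,
                               char *err, size_t errlen)
{
    if (!c->klass->io_pwritev) {
        snprintf(err, errlen, "Channel does not support pwritev");
        return -1;
    }
    return c->klass->io_pwritev(c, iov, niov, offset, err, errlen);
}

/* In-memory implementation: gather the iovecs into storage at offset. */
static ssize_t mem_pwritev(Channel *c, const struct iovec *iov, size_t niov,
                           off_t offset, char *err, size_t errlen)
{
    if (offset < 0 || (size_t)offset > sizeof(c->storage)) {
        snprintf(err, errlen, "offset out of range");
        return -1;
    }
    size_t pos = (size_t)offset;
    for (size_t i = 0; i < niov; i++) {
        if (iov[i].iov_len > sizeof(c->storage) - pos) {
            snprintf(err, errlen, "write past end of channel");
            return -1;
        }
        memcpy(c->storage + pos, iov[i].iov_base, iov[i].iov_len);
        pos += iov[i].iov_len;
    }
    return (ssize_t)(pos - (size_t)offset);
}

static const ChannelClass mem_class = { .io_pwritev = mem_pwritev };
static const ChannelClass stream_class = { .io_pwritev = NULL };
```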
[PULL 11/27] migration: Add mapped-ram URI compatibility check

2024-03-03 Thread peterx

From: Fabiano Rosas 

The mapped-ram migration format needs a channel that supports seeking
to be able to write each page to an arbitrary offset in the migration
stream.

Reviewed-by: "Daniel P. Berrangé" 
Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240229153017.2221-9-faro...@suse.de
Signed-off-by: Peter Xu 
---
 migration/migration.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 69f68f940d..2669600d25 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -148,10 +148,39 @@ static bool transport_supports_multi_channels(MigrationAddress *addr)
 return false;
 }
 
+static bool migration_needs_seekable_channel(void)
+{
+return migrate_mapped_ram();
+}
+
+static bool transport_supports_seeking(MigrationAddress *addr)
+{
+if (addr->transport == MIGRATION_ADDRESS_TYPE_FILE) {
+return true;
+}
+
+/*
+ * At this point, the user might not yet have passed the file
+ * descriptor to QEMU, so we cannot know for sure whether it
+ * refers to a plain file or a socket. Let it through anyway.
+ */
+if (addr->transport == MIGRATION_ADDRESS_TYPE_SOCKET) {
+return addr->u.socket.type == SOCKET_ADDRESS_TYPE_FD;
+}
+
+return false;
+}
+
 static bool
 migration_channels_and_transport_compatible(MigrationAddress *addr,
 Error **errp)
 {
+if (migration_needs_seekable_channel() &&
+!transport_supports_seeking(addr)) {
+error_setg(errp, "Migration requires seekable transport (e.g. file)");
+return false;
+}
+
 if (migration_needs_multiple_sockets() &&
 !transport_supports_multi_channels(addr)) {
 error_setg(errp, "Migration requires multi-channel URIs (e.g. tcp)");
-- 
2.44.0
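A self-contained sketch of the compatibility check, with hypothetical enums standing in for QEMU's `MigrationAddress` types. It mirrors the logic above: file transports pass, socket transports pass only for `fd:` addresses (whose descriptor may not have been passed yet), and everything else is rejected when mapped-ram is enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the transport and socket address types. */
typedef enum { ADDR_FILE, ADDR_SOCKET, ADDR_EXEC } Transport;
typedef enum { ST_INET, ST_UNIX, ST_FD } SocketType;

typedef struct {
    Transport transport;
    SocketType socket_type;   /* only meaningful for ADDR_SOCKET */
} Address;

static bool transport_supports_seeking(const Address *addr)
{
    if (addr->transport == ADDR_FILE) {
        return true;
    }
    /* The fd may not have been passed in yet, so it cannot be
     * classified as file or socket here; let it through. */
    if (addr->transport == ADDR_SOCKET) {
        return addr->socket_type == ST_FD;
    }
    return false;
}

static bool channels_and_transport_compatible(bool mapped_ram,
                                              const Address *addr,
                                              char *err, size_t errlen)
{
    if (mapped_ram && !transport_supports_seeking(addr)) {
        snprintf(err, errlen,
                 "Migration requires seekable transport (e.g. file)");
        return false;
    }
    return true;
}
```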

[PULL 01/27] migration: massage cpr-reboot documentation

2024-03-03 Thread peterx

From: Steve Sistare 

Re-wrap the cpr-reboot documentation to 70 columns, use '@' for
cpr-reboot references, capitalize COLO and VFIO, and tweak the
wording.

Suggested-by: Markus Armbruster 
Signed-off-by: Steve Sistare 
Link: https://lore.kernel.org/r/1709218462-3640-1-git-send-email-steven.sist...@oracle.com
[peterx: s/qemu/QEMU per Markus's suggestion]
Reviewed-by: Markus Armbruster 
Signed-off-by: Peter Xu 
---
 qapi/migration.json | 46 +++--
 1 file changed, 24 insertions(+), 22 deletions(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index 0b33a71ab4..b603aa6f25 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -636,28 +636,30 @@
 #
 # @normal: the original form of migration. (since 8.2)
 #
-# @cpr-reboot: The migrate command stops the VM and saves state to the URI.
-# After quitting qemu, the user resumes by running qemu -incoming.
-#
-# This mode allows the user to quit qemu, and restart an updated version
-# of qemu.  The user may even update and reboot the OS before restarting,
-# as long as the URI persists across a reboot.
-#
-# Unlike normal mode, the use of certain local storage options does not
-# block the migration, but the user must not modify guest block devices
-# between the quit and restart.
-#
-# This mode supports vfio devices provided the user first puts the guest
-# in the suspended runstate, such as by issuing guest-suspend-ram to the
-# qemu guest agent.
-#
-# Best performance is achieved when the memory backend is shared and the
-# @x-ignore-shared migration capability is set, but this is not required.
-# Further, if the user reboots before restarting such a configuration, the
-# shared backend must be non-volatile across reboot, such as by backing
-# it with a dax device.
-#
-# cpr-reboot may not be used with postcopy, colo, or background-snapshot.
+# @cpr-reboot: The migrate command stops the VM and saves state to
+# the URI.  After quitting QEMU, the user resumes by running
+# QEMU -incoming.
+#
+# This mode allows the user to quit QEMU, optionally update and
+# reboot the OS, and restart QEMU.  If the user reboots, the URI
+# must persist across the reboot, such as by using a file.
+#
+# Unlike normal mode, the use of certain local storage options
+# does not block the migration, but the user must not modify the
+# contents of guest block devices between the quit and restart.
+#
+# This mode supports VFIO devices provided the user first puts
+# the guest in the suspended runstate, such as by issuing
+# guest-suspend-ram to the QEMU guest agent.
+#
+# Best performance is achieved when the memory backend is shared
+# and the @x-ignore-shared migration capability is set, but this
+# is not required.  Further, if the user reboots before restarting
+# such a configuration, the shared memory must persist across the
+# reboot, such as by backing it with a dax device.
+#
+# @cpr-reboot may not be used with postcopy, background-snapshot,
+# or COLO.
 #
 # (since 8.2)
 ##
-- 
2.44.0

[PULL 05/27] io: add and implement QIO_CHANNEL_FEATURE_SEEKABLE for channel file

2024-03-03 Thread peterx

From: Nikolay Borisov 

Add a generic QIOChannel feature, SEEKABLE, to be used by the
qemu_file* APIs. For the time being it is only implemented for file
channels.

Signed-off-by: Nikolay Borisov 
Reviewed-by: "Daniel P. Berrangé" 
Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240229153017.2221-3-faro...@suse.de
Signed-off-by: Peter Xu 
---
 include/io/channel.h | 1 +
 io/channel-file.c| 8 
 2 files changed, 9 insertions(+)

diff --git a/include/io/channel.h b/include/io/channel.h
index 5f9dbaab65..fcb19fd672 100644
--- a/include/io/channel.h
+++ b/include/io/channel.h
@@ -44,6 +44,7 @@ enum QIOChannelFeature {
 QIO_CHANNEL_FEATURE_LISTEN,
 QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY,
 QIO_CHANNEL_FEATURE_READ_MSG_PEEK,
+QIO_CHANNEL_FEATURE_SEEKABLE,
 };
 
 
diff --git a/io/channel-file.c b/io/channel-file.c
index 4a12c61886..f91bf6db1c 100644
--- a/io/channel-file.c
+++ b/io/channel-file.c
@@ -36,6 +36,10 @@ qio_channel_file_new_fd(int fd)
 
 ioc->fd = fd;
 
+if (lseek(fd, 0, SEEK_CUR) != (off_t)-1) {
+qio_channel_set_feature(QIO_CHANNEL(ioc), QIO_CHANNEL_FEATURE_SEEKABLE);
+}
+
 trace_qio_channel_file_new_fd(ioc, fd);
 
 return ioc;
@@ -60,6 +64,10 @@ qio_channel_file_new_path(const char *path,
 return NULL;
 }
 
+if (lseek(ioc->fd, 0, SEEK_CUR) != (off_t)-1) {
+qio_channel_set_feature(QIO_CHANNEL(ioc), QIO_CHANNEL_FEATURE_SEEKABLE);
+}
+
 trace_qio_channel_file_new_path(ioc, path, flags, mode, ioc->fd);
 
 return ioc;
-- 
2.44.0
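The feature is detected by asking for the current offset with `lseek(fd, 0, SEEK_CUR)`, which does not move the file position and fails with `ESPIPE` on pipes, FIFOs, and sockets. A standalone POSIX sketch of the same probe (the helper name `fd_is_seekable` is hypothetical):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Probe seekability without changing the file position: querying the
 * current offset succeeds for regular files and fails for pipes/sockets.
 */
static bool fd_is_seekable(int fd)
{
    return lseek(fd, 0, SEEK_CUR) != (off_t)-1;
}
```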

[PULL 09/27] migration/qemu-file: add utility methods for working with seekable channels

2024-03-03 Thread peterx

From: Fabiano Rosas 

Add utility methods that will be needed when implementing 'mapped-ram'
migration capability.

Signed-off-by: Fabiano Rosas 
Reviewed-by: "Daniel P. Berrangé" 
Link: https://lore.kernel.org/r/20240229153017.2221-7-faro...@suse.de
Signed-off-by: Peter Xu 
---
 include/migration/qemu-file-types.h |   2 +
 migration/qemu-file.h   |   6 ++
 migration/qemu-file.c   | 106 
 3 files changed, 114 insertions(+)

diff --git a/include/migration/qemu-file-types.h b/include/migration/qemu-file-types.h
index 9ba163f333..adec5abc07 100644
--- a/include/migration/qemu-file-types.h
+++ b/include/migration/qemu-file-types.h
@@ -50,6 +50,8 @@ unsigned int qemu_get_be16(QEMUFile *f);
 unsigned int qemu_get_be32(QEMUFile *f);
 uint64_t qemu_get_be64(QEMUFile *f);
 
+bool qemu_file_is_seekable(QEMUFile *f);
+
 static inline void qemu_put_be64s(QEMUFile *f, const uint64_t *pv)
 {
 qemu_put_be64(f, *pv);
diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index 8aec9fabf7..32fd4a34fd 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -75,6 +75,12 @@ QEMUFile *qemu_file_get_return_path(QEMUFile *f);
 int qemu_fflush(QEMUFile *f);
 void qemu_file_set_blocking(QEMUFile *f, bool block);
 int qemu_file_get_to_fd(QEMUFile *f, int fd, size_t size);
+void qemu_set_offset(QEMUFile *f, off_t off, int whence);
+off_t qemu_get_offset(QEMUFile *f);
+void qemu_put_buffer_at(QEMUFile *f, const uint8_t *buf, size_t buflen,
+off_t pos);
+size_t qemu_get_buffer_at(QEMUFile *f, const uint8_t *buf, size_t buflen,
+  off_t pos);
 
 QIOChannel *qemu_file_get_ioc(QEMUFile *file);
 
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 94231ff295..b10c882629 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -33,6 +33,7 @@
 #include "options.h"
 #include "qapi/error.h"
 #include "rdma.h"
+#include "io/channel-file.h"
 
 #define IO_BUF_SIZE 32768
 #define MAX_IOV_SIZE MIN_CONST(IOV_MAX, 64)
@@ -255,6 +256,10 @@ static void qemu_iovec_release_ram(QEMUFile *f)
 memset(f->may_free, 0, sizeof(f->may_free));
 }
 
+bool qemu_file_is_seekable(QEMUFile *f)
+{
+return qio_channel_has_feature(f->ioc, QIO_CHANNEL_FEATURE_SEEKABLE);
+}
 
 /**
  * Flushes QEMUFile buffer
@@ -447,6 +452,107 @@ void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, size_t size)
 }
 }
 
+void qemu_put_buffer_at(QEMUFile *f, const uint8_t *buf, size_t buflen,
+off_t pos)
+{
+Error *err = NULL;
+size_t ret;
+
+if (f->last_error) {
+return;
+}
+
+qemu_fflush(f);
+ret = qio_channel_pwrite(f->ioc, (char *)buf, buflen, pos, &err);
+
+if (err) {
+qemu_file_set_error_obj(f, -EIO, err);
+return;
+}
+
+if ((ssize_t)ret == QIO_CHANNEL_ERR_BLOCK) {
+qemu_file_set_error_obj(f, -EAGAIN, NULL);
+return;
+}
+
+if (ret != buflen) {
+error_setg(&err, "Partial write of size %zu, expected %zu", ret,
+   buflen);
+qemu_file_set_error_obj(f, -EIO, err);
+return;
+}
+
+stat64_add(&mig_stats.qemu_file_transferred, buflen);
+
+return;
+}
+
+
+size_t qemu_get_buffer_at(QEMUFile *f, const uint8_t *buf, size_t buflen,
+  off_t pos)
+{
+Error *err = NULL;
+size_t ret;
+
+if (f->last_error) {
+return 0;
+}
+
+ret = qio_channel_pread(f->ioc, (char *)buf, buflen, pos, &err);
+
+if ((ssize_t)ret == -1 || err) {
+qemu_file_set_error_obj(f, -EIO, err);
+return 0;
+}
+
+if ((ssize_t)ret == QIO_CHANNEL_ERR_BLOCK) {
+qemu_file_set_error_obj(f, -EAGAIN, NULL);
+return 0;
+}
+
+if (ret != buflen) {
+error_setg(&err, "Partial read of size %zu, expected %zu", ret, buflen);
+qemu_file_set_error_obj(f, -EIO, err);
+return 0;
+}
+
+return ret;
+}
+
+void qemu_set_offset(QEMUFile *f, off_t off, int whence)
+{
+Error *err = NULL;
+off_t ret;
+
+if (qemu_file_is_writable(f)) {
+qemu_fflush(f);
+} else {
+/* Drop any cached buffers; this will trigger a re-fill later */
+f->buf_index = 0;
+f->buf_size = 0;
+}
+
+ret = qio_channel_io_seek(f->ioc, off, whence, &err);
+if (ret == (off_t)-1) {
+qemu_file_set_error_obj(f, -EIO, err);
+}
+}
+
+off_t qemu_get_offset(QEMUFile *f)
+{
+Error *err = NULL;
+off_t ret;
+
+qemu_fflush(f);
+
+ret = qio_channel_io_seek(f->ioc, 0, SEEK_CUR, &err);
+if (ret == (off_t)-1) {
+qemu_file_set_error_obj(f, -EIO, err);
+}
+return ret;
+}
+
+
 void qemu_put_byte(QEMUFile *f, int v)
 {
 if (f->last_error) {
-- 
2.44.0
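`qemu_set_offset()` handles the two directions differently: a writable file flushes pending data, while a readable one discards its cached bytes because they belong to the old position. The sketch below shows only the read side, with a hypothetical `Reader` struct loosely modelled on `QEMUFile`'s `buf_index`/`buf_size` fields; it is an illustration, not QEMU's implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical buffered reader over a file descriptor. */
typedef struct {
    int fd;
    unsigned char buf[16];
    size_t buf_index;   /* next unread byte in buf */
    size_t buf_size;    /* number of valid bytes in buf */
} Reader;

/* Return the next byte, refilling from fd when the buffer is empty;
 * -1 on EOF or error. */
static int reader_get_byte(Reader *r)
{
    if (r->buf_index >= r->buf_size) {
        ssize_t n = read(r->fd, r->buf, sizeof(r->buf));
        if (n <= 0) {
            return -1;
        }
        r->buf_index = 0;
        r->buf_size = (size_t)n;
    }
    return r->buf[r->buf_index++];
}

/* Seek the fd; buffered bytes came from the old position, so drop them
 * and let the next reader_get_byte() refill from the new offset. */
static off_t reader_set_offset(Reader *r, off_t off, int whence)
{
    r->buf_index = 0;
    r->buf_size = 0;
    return lseek(r->fd, off, whence);
}
```

Without the two resets in `reader_set_offset()`, reading one byte, seeking to offset 7, and reading again would return a stale byte from the old buffer instead of the byte at offset 7.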

[PATCH] migration/multifd: Document two places for mapped-ram

2024-03-01 Thread peterx

From: Peter Xu 

Add documentation comments at two spots in the mapped-ram migration
code whose purpose may not be obvious.

Signed-off-by: Peter Xu 
---
Based-on: <20240229153017.2221-1-faro...@suse.de>
---
 migration/multifd.c | 12 
 migration/ram.c |  8 +++-
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index b4e5a9dfcc..2942395ce2 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -709,6 +709,18 @@ static bool multifd_send_cleanup_channel(MultiFDSendParams *p, Error **errp)
 {
 if (p->c) {
 migration_ioc_unregister_yank(p->c);
+/*
+ * An explicit close() on the channel here is normally not
+ * required, but can be helpful for "file:" iochannels, where it
+ * will include an fdatasync() to make sure the data is flushed to
+ * the disk backend.
+ *
+ * The object_unref() cannot guarantee that because: (1) finalize()
+ * of the iochannel is only triggered on the last reference, and
+ * it's not guaranteed that we always hold the last refcount when
+ * reaching here, and, (2) even if finalize() is invoked, it only
+ * does a close(fd) without data flush.
+ */
 qio_channel_close(p->c, &error_abort);
 object_unref(OBJECT(p->c));
 p->c = NULL;
diff --git a/migration/ram.c b/migration/ram.c
index 1f1b5297cf..c79e3de521 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -4258,7 +4258,13 @@ static int ram_load_precopy(QEMUFile *f)
 switch (flags & ~RAM_SAVE_FLAG_CONTINUE) {
 case RAM_SAVE_FLAG_MEM_SIZE:
 ret = parse_ramblocks(f, addr);
-
+/*
+ * For mapped-ram migration (to a file) using multifd, we sync
+ * once and for all here to make sure all tasks we queued to
+ * multifd threads are completed, so that all the ramblocks
+ * (including all the guest memory pages within) are fully
+ * loaded after this sync returns.
+ */
 if (migrate_mapped_ram()) {
 multifd_recv_sync_main();
 }
-- 
2.44.0
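The first comment argues that dropping a reference is not enough to get the data flushed. A toy model of that reasoning, using a hypothetical refcounted `Chan` whose flags stand in for `fdatasync()` and `close()`: finalization happens only on the last unref and does not sync, whereas an explicit close syncs regardless of who else holds a reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical refcounted channel; flags record what happened. */
typedef struct {
    int refcount;
    bool closed;      /* descriptor closed */
    bool synced;      /* data flushed before closing */
    bool finalized;   /* last reference dropped */
} Chan;

static void chan_ref(Chan *c)
{
    c->refcount++;
}

/* Explicit close: flush, then close, whatever the refcount is. */
static void chan_close(Chan *c)
{
    if (!c->closed) {
        c->synced = true;   /* stands in for fdatasync() */
        c->closed = true;   /* stands in for close(fd) */
    }
}

/* Finalize runs only on the last unref and closes without a flush. */
static void chan_unref(Chan *c)
{
    if (--c->refcount == 0) {
        if (!c->closed) {
            c->closed = true;   /* plain close(fd), no sync */
        }
        c->finalized = true;
    }
}
```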

[PULL 21/25] migration: update cpr-reboot description

2024-02-27 Thread peterx

From: Steve Sistare 

Clarify qapi for cpr-reboot migration mode, and add vfio support.

Signed-off-by: Steve Sistare 
Reviewed-by: Peter Xu 
Link: https://lore.kernel.org/r/1708622920-68779-14-git-send-email-steven.sist...@oracle.com
Signed-off-by: Peter Xu 
---
 qapi/migration.json | 35 ++-
 1 file changed, 22 insertions(+), 13 deletions(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index 7303e57e8e..bee5e71fe3 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -636,19 +636,28 @@
 #
 # @normal: the original form of migration. (since 8.2)
 #
-# @cpr-reboot: The migrate command saves state to a file, allowing one to
-#  quit qemu, reboot to an updated kernel, and restart an updated
-#  version of qemu.  The caller must specify a migration URI
-#  that writes to and reads from a file.  Unlike normal mode,
-#  the use of certain local storage options does not block the
-#  migration, but the caller must not modify guest block devices
-#  between the quit and restart.  To avoid saving guest RAM to the
-#  file, the memory backend must be shared, and the @x-ignore-shared
-#  migration capability must be set.  Guest RAM must be non-volatile
-#  across reboot, such as by backing it with a dax device, but this
-#  is not enforced.  The restarted qemu arguments must match those
-#  used to initially start qemu, plus the -incoming option.
-#  (since 8.2)
+# @cpr-reboot: The migrate command stops the VM and saves state to the URI.
+# After quitting qemu, the user resumes by running qemu -incoming.
+#
+# This mode allows the user to quit qemu, and restart an updated version
+# of qemu.  The user may even update and reboot the OS before restarting,
+# as long as the URI persists across a reboot.
+#
+# Unlike normal mode, the use of certain local storage options does not
+# block the migration, but the user must not modify guest block devices
+# between the quit and restart.
+#
+# This mode supports vfio devices provided the user first puts the guest
+# in the suspended runstate, such as by issuing guest-suspend-ram to the
+# qemu guest agent.
+#
+# Best performance is achieved when the memory backend is shared and the
+# @x-ignore-shared migration capability is set, but this is not required.
+# Further, if the user reboots before restarting such a configuration, the
+# shared backend must be non-volatile across reboot, such as by backing
+# it with a dax device.
+#
+# (since 8.2)
 ##
 { 'enum': 'MigMode',
   'data': [ 'normal', 'cpr-reboot' ] }
-- 
2.43.0

[PULL 17/25] migration: per-mode notifiers

2024-02-27 Thread peterx

From: Steve Sistare 

Keep a separate list of migration notifiers for each migration mode.

Suggested-by: Peter Xu 
Signed-off-by: Steve Sistare 
Reviewed-by: Peter Xu 
Reviewed-by: David Hildenbrand 
Link: https://lore.kernel.org/r/1708622920-68779-8-git-send-email-steven.sist...@oracle.com
Signed-off-by: Peter Xu 
---
 include/migration/misc.h |  6 ++
 migration/migration.c| 22 +-
 2 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index e36a1f3ec4..4dc06a92b7 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -86,6 +86,12 @@ typedef int (*MigrationNotifyFunc)(NotifierWithReturn *notify,
 void migration_add_notifier(NotifierWithReturn *notify,
 MigrationNotifyFunc func);
 
+/*
+ * Same as migration_add_notifier, but applies to the specified @mode.
+ */
+void migration_add_notifier_mode(NotifierWithReturn *notify,
+ MigrationNotifyFunc func, MigMode mode);
+
 void migration_remove_notifier(NotifierWithReturn *notify);
 void migration_call_notifiers(MigrationState *s, MigrationEventType type);
 bool migration_in_setup(MigrationState *);
diff --git a/migration/migration.c b/migration/migration.c
index 33149c462c..925103b61a 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -69,8 +69,13 @@
 #include "qemu/sockets.h"
 #include "sysemu/kvm.h"
 
-static NotifierWithReturnList migration_state_notifiers =
-NOTIFIER_WITH_RETURN_LIST_INITIALIZER(migration_state_notifiers);
+#define NOTIFIER_ELEM_INIT(array, elem)\
+[elem] = NOTIFIER_WITH_RETURN_LIST_INITIALIZER((array)[elem])
+
+static NotifierWithReturnList migration_state_notifiers[] = {
+NOTIFIER_ELEM_INIT(migration_state_notifiers, MIG_MODE_NORMAL),
+NOTIFIER_ELEM_INIT(migration_state_notifiers, MIG_MODE_CPR_REBOOT),
+};
 
 /* Messages sent on the return path from destination to source */
 enum mig_rp_message_type {
@@ -1463,11 +1468,17 @@ static void migrate_fd_cancel(MigrationState *s)
 }
 }
 
+void migration_add_notifier_mode(NotifierWithReturn *notify,
+ MigrationNotifyFunc func, MigMode mode)
+{
+notify->notify = (NotifierWithReturnFunc)func;
+notifier_with_return_list_add(&migration_state_notifiers[mode], notify);
+}
+
 void migration_add_notifier(NotifierWithReturn *notify,
 MigrationNotifyFunc func)
 {
-notify->notify = (NotifierWithReturnFunc)func;
-notifier_with_return_list_add(&migration_state_notifiers, notify);
+migration_add_notifier_mode(notify, func, MIG_MODE_NORMAL);
 }
 
 void migration_remove_notifier(NotifierWithReturn *notify)
@@ -1480,10 +1491,11 @@ void migration_remove_notifier(NotifierWithReturn *notify)
 
 void migration_call_notifiers(MigrationState *s, MigrationEventType type)
 {
+MigMode mode = s->parameters.mode;
 MigrationEvent e;
 
 e.type = type;
-notifier_with_return_list_notify(&migration_state_notifiers, &e, 0);
+notifier_with_return_list_notify(&migration_state_notifiers[mode], &e, 0);
 }
 
 bool migration_in_setup(MigrationState *s)
-- 
2.43.0
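A simplified sketch of the per-mode notifier layout: one list head per mode, `add_notifier()` defaulting to the normal mode, and `call_notifiers()` walking only the list for the current mode. The names, the singly linked list, and the example callback `count_call` are hypothetical; QEMU uses `NotifierWithReturnList` and passes an `Error **` that this sketch omits. Stopping at the first non-zero return is a choice made here for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { MODE_NORMAL, MODE_CPR_REBOOT, MODE__MAX } Mode;

typedef struct Notifier Notifier;
/* Callbacks return int so a notifier can report failure. */
typedef int (*NotifyFunc)(Notifier *n, void *data);

struct Notifier {
    NotifyFunc notify;
    Notifier *next;
};

/* One list head per mode; static storage starts out empty (NULL). */
static Notifier *notifiers[MODE__MAX];

static void add_notifier_mode(Notifier *n, NotifyFunc func, Mode mode)
{
    n->notify = func;
    n->next = notifiers[mode];   /* prepend to this mode's list */
    notifiers[mode] = n;
}

/* The old single-list API maps onto the normal mode. */
static void add_notifier(Notifier *n, NotifyFunc func)
{
    add_notifier_mode(n, func, MODE_NORMAL);
}

/* Call only the current mode's notifiers; stop at the first error. */
static int call_notifiers(Mode mode, void *data)
{
    for (Notifier *n = notifiers[mode]; n != NULL; n = n->next) {
        int ret = n->notify(n, data);
        if (ret != 0) {
            return ret;
        }
    }
    return 0;
}

/* Example callback: counts how many times it was invoked. */
static int count_call(Notifier *n, void *data)
{
    (void)n;
    (*(int *)data)++;
    return 0;
}
```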

[PULL 13/25] migration: convert to NotifierWithReturn

2024-02-27 Thread peterx

From: Steve Sistare 

Change all migration notifiers to type NotifierWithReturn, so notifiers
can return an error status in a future patch.  For now, pass NULL for the
notifier error parameter, and do not check the return value.

Signed-off-by: Steve Sistare 
Reviewed-by: Peter Xu 
Reviewed-by: David Hildenbrand 
Link: https://lore.kernel.org/r/1708622920-68779-4-git-send-email-steven.sist...@oracle.com
[peterx: dropped unexpected update to roms/seabios-hppa]
Signed-off-by: Peter Xu 
---
 include/hw/vfio/vfio-common.h  |  2 +-
 include/hw/virtio/virtio-net.h |  2 +-
 include/migration/misc.h   |  6 +++---
 include/qemu/notify.h  |  1 +
 hw/net/virtio-net.c|  4 +++-
 hw/vfio/migration.c|  4 +++-
 migration/migration.c  | 16 
 net/vhost-vdpa.c   |  6 --
 ui/spice-core.c|  8 +---
 9 files changed, 29 insertions(+), 20 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 9b7ef7d02b..4a6c262f77 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -62,7 +62,7 @@ typedef struct VFIORegion {
 typedef struct VFIOMigration {
 struct VFIODevice *vbasedev;
 VMChangeStateEntry *vm_state;
-Notifier migration_state;
+NotifierWithReturn migration_state;
 uint32_t device_state;
 int data_fd;
 void *data_buffer;
diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index 55977f01f0..eaee8f4243 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -221,7 +221,7 @@ struct VirtIONet {
 DeviceListener primary_listener;
 QDict *primary_opts;
 bool primary_opts_from_json;
-Notifier migration_state;
+NotifierWithReturn migration_state;
 VirtioNetRssData rss_data;
 struct NetRxPkt *rx_pkt;
 struct EBPFRSSContext ebpf_rss;
diff --git a/include/migration/misc.h b/include/migration/misc.h
index 5e65c18f1a..b62e351d96 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -60,9 +60,9 @@ void migration_object_init(void);
 void migration_shutdown(void);
 bool migration_is_idle(void);
 bool migration_is_active(MigrationState *);
-void migration_add_notifier(Notifier *notify,
-void (*func)(Notifier *notifier, void *data));
-void migration_remove_notifier(Notifier *notify);
+void migration_add_notifier(NotifierWithReturn *notify,
+NotifierWithReturnFunc func);
+void migration_remove_notifier(NotifierWithReturn *notify);
 void migration_call_notifiers(MigrationState *s);
 bool migration_in_setup(MigrationState *);
 bool migration_has_finished(MigrationState *);
diff --git a/include/qemu/notify.h b/include/qemu/notify.h
index 9a85631864..abf18dbf59 100644
--- a/include/qemu/notify.h
+++ b/include/qemu/notify.h
@@ -45,6 +45,7 @@ bool notifier_list_empty(NotifierList *list);
 /* Same as Notifier but allows .notify() to return errors */
 typedef struct NotifierWithReturn NotifierWithReturn;
 
+/* Return int to allow for different failure modes and recovery actions */
 typedef int (*NotifierWithReturnFunc)(NotifierWithReturn *notifier, void *data,
   Error **errp);
 
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 5a79bc3a3a..75f4e8664d 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -3534,11 +3534,13 @@ static void virtio_net_handle_migration_primary(VirtIONet *n, MigrationState *s)
 }
 }
 
-static void virtio_net_migration_state_notifier(Notifier *notifier, void *data)
+static int virtio_net_migration_state_notifier(NotifierWithReturn *notifier,
+   void *data, Error **errp)
 {
 MigrationState *s = data;
 VirtIONet *n = container_of(notifier, VirtIONet, migration_state);
 virtio_net_handle_migration_primary(n, s);
+return 0;
 }
 
 static bool failover_hide_primary_device(DeviceListener *listener,
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 70e6b1a709..6b6acc4140 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -754,7 +754,8 @@ static void vfio_vmstate_change(void *opaque, bool running, RunState state)
   mig_state_to_str(new_state));
 }
 
-static void vfio_migration_state_notifier(Notifier *notifier, void *data)
+static int vfio_migration_state_notifier(NotifierWithReturn *notifier,
+ void *data, Error **errp)
 {
 MigrationState *s = data;
 VFIOMigration *migration = container_of(notifier, VFIOMigration,
@@ -770,6 +771,7 @@ static void vfio_migration_state_notifier(Notifier *notifier, void *data)
 case MIGRATION_STATUS_FAILED:
 vfio_migration_set_state_or_reset(vbasedev, VFIO_DEVICE_STATE_RUNNING);
 }
+return 0;
 }
 
 static void vfio_migration_free(VFIODevice *vbasedev)
diff --git a/migration/migration.c b/migration/migration.c
index ab21de2cad
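
A minimal, standalone sketch of what the NotifierWithReturn conversion enables: callbacks that return an int status and report details through an error out-parameter. All names here (SimpleNotifier, simple_list_notify() and so on) are hypothetical, a plain const char ** stands in for QEMU's Error **errp, and the list walk is assumed to stop at the first non-zero return. It is an illustration, not QEMU's include/qemu/notify.h.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of a notifier list whose callbacks can fail. */
typedef struct SimpleNotifier SimpleNotifier;

/* Returns 0 on success or a negative errno-style code on failure; on failure
 * the callback may describe the problem through *errmsg. */
typedef int (*SimpleNotifyFunc)(SimpleNotifier *n, void *data,
                                const char **errmsg);

struct SimpleNotifier {
    SimpleNotifyFunc notify;
    SimpleNotifier *next;
};

typedef struct {
    SimpleNotifier *head;
} SimpleNotifierList;

/* Register a notifier by pushing it at the head of the list. */
static void simple_list_add(SimpleNotifierList *list, SimpleNotifier *n,
                            SimpleNotifyFunc func)
{
    n->notify = func;
    n->next = list->head;
    list->head = n;
}

/* Call each notifier in list order; stop at the first failure and return it. */
static int simple_list_notify(SimpleNotifierList *list, void *data,
                              const char **errmsg)
{
    for (SimpleNotifier *n = list->head; n; n = n->next) {
        int ret = n->notify(n, data, errmsg);
        if (ret != 0) {
            return ret;
        }
    }
    return 0;
}

static int calls; /* counts how many callbacks actually ran */

static int ok_notifier(SimpleNotifier *n, void *data, const char **errmsg)
{
    (void)n; (void)data; (void)errmsg;
    calls++;
    return 0;
}

static int failing_notifier(SimpleNotifier *n, void *data, const char **errmsg)
{
    (void)n; (void)data;
    calls++;
    *errmsg = "device refused migration";
    return -EINVAL;
}
```

Because this sketch pushes notifiers at the head, the failing notifier registered last runs first, and the earlier one is never reached once it fails.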

[PULL 25/25] migration: Use migrate_has_error() in close_return_path_on_source()

2024-02-27 Thread peterx

From: Cédric Le Goater 

close_return_path_on_source() retrieves the migration error from the
QEMUFile '->to_dst_file' to know if a shutdown is required. This
shutdown is required to exit the return-path thread.

Avoid relying on '->to_dst_file' and use migrate_has_error() instead.

(Using to_dst_file is only a heuristic to infer whether
rp_state.from_dst_file might be stuck in a recvmsg(). A generic
method of detecting errors is more reliable. We also want to reduce
the dependency on QEMUFile::last_error.)

Suggested-by: Peter Xu 
Signed-off-by: Cédric Le Goater 
Reviewed-by: Peter Xu 
[added some words about the motivation for this patch]
Signed-off-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240226203122.22894-3-faro...@suse.de
Signed-off-by: Peter Xu 
---
 migration/migration.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 7ba2b60e46..bab68bcbef 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2429,8 +2429,7 @@ static bool close_return_path_on_source(MigrationState *ms)
  * cause it to unblock if it's stuck waiting for the destination.
  */
 WITH_QEMU_LOCK_GUARD(&ms->qemu_file_lock) {
-if (ms->to_dst_file && ms->rp_state.from_dst_file &&
-qemu_file_get_error(ms->to_dst_file)) {
+if (migrate_has_error(ms) && ms->rp_state.from_dst_file) {
 qemu_file_shutdown(ms->rp_state.from_dst_file);
 }
 }
-- 
2.43.0
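
The new check asks the migration state whether any error has been recorded, instead of inspecting the error of a single QEMUFile. A hypothetical model of that shape follows: FakeMigState, fake_set_error() and fake_has_error() are invented for illustration, a mutex-protected string stands in for the migration error, and an int descriptor stands in for rp_state.from_dst_file.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the parts of MigrationState used here. */
typedef struct {
    pthread_mutex_t error_mutex;
    const char *error;      /* first error recorded, NULL if none */
    int rp_from_dst_fd;     /* return-path channel, -1 if not open */
} FakeMigState;

/* Record an error; in this sketch the first error wins. */
static void fake_set_error(FakeMigState *s, const char *msg)
{
    pthread_mutex_lock(&s->error_mutex);
    if (!s->error) {
        s->error = msg;
    }
    pthread_mutex_unlock(&s->error_mutex);
}

/* Generic "has anything gone wrong?" query, independent of any one channel. */
static bool fake_has_error(FakeMigState *s)
{
    pthread_mutex_lock(&s->error_mutex);
    bool ret = s->error != NULL;
    pthread_mutex_unlock(&s->error_mutex);
    return ret;
}

/* Mirrors the patched condition: shut the return path down only when an
 * error was recorded and the return-path channel exists. */
static bool should_shutdown_return_path(FakeMigState *s)
{
    return fake_has_error(s) && s->rp_from_dst_fd >= 0;
}
```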

[PULL 16/25] migration: MigrationNotifyFunc

2024-02-27 Thread peterx

From: Steve Sistare 

Define MigrationNotifyFunc to improve type safety and simplify migration
notifiers.

Signed-off-by: Steve Sistare 
Reviewed-by: Peter Xu 
Reviewed-by: David Hildenbrand 
Link: https://lore.kernel.org/r/1708622920-68779-7-git-send-email-steven.sist...@oracle.com
Signed-off-by: Peter Xu 
---
 include/migration/misc.h | 5 -
 hw/net/virtio-net.c  | 4 +---
 hw/vfio/migration.c  | 3 +--
 migration/migration.c| 4 ++--
 net/vhost-vdpa.c | 6 ++
 ui/spice-core.c  | 4 +---
 6 files changed, 11 insertions(+), 15 deletions(-)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index e6150009e0..e36a1f3ec4 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -72,6 +72,9 @@ typedef struct MigrationEvent {
 MigrationEventType type;
 } MigrationEvent;
 
+typedef int (*MigrationNotifyFunc)(NotifierWithReturn *notify,
+   MigrationEvent *e, Error **errp);
+
 /*
  * Register the notifier @notify to be called when a migration event occurs
  * for MIG_MODE_NORMAL, as specified by the MigrationEvent passed to @func.
@@ -81,7 +84,7 @@ typedef struct MigrationEvent {
  *- MIG_EVENT_PRECOPY_FAILED
  */
 void migration_add_notifier(NotifierWithReturn *notify,
-NotifierWithReturnFunc func);
+MigrationNotifyFunc func);
 
 void migration_remove_notifier(NotifierWithReturn *notify);
 void migration_call_notifiers(MigrationState *s, MigrationEventType type);
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index e803f98c3a..a3c711b56d 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -3535,10 +3535,8 @@ static void virtio_net_handle_migration_primary(VirtIONet *n, MigrationEvent *e)
 }
 
 static int virtio_net_migration_state_notifier(NotifierWithReturn *notifier,
-   void *data, Error **errp)
+   MigrationEvent *e, Error **errp)
 {
-MigrationEvent *e = data;
-
 VirtIONet *n = container_of(notifier, VirtIONet, migration_state);
 virtio_net_handle_migration_primary(n, e);
 return 0;
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 869d8417d6..50140eda87 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -755,9 +755,8 @@ static void vfio_vmstate_change(void *opaque, bool running, RunState state)
 }
 
 static int vfio_migration_state_notifier(NotifierWithReturn *notifier,
- void *data, Error **errp)
+ MigrationEvent *e, Error **errp)
 {
-MigrationEvent *e = data;
 VFIOMigration *migration = container_of(notifier, VFIOMigration,
 migration_state);
 VFIODevice *vbasedev = migration->vbasedev;
diff --git a/migration/migration.c b/migration/migration.c
index 8f7f2d92f4..33149c462c 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1464,9 +1464,9 @@ static void migrate_fd_cancel(MigrationState *s)
 }
 
 void migration_add_notifier(NotifierWithReturn *notify,
-NotifierWithReturnFunc func)
+MigrationNotifyFunc func)
 {
-notify->notify = func;
+notify->notify = (NotifierWithReturnFunc)func;
 notifier_with_return_list_add(_state_notifiers, notify);
 }
 
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index a29d18a9ef..e6bdb4562d 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -323,11 +323,9 @@ static void vhost_vdpa_net_log_global_enable(VhostVDPAState *s, bool enable)
 }
 
 static int vdpa_net_migration_state_notifier(NotifierWithReturn *notifier,
- void *data, Error **errp)
+ MigrationEvent *e, Error **errp)
 {
-MigrationEvent *e = data;
-VhostVDPAState *s = container_of(notifier, VhostVDPAState,
- migration_state);
+VhostVDPAState *s = container_of(notifier, VhostVDPAState, migration_state);
 
 if (e->type == MIG_EVENT_PRECOPY_SETUP) {
 vhost_vdpa_net_log_global_enable(s, true);
diff --git a/ui/spice-core.c b/ui/spice-core.c
index 0a59876da2..15be640286 100644
--- a/ui/spice-core.c
+++ b/ui/spice-core.c
@@ -569,10 +569,8 @@ static SpiceInfo *qmp_query_spice_real(Error **errp)
 }
 
 static int migration_state_notifier(NotifierWithReturn *notifier,
-void *data, Error **errp)
+MigrationEvent *e, Error **errp)
 {
-MigrationEvent *e = data;
-
 if (!spice_have_target_host) {
 return 0;
 }
-- 
2.43.0
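
With MigrationNotifyFunc, each callback receives a MigrationEvent * directly and still recovers its owning device from the embedded notifier with container_of(), as the virtio-net, VFIO and vhost-vdpa callbacks above do. The following is a self-contained sketch of that pattern, assuming a locally defined my_container_of() macro and hypothetical FakeNotifier/FakeDevice types. This sketch stores the typed function pointer directly, whereas the patch casts it to NotifierWithReturnFunc inside migration_add_notifier().

```c
#include <assert.h>
#include <stddef.h>

/* Local definition: step back from a member pointer to its enclosing struct. */
#define my_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical event kinds, loosely following MigrationEventType. */
typedef enum { EV_SETUP, EV_DONE, EV_FAILED } EventType;

typedef struct {
    EventType type;
} Event;

typedef struct FakeNotifier FakeNotifier;

/* Typed callback: the event arrives as Event *, not as an untyped void *. */
typedef int (*EventNotifyFunc)(FakeNotifier *n, Event *e);

struct FakeNotifier {
    EventNotifyFunc notify;
};

/* Hypothetical device embedding its notifier as a non-first member. */
typedef struct {
    int id;
    int setup_seen;
    FakeNotifier migration_state;
} FakeDevice;

static int fake_device_notify(FakeNotifier *n, Event *e)
{
    /* Recover the device that owns this notifier. */
    FakeDevice *dev = my_container_of(n, FakeDevice, migration_state);

    if (e->type == EV_SETUP) {
        dev->setup_seen++;
    }
    return 0;
}
```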

[PULL 22/25] migration: options incompatible with cpr

2024-02-27 Thread peterx

From: Steve Sistare 

Fail the migration request if options are set that are incompatible
with cpr.

Signed-off-by: Steve Sistare 
Reviewed-by: Peter Xu 
Link: https://lore.kernel.org/r/1708622920-68779-15-git-send-email-steven.sist...@oracle.com
Signed-off-by: Peter Xu 
---
 qapi/migration.json   |  2 ++
 migration/migration.c | 17 +
 2 files changed, 19 insertions(+)

diff --git a/qapi/migration.json b/qapi/migration.json
index bee5e71fe3..0b33a71ab4 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -657,6 +657,8 @@
 # shared backend must be be non-volatile across reboot, such as by backing
 # it with a dax device.
 #
+# cpr-reboot may not be used with postcopy, colo, or background-snapshot.
+#
 # (since 8.2)
 ##
 { 'enum': 'MigMode',
diff --git a/migration/migration.c b/migration/migration.c
index 90a90947fb..7652fd4d14 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1953,6 +1953,23 @@ static bool migrate_prepare(MigrationState *s, bool blk, bool blk_inc,
 return false;
 }
 
+if (migrate_mode_is_cpr(s)) {
+const char *conflict = NULL;
+
+if (migrate_postcopy()) {
+conflict = "postcopy";
+} else if (migrate_background_snapshot()) {
+conflict = "background snapshot";
+} else if (migrate_colo()) {
+conflict = "COLO";
+}
+
+if (conflict) {
+error_setg(errp, "Cannot use %s with CPR", conflict);
+return false;
+}
+}
+
 if (blk || blk_inc) {
 if (migrate_colo()) {
 error_setg(errp, "No disk migration is required in COLO mode");
-- 
2.43.0
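
The check added to migrate_prepare() picks the first conflicting option and names it in the error. Below is a standalone sketch of the same decision. It assumes a hypothetical FakeMigOptions struct of booleans in place of migrate_mode_is_cpr(), migrate_postcopy(), migrate_background_snapshot() and migrate_colo(), and uses snprintf() in place of error_setg().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical snapshot of the options consulted by the CPR check. */
typedef struct {
    bool cpr_mode;
    bool postcopy;
    bool background_snapshot;
    bool colo;
} FakeMigOptions;

/* Return the name of the first option that conflicts with CPR, or NULL. */
static const char *cpr_conflict(const FakeMigOptions *o)
{
    if (!o->cpr_mode) {
        return NULL;            /* the restriction only applies to CPR mode */
    }
    if (o->postcopy) {
        return "postcopy";
    } else if (o->background_snapshot) {
        return "background snapshot";
    } else if (o->colo) {
        return "COLO";
    }
    return NULL;
}

/* Returns false and formats an error message if a conflict is found. */
static bool check_cpr(const FakeMigOptions *o, char *buf, size_t len)
{
    const char *conflict = cpr_conflict(o);

    if (conflict) {
        snprintf(buf, len, "Cannot use %s with CPR", conflict);
        return false;
    }
    return true;
}
```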

[PULL 19/25] migration: notifier error checking

2024-02-27 Thread peterx

From: Steve Sistare 

Check the status returned by migration notifiers for event type
MIG_EVENT_PRECOPY_SETUP, and report errors.  None of the notifiers
return an error status at this time.

Signed-off-by: Steve Sistare 
Reviewed-by: Peter Xu 
Link: https://lore.kernel.org/r/1708622920-68779-10-git-send-email-steven.sist...@oracle.com
Signed-off-by: Peter Xu 
---
 include/migration/misc.h |  8 +++-
 migration/migration.c| 25 -
 2 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index 4dc06a92b7..e4933b815b 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -72,6 +72,11 @@ typedef struct MigrationEvent {
 MigrationEventType type;
 } MigrationEvent;
 
+/*
+ * A MigrationNotifyFunc may return an error code and an Error object,
+ * but only when @e->type is MIG_EVENT_PRECOPY_SETUP.  The code is an int
+ * to allow for different failure modes and recovery actions.
+ */
 typedef int (*MigrationNotifyFunc)(NotifierWithReturn *notify,
MigrationEvent *e, Error **errp);
 
@@ -93,7 +98,8 @@ void migration_add_notifier_mode(NotifierWithReturn *notify,
  MigrationNotifyFunc func, MigMode mode);
 
 void migration_remove_notifier(NotifierWithReturn *notify);
-void migration_call_notifiers(MigrationState *s, MigrationEventType type);
+int migration_call_notifiers(MigrationState *s, MigrationEventType type,
+ Error **errp);
 bool migration_in_setup(MigrationState *);
 bool migration_has_finished(MigrationState *);
 bool migration_has_failed(MigrationState *);
diff --git a/migration/migration.c b/migration/migration.c
index 6a115d28b8..37c836b0b0 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1376,7 +1376,7 @@ static void migrate_fd_cleanup(MigrationState *s)
 }
 type = migration_has_failed(s) ? MIG_EVENT_PRECOPY_FAILED :
  MIG_EVENT_PRECOPY_DONE;
-migration_call_notifiers(s, type);
+migration_call_notifiers(s, type, NULL);
 block_cleanup_parameters();
 yank_unregister_instance(MIGRATION_YANK_INSTANCE);
 }
@@ -1489,13 +1489,18 @@ void migration_remove_notifier(NotifierWithReturn *notify)
 }
 }
 
-void migration_call_notifiers(MigrationState *s, MigrationEventType type)
+int migration_call_notifiers(MigrationState *s, MigrationEventType type,
+ Error **errp)
 {
 MigMode mode = s->parameters.mode;
 MigrationEvent e;
+int ret;
 
 e.type = type;
-notifier_with_return_list_notify(_state_notifiers[mode], &e, 0);
+ret = notifier_with_return_list_notify(_state_notifiers[mode],
   &e, errp);
+assert(!ret || type == MIG_EVENT_PRECOPY_SETUP);
+return ret;
 }
 
 bool migration_in_setup(MigrationState *s)
@@ -2549,7 +2554,7 @@ static int postcopy_start(MigrationState *ms, Error **errp)
  * at the transition to postcopy and after the device state; in particular
  * spice needs to trigger a transition now
  */
-migration_call_notifiers(ms, MIG_EVENT_PRECOPY_DONE);
+migration_call_notifiers(ms, MIG_EVENT_PRECOPY_DONE, NULL);
 
 migration_downtime_end(ms);
 
@@ -2569,11 +2574,10 @@ static int postcopy_start(MigrationState *ms, Error **errp)
 
 ret = qemu_file_get_error(ms->to_dst_file);
 if (ret) {
-error_setg(errp, "postcopy_start: Migration stream errored");
-migrate_set_state(&ms->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
-  MIGRATION_STATUS_FAILED);
+error_setg_errno(errp, -ret, "postcopy_start: Migration stream error");
+bql_lock();
+goto fail;
 }
-
 trace_postcopy_preempt_enabled(migrate_postcopy_preempt());
 
 return ret;
@@ -2594,6 +2598,7 @@ fail:
 error_report_err(local_err);
 }
 }
+migration_call_notifiers(ms, MIG_EVENT_PRECOPY_FAILED, NULL);
 bql_unlock();
 return -1;
 }
@@ -3613,7 +3618,9 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
 rate_limit = migrate_max_bandwidth();
 
 /* Notify before starting migration thread */
-migration_call_notifiers(s, MIG_EVENT_PRECOPY_SETUP);
+if (migration_call_notifiers(s, MIG_EVENT_PRECOPY_SETUP, &local_err)) {
+goto fail;
+}
 }
 
 migration_rate_set(rate_limit);
-- 
2.43.0
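
The contract introduced here is that only MIG_EVENT_PRECOPY_SETUP notifiers may fail. migration_call_notifiers() enforces this with an assert, and migrate_fd_connect() jumps to its failure path when setup is vetoed. Below is a hypothetical, self-contained model of both halves. The EvType values mirror the MigrationEventType names; everything else is invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { EV_PRECOPY_SETUP, EV_PRECOPY_DONE, EV_PRECOPY_FAILED } EvType;

/* Hypothetical handler type: returns non-zero to veto, with a message. */
typedef int (*EvHandler)(EvType type, const char **errmsg);

/* Run handlers in order, stop at the first failure, and enforce that only
 * the setup event is allowed to fail. */
static int call_handlers(EvHandler *handlers, size_t n, EvType type,
                         const char **errmsg)
{
    int ret = 0;

    for (size_t i = 0; i < n && ret == 0; i++) {
        ret = handlers[i](type, errmsg);
    }
    assert(!ret || type == EV_PRECOPY_SETUP);
    return ret;
}

/* A handler that vetoes setup and accepts every other event. */
static int refuse_setup(EvType type, const char **errmsg)
{
    if (type == EV_PRECOPY_SETUP) {
        *errmsg = "not ready";
        return -1;
    }
    return 0;
}

/* Caller pattern: a vetoed setup sends control to a single failure path. */
static int start_migration(EvHandler *handlers, size_t n, const char **errmsg)
{
    if (call_handlers(handlers, n, EV_PRECOPY_SETUP, errmsg)) {
        goto fail;
    }
    return 0;

fail:
    /* Report the failure event; by the contract above it cannot fail. */
    call_handlers(handlers, n, EV_PRECOPY_FAILED, NULL);
    return -1;
}
```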

[PULL 24/25] migration: Join the return path thread before releasing to_dst_file

2024-02-27 Thread peterx

From: Fabiano Rosas 

The return path thread might hang at a blocking system call. Before
joining the thread we might need to issue a shutdown() on the socket
file descriptor to release it. To determine whether the shutdown() is
necessary we look at the QEMUFile error.

Make sure we only clean up the QEMUFile after the return path has been
waited for.

This fixes a hang when qemu_savevm_state_setup() produced an error
that was detected by migration_detect_error(). That path skips
migration_completion(), so close_return_path_on_source() would get
stuck waiting for the RP thread to terminate.

Reported-by: Cédric Le Goater 
Tested-by: Cédric Le Goater 
Signed-off-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240226203122.22894-2-faro...@suse.de
Signed-off-by: Peter Xu 
---
 migration/migration.c | 22 +-
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index ccb13fa94a..7ba2b60e46 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1342,6 +1342,8 @@ static void migrate_fd_cleanup(MigrationState *s)
 
 qemu_savevm_state_cleanup();
 
+close_return_path_on_source(s);
+
 if (s->to_dst_file) {
 QEMUFile *tmp;
 
@@ -1366,12 +1368,6 @@ static void migrate_fd_cleanup(MigrationState *s)
 qemu_fclose(tmp);
 }
 
-/*
- * We already cleaned up to_dst_file, so errors from the return
- * path might be due to that, ignore them.
- */
-close_return_path_on_source(s);
-
 assert(!migration_is_active(s));
 
 if (s->state == MIGRATION_STATUS_CANCELLING) {
@@ -2914,6 +2910,13 @@ static MigThrError postcopy_pause(MigrationState *s)
 while (true) {
 QEMUFile *file;
 
+/*
+ * We're already pausing, so ignore any errors on the return
+ * path and just wait for the thread to finish. It will be
+ * re-created when we resume.
+ */
+close_return_path_on_source(s);
+
 /*
  * Current channel is possibly broken. Release it.  Note that this is
  * guaranteed even without lock because to_dst_file should only be
@@ -2933,13 +2936,6 @@ static MigThrError postcopy_pause(MigrationState *s)
 qemu_file_shutdown(file);
 qemu_fclose(file);
 
-/*
- * We're already pausing, so ignore any errors on the return
- * path and just wait for the thread to finish. It will be
- * re-created when we resume.
- */
-close_return_path_on_source(s);
-
 migrate_set_state(&s->state, s->state,
   MIGRATION_STATUS_POSTCOPY_PAUSED);
 
-- 
2.43.0
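
The ordering this patch enforces is to release a reader that may be blocked (with shutdown()), wait for the thread, and only then free the channel. Below is a minimal POSIX sketch of that ordering. It assumes a socketpair() in place of the migration channel and pthreads in place of QEMU's thread helpers. RpArgs, rp_thread() and stop_return_path() are hypothetical names, and the sketch always shuts down rather than first checking for an error.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical arguments for a stand-in return-path thread. */
typedef struct {
    int fd;
    ssize_t result;
} RpArgs;

/* Blocks in recv() because the peer never writes, like a return-path
 * thread waiting for a message that will never arrive. */
static void *rp_thread(void *opaque)
{
    RpArgs *args = opaque;
    char buf[16];

    args->result = recv(args->fd, buf, sizeof(buf), 0);
    return NULL;
}

/* shutdown() to release the blocked reader, join the thread, and only then
 * close the descriptors. Returns recv()'s result, or -2 if setup failed. */
static ssize_t stop_return_path(void)
{
    int fds[2];
    pthread_t th;
    RpArgs args;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) {
        return -2;
    }
    args.fd = fds[0];
    args.result = -3;
    if (pthread_create(&th, NULL, rp_thread, &args) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -2;
    }

    shutdown(fds[0], SHUT_RDWR);   /* blocked recv() now returns 0 */
    pthread_join(th, NULL);        /* cannot hang: the reader was released */

    close(fds[0]);                 /* release the channel only after the join */
    close(fds[1]);
    return args.result;
}
```

On Linux, shutting down the reading end makes the blocked recv() return 0, so the join completes. If the thread has not yet entered recv(), the call returns 0 immediately for the same reason.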

[PULL 15/25] migration: remove postcopy_after_devices

2024-02-27 Thread peterx

From: Steve Sistare 

postcopy_after_devices and migration_in_postcopy_after_devices are no
longer used, so delete them.

Signed-off-by: Steve Sistare 
Reviewed-by: Peter Xu 
Link: https://lore.kernel.org/r/1708622920-68779-6-git-send-email-steven.sist...@oracle.com
Signed-off-by: Peter Xu 
---
 include/migration/misc.h | 1 -
 migration/migration.h| 2 --
 migration/migration.c| 7 ---
 3 files changed, 10 deletions(-)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index 9e4abae97f..e6150009e0 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -89,7 +89,6 @@ bool migration_in_setup(MigrationState *);
 bool migration_has_finished(MigrationState *);
 bool migration_has_failed(MigrationState *);
 /* ...and after the device transmission */
-bool migration_in_postcopy_after_devices(MigrationState *);
 /* True if incoming migration entered POSTCOPY_INCOMING_DISCARD */
 bool migration_in_incoming_postcopy(void);
 /* True if incoming migration entered POSTCOPY_INCOMING_ADVISE */
diff --git a/migration/migration.h b/migration/migration.h
index f2c8b8f286..aef8afbe1f 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -348,8 +348,6 @@ struct MigrationState {
 
 /* Flag set once the migration has been asked to enter postcopy */
 bool start_postcopy;
-/* Flag set after postcopy has sent the device state */
-bool postcopy_after_devices;
 
 /* Flag set once the migration thread is running (and needs joining) */
 bool migration_thread_running;
diff --git a/migration/migration.c b/migration/migration.c
index 4650c21f67..8f7f2d92f4 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1527,11 +1527,6 @@ bool migration_postcopy_is_alive(int state)
 }
 }
 
-bool migration_in_postcopy_after_devices(MigrationState *s)
-{
-return migration_in_postcopy() && s->postcopy_after_devices;
-}
-
 bool migration_in_incoming_postcopy(void)
 {
 PostcopyState ps = postcopy_state_get();
@@ -1613,7 +1608,6 @@ int migrate_init(MigrationState *s, Error **errp)
 s->expected_downtime = 0;
 s->setup_time = 0;
 s->start_postcopy = false;
-s->postcopy_after_devices = false;
 s->migration_thread_running = false;
 error_free(s->error);
 s->error = NULL;
@@ -2543,7 +2537,6 @@ static int postcopy_start(MigrationState *ms, Error **errp)
  * at the transition to postcopy and after the device state; in particular
  * spice needs to trigger a transition now
  */
-ms->postcopy_after_devices = true;
 migration_call_notifiers(ms, MIG_EVENT_PRECOPY_DONE);
 
 migration_downtime_end(ms);
-- 
2.43.0

[PULL 20/25] migration: stop vm for cpr

2024-02-27 Thread peterx

From: Steve Sistare 

When migration for cpr is initiated, stop the vm and set state
RUN_STATE_FINISH_MIGRATE before ram is saved.  This eliminates the
possibility of ram and device state being out of sync, and guarantees
that a guest in the suspended state remains suspended, because qmp_cont
rejects a cont command in the RUN_STATE_FINISH_MIGRATE state.

Signed-off-by: Steve Sistare 
Reviewed-by: Peter Xu 
Link: https://lore.kernel.org/r/1708622920-68779-11-git-send-email-steven.sist...@oracle.com
Signed-off-by: Peter Xu 
---
 include/migration/misc.h |  1 +
 migration/migration.h|  2 --
 migration/migration.c| 51 
 3 files changed, 32 insertions(+), 22 deletions(-)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index e4933b815b..5d1aa593ed 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -60,6 +60,7 @@ void migration_object_init(void);
 void migration_shutdown(void);
 bool migration_is_idle(void);
 bool migration_is_active(MigrationState *);
+bool migrate_mode_is_cpr(MigrationState *);
 
 typedef enum MigrationEventType {
 MIG_EVENT_PRECOPY_SETUP,
diff --git a/migration/migration.h b/migration/migration.h
index aef8afbe1f..65c0b61cbd 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -541,6 +541,4 @@ int migration_rp_wait(MigrationState *s);
  */
 void migration_rp_kick(MigrationState *s);
 
-int migration_stop_vm(RunState state);
-
 #endif
diff --git a/migration/migration.c b/migration/migration.c
index 37c836b0b0..90a90947fb 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -167,11 +167,19 @@ static gint page_request_addr_cmp(gconstpointer ap, gconstpointer bp)
 return (a > b) - (a < b);
 }
 
-int migration_stop_vm(RunState state)
+static int migration_stop_vm(MigrationState *s, RunState state)
 {
-int ret = vm_stop_force_state(state);
+int ret;
+
+migration_downtime_start(s);
+
+s->vm_old_state = runstate_get();
+global_state_store();
+
+ret = vm_stop_force_state(state);
 
 trace_vmstate_downtime_checkpoint("src-vm-stopped");
+trace_migration_completion_vm_stop(ret);
 
 return ret;
 }
@@ -1602,6 +1610,11 @@ bool migration_is_active(MigrationState *s)
 s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
 }
 
+bool migrate_mode_is_cpr(MigrationState *s)
+{
+return s->parameters.mode == MIG_MODE_CPR_REBOOT;
+}
+
 int migrate_init(MigrationState *s, Error **errp)
 {
 int ret;
@@ -2454,10 +2467,7 @@ static int postcopy_start(MigrationState *ms, Error **errp)
 bql_lock();
 trace_postcopy_start_set_run();
 
-migration_downtime_start(ms);
-
-global_state_store();
-ret = migration_stop_vm(RUN_STATE_FINISH_MIGRATE);
+ret = migration_stop_vm(ms, RUN_STATE_FINISH_MIGRATE);
 if (ret < 0) {
 goto fail;
 }
@@ -2652,15 +2662,12 @@ static int migration_completion_precopy(MigrationState *s,
 int ret;
 
 bql_lock();
-migration_downtime_start(s);
-
-s->vm_old_state = runstate_get();
-global_state_store();
 
-ret = migration_stop_vm(RUN_STATE_FINISH_MIGRATE);
-trace_migration_completion_vm_stop(ret);
-if (ret < 0) {
-goto out_unlock;
+if (!migrate_mode_is_cpr(s)) {
+ret = migration_stop_vm(s, RUN_STATE_FINISH_MIGRATE);
+if (ret < 0) {
+goto out_unlock;
+}
 }
 
 ret = migration_maybe_pause(s, current_active_state,
@@ -3500,15 +3507,10 @@ static void *bg_migration_thread(void *opaque)
 s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
 
 trace_migration_thread_setup_complete();
-migration_downtime_start(s);
 
 bql_lock();
 
-s->vm_old_state = runstate_get();
-
-global_state_store();
-/* Forcibly stop VM before saving state of vCPUs and devices */
-if (migration_stop_vm(RUN_STATE_PAUSED)) {
+if (migration_stop_vm(s, RUN_STATE_PAUSED)) {
 goto fail;
 }
 /*
@@ -3584,6 +3586,7 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
 Error *local_err = NULL;
 uint64_t rate_limit;
 bool resume = s->state == MIGRATION_STATUS_POSTCOPY_PAUSED;
+int ret;
 
 /*
  * If there's a previous error, free it and prepare for another one.
@@ -3655,6 +3658,14 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
 return;
 }
 
+if (migrate_mode_is_cpr(s)) {
+ret = migration_stop_vm(s, RUN_STATE_FINISH_MIGRATE);
+if (ret < 0) {
+error_setg(&local_err, "migration_stop_vm failed, error %d", -ret);
+goto fail;
+}
+}
+
 if (migrate_background_snapshot()) {
 qemu_thread_create(&s->thread, "bg_snapshot",
 bg_migration_thread, s, QEMU_THREAD_JOINABLE);
-- 
2.43.0
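
The reasoning about suspended guests can be modelled with a tiny state machine. This is a hypothetical sketch only. FakeVM, fake_stop_vm() and fake_cont() stand in for migration_stop_vm() and the cont command, and the run states are reduced to three.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced run states; QEMU's RunState has many more. */
typedef enum { RS_RUNNING, RS_SUSPENDED, RS_FINISH_MIGRATE } FakeRunState;

typedef struct {
    FakeRunState state;
    FakeRunState vm_old_state;   /* remembered so it can be restored later */
} FakeVM;

/* Models the consolidated helper: remember the old state, then stop. */
static void fake_stop_vm(FakeVM *vm, FakeRunState new_state)
{
    vm->vm_old_state = vm->state;
    vm->state = new_state;
}

/* Models a "cont" request, which is refused while migration is finishing. */
static bool fake_cont(FakeVM *vm)
{
    if (vm->state == RS_FINISH_MIGRATE) {
        return false;
    }
    vm->state = RS_RUNNING;
    return true;
}
```

Stopping a suspended guest up front leaves nothing that can resume it before RAM is saved, and the remembered old state still records that it was suspended.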

[PULL 23/25] migration: Fix qmp_query_migrate mbps value

2024-02-27 Thread peterx

From: Fabiano Rosas 

The QMP command query_migrate might see incorrect throughput numbers
if it runs after we've set the migration completion status but before
migration_calculate_complete() has updated s->total_time and s->mbps.

The migration status would show COMPLETED, but the throughput value
would be the one from the last iteration and not the one from the
whole migration. This will usually be a larger value due to the time
period being smaller (one iteration).

Move migration_calculate_complete() earlier so that the status
MIGRATION_STATUS_COMPLETED is only emitted after the final counters
update. Keep everything under the BQL so the QMP thread sees the
updates as atomic.

Rename migration_calculate_complete to migration_completion_end to
reflect its new purpose of also updating s->state.

Signed-off-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240226143335.14282-1-faro...@suse.de
Signed-off-by: Peter Xu 
---
 migration/migration.c | 23 ++-
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 7652fd4d14..ccb13fa94a 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -107,6 +107,7 @@ static int migration_maybe_pause(MigrationState *s,
  int new_state);
 static void migrate_fd_cancel(MigrationState *s);
 static bool close_return_path_on_source(MigrationState *s);
+static void migration_completion_end(MigrationState *s);
 
 static void migration_downtime_start(MigrationState *s)
 {
@@ -2787,8 +2788,7 @@ static void migration_completion(MigrationState *s)
 migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
   MIGRATION_STATUS_COLO);
 } else {
-migrate_set_state(&s->state, current_active_state,
-  MIGRATION_STATUS_COMPLETED);
+migration_completion_end(s);
 }
 
 return;
@@ -2825,8 +2825,7 @@ static void bg_migration_completion(MigrationState *s)
 goto fail;
 }
 
-migrate_set_state(&s->state, current_active_state,
-  MIGRATION_STATUS_COMPLETED);
+migration_completion_end(s);
 return;
 
 fail:
@@ -3028,18 +3027,28 @@ static MigThrError migration_detect_error(MigrationState *s)
 }
 }
 
-static void migration_calculate_complete(MigrationState *s)
+static void migration_completion_end(MigrationState *s)
 {
 uint64_t bytes = migration_transferred_bytes();
 int64_t end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
 int64_t transfer_time;
 
+/*
+ * Take the BQL here so that query-migrate on the QMP thread sees:
+ * - atomic update of s->total_time and s->mbps;
+ * - correct ordering of s->mbps update vs. s->state;
+ */
+bql_lock();
 migration_downtime_end(s);
 s->total_time = end_time - s->start_time;
 transfer_time = s->total_time - s->setup_time;
 if (transfer_time) {
 s->mbps = ((double) bytes * 8.0) / transfer_time / 1000;
 }
+
+migrate_set_state(&s->state, s->state,
+  MIGRATION_STATUS_COMPLETED);
+bql_unlock();
 }
 
 static void update_iteration_initial_status(MigrationState *s)
@@ -3186,7 +3195,6 @@ static void migration_iteration_finish(MigrationState *s)
 bql_lock();
 switch (s->state) {
 case MIGRATION_STATUS_COMPLETED:
-migration_calculate_complete(s);
 runstate_set(RUN_STATE_POSTMIGRATE);
 break;
 case MIGRATION_STATUS_COLO:
@@ -3230,9 +3238,6 @@ static void bg_migration_iteration_finish(MigrationState *s)
 bql_lock();
 switch (s->state) {
 case MIGRATION_STATUS_COMPLETED:
-migration_calculate_complete(s);
-break;
-
 case MIGRATION_STATUS_ACTIVE:
 case MIGRATION_STATUS_FAILED:
 case MIGRATION_STATUS_CANCELLED:
-- 
2.43.0
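
The throughput expression in migration_completion_end() divides bits by milliseconds and then by 1000 to get Mbit/s. As a worked example, 1,000,000 bytes over a total of 1,100 ms, of which 100 ms was setup, gives 8,000,000 bits / 1,000 ms / 1000 = 8.0 Mbps. Below is a standalone sketch of the same arithmetic. completion_mbps() is a hypothetical helper, not a QEMU function.

```c
#include <assert.h>
#include <stdint.h>

/* Mbit/s from bytes and milliseconds, using the same expression as the patch:
 * bytes * 8 gives bits, dividing by milliseconds gives bits/ms, and dividing
 * by 1000 converts bits/ms to Mbit/s (1 bit/ms = 1000 bit/s). */
static double completion_mbps(uint64_t bytes, int64_t total_time_ms,
                              int64_t setup_time_ms, double previous_mbps)
{
    int64_t transfer_time = total_time_ms - setup_time_ms;

    if (transfer_time) {
        return ((double)bytes * 8.0) / transfer_time / 1000;
    }
    /* With no measurable transfer time, the previous value is kept. */
    return previous_mbps;
}
```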

[PULL 18/25] migration: refactor migrate_fd_connect failures

2024-02-27 Thread peterx

From: Steve Sistare 

Move common code for the error path in migrate_fd_connect to a shared
fail label.  No functional change.

Signed-off-by: Steve Sistare 
Reviewed-by: Peter Xu 
Reviewed-by: David Hildenbrand 
Link: https://lore.kernel.org/r/1708622920-68779-9-git-send-email-steven.sist...@oracle.com
Signed-off-by: Peter Xu 
---
 migration/migration.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 925103b61a..6a115d28b8 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3627,11 +3627,7 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
 if (migrate_postcopy_ram() || migrate_return_path()) {
 if (open_return_path_on_source(s)) {
 error_setg(&local_err, "Unable to open return-path for postcopy");
-migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
-migrate_set_error(s, local_err);
-error_report_err(local_err);
-migrate_fd_cleanup(s);
-return;
+goto fail;
 }
 }
 
@@ -3660,6 +3656,13 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
 migration_thread, s, QEMU_THREAD_JOINABLE);
 }
 s->migration_thread_running = true;
+return;
+
+fail:
+migrate_set_error(s, local_err);
+migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
+error_report_err(local_err);
+migrate_fd_cleanup(s);
 }
 
 static void migration_class_init(ObjectClass *klass, void *data)
-- 
2.43.0
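
The refactor follows the usual C single-exit pattern. Every failing step sets local_err and jumps to one fail label that records the error, marks the migration failed and cleans up. Below is a hypothetical, self-contained sketch of the pattern. FakeConnect and fake_connect() are invented, and the second step and its message are placeholders.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record of what the shared failure path did. */
typedef struct {
    bool failed;
    int cleanups;
    const char *error;
} FakeConnect;

/* Two setup steps that can fail; both jump to one "fail" label instead of
 * repeating the error handling inline. */
static void fake_connect(FakeConnect *c, bool rp_ok, bool second_step_ok)
{
    const char *local_err = NULL;

    if (!rp_ok) {
        local_err = "Unable to open return-path for postcopy";
        goto fail;
    }
    if (!second_step_ok) {
        local_err = "hypothetical second step failed";
        goto fail;
    }
    return;

fail:
    /* Single place that records the error, marks failure and cleans up. */
    c->error = local_err;
    c->failed = true;
    c->cleanups++;
}
```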

[PULL 12/25] migration: remove error from notifier data

2024-02-27 Thread peterx

From: Steve Sistare 

Remove the error object from opaque data passed to notifiers.
Use the new error parameter passed to the notifier instead.

Signed-off-by: Steve Sistare 
Reviewed-by: Peter Xu 
Reviewed-by: David Hildenbrand 
Link: https://lore.kernel.org/r/1708622920-68779-3-git-send-email-steven.sist...@oracle.com
Signed-off-by: Peter Xu 
---
 include/migration/misc.h | 1 -
 migration/postcopy-ram.h | 1 -
 hw/virtio/vhost-user.c   | 8 
 migration/postcopy-ram.c | 1 -
 migration/ram.c  | 1 -
 5 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index 1bc8902e6d..5e65c18f1a 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -31,7 +31,6 @@ typedef enum PrecopyNotifyReason {
 
 typedef struct PrecopyNotifyData {
 enum PrecopyNotifyReason reason;
-Error **errp;
 } PrecopyNotifyData;
 
 void precopy_infrastructure_init(void);
diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
index 442ab89752..ecae941211 100644
--- a/migration/postcopy-ram.h
+++ b/migration/postcopy-ram.h
@@ -128,7 +128,6 @@ enum PostcopyNotifyReason {
 
 struct PostcopyNotifyData {
 enum PostcopyNotifyReason reason;
-Error **errp;
 };
 
 void postcopy_add_notifier(NotifierWithReturn *nn);
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index f502345f37..a1eea8547e 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -2096,20 +2096,20 @@ static int vhost_user_postcopy_notifier(NotifierWithReturn *notifier,
 if (!virtio_has_feature(dev->protocol_features,
 VHOST_USER_PROTOCOL_F_PAGEFAULT)) {
 /* TODO: Get the device name into this error somehow */
-error_setg(pnd->errp,
+error_setg(errp,
"vhost-user backend not capable of postcopy");
 return -ENOENT;
 }
 break;
 
 case POSTCOPY_NOTIFY_INBOUND_ADVISE:
-return vhost_user_postcopy_advise(dev, pnd->errp);
+return vhost_user_postcopy_advise(dev, errp);
 
 case POSTCOPY_NOTIFY_INBOUND_LISTEN:
-return vhost_user_postcopy_listen(dev, pnd->errp);
+return vhost_user_postcopy_listen(dev, errp);
 
 case POSTCOPY_NOTIFY_INBOUND_END:
-return vhost_user_postcopy_end(dev, pnd->errp);
+return vhost_user_postcopy_end(dev, errp);
 
 default:
 /* We ignore notifications we don't know */
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 3ab2f6b8fd..0273dc6a94 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -77,7 +77,6 @@ int postcopy_notify(enum PostcopyNotifyReason reason, Error **errp)
 {
 struct PostcopyNotifyData pnd;
 pnd.reason = reason;
-pnd.errp = errp;
 
 return notifier_with_return_list_notify(_notifier_list,
 &pnd, errp);
diff --git a/migration/ram.c b/migration/ram.c
index 5b6b09edd9..45a00b45ed 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -426,7 +426,6 @@ int precopy_notify(PrecopyNotifyReason reason, Error **errp)
 {
 PrecopyNotifyData pnd;
 pnd.reason = reason;
-pnd.errp = errp;
 
 return notifier_with_return_list_notify(_notifier_list, &pnd, errp);
 }
-- 
2.43.0

[PULL 05/25] migration/multifd: Release recv sem_sync earlier

2024-02-27 Thread peterx

From: Fabiano Rosas 

Now that multifd_recv_terminate_threads() is called only once, release
the recv side sem_sync earlier like we do for the send side.

Signed-off-by: Fabiano Rosas 
Reviewed-by: Peter Xu 
Link: https://lore.kernel.org/r/20240220224138.24759-6-faro...@suse.de
Signed-off-by: Peter Xu 
---
 migration/multifd.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index fba00b9e8f..43f0820996 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -1104,6 +1104,12 @@ static void multifd_recv_terminate_threads(Error *err)
 for (i = 0; i < migrate_multifd_channels(); i++) {
 MultiFDRecvParams *p = _recv_state->params[i];
 
+/*
+ * multifd_recv_thread may hung at MULTIFD_FLAG_SYNC handle code,
+ * however try to wakeup it without harm in cleanup phase.
+ */
+qemu_sem_post(&p->sem_sync);
+
 /*
  * We could arrive here for two reasons:
  *  - normal quit, i.e. everything went fine, just finished
@@ -1162,12 +1168,6 @@ void multifd_recv_cleanup(void)
 for (i = 0; i < migrate_multifd_channels(); i++) {
 MultiFDRecvParams *p = _recv_state->params[i];
 
-/*
- * multifd_recv_thread may hung at MULTIFD_FLAG_SYNC handle code,
- * however try to wakeup it without harm in cleanup phase.
- */
-qemu_sem_post(&p->sem_sync);
-
 if (p->thread_created) {
 qemu_thread_join(&p->thread);
 }
-- 
2.43.0
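
Posting sem_sync in the terminate path means a receive thread parked on it can run to completion before cleanup joins it. Below is a minimal POSIX sketch of that wake-then-join ordering. It assumes plain sem_t and pthreads in place of QemuSemaphore and qemu_thread_join(). FakeRecvParams and the helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>

/* Hypothetical per-channel state: just the sync semaphore and an exit flag. */
typedef struct {
    sem_t sem_sync;
    bool exited;
} FakeRecvParams;

/* Stand-in for a receive thread parked while waiting for a sync request. */
static void *fake_recv_thread(void *opaque)
{
    FakeRecvParams *p = opaque;

    /* Wait until someone posts; retry if a signal interrupts the wait. */
    while (sem_wait(&p->sem_sync) != 0 && errno == EINTR) {
        /* retry */
    }
    p->exited = true;
    return NULL;
}

/* Terminate, then clean up: post first, so the later join cannot hang. */
static bool fake_terminate_and_join(void)
{
    FakeRecvParams p;
    pthread_t th;

    p.exited = false;
    if (sem_init(&p.sem_sync, 0, 0) != 0) {
        return false;
    }
    if (pthread_create(&th, NULL, fake_recv_thread, &p) != 0) {
        sem_destroy(&p.sem_sync);
        return false;
    }

    sem_post(&p.sem_sync);    /* termination step: wake a parked thread */
    pthread_join(th, NULL);   /* cleanup step: no longer waits on anyone */

    sem_destroy(&p.sem_sync);
    return p.exited;
}
```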
