Peter Xu <[email protected]> writes:

> Migration module was there for 10+ years.  Initially, it was in most cases
> based on coroutines.  As more features were added into the framework, like
> postcopy, multifd, etc.. it became a mixture of threads and coroutines.
>
> I'm guessing coroutines just can't fix all issues that migration want to
> resolve.
>
> After all these years, migration is now heavily based on a threaded model.
>
> Now there's still a major part of migration framework that is still not
> thread-based, which is precopy load.  We do load in a separate thread in
> postcopy since the 1st day postcopy was introduced, however that requires a
> separate state transition from precopy loading all devices first, which
> still happens in the main thread of a coroutine.
>
> This patch tries to move the migration incoming side to be run inside a
> separate thread (mig/dst/main) just like the src (mig/src/main).  The
> entrance to be migration_incoming_thread().
>
> Quite a few things are needed to make it fly..  One note here is we need to
> change all these things in one patch to not break anything.  The other way
> to do this is add code to make all paths (that this patch touched) be ready
> for either coroutine or thread.  That may cause confusions in another way.
> So reviewers, please take my sincere apology on the hardness of reviewing
> this patch: it covers a few modules at the same time, and with some risky
> changes.
>
> BQL Analysis
> ============
>
> Firstly, when moving it over to the thread, it means the thread cannot take
> BQL during the whole process of loading anymore, because otherwise it can
> block main thread from using the BQL for all kinds of other concurrent
> tasks (for example, processing QMP / HMP commands).
>
> Here the first question to ask is: what needs BQL during precopy load, and
> what doesn't?
>

I just noticed that the BQL held at process_incoming_migration_co is
also responsible for stopping qmp_migrate_set_capabilities from being
dispatched.

At any point during incoming migration when the BQL is unlocked, we have
a window where a capability could be changed. The same goes for
parameters, for that matter.

To make matters worse, the -incoming cmdline triggers
qmp_migrate_incoming->...->migration_transport_compatible early on, but
until the channels finally connect and process_incoming_migration_co
starts, it's possible to change a capability in an incompatible way,
and the transport will never be validated again.
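
To illustrate the check-once problem in isolation, here is a minimal
standalone sketch. It is not QEMU code: every type and function name in
it is a hypothetical stand-in, and the compatibility rule (mapped-ram
rejected on RDMA) is an assumption made for the sake of the model. It
only shows that a result validated early goes stale once a capability
changes, and that checking again at start would catch it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for this sketch, not QEMU types. */
enum transport { TRANSPORT_TCP, TRANSPORT_RDMA };

struct mig_caps {
    bool mapped_ram;
};

struct incoming {
    enum transport transport;
    struct mig_caps caps;
    bool validated;             /* result of the one-time early check */
};

/* Assumed rule for this model: mapped-ram cannot be used over RDMA. */
static bool transport_compatible(enum transport t, const struct mig_caps *caps)
{
    return !(t == TRANSPORT_RDMA && caps->mapped_ram);
}

/* Models the early validation done when the incoming URI is set up. */
static void incoming_setup(struct incoming *in, enum transport t)
{
    assert(in != NULL);
    in->transport = t;
    in->validated = transport_compatible(t, &in->caps);
}

/* Models a capability change that lands before the channels connect. */
static void set_mapped_ram(struct incoming *in, bool on)
{
    assert(in != NULL);
    in->caps.mapped_ram = on;
}

/* Models starting the load while trusting the early (now stale) result. */
static bool incoming_start_stale(const struct incoming *in)
{
    assert(in != NULL);
    return in->validated;
}

/* One possible alternative: validate again when the load starts. */
static bool incoming_start_recheck(const struct incoming *in)
{
    assert(in != NULL);
    return transport_compatible(in->transport, &in->caps);
}
```

Calling incoming_setup() for RDMA, then set_mapped_ram(true), leaves
incoming_start_stale() still reporting success while
incoming_start_recheck() rejects the combination. Whether a re-check at
start or blocking the setters is the better fix in QEMU is a separate
question.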

One example:

-- >8 --
From 99bd88aa0a8b6d4e7c52196f25d344a2800b3d89 Mon Sep 17 00:00:00 2001
From: Fabiano Rosas <[email protected]>
Date: Thu, 8 Jan 2026 17:21:20 -0300
Subject: [PATCH] tmp

---
 tests/qtest/migration/precopy-tests.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/tests/qtest/migration/precopy-tests.c b/tests/qtest/migration/precopy-tests.c
index aca7ed51ef..3f1a2870ee 100644
--- a/tests/qtest/migration/precopy-tests.c
+++ b/tests/qtest/migration/precopy-tests.c
@@ -158,6 +158,13 @@ static int new_rdma_link(char *buffer, bool ipv6)
     return -1;
 }
 
+static void *migrate_rdma_set_caps(QTestState *from, QTestState *to)
+{
+    migrate_set_capability(to, "mapped-ram", true);
+
+    return NULL;
+}
+
 static void __test_precopy_rdma_plain(MigrateCommon *args, bool ipv6)
 {
     char buffer[128] = {};
@@ -185,6 +192,7 @@ static void __test_precopy_rdma_plain(MigrateCommon *args, bool ipv6)
 
     args->listen_uri = uri;
     args->connect_uri = uri;
+    args->start_hook = migrate_rdma_set_caps;
 
     test_precopy_common(args);
 }
-- 
2.51.0
