Peter Xu <[email protected]> writes:

> Migration module was there for 10+ years.  Initially, it was in most cases
> based on coroutines.  As more features were added into the framework, like
> postcopy, multifd, etc.. it became a mixture of threads and coroutines.
>
> I'm guessing coroutines just can't fix all issues that migration want to
> resolve.
>
> After all these years, migration is now heavily based on a threaded model.
>
> Now there's still a major part of migration framework that is still not
> thread-based, which is precopy load.  We do load in a separate thread in
> postcopy since the 1st day postcopy was introduced, however that requires a
> separate state transition from precopy loading all devices first, which
> still happens in the main thread of a coroutine.
>
> This patch tries to move the migration incoming side to be run inside a
> separate thread (mig/dst/main) just like the src (mig/src/main).  The
> entrance to be migration_incoming_thread().
>
> Quite a few things are needed to make it fly..  One note here is we need to
> change all these things in one patch to not break anything.  The other way
> to do this is add code to make all paths (that this patch touched) be ready
> for either coroutine or thread.  That may cause confusions in another way.
> So reviewers, please take my sincere apology on the hardness of reviewing
> this patch: it covers a few modules at the same time, and with some risky
> changes.
>
> BQL Analysis
> ============
>
> Firstly, when moving it over to the thread, it means the thread cannot take
> BQL during the whole process of loading anymore, because otherwise it can
> block main thread from using the BQL for all kinds of other concurrent
> tasks (for example, processing QMP / HMP commands).
>
> Here the first question to ask is: what needs BQL during precopy load, and
> what doesn't?
>

I just noticed that the BQL held at process_incoming_migration_co is also
responsible for stopping qmp_migrate_set_capabilities from being
dispatched.  At any point during incoming migration when the BQL is
unlocked, we have a window where a capability could be changed.  Same for
parameters, for that matter.

To make matters worse, the -incoming cmdline will trigger
qmp_migrate_incoming->...->migration_transport_compatible early on, but
until the channels finally connect and process_incoming_migration_co
starts, it's possible to just change a capability in an incompatible way,
and the transport will never be validated again.

One example:

-- >8 --
From 99bd88aa0a8b6d4e7c52196f25d344a2800b3d89 Mon Sep 17 00:00:00 2001
From: Fabiano Rosas <[email protected]>
Date: Thu, 8 Jan 2026 17:21:20 -0300
Subject: [PATCH] tmp

---
 tests/qtest/migration/precopy-tests.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/tests/qtest/migration/precopy-tests.c b/tests/qtest/migration/precopy-tests.c
index aca7ed51ef..3f1a2870ee 100644
--- a/tests/qtest/migration/precopy-tests.c
+++ b/tests/qtest/migration/precopy-tests.c
@@ -158,6 +158,13 @@ static int new_rdma_link(char *buffer, bool ipv6)
     return -1;
 }
 
+static void *migrate_rdma_set_caps(QTestState *from, QTestState *to)
+{
+    migrate_set_capability(to, "mapped-ram", true);
+
+    return NULL;
+}
+
 static void __test_precopy_rdma_plain(MigrateCommon *args, bool ipv6)
 {
     char buffer[128] = {};
@@ -185,6 +192,7 @@ static void __test_precopy_rdma_plain(MigrateCommon *args, bool ipv6)
     args->listen_uri = uri;
     args->connect_uri = uri;
 
+    args->start_hook = migrate_rdma_set_caps;
     test_precopy_common(args);
 }
 
-- 
2.51.0
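
The window described above can also be modelled outside of QEMU.  What
follows is only a toy sketch in plain C, not QEMU code: every name in it
(ToyIncoming, toy_transport_compatible, toy_incoming_setup, and so on) is
hypothetical, and the compatibility rule (mapped-ram requires a file
transport) is an assumption taken from what the rdma example above
implies.  It shows a check done once at setup time, a capability flipped
afterwards, and the difference that repeating the check at start time
would make.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in QEMU. */
typedef enum { TOY_TRANSPORT_FILE, TOY_TRANSPORT_RDMA } ToyTransport;

typedef struct {
    ToyTransport transport;
    bool mapped_ram;    /* stands in for the "mapped-ram" capability */
} ToyIncoming;

/* Assumed rule: mapped-ram is only valid on a file transport. */
static bool toy_transport_compatible(const ToyIncoming *s)
{
    return !(s->mapped_ram && s->transport != TOY_TRANSPORT_FILE);
}

/* Setup-time check, done once when the incoming URI is configured. */
static bool toy_incoming_setup(ToyIncoming *s, ToyTransport t)
{
    s->transport = t;
    return toy_transport_compatible(s);
}

/* A capability change arriving between setup and connect. */
static void toy_set_mapped_ram(ToyIncoming *s, bool on)
{
    s->mapped_ram = on;
}

/* Connect-time start; 'revalidate' selects whether the check is repeated. */
static bool toy_incoming_start(const ToyIncoming *s, bool revalidate)
{
    if (revalidate && !toy_transport_compatible(s)) {
        return false;
    }
    return true;
}

/* Walk through the sequence: validate, change capability, start. */
static void toy_demo(void)
{
    ToyIncoming s = { TOY_TRANSPORT_FILE, false };

    /* rdma without mapped-ram passes the setup-time check */
    assert(toy_incoming_setup(&s, TOY_TRANSPORT_RDMA));

    /* capability flipped inside the window */
    toy_set_mapped_ram(&s, true);

    /* without a second check the incompatible setup starts anyway */
    assert(toy_incoming_start(&s, false));

    /* repeating the check at start time catches it */
    assert(!toy_incoming_start(&s, true));
}
```

Whether repeating the check at start time, or refusing capability changes
once the incoming side is set up, is the right fix for the real code is a
separate question; the sketch only makes the ordering problem concrete.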
