On Wed, 22 Oct 2025 15:25:59 -0400 Peter Xu <[email protected]> wrote:
> This is v1, but it is not 10.2 material. The earliest fit I see would
> still be 11.0+, even if everything goes extremely smoothly.
>
> The RFC tag is dropped only because I'm more confident this can land
> without breaking something too easily, as I smoke-tested it a bit more
> across archs this time. AFAIU the best (and possibly only..) way to prove
> it solid is to merge it.. likely in the early phase of a dev cycle.
>
> The plan is to also try it on more device setups soon, before it could
> land.
>
> Background
> ==========
>
> Nowadays, live migration heavily depends on threads. For example, most of
> the major features that will be used nowadays in live migration (multifd,
> postcopy, mapped-ram, vfio, etc.) all work with threads internally.
>
> But still, from time to time, we'll see some coroutines floating around the
> migration context. The major one is precopy's loadvm, which is internally
> a coroutine. It is still a critical path that any live migration depends on.
>
> A mixture of coroutines and threads is prone to issues. For examples,
> see commit e65cec5e5d ("migration/ram: Yield periodically to the main
> loop") or commit 7afbdada7e ("migration/postcopy: ensure preempt channel
> is ready before loading states").
>
> It has been a coroutine since this work (thanks to Fabiano, the
> archeologist, for digging up the link):
>
> https://lists.gnu.org/archive/html/qemu-devel/2012-08/msg01136.html
>
> [...]
>
> Tests
> =====
>
> Default CI passes.
>
> RDMA unit tests pass as usual. I also tried out cancellation / failure
> tests over RDMA channels, making sure nothing is stuck.
>
> I also roughly measured how long it takes to run the whole 80+ migration
> qtest suite, and see no measurable difference before / after this series.
>
> I didn't test COLO, I wanted to but the doc example didn't work.
>
> Risks
> =====
>
> This series has the risk of breaking things. I would be surprised if it
> didn't..
>
> The current way of taking the BQL during FULL section load may cause
> issues: when the IOs are unstable, we could be waiting for IO (in the new
> migration incoming thread) with the BQL held. The possibility is low,
> though; it only happens when the network halts while flushing the device
> states. It is still possible, however. One solution is to further break
> down the BQL critical sections into smaller ones, as mentioned in TODO.
>
> Anything is more than welcome: suggestions, questions, objections, tests..
>
> TODO
> ====
>
> - Finer grained BQL breakdown
>
> Peter Xu (13):
> io: Add qio_channel_wait_cond() helper
> migration: Properly wait on G_IO_IN when peeking messages
> migration/rdma: Fix wrong context in qio_channel_rdma_shutdown()
> migration/rdma: Allow qemu_rdma_wait_comp_channel work with thread
> migration/rdma: Change io_create_watch() to return immediately
> migration: Introduce WITH_BQL_HELD() / WITH_BQL_RELEASED()
> migration: Pass in bql_held information from qemu_loadvm_state()
> migration: Thread-ify precopy vmstate load process
> migration/rdma: Remove coroutine path in qemu_rdma_wait_comp_channel
> migration/postcopy: Remove workaround on wait preempt channel
> migration/ram: Remove workaround on ram yield during load
> migration: Allow blocking mode for incoming live migration
> migration/vfio: Drop BQL dependency for loadvm SWITCHOVER_START
>
> include/io/channel.h | 15 +++
> include/migration/colo.h | 6 +-
> migration/migration.h | 109 +++++++++++++++++--
> migration/savevm.h | 4 +-
> hw/vfio/migration-multifd.c | 3 -
> io/channel.c | 21 ++--
> migration/channel.c | 7 +-
> migration/colo-stubs.c | 2 +-
> migration/colo.c | 26 ++---
> migration/migration.c | 81 ++++++++------
> migration/qemu-file.c | 6 +-
> migration/ram.c | 13 +--
> migration/rdma.c | 204 ++++++++----------------------------
> migration/savevm.c | 98 +++++++++--------
> migration/trace-events | 4 +-
> 15 files changed, 291 insertions(+), 308 deletions(-)
>
Works well in my COLO testing. For the whole series:
Tested-by: Lukas Straub <[email protected]>
