On Sat, Jan 17, 2026 at 03:00:37PM +0100, Lukas Straub wrote:
> On Wed, 22 Oct 2025 15:25:59 -0400
> Peter Xu <[email protected]> wrote:
>
> > This is v1, however not 10.2 material. The earliest I see fit would still
> > be 11.0+ even if everything goes extremely smooth.
> >
> > Removal of RFC is only about that I'm more confident this should be able to
> > land without breaking something too easily, as I smoked it slightly more
> > cross-archs this time. AFAIU the best (and possibly only..) way to prove
> > it solid is to merge it.. likely in the early phase of a dev cycle.
> >
> > The plan is we'll try to get to more device setups too soon, before it
> > could land.
> >
> > Background
> > ==========
> >
> > Nowadays, live migration heavily depends on threads. For example, most of
> > the major features that will be used nowadays in live migration (multifd,
> > postcopy, mapped-ram, vfio, etc.) all work with threads internally.
> >
> > But still, from time to time, we'll see some coroutines floating around the
> > migration context. The major one is precopy's loadvm, which is internally
> > a coroutine. It is still a critical path that any live migration depends
> > on.
> >
> > A mixture of using both coroutines and threads is prone to issues. Some
> > examples can refer to commit e65cec5e5d ("migration/ram: Yield periodically
> > to the main loop") or commit 7afbdada7e ("migration/postcopy: ensure
> > preempt channel is ready before loading states").
> >
> > It was a coroutine since this work (thanks to Fabiano, the archeologist,
> > digging the link):
> >
> > https://lists.gnu.org/archive/html/qemu-devel/2012-08/msg01136.html
> >
> > [...]
> >
> > Tests
> > =====
> >
> > Default CI passes.
> >
> > RDMA unit tests pass as usual. I also tried out cancellation / failure
> > tests over RDMA channels, making sure nothing is stuck.
> >
> > I also roughly measured how long it takes to run the whole 80+ migration
> > qtest suite, and see no measurable difference before / after this series.
> >
> > I didn't test COLO, I wanted to but the doc example didn't work.
> >
> > Risks
> > =====
> >
> > This series has the risk of breaking things. I would be surprised if it
> > didn't..
> >
> > The current way of taking BQL during FULL section load may cause issues, it
> > means when the IOs are unstable we could be waiting for IO (in the new
> > migration incoming thread) with BQL held. This is low possibility, though,
> > only happens when the network halts during flushing the device states.
> > However still possible. One solution is to further break down the BQL
> > critical sections to smaller sections, as mentioned in TODO.
> >
> > Anything more than welcome: suggestions, questions, objections, tests..
> >
> > TODO
> > ====
> >
> > - Finer grained BQL breakdown
> >
> > Peter Xu (13):
> >   io: Add qio_channel_wait_cond() helper
> >   migration: Properly wait on G_IO_IN when peeking messages
> >   migration/rdma: Fix wrong context in qio_channel_rdma_shutdown()
> >   migration/rdma: Allow qemu_rdma_wait_comp_channel work with thread
> >   migration/rdma: Change io_create_watch() to return immediately
> >   migration: Introduce WITH_BQL_HELD() / WITH_BQL_RELEASED()
> >   migration: Pass in bql_held information from qemu_loadvm_state()
> >   migration: Thread-ify precopy vmstate load process
> >   migration/rdma: Remove coroutine path in qemu_rdma_wait_comp_channel
> >   migration/postcopy: Remove workaround on wait preempt channel
> >   migration/ram: Remove workaround on ram yield during load
> >   migration: Allow blocking mode for incoming live migration
> >   migration/vfio: Drop BQL dependency for loadvm SWITCHOVER_START
> >
> >  include/io/channel.h        |  15 +++
> >  include/migration/colo.h    |   6 +-
> >  migration/migration.h       | 109 +++++++++++++++++--
> >  migration/savevm.h          |   4 +-
> >  hw/vfio/migration-multifd.c |   3 -
> >  io/channel.c                |  21 ++--
> >  migration/channel.c         |   7 +-
> >  migration/colo-stubs.c      |   2 +-
> >  migration/colo.c            |  26 ++---
> >  migration/migration.c       |  81 ++++++++------
> >  migration/qemu-file.c       |   6 +-
> >  migration/ram.c             |  13 +--
> >  migration/rdma.c            | 204 ++++++++----------------------------
> >  migration/savevm.c          |  98 +++++++++--------
> >  migration/trace-events      |   4 +-
> >  15 files changed, 291 insertions(+), 308 deletions(-)
> >
>
> Works well in my COLO testing. For the whole series:
>
> Tested-by: Lukas Straub <[email protected]>

Thanks for the testing. Instead of applying the tag to the whole series, I'll move it to patch 8, which carries the major COLO change, if there are no objections.

--
Peter Xu
