On Sat, Jan 17, 2026 at 03:00:37PM +0100, Lukas Straub wrote:
> On Wed, 22 Oct 2025 15:25:59 -0400
> Peter Xu <[email protected]> wrote:
> 
> > This is v1, but not 10.2 material.  The earliest fit I see would still
> > be 11.0+, even if everything goes extremely smoothly.
> > 
> > I dropped the RFC tag only because I'm more confident that this can land
> > without breaking things too easily, as I smoke-tested it a bit more across
> > archs this time.  AFAIU the best (and possibly only..) way to prove it
> > solid is to merge it.. likely in the early phase of a dev cycle.
> > 
> > The plan is to also test more device setups soon, before it could
> > land.
> > 
> > Background
> > ==========
> > 
> > Nowadays, live migration heavily depends on threads. For example, most of
> > the major features used in live migration today (multifd, postcopy,
> > mapped-ram, vfio, etc.) all work with threads internally.
> > 
> > But still, from time to time, we'll see some coroutines floating around the
> > migration context.  The major one is precopy's loadvm, which is internally
> > a coroutine.  It is still a critical path that any live migration depends 
> > on.
> > 
> > Mixing coroutines and threads is prone to issues.  For examples, see
> > commit e65cec5e5d ("migration/ram: Yield periodically to the main loop")
> > or commit 7afbdada7e ("migration/postcopy: ensure preempt channel is
> > ready before loading states").
> > 
> > It has been a coroutine since this work (thanks to Fabiano, the
> > archeologist, for digging up the link):
> > 
> >   https://lists.gnu.org/archive/html/qemu-devel/2012-08/msg01136.html
> > 
> > [...]
> >
> > Tests
> > =====
> > 
> > Default CI passes.
> > 
> > RDMA unit tests pass as usual. I also tried out cancellation / failure
> > tests over RDMA channels, making sure nothing is stuck.
> > 
> > I also roughly measured how long it takes to run the whole 80+ migration
> > qtest suite, and saw no measurable difference before / after this series.
> > 
> > I didn't test COLO; I wanted to, but the doc example didn't work.
> > 
> > Risks
> > =====
> > 
> > This series has the risk of breaking things.  I would be surprised if it
> > didn't..
> > 
> > The current way of taking BQL during FULL section load may cause issues:
> > it means that when IO is unstable we could be waiting for IO (in the new
> > migration incoming thread) with BQL held.  This is unlikely, as it only
> > happens when the network stalls while flushing the device states, but it
> > is still possible.  One solution is to further break down the BQL critical
> > sections into smaller ones, as mentioned in TODO.
> > 
> > Anything is more than welcome: suggestions, questions, objections, tests..
> > 
> > TODO
> > ====
> > 
> > - Finer grained BQL breakdown
> > 
> > Peter Xu (13):
> >   io: Add qio_channel_wait_cond() helper
> >   migration: Properly wait on G_IO_IN when peeking messages
> >   migration/rdma: Fix wrong context in qio_channel_rdma_shutdown()
> >   migration/rdma: Allow qemu_rdma_wait_comp_channel work with thread
> >   migration/rdma: Change io_create_watch() to return immediately
> >   migration: Introduce WITH_BQL_HELD() / WITH_BQL_RELEASED()
> >   migration: Pass in bql_held information from qemu_loadvm_state()
> >   migration: Thread-ify precopy vmstate load process
> >   migration/rdma: Remove coroutine path in qemu_rdma_wait_comp_channel
> >   migration/postcopy: Remove workaround on wait preempt channel
> >   migration/ram: Remove workaround on ram yield during load
> >   migration: Allow blocking mode for incoming live migration
> >   migration/vfio: Drop BQL dependency for loadvm SWITCHOVER_START
> > 
> >  include/io/channel.h        |  15 +++
> >  include/migration/colo.h    |   6 +-
> >  migration/migration.h       | 109 +++++++++++++++++--
> >  migration/savevm.h          |   4 +-
> >  hw/vfio/migration-multifd.c |   3 -
> >  io/channel.c                |  21 ++--
> >  migration/channel.c         |   7 +-
> >  migration/colo-stubs.c      |   2 +-
> >  migration/colo.c            |  26 ++---
> >  migration/migration.c       |  81 ++++++++------
> >  migration/qemu-file.c       |   6 +-
> >  migration/ram.c             |  13 +--
> >  migration/rdma.c            | 204 ++++++++----------------------------
> >  migration/savevm.c          |  98 +++++++++--------
> >  migration/trace-events      |   4 +-
> >  15 files changed, 291 insertions(+), 308 deletions(-)
> > 
> 
> Works well in my COLO testing. For the whole series:
> 
> Tested-by: Lukas Straub <[email protected]>

Thanks for the testing.

Rather than applying the tag to the whole series, I'll move it to patch 8,
which carries the major COLO change, if there are no objections.

-- 
Peter Xu

