On Sat, 17 Jan 2026 20:49:13 +0100
Lukas Straub <[email protected]> wrote:

> On Thu, 15 Jan 2026 18:38:51 -0500
> Peter Xu <[email protected]> wrote:
> 
> > On Thu, Jan 15, 2026 at 10:59:47PM +0000, Dr. David Alan Gilbert wrote:  
> > > * Peter Xu ([email protected]) wrote:    
> > > > On Thu, Jan 15, 2026 at 10:49:29PM +0100, Lukas Straub wrote:    
> > > > > Nack.
> > > > > 
> > > > > This code has users, as explained in my other email:
> > > > > https://lore.kernel.org/qemu-devel/20260115224516.7f0309ba@penguin/T/#mc99839451d6841366619c4ec0d5af5264e2f6464
> > > > >     
> > > > 
> > > > Please then rework that series and consider including the following (I
> > > > believe I pointed this out somewhere a long time ago..):
> > > >     
> > >     
> > > > - Some form of justification of why multifd needs to be enabled for
> > > >   COLO.  For example, in your cluster deployment, using multifd can
> > > >   improve XXX by YYY.  Please describe the use case and improvements.
> > > 
> > > That one is pretty easy; since COLO is regularly taking snapshots, the
> > > faster the snapshotting, the less overhead there is.
> > 
> > Thanks for chiming in, Dave.  I can explain why I want to ask for some
> > numbers.
> > 
> > Firstly, numbers normally prove the feature is used in a real system, and
> > that it is at least being exercised and seriously tested.
> > 
> > Secondly, per my very limited understanding of COLO... the two VMs should
> > in most cases already be in sync while both sides generate the same
> > network packets.
> > 
> > Another sync (where multifd can start to take effect) is only needed when
> > there are packet mismatches, but IIUC that should be rare.  I don't know
> > how rare it is; it would be good if Lukas could share some numbers from
> > his deployment to help us understand COLO better, if we'll need to keep
> > it.
> 
> It really depends on the workload and on whether you want to tune for
> throughput or latency.
> 
> You need to do a checkpoint eventually, and the more time passes between
> checkpoints, the more dirty memory you have to transfer during the
> checkpoint.
> 
> Also keep in mind that the guest is stopped during checkpoints: even if
> we kept the guest running, we could not release the mismatched packets,
> since that would expose a state of the guest to the outside world that
> is not yet replicated to the secondary.
> 
> So migration performance is actually the most important factor in COLO
> for keeping the checkpoints as short as possible.
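> 
> To make that concrete, here is a back-of-the-envelope sketch in plain C
> (all numbers are made up for illustration): the downtime of one
> checkpoint is roughly the dirty data divided by the aggregate migration
> bandwidth, which is exactly where multifd helps.  Real multifd scaling
> is of course not perfectly linear.
> 
>     #include <stdio.h>
> 
>     /* Illustrative only: checkpoint downtime ~= dirty_bytes / bandwidth.
>      * Multifd raises the effective bandwidth by striping pages over
>      * several channels. */
>     int main(void)
>     {
>         double dirty_bytes = 512.0 * 1024 * 1024; /* assume 512 MiB dirtied */
>         double chan_bw = 1.25e9;                  /* assume ~10 Gbit/s per channel */
> 
>         for (int channels = 1; channels <= 8; channels *= 2) {
>             double downtime_ms = dirty_bytes / (chan_bw * channels) * 1e3;
>             printf("%d channel(s): ~%.0f ms downtime\n", channels, downtime_ms);
>         }
>         return 0;
>     }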
> 
> I have quite a few more performance and cleanup patches on hand, for
> example one to transfer dirty memory between checkpoints.
> 
> > 
> > IIUC, the critical path of COLO shouldn't be the migration on its own?
> > It should be when the heartbeat gets lost; that normally happens when the
> > two VMs are in sync.  On that path, I don't see how multifd helps, because
> > there's no migration happening, only the src recording what has changed.
> > Hence I think some numbers, with a description of the measurements, may
> > help us understand how important multifd is to COLO.
> > 
> > Supporting multifd will inject new COLO functions into core migration
> > code paths (even if not many..).  I want to make sure such (new)
> > complexity is justified.  I also want to avoid introducing a feature only
> > because "we have XXX, so let's support XXX in COLO too, maybe some day
> > it'll be useful".
> 
> What COLO needs from migration at the low level:
> 
> Primary/Outgoing side:
> 
> Not much actually; we just need a way to incrementally send the
> dirtied memory and the full device state.
> Also, we ensure that migration never actually finishes, since we will
> never do a switchover.  For example, we never set
> RAMState::last_stage with COLO.
> 
> Secondary/Incoming side:
> 
> colo cache:
> Since the secondary always needs to be ready to take over (even during
> checkpointing), we cannot write the received ram pages directly into
> guest ram, or a takeover mid-checkpoint would leave the guest with half
> of the old and half of the new contents.
> So we redirect the received ram pages to the colo cache, which is
> basically a mirror of the primary side's ram.
> This also simplifies the primary side: from its point of view it is
> just a normal migration target, so the primary doesn't have to care
> about pages dirtied on the secondary, for example.
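> 
> In code, that redirection is essentially one address lookup when a ram
> page is received.  A minimal sketch against qemu's internal types (a
> simplification of what colo_cache_from_block_offset() in
> migration/ram.c does, not the actual code):
> 
>     /* Sketch: pick the destination for an incoming ram page.  With COLO
>      * enabled, the page goes into the colo cache instead of directly
>      * into guest ram and is only flushed at checkpoint time. */
>     static void *host_for_incoming_page(RAMBlock *block, ram_addr_t offset)
>     {
>         if (migration_incoming_colo_enabled()) {
>             return block->colo_cache + offset;  /* mirror of primary ram */
>         }
>         return block->host + offset;            /* normal migration path */
>     }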
> 
> Dirty Bitmap:
> With COLO we also need a dirty bitmap on the incoming side to track
> 1. pages dirtied by the secondary guest
> 2. pages dirtied by the primary guest (incoming ram pages)
> In the last step of a checkpoint, this bitmap is then used to
> overwrite the guest ram with the colo cache contents, so the secondary
> guest is in sync with the primary guest.
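> 
> The flush at the end of a checkpoint is then just a walk over that
> bitmap.  A rough sketch of the idea (simplified from what
> colo_flush_ram_cache() does; per-block locking and migration-bitmap
> details are omitted):
> 
>     /* Sketch: copy every page marked dirty by either side from the
>      * colo cache back into guest ram, then clear the bit. */
>     static void colo_flush_block_sketch(RAMBlock *block)
>     {
>         unsigned long npages = block->used_length >> TARGET_PAGE_BITS;
>         unsigned long page;
> 
>         for (page = find_first_bit(block->bmap, npages);
>              page < npages;
>              page = find_next_bit(block->bmap, npages, page + 1)) {
>             ram_addr_t offset = (ram_addr_t)page << TARGET_PAGE_BITS;
> 
>             memcpy(block->host + offset, block->colo_cache + offset,
>                    TARGET_PAGE_SIZE);
>             clear_bit(page, block->bmap);
>         }
>     }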
> 
> All of this is individually very little code, as you can see from my
> multifd patch.  Just something to keep in mind, I guess.

PS:
Also, when the primary or secondary dies, from qemu's point of view the
migration socket(s) start blocking.  So the migration code needs to be
able to recover from such a hanging/blocking socket.  This works fine
right now with yank.
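
For reference, the trick behind yank is just that shutdown(2) can be
called on a socket from another thread and forces any blocked send()/
recv() on it to return.  A self-contained POSIX sketch of the idea (not
qemu's actual yank code, which goes through qio_channel_shutdown()):

    #include <sys/socket.h>

    /* Sketch: called from the monitor/yank context while the migration
     * thread is stuck in a blocking read or write on the same fd.
     * shutdown() wakes those threads up so they can unwind and clean
     * up; a plain close() would not be safe here, since the fd number
     * could be reused while another thread still blocks on it. */
    static void yank_migration_socket(int fd)
    {
        shutdown(fd, SHUT_RDWR);
    }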

> 
> 
> At the high level, we have the COLO framework outgoing and incoming
> threads, which just tell the migration code to:
> - send all ram pages (qemu_savevm_live_state()) on the outgoing side,
>   paired with a qemu_loadvm_state_main() on the incoming side,
> - send the device state (qemu_save_device_state()), paired with writing
>   that stream to a buffer on the incoming side,
> - and finally flush the colo cache and load the device state on the
>   incoming side (see the sketch below).
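> 
> Put together, one checkpoint pairs up roughly like this (a sketch only,
> paraphrased from colo_do_checkpoint_transaction(); error handling and
> the primary/secondary handshake are omitted):
> 
>     /* Sketch of the outgoing side of one checkpoint. */
>     static void colo_checkpoint_sketch(QEMUFile *f)
>     {
>         vm_stop_force_state(RUN_STATE_COLO); /* pause the primary guest */
>         qemu_savevm_live_state(f);  /* dirty ram -> secondary's colo cache */
>         qemu_save_device_state(f);  /* device state -> buffer on secondary */
>         qemu_fflush(f);
>         /* The secondary now stops its guest, flushes the colo cache,
>          * loads the buffered device state and resumes; the primary
>          * resumes once the checkpoint is acknowledged. */
>         vm_start();
>     }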
> 
> And of course we coordinate with the colo block replication and
> colo-compare.
> 
> Best regards,
> Lukas Straub
> 
> > 
> > After all these days, I found removing code is sometimes harder than
> > writing new..
> > 
> > Thanks,
> >   
> > > 
> > > Lukas: Given COLO has a bunch of different features (e.g. the block
> > > replication, the clever network comparison, etc.), do you know which ones
> > > are used in the setups you are aware of?
> > > 
> > > I'd guess the tricky part of a test would be the network side; I'm
> > > not too sure how you'd set that up in a test.
> >   
> 
