On Sat, 17 Jan 2026 20:49:13 +0100 Lukas Straub <[email protected]> wrote:
> On Thu, 15 Jan 2026 18:38:51 -0500
> Peter Xu <[email protected]> wrote:
>
> > On Thu, Jan 15, 2026 at 10:59:47PM +0000, Dr. David Alan Gilbert wrote:
> > > * Peter Xu ([email protected]) wrote:
> > > > On Thu, Jan 15, 2026 at 10:49:29PM +0100, Lukas Straub wrote:
> > > > > Nack.
> > > > >
> > > > > This code has users, as explained in my other email:
> > > > > https://lore.kernel.org/qemu-devel/20260115224516.7f0309ba@penguin/T/#mc99839451d6841366619c4ec0d5af5264e2f6464
> > > >
> > > > Please then rework that series and consider including the following (I
> > > > believe I pointed this out a long time ago somewhere..):
> > > >
> > > > - Some form of justification of why multifd needs to be enabled for
> > > >   COLO. For example, in your cluster deployment, using multifd can
> > > >   improve XXX by YYY. Please describe the use case and improvements.
> > >
> > > That one is pretty easy; since COLO is regularly taking snapshots, the
> > > faster the snapshotting, the less overhead there is.
> >
> > Thanks for chiming in, Dave. I can explain why I want to request some
> > numbers.
> >
> > Firstly, numbers normally prove it's used in a real system. It's at least
> > being used and seriously tested.
> >
> > Secondly, per my very limited understanding of COLO... the two VMs in most
> > cases should already be in an in-sync state, where both sides generate the
> > same network packets.
> >
> > Another sync (where multifd can start to take effect) is only needed when
> > there are packet misalignments, but IIUC that should be rare. I don't know
> > how rare it is; it would be good if Lukas could share some of those
> > numbers from his deployment to help us understand COLO better, if we'll
> > need to keep it.
>
> It really depends on the workload and whether you want to tune for
> throughput or latency.
>
> You need to do a checkpoint eventually, and the more time passes between
> checkpoints, the more dirty memory you have to transfer during the
> checkpoint.
>
> Also keep in mind that the guest is stopped during checkpoints. Even if
> we kept the guest running, we could not release the mismatched packets,
> since that would expose a state of the guest to the outside world that
> is not yet replicated to the secondary.
>
> So migration performance is actually the most important part in COLO,
> to keep the checkpoints as short as possible.
>
> I have quite a few more performance and cleanup patches on my hands,
> for example to transfer dirty memory between checkpoints.
>
> > IIUC, the critical path of COLO shouldn't be migration on its own? It
> > should be when the heartbeat gets lost; that normally happens when the
> > two VMs are in sync. In this path, I don't see how multifd helps..
> > because there's no migration happening, only the src recording what has
> > changed. Hence I think some numbers, with a description of the
> > measurements, may help us understand how important multifd is to COLO.
> >
> > Supporting multifd will cause new COLO functions to be injected into core
> > migration code paths (even if not much..). I want to make sure such (new)
> > complexity is justified. I also want to avoid introducing a feature only
> > because "we have XXX, then let's support XXX in COLO too, maybe some day
> > it'll be useful".
>
> What COLO needs from migration at the low level:
>
> Primary/Outgoing side:
>
> Not much actually, we just need a way to incrementally send the
> dirtied memory and the full device state.
> Also, we ensure that migration never actually finishes, since we will
> never do a switchover. For example, we never set
> RAMState::last_stage with COLO.
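(To make the outgoing side concrete: one checkpoint on the primary boils
down to roughly the sketch below. This is only a simplified illustration,
not the actual migration/colo.c code; qemu_savevm_live_state() and
qemu_save_device_state() are the helpers mentioned in this thread, while
the surrounding vm_stop_force_state()/vm_start()/qemu_fflush() glue is my
assumption of how it fits together. Locking, error handling, the COLO
control messages and the coordination with colo-compare/block replication
are all omitted.)

    /*
     * Simplified sketch of one COLO checkpoint on the primary side.
     * In real QEMU the device state goes through a separate buffer
     * file; here everything is written to the same stream.
     */
    static void colo_checkpoint_sketch(QEMUFile *f)
    {
        /* Stop the guest so no new pages are dirtied mid-checkpoint. */
        vm_stop_force_state(RUN_STATE_COLO);

        /*
         * Incrementally send the RAM pages dirtied since the last
         * checkpoint.  This is the part multifd would parallelize.
         */
        qemu_savevm_live_state(f);

        /* Send the full device state and push it out. */
        qemu_save_device_state(f);
        qemu_fflush(f);

        /*
         * Resume the guest; the packets buffered by colo-compare can
         * only be released once the secondary is in sync again.
         */
        vm_start();
    }

Since the guest stays stopped for the whole of this, the time spent
sending the dirty RAM is directly guest downtime, which is where a
parallel RAM transfer would pay off.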
> Secondary/Incoming side:
>
> colo cache:
> Since the secondary always needs to be ready to take over (even during
> checkpointing), we cannot write the received ram pages directly to the
> guest ram, to prevent ending up with half of the old and half of the
> new contents.
> So we redirect the received ram pages to the colo cache. This is
> basically a mirror of the primary side's ram.
> It also simplifies the primary side, since from its point of view it's
> just a normal migration target. So the primary side doesn't have to
> care about dirtied pages on the secondary, for example.
>
> Dirty Bitmap:
> With COLO we also need a dirty bitmap on the incoming side to track
> 1. pages dirtied by the secondary guest
> 2. pages dirtied by the primary guest (incoming ram pages)
> In the last step of the checkpoint, this bitmap is then used to
> overwrite the guest ram with the colo cache, so the secondary guest
> is in sync with the primary guest.
>
> All of this individually is very little code, as you can see from my
> multifd patch. Just something to keep in mind I guess.

PS: Also, when the primary or secondary dies, from qemu's point of view
the migration socket(s) start blocking. So the migration code needs to
be able to recover from such a hanging/blocking socket. This works fine
right now with yank.

>
> At the high level we have the COLO framework outgoing and incoming
> threads, which just tell the migration code to:
> Send all ram pages (qemu_savevm_live_state()) on the outgoing side,
> paired with qemu_loadvm_state_main() on the incoming side.
> Send the device state (qemu_save_device_state()), paired with writing
> that stream to a buffer on the incoming side.
> And finally flush the colo cache and load the device state on the
> incoming side.
>
> And of course we coordinate with the colo block replication and
> colo-compare.
>
> Best regards,
> Lukas Straub
>
> > After these days, I found removing code is sometimes harder than
> > writing new..
> >
> > Thanks,
> >
> > > Lukas: Given COLO has a bunch of different features (e.g. the block
> > > replication, the clever network comparison, etc.), do you know which
> > > ones are used in the setups you are aware of?
> > >
> > > I'd guess the tricky part of a test would be the network side; I'm
> > > not too sure how you'd set that up in a test.
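(Coming back to the colo cache and dirty bitmap described above: the
final "flush the colo cache" step is conceptually just the loop below.
This is a hand-waved sketch, not the real migration/ram.c code; the
RAMBlock fields (host, colo_cache, bmap, used_length) are as I recall
them, and the real code walks the bitmap more efficiently and handles
corner cases this skips.)

    /*
     * Conceptual sketch of flushing the colo cache into guest RAM on
     * the secondary at the end of a checkpoint.  Every page marked
     * dirty in the bitmap (written by the secondary guest, or received
     * from the primary) is overwritten from the colo cache, so the
     * secondary ends up identical to the primary.
     */
    static void colo_flush_ram_cache_sketch(RAMBlock *block)
    {
        for (ram_addr_t offset = 0; offset < block->used_length;
             offset += TARGET_PAGE_SIZE) {
            unsigned long page = offset >> TARGET_PAGE_BITS;

            if (test_and_clear_bit(page, block->bmap)) {
                memcpy(block->host + offset,
                       block->colo_cache + offset,
                       TARGET_PAGE_SIZE);
            }
        }
    }

Until this flush runs, the secondary guest keeps executing out of its
own RAM, which is exactly why the received pages have to be staged in
the colo cache instead of being written to guest RAM directly.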
