* Peter Xu ([email protected]) wrote:
> On Thu, Jan 15, 2026 at 10:59:47PM +0000, Dr. David Alan Gilbert wrote:
> > * Peter Xu ([email protected]) wrote:
> > > On Thu, Jan 15, 2026 at 10:49:29PM +0100, Lukas Straub wrote:
> > > > Nack.
> > > >
> > > > This code has users, as explained in my other email:
> > > > https://lore.kernel.org/qemu-devel/20260115224516.7f0309ba@penguin/T/#mc99839451d6841366619c4ec0d5af5264e2f6464
> > >
> > > Please then rework that series and consider including the following (I
> > > believe I pointed this out a long time ago somewhere..):
> > >
> > > - Some form of justification of why multifd needs to be enabled for COLO.
> > >   For example, in your cluster deployment, using multifd can improve XXX
> > >   by YYY.  Please describe the use case and improvements.
> > That one is pretty easy; since COLO is regularly taking snapshots, the
> > faster the snapshotting the less overhead there is.
>
> Thanks for chiming in, Dave.  I can explain why I want to ask for some
> numbers.
>
> Firstly, numbers normally prove it's used in a real system.  It's at least
> being used and seriously tested.
Fair.

> Secondly, per my very limited understanding of COLO... the two VMs should
> in most cases already be in sync when both sides generate the same
> network packets.

(It's about a decade since I did any serious COLO work, so I'll try and
remember.)

> Another sync (where multifd can start to take effect) is only needed when
> there are packet misalignments, but IIUC it should be rare.  I don't know
> how rare it is; it would be good if Lukas could share some of those
> numbers from his deployment to help us understand COLO better if we'll
> need to keep it.

In reality misalignments are actually pretty common, although it's very
workload dependent.  Any randomness in the order of execution in a
multi-threaded guest, or in when a timer arrives, etc., can change the
packet generation.  The migration time then becomes a latency issue:
once a mismatch is detected, that packet can't go out until the snapshot
has been sent.

I think you still need to send a regular stream of snapshots even without
having *yet* seen a packet difference.  Now, I'm trying to remember the
reasoning; for a start, if you leave it too long the migration snapshot
gets larger (which I think needs to be stored in RAM on the dest?), and
the chance of a packet difference arising from randomness also increases.
I seem to remember there were clever schemes for picking the optimal
snapshot interval (I've put a rough sketch of the kind of trigger I mean
after my sig).

> IIUC, the critical path of COLO shouldn't be migration on its own?  It
> should be when the heartbeat gets lost; that normally should happen when
> the two VMs are in sync.  In this path, I don't see how multifd helps..
> because there's no migration happening, only the src recording what has
> changed.  Hence I think some numbers with a description of the
> measurements may help us understand how important multifd is to COLO.

There's more than one critical path:
  a) Time to recovery when one host fails
  b) Overhead when both hosts are happy

> Supporting multifd will cause new COLO functions to inject into core
> migration code paths (even if not much..).  I want to make sure such (new)
> complexity is justified.  I also want to avoid introducing a feature only
> because "we have XXX, then let's support XXX in COLO too, maybe some day
> it'll be useful".

I can't remember where the COLO code got into the main migration paths;
is that the reception side storing the received differences somewhere else?

> After these days, I found removing code is sometimes harder than writing
> new..

Haha yes.

Dave

> Thanks,
>
> > Lukas: Given COLO has a bunch of different features (i.e. the block
> > replication, the clever network comparison etc) do you know which ones
> > are used in the setups you are aware of?
> >
> > I'd guess the tricky part of a test would be the network side; I'm
> > not too sure how you'd set that in a test.
>
> --
> Peter Xu

--
 -----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \
\        dave @ treblig.org |                               | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/
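
P.S. For anyone not deep in COLO, here's the rough sketch of the checkpoint
trigger I mentioned above.  This is emphatically not QEMU's actual COLO
code; the function name, the interval and the structure are all invented
for illustration.  The point is just that a checkpoint (i.e. a migration
snapshot) fires either when the packet comparison sees the two VMs diverge,
or periodically so the pending state never grows too large:

/*
 * Hypothetical sketch only -- not QEMU's real COLO code; all names and
 * numbers are made up.  A checkpoint is triggered either by a packet
 * mismatch or by a periodic timer.
 */
#include <stdbool.h>
#include <stdint.h>

#define COLO_CHECKPOINT_INTERVAL_MS 200   /* made-up interval */

static int64_t last_checkpoint_ms;

static bool colo_should_checkpoint(bool packet_mismatch, int64_t now_ms)
{
    if (packet_mismatch) {
        /*
         * Outputs diverged: the mismatched packet is held back until the
         * snapshot completes, so snapshot speed shows up directly as
         * added packet latency.
         */
        return true;
    }
    /*
     * Periodic checkpoint: bounds how big the next snapshot gets and how
     * much guest randomness can accumulate between the two VMs.
     */
    return now_ms - last_checkpoint_ms >= COLO_CHECKPOINT_INTERVAL_MS;
}

Faster snapshotting (which is where multifd would come in) shrinks the cost
of both branches.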
