On Sun, Jan 25, 2026 at 06:18:36PM +0100, Lukas Straub wrote:
> On Wed, 21 Jan 2026 20:37:51 +0100
> Lukas Straub <[email protected]> wrote:
>
> > On Tue, 20 Jan 2026 12:23:08 -0500
> > Peter Xu <[email protected]> wrote:
> >
> > > On Sat, Jan 17, 2026 at 03:09:12PM +0100, Lukas Straub wrote:
> > > > Add a COLO migration test for COLO migration and failover.
> > > >
> > > > COLO does not support q35 machine at this time.
> > > >
> > > > [...]
> > > >
> > > > +int test_colo_common(MigrateCommon *args, bool failover_during_checkpoint,
> > > > +                     bool primary_failover)
> > > > +{
> > > > +    QTestState *from, *to;
> > > > +    void *data_hook = NULL;
> > > > +
> > > > +    /*
> > > > +     * For the COLO test, both VMs will run in parallel. Thus both VMs want to
> > > > +     * open the image read/write at the same time. Using read-only=on is not
> > > > +     * possible here, because ide-hd does not support read-only backing image.
> > > > +     *
> > > > +     * So use -snapshot, where each qemu instance creates its own writable
> > > > +     * snapshot internally while leaving the real image read-only.
> > > > +     */
> > > > +    args->start.opts_source = "-snapshot";
> > > > +    args->start.opts_target = "-snapshot";
> > > > +
> > > > +    /*
> > > > +     * COLO migration code logs many errors when the migration socket
> > > > +     * is shut down, these are expected so we hide them here.
> > > > +     */
> > > > +    args->start.hide_stderr = true;
> > > > +
> > > > +    /*
> > > > +     * COLO currently does not work with Q35 machine
> > > > +     */
> > > > +    args->start.force_pc_machine = true;
> > > > +
> > > > +    args->start.oob = true;
> > >
> > > Just curious: is OOB required in COLO for some reason? I understand yank
> > > you used below uses OOB, so the question is behind that, on what can be
> > > blocked in main thread, and special in COLO.
>
> There is a lot that can hang:
> The netfilters all run on the main loop and use blocking write.
> filter-mirror on the primary side mirrors packets to the secondary and
> can hang.
> filter-redirect on the secondary side redirects packets to primary's
> colo-compare and can hang.
> The nbd client on the primary side that is connected to the nbd server
> on the secondary side can hang. Especially during vm_stop() which flushes
> all inflight block io with BQL held.

None of them are used in this unit test, right?

I agree that if OOB is needed in production, we should also enable it in
the unit tests. That said, would you please add a comment to the test case
explaining this? E.g. what can fail in reality, and why we still test OOB
(because we want to get as close to the production COLO use case as
possible).

Thanks,

-- 
Peter Xu
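
For illustration only, the requested comment could look roughly like the sketch below. The comment wording is just one possible phrasing, assembled from the hang scenarios listed in this thread. MigrateStart and MigrateCommon here are simplified stand-ins that mirror only the fields touched by the quoted patch, and colo_setup_start_options() is a hypothetical helper name, not something from the patch or from the qtest framework.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical, simplified stand-ins for the qtest migration structs.
 * Only the fields used by the quoted patch are mirrored here.
 */
typedef struct MigrateStart {
    const char *opts_source;
    const char *opts_target;
    bool hide_stderr;
    bool force_pc_machine;
    bool oob;
} MigrateStart;

typedef struct MigrateCommon {
    MigrateStart start;
} MigrateCommon;

/* Hypothetical helper: applies the start options used by the COLO test. */
static void colo_setup_start_options(MigrateCommon *args)
{
    assert(args != NULL);

    /* Same settings as in the quoted patch. */
    args->start.opts_source = "-snapshot";
    args->start.opts_target = "-snapshot";
    args->start.hide_stderr = true;
    args->start.force_pc_machine = true;

    /*
     * Enable OOB so that yank can still be issued when the main loop is
     * blocked. In a production COLO setup a lot can hang there:
     *  - the netfilters run on the main loop and use blocking writes:
     *    filter-mirror on the primary (mirroring packets to the
     *    secondary) and filter-redirect on the secondary (redirecting
     *    packets to the primary's colo-compare);
     *  - the nbd client on the primary, connected to the nbd server on
     *    the secondary, especially during vm_stop(), which flushes all
     *    in-flight block I/O with the BQL held.
     *
     * None of these are used in this test, but OOB is enabled anyway to
     * stay as close to the production COLO use case as possible.
     */
    args->start.oob = true;
}
```

In the actual patch the comment would sit directly above the existing args->start.oob = true; line in test_colo_common(); the separate helper only keeps this sketch self-contained.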
