On 2025-05-27 17:58, Peter Xu wrote:
> There are a few things off in this logic; rewrite it. While at it, add
> rich comments to explain each of the decisions.
>
> Since this is a very sensitive path for migration, below is the list of
> things changed, with the reasoning for each.
>
> (1) Exact pending size is only needed for precopy, not postcopy
>
>     Fundamentally, this is because the "exact" version only does one more
>     deep sync to fetch the pending results, while in postcopy's case it
>     will never sync anything beyond the estimate, as the VM on the source
>     is stopped.
>
> (2) Do _not_ rely on threshold_size anymore to decide whether postcopy
>     should complete
>
>     threshold_size was calculated from the expected downtime and
>     bandwidth only during precopy, as an efficient way to decide when to
>     switch over. It is not sensible to rely on threshold_size in
>     postcopy.
>
>     For precopy, once switchover is decided, the migration will complete
>     soon. That is not true for postcopy. Logically speaking, postcopy
>     should only complete the migration once all pending data is flushed.
>
>     Here it used to work because save_complete() used to implicitly
>     contain save_live_iterate() when there was pending data.
>
>     Even if that looks benign, having RAM migrated in postcopy's
>     save_complete() has other bad side effects:
>
>     (a) Since save_complete() needs to be run one handler at a time,
>     moving RAM there means nothing else can be moved concurrently
>     (unlike the round-robin iteration over the vmstate handlers in the
>     ITERABLE phase). Not an immediate concern, but it may stop working
>     in the future when there is more than one iterable (e.g. vfio
>     postcopy).
>
>     (b) Postcopy recovery, unfortunately, only works during the ITERABLE
>     phase. IOW, if the source QEMU moves RAM during postcopy's
>     save_complete() and the network fails, it will crash both QEMUs...
>     OTOH, if it fails during iteration, it is still recoverable.
>
>     IOW, this change should further reduce the window in which QEMU can
>     split-brain and crash in extreme cases.
>
>     If we enable the ram_save_complete() tracepoints, we see this before
>     this patch:
>
>       1267959@1748381938.294066:ram_save_complete dirty=9627, done=0
>       1267959@1748381938.308884:ram_save_complete dirty=0, done=1
>
>     It means that in this migration, 9627 pages were migrated at
>     complete() of the postcopy phase.
>
>     After this change, all the postcopy RAM is migrated in the iterable
>     phase rather than in save_complete():
>
>       1267959@1748381938.294066:ram_save_complete dirty=0, done=0
>       1267959@1748381938.308884:ram_save_complete dirty=0, done=1
>
> (3) Adjust when to decide to switch to postcopy
>
>     This shouldn't be super important; the movement makes sure there is
>     only one in_postcopy check, so that we are clear on what we do with
>     the two completely different use cases (precopy vs. postcopy).
>
> (4) Trivial touch-up on the threshold_size comparison
>
>     Which changes:
>
>       "(!pending_size || pending_size < s->threshold_size)"
>
>     into:
>
>       "(pending_size <= s->threshold_size)"
>
> Signed-off-by: Peter Xu <pet...@redhat.com>
> ---
>  migration/migration.c | 56 +++++++++++++++++++++++++++++++------------
>  1 file changed, 41 insertions(+), 15 deletions(-)
Reviewed-by: Juraj Marcin <jmar...@redhat.com>