On 2025-05-27 17:58, Peter Xu wrote:
> There are a few things off in this logic; rewrite it.  While at it, add
> rich comments to explain each of the decisions.
> 
> Since this is a very sensitive path for migration, below is the list of
> things changed, with the reasoning for each.
> 
>   (1) Exact pending size is only needed for precopy not postcopy
> 
>       Fundamentally, it's because the "exact" version only does one more
>       deep sync to fetch the pending results, while in postcopy's case it
>       is never going to sync anything more than the estimate, as the VM
>       on the source is stopped.
> 
>   (2) Do _not_ rely on threshold_size anymore to decide whether postcopy
>       should complete
> 
>       threshold_size was calculated from the expected downtime and
>       bandwidth only during precopy as an efficient way to decide when to
>       switchover.  It's not sensible to rely on threshold_size in postcopy.
> 
>       For precopy, if switchover is decided, the migration will complete
>       soon.  It's not true for postcopy.  Logically speaking, postcopy
>       should only complete the migration if all pending data is flushed.
> 
>       Here it used to work because save_complete() used to implicitly
>       contain save_live_iterate() when there was pending data.
> 
>       Even if that looks benign, having RAM migrated in postcopy's
>       save_complete() has other bad side effects:
> 
>       (a) Since save_complete() handlers need to run one at a time, moving
>       RAM there means nothing else can be moved in parallel (unlike the
>       round-robin iteration over the vmstate handlers that we do in the
>       ITERABLE phase).  Not an immediate concern, but it may stop working
>       in the future when there is more than one iterable (e.g. VFIO
>       postcopy).
> 
>       (b) Postcopy recovery, unfortunately, only works during the ITERABLE
>       phase.  IOW, if the source QEMU moves RAM during postcopy's
>       save_complete() and the network fails, it will crash both QEMUs...
>       OTOH, if it fails during iteration it is still recoverable.  IOW,
>       this change should further reduce the window in which QEMU can
>       split-brain and crash in extreme cases.
> 
>       If we enable the ram_save_complete() tracepoints, we'll see this
>       before this patch:
> 
>       1267959@1748381938.294066:ram_save_complete dirty=9627, done=0
>       1267959@1748381938.308884:ram_save_complete dirty=0, done=1
> 
>       It means that in this migration 9627 pages were migrated at the
>       complete() of the postcopy phase.
> 
>       After this change, all the postcopy RAM should be migrated in the
>       iterable phase, rather than in save_complete():
> 
>       1267959@1748381938.294066:ram_save_complete dirty=0, done=0
>       1267959@1748381938.308884:ram_save_complete dirty=0, done=1
> 
>   (3) Adjust when to decide to switch to postcopy
> 
>       This isn't critical; the movement makes sure there is only one
>       in_postcopy check, so we are clear about what we do in the two
>       completely different use cases (precopy vs. postcopy).
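The combined effect of (2) and (3) can be sketched as follows (an illustrative decision helper, not the actual migration_iteration_run() code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch of the switchover/completion decision after this change.
 * A single in_postcopy check cleanly separates the two cases.
 */
static bool migration_should_complete(bool in_postcopy,
                                      uint64_t pending_size,
                                      uint64_t threshold_size)
{
    if (in_postcopy) {
        /*
         * Postcopy: threshold_size (derived from expected downtime and
         * bandwidth) is meaningless here.  Complete only when all
         * pending data is flushed, so save_complete() moves no RAM.
         */
        return pending_size == 0;
    }
    /*
     * Precopy: switch over once the remaining data fits within the
     * downtime budget; the migration will then complete soon.
     */
    return pending_size <= threshold_size;
}
```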
> 
>   (4) Trivial touch-up on the threshold_size comparison
> 
>   Which changes:
> 
>   "(!pending_size || pending_size < s->threshold_size)"
> 
>   into:
> 
>   "(pending_size <= s->threshold_size)"
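The only behavioral difference between the two forms is the boundary case where pending_size equals a nonzero threshold_size, which the new form also accepts; a quick side-by-side check (hypothetical helper names):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The condition before the patch. */
static bool old_cond(uint64_t pending_size, uint64_t threshold_size)
{
    return !pending_size || pending_size < threshold_size;
}

/* The condition after the patch. */
static bool new_cond(uint64_t pending_size, uint64_t threshold_size)
{
    return pending_size <= threshold_size;
}
```

For pending_size == 0 or pending_size strictly below/above the threshold the two agree; only the equal-and-nonzero case flips from "don't switch over" to "switch over", which is why the commit calls it a trivial touch-up.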
> 
> Signed-off-by: Peter Xu <pet...@redhat.com>
> ---
>  migration/migration.c | 56 +++++++++++++++++++++++++++++++------------
>  1 file changed, 41 insertions(+), 15 deletions(-)

Reviewed-by: Juraj Marcin <jmar...@redhat.com>
