On Mon, May 11, 2026 at 07:03:08PM +0100, Peter Maydell wrote: > On Mon, 11 May 2026 at 18:47, Peter Xu <[email protected]> wrote: > > > > On Mon, May 11, 2026 at 04:47:22PM +0100, Peter Maydell wrote: > > > On Mon, 11 May 2026 at 16:20, Peter Xu <[email protected]> wrote: > > > > > > > > Commit dd4fe8844b changed the reporting of expected downtime behavior, > > > > so > > > > that the value will be calculated on-demand. One side effect on the > > > > change > > > > is QEMU will allow the calculation to happen anytime even if there's no > > > > transfer happening for a short while. > > > > > > > > PeterM reported an ubsan report from clang when running migration-test > > > > with > > > > aarch64 binary on x86_64 hosts. I can also reproduce if I run the test > > > > concurrently so some of the src QEMU may not get chance to push any > > > > data, > > > > causing mbps to be 0: > > > > > > > > ../migration/migration.c:1051:12: runtime error: -nan is outside the > > > > range of representable values of type 'long' > > > > > > > > Fix it by properly handle both Inf and Nan. One note is we can't use > > > > ">"/">=" check here otherwise we cannot cover Nan. > > > > > > > > Link: > > > > https://lore.kernel.org/r/CAFEAcA-MYH6C39xO0OLx4-M5pKurJpurwRsMqZe9q=w-nsh...@mail.gmail.com > > > > Reported-by: Peter Maydell <[email protected]> > > > > Fixes: dd4fe8844b ("migration: Calculate expected downtime on demand") > > > > Signed-off-by: Peter Xu <[email protected]> > > > > --- > > > > migration/migration.c | 18 +++++++++++++++++- > > > > 1 file changed, 17 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/migration/migration.c b/migration/migration.c > > > > index b6f78eb3ac..e4103cd3f0 100644 > > > > --- a/migration/migration.c > > > > +++ b/migration/migration.c > > > > @@ -1044,12 +1044,28 @@ static bool > > > > migrate_show_downtime(MigrationState *s) > > > > /* Return expected downtime (unit: milliseconds) */ > > > > int64_t migration_downtime_calc_expected(MigrationState *s) > > > > { > > > > + double expected_ms; > > > > + > > > > if (mig_stats.dirty_sync_count <= 1) { > > > > return migrate_downtime_limit(); > > > > } > > > > > > > > - return mig_stats.dirty_bytes_last_sync / > > > > + expected_ms = mig_stats.dirty_bytes_last_sync / > > > > migration_get_switchover_bw(s) * 1000; > > > > + > > > > + /* > > > > + * This "<" check covers two cases where we want to fallback to > > > > + * INT64_MAX, the 1st case is obvious, but the 2nd is not: > > > > + * > > > > + * (1) when expected_ms is Inf, or anything too big for int64_t > > > > + * (2) when expected_ms is Nan (division by zero), evaluation of > > > > this > > > > > > This should say "zero divided by zero" -- general division by > > > zero gives Inf, and it's only 0 / 0 that runs into NaN. > > > > > > > + * if clause will be FALSE > > > > + */ > > > > + if (expected_ms < (double)INT64_MAX) { > > > > > > This works, but maybe we should write it out > > > if (isnan(expected_ms) || expected_ms < (double)INT64_MAX) { > > > > I agree using isnan() is better than comment. Though code in the patch for > > the next line here is: > > > > + return (int64_t) expected_ms; > > Oops, yes, I got the sense of the condition wrong. > > > Do you think below should work? > > > > expected_ms = ...; > > > > /* For isnan() (0/0) case, we can return anything; return MAX too */ > > if (isnan(expected_ms) || expected_ms >= (double)INT64_MAX) { > > return INT64_MAX; > > } > > Yes, this will work. But I think rather than "return anything" > we ought to say why what we're returning is a sensible value > for the use case we have. How about:
Just to mention, here when I mentioned "anything", what actually in my mind is the previous valid value we reported, like before the change of commit dd4fe8844b5, here we used to have a cache value and only update if we transferred more than 10k bytes (which itself is a magic value). But I'm not sure if we need to keep that behavior either.. > > /* > * If we haven't been able to transfer any data, the result here > * could be NaN (for 0 / 0) or infinity (something else / 0). Theoretically, we can also come to affinity if we sent something small but the total dirty data is rediculously large, but yeah, I'm OK with this wording; even if it may not be accurate, it's clear enough to me as a comment to help reading. > * Return INT64_MAX as our best approximation to "this will > * take forever to complete". If the problem is transient > * (e.g. we just haven't started to transfer yet) we'll > * recalculate to a more accurate figure later. > */ > > ? I'll use the comment suggested, thanks. -- Peter Xu
