On Mon, 11 May 2026 at 18:47, Peter Xu <[email protected]> wrote:
>
> On Mon, May 11, 2026 at 04:47:22PM +0100, Peter Maydell wrote:
> > On Mon, 11 May 2026 at 16:20, Peter Xu <[email protected]> wrote:
> > >
> > > Commit dd4fe8844b changed the reporting of expected downtime behavior, so
> > > that the value will be calculated on-demand. One side effect on the
> > > change
> > > is QEMU will allow the calculation to happen anytime even if there's no
> > > transfer happening for a short while.
> > >
> > > PeterM reported an ubsan report from clang when running migration-test
> > > with
> > > aarch64 binary on x86_64 hosts. I can also reproduce if I run the test
> > > concurrently so some of the src QEMU may not get chance to push any data,
> > > causing mbps to be 0:
> > >
> > > ../migration/migration.c:1051:12: runtime error: -nan is outside the
> > > range of representable values of type 'long'
> > >
> > > Fix it by properly handle both Inf and Nan. One note is we can't use
> > > ">"/">=" check here otherwise we cannot cover Nan.
> > >
> > > Link:
> > > https://lore.kernel.org/r/CAFEAcA-MYH6C39xO0OLx4-M5pKurJpurwRsMqZe9q=w-nsh...@mail.gmail.com
> > > Reported-by: Peter Maydell <[email protected]>
> > > Fixes: dd4fe8844b ("migration: Calculate expected downtime on demand")
> > > Signed-off-by: Peter Xu <[email protected]>
> > > ---
> > > migration/migration.c | 18 +++++++++++++++++-
> > > 1 file changed, 17 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/migration/migration.c b/migration/migration.c
> > > index b6f78eb3ac..e4103cd3f0 100644
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -1044,12 +1044,28 @@ static bool migrate_show_downtime(MigrationState
> > > *s)
> > > /* Return expected downtime (unit: milliseconds) */
> > > int64_t migration_downtime_calc_expected(MigrationState *s)
> > > {
> > > + double expected_ms;
> > > +
> > > if (mig_stats.dirty_sync_count <= 1) {
> > > return migrate_downtime_limit();
> > > }
> > >
> > > - return mig_stats.dirty_bytes_last_sync /
> > > + expected_ms = mig_stats.dirty_bytes_last_sync /
> > > migration_get_switchover_bw(s) * 1000;
> > > +
> > > + /*
> > > + * This "<" check covers two cases where we want to fallback to
> > > + * INT64_MAX, the 1st case is obvious, but the 2nd is not:
> > > + *
> > > + * (1) when expected_ms is Inf, or anything too big for int64_t
> > > + * (2) when expected_ms is Nan (division by zero), evaluation of this
> >
> > This should say "zero divided by zero" -- general division by
> > zero gives Inf, and it's only 0 / 0 that runs into NaN.
> >
> > > + * if clause will be FALSE
> > > + */
> > > + if (expected_ms < (double)INT64_MAX) {
> >
> > This works, but maybe we should write it out
> > if (isnan(expected_ms) || expected_ms < (double)INT64_MAX) {
>
> I agree using isnan() is better than comment. Though code in the patch for
> the next line here is:
>
> + return (int64_t) expected_ms;
Oops, yes, I got the sense of the condition wrong.
> Do you think below should work?
>
> expected_ms = ...;
>
> /* For isnan() (0/0) case, we can return anything; return MAX too */
> if (isnan(expected_ms) || expected_ms >= (double)INT64_MAX) {
> return INT64_MAX;
> }
Yes, this will work. But I think rather than "return anything"
we ought to say why what we're returning is a sensible value
for the use case we have. How about:
/*
* If we haven't been able to transfer any data, the result here
* could be NaN (for 0 / 0) or infinity (something else / 0).
* Return INT64_MAX as our best approximation to "this will
* take forever to complete". If the problem is transient
* (e.g. we just haven't started to transfer yet) we'll
* recalculate to a more accurate figure later.
*/
?
-- PMM