On Tue, Oct 01, 2024 at 10:02:53AM +0800, Yong Huang wrote:
> On Tue, Oct 1, 2024 at 4:41 AM Peter Xu <pet...@redhat.com> wrote:
> 
> > On Mon, Sep 30, 2024 at 01:14:26AM +0800, yong.hu...@smartx.com wrote:
> > > From: Hyman Huang <yong.hu...@smartx.com>
> > >
> > > When VM is configured with huge memory, the current throttle logic
> > > doesn't look like to scale, because migration_trigger_throttle()
> > > is only called for each iteration, so it won't be invoked for a long
> > > time if one iteration can take a long time.
> > >
> > > The background dirty sync aim to fix the above issue by synchronizing
> > > the ramblock from remote dirty bitmap and, when necessary, triggering
> > > the CPU throttle multiple times during a long iteration.
> > >
> > > This is a trade-off between synchronization overhead and CPU throttle
> > > impact.
> > >
> > > Signed-off-by: Hyman Huang <yong.hu...@smartx.com>
> > > ---
> > >  include/migration/misc.h     |  3 ++
> > >  migration/migration.c        | 11 +++++++
> > >  migration/ram.c              | 64 ++++++++++++++++++++++++++++++++++++
> > >  migration/ram.h              |  3 ++
> > >  migration/trace-events       |  1 +
> > >  system/cpu-timers.c          |  2 ++
> > >  tests/qtest/migration-test.c | 29 ++++++++++++++++
> > >  7 files changed, 113 insertions(+)
> > >
> > > diff --git a/include/migration/misc.h b/include/migration/misc.h
> > > index bfadc5613b..67c00d98f5 100644
> > > --- a/include/migration/misc.h
> > > +++ b/include/migration/misc.h
> > > @@ -111,4 +111,7 @@ bool migration_in_bg_snapshot(void);
> > >  /* migration/block-dirty-bitmap.c */
> > >  void dirty_bitmap_mig_init(void);
> > >
> > > +/* migration/ram.c */
> > > +void bg_ram_dirty_sync_init(void);
> >
> > IMO it's better we don't add this logic to ram.c as I mentioned.  It's
> > closely relevant to auto converge so I think cpu-throttle.c is more
> > suitable.
> 
> 
> > > +
> > >  #endif
> > > diff --git a/migration/migration.c b/migration/migration.c
> > > index 3dea06d577..224b5dfb4f 100644
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -3285,6 +3285,9 @@ static void
> > migration_iteration_finish(MigrationState *s)
> > >  {
> > >      /* If we enabled cpu throttling for auto-converge, turn it off. */
> > >      cpu_throttle_stop();
> > > +    if (migrate_auto_converge()) {
> > > +        bg_ram_dirty_sync_timer_enable(false);
> > > +    }
> >
> > Please avoid adding this code.  When it's part of auto-converge,
> > cpu_throttle_stop() should already guarantee that timer disabled
> > altogether.
> 
> 
> > >
> > >      bql_lock();
> > >      switch (s->state) {
> > > @@ -3526,6 +3529,14 @@ static void *migration_thread(void *opaque)
> > >
> > >      trace_migration_thread_setup_complete();
> > >
> > > +    /*
> > > +     * Tick the background ramblock dirty sync timer after setup
> > > +     * phase.
> > > +     */
> > > +    if (migrate_auto_converge()) {
> > > +        bg_ram_dirty_sync_timer_enable(true);
> > > +    }
> >
> > Might be good to still stick the enablement with auto-converge; the latter
> > was done in migration_trigger_throttle().  Maybe also enable the timer only
> > there?
> >
> 
> Ok.
> 
> 
> > > +
> > >      while (migration_is_active()) {
> > >          if (urgent || !migration_rate_exceeded(s->to_dst_file)) {
> > >              MigIterateState iter_state = migration_iteration_run(s);
> > > diff --git a/migration/ram.c b/migration/ram.c
> > > index 67ca3d5d51..995bae1ac9 100644
> > > --- a/migration/ram.c
> > > +++ b/migration/ram.c
> > > @@ -110,6 +110,12 @@
> > >   */
> > >  #define MAPPED_RAM_LOAD_BUF_SIZE 0x100000
> > >
> > > +/* Background ramblock dirty sync trigger every five seconds */
> > > +#define BG_RAM_SYNC_TIMESLICE_MS 5000
> > > +#define BG_RAM_SYNC_TIMER_INTERVAL_MS 1000
> >
> > Why this timer needs to be invoked every 1sec, if it's a 5sec timer?
> >
> 
> The logic is stupid :(, I'll fix that.
> 
> 
> > > +
> > > +static QEMUTimer *bg_ram_dirty_sync_timer;
> > > +
> > >  XBZRLECacheStats xbzrle_counters;
> > >
> > >  /* used by the search for pages to send */
> > > @@ -4543,6 +4549,64 @@ static void
> > ram_mig_ram_block_resized(RAMBlockNotifier *n, void *host,
> > >      }
> > >  }
> > >
> > > +static void bg_ram_dirty_sync_timer_tick(void *opaque)
> >
> > Please consider moving this function to cpu-throttle.c.
> >
> 
> Yes, I got your idea, but, IMHO, the cpu-throttle.c only implements the CPU
> throttle, not the ramblock dirty sync, the migration_bitmap_sync_precopy
> could or should not be called in the cpu-throttle.c.Do we want to change
> this code level?
> 
> Another way is we define the bg_ram_dirty_sync_timer in cpu-throttle.c
> and modify it in bg_ram_dirty_sync_timer_tick as a extern variable in ram.c
> I prefer the latter, What do you think of it?

I think it's better all logic resides in cpu-throttle.c.

You can have one pre-requisite patch remove "rs" parameter in
migration_bitmap_sync_precopy(), then export it in migration/misc.h.

Maybe you also need to export time_last_bitmap_sync, you can make another
helper for that and also put it in misc.h too.

Said that all, I wonder whether we should move cpu-throttle.c under
migration/, as that's only used in migration.. then we can avoid exporting
in misc.h, but export them in migration.h (which is for internal only).

> 
> 
> >
> > Also please prefix the functions with cpu_throttle_*(), rather than bg_*().
> > It should be part of auto converge function.
> >
> > > +{
> > > +    static int prev_pct;
> > > +    static uint64_t prev_sync_cnt = 2;
> > > +    uint64_t sync_cnt = stat64_get(&mig_stats.dirty_sync_count);
> > > +    int cur_pct = cpu_throttle_get_percentage();
> > > +
> > > +    if (prev_pct && !cur_pct) {
> > > +        /* CPU throttle timer stopped, so do we */
> > > +        return;
> > > +    }
> >
> > Didn't follow why this is not needed if cpu throttle is 0.
> >
> 
> The sequence in my head is:
> 
> bg dirty sync -> mig_throttle_guest_down -> throttle -> throttle stop-> bg
> dirty sync stop
> 
> The bg dirty sync may be invoked before throttle, and
> the throttle is also 0 at that time, if we code like:
> 
> if (!cpu_throttle_get_percentage()) {
>     return;
> }
> 
> The bg dirty sync timer can tick only once and stop.
> 
> If we change the sequence as following, we can ignore this case:
> 
> mig_throttle_guest_down -> bg dirty sync -> throttle -> throttle stop-> bg
> dirty sync stop
> 
> But as the code in migration_trigger_throttle indicate:
> 
>   if ((bytes_dirty_period > bytes_dirty_threshold) &&
>         ++rs->dirty_rate_high_cnt >= 2) {
>      ...
>   }
> 
> Since the 1st iteration can not satisfy the condition rs->dirty_rate_high_cnt
> >= 2
> as usual, the mig_throttle_guest_down gets invoked in 3nd iteration with
> high
> probability. If the 2nd iteration is very long, the bg dirty sync can not
> be invoked and
> we may lose the chance to trigger CPU throttle as I mentioned.

I wonder whether it's working if we simply put:

  if (!migrate_auto_converge()) {
      return;
  }

I think this timer should kick even if !cpu_throttle_active(), becuase
!active doesn't mean the feature is off.  In this case the feature is
enabled as long as migrate_auto_converge()==true.

This addes another reason that maybe we want to move system/cpu-throttle.c
under migration/.. because otherwise we'll need to export
migrate_auto_converge() once more.

> 
> 
> > It means dirty rate is probably very low, but then shouldn't this
> > background sync still working (just to notice it grows again)?
> >
> > > +
> > > +    /*
> > > +     * The first iteration copies all memory anyhow and has no
> > > +     * effect on guest performance, therefore omit it to avoid
> > > +     * paying extra for the sync penalty.
> > > +     */
> > > +    if (sync_cnt <= 1) {
> > > +        goto end;
> > > +    }
> > > +
> > > +    if (sync_cnt == prev_sync_cnt) {
> > > +        int64_t curr_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > > +        assert(ram_state);
> > > +        if ((curr_time - ram_state->time_last_bitmap_sync) >
> > > +            BG_RAM_SYNC_TIMESLICE_MS) {
> > > +            trace_bg_ram_dirty_sync();
> > > +            WITH_RCU_READ_LOCK_GUARD() {
> > > +                migration_bitmap_sync_precopy(ram_state, false);
> > > +            }
> > > +        }
> > > +    }
> > > +
> > > +end:
> > > +    prev_sync_cnt = stat64_get(&mig_stats.dirty_sync_count);
> > > +    prev_pct = cpu_throttle_get_percentage();
> > > +
> > > +    timer_mod(bg_ram_dirty_sync_timer,
> > > +        qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL_RT) +
> > > +            BG_RAM_SYNC_TIMER_INTERVAL_MS);
> >
> > IIUC we only need to fire per 5sec, not 1s?
> >
> 
> Thanks to point out, I'll refine this logic.
> 
> 
> >
> > > +}
> > > +
> > > +void bg_ram_dirty_sync_timer_enable(bool enable)
> > > +{
> > > +    if (enable) {
> > > +        bg_ram_dirty_sync_timer_tick(NULL);
> > > +    } else {
> > > +        timer_del(bg_ram_dirty_sync_timer);
> > > +    }
> > > +}
> > > +
> > > +void bg_ram_dirty_sync_init(void)
> > > +{
> > > +    bg_ram_dirty_sync_timer =
> > > +        timer_new_ms(QEMU_CLOCK_VIRTUAL_RT,
> > > +                     bg_ram_dirty_sync_timer_tick, NULL);
> > > +}
> >
> > IMHO all these functions should move to cpu-throttle.c.
> >
> > > +
> > >  static RAMBlockNotifier ram_mig_ram_notifier = {
> > >      .ram_block_resized = ram_mig_ram_block_resized,
> > >  };
> > > diff --git a/migration/ram.h b/migration/ram.h
> > > index bc0318b834..9c1a2f30f1 100644
> > > --- a/migration/ram.h
> > > +++ b/migration/ram.h
> > > @@ -93,4 +93,7 @@ void ram_write_tracking_prepare(void);
> > >  int ram_write_tracking_start(void);
> > >  void ram_write_tracking_stop(void);
> > >
> > > +/* Background ramblock dirty sync */
> > > +void bg_ram_dirty_sync_timer_enable(bool enable);
> > > +
> > >  #endif
> > > diff --git a/migration/trace-events b/migration/trace-events
> > > index c65902f042..3f09e7f383 100644
> > > --- a/migration/trace-events
> > > +++ b/migration/trace-events
> > > @@ -90,6 +90,7 @@ put_qlist_end(const char *field_name, const char
> > *vmsd_name) "%s(%s)"
> > >  qemu_file_fclose(void) ""
> > >
> > >  # ram.c
> > > +bg_ram_dirty_sync(void) ""
> > >  get_queued_page(const char *block_name, uint64_t tmp_offset, unsigned
> > long page_abs) "%s/0x%" PRIx64 " page_abs=0x%lx"
> > >  get_queued_page_not_dirty(const char *block_name, uint64_t tmp_offset,
> > unsigned long page_abs) "%s/0x%" PRIx64 " page_abs=0x%lx"
> > >  migration_bitmap_sync_start(void) ""
> > > diff --git a/system/cpu-timers.c b/system/cpu-timers.c
> > > index 0b31c9a1b6..64f0834be4 100644
> > > --- a/system/cpu-timers.c
> > > +++ b/system/cpu-timers.c
> > > @@ -25,6 +25,7 @@
> > >  #include "qemu/osdep.h"
> > >  #include "qemu/cutils.h"
> > >  #include "migration/vmstate.h"
> > > +#include "migration/misc.h"
> > >  #include "qapi/error.h"
> > >  #include "qemu/error-report.h"
> > >  #include "sysemu/cpus.h"
> > > @@ -274,4 +275,5 @@ void cpu_timers_init(void)
> > >      vmstate_register(NULL, 0, &vmstate_timers, &timers_state);
> > >
> > >      cpu_throttle_init();
> > > +    bg_ram_dirty_sync_init();
> > >  }
> > > diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> > > index d6768d5d71..3296f5244d 100644
> > > --- a/tests/qtest/migration-test.c
> > > +++ b/tests/qtest/migration-test.c
> > > @@ -468,6 +468,12 @@ static void migrate_ensure_converge(QTestState *who)
> > >      migrate_set_parameter_int(who, "downtime-limit", 30 * 1000);
> > >  }
> > >
> > > +static void migrate_ensure_iteration_last_long(QTestState *who)
> > > +{
> > > +    /* Set 10Byte/s bandwidth limit to make the iteration last long
> > enough */
> > > +    migrate_set_parameter_int(who, "max-bandwidth", 10);
> > > +}
> > > +
> > >  /*
> > >   * Our goal is to ensure that we run a single full migration
> > >   * iteration, and also dirty memory, ensuring that at least
> > > @@ -2791,6 +2797,7 @@ static void test_migrate_auto_converge(void)
> > >       * so we need to decrease a bandwidth.
> > >       */
> > >      const int64_t init_pct = 5, inc_pct = 25, max_pct = 95;
> > > +    uint64_t prev_dirty_sync_cnt, dirty_sync_cnt;
> > >
> > >      if (test_migrate_start(&from, &to, uri, &args)) {
> > >          return;
> > > @@ -2827,6 +2834,28 @@ static void test_migrate_auto_converge(void)
> > >      } while (true);
> > >      /* The first percentage of throttling should be at least init_pct */
> > >      g_assert_cmpint(percentage, >=, init_pct);
> > > +
> > > +    /* Make sure the iteration last a long time enough */
> > > +    migrate_ensure_iteration_last_long(from);
> >
> > There's already migrate_ensure_non_converge(), why this is needed?
> >
> 
> I'm feeling that migrate_ensure_non_converge cannot ensure the
> iteration lasts greater than 5s, so i and an extra util function.

non_converge sets it to 3MB/s, while all qtest mem should be >=100MB on all
archs, it looks ok as of now.  Maybe add a comment would suffice?

It's not extremely bad to even miss one here in unit test, if we target
this feature as pretty much optional on top of auto converge.  If you want
to provide a solid test, you can add a stat counter and check it here with
query-migrate.  But again it'll be better move cpu-throttle.c over first,
or it requires another export in misc.h..

> 
> 
> > > +
> > > +    /*
> > > +     * End the loop when the dirty sync count greater than 1.
> > > +     */
> > > +    while ((dirty_sync_cnt = get_migration_pass(from)) < 2) {
> > > +        usleep(1000 * 1000);
> > > +    }
> > > +
> > > +    prev_dirty_sync_cnt = dirty_sync_cnt;
> > > +
> > > +    /*
> > > +     * The dirty sync count must changes in 5 seconds, here we
> > > +     * plus 1 second as error value.
> > > +     */
> > > +    sleep(5 + 1);
> > > +
> > > +    dirty_sync_cnt = get_migration_pass(from);
> > > +    g_assert_cmpint(dirty_sync_cnt, != , prev_dirty_sync_cnt);
> > > +
> > >      /* Now, when we tested that throttling works, let it converge */
> > >      migrate_ensure_converge(from);
> >
> > Please move the test change into a separate patch.  I had a feeling 5+1 sec
> > might still easily fail on CIs (even though this test is not yet run).
> > Maybe it needs to still provide a loop so it waits for a few rounds just in
> > case.
> >
> 
> OK.
> 
> 
> >
> > Thanks,
> >
> > --
> > Peter Xu
> >
> >
> Thanks,
> Yong
> 
> -- 
> Best regards

-- 
Peter Xu


Reply via email to