Re: [PATCH] migration/calc-dirty-rate: millisecond precision period

Peter Xu Mon, 17 Jul 2023 12:09:35 -0700

On Tue, Jul 11, 2023 at 03:38:18PM +0300, gudkov.and...@huawei.com wrote:
> On Thu, Jul 06, 2023 at 03:23:43PM -0400, Peter Xu wrote:
> > On Thu, Jun 29, 2023 at 11:59:03AM +0300, Andrei Gudkov wrote:
> > > Introduces alternative argument calc-time-ms, which is the
> > > the same as calc-time but accepts millisecond value.
> > > Millisecond precision allows to make predictions whether
> > > migration will succeed or not. To do this, calculate dirty
> > > rate with calc-time-ms set to max allowed downtime, convert
> > > measured rate into volume of dirtied memory, and divide by
> > > network throughput. If the value is lower than max allowed
> > > downtime, then migration will converge.
> > > 
> > > Measurement results for single thread randomly writing to
> > > a 24GiB region:
> > > +--------------+--------------------+
> > > | calc-time-ms | dirty-rate (MiB/s) |
> > > +--------------+--------------------+
> > > |          100 |               1880 |
> > > |          200 |               1340 |
> > > |          300 |               1120 |
> > > |          400 |               1030 |
> > > |          500 |                868 |
> > > |          750 |                720 |
> > > |         1000 |                636 |
> > > |         1500 |                498 |
> > > |         2000 |                423 |
> > > +--------------+--------------------+
> > 
> > Do you mean the dirty workload is constant?  Why it differs so much with
> > different calc-time-ms?
> 
> Workload is as constant as it could be. But the naming is misleading.
> What is named "dirty-rate" in fact is not "rate" at all.
> calc-dirty-rate measures number of *uniquely* dirtied pages, i.e. each
> page can contribute to the counter only once during measurement period.
> That's why the values are decreasing. Consider also ad infinitum argument:
> since VM has fixed number of pages and each page can be dirtied only once,
> dirty-rate=number-of-dirtied-pages/calc-time -> 0 as calc-time -> inf.
> It would make more sense to report number as "dirty-volume" --
> without dividing it by calc-time.
> 
> Note that number of *uniquely* dirtied pages in given amount of time is
> exactly what we need for doing migration-related predictions. There is
> no error here.


Is calc-time-ms the duration of the measurement?

Taking the 1st line as example, 1880MB/s * 0.1s = 188MB.
For the 2nd line, 1340MB/s * 0.2s = 268MB.
Even for the longest duration of 2s, that's 846MB in total.

The range is 24GB.  In this case, most of the pages should only be written
once even if random for all these test durations, right?

> 
> > 
> > I also doubt usefulness on huge vms the ms accuracy won't matter much
> > because enable/disable dirty logging overhead can already be ms level for
> > those.
> > 
> The goal is to measure volume of dirtied memory corresponding to desired
> downtime, which is typically order of 100-300ms. I think that error of
> +/-10ms won't make any big difference. Maybe I should've called this 
> "millisecond granularity" and not "millisecond precision".
> 
> > > 
> > > Signed-off-by: Andrei Gudkov <gudkov.and...@huawei.com>
> > > ---
> > >  qapi/migration.json   | 15 ++++++--
> > >  migration/dirtyrate.h | 12 ++++---
> > >  migration/dirtyrate.c | 81 +++++++++++++++++++++++++------------------
> > >  3 files changed, 68 insertions(+), 40 deletions(-)
> > > 
> > > diff --git a/qapi/migration.json b/qapi/migration.json
> > > index 5bb5ab82a0..dd1afe1982 100644
> > > --- a/qapi/migration.json
> > > +++ b/qapi/migration.json
> > > @@ -1778,7 +1778,12 @@
> > >  #
> > >  # @start-time: start time in units of second for calculation
> > >  #
> > > -# @calc-time: time in units of second for sample dirty pages
> > > +# @calc-time: time period for which dirty page rate was measured
> > > +#     (rounded down to seconds).
> > > +#
> > > +# @calc-time-ms: actual time period for which dirty page rate was
> > > +#     measured (in milliseconds).  Value may be larger than requested
> > > +#     time period due to measurement overhead.
> > >  #
> > >  # @sample-pages: page count per GB for sample dirty pages the default
> > >  #     value is 512 (since 6.1)
> > > @@ -1796,6 +1801,7 @@
> > >             'status': 'DirtyRateStatus',
> > >             'start-time': 'int64',
> > >             'calc-time': 'int64',
> > > +           'calc-time-ms': 'int64',
> > >             'sample-pages': 'uint64',
> > >             'mode': 'DirtyRateMeasureMode',
> > >             '*vcpu-dirty-rate': [ 'DirtyRateVcpu' ] } }
> > > @@ -1807,6 +1813,10 @@
> > >  #
> > >  # @calc-time: time in units of second for sample dirty pages
> > >  #
> > > +# @calc-time-ms: the same as @calc-time but in milliseconds.  These
> > > +#    two arguments are mutually exclusive.  Exactly one of them must
> > > +#    be specified. (Since 8.1)
> > > +#
> > >  # @sample-pages: page count per GB for sample dirty pages the default
> > >  #     value is 512 (since 6.1)
> > >  #
> > > @@ -1821,7 +1831,8 @@
> > >  #                                                 'sample-pages': 512} }
> > >  # <- { "return": {} }
> > >  ##
> > > -{ 'command': 'calc-dirty-rate', 'data': {'calc-time': 'int64',
> > > +{ 'command': 'calc-dirty-rate', 'data': {'*calc-time': 'int64',
> > > +                                         '*calc-time-ms': 'int64',
> > >                                           '*sample-pages': 'int',
> > >                                           '*mode': 
> > > 'DirtyRateMeasureMode'} }
> > >  
> > > diff --git a/migration/dirtyrate.h b/migration/dirtyrate.h
> > > index 594a5c0bb6..869c060941 100644
> > > --- a/migration/dirtyrate.h
> > > +++ b/migration/dirtyrate.h
> > > @@ -31,10 +31,12 @@
> > >  #define MIN_RAMBLOCK_SIZE                         128
> > >  
> > >  /*
> > > - * Take 1s as minimum time for calculation duration
> > > + * Allowed range for dirty page rate calculation (in milliseconds).
> > > + * Lower limit relates to the smallest realistic downtime it
> > > + * makes sense to impose on migration.
> > >   */
> > > -#define MIN_FETCH_DIRTYRATE_TIME_SEC              1
> > > -#define MAX_FETCH_DIRTYRATE_TIME_SEC              60
> > > +#define MIN_CALC_TIME_MS                          50
> > > +#define MAX_CALC_TIME_MS                       60000
> > >  
> > >  /*
> > >   * Take 1/16 pages in 1G as the maxmum sample page count
> > > @@ -44,7 +46,7 @@
> > >  
> > >  struct DirtyRateConfig {
> > >      uint64_t sample_pages_per_gigabytes; /* sample pages per GB */
> > > -    int64_t sample_period_seconds; /* time duration between two sampling 
> > > */
> > > +    int64_t calc_time_ms; /* desired calculation time (in milliseconds) 
> > > */
> > >      DirtyRateMeasureMode mode; /* mode of dirtyrate measurement */
> > >  };
> > >  
> > > @@ -73,7 +75,7 @@ typedef struct SampleVMStat {
> > >  struct DirtyRateStat {
> > >      int64_t dirty_rate; /* dirty rate in MB/s */
> > >      int64_t start_time; /* calculation start time in units of second */
> > > -    int64_t calc_time; /* time duration of two sampling in units of 
> > > second */
> > > +    int64_t calc_time_ms; /* actual calculation time (in milliseconds) */
> > >      uint64_t sample_pages; /* sample pages per GB */
> > >      union {
> > >          SampleVMStat page_sampling;
> > > diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
> > > index 84f1b0fb20..90fb336329 100644
> > > --- a/migration/dirtyrate.c
> > > +++ b/migration/dirtyrate.c
> > > @@ -57,6 +57,8 @@ static int64_t dirty_stat_wait(int64_t msec, int64_t 
> > > initial_time)
> > >          msec = current_time - initial_time;
> > >      } else {
> > >          g_usleep((msec + initial_time - current_time) * 1000);
> > > +        /* g_usleep may overshoot */
> > > +        msec = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - initial_time;
> > 
> > This removes the g_usleep(), is it intended?  I thought it was the code to
> > do the delay.  What is the solution to g_usleep() then?
> > 
> It is still there -- just above the comment. What I added is
> qemu_clock_get_ms() call after the g_usleep. It sets msec variable
> to the true amount of time elapsed between measurements. Previously
> I detected that actually slept time may be slightly longer than
> requested value.

Ouch.. please ignore that.  I don't know how did I read that.

The impl is pretty much straightforward and fine by me, I just want to make
sure I still understand those numbers you provided in the commit message,
and making sure measuring in small duration can at least make sense in
majority cases.

Thanks,

-- 
Peter Xu

Re: [PATCH] migration/calc-dirty-rate: millisecond precision period

Reply via email to