Re: [PATCH v2 03/16] migration: Move setup_time to mig_stats

Juan Quintela Tue, 16 May 2023 03:07:31 -0700

David Edmondson <david.edmond...@oracle.com> wrote:
> Juan Quintela <quint...@redhat.com> writes:
>
>> It is a time that needs to be cleaned each time cancel migration.
>> Once there create migration_time_since() to calculate how time since a
>> time in the past.
>>
>> Signed-off-by: Juan Quintela <quint...@redhat.com>
>>
>> ---
>>
>> Rename to migration_time_since (cédric)
>> ---
>>  migration/migration-stats.h | 13 +++++++++++++
>>  migration/migration.h       |  1 -
>>  migration/migration-stats.c |  7 +++++++
>>  migration/migration.c       |  9 ++++-----
>>  4 files changed, 24 insertions(+), 6 deletions(-)
>>
>> diff --git a/migration/migration-stats.h b/migration/migration-stats.h
>> index e782f1b0df..21402af9e4 100644
>> --- a/migration/migration-stats.h
>> +++ b/migration/migration-stats.h
>> @@ -75,6 +75,10 @@ typedef struct {
>>       * Number of bytes sent during precopy stage.
>>       */
>>      Stat64 precopy_bytes;
>> +    /*
>> +     * How long has the setup stage took.
>> +     */
>> +    Stat64 setup_time;
>
> Is this really a Stat64? It doesn't appear to need the atomic update
> feature.


What this whole Migration Atomic Counters series try to do is that
everything becomes atomic and then we can use everything everywhere.

Before this series we had (I am simplifying here):

- transferred, precopy_bytes, postcopy_bytes, downtime_bytes -> atomic,
  you can use it anywhere

- qemu_file transferred -> you can only use it from the main migration
  thread

- qemu_file rate_limit -> you can only use it from the main migration
  thread

And we had to update the three counters in every place that we did a
write wehad to update all of them.

You can see the contorsions that we go to to update the rate_limit and
the qemu_file transferred fields.

After the series (you need to get what it is already on the tree, this
series, QEMUFileHooks cleanup, and another serie on my tree waiting for
this to be commited), you got three counters:

- qemu_file: atomic, everytime we do a qemu_file write we update it
- multifd_bytes: atomic, everytime that we do a write in a multifd
  channel, we update it.
- rdma_bytes: atomic, everytime we do a write through RDMA we update it.

And that is it.

Both rate_limit and transferred are derived from these three counters:

- at any point in time migration_transferred_bytes() returns the amount
  of bytes written since the start of the migration:
     qemu_file_bytes + multifd_bytes + rdma_bytes.

- transferred on this period:
       at_start_of_period = migration_transferred_bytes().
       trasferred_in_this_period = migration_transferred_bytes() - 
at_start_of_period;

- Similar for precopy_bytes, postcopy_bytes and downtime_bytes.  When we
  move from one stage to the next, we store what is the value of the
  previous stage.

The counters that we use to calculate the rate limit are updated around
10 times per second (can be a bit bigger at the end of periods,
iterations, ...)  So performance is not extra critical.

But as we have way less atomic operations (really one per real write),
we don't really care a lot if we do some atomic operations when a normal
operation will do.

I.e. I think we have two options:

- have the remaining counters that are only used in the main migration
  thread not be atomic.  Document them and remember to do the correct
  thing everytime we use it.  If we need to use it in another thread,
  just change it to atomic.

- Make all counters atomic. No need to document anything.  And you can
  call any operation/counter/... in migration-stats.c from anywhere.

I think that the second option is better.  But I can hear reasons from
people that think that the 1st one is better.

Comments?

Later, Juan.

Re: [PATCH v2 03/16] migration: Move setup_time to mig_stats

Reply via email to