Re: [PATCH] migration: Fix qmp_query_migrate mbps value

2024-02-25 Thread Peter Xu
On Fri, Feb 23, 2024 at 09:39:12AM -0300, Fabiano Rosas wrote:
> I've been planning to merge migration_completion() and
> migration_iteration_finish(). It's too unintuitive to do the completion
> routine deep inside migration_iteration_run(). AFAICS those are all tail
> calls, so we could bring migration_completion() up into the
> migration_thread top level.
> 
> So if you'll allow me I think I'll refrain from moving the state into
> migration_calculate_complete() for now.

Yep go ahead, I'll read the patches.

-- 
Peter Xu




Re: [PATCH] migration: Fix qmp_query_migrate mbps value

2024-02-23 Thread Fabiano Rosas
Peter Xu  writes:

> On Thu, Feb 22, 2024 at 10:49:12AM -0300, Fabiano Rosas wrote:
>> Peter Xu  writes:
>> 
>> > On Thu, Feb 22, 2024 at 05:40:41PM +0800, Peter Xu wrote:
>> >> On Wed, Feb 21, 2024 at 09:56:36AM -0300, Fabiano Rosas wrote:
>> >> > Peter Xu  writes:
>> >> > 
>> >> > > On Mon, Feb 19, 2024 at 04:44:57PM -0300, Fabiano Rosas wrote:
>> >> > >> The QMP command query_migrate might see incorrect throughput numbers
>> >> > >> if it runs after we've set the migration completion status but before
>> >> > >> migration_calculate_complete() has updated s->total_time and s->mbps.
>> >> > >> 
>> >> > >> The migration status would show COMPLETED, but the throughput value
>> >> > >> would be the one from the last iteration and not the one from the
>> >> > >> whole migration. This will usually be a larger value due to the time
>> >> > >> period being smaller (one iteration).
>> >> > >> 
>> >> > >> Move migration_calculate_complete() earlier so that the status
>> >> > >> MIGRATION_STATUS_COMPLETED is only emitted after the final counters
>> >> > >> update.
>> >> > >> 
>> >> > >> Signed-off-by: Fabiano Rosas 
>> >> > >> ---
>> >> > >> CI run: https://gitlab.com/farosas/qemu/-/pipelines/1182405776
>> >> > >> ---
>> >> > >>  migration/migration.c | 10 ++
>> >> > >>  1 file changed, 6 insertions(+), 4 deletions(-)
>> >> > >> 
>> >> > >> diff --git a/migration/migration.c b/migration/migration.c
>> >> > >> index ab21de2cad..7486d59da0 100644
>> >> > >> --- a/migration/migration.c
>> >> > >> +++ b/migration/migration.c
>> >> > >> @@ -102,6 +102,7 @@ static int migration_maybe_pause(MigrationState 
>> >> > >> *s,
>> >> > >>   int new_state);
>> >> > >>  static void migrate_fd_cancel(MigrationState *s);
>> >> > >>  static bool close_return_path_on_source(MigrationState *s);
>> >> > >> +static void migration_calculate_complete(MigrationState *s);
>> >> > >>  
>> >> > >>  static void migration_downtime_start(MigrationState *s)
>> >> > >>  {
>> >> > >> @@ -2746,6 +2747,7 @@ static void 
>> >> > >> migration_completion(MigrationState *s)
>> >> > >>  migrate_set_state(>state, MIGRATION_STATUS_ACTIVE,
>> >> > >>MIGRATION_STATUS_COLO);
>> >> > >>  } else {
>> >> > >> +migration_calculate_complete(s);
>> >> > >>  migrate_set_state(>state, current_active_state,
>> >> > >>MIGRATION_STATUS_COMPLETED);
>> >> > >>  }
>> >> > >> @@ -2784,6 +2786,7 @@ static void 
>> >> > >> bg_migration_completion(MigrationState *s)
>> >> > >>  goto fail;
>> >> > >>  }
>> >> > >>  
>> >> > >> +migration_calculate_complete(s);
>> >> > >>  migrate_set_state(>state, current_active_state,
>> >> > >>MIGRATION_STATUS_COMPLETED);
>> >> > >>  return;
>> >> > >> @@ -2993,12 +2996,15 @@ static void 
>> >> > >> migration_calculate_complete(MigrationState *s)
>> >> > >>  int64_t end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>> >> > >>  int64_t transfer_time;
>> >> > >>  
>> >> > >> +/* QMP could read from these concurrently */
>> >> > >> +bql_lock();
>> >> > >>  migration_downtime_end(s);
>> >> > >>  s->total_time = end_time - s->start_time;
>> >> > >>  transfer_time = s->total_time - s->setup_time;
>> >> > >>  if (transfer_time) {
>> >> > >>  s->mbps = ((double) bytes * 8.0) / transfer_time / 1000;
>> >> > >>  }
>> >> > >> +bql_unlock();
>> >> > >
>> >> > > The lock is not needed?
>> >> > >
>> >> > > AFAIU that was needed because of things like runstate_set() rather 
>> >> > > than
>> >> > > setting of these fields.
>> >> > >
>> >> > 
>> >> > Don't we need to keep the total_time and mbps update atomic? Otherwise
>> >> > query-migrate might see (say) total_time=0 and mbps= or
>> >> > total_time= and mbps=.
>> >> 
>> >> I thought it wasn't a major concern, but what you said makes sense; taking
>> >> it one more time doesn't really hurt after all to provide such benefit.
>> >> 
>> >> > 
>> >> > Also, what orders s->mbps update before the s->state update? I'd say we
>> >> > should probably hold the lock around the whole total_time,mbps,state
>> >> > update.
>> >> 
>> >> IMHO that's fine; mutex unlock implies a RELEASE.  See atomic.rst:
>> >> 
>> >> - ``pthread_mutex_lock`` has acquire semantics, ``pthread_mutex_unlock`` 
>> >> has
>> >>   release semantics and synchronizes with a ``pthread_mutex_lock`` for the
>> >>   same mutex.
>> >
>> > Hmm perhaps I wrote too soon.. it should only guarantee the ordering of the
>> > update on the lock variable itself v.s. any previous R, nothing else.
>> > Only if the other side uses bql_lock() will it guarantee proper ordering.
>> >
>> > Put them in bql should work, but I hesitate such use to start using bql
>> > to protect state updates.
>> 
>> Well, on the other hand that's a major use-case of the BQL: protecting
>> state that's used by QMP.
>> 
>> >
>> > How about we drop the lock, but use an 

Re: [PATCH] migration: Fix qmp_query_migrate mbps value

2024-02-22 Thread Peter Xu
On Thu, Feb 22, 2024 at 10:49:12AM -0300, Fabiano Rosas wrote:
> Peter Xu  writes:
> 
> > On Thu, Feb 22, 2024 at 05:40:41PM +0800, Peter Xu wrote:
> >> On Wed, Feb 21, 2024 at 09:56:36AM -0300, Fabiano Rosas wrote:
> >> > Peter Xu  writes:
> >> > 
> >> > > On Mon, Feb 19, 2024 at 04:44:57PM -0300, Fabiano Rosas wrote:
> >> > >> The QMP command query_migrate might see incorrect throughput numbers
> >> > >> if it runs after we've set the migration completion status but before
> >> > >> migration_calculate_complete() has updated s->total_time and s->mbps.
> >> > >> 
> >> > >> The migration status would show COMPLETED, but the throughput value
> >> > >> would be the one from the last iteration and not the one from the
> >> > >> whole migration. This will usually be a larger value due to the time
> >> > >> period being smaller (one iteration).
> >> > >> 
> >> > >> Move migration_calculate_complete() earlier so that the status
> >> > >> MIGRATION_STATUS_COMPLETED is only emitted after the final counters
> >> > >> update.
> >> > >> 
> >> > >> Signed-off-by: Fabiano Rosas 
> >> > >> ---
> >> > >> CI run: https://gitlab.com/farosas/qemu/-/pipelines/1182405776
> >> > >> ---
> >> > >>  migration/migration.c | 10 ++
> >> > >>  1 file changed, 6 insertions(+), 4 deletions(-)
> >> > >> 
> >> > >> diff --git a/migration/migration.c b/migration/migration.c
> >> > >> index ab21de2cad..7486d59da0 100644
> >> > >> --- a/migration/migration.c
> >> > >> +++ b/migration/migration.c
> >> > >> @@ -102,6 +102,7 @@ static int migration_maybe_pause(MigrationState 
> >> > >> *s,
> >> > >>   int new_state);
> >> > >>  static void migrate_fd_cancel(MigrationState *s);
> >> > >>  static bool close_return_path_on_source(MigrationState *s);
> >> > >> +static void migration_calculate_complete(MigrationState *s);
> >> > >>  
> >> > >>  static void migration_downtime_start(MigrationState *s)
> >> > >>  {
> >> > >> @@ -2746,6 +2747,7 @@ static void migration_completion(MigrationState 
> >> > >> *s)
> >> > >>  migrate_set_state(>state, MIGRATION_STATUS_ACTIVE,
> >> > >>MIGRATION_STATUS_COLO);
> >> > >>  } else {
> >> > >> +migration_calculate_complete(s);
> >> > >>  migrate_set_state(>state, current_active_state,
> >> > >>MIGRATION_STATUS_COMPLETED);
> >> > >>  }
> >> > >> @@ -2784,6 +2786,7 @@ static void 
> >> > >> bg_migration_completion(MigrationState *s)
> >> > >>  goto fail;
> >> > >>  }
> >> > >>  
> >> > >> +migration_calculate_complete(s);
> >> > >>  migrate_set_state(>state, current_active_state,
> >> > >>MIGRATION_STATUS_COMPLETED);
> >> > >>  return;
> >> > >> @@ -2993,12 +2996,15 @@ static void 
> >> > >> migration_calculate_complete(MigrationState *s)
> >> > >>  int64_t end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> >> > >>  int64_t transfer_time;
> >> > >>  
> >> > >> +/* QMP could read from these concurrently */
> >> > >> +bql_lock();
> >> > >>  migration_downtime_end(s);
> >> > >>  s->total_time = end_time - s->start_time;
> >> > >>  transfer_time = s->total_time - s->setup_time;
> >> > >>  if (transfer_time) {
> >> > >>  s->mbps = ((double) bytes * 8.0) / transfer_time / 1000;
> >> > >>  }
> >> > >> +bql_unlock();
> >> > >
> >> > > The lock is not needed?
> >> > >
> >> > > AFAIU that was needed because of things like runstate_set() rather than
> >> > > setting of these fields.
> >> > >
> >> > 
> >> > Don't we need to keep the total_time and mbps update atomic? Otherwise
> >> > query-migrate might see (say) total_time=0 and mbps= or
> >> > total_time= and mbps=.
> >> 
> >> I thought it wasn't a major concern, but what you said makes sense; taking
> >> it one more time doesn't really hurt after all to provide such benefit.
> >> 
> >> > 
> >> > Also, what orders s->mbps update before the s->state update? I'd say we
> >> > should probably hold the lock around the whole total_time,mbps,state
> >> > update.
> >> 
> >> IMHO that's fine; mutex unlock implies a RELEASE.  See atomic.rst:
> >> 
> >> - ``pthread_mutex_lock`` has acquire semantics, ``pthread_mutex_unlock`` 
> >> has
> >>   release semantics and synchronizes with a ``pthread_mutex_lock`` for the
> >>   same mutex.
> >
> > Hmm perhaps I wrote too soon.. it should only guarantee the ordering of the
> > update on the lock variable itself v.s. any previous R, nothing else.
> > Only if the other side uses bql_lock() will it guarantee proper ordering.
> >
> > Put them in bql should work, but I hesitate such use to start using bql
> > to protect state updates.
> 
> Well, on the other hand that's a major use-case of the BQL: protecting
> state that's used by QMP.
> 
> >
> > How about we drop the lock, but use an explicit smp_mb_release()?  We may
> > also want to use smb_load_acquire() in fill_source_migration_info() to use
> > on reading >state (all 

Re: [PATCH] migration: Fix qmp_query_migrate mbps value

2024-02-22 Thread Fabiano Rosas
Peter Xu  writes:

> On Thu, Feb 22, 2024 at 05:40:41PM +0800, Peter Xu wrote:
>> On Wed, Feb 21, 2024 at 09:56:36AM -0300, Fabiano Rosas wrote:
>> > Peter Xu  writes:
>> > 
>> > > On Mon, Feb 19, 2024 at 04:44:57PM -0300, Fabiano Rosas wrote:
>> > >> The QMP command query_migrate might see incorrect throughput numbers
>> > >> if it runs after we've set the migration completion status but before
>> > >> migration_calculate_complete() has updated s->total_time and s->mbps.
>> > >> 
>> > >> The migration status would show COMPLETED, but the throughput value
>> > >> would be the one from the last iteration and not the one from the
>> > >> whole migration. This will usually be a larger value due to the time
>> > >> period being smaller (one iteration).
>> > >> 
>> > >> Move migration_calculate_complete() earlier so that the status
>> > >> MIGRATION_STATUS_COMPLETED is only emitted after the final counters
>> > >> update.
>> > >> 
>> > >> Signed-off-by: Fabiano Rosas 
>> > >> ---
>> > >> CI run: https://gitlab.com/farosas/qemu/-/pipelines/1182405776
>> > >> ---
>> > >>  migration/migration.c | 10 ++
>> > >>  1 file changed, 6 insertions(+), 4 deletions(-)
>> > >> 
>> > >> diff --git a/migration/migration.c b/migration/migration.c
>> > >> index ab21de2cad..7486d59da0 100644
>> > >> --- a/migration/migration.c
>> > >> +++ b/migration/migration.c
>> > >> @@ -102,6 +102,7 @@ static int migration_maybe_pause(MigrationState *s,
>> > >>   int new_state);
>> > >>  static void migrate_fd_cancel(MigrationState *s);
>> > >>  static bool close_return_path_on_source(MigrationState *s);
>> > >> +static void migration_calculate_complete(MigrationState *s);
>> > >>  
>> > >>  static void migration_downtime_start(MigrationState *s)
>> > >>  {
>> > >> @@ -2746,6 +2747,7 @@ static void migration_completion(MigrationState 
>> > >> *s)
>> > >>  migrate_set_state(>state, MIGRATION_STATUS_ACTIVE,
>> > >>MIGRATION_STATUS_COLO);
>> > >>  } else {
>> > >> +migration_calculate_complete(s);
>> > >>  migrate_set_state(>state, current_active_state,
>> > >>MIGRATION_STATUS_COMPLETED);
>> > >>  }
>> > >> @@ -2784,6 +2786,7 @@ static void 
>> > >> bg_migration_completion(MigrationState *s)
>> > >>  goto fail;
>> > >>  }
>> > >>  
>> > >> +migration_calculate_complete(s);
>> > >>  migrate_set_state(>state, current_active_state,
>> > >>MIGRATION_STATUS_COMPLETED);
>> > >>  return;
>> > >> @@ -2993,12 +2996,15 @@ static void 
>> > >> migration_calculate_complete(MigrationState *s)
>> > >>  int64_t end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>> > >>  int64_t transfer_time;
>> > >>  
>> > >> +/* QMP could read from these concurrently */
>> > >> +bql_lock();
>> > >>  migration_downtime_end(s);
>> > >>  s->total_time = end_time - s->start_time;
>> > >>  transfer_time = s->total_time - s->setup_time;
>> > >>  if (transfer_time) {
>> > >>  s->mbps = ((double) bytes * 8.0) / transfer_time / 1000;
>> > >>  }
>> > >> +bql_unlock();
>> > >
>> > > The lock is not needed?
>> > >
>> > > AFAIU that was needed because of things like runstate_set() rather than
>> > > setting of these fields.
>> > >
>> > 
>> > Don't we need to keep the total_time and mbps update atomic? Otherwise
>> > query-migrate might see (say) total_time=0 and mbps= or
>> > total_time= and mbps=.
>> 
>> I thought it wasn't a major concern, but what you said makes sense; taking
>> it one more time doesn't really hurt after all to provide such benefit.
>> 
>> > 
>> > Also, what orders s->mbps update before the s->state update? I'd say we
>> > should probably hold the lock around the whole total_time,mbps,state
>> > update.
>> 
>> IMHO that's fine; mutex unlock implies a RELEASE.  See atomic.rst:
>> 
>> - ``pthread_mutex_lock`` has acquire semantics, ``pthread_mutex_unlock`` has
>>   release semantics and synchronizes with a ``pthread_mutex_lock`` for the
>>   same mutex.
>
> Hmm perhaps I wrote too soon.. it should only guarantee the ordering of the
> update on the lock variable itself v.s. any previous R, nothing else.
> Only if the other side uses bql_lock() will it guarantee proper ordering.
>
> Put them in bql should work, but I hesitate such use to start using bql
> to protect state updates.

Well, on the other hand that's a major use-case of the BQL: protecting
state that's used by QMP.

>
> How about we drop the lock, but use an explicit smp_mb_release()?  We may
> also want to use smb_load_acquire() in fill_source_migration_info() to use
> on reading >state (all will need some comment).  To me, making sure the
> total mbps is valid seems more important; while the other races are less
> harmful, and may not be a major concern?

That more closely reflects the problem we're trying to solve, which is
just an ordering one. However, the QMP code already holds the 

Re: [PATCH] migration: Fix qmp_query_migrate mbps value

2024-02-22 Thread Peter Xu
On Thu, Feb 22, 2024 at 05:40:41PM +0800, Peter Xu wrote:
> On Wed, Feb 21, 2024 at 09:56:36AM -0300, Fabiano Rosas wrote:
> > Peter Xu  writes:
> > 
> > > On Mon, Feb 19, 2024 at 04:44:57PM -0300, Fabiano Rosas wrote:
> > >> The QMP command query_migrate might see incorrect throughput numbers
> > >> if it runs after we've set the migration completion status but before
> > >> migration_calculate_complete() has updated s->total_time and s->mbps.
> > >> 
> > >> The migration status would show COMPLETED, but the throughput value
> > >> would be the one from the last iteration and not the one from the
> > >> whole migration. This will usually be a larger value due to the time
> > >> period being smaller (one iteration).
> > >> 
> > >> Move migration_calculate_complete() earlier so that the status
> > >> MIGRATION_STATUS_COMPLETED is only emitted after the final counters
> > >> update.
> > >> 
> > >> Signed-off-by: Fabiano Rosas 
> > >> ---
> > >> CI run: https://gitlab.com/farosas/qemu/-/pipelines/1182405776
> > >> ---
> > >>  migration/migration.c | 10 ++
> > >>  1 file changed, 6 insertions(+), 4 deletions(-)
> > >> 
> > >> diff --git a/migration/migration.c b/migration/migration.c
> > >> index ab21de2cad..7486d59da0 100644
> > >> --- a/migration/migration.c
> > >> +++ b/migration/migration.c
> > >> @@ -102,6 +102,7 @@ static int migration_maybe_pause(MigrationState *s,
> > >>   int new_state);
> > >>  static void migrate_fd_cancel(MigrationState *s);
> > >>  static bool close_return_path_on_source(MigrationState *s);
> > >> +static void migration_calculate_complete(MigrationState *s);
> > >>  
> > >>  static void migration_downtime_start(MigrationState *s)
> > >>  {
> > >> @@ -2746,6 +2747,7 @@ static void migration_completion(MigrationState *s)
> > >>  migrate_set_state(>state, MIGRATION_STATUS_ACTIVE,
> > >>MIGRATION_STATUS_COLO);
> > >>  } else {
> > >> +migration_calculate_complete(s);
> > >>  migrate_set_state(>state, current_active_state,
> > >>MIGRATION_STATUS_COMPLETED);
> > >>  }
> > >> @@ -2784,6 +2786,7 @@ static void bg_migration_completion(MigrationState 
> > >> *s)
> > >>  goto fail;
> > >>  }
> > >>  
> > >> +migration_calculate_complete(s);
> > >>  migrate_set_state(>state, current_active_state,
> > >>MIGRATION_STATUS_COMPLETED);
> > >>  return;
> > >> @@ -2993,12 +2996,15 @@ static void 
> > >> migration_calculate_complete(MigrationState *s)
> > >>  int64_t end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > >>  int64_t transfer_time;
> > >>  
> > >> +/* QMP could read from these concurrently */
> > >> +bql_lock();
> > >>  migration_downtime_end(s);
> > >>  s->total_time = end_time - s->start_time;
> > >>  transfer_time = s->total_time - s->setup_time;
> > >>  if (transfer_time) {
> > >>  s->mbps = ((double) bytes * 8.0) / transfer_time / 1000;
> > >>  }
> > >> +bql_unlock();
> > >
> > > The lock is not needed?
> > >
> > > AFAIU that was needed because of things like runstate_set() rather than
> > > setting of these fields.
> > >
> > 
> > Don't we need to keep the total_time and mbps update atomic? Otherwise
> > query-migrate might see (say) total_time=0 and mbps= or
> > total_time= and mbps=.
> 
> I thought it wasn't a major concern, but what you said makes sense; taking
> it one more time doesn't really hurt after all to provide such benefit.
> 
> > 
> > Also, what orders s->mbps update before the s->state update? I'd say we
> > should probably hold the lock around the whole total_time,mbps,state
> > update.
> 
> IMHO that's fine; mutex unlock implies a RELEASE.  See atomic.rst:
> 
> - ``pthread_mutex_lock`` has acquire semantics, ``pthread_mutex_unlock`` has
>   release semantics and synchronizes with a ``pthread_mutex_lock`` for the
>   same mutex.

Hmm perhaps I wrote too soon.. it should only guarantee the ordering of the
update on the lock variable itself v.s. any previous R, nothing else.
Only if the other side uses bql_lock() will it guarantee proper ordering.

Put them in bql should work, but I hesitate such use to start using bql
to protect state updates.

How about we drop the lock, but use an explicit smp_mb_release()?  We may
also want to use smb_load_acquire() in fill_source_migration_info() to use
on reading >state (all will need some comment).  To me, making sure the
total mbps is valid seems more important; while the other races are less
harmful, and may not be a major concern?

PS: logically I think smp_mb_release() is not needed either, because state
is updated using qatomic_cmpxchg(), which implies a full __ATOMIC_SEQ_CST.

> 
> > 
> > I'm not entirely sure, what do you think?
> > 
> > > See migration_update_counters() where it also updates mbps without holding
> > > a lock.
> > 
> > Here it might be less important since it's the middle of the 

Re: [PATCH] migration: Fix qmp_query_migrate mbps value

2024-02-22 Thread Peter Xu
On Wed, Feb 21, 2024 at 09:56:36AM -0300, Fabiano Rosas wrote:
> Peter Xu  writes:
> 
> > On Mon, Feb 19, 2024 at 04:44:57PM -0300, Fabiano Rosas wrote:
> >> The QMP command query_migrate might see incorrect throughput numbers
> >> if it runs after we've set the migration completion status but before
> >> migration_calculate_complete() has updated s->total_time and s->mbps.
> >> 
> >> The migration status would show COMPLETED, but the throughput value
> >> would be the one from the last iteration and not the one from the
> >> whole migration. This will usually be a larger value due to the time
> >> period being smaller (one iteration).
> >> 
> >> Move migration_calculate_complete() earlier so that the status
> >> MIGRATION_STATUS_COMPLETED is only emitted after the final counters
> >> update.
> >> 
> >> Signed-off-by: Fabiano Rosas 
> >> ---
> >> CI run: https://gitlab.com/farosas/qemu/-/pipelines/1182405776
> >> ---
> >>  migration/migration.c | 10 ++
> >>  1 file changed, 6 insertions(+), 4 deletions(-)
> >> 
> >> diff --git a/migration/migration.c b/migration/migration.c
> >> index ab21de2cad..7486d59da0 100644
> >> --- a/migration/migration.c
> >> +++ b/migration/migration.c
> >> @@ -102,6 +102,7 @@ static int migration_maybe_pause(MigrationState *s,
> >>   int new_state);
> >>  static void migrate_fd_cancel(MigrationState *s);
> >>  static bool close_return_path_on_source(MigrationState *s);
> >> +static void migration_calculate_complete(MigrationState *s);
> >>  
> >>  static void migration_downtime_start(MigrationState *s)
> >>  {
> >> @@ -2746,6 +2747,7 @@ static void migration_completion(MigrationState *s)
> >>  migrate_set_state(>state, MIGRATION_STATUS_ACTIVE,
> >>MIGRATION_STATUS_COLO);
> >>  } else {
> >> +migration_calculate_complete(s);
> >>  migrate_set_state(>state, current_active_state,
> >>MIGRATION_STATUS_COMPLETED);
> >>  }
> >> @@ -2784,6 +2786,7 @@ static void bg_migration_completion(MigrationState 
> >> *s)
> >>  goto fail;
> >>  }
> >>  
> >> +migration_calculate_complete(s);
> >>  migrate_set_state(>state, current_active_state,
> >>MIGRATION_STATUS_COMPLETED);
> >>  return;
> >> @@ -2993,12 +2996,15 @@ static void 
> >> migration_calculate_complete(MigrationState *s)
> >>  int64_t end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> >>  int64_t transfer_time;
> >>  
> >> +/* QMP could read from these concurrently */
> >> +bql_lock();
> >>  migration_downtime_end(s);
> >>  s->total_time = end_time - s->start_time;
> >>  transfer_time = s->total_time - s->setup_time;
> >>  if (transfer_time) {
> >>  s->mbps = ((double) bytes * 8.0) / transfer_time / 1000;
> >>  }
> >> +bql_unlock();
> >
> > The lock is not needed?
> >
> > AFAIU that was needed because of things like runstate_set() rather than
> > setting of these fields.
> >
> 
> Don't we need to keep the total_time and mbps update atomic? Otherwise
> query-migrate might see (say) total_time=0 and mbps= or
> total_time= and mbps=.

I thought it wasn't a major concern, but what you said makes sense; taking
it one more time doesn't really hurt after all to provide such benefit.

> 
> Also, what orders s->mbps update before the s->state update? I'd say we
> should probably hold the lock around the whole total_time,mbps,state
> update.

IMHO that's fine; mutex unlock implies a RELEASE.  See atomic.rst:

- ``pthread_mutex_lock`` has acquire semantics, ``pthread_mutex_unlock`` has
  release semantics and synchronizes with a ``pthread_mutex_lock`` for the
  same mutex.

> 
> I'm not entirely sure, what do you think?
> 
> > See migration_update_counters() where it also updates mbps without holding
> > a lock.
> 
> Here it might be less important since it's the middle of the migration,
> there will proabably be more than one query-migrate which would see the
> correct values.

Yep.  I queued this.

Thanks,

-- 
Peter Xu




Re: [PATCH] migration: Fix qmp_query_migrate mbps value

2024-02-21 Thread Fabiano Rosas
Peter Xu  writes:

> On Mon, Feb 19, 2024 at 04:44:57PM -0300, Fabiano Rosas wrote:
>> The QMP command query_migrate might see incorrect throughput numbers
>> if it runs after we've set the migration completion status but before
>> migration_calculate_complete() has updated s->total_time and s->mbps.
>> 
>> The migration status would show COMPLETED, but the throughput value
>> would be the one from the last iteration and not the one from the
>> whole migration. This will usually be a larger value due to the time
>> period being smaller (one iteration).
>> 
>> Move migration_calculate_complete() earlier so that the status
>> MIGRATION_STATUS_COMPLETED is only emitted after the final counters
>> update.
>> 
>> Signed-off-by: Fabiano Rosas 
>> ---
>> CI run: https://gitlab.com/farosas/qemu/-/pipelines/1182405776
>> ---
>>  migration/migration.c | 10 ++
>>  1 file changed, 6 insertions(+), 4 deletions(-)
>> 
>> diff --git a/migration/migration.c b/migration/migration.c
>> index ab21de2cad..7486d59da0 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -102,6 +102,7 @@ static int migration_maybe_pause(MigrationState *s,
>>   int new_state);
>>  static void migrate_fd_cancel(MigrationState *s);
>>  static bool close_return_path_on_source(MigrationState *s);
>> +static void migration_calculate_complete(MigrationState *s);
>>  
>>  static void migration_downtime_start(MigrationState *s)
>>  {
>> @@ -2746,6 +2747,7 @@ static void migration_completion(MigrationState *s)
>>  migrate_set_state(>state, MIGRATION_STATUS_ACTIVE,
>>MIGRATION_STATUS_COLO);
>>  } else {
>> +migration_calculate_complete(s);
>>  migrate_set_state(>state, current_active_state,
>>MIGRATION_STATUS_COMPLETED);
>>  }
>> @@ -2784,6 +2786,7 @@ static void bg_migration_completion(MigrationState *s)
>>  goto fail;
>>  }
>>  
>> +migration_calculate_complete(s);
>>  migrate_set_state(>state, current_active_state,
>>MIGRATION_STATUS_COMPLETED);
>>  return;
>> @@ -2993,12 +2996,15 @@ static void 
>> migration_calculate_complete(MigrationState *s)
>>  int64_t end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>>  int64_t transfer_time;
>>  
>> +/* QMP could read from these concurrently */
>> +bql_lock();
>>  migration_downtime_end(s);
>>  s->total_time = end_time - s->start_time;
>>  transfer_time = s->total_time - s->setup_time;
>>  if (transfer_time) {
>>  s->mbps = ((double) bytes * 8.0) / transfer_time / 1000;
>>  }
>> +bql_unlock();
>
> The lock is not needed?
>
> AFAIU that was needed because of things like runstate_set() rather than
> setting of these fields.
>

Don't we need to keep the total_time and mbps update atomic? Otherwise
query-migrate might see (say) total_time=0 and mbps= or
total_time= and mbps=.

Also, what orders s->mbps update before the s->state update? I'd say we
should probably hold the lock around the whole total_time,mbps,state
update.

I'm not entirely sure, what do you think?

> See migration_update_counters() where it also updates mbps without holding
> a lock.

Here it might be less important since it's the middle of the migration,
there will proabably be more than one query-migrate which would see the
correct values.

>
> Other than that it looks all good.
>
>>  }
>>  
>>  static void update_iteration_initial_status(MigrationState *s)
>> @@ -3145,7 +3151,6 @@ static void migration_iteration_finish(MigrationState 
>> *s)
>>  bql_lock();
>>  switch (s->state) {
>>  case MIGRATION_STATUS_COMPLETED:
>> -migration_calculate_complete(s);
>>  runstate_set(RUN_STATE_POSTMIGRATE);
>>  break;
>>  case MIGRATION_STATUS_COLO:
>> @@ -3189,9 +3194,6 @@ static void 
>> bg_migration_iteration_finish(MigrationState *s)
>>  bql_lock();
>>  switch (s->state) {
>>  case MIGRATION_STATUS_COMPLETED:
>> -migration_calculate_complete(s);
>> -break;
>> -
>>  case MIGRATION_STATUS_ACTIVE:
>>  case MIGRATION_STATUS_FAILED:
>>  case MIGRATION_STATUS_CANCELLED:
>> -- 
>> 2.35.3
>> 



Re: [PATCH] migration: Fix qmp_query_migrate mbps value

2024-02-20 Thread Peter Xu
On Mon, Feb 19, 2024 at 04:44:57PM -0300, Fabiano Rosas wrote:
> The QMP command query_migrate might see incorrect throughput numbers
> if it runs after we've set the migration completion status but before
> migration_calculate_complete() has updated s->total_time and s->mbps.
> 
> The migration status would show COMPLETED, but the throughput value
> would be the one from the last iteration and not the one from the
> whole migration. This will usually be a larger value due to the time
> period being smaller (one iteration).
> 
> Move migration_calculate_complete() earlier so that the status
> MIGRATION_STATUS_COMPLETED is only emitted after the final counters
> update.
> 
> Signed-off-by: Fabiano Rosas 
> ---
> CI run: https://gitlab.com/farosas/qemu/-/pipelines/1182405776
> ---
>  migration/migration.c | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index ab21de2cad..7486d59da0 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -102,6 +102,7 @@ static int migration_maybe_pause(MigrationState *s,
>   int new_state);
>  static void migrate_fd_cancel(MigrationState *s);
>  static bool close_return_path_on_source(MigrationState *s);
> +static void migration_calculate_complete(MigrationState *s);
>  
>  static void migration_downtime_start(MigrationState *s)
>  {
> @@ -2746,6 +2747,7 @@ static void migration_completion(MigrationState *s)
>  migrate_set_state(>state, MIGRATION_STATUS_ACTIVE,
>MIGRATION_STATUS_COLO);
>  } else {
> +migration_calculate_complete(s);
>  migrate_set_state(>state, current_active_state,
>MIGRATION_STATUS_COMPLETED);
>  }
> @@ -2784,6 +2786,7 @@ static void bg_migration_completion(MigrationState *s)
>  goto fail;
>  }
>  
> +migration_calculate_complete(s);
>  migrate_set_state(>state, current_active_state,
>MIGRATION_STATUS_COMPLETED);
>  return;
> @@ -2993,12 +2996,15 @@ static void 
> migration_calculate_complete(MigrationState *s)
>  int64_t end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>  int64_t transfer_time;
>  
> +/* QMP could read from these concurrently */
> +bql_lock();
>  migration_downtime_end(s);
>  s->total_time = end_time - s->start_time;
>  transfer_time = s->total_time - s->setup_time;
>  if (transfer_time) {
>  s->mbps = ((double) bytes * 8.0) / transfer_time / 1000;
>  }
> +bql_unlock();

The lock is not needed?

AFAIU that was needed because of things like runstate_set() rather than
setting of these fields.

See migration_update_counters() where it also updates mbps without holding
a lock.

Other than that it looks all good.

>  }
>  
>  static void update_iteration_initial_status(MigrationState *s)
> @@ -3145,7 +3151,6 @@ static void migration_iteration_finish(MigrationState 
> *s)
>  bql_lock();
>  switch (s->state) {
>  case MIGRATION_STATUS_COMPLETED:
> -migration_calculate_complete(s);
>  runstate_set(RUN_STATE_POSTMIGRATE);
>  break;
>  case MIGRATION_STATUS_COLO:
> @@ -3189,9 +3194,6 @@ static void 
> bg_migration_iteration_finish(MigrationState *s)
>  bql_lock();
>  switch (s->state) {
>  case MIGRATION_STATUS_COMPLETED:
> -migration_calculate_complete(s);
> -break;
> -
>  case MIGRATION_STATUS_ACTIVE:
>  case MIGRATION_STATUS_FAILED:
>  case MIGRATION_STATUS_CANCELLED:
> -- 
> 2.35.3
> 

-- 
Peter Xu




[PATCH] migration: Fix qmp_query_migrate mbps value

2024-02-19 Thread Fabiano Rosas
The QMP command query_migrate might see incorrect throughput numbers
if it runs after we've set the migration completion status but before
migration_calculate_complete() has updated s->total_time and s->mbps.

The migration status would show COMPLETED, but the throughput value
would be the one from the last iteration and not the one from the
whole migration. This will usually be a larger value due to the time
period being smaller (one iteration).

Move migration_calculate_complete() earlier so that the status
MIGRATION_STATUS_COMPLETED is only emitted after the final counters
update.

Signed-off-by: Fabiano Rosas 
---
CI run: https://gitlab.com/farosas/qemu/-/pipelines/1182405776
---
 migration/migration.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index ab21de2cad..7486d59da0 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -102,6 +102,7 @@ static int migration_maybe_pause(MigrationState *s,
  int new_state);
 static void migrate_fd_cancel(MigrationState *s);
 static bool close_return_path_on_source(MigrationState *s);
+static void migration_calculate_complete(MigrationState *s);
 
 static void migration_downtime_start(MigrationState *s)
 {
@@ -2746,6 +2747,7 @@ static void migration_completion(MigrationState *s)
 migrate_set_state(>state, MIGRATION_STATUS_ACTIVE,
   MIGRATION_STATUS_COLO);
 } else {
+migration_calculate_complete(s);
 migrate_set_state(>state, current_active_state,
   MIGRATION_STATUS_COMPLETED);
 }
@@ -2784,6 +2786,7 @@ static void bg_migration_completion(MigrationState *s)
 goto fail;
 }
 
+migration_calculate_complete(s);
 migrate_set_state(>state, current_active_state,
   MIGRATION_STATUS_COMPLETED);
 return;
@@ -2993,12 +2996,15 @@ static void migration_calculate_complete(MigrationState 
*s)
 int64_t end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
 int64_t transfer_time;
 
+/* QMP could read from these concurrently */
+bql_lock();
 migration_downtime_end(s);
 s->total_time = end_time - s->start_time;
 transfer_time = s->total_time - s->setup_time;
 if (transfer_time) {
 s->mbps = ((double) bytes * 8.0) / transfer_time / 1000;
 }
+bql_unlock();
 }
 
 static void update_iteration_initial_status(MigrationState *s)
@@ -3145,7 +3151,6 @@ static void migration_iteration_finish(MigrationState *s)
 bql_lock();
 switch (s->state) {
 case MIGRATION_STATUS_COMPLETED:
-migration_calculate_complete(s);
 runstate_set(RUN_STATE_POSTMIGRATE);
 break;
 case MIGRATION_STATUS_COLO:
@@ -3189,9 +3194,6 @@ static void bg_migration_iteration_finish(MigrationState 
*s)
 bql_lock();
 switch (s->state) {
 case MIGRATION_STATUS_COMPLETED:
-migration_calculate_complete(s);
-break;
-
 case MIGRATION_STATUS_ACTIVE:
 case MIGRATION_STATUS_FAILED:
 case MIGRATION_STATUS_CANCELLED:
-- 
2.35.3