Hi,

On Tue, 23 Dec 2025 at 21:00, Peter Xu <[email protected]> wrote:
> One thing good about exposing such status via QAPI is, it can help us
> diagnose issues by seeing CANCELLING / FAILING even looking at
> query-migrate results (as normally when bug happens we can't see the
> internal status..), so that we know either it's explicitly cancelled, or
> something went wrong.
>
> If it's a completely hidden / internal status, we may see ACTIVE even if
> something wrong happened..

* Both process state and reason(s) for the state change needs to be
visible to the user. But states like cancelling/failing are redundant,
users would derive the same conclusion from CANCELLED and CANCELLING
OR FAILED AND FAILING. Besides, migration_cleanup() does exactly the
same steps irrespective of whether migration is failing or cancelling
or failed or cancelled.

> My current hope is any mgmt should normally by default ignore new migration
> states..  If that's always achieved, it looks to me adding FAILING directly
> into migration status would still have some benefits on debugging.

* libvirtd(8) complains about unknown states multiple times:
      libvirtd[2194267]: unknown status 'failing' in migration event
      libvirtd[2194267]: unknown status 'failing' in migration event
      libvirtd[2194267]: unknown status 'failing' in migration event


> > > @@ -2907,7 +2914,7 @@ fail_closefb:
> > >      qemu_fclose(fb);
> > >  fail:
> > >      if (ms->state != MIGRATION_STATUS_CANCELLING) {
> > > -        migrate_set_state(&ms->state, ms->state, 
> > > MIGRATION_STATUS_FAILED);
> > > +        migrate_set_state(&ms->state, ms->state, 
> > > MIGRATION_STATUS_FAILING);
> > >      }
> >
> > This is a good example where having MigrationStatus makes the code more
> > complicated. This could be exiting=true, running=false, etc. It
> > shouldn't matter why this routine failed. If we reach
> > migration_cleanup() and, at the very end, state is CANCELLING, then we
> > know the cancel command has caused this and set the state to
> > CANCELLED. If the state was something else, then it's unintended and we
> > set FAILED.
>
> If it'll be an internal status, we'll still need to identify if someone
> already have cancelled it, right?
>
> Assuming we introduce stop_reason flag, making it:
>
> enum {
>     MIG_STOP_REASON_CANCEL,
>     MIG_STOP_REASON_FAIL,
> } MigrationStopReason;
>
> Then we can switch to CANCELLED / FAILED when cleanup from those reasons.
>
> Then here, logically we also need logic like:
>
>     if (stop_reason != MIG_STOP_REASON_CANCEL) {
>         stop_reason = MIG_STOP_REASON_FAIL;
>     }
>
> Because we want to make sure when the user already triggered cancel, it
> won't show FAILED but only show CANCELLED at last?

* I think the way we are setting/changing these states in as many
locations is only adding to the complications. Do we have to
explicitly set these states like this? What if migration_cleanup()
always sets the state to 'STOP'. Similarly other places set the state
to a predefined state. OR
===
    struct {
        current_state;
        old_state;
        event/trigger;
        reason[];
    } MigrationState s;

    migration_change_state(s) {
          s->old_state = s->current_state;
          if (s->current_state == START && s->trigger ==
'connections-established') {
              s->current_state = ACTIVE;
              s->reason = "connections-established, migration starting"
          } else if (s->current_state == ACTIVE && s->trigger == 'completed') {
              s->current_state = STOP
              s->reason = "migration completed"
          } else if (s->current_state == ACTIVE  && s->trigger == 'pause') {
              s->current_state = PAUSE
              s->reason = "pause, migration paused"
          } else if (s->current_state == ACTIVE && s->trigger ==
'error-occurred') {
              s->current_state = STOP
              s->reason = "Error occurred, migration failed"
          } else if (s->current_state == ACTIVE && s->trigger ==
'user-cancel') {
              s->current_state = STOP
              s->reason = "user-cancel, migration cancelled"
         } else {
              s->current_state = s->current_state;
              warn_msg("unknown combination, maybe define a new rule?");
         }
    }
===
* We define explicit rules for the state change and accordingly we
only call migration_change_state() at any point and it'll change to an
appropriate next state, recording the due reason for the change.

Wdyt...?

Thank you.
---
  - Prasad


Reply via email to