Hello Fabiano,

On Wed, 7 Jan 2026 at 18:54, Fabiano Rosas <[email protected]> wrote:
> I like this because it forces us to determine more clearly what is the
> necessary condition for a state change. This could eventually allow the
> abstraction of the qapi_event_send_migration() to a higher
> layer. Something like:
>
> void qmp_migrate() {
>     t:migrate=true
>
>     migration_setup() :: do setup, mig_setup_done=true
>     migration_advance_state() :: checks the triggers, changes state and
>                                  sends the event
>
>     migration_start() :: migrate, mig_done=true
>                          failure, mig_failed=true
>                          etc
>     migration_advance_state()
>
>     migrate_vmstate() :: device state migration, mig_device_done=true
>     migration_advance_state()
>
>  etc..
> }
>
> IOW, we could do a better job of separating what is work, what is
> migration control flow, what is error handling, etc.

* Yes, indeed. Above skeleton code conveys the plausible
segregation/stages well.

> What I'm trying to convey is that we have:
>
> 1) events API that needs to be kept stable, this list of states that
>    libvirt sees and at what moments we emit them.
===
  qemuProcessHandleMigrationStatus & qemuMigrationUpdateJobType
    -> 
https://gitlab.com/libvirt/libvirt/-/blob/master/src/qemu/qemu_process.c#L1766
    -> 
https://gitlab.com/libvirt/libvirt/-/blob/master/src/qemu/qemu_migration.c?ref_type=heads#L1931
===
* I was trying to see how libvirtd(8) handles QEMU migration states.
Looking at the above functions there, it seems they don't do much with
it. Only MIGRATION_STATUS_POSTCOPY_* has some handling, while other
states are not handled for anything. Interestingly, there's no _FAILED
state in there, maybe they call it _ERROR.

* While I get the importance of not breaking APIs, still, simplifying
migration states on the QEMU side should help them too.

> 2) MigrationStatus being used as an internal record of the current
>    (loosely defined) migration phase. This is "arbitrary", hence we're
>    discussing adding a new MigrationStatus "just" to make sure we don't
>    start a new migration at the wrong moment.
>
> I'm trying to understand if you want to cover 1, 2 or both.
>
> I would suggest we first take all of the internal tracking, i.e. #2, the
> "if (state==MIGRATION_STATUS)" code and convert them to use some other
> state tracking, either the triggers as you suggest, or random booleans
> sprinkled all over, it's not immediately important.
>
> Once that is done, then we could freeze the #1, MigrationStatus. It
> would only change whenever we wanted to change the API and that should
> be a well documented change.

* Yes, sounds good. We could start with the QEMU internal state/phase
tracking and then go to #1 above once we see how it all works in
practice.

> Ok, maybe I'm splittling hairs here, I was trying to understand whether
> all of these "if (s->state ...)" have the same semantics.
>
> a) For cases such as CANCELLING: that could be a simple
>    s->trigger[MIGRATE_CANCEL]=1.
>
>   (we're not removing the CANCELLING state due to the API stability, but
>   still)
>
> b) For error conditions: s->event[FAILED]=1, then (possibly at a later
>    point in migration_change_state):
>
>    if (s->event[FAILED] && !s->trigger[MIGRATE_CANCEL]) {
>       migrate_set_state(s->state, MIGRATION_STATUS_FAILED);
>    }

* Do we have to check !MIGRATE_CANCEL like this? It's not clean.
Ideally if an error/failure event occurs before the user cancels, then
cancel can be ignored, no? Because migration is anyway going to stop
or end. OTOH, if we cancel while processing an error/failure, end user
may not see that error because we report - migration was cancelled.

> b) For postcopy resume/pause, etc, maybe an actual state machine that can
>    only be in one state would be helpful.
>
> c) For "we reached this point, so set this state", most of those could
>    just be an invocation to migration_change_state() and, as you
>    suggest, that would look for the evidence elsewhere to know what
>    state to set:
>
>    if (s->trigger[MIGRATE] && s->event[COMPLETED]) {
>       migrate_set_state(s->state, MIGRATION_STATUS_COMPLETED);
>    }

* Yes, right. We need to define/differentiate between _what_ is the
state and _why_ is that state.

* How do we go from here? Next step?

Thank you.
---
  - Prasad


Reply via email to