On 26.08.2025 14:51, Juraj Marcin wrote:
From: Juraj Marcin <[email protected]>

Commit 48814111366b ("migration: Always set DEVICE state") introduced
DEVICE state to postcopy, which moved the actual state transition that
leads to POSTCOPY_ACTIVE.

However, the error-handling path of postcopy_start() still expects the
state to be POSTCOPY_ACTIVE. Depending on where the error happens, the
state can now be ACTIVE, DEVICE or CANCELLING, but never
POSTCOPY_ACTIVE, because that transition now happens just before a
successful return from the function.

Instead, accept any state except CANCELLING when transitioning to the
FAILED state.
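
For readers without the diff at hand, the behaviour change can be
illustrated with a small standalone model. This is only a sketch, not
the QEMU code: ToyState, toy_set_state(), toy_fail_old() and
toy_fail_new() are hypothetical names, and the only assumption carried
over from the migration code is that a state change takes effect only
when the current state matches the expected old state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the states named in the commit message. */
typedef enum {
    STATE_ACTIVE,
    STATE_DEVICE,
    STATE_POSTCOPY_ACTIVE,
    STATE_CANCELLING,
    STATE_FAILED,
} ToyState;

/* Compare-and-set: change *state only if it currently equals old_state. */
static bool toy_set_state(ToyState *state, ToyState old_state,
                          ToyState new_state)
{
    assert(state != NULL);
    if (*state != old_state) {
        return false;   /* expected state did not match, nothing changes */
    }
    *state = new_state;
    return true;
}

/* Old error path: takes effect only if the state is POSTCOPY_ACTIVE. */
static void toy_fail_old(ToyState *state)
{
    toy_set_state(state, STATE_POSTCOPY_ACTIVE, STATE_FAILED);
}

/* New error path: any state except CANCELLING moves to FAILED. */
static void toy_fail_new(ToyState *state)
{
    ToyState cur = *state;

    if (cur != STATE_CANCELLING) {
        toy_set_state(state, cur, STATE_FAILED);
    }
}
```

In this model, calling toy_fail_old() while in STATE_DEVICE leaves the
state untouched, which mirrors the problem described above, while
toy_fail_new() moves it to STATE_FAILED and leaves STATE_CANCELLING
alone.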

Cc: [email protected]
Fixes: 48814111366b ("migration: Always set DEVICE state")
Signed-off-by: Juraj Marcin <[email protected]>

---
In the RFC[1] where this patch was discussed, there was also a
suggestion for a helper function migrate_set_failure() that would check
if the state is not CANCELLING and then set migration error and FAILED
state. I discussed the implementation with Peter, and we came to a
conclusion that instead of patching such clean-up on top of the current
error handling code, it might be more useful to do a larger refactor and
clean-up of all error handling in the migration code.

Such clean-up should reduce the number of places where we need to
explicitly transition to a FAILED state (ideally to one, or only a
couple of places), and instead only set an appropriate migration error
using migrate_set_error(). Additionally, it would also refactor
inappropriate uses of QEMUFile errors where the error is not really an
error of the underlying channel and migrate_set_error() should be used
instead.
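
One possible shape of that idea, purely as a standalone sketch and not
a proposal for the actual code: error paths only record an error, and
a single function picks the final state. All names here
(SketchMigration, sketch_set_error(), sketch_finish()) are
hypothetical, and keeping only the first recorded error is an
assumption of the sketch.

```c
#include <stddef.h>

typedef enum {
    SKETCH_ACTIVE,
    SKETCH_CANCELLING,
    SKETCH_CANCELLED,
    SKETCH_COMPLETED,
    SKETCH_FAILED,
} SketchStatus;

typedef struct {
    SketchStatus status;
    const char *error;   /* first recorded error, or NULL if none */
} SketchMigration;

/* Record an error without touching the state; the first error wins. */
static void sketch_set_error(SketchMigration *m, const char *msg)
{
    if (m->error == NULL) {
        m->error = msg;
    }
}

/* The single place that decides the final state. */
static void sketch_finish(SketchMigration *m)
{
    if (m->status == SKETCH_CANCELLING) {
        m->status = SKETCH_CANCELLED;   /* cancellation takes priority */
    } else if (m->error != NULL) {
        m->status = SKETCH_FAILED;
    } else {
        m->status = SKETCH_COMPLETED;
    }
}
```

With this layout no error path needs to know which state it is
leaving, so a check like the CANCELLING one exists in only one place.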

[1]: https://lore.kernel.org/all/[email protected]/

Ping?  Can we apply this to the master branch, so I can pick it up for
the stable series?

Thanks,

/mjt
