On 26.08.2025 14:51, Juraj Marcin wrote:
From: Juraj Marcin <[email protected]>

Commit 48814111366b ("migration: Always set DEVICE state") introduced
DEVICE state to postcopy, which moved the actual state transition that
leads to POSTCOPY_ACTIVE.

However, the error handling part of the postcopy_start() function still
expects the state POSTCOPY_ACTIVE, but depending on where an error
happens, now the state can be either ACTIVE, DEVICE or CANCELLING, but
never POSTCOPY_ACTIVE, as this transition now happens just before a
successful return from the function.

Instead, accept any state except CANCELLING when transitioning to FAILED
state.

Cc: [email protected]
Fixes: 48814111366b ("migration: Always set DEVICE state")
Signed-off-by: Juraj Marcin <[email protected]>
---
In the RFC[1] where this patch was discussed, there was also a
suggestion for a helper function migrate_set_failure() that would check
if the state is not CANCELLING and then set migration error and FAILED
state.

I discussed the implementation with Peter, and we came to a conclusion
that instead of patching such clean-up on top of the current error
handling code, it might be more useful to do a larger refactor and
clean-up of all error handling in the migration code.

Such clean-up should reduce the number of places where we need to
explicitly transition to a FAILED state (ideally to one, or only a
couple of places), and instead only set an appropriate migration error
using migrate_set_error(). Additionally, it would also refactor
inappropriate uses of QEMUFile errors where the error is not really an
error of the underlying channel and migrate_set_error() should be used
instead.

[1]: https://lore.kernel.org/all/[email protected]/
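
For readers following along, the state handling described in the commit
message can be sketched as below. This is a minimal, self-contained
illustration and not the QEMU code: the State enum, set_state(),
fail_expecting_postcopy_active() and fail_unless_cancelling() are
hypothetical names, and the only assumption carried over is that a state
transition succeeds only when the current state matches the expected old
state.

```c
#include <stdatomic.h>

/* Hypothetical stand-ins for the states named in the commit message. */
typedef enum {
    STATE_ACTIVE,
    STATE_DEVICE,
    STATE_POSTCOPY_ACTIVE,
    STATE_CANCELLING,
    STATE_FAILED,
} State;

/*
 * Compare-and-swap transition: moves to new_state only if the current
 * state equals old_state. Returns nonzero when the transition happened.
 */
static int set_state(_Atomic int *state, int old_state, int new_state)
{
    return atomic_compare_exchange_strong(state, &old_state, new_state);
}

/*
 * Error path that expects POSTCOPY_ACTIVE. If the error happens while
 * the state is still ACTIVE or DEVICE, this is a silent no-op and the
 * state never becomes FAILED.
 */
static void fail_expecting_postcopy_active(_Atomic int *state)
{
    set_state(state, STATE_POSTCOPY_ACTIVE, STATE_FAILED);
}

/*
 * Error path that accepts any state except CANCELLING. The loop retries
 * if another thread changes the state between the load and the swap;
 * on a failed swap, cur is refreshed with the value actually observed.
 */
static void fail_unless_cancelling(_Atomic int *state)
{
    int cur = atomic_load(state);

    while (cur != STATE_CANCELLING) {
        if (atomic_compare_exchange_weak(state, &cur, STATE_FAILED)) {
            break;
        }
    }
}
```

Starting from STATE_DEVICE, the first helper leaves the state unchanged,
while the second moves it to STATE_FAILED; starting from
STATE_CANCELLING, the second helper leaves the state alone so that the
cancellation can complete.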

Ping? Can we apply this to the master branch, so I can pick it up for
the stable series?

Thanks,

/mjt
