On 3/11/24 20:03, Fabiano Rosas wrote:
Cédric Le Goater <c...@redhat.com> writes:
On 3/8/24 15:36, Fabiano Rosas wrote:
Cédric Le Goater <c...@redhat.com> writes:
This prepares ground for the changes coming next which add an Error**
argument to the .save_setup() handler. Callers of qemu_savevm_state_setup()
now handle the error and fail earlier, setting the migration state from
MIGRATION_STATUS_SETUP to MIGRATION_STATUS_FAILED.
In qemu_savevm_state(), move the cleanup to preserve the error
reported by .save_setup() handlers.
Since the previous behavior was to ignore errors at this step of
migration, this change should be examined closely to check that
cleanups are still correctly done.
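
For readers following along outside the QEMU tree, the caller-side pattern described above can be sketched roughly as follows. This is a standalone toy, not the patch itself: ToyError, toy_error_set(), toy_state_setup() and toy_caller() are hypothetical stand-ins for QEMU's Error API, qemu_savevm_state_setup() and its callers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for QEMU's Error object. */
typedef struct ToyError {
    char msg[64];
} ToyError;

enum toy_status { TOY_STATUS_SETUP, TOY_STATUS_ACTIVE, TOY_STATUS_FAILED };

/* Store an error through errp, if the caller asked for one. */
static void toy_error_set(ToyError **errp, const char *msg)
{
    if (!errp) {
        return;
    }
    ToyError *e = malloc(sizeof(*e));
    assert(e != NULL);
    snprintf(e->msg, sizeof(e->msg), "%s", msg);
    *errp = e;
}

/* Toy setup step: returns 0 on success, negative with *errp set on failure. */
static int toy_state_setup(bool handler_fails, ToyError **errp)
{
    if (handler_fails) {
        toy_error_set(errp, "toy save_setup failed");
        return -1;
    }
    return 0;
}

/* Toy caller: on failure, record the message, free the error, report FAILED. */
static enum toy_status toy_caller(bool handler_fails, char *out, size_t outlen)
{
    ToyError *local_err = NULL;

    if (toy_state_setup(handler_fails, &local_err)) {
        snprintf(out, outlen, "%s", local_err->msg);
        free(local_err);
        return TOY_STATUS_FAILED;
    }
    return TOY_STATUS_ACTIVE;
}
```

The point of the sketch is only that the setup step always sets an error when it returns non-zero, so the caller can consume it unconditionally before moving to the failed state.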
Signed-off-by: Cédric Le Goater <c...@redhat.com>
---
Changes in v4:
- Merged cleanup change in qemu_savevm_state()
Changes in v3:
- Set migration state to MIGRATION_STATUS_FAILED
- Fixed error handling to be done under lock in bg_migration_thread()
- Made sure an error is always set in case of failure in
qemu_savevm_state_setup()
migration/savevm.h | 2 +-
migration/migration.c | 27 ++++++++++++++++++++++++---
migration/savevm.c | 26 +++++++++++++++-----------
3 files changed, 40 insertions(+), 15 deletions(-)
diff --git a/migration/savevm.h b/migration/savevm.h
index 74669733dd63a080b765866c703234a5c4939223..9ec96a995c93a42aad621595f0ed58596c532328 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -32,7 +32,7 @@
bool qemu_savevm_state_blocked(Error **errp);
void qemu_savevm_non_migratable_list(strList **reasons);
int qemu_savevm_state_prepare(Error **errp);
-void qemu_savevm_state_setup(QEMUFile *f);
+int qemu_savevm_state_setup(QEMUFile *f, Error **errp);
bool qemu_savevm_state_guest_unplug_pending(void);
int qemu_savevm_state_resume_prepare(MigrationState *s);
void qemu_savevm_state_header(QEMUFile *f);
diff --git a/migration/migration.c b/migration/migration.c
index a49fcd53ee19df1ce0182bc99d7e064968f0317b..6d1544224e96f5edfe56939a9c8395d88ef29581 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3408,6 +3408,8 @@ static void *migration_thread(void *opaque)
int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
MigThrError thr_error;
bool urgent = false;
+ Error *local_err = NULL;
+ int ret;
thread = migration_threads_add("live_migration", qemu_get_thread_id());
@@ -3451,9 +3453,17 @@ static void *migration_thread(void *opaque)
}
bql_lock();
- qemu_savevm_state_setup(s->to_dst_file);
+ ret = qemu_savevm_state_setup(s->to_dst_file, &local_err);
bql_unlock();
+ if (ret) {
+ migrate_set_error(s, local_err);
+ error_free(local_err);
+ migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
+ MIGRATION_STATUS_FAILED);
+ goto out;
+ }
+
qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
MIGRATION_STATUS_ACTIVE);
This^ should be before the new block it seems:
GOOD:
migrate_set_state new state setup
migrate_set_state new state wait-unplug
migrate_fd_cancel
migrate_set_state new state cancelling
migrate_fd_cleanup
migrate_set_state new state cancelled
migrate_fd_cancel
ok 1 /x86_64/failover-virtio-net/migrate/abort/wait-unplug
BAD:
migrate_set_state new state setup
migrate_fd_cancel
migrate_set_state new state cancelling
migrate_fd_cleanup
migrate_set_state new state cancelled
qemu-system-x86_64: ram_save_setup failed: Input/output error
**
ERROR:../tests/qtest/virtio-net-failover.c:1203:test_migrate_abort_wait_unplug:
assertion failed (status == "cancelling"): ("cancelled" == "cancelling")
Otherwise migration_iteration_finish() will schedule the cleanup BH and
that will run concurrently with migrate_fd_cancel() issued by the test
and bad things happen.
This hack makes things work:
@@ -3452,6 +3452,9 @@ static void *migration_thread(void *opaq
qemu_savevm_send_colo_enable(s->to_dst_file);
}
+ qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
+ MIGRATION_STATUS_SETUP);
+
Why move it all the way up here? Has moving the wait_unplug before the
'if (ret)' block not worked for you?
We could be sleeping while holding the BQL. It looked wrong.
Thanks,
C.
bql_lock();
ret = qemu_savevm_state_setup(s->to_dst_file, &local_err);
bql_unlock();
We should fix the test instead :) Unless waiting for failover devices
to unplug before the save_setup handlers and not after is ok.
commit c7e0acd5a3f8 ("migration: add new migration state wait-unplug")
is not clear about the justification:
This patch adds a new migration state called wait-unplug. It is entered
after the SETUP state if failover devices are present. It will transition
into ACTIVE once all devices were successfully unplugged from the guest.
This is not clear indeed, but to me it seems having the wait-unplug
after setup was important.
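
To make the ordering question concrete, here is a rough standalone model of the two placements discussed in this thread. It is only a sketch: the mig_model_* names are hypothetical, the states are a reduced subset, and nothing here is QEMU code. It assumes, as the traces above suggest, that an observer (the qtest) wants to see the wait-unplug state even when the setup step fails.

```c
#include <assert.h>
#include <stdbool.h>

enum mig_model_state {
    MIG_MODEL_SETUP,
    MIG_MODEL_WAIT_UNPLUG,
    MIG_MODEL_ACTIVE,
    MIG_MODEL_FAILED
};

/* Records every state the toy migration thread passes through. */
struct mig_model_trace {
    enum mig_model_state seen[8];
    int n;
};

static void mig_model_set_state(struct mig_model_trace *t,
                                enum mig_model_state s)
{
    assert(t->n < 8);
    t->seen[t->n++] = s;
}

/* Enter wait-unplug only if failover devices are present, then go to next. */
static void mig_model_wait_unplug(struct mig_model_trace *t, bool failover,
                                  enum mig_model_state next)
{
    if (failover) {
        mig_model_set_state(t, MIG_MODEL_WAIT_UNPLUG);
    }
    mig_model_set_state(t, next);
}

/*
 * wait_first == false: setup, error check, then wait-unplug (as in the patch).
 * wait_first == true:  wait-unplug back to SETUP, then setup (as in the hack).
 */
static int mig_model_thread(struct mig_model_trace *t, bool failover,
                            bool setup_fails, bool wait_first)
{
    mig_model_set_state(t, MIG_MODEL_SETUP);
    if (wait_first) {
        mig_model_wait_unplug(t, failover, MIG_MODEL_SETUP);
    }

    int ret = setup_fails ? -1 : 0; /* stands in for the setup handlers */
    if (ret) {
        mig_model_set_state(t, MIG_MODEL_FAILED);
        return ret;
    }

    if (wait_first) {
        mig_model_set_state(t, MIG_MODEL_ACTIVE);
    } else {
        mig_model_wait_unplug(t, failover, MIG_MODEL_ACTIVE);
    }
    return 0;
}

/* True if the trace ever recorded state s. */
static bool mig_model_saw(const struct mig_model_trace *t,
                          enum mig_model_state s)
{
    for (int i = 0; i < t->n; i++) {
        if (t->seen[i] == s) {
            return true;
        }
    }
    return false;
}
```

With failover devices present and a failing setup step, the first ordering never records wait-unplug, while the second one does. The model says nothing about whether unplugging before the save_setup handlers is actually safe, which is the open question above.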
=====
PS: I guess the next level in our Freestyle Concurrency video-game is to
make migrate_fd_cancel() stop setting state and poking files and only
set a flag that's tested in the other parts of the code.
Is that a new item on the TODO list?
Yep, I'll add it to the wiki.
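
For the wiki entry, the "only set a flag" idea from the PS could look something like the toy below. This is a single-threaded sketch under stated assumptions: the flag_mig_* names are hypothetical, C11 atomics stand in for whatever primitive would really be used, and the cancel request is simulated inside the loop rather than coming from another thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum flag_mig_state {
    FLAG_MIG_ACTIVE,
    FLAG_MIG_COMPLETED,
    FLAG_MIG_CANCELLED
};

struct flag_mig {
    atomic_bool cancel_requested;  /* the only thing a canceller touches */
    enum flag_mig_state state;     /* written by the migration loop only */
    int iterations_done;
};

static void flag_mig_init(struct flag_mig *m)
{
    atomic_init(&m->cancel_requested, false);
    m->state = FLAG_MIG_ACTIVE;
    m->iterations_done = 0;
}

/* Cancel path: set the flag and nothing else. */
static void flag_mig_request_cancel(struct flag_mig *m)
{
    atomic_store(&m->cancel_requested, true);
}

/*
 * Toy migration loop. cancel_at simulates a cancel request arriving just
 * before iteration cancel_at; pass a negative value for "never".
 */
static void flag_mig_run(struct flag_mig *m, int total, int cancel_at)
{
    assert(total >= 0);
    for (int i = 0; i < total; i++) {
        if (i == cancel_at) {
            flag_mig_request_cancel(m);
        }
        if (atomic_load(&m->cancel_requested)) {
            /* The loop, not the canceller, performs the state change. */
            m->state = FLAG_MIG_CANCELLED;
            return;
        }
        m->iterations_done++;
    }
    m->state = FLAG_MIG_COMPLETED;
}
```

The design point is that the state has a single writer, so a cancel can no longer race with the thread's own transitions or with the cleanup.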