Hi Fabiano,
Thanks for the quick review and for catching the make check failure.
My apologies for the oversight; it was an embarrassing miss on my part.

I see what happened. I ran make check without sudo, so the postcopy tests
were skipped because they require userfaultfd(). Only 62 of the 79
migration tests were run.

I have identified the cause and fixed it in the next version. I will send
out a new version of the patch shortly, after running all the tests
alongside manual testing.

On Wed, Jul 16, 2025 at 04:58:52PM -0300, Fabiano Rosas wrote:
> Arun Menon <arme...@redhat.com> writes:
>
> > Hello,
> >
> > Currently, when a migration of a VM with an encrypted vTPM
> > fails on the destination host (e.g., due to a mismatch in secret values),
> > the error message displayed on the source host is generic and unhelpful.
> >
> > For example, a typical error looks like this:
> > "operation failed: job 'migration out' failed: Sibling indicated error 1.
> > operation failed: job 'migration in' failed: load of migration failed:
> > Input/output error"
> >
> > This message does not provide any specific indication of a vTPM failure.
> > Such generic errors are logged using error_report(), which prints to
> > the console/monitor but does not make the detailed error accessible via
> > the QMP query-migrate command.
> >
> > This series addresses the issue, by ensuring that specific TPM error
> > messages are propagated via the QEMU Error object.
> > To make this possible,
> > - A set of functions in the call stack is changed
> >   to incorporate an Error object as an additional parameter.
> > - Also, the TPM backend makes use of a new hook called post_load_errp()
> >   that explicitly passes an Error object.
> >
> > It is organized as follows,
> > - Patches 1-21 focuses on pushing Error object into the functions
> >   that are important in the call stack where TPM errors are observed.
> >   We still need to make changes in rest of the functions in savevm.c
> >   such that they also incorporate the errp object for propagating errors.
> > - Patch 22 introduces the new variants of the hooks in VMStateDescription
> >   structure. These hooks should be used in future implementations.
> > - Patch 23 focuses on changing the TPM backend such that the errors are
> >   set in the Error object.
> >
> > While this series focuses specifically on TPM error reporting during
> > live migration, it lays the groundwork for broader improvements.
> > A lot of methods in savevm.c that previously returned an integer now capture
> > errors in the Error object, enabling other modules to adopt the
> > post_load_errp hook in the future.
> >
> > One such change previously attempted:
> > https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg01727.html
> >
> > Resolves: https://issues.redhat.com/browse/RHEL-82826
> >
> > Signed-off-by: Arun Menon <arme...@redhat.com>
> > ---
> > Changes in v4:
> > - Split the patches into smaller ones based on functions. Pass NULL in the
> >   caller until errp is made available. Every function that has an
> >   Error **errp object passed to it, ensures that it sets the errp object
> >   in case of failure.
> > - A few more functions within loadvm_process_command() now handle errors using
> >   the errp object. I've converted these for consistency, taking Daniel's
> >   patches (link above) as a reference.
> > - Along with the post_load_errp() hook, other duplicate hooks are also
> >   introduced.
> >   This will enable us to migrate to the newer versions eventually.
> > - Fix some semantic errors, like using error_propagate_prepend() in places where
> >   we need to preserve existing behaviour of accumulating the error in local_err
> >   and then propagating it to errp. This can be refactored in a later commit.
> > - Add more information in commit messages explaining the changes.
> > - Link to v3: https://lore.kernel.org/qemu-devel/20250702-propagate_tpm_error-v3-0-986d94540...@redhat.com
> >
> > Changes in v3:
> > - Split the 2nd patch into 2. Introducing post_load_with_error() hook
> >   has been separated from using it in the backends TPM module. This is
> >   so that it can be acknowledged.
> > - Link to v2: https://lore.kernel.org/qemu-devel/20250627-propagate_tpm_error-v2-0-85990c89d...@redhat.com
> >
> > Changes in v2:
> > - Combine the first two changes into one, focusing on passing the
> >   Error object (errp) consistently through functions involved in
> >   loading the VM's state. Other functions are not yet changed.
> > - As suggested in the review comment, add null checks for errp
> >   before adding error messages, preventing crashes.
> >   We also now correctly set errors when post-copy migration fails.
> > - In process_incoming_migration_co(), switch to error_prepend
> >   instead of error_setg. This means we now null-check local_err in
> >   the "fail" section before using it, preventing dereferencing issues.
> > - Link to v1: https://lore.kernel.org/qemu-devel/20250624-propagate_tpm_error-v1-0-2171487a5...@redhat.com
> >
> > ---
> > Arun Menon (23):
> >       migration: push Error **errp into vmstate_subsection_load()
> >       migration: push Error **errp into vmstate_load_state()
> >       migration: push Error **errp into qemu_loadvm_state_header()
> >       migration: push Error **errp into vmstate_load()
> >       migration: push Error **errp into qemu_loadvm_section_start_full()
> >       migration: push Error **errp into qemu_loadvm_section_part_end()
> >       migration: push Error **errp into loadvm_process_command()
> >       migration: push Error **errp into loadvm_handle_cmd_packaged()
> >       migration: push Error **errp into ram_postcopy_incoming_init()
> >       migration: push Error **errp into loadvm_postcopy_handle_advise()
> >       migration: push Error **errp into loadvm_postcopy_handle_listen()
> >       migration: push Error **errp into loadvm_postcopy_handle_run()
> >       migration: push Error **errp into loadvm_postcopy_ram_handle_discard()
> >       migration: make loadvm_postcopy_handle_resume() void
> >       migration: push Error **errp into loadvm_handle_recv_bitmap()
> >       migration: push Error **errp into loadvm_process_enable_colo()
> >       migration: push Error **errp into loadvm_postcopy_handle_switchover_start()
> >       migration: push Error **errp into qemu_loadvm_state_main()
> >       migration: push Error **errp into qemu_loadvm_state()
> >       migration: push Error **errp into qemu_load_device_state()
> >       migration: Capture error in postcopy_ram_listen_thread()
> >       migration: Add error-parameterized function variants in VMSD struct
> >       backends/tpm: Propagate vTPM error on migration failure
> >
> >  backends/tpm/tpm_emulator.c |  39 +++---
> >  hw/display/virtio-gpu.c     |   2 +-
> >  hw/pci/pci.c                |   2 +-
> >  hw/s390x/virtio-ccw.c       |   2 +-
> >  hw/scsi/spapr_vscsi.c       |   2 +-
> >  hw/vfio/pci.c               |   2 +-
> >  hw/virtio/virtio-mmio.c     |   2 +-
> >  hw/virtio/virtio-pci.c      |   2 +-
> >  hw/virtio/virtio.c          |   4 +-
> >  include/migration/colo.h    |   2 +-
> >  include/migration/vmstate.h |  13 +-
> >  migration/colo.c            |  10 +-
> >  migration/cpr.c             |   4 +-
> >  migration/migration.c       |  19 +--
> >  migration/postcopy-ram.c    |   9 +-
> >  migration/postcopy-ram.h    |   2 +-
> >  migration/ram.c             |  14 +--
> >  migration/ram.h             |   4 +-
> >  migration/savevm.c          | 299 +++++++++++++++++++++++++-------------------
> >  migration/savevm.h          |   7 +-
> >  migration/vmstate-types.c   |  10 +-
> >  migration/vmstate.c         |  83 ++++++++----
> >  tests/unit/test-vmstate.c   |  18 +--
> >  ui/vdagent.c                |   2 +-
> >  24 files changed, 325 insertions(+), 228 deletions(-)
> > ---
> > base-commit: 9a4e273ddec3927920c5958d2226c6b38b543336
> > change-id: 20250624-propagate_tpm_error-bf4ae6c23d30
> >
> > Best regards,
>
> Hi Arun, make check is failing, please take a look:
>
> QTEST_LOG=1 QTEST_QEMU_BINARY=./qemu-system-x86_64 \
> ./tests/qtest/migration-test \
> --full -p /x86_64/migration/postcopy/recovery/double-failures/handshake
> ...
> qemu-system-x86_64: ../util/error.c:65: error_setv: Assertion `*errp == NULL' failed.
>

Regards,
Arun