Hi Fabiano, 

Thanks for the quick review and for catching the make check failure. 
My apologies for the oversight; it was an embarrassing miss on my part.
I see what happened: I ran make check without sudo, so the postcopy tests
were skipped because they require userfaultfd(). Only 62 of the 79 migration
tests actually ran.

In any case, I have identified the cause and fixed it for the next version.
I will send out the new version of the series shortly, after running all
the tests (this time including the postcopy ones) alongside manual testing.

On Wed, Jul 16, 2025 at 04:58:52PM -0300, Fabiano Rosas wrote:
> Arun Menon <arme...@redhat.com> writes:
> 
> > Hello,
> >
> > Currently, when a migration of a VM with an encrypted vTPM
> > fails on the destination host (e.g., due to a mismatch in secret values),
> > the error message displayed on the source host is generic and unhelpful.
> >
> > For example, a typical error looks like this:
> > "operation failed: job 'migration out' failed: Sibling indicated error 1.
> > operation failed: job 'migration in' failed: load of migration failed:
> > Input/output error"
> >
> > This message does not provide any specific indication of a vTPM failure.
> > Such generic errors are logged using error_report(), which prints to
> > the console/monitor but does not make the detailed error accessible via
> > the QMP query-migrate command.
> >
> > This series addresses the issue by ensuring that specific TPM error
> > messages are propagated via the QEMU Error object.
> > To make this possible,
> > - A set of functions in the call stack is changed
> >   to incorporate an Error object as an additional parameter.
> > - Also, the TPM backend makes use of a new hook called post_load_errp()
> >   that explicitly passes an Error object.
> >
> > It is organized as follows:
> >  - Patches 1-21 focus on pushing the Error object into the functions
> >    in the call stack where TPM errors are observed.
> >    The rest of the functions in savevm.c still need to be changed
> >    so that they also take an errp parameter for propagating errors.
> >  - Patch 22 introduces the new variants of the hooks in VMStateDescription
> >    structure. These hooks should be used in future implementations.
> >  - Patch 23 focuses on changing the TPM backend such that the errors are
> >    set in the Error object.
> >
> > While this series focuses specifically on TPM error reporting during
> > live migration, it lays the groundwork for broader improvements.
> > Many functions in savevm.c that previously returned only an integer now
> > also capture errors in the Error object, enabling other modules to adopt
> > the post_load_errp hook in the future.
> >
> > A similar change was attempted previously:
> > https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg01727.html
> >
> > Resolves: https://issues.redhat.com/browse/RHEL-82826
> >
> > Signed-off-by: Arun Menon <arme...@redhat.com>
> > ---
> > Changes in v4:
> > - Split the patches into smaller ones based on functions. Pass NULL in the
> >   caller until errp is made available. Every function that takes an
> >   Error **errp parameter sets it in case of failure.
> > - A few more functions within loadvm_process_command() now handle errors
> >   using the errp object. I've converted these for consistency, taking
> >   Daniel's patches (link above) as a reference.
> > - Along with the post_load_errp() hook, error-parameterized variants of
> >   the other hooks are also introduced. This will enable us to migrate to
> >   the newer versions eventually.
> > - Fix some semantic errors, like using error_propagate_prepend() in places
> >   where we need to preserve the existing behaviour of accumulating the
> >   error in local_err and then propagating it to errp. This can be
> >   refactored in a later commit.
> > - Add more information in commit messages explaining the changes.
> > - Link to v3: https://lore.kernel.org/qemu-devel/20250702-propagate_tpm_error-v3-0-986d94540...@redhat.com
> >
> > Changes in v3:
> > - Split the 2nd patch into 2. Introducing the post_load_with_error() hook
> >   has been separated from using it in the TPM backend module, so that
> >   it can be acknowledged independently.
> > - Link to v2: https://lore.kernel.org/qemu-devel/20250627-propagate_tpm_error-v2-0-85990c89d...@redhat.com
> >
> > Changes in v2:
> > - Combine the first two changes into one, focusing on passing the
> >   Error object (errp) consistently through functions involved in
> >   loading the VM's state. Other functions are not yet changed.
> > - As suggested in the review comment, add null checks for errp
> >   before adding error messages, preventing crashes.
> >   We also now correctly set errors when post-copy migration fails.
> > - In process_incoming_migration_co(), switch to error_prepend
> >   instead of error_setg. This means we now null-check local_err in
> >   the "fail" section before using it, preventing dereferencing issues.
> > - Link to v1: https://lore.kernel.org/qemu-devel/20250624-propagate_tpm_error-v1-0-2171487a5...@redhat.com
> >
> > ---
> > Arun Menon (23):
> >       migration: push Error **errp into vmstate_subsection_load()
> >       migration: push Error **errp into vmstate_load_state()
> >       migration: push Error **errp into qemu_loadvm_state_header()
> >       migration: push Error **errp into vmstate_load()
> >       migration: push Error **errp into qemu_loadvm_section_start_full()
> >       migration: push Error **errp into qemu_loadvm_section_part_end()
> >       migration: push Error **errp into loadvm_process_command()
> >       migration: push Error **errp into loadvm_handle_cmd_packaged()
> >       migration: push Error **errp into ram_postcopy_incoming_init()
> >       migration: push Error **errp into loadvm_postcopy_handle_advise()
> >       migration: push Error **errp into loadvm_postcopy_handle_listen()
> >       migration: push Error **errp into loadvm_postcopy_handle_run()
> >       migration: push Error **errp into loadvm_postcopy_ram_handle_discard()
> >       migration: make loadvm_postcopy_handle_resume() void
> >       migration: push Error **errp into loadvm_handle_recv_bitmap()
> >       migration: push Error **errp into loadvm_process_enable_colo()
> >       migration: push Error **errp into loadvm_postcopy_handle_switchover_start()
> >       migration: push Error **errp into qemu_loadvm_state_main()
> >       migration: push Error **errp into qemu_loadvm_state()
> >       migration: push Error **errp into qemu_load_device_state()
> >       migration: Capture error in postcopy_ram_listen_thread()
> >       migration: Add error-parameterized function variants in VMSD struct
> >       backends/tpm: Propagate vTPM error on migration failure
> >
> >  backends/tpm/tpm_emulator.c |  39 +++---
> >  hw/display/virtio-gpu.c     |   2 +-
> >  hw/pci/pci.c                |   2 +-
> >  hw/s390x/virtio-ccw.c       |   2 +-
> >  hw/scsi/spapr_vscsi.c       |   2 +-
> >  hw/vfio/pci.c               |   2 +-
> >  hw/virtio/virtio-mmio.c     |   2 +-
> >  hw/virtio/virtio-pci.c      |   2 +-
> >  hw/virtio/virtio.c          |   4 +-
> >  include/migration/colo.h    |   2 +-
> >  include/migration/vmstate.h |  13 +-
> >  migration/colo.c            |  10 +-
> >  migration/cpr.c             |   4 +-
> >  migration/migration.c       |  19 +--
> >  migration/postcopy-ram.c    |   9 +-
> >  migration/postcopy-ram.h    |   2 +-
> >  migration/ram.c             |  14 +--
> >  migration/ram.h             |   4 +-
> >  migration/savevm.c          | 299 +++++++++++++++++++++++++-------------------
> >  migration/savevm.h          |   7 +-
> >  migration/vmstate-types.c   |  10 +-
> >  migration/vmstate.c         |  83 ++++++++----
> >  tests/unit/test-vmstate.c   |  18 +--
> >  ui/vdagent.c                |   2 +-
> >  24 files changed, 325 insertions(+), 228 deletions(-)
> > ---
> > base-commit: 9a4e273ddec3927920c5958d2226c6b38b543336
> > change-id: 20250624-propagate_tpm_error-bf4ae6c23d30
> >
> > Best regards,
> 
> Hi Arun, make check is failing, please take a look:
> 
> QTEST_LOG=1 QTEST_QEMU_BINARY=./qemu-system-x86_64 \
> ./tests/qtest/migration-test \
> --full -p /x86_64/migration/postcopy/recovery/double-failures/handshake
> ...
> qemu-system-x86_64: ../util/error.c:65: error_setv: Assertion `*errp == NULL' failed.
> 

Regards,
Arun

