This change introduces support for confidential guests (SEV-ES, SEV-SNP and
TDX) to reset/reboot just like non-confidential guests. Currently, a reboot
initiated from a confidential guest results in termination of the QEMU
hypervisor because the CPUs are not resettable. As the initial state of the
guest, including private memory, is locked and encrypted, the contents of
that memory are not accessible after reset. Hence the old KVM file
descriptor must be closed and a new one opened to create a new confidential
VM context. All KVM VM specific ioctls must be called again. New VCPU file
descriptors must be created against the new KVM fd and most VCPU ioctls
must be called again as well.
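
As a rough illustration of the sequence described here (not code from this
series), the sketch below simulates the flow in plain C: hooks run before
the old VM fd goes away, a new VM "fd" is allocated, hooks replay VM-scoped
setup, and each VCPU is recreated with its own hook. Every name in it
(sim_vm, hook_list, sim_vm_change_fd, count_hook) is hypothetical, and the
"fds" are plain integers rather than real descriptors. A real
implementation would close() the old fds and issue KVM_CREATE_VM and
KVM_CREATE_VCPU again.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VCPUS 4
#define MAX_HOOKS 8

/* Hypothetical stand-in for a notifier callback; not QEMU's Notifier API. */
typedef void (*vmfd_hook)(int fd, void *opaque);

struct hook_list {
    vmfd_hook fn[MAX_HOOKS];
    void *opaque[MAX_HOOKS];
    size_t count;
};

/* Hypothetical VM state. The "fds" are simulated integers. */
struct sim_vm {
    int vmfd;
    int vcpu_fd[MAX_VCPUS];
    size_t nr_vcpus;
    int next_fd;                    /* simulated fd allocator */
    struct hook_list pre_change;    /* old VM fd is about to go away */
    struct hook_list post_change;   /* new VM fd exists */
    struct hook_list vcpu_created;  /* a new VCPU fd exists */
};

static void hook_add(struct hook_list *l, vmfd_hook fn, void *opaque)
{
    assert(l->count < MAX_HOOKS);
    l->fn[l->count] = fn;
    l->opaque[l->count] = opaque;
    l->count++;
}

static void hook_run(const struct hook_list *l, int fd)
{
    for (size_t i = 0; i < l->count; i++) {
        l->fn[i](fd, l->opaque[i]);
    }
}

/* Example subsystem callback: only counts how often it was invoked. */
static void count_hook(int fd, void *opaque)
{
    (void)fd;
    ++*(int *)opaque;
}

static void sim_vm_init(struct sim_vm *vm, size_t nr_vcpus)
{
    assert(nr_vcpus <= MAX_VCPUS);
    *vm = (struct sim_vm){ .next_fd = 10, .nr_vcpus = nr_vcpus };
    vm->vmfd = vm->next_fd++;
    for (size_t i = 0; i < nr_vcpus; i++) {
        vm->vcpu_fd[i] = vm->next_fd++;
    }
}

/* Reset path: swaps in a new simulated VM fd and returns it. */
static int sim_vm_change_fd(struct sim_vm *vm)
{
    /* 1. Let subsystems sync state out of the old VM while it exists. */
    hook_run(&vm->pre_change, vm->vmfd);

    /* 2. Drop the old context and create a fresh one. */
    vm->vmfd = vm->next_fd++;

    /* 3. Subsystems replay their VM-scoped setup against the new fd. */
    hook_run(&vm->post_change, vm->vmfd);

    /* 4. Recreate every VCPU against the new VM, then replay VCPU setup. */
    for (size_t i = 0; i < vm->nr_vcpus; i++) {
        vm->vcpu_fd[i] = vm->next_fd++;
        hook_run(&vm->vcpu_created, vm->vcpu_fd[i]);
    }
    return vm->vmfd;
}
```

A caller would register one count_hook (or a real callback) per list with
hook_add() and then call sim_vm_change_fd() from its reset path. The
ordering is the point of the sketch: state is pulled from the old context
before it disappears, and nothing is replayed until the new fd exists.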

This change closes the old KVM fd and creates a new one. After the new KVM
fd is opened, all generic and architecture specific ioctl calls are issued
again. Notifiers are added to notify subsystems that:
- The KVM VM fd is about to be changed, so state syncing from KVM to QEMU
  should be done if required.
- The KVM VM fd has changed, so ioctl calls to the new KVM fd have to be
  performed again.
- New VCPU fds have been created, so VCPU ioctl calls must be issued again
  where required.

Specific subsystems use these notifiers to re-issue ioctl calls where
required.

Changes are made to the SEV and TDX modules to reinitialize the
confidential guest state and seal it again. Along the way, some bug fixes
are made so that some initialization functions can be called again. Some
existing code is refactored so that both the init and reset paths can use
it.

Tested on TDX and SEV-SNP.

CI pipeline passes:
https://gitlab.com/anisinha/qemu/-/commit/eb647d2299ba8aac62a4bffbeb470c665c831421/pipelines?ref=coco-reboot-v2

Please review and test.

Changelog:
v2:
- Bugfixes.
- Added a new machine option so that we can exercise most of the non-coco
  changes related to reboot on non-coco platforms.
- Added a new functional test. Currently it is skipped on the CI pipeline
  as KVM is not enabled (no /dev/kvm in the container) for QEMU CI tests.
  It can be run manually and it passes on systems where KVM is enabled.
- Addressed comments from v1 regarding refactoring of code, code
  simplification by removal of redundant stuff, and moving code around so
  that notifiers and migration blockers are added in only one place.
- Added some tracepoints on newly added functions for future debugging.
- Rebased.

One thing I have not addressed in v2 is combining the pre and post
notifiers into one with a boolean argument to differentiate them. This will
be addressed as a part of v3, which is here:
https://gitlab.com/anisinha/qemu/-/commits/coco-reboot-v3

The change is getting tested:
https://gitlab.com/anisinha/qemu/-/commit/7b3ef489a6d45c0282c851c38c54b6a2c3e2c20d

CC: [email protected]
CC: [email protected]
CC: [email protected]
CC: [email protected]
CC: [email protected]

Ani Sinha (32):
  i386/kvm: avoid installing duplicate msr entries in msr_handlers
  hw/accel: add a per-accelerator callback to change VM accelerator handle
  system/physmem: add helper to reattach existing memory after KVM VM fd change
  accel/kvm: add changes required to support KVM VM file descriptor change
  accel/kvm: mark guest state as unprotected after vm file descriptor change
  accel/kvm: add a notifier to indicate KVM VM file descriptor has changed
  accel/kvm: add notifier to inform that the KVM VM file fd is about to be changed
  i386/kvm: unregister smram listeners prior to vm file descriptor change
  kvm/i386: implement architecture support for kvm file descriptor change
  hw/i386: refactor x86_bios_rom_init for reuse in confidential guest reset
  kvm/i386: reload firmware for confidential guest reset
  accel/kvm: rebind current VCPUs to the new KVM VM file descriptor upon reset
  i386/tdx: refactor TDX firmware memory initialization code into a new function
  i386/tdx: finalize TDX guest state upon reset
  i386/tdx: add a pre-vmfd change notifier to reset tdx state
  i386/sev: add migration blockers only once
  i386/sev: add notifiers only once
  i386/sev: free existing launch update data and kernel hashes data on init
  i386/sev: add support for confidential guest reset
  hw/vfio: generate new file fd for pseudo device and rebind existing descriptors
  kvm/i8254: refactor pit initialization into a helper
  kvm/i8254: add support for confidential guest reset
  hw/hyperv/vmbus: add support for confidential guest reset
  accel/kvm: add a per-confidential class callback to unlock guest state
  kvm/xen-emu: re-initialize capabilities during confidential guest reset
  kvm/xen_evtchn: add support for confidential guest reset
  ppc/openpic: create a new openpic device and reattach mem region on coco reset
  kvm/vcpu: add notifiers to inform vcpu file descriptor change
  kvm/i386/apic: set local apic after vcpu file descriptors changed
  kvm/clock: add support for confidential guest reset
  hw/machine: introduce machine specific option 'x-change-vmfd-on-reset'
  tests/functional/x86_64: add functional test to exercise vm fd change on reset

 MAINTAINERS                                 |   6 +
 accel/kvm/kvm-all.c                         | 365 ++++++++++++++++--
 accel/kvm/trace-events                      |   2 +
 accel/stubs/kvm-stub.c                      |  26 ++
 hw/core/machine.c                           |  22 ++
 hw/hyperv/vmbus.c                           |  30 ++
 hw/i386/kvm/apic.c                          |  13 +
 hw/i386/kvm/clock.c                         |  56 +++
 hw/i386/kvm/i8254.c                         |  83 ++--
 hw/i386/kvm/xen_evtchn.c                    | 100 ++++-
 hw/i386/x86-common.c                        |  50 ++-
 hw/intc/openpic_kvm.c                       | 108 ++++--
 hw/vfio/helpers.c                           |  81 +++-
 include/accel/accel-ops.h                   |   1 +
 include/hw/core/boards.h                    |   6 +
 include/hw/i386/apic_internal.h             |   1 +
 include/hw/i386/x86.h                       |   5 +-
 include/system/confidential-guest-support.h |  27 ++
 include/system/kvm.h                        |  55 +++
 include/system/physmem.h                    |   1 +
 system/physmem.c                            |  28 ++
 system/runstate.c                           |  36 +-
 target/arm/kvm.c                            |  10 +
 target/i386/kvm/kvm.c                       | 209 ++++++++--
 target/i386/kvm/tdx.c                       | 142 +++++--
 target/i386/kvm/tdx.h                       |   1 +
 target/i386/kvm/trace-events                |   4 +
 target/i386/kvm/xen-emu.c                   |  45 ++-
 target/i386/sev.c                           |  97 ++++-
 target/i386/trace-events                    |   1 +
 target/loongarch/kvm/kvm.c                  |  10 +
 target/mips/kvm.c                           |  10 +
 target/ppc/kvm.c                            |  10 +
 target/riscv/kvm/kvm-cpu.c                  |  10 +
 target/s390x/kvm/kvm.c                      |  10 +
 tests/functional/x86_64/meson.build         |   1 +
 .../x86_64/test_vmfd_change_reboot.py       |  75 ++++
 37 files changed, 1544 insertions(+), 193 deletions(-)
 create mode 100755 tests/functional/x86_64/test_vmfd_change_reboot.py

--
2.42.0
