To make forward progress on this series and reduce its size, I will post the patches that can be integrated independently and have value on their own, each to a smaller set of reviewers. This is what I plan to break out:
migration: fix populate_vfio_info
memory: RAM_NAMED_FILE flag
memory: flat section iterator
oslib: qemu_clear_cloexec
migration: simplify blockers
migration: simplify notifiers
python/machine: QEMUMachine full_args
python/machine: QEMUMachine reopen_qmp_connection
qapi: strList_from_string
qapi: QAPI_LIST_LENGTH
qapi: strv_from_strList
qapi: strList unit tests

- Steve

On 12/7/2022 10:48 AM, Steven Sistare wrote:
> This series desperately needs review in its intersection with live migration.
> The code in other areas has been reviewed and revised multiple times -- thank
> you!
>
> David, Juan, can you spare some time to review this? I have done my best to
> order the patches logically (see the labelled groups in this email), and to
> provide complete and clear cover letter and commit messages. Can I do anything
> to facilitate, like doing a code walk through via zoom?
>
> And of course, I welcome anyone's feedback.
>
> Here is the original posting.
>
> https://lore.kernel.org/qemu-devel/[email protected]/
>
> - Steve
>
> On 7/26/2022 12:09 PM, Steve Sistare wrote:
>> This version of the live update patch series integrates live update into the
>> live migration framework. The new interfaces are:
>>   * mode (migration parameter)
>>   * cpr-exec-args (migration parameter)
>>   * file (migration URI)
>>   * migrate-mode-enable (command-line argument)
>>   * only-cpr-capable (command-line argument)
>>
>> Provide the cpr-exec and cpr-reboot migration modes for live update. These
>> save and restore VM state, with minimal guest pause time, so that qemu may be
>> updated to a new version in between. The caller sets the mode parameter
>> before invoking the migrate or migrate-incoming commands.
>>
>> In cpr-reboot mode, the migrate command saves state to a file, allowing
>> one to quit qemu, reboot to an updated kernel, start an updated version of
>> qemu, and resume via the migrate-incoming command. The caller must specify
>> a migration URI that writes to and reads from a file.
>> Unlike normal mode, the use of certain local storage options does not block
>> the migration, but the caller must not modify guest block devices between
>> the quit and restart. The guest RAM memory-backend must be shared, and the
>> @x-ignore-shared migration capability must be set, to avoid saving it to the
>> file. Guest RAM must be non-volatile across reboot, which can be achieved by
>> backing it with a dax device, or /dev/shm PKRAM as proposed in
>> https://lore.kernel.org/lkml/[email protected]
>> but this is not enforced. The restarted qemu arguments must match those used
>> to initially start qemu, plus the -incoming option.
>>
>> The reboot mode supports vfio devices if the caller first suspends the guest,
>> such as by issuing guest-suspend-ram to the qemu guest agent. The guest
>> drivers' suspend methods flush outstanding requests and re-initialize the
>> devices, and thus there is no device state to save and restore. After
>> issuing migrate-incoming, the caller must issue a system_wakeup command to
>> resume.
>>
>> In cpr-exec mode, the migrate command saves state to a file and directly
>> exec's a new version of qemu on the same host, replacing the original process
>> while retaining its PID. The caller must specify a migration URI that writes
>> to and reads from a file, and resumes execution via the migrate-incoming
>> command. Arguments for the new qemu process are taken from the cpr-exec-args
>> migration parameter, and must include the -incoming option.
>>
>> Guest RAM must be backed by a memory backend with share=on, but cannot be
>> memory-backend-ram. The memory is re-mmap'd in the updated process, so guest
>> ram is efficiently preserved in place, albeit with new virtual addresses.
>> In addition, the '-migrate-mode-enable cpr-exec' option is required. This
>> causes secondary guest ram blocks (those not specified on the command line)
>> to be allocated by mmap'ing a memfd.
>> The memfds are kept open across exec, their values are saved in special cpr
>> state which is retrieved after exec, and they are re-mmap'd. Since guest RAM
>> is not copied, and storage blocks are not migrated, the caller must disable
>> all capabilities related to page and block copy. The implementation ignores
>> all related parameters.
>>
>> The exec mode supports vfio devices by preserving the vfio container, group,
>> device, and event descriptors across the qemu re-exec, and by updating DMA
>> mapping virtual addresses using VFIO_DMA_UNMAP_FLAG_VADDR and
>> VFIO_DMA_MAP_FLAG_VADDR as defined in
>> https://lore.kernel.org/kvm/[email protected]
>> and integrated in Linux kernel 5.12.
>>
>> Here is an example of updating qemu from v7.0.50 to v7.0.51 using exec mode.
>> The software update is performed while the guest is running to minimize
>> downtime.
>>
>> window 1                                      | window 2
>>                                               |
>> # qemu-system-$arch ...                       |
>>   -migrate-mode-enable cpr-exec               |
>> QEMU 7.0.50 monitor - type 'help' ...         |
>> (qemu) info status                            |
>> VM status: running                            |
>>                                               | # yum update qemu
>> (qemu) migrate_set_parameter mode cpr-exec    |
>> (qemu) migrate_set_parameter cpr-exec-args    |
>>   qemu-system-$arch ... -incoming defer       |
>> (qemu) migrate -d file:/tmp/qemu.sav          |
>> QEMU 7.0.51 monitor - type 'help' ...         |
>> (qemu) info status                            |
>> VM status: paused (inmigrate)                 |
>> (qemu) migrate_incoming file:/tmp/qemu.sav    |
>> (qemu) info status                            |
>> VM status: running                            |
>>
>>
>> Here is an example of updating the host kernel using reboot mode.
>>
>> window 1                                      | window 2
>>                                               |
>> # qemu-system-$arch ... mem-path=/dev/dax0.0  |
>>   -migrate-mode-enable cpr-reboot             |
>> QEMU 7.0.50 monitor - type 'help' ...         |
>> (qemu) info status                            |
>> VM status: running                            |
>>                                               | # yum update kernel-uek
>> (qemu) migrate_set_parameter mode cpr-reboot  |
>> (qemu) migrate -d file:/tmp/qemu.sav          |
>> (qemu) quit                                   |
>>                                               |
>> # systemctl kexec                             |
>> kexec_core: Starting new kernel               |
>> ...                                           |
>>                                               |
>> # qemu-system-$arch mem-path=/dev/dax0.0 ...  |
>>   -incoming defer                             |
>> QEMU 7.0.51 monitor - type 'help' ...         |
>> (qemu) info status                            |
>> VM status: paused (inmigrate)                 |
>> (qemu) migrate_incoming file:/tmp/qemu.sav    |
>> (qemu) info status                            |
>> VM status: running                            |
>>
>> Changes from V8 to V9:
>>   vfio:
>>     - free all cpr state during unwind in vfio_connect_container
>>     - change cpr_resave_fd to return void, and avoid new unwind cases
>>     - delete incorrect .unmigratable=1 in vmstate handlers
>>     - add route batching in vfio_claim_vectors
>>     - simplified vfio intx cpr code
>>     - fix commit message for 'recover from unmap-all-vaddr failure'
>>     - verify suspended runstate for cpr-reboot mode
>>   Other:
>>     - delete cpr-save, cpr-exec, cpr-load
>>     - delete ram block vmstate handlers that were added in V8
>>     - rename cpr-enable option to migrate-mode-enable
>>     - add file URI for migration
>>     - add mode and cpr-exec-args migration parameters
>>     - add per-mode migration blockers
>>     - add mode checks in migration notifiers
>>     - fix suspended runstate during migration
>>     - replace RAM_ANON flag with RAM_NAMED_FILE
>>     - support memory-backend-epc
>>
>> Steve Sistare (44):
>>   migration: fix populate_vfio_info       --- reboot mode ---
>>   memory: RAM_NAMED_FILE flag
>>   migration: file URI
>>   migration: mode parameter
>>   migration: migrate-enable-mode option
>>   migration: simplify blockers
>>   migration: per-mode blockers
>>   cpr: relax some blockers
>>   cpr: reboot mode
>>
>>   qdev-properties: strList                --- exec mode ---
>>   qapi: strList_from_string
>>   qapi: QAPI_LIST_LENGTH
>>   qapi: strv_from_strList
>>   qapi: strList unit tests
>>   migration: cpr-exec-args parameter
>>   migration: simplify notifiers
>>   migration: check mode in notifiers
>>   memory: flat section iterator
>>   oslib: qemu_clear_cloexec
>>   vl: helper to request re-exec
>>   cpr: preserve extra state
>>   cpr: exec mode
>>   cpr: add exec-mode blockers
>>   cpr: ram block blockers
>>   cpr: only-cpr-capable
>>   cpr: Mismatched GPAs fix
>>   hostmem-memfd: cpr support
>>   hostmem-epc: cpr support
>>
>>   pci: export msix_is_pending             --- vfio for exec ---
>>   vfio-pci: refactor for cpr
>>   vfio-pci: cpr part 1 (fd and dma)
>>   vfio-pci: cpr part 2 (msi)
>>   vfio-pci: cpr part 3 (intx)
>>   vfio-pci: recover from unmap-all-vaddr failure
>>
>>   chardev: cpr framework                  --- misc for exec ---
>>   chardev: cpr for simple devices
>>   chardev: cpr for pty
>>   python/machine: QEMUMachine full_args
>>   python/machine: QEMUMachine reopen_qmp_connection
>>   tests/avocado: add cpr regression test
>>
>>   vl: start on wakeup request             --- vfio for reboot ---
>>   migration: fix suspended runstate
>>   migration: notifier error reporting
>>   vfio: allow cpr-reboot migration if suspended
>>
>> Mark Kanda, Steve Sistare (2):
>>   vhost: reset vhost devices for cpr
>>   chardev: cpr for sockets
>>
>>  MAINTAINERS | 14 ++
>>  accel/xen/xen-all.c | 3 +
>>  backends/hostmem-epc.c | 18 +-
>>  backends/hostmem-file.c | 1 +
>>  backends/hostmem-memfd.c | 22 ++-
>>  backends/tpm/tpm_emulator.c | 11 +-
>>  block/parallels.c | 7 +-
>>  block/qcow.c | 7 +-
>>  block/vdi.c | 7 +-
>>  block/vhdx.c | 7 +-
>>  block/vmdk.c | 7 +-
>>  block/vpc.c | 7 +-
>>  block/vvfat.c | 7 +-
>>  chardev/char-mux.c | 1 +
>>  chardev/char-null.c | 1 +
>>  chardev/char-pty.c | 16 +-
>>  chardev/char-serial.c | 1 +
>>  chardev/char-socket.c | 48 +++++
>>  chardev/char-stdio.c | 31 +++
>>  chardev/char.c | 49 ++++-
>>  dump/dump.c | 4 +-
>>  gdbstub.c | 1 +
>>  hmp-commands.hx | 2 +-
>>  hw/9pfs/9p.c | 11 +-
>>  hw/core/qdev-properties-system.c | 12 ++
>>  hw/core/qdev-properties.c | 44 +++++
>>  hw/display/virtio-gpu-base.c | 8 +-
>>  hw/intc/arm_gic_kvm.c | 3 +-
>>  hw/intc/arm_gicv3_its_kvm.c | 3 +-
>>  hw/intc/arm_gicv3_kvm.c | 3 +-
>>  hw/misc/ivshmem.c | 8 +-
>>  hw/net/virtio-net.c | 10 +-
>>  hw/pci/msix.c | 2 +-
>>  hw/pci/pci.c | 12 ++
>>  hw/ppc/pef.c | 2 +-
>>  hw/ppc/spapr.c | 2 +-
>>  hw/ppc/spapr_events.c | 2 +-
>>  hw/ppc/spapr_rtas.c | 2 +-
>>  hw/remote/proxy.c | 7 +-
>>  hw/s390x/s390-virtio-ccw.c | 9 +-
>>  hw/scsi/vhost-scsi.c | 9 +-
>>  hw/vfio/common.c | 235 +++++++++++++++++++----
>>  hw/vfio/cpr.c | 177 ++++++++++++++++++
>>  hw/vfio/meson.build | 1 +
>>  hw/vfio/migration.c | 23 +--
>>  hw/vfio/pci.c | 336 ++++++++++++++++++++++++++++-----
>>  hw/vfio/trace-events | 1 +
>>  hw/virtio/vhost-vdpa.c | 6 +-
>>  hw/virtio/vhost.c | 32 +++-
>>  include/chardev/char-socket.h | 1 +
>>  include/chardev/char.h | 5 +
>>  include/exec/memory.h | 48 +++++
>>  include/exec/ram_addr.h | 1 +
>>  include/exec/ramblock.h | 1 +
>>  include/hw/pci/msix.h | 1 +
>>  include/hw/qdev-properties-system.h | 4 +
>>  include/hw/qdev-properties.h | 3 +
>>  include/hw/vfio/vfio-common.h | 12 ++
>>  include/hw/virtio/vhost.h | 1 +
>>  include/migration/blocker.h | 69 ++++++-
>>  include/migration/cpr-state.h | 30 +++
>>  include/migration/cpr.h | 20 ++
>>  include/migration/misc.h | 13 +-
>>  include/migration/vmstate.h | 2 +
>>  include/qapi/util.h | 28 +++
>>  include/qemu/osdep.h | 9 +
>>  include/sysemu/runstate.h | 2 +
>>  migration/cpr-state.c | 362 ++++++++++++++++++++++++++++++++++++
>>  migration/cpr.c | 85 +++++++++
>>  migration/file.c | 62 ++++++
>>  migration/file.h | 14 ++
>>  migration/meson.build | 3 +
>>  migration/migration.c | 268 +++++++++++++++++++++++---
>>  migration/ram.c | 24 ++-
>>  migration/target.c | 1 +
>>  migration/trace-events | 12 ++
>>  monitor/hmp-cmds.c | 59 +++---
>>  monitor/hmp.c | 3 +
>>  monitor/qmp.c | 4 +
>>  python/qemu/machine/machine.py | 14 ++
>>  qapi/char.json | 7 +-
>>  qapi/migration.json | 68 ++++++-
>>  qapi/qapi-util.c | 37 ++++
>>  qemu-options.hx | 50 ++++-
>>  replay/replay.c | 4 +
>>  softmmu/memory.c | 31 ++-
>>  softmmu/physmem.c | 100 +++++++++-
>>  softmmu/runstate.c | 42 ++++-
>>  softmmu/vl.c | 10 +
>>  stubs/cpr-state.c | 26 +++
>>  stubs/meson.build | 2 +
>>  stubs/migr-blocker.c | 9 +-
>>  stubs/migration.c | 33 ++++
>>  target/i386/kvm/kvm.c | 8 +-
>>  target/i386/nvmm/nvmm-all.c | 4 +-
>>  target/i386/sev.c | 2 +-
>>  target/i386/whpx/whpx-all.c | 3 +-
>>  tests/avocado/cpr.py | 176 ++++++++++++++++++
>>  tests/unit/meson.build | 1 +
>>  tests/unit/test-strlist.c | 81 ++++++++
>>  trace-events | 1 +
>>  ui/spice-core.c | 5 +-
>>  ui/vdagent.c | 5 +-
>>  util/oslib-posix.c | 9 +
>>  util/oslib-win32.c | 4 +
>>  105 files changed, 2781 insertions(+), 330 deletions(-)
>>  create mode 100644 hw/vfio/cpr.c
>>  create mode 100644 include/migration/cpr-state.h
>>  create mode 100644 include/migration/cpr.h
>>  create mode 100644 migration/cpr-state.c
>>  create mode 100644 migration/cpr.c
>>  create mode 100644 migration/file.c
>>  create mode 100644 migration/file.h
>>  create mode 100644 stubs/cpr-state.c
>>  create mode 100644 stubs/migration.c
>>  create mode 100644 tests/avocado/cpr.py
>>  create mode 100644 tests/unit/test-strlist.c
>>
