Overview
=========

Implemented MPIPL (Memory Preserving IPL, aka fadump) on the PowerNV machine
in QEMU.

Fadump is an alternative dump mechanism to kdump, in which the firmware does
a memory preserving boot and the second/crashkernel is booted fresh like a
normal system reset, instead of the crashed kernel loading the
second/crashkernel as in kdump.

MPIPL on PowerNV is similar to fadump on Pseries. The idea is the same,
memory preservation: on PowerNV we are assisted by the SBE (Self Boot
Engine) and Hostboot, while on Pseries we are assisted by PHyp (Power
Hypervisor).

To implement this in baremetal/powernv QEMU, we need to export an
"ibm,opal/dump" node in the device tree, to tell the kernel that we support
MPIPL.

Once the kernel sees the support, and "fadump=on" is passed on the command
line, the kernel registers the memory regions to preserve with Skiboot.
The kernel sends this data using OPAL calls, after which skiboot/OPAL saves
the memory region details to the MDST and MDDT tables (S-source,
D-destination).

Then, in the event of a kernel crash, the kernel initiates MPIPL with
another OPAL call (opal_cec_reboot2), and this request goes to Skiboot.
Skiboot then triggers the "S0 Interrupt" to the SBE, along with OPAL's
relocated base address. The SBE then stops all core clocks and runs only
the particular ISteps needed for a memory preserving boot.

Then hostboot comes up and, with the help of the relocated base address,
accesses the MDST and MDDT tables and preserves the memory regions according
to the data in these tables. After preserving, it writes the preserved
memory region details to the MDRT table (R-result), for the kernel to know
where/whether a memory region is preserved.

Both the SBE's and hostboot's responsibilities are implemented in the SBE
code in QEMU.

Then, in the second kernel/crashkernel boot, OPAL passes the "mpipl-boot"
property for the kernel to know that a dump is active, which the kernel then
exports in /proc/vmcore.

Testing
====================

1. Git tree for testing: https://gitlab.com/adi-g15-ibm/qemu/tree/fadump-powernv-v3

2. Gitlab pipeline: https://gitlab.com/adi-g15-ibm/qemu/-/pipelines/2348793709

3. Analysing generated vmcore:

    # ls -lh /proc/vmcore
    -r-------- 1 root root 4.5G Feb 25 07:33 /proc/vmcore

    # file /proc/vmcore
    /proc/vmcore: ELF 64-bit LSB core file, 64-bit PowerPC or cisco 7500, OpenPOWER ELF V2 ABI, version 1 (SYSV), SVR4-style

    # crash vmlinux vmcore
    ...
          KERNEL: vmlinux-38fec10eb60d-network
        DUMPFILE: vmcore-powernv-25feb26
            CPUS: 4
            DATE: Thu Jan  1 05:30:00 IST 1970
          UPTIME: 00:05:23
    LOAD AVERAGE: 0.12, 0.08, 0.03
           TASKS: 101
        NODENAME: buildroot
         RELEASE: 6.14.0
         VERSION: #1 SMP Thu Apr 3 08:06:13 CDT 2025
         MACHINE: ppc64le  (1000 Mhz)
          MEMORY: 6 GB
           PANIC: "Kernel panic - not syncing: sysrq triggered crash"
             PID: 257
         COMMAND: "sh"
            TASK: c000000008066600  [THREAD_INFO: c000000008066600]
             CPU: 2
           STATE: TASK_RUNNING (PANIC)

    crash>
    # ps and kmem -i work

Changelog
====================

v2 -> v3:
  * rebase to upstream, changes in patches below
  * #2/10: no code change. add comment that skiboot triggers S0
  * #3/10: stash command: handle invalid skiboot_base sent by guest
  * #4/10: s/src_len/data_len/
  * #4/10: use TARGET_FMT_lx/PRIx64 instead of %lx to prevent build errors
  * #4/10: stop copying chunks once copying a chunk fails
  * #5/10: use address_space_{read,write} instead of
    cpu_physical_memory_{read,write}
  * #5/10: add more SPRs to be saved, same set of SPRs as spapr FADump,
    except CR and FPSCR
  * #7/10: only export "mpipl-boot" property if preserving cpu states and
    writing MDRT was successful, otherwise continue with normal reboot
  * #7/10: use address_space_{read,write} instead of
    cpu_physical_memory_{read,write}
  * #8/10: reword commit description to mention fw-load-area, no code change
  * #10/10: add entry in MAINTAINERS file

Aditya Gupta (10):
  ppc/pnv: Move SBE host doorbell function to top of file
  ppc/mpipl: Implement S0 SBE interrupt
  ppc/pnv: Handle stash command in PowerNV SBE
  pnv/mpipl: Preserve memory regions as per MDST/MDDT tables
  pnv/mpipl: Preserve CPU registers after crash
  pnv/mpipl: Set thread entry size to be allocated by firmware
  pnv/mpipl: Write the preserved CPU and MDRT state
  pnv/mpipl: Enable MPIPL support
  tests/functional: Add test for MPIPL in PowerNV
  MAINTAINERS: Add entry for MPIPL (PowerNV)

 MAINTAINERS                           |   8 +
 hw/ppc/meson.build                    |   1 +
 hw/ppc/pnv.c                          | 104 ++++++
 hw/ppc/pnv_mpipl.c                    | 482 ++++++++++++++++++++++++++
 hw/ppc/pnv_sbe.c                      |  84 ++++-
 include/hw/ppc/pnv.h                  |   7 +
 include/hw/ppc/pnv_mpipl.h            | 168 +++++++++
 tests/functional/ppc64/test_fadump.py |  35 +-
 8 files changed, 858 insertions(+), 31 deletions(-)
 create mode 100644 hw/ppc/pnv_mpipl.c
 create mode 100644 include/hw/ppc/pnv_mpipl.h

-- 
2.53.0
