This patch adds the core infrastructure to generate Kexec HandOver
metadata. Kexec HandOver is a mechanism that allows Linux to preserve
state - arbitrary properties as well as memory locations - across kexec.
It does so using 3 concepts:
1) Device Tree - Every KHO kexec carries a KHO specific flattened
device tree blob that describes the state of the system. Device
drivers can register to KHO to serialize their state before kexec.
2) Mem cache - A memblocks like structure that contains full page
ranges of reservations. These can not be part of the architectural
reservations, because they differ on every kexec.
3) Scratch Region - A CMA region that we allocate in the first kernel.
CMA gives us the guarantee that no handover pages land in that
region, because handover pages must be at a static physical memory
location. We use this region as the place to load future kexec
images into which then won't collide with any handover data.
Signed-off-by: Alexander Graf
---
Documentation/ABI/testing/sysfs-kernel-kho| 53 +++
.../admin-guide/kernel-parameters.txt | 10 +
MAINTAINERS | 1 +
include/linux/kexec.h | 24 ++
include/uapi/linux/kexec.h| 6 +
kernel/Makefile | 1 +
kernel/kexec_kho_out.c| 316 ++
7 files changed, 411 insertions(+)
create mode 100644 Documentation/ABI/testing/sysfs-kernel-kho
create mode 100644 kernel/kexec_kho_out.c
diff --git a/Documentation/ABI/testing/sysfs-kernel-kho
b/Documentation/ABI/testing/sysfs-kernel-kho
new file mode 100644
index ..f69e7b81a337
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-kernel-kho
@@ -0,0 +1,53 @@
+What: /sys/kernel/kho/active
+Date: December 2023
+Contact: Alexander Graf
+Description:
+ Kexec HandOver (KHO) allows Linux to transition the state of
+ compatible drivers into the next kexec'ed kernel. To do so,
+ device drivers will serialize their current state into a DT.
+ While the state is serialized, they are unable to perform
+ any modifications to state that was serialized, such as
+ handed over memory allocations.
+
+ When this file contains "1", the system is in the transition
+ state. When contains "0", it is not. To switch between the
+ two states, echo the respective number into this file.
+
+What: /sys/kernel/kho/dt_max
+Date: December 2023
+Contact: Alexander Graf
+Description:
+ KHO needs to allocate a buffer for the DT that gets
+ generated before it knows the final size. By default, it
+ will allocate 10 MiB for it. You can write to this file
+ to modify the size of that allocation.
+
+What: /sys/kernel/kho/scratch_len
+Date: December 2023
+Contact: Alexander Graf
+Description:
+ To support continuous KHO kexecs, we need to reserve a
+ physically contiguous memory region that will always stay
+ available for future kexec allocations. This file describes
+ the length of that memory region. Kexec user space tooling
+ can use this to determine where it should place its payload
+ images.
+
+What: /sys/kernel/kho/scratch_phys
+Date: December 2023
+Contact: Alexander Graf
+Description:
+ To support continuous KHO kexecs, we need to reserve a
+ physically contiguous memory region that will always stay
+ available for future kexec allocations. This file describes
+ the physical location of that memory region. Kexec user space
+ tooling can use this to determine where it should place its
+ payload images.
+
+What: /sys/kernel/kho/dt
+Date: December 2023
+Contact: Alexander Graf
+Description:
+ When KHO is active, the kernel exposes the generated DT that
+ carries its current KHO state in this file. Kexec user space
+ tooling can use this as input file for the KHO payload image.
diff --git a/Documentation/admin-guide/kernel-parameters.txt
b/Documentation/admin-guide/kernel-parameters.txt
index 51575cd31741..efeef075617e 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2504,6 +2504,16 @@
kgdbwait[KGDB] Stop kernel execution and enter the
kernel debugger at the earliest opportunity.
+ kho_scratch=n[KMG] [KEXEC] Sets the size of the KHO scratch
+ region. The KHO scratch region is a physically
+ memory range that can only be used for non-kernel
+