[PATCH 4/6] kexec: improve the readability of crash_shrink_memory()
The major adjustments are:
1. end = start + new_size.
   The 'end' here is not an accurate representation, because it is not the
   new end of crashk_res but the start of ram_res; the two differ by 1. So
   eliminate it and replace it with ram_res->start.
2. Use 'ram_res->start' and 'ram_res->end' as arguments to
   crash_free_reserved_phys_range() to indicate that the memory covered by
   'ram_res' is released from crashk_res. And keep it close to
   insert_resource().
3. Replace 'if (start == end)' with 'if (!new_size)', which clearly
   indicates that all of the crashk memory will be released.

No functional change.

Signed-off-by: Zhen Lei
---
 kernel/kexec_core.c | 15 +--
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index bcc86a250ab3bf9..69fe92141b0b62d 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -1108,7 +1108,6 @@ ssize_t crash_get_memory_size(void)
 int crash_shrink_memory(unsigned long new_size)
 {
 	int ret = 0;
-	unsigned long start, end;
 	unsigned long old_size;
 	struct resource *ram_res;

@@ -1119,9 +1118,7 @@ int crash_shrink_memory(unsigned long new_size)
 		ret = -ENOENT;
 		goto unlock;
 	}
-	start = crashk_res.start;
-	end = crashk_res.end;
-	old_size = (end == 0) ? 0 : end - start + 1;
+	old_size = !crashk_res.end ? 0 : resource_size(&crashk_res);
 	new_size = roundup(new_size, KEXEC_CRASH_MEM_ALIGN);
 	if (new_size >= old_size) {
 		ret = (new_size == old_size) ? 0 : -EINVAL;
@@ -1134,22 +1131,20 @@ int crash_shrink_memory(unsigned long new_size)
 		goto unlock;
 	}

-	end = start + new_size;
-	crash_free_reserved_phys_range(end, crashk_res.end);
-
-	ram_res->start = end;
+	ram_res->start = crashk_res.start + new_size;
 	ram_res->end = crashk_res.end;
 	ram_res->flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM;
 	ram_res->name = "System RAM";

-	if (start == end) {
+	if (!new_size) {
 		release_resource(&crashk_res);
 		crashk_res.start = 0;
 		crashk_res.end = 0;
 	} else {
-		crashk_res.end = end - 1;
+		crashk_res.end = ram_res->start - 1;
 	}

+	crash_free_reserved_phys_range(ram_res->start, ram_res->end);
 	insert_resource(&iomem_resource, ram_res);

 unlock:
-- 
2.25.1

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
[PATCH 0/6] kexec: enable kexec_crash_size to support two crash kernel regions
When crashkernel=X fails to reserve a region under 4G, it falls back to
reserving a region above 4G, and a region of the default size is also
reserved under 4G. Unfortunately, /sys/kernel/kexec_crash_size only supports
one crash kernel region now, so the user cannot see the reserved low memory
by reading /sys/kernel/kexec_crash_size. Also, low memory cannot be freed by
writing this file.

For example:
resource_size(crashk_res) = 512M
resource_size(crashk_low_res) = 256M

The result of 'cat /sys/kernel/kexec_crash_size' is 512M, but it should be
768M. When we execute 'echo 0 > /sys/kernel/kexec_crash_size', the size of
crashk_res becomes 0 and resource_size(crashk_low_res) is still 256M, which
is incorrect.

Since crashk_res manages the memory with high address and crashk_low_res
manages the memory with low address, crashk_low_res is shrunk only when all
of crashk_res has been shrunk. And because crashk_res is always the one used
when there is only one crash kernel region, if all of crashk_res is shrunk
and crashk_low_res still exists, swap them.

Zhen Lei (6):
  kexec: fix a memory leak in crash_shrink_memory()
  kexec: delete a useless check in crash_shrink_memory()
  kexec: clear crashk_res if all its memory has been released
  kexec: improve the readability of crash_shrink_memory()
  kexec: add helper __crash_shrink_memory()
  kexec: enable kexec_crash_size to support two crash kernel regions

 kernel/kexec_core.c | 92 +++--
 1 file changed, 64 insertions(+), 28 deletions(-)

-- 
2.25.1
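The reporting half of the example above can be sketched in plain C. This is a user-space model and not kernel code: struct range stands in for struct resource, and range_size() and total_crash_size() are hypothetical names. Only the "end == 0 means not reserved" convention and the sum over both regions are taken from the series.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for struct resource: inclusive [start, end]; end == 0 means unset. */
struct range {
	uint64_t start;
	uint64_t end;
};

#define MiB(x) ((uint64_t)(x) << 20)

/* Size of one region, treating a cleared (end == 0) region as empty. */
static uint64_t range_size(const struct range *r)
{
	return r->end ? r->end - r->start + 1 : 0;
}

/* What kexec_crash_size is meant to report: high region plus low region. */
static uint64_t total_crash_size(const struct range *high,
				 const struct range *low)
{
	return range_size(high) + range_size(low);
}
```

With a 512M high region and a 256M low region, total_crash_size() yields 768M, which is the value the cover letter says the sysfs file should show.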
[PATCH 1/6] kexec: fix a memory leak in crash_shrink_memory()
If the value of parameter 'new_size' is in the semi-open and semi-closed
interval (crashk_res.end - KEXEC_CRASH_MEM_ALIGN + 1, crashk_res.end], the
calculation result of ram_res is:
	ram_res->start = crashk_res.end + 1
	ram_res->end = crashk_res.end

The operation of function insert_resource() fails, and ram_res is not added
to iomem_resource. As a result, the memory of the control block ram_res is
leaked.

In fact, on all architectures, the start address and size of crashk_res are
already aligned by KEXEC_CRASH_MEM_ALIGN. Therefore, we do not need to round
up crashk_res.start again. Instead, we should round up 'new_size' in advance.

Fixes: 6480e5a09237 ("kdump: add missing RAM resource in crash_shrink_memory()")
Fixes: 06a7f711246b ("kexec: premit reduction of the reserved memory size")
Signed-off-by: Zhen Lei
---
 kernel/kexec_core.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 3d578c6fefee385..22acee18195a591 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -1122,6 +1122,7 @@ int crash_shrink_memory(unsigned long new_size)
 	start = crashk_res.start;
 	end = crashk_res.end;
 	old_size = (end == 0) ? 0 : end - start + 1;
+	new_size = roundup(new_size, KEXEC_CRASH_MEM_ALIGN);
 	if (new_size >= old_size) {
 		ret = (new_size == old_size) ? 0 : -EINVAL;
 		goto unlock;
@@ -1133,9 +1134,7 @@ int crash_shrink_memory(unsigned long new_size)
 		goto unlock;
 	}

-	start = roundup(start, KEXEC_CRASH_MEM_ALIGN);
-	end = roundup(start + new_size, KEXEC_CRASH_MEM_ALIGN);
-
+	end = start + new_size;
 	crash_free_reserved_phys_range(end, crashk_res.end);

 	if ((start == end) && (crashk_res.parent != NULL))
-- 
2.25.1
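To make the failure case concrete, here is a small user-space sketch of the two orderings (check then round, versus round then check). It assumes a 4096-byte alignment purely for illustration; the kernel uses KEXEC_CRASH_MEM_ALIGN. The names struct range, ram_range_before() and ram_range_after() are hypothetical and only model the arithmetic, not resource insertion.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed alignment for illustration; the kernel uses KEXEC_CRASH_MEM_ALIGN. */
#define SKETCH_ALIGN 4096ULL

/* Stand-in for struct resource: inclusive [start, end]; end == 0 means unset. */
struct range {
	uint64_t start;
	uint64_t end;
};

static uint64_t round_up_to(uint64_t x, uint64_t a)
{
	return (x + a - 1) / a * a;
}

static uint64_t range_size(const struct range *r)
{
	return r->end ? r->end - r->start + 1 : 0;
}

/*
 * Ordering before the fix: compare the unrounded size first, then round the
 * addresses. Returns 1 if a RAM range was produced, 0 if nothing is freed.
 */
static int ram_range_before(const struct range *crashk, uint64_t new_size,
			    struct range *ram)
{
	uint64_t start;

	if (new_size >= range_size(crashk))
		return 0;
	start = round_up_to(crashk->start, SKETCH_ALIGN);
	ram->start = round_up_to(start + new_size, SKETCH_ALIGN);
	ram->end = crashk->end;
	return 1;
}

/* Ordering after the fix: round new_size first, then compare and split. */
static int ram_range_after(const struct range *crashk, uint64_t new_size,
			   struct range *ram)
{
	new_size = round_up_to(new_size, SKETCH_ALIGN);
	if (new_size >= range_size(crashk))
		return 0;
	ram->start = crashk->start + new_size;
	ram->end = crashk->end;
	return 1;
}
```

For a 1 MiB region and a new_size inside its last aligned block, the first ordering produces a range whose start is one past its end, which is the range insert_resource() rejects; the second ordering returns early and never allocates anything.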
[PATCH 5/6] kexec: add helper __crash_shrink_memory()
No functional change, in preparation for the next patch so that it is easier
to review.

Signed-off-by: Zhen Lei
---
 kernel/kexec_core.c | 50 +
 1 file changed, 28 insertions(+), 22 deletions(-)

diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 69fe92141b0b62d..e82bc6d6634136a 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -1105,11 +1105,37 @@ ssize_t crash_get_memory_size(void)
 	return size;
 }

+int __crash_shrink_memory(struct resource *old_res, unsigned long new_size)
+{
+	struct resource *ram_res;
+
+	ram_res = kzalloc(sizeof(*ram_res), GFP_KERNEL);
+	if (!ram_res)
+		return -ENOMEM;
+
+	ram_res->start = old_res->start + new_size;
+	ram_res->end = old_res->end;
+	ram_res->flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM;
+	ram_res->name = "System RAM";
+
+	if (!new_size) {
+		release_resource(old_res);
+		old_res->start = 0;
+		old_res->end = 0;
+	} else {
+		crashk_res.end = ram_res->start - 1;
+	}
+
+	crash_free_reserved_phys_range(ram_res->start, ram_res->end);
+	insert_resource(&iomem_resource, ram_res);
+
+	return 0;
+}
+
 int crash_shrink_memory(unsigned long new_size)
 {
 	int ret = 0;
 	unsigned long old_size;
-	struct resource *ram_res;

 	if (!kexec_trylock())
 		return -EBUSY;
@@ -1125,27 +1151,7 @@ int crash_shrink_memory(unsigned long new_size)
 		goto unlock;
 	}

-	ram_res = kzalloc(sizeof(*ram_res), GFP_KERNEL);
-	if (!ram_res) {
-		ret = -ENOMEM;
-		goto unlock;
-	}
-
-	ram_res->start = crashk_res.start + new_size;
-	ram_res->end = crashk_res.end;
-	ram_res->flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM;
-	ram_res->name = "System RAM";
-
-	if (!new_size) {
-		release_resource(&crashk_res);
-		crashk_res.start = 0;
-		crashk_res.end = 0;
-	} else {
-		crashk_res.end = ram_res->start - 1;
-	}
-
-	crash_free_reserved_phys_range(ram_res->start, ram_res->end);
-	insert_resource(&iomem_resource, ram_res);
+	ret = __crash_shrink_memory(&crashk_res, new_size);

 unlock:
 	kexec_unlock();
-- 
2.25.1
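The range arithmetic the helper performs can be modelled outside the kernel as follows. This is only a sketch: struct range and shrink_range() are hypothetical names, and allocation, freeing and resource-tree updates are left out. One deliberate difference from the diff above: the sketch updates the region it was handed (old->end), whereas the else branch in the patch writes crashk_res.end; the two coincide as long as the helper is only called with &crashk_res.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for struct resource: inclusive [start, end]; end == 0 means unset. */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Keep the first new_size bytes of 'old' and report the rest in 'freed'.
 * new_size is assumed to be aligned and smaller than the current size,
 * as the caller in the patch guarantees.
 */
static void shrink_range(struct range *old, uint64_t new_size,
			 struct range *freed)
{
	freed->start = old->start + new_size;
	freed->end = old->end;

	if (!new_size) {
		/* Everything is released: clear the region entirely. */
		old->start = 0;
		old->end = 0;
	} else {
		/* The kept part ends right before the freed part begins. */
		old->end = freed->start - 1;
	}
}
```

Shrinking [0x1000, 0x4fff] to 0x1000 bytes leaves [0x1000, 0x1fff] reserved and hands back [0x2000, 0x4fff]; shrinking to zero hands back the whole range and clears the region.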
[PATCH 6/6] kexec: enable kexec_crash_size to support two crash kernel regions
The crashk_low_res should be considered by /sys/kernel/kexec_crash_size to
support two crash kernel regions. Since crashk_res manages the memory with
high address and crashk_low_res manages the memory with low address,
crashk_low_res is shrunken only when all crashk_res is shrunken. And because
when there is only one crash kernel region, crashk_res is always used.
Therefore, if all crashk_res is shrunken and crashk_low_res still exists,
swap them.

Signed-off-by: Zhen Lei
---
 kernel/kexec_core.c | 43 ++-
 1 file changed, 38 insertions(+), 5 deletions(-)

diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index e82bc6d6634136a..c1d50f6566300d9 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -1091,6 +1091,11 @@ __bpf_kfunc void crash_kexec(struct pt_regs *regs)
 	}
 }

+static inline resource_size_t crash_resource_size(const struct resource *res)
+{
+	return !res->end ? 0 : resource_size(res);
+}
+
 ssize_t crash_get_memory_size(void)
 {
 	ssize_t size = 0;
@@ -1098,8 +1103,8 @@ ssize_t crash_get_memory_size(void)
 	if (!kexec_trylock())
 		return -EBUSY;

-	if (crashk_res.end != crashk_res.start)
-		size = resource_size(&crashk_res);
+	size += crash_resource_size(&crashk_res);
+	size += crash_resource_size(&crashk_low_res);

 	kexec_unlock();
 	return size;
@@ -1135,7 +1140,7 @@ int __crash_shrink_memory(struct resource *old_res, unsigned long new_size)
 int crash_shrink_memory(unsigned long new_size)
 {
 	int ret = 0;
-	unsigned long old_size;
+	unsigned long old_size, low_size;

 	if (!kexec_trylock())
 		return -EBUSY;
@@ -1144,14 +1149,42 @@ int crash_shrink_memory(unsigned long new_size)
 		ret = -ENOENT;
 		goto unlock;
 	}
-	old_size = !crashk_res.end ? 0 : resource_size(&crashk_res);
+
+	low_size = crash_resource_size(&crashk_low_res);
+	old_size = crash_resource_size(&crashk_res) + low_size;
 	new_size = roundup(new_size, KEXEC_CRASH_MEM_ALIGN);
 	if (new_size >= old_size) {
 		ret = (new_size == old_size) ? 0 : -EINVAL;
 		goto unlock;
 	}

-	ret = __crash_shrink_memory(&crashk_res, new_size);
+	/*
+	 * (low_size > new_size) implies that low_size is greater than zero.
+	 * This also means that if low_size is zero, the else branch is taken.
+	 *
+	 * If low_size is greater than 0, (low_size > new_size) indicates that
+	 * crashk_low_res also needs to be shrunken. Otherwise, only crashk_res
+	 * needs to be shrunken.
+	 */
+	if (low_size > new_size) {
+		ret = __crash_shrink_memory(&crashk_res, 0);
+		if (ret)
+			goto unlock;
+
+		ret = __crash_shrink_memory(&crashk_low_res, new_size);
+	} else {
+		ret = __crash_shrink_memory(&crashk_res, new_size - low_size);
+	}
+
+	/* Swap crashk_res and crashk_low_res if needed */
+	if (!crashk_res.end && crashk_low_res.end) {
+		crashk_res.start = crashk_low_res.start;
+		crashk_res.end = crashk_low_res.end;
+		release_resource(&crashk_low_res);
+		crashk_low_res.start = 0;
+		crashk_low_res.end = 0;
+		insert_resource(&iomem_resource, &crashk_res);
+	}

 unlock:
 	kexec_unlock();
-- 
2.25.1
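The way a requested total is split across the two regions can be followed with a size-only model. This is a sketch under simplifying assumptions: it tracks sizes rather than address ranges, assumes new_size is already aligned, and uses the hypothetical names struct crash_sizes and shrink_two(). It returns -1 where the kernel returns -EINVAL.

```c
#include <assert.h>
#include <stdint.h>

#define MiB(x) ((uint64_t)(x) << 20)

/* Sizes of the two reservations; 0 means the region is not present. */
struct crash_sizes {
	uint64_t high;	/* models crashk_res */
	uint64_t low;	/* models crashk_low_res */
};

/* Shrink the combined reservation to new_size; growing is rejected. */
static int shrink_two(struct crash_sizes *s, uint64_t new_size)
{
	uint64_t old_size = s->high + s->low;

	if (new_size >= old_size)
		return new_size == old_size ? 0 : -1;

	if (s->low > new_size) {
		/* The high region goes away entirely, the low one shrinks. */
		s->high = 0;
		s->low = new_size;
	} else {
		/* The low region is untouched, the high one gives up the rest. */
		s->high = new_size - s->low;
	}

	/* If only the low region is left, it takes over the primary slot. */
	if (!s->high && s->low) {
		s->high = s->low;
		s->low = 0;
	}
	return 0;
}
```

Starting from 512M high and 256M low: a request for 384M leaves 128M high and 256M low; a request for 128M releases the high region, shrinks the low one to 128M, and the swap then moves it into the primary slot.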
[PATCH 2/6] kexec: delete a useless check in crash_shrink_memory()
The check '(crashk_res.parent != NULL)' was added by commit e05bd3367bd3
("kexec: fix Oops in crash_shrink_memory()"), but it's stale now. Because if
'crashk_res' is not reserved, it will be zero in size and will be intercepted
by the above 'if (new_size >= old_size)'.

Before:
	if (new_size >= end - start + 1)

Now:
	old_size = (end == 0) ? 0 : end - start + 1;
	if (new_size >= old_size)

Signed-off-by: Zhen Lei
---
 kernel/kexec_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 22acee18195a591..d1ab139dd49035e 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -1137,7 +1137,7 @@ int crash_shrink_memory(unsigned long new_size)
 	end = start + new_size;
 	crash_free_reserved_phys_range(end, crashk_res.end);

-	if ((start == end) && (crashk_res.parent != NULL))
+	if (start == end)
 		release_resource(&crashk_res);

 	ram_res->start = end;
-- 
2.25.1
[PATCH 3/6] kexec: clear crashk_res if all its memory has been released
If the resource of crashk_res has been released, it is better to clear
crashk_res.start and crashk_res.end. Because 'end = start - 1' is not
reasonable, and in some places the test is based on crashk_res.end, not
resource_size(&crashk_res).

Signed-off-by: Zhen Lei
---
 kernel/kexec_core.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index d1ab139dd49035e..bcc86a250ab3bf9 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -1137,15 +1137,18 @@ int crash_shrink_memory(unsigned long new_size)
 	end = start + new_size;
 	crash_free_reserved_phys_range(end, crashk_res.end);

-	if (start == end)
-		release_resource(&crashk_res);
-
 	ram_res->start = end;
 	ram_res->end = crashk_res.end;
 	ram_res->flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM;
 	ram_res->name = "System RAM";

-	crashk_res.end = end - 1;
+	if (start == end) {
+		release_resource(&crashk_res);
+		crashk_res.start = 0;
+		crashk_res.end = 0;
+	} else {
+		crashk_res.end = end - 1;
+	}

 	insert_resource(&iomem_resource, ram_res);

-- 
2.25.1
[PATCH v4 2/2] arm64: kdump: Support crashkernel=X fall back to reserve region above DMA zones
For crashkernel=X without '@offset', select a region within DMA zones first,
and fall back to reserve region above DMA zones. This allows users to use the
same configuration on multiple platforms.

Signed-off-by: Zhen Lei
Acked-by: Baoquan He
---
 Documentation/admin-guide/kernel-parameters.txt | 2 +-
 arch/arm64/mm/init.c | 17 -
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a7b7147447b8bf8..ef6d922ed26b9dc 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -842,7 +842,7 @@
 			memory region [offset, offset + size] for that kernel
 			image. If '@offset' is omitted, then a suitable offset
 			is selected automatically.
-			[KNL, X86-64] Select a region under 4G first, and
+			[KNL, X86-64, ARM64] Select a region under 4G first, and
 			fall back to reserve region above 4G when '@offset'
 			hasn't been specified.
 			See Documentation/admin-guide/kdump/kdump.rst for further details.
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index ba7227179822d10..58a0bb2c17f18cf 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -132,6 +132,7 @@ static void __init reserve_crashkernel(void)
 	unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
 	char *cmdline = boot_command_line;
 	int ret;
+	bool fixed_base = false;

 	if (!IS_ENABLED(CONFIG_KEXEC_CORE))
 		return;
@@ -163,12 +164,26 @@ static void __init reserve_crashkernel(void)
 	crash_size = PAGE_ALIGN(crash_size);

 	/* User specifies base address explicitly. */
-	if (crash_base)
+	if (crash_base) {
+		fixed_base = true;
 		crash_max = crash_base + crash_size;
+	}

+retry:
 	crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
 					       crash_base, crash_max);
 	if (!crash_base) {
+		/*
+		 * If the first attempt was for low memory, fall back to
+		 * high memory, the minimum required low memory will be
+		 * reserved later.
+		 */
+		if (!fixed_base && (crash_max == CRASH_ADDR_LOW_MAX)) {
+			crash_max = CRASH_ADDR_HIGH_MAX;
+			crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE;
+			goto retry;
+		}
+
 		pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
 			crash_size);
 		return;
-- 
2.25.1
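The retry flow in this hunk can be exercised with a toy allocator. Everything below is an illustrative assumption rather than kernel code: struct toy_mem and toy_alloc() replace memblock, the SKETCH_* limits are made-up stand-ins for CRASH_ADDR_LOW_MAX and CRASH_ADDR_HIGH_MAX, and the fixed-base case is reduced to a flag that simply disables the fallback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_LOW_MAX	(4ULL << 30)	/* assumed stand-in for CRASH_ADDR_LOW_MAX */
#define SKETCH_HIGH_MAX	(1ULL << 40)	/* assumed stand-in for CRASH_ADDR_HIGH_MAX */
#define SKETCH_LOW_SIZE	(128ULL << 20)	/* DEFAULT_CRASH_KERNEL_LOW_SIZE in the series */

/* Toy memory model: bytes still free below and above the low limit. */
struct toy_mem {
	uint64_t low_free;
	uint64_t high_free;
};

/* Toy top-down allocator: returns a base below 'max', or 0 on failure. */
static uint64_t toy_alloc(const struct toy_mem *m, uint64_t size, uint64_t max)
{
	if (max > SKETCH_LOW_MAX && size <= m->high_free)
		return max - size;
	if (size <= m->low_free)
		return SKETCH_LOW_MAX - size;
	return 0;
}

/*
 * Try low memory first; if that fails and no fixed base was requested,
 * retry with the high limit and request the default low reservation.
 */
static uint64_t reserve_sketch(const struct toy_mem *m, uint64_t size,
			       bool fixed_base, uint64_t *low_size)
{
	uint64_t max = SKETCH_LOW_MAX;
	uint64_t base;

retry:
	base = toy_alloc(m, size, max);
	if (!base) {
		if (!fixed_base && max == SKETCH_LOW_MAX) {
			max = SKETCH_HIGH_MAX;
			*low_size = SKETCH_LOW_SIZE;
			goto retry;
		}
		return 0;
	}
	return base;
}
```

When low memory is too tight for the request, the second pass lands the region above the low limit and leaves a non-zero low_size behind, which is what later triggers the separate low reservation in the patch.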
[PATCH v4 1/2] arm64: kdump: Provide default size when crashkernel=Y,low is not specified
Try to allocate at least 128 MiB low memory automatically for the case that
crashkernel=,high is explicitly specified, while crashkernel=,low is omitted.
This allows users to focus more on the high memory requirements of their
business rather than the low memory requirements of the crash kernel booting.

Signed-off-by: Zhen Lei
Acked-by: Baoquan He
---
 Documentation/admin-guide/kernel-parameters.txt | 13 +
 arch/arm64/mm/init.c | 8 ++--
 2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 7dea58f4a69cc8c..a7b7147447b8bf8 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -862,26 +862,23 @@
 			available.
 			It will be ignored if crashkernel=X is specified.
 	crashkernel=size[KMG],low
-			[KNL, X86-64] range under 4G. When crashkernel=X,high
+			[KNL, X86-64, ARM64] range under 4G. When crashkernel=X,high
 			is passed, kernel could allocate physical memory region
 			above 4G, that cause second kernel crash on system
 			that require some amount of low memory, e.g. swiotlb
 			requires at least 64M+32K low memory, also enough extra
 			low memory is needed to make sure DMA buffers for 32-bit
 			devices won't run out. Kernel would try to allocate
-			at least 256M below 4G automatically.
+			default size of memory below 4G automatically. The default
+			size is platform dependent.
+			  --> x86: max(swiotlb_size_or_default() + 8MiB, 256MiB)
+			  --> arm64: 128MiB
 			This one lets the user specify own low range under 4G
 			for second kernel instead.
 			0: to disable low allocation.
 			It will be ignored when crashkernel=X,high is not used
 			or memory reserved is below 4G.

-			[KNL, ARM64] range in low memory.
-			This one lets the user specify a low range in the
-			DMA zone for the crash dump kernel.
-			It will be ignored when crashkernel=X,high is not used
-			or memory reserved is located in the DMA zones.
-
 	cryptomgr.notests
 			[KNL] Disable crypto self-tests

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 4b4651ee47f271a..ba7227179822d10 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -96,6 +96,8 @@ phys_addr_t __ro_after_init arm64_dma_phys_limit = PHYS_MASK + 1;
 #define CRASH_ADDR_LOW_MAX		arm64_dma_phys_limit
 #define CRASH_ADDR_HIGH_MAX		(PHYS_MASK + 1)

+#define DEFAULT_CRASH_KERNEL_LOW_SIZE	(128UL << 20)
+
 static int __init reserve_crashkernel_low(unsigned long long low_size)
 {
 	unsigned long long low_base;
@@ -147,7 +149,9 @@ static void __init reserve_crashkernel(void)
 		 * is not allowed.
 		 */
 		ret = parse_crashkernel_low(cmdline, 0, &crash_low_size, &crash_base);
-		if (ret && (ret != -ENOENT))
+		if (ret == -ENOENT)
+			crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE;
+		else if (ret)
 			return;

 		crash_max = CRASH_ADDR_HIGH_MAX;
@@ -170,7 +174,7 @@ static void __init reserve_crashkernel(void)
 		return;
 	}

-	if ((crash_base >= CRASH_ADDR_LOW_MAX) &&
+	if ((crash_base > CRASH_ADDR_LOW_MAX - crash_low_size) &&
 	     crash_low_size && reserve_crashkernel_low(crash_low_size)) {
 		memblock_phys_free(crash_base, crash_size);
 		return;
-- 
2.25.1
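The change to the parse_crashkernel_low() result handling boils down to a three-way decision, sketched here in user space. choose_low_size() and SKETCH_DEFAULT_LOW are hypothetical names; the parse result and parsed size are passed in as plain arguments rather than obtained from the real parser.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Mirrors the fixed 128 MiB default the patch introduces for arm64. */
#define SKETCH_DEFAULT_LOW (128ULL << 20)

/*
 * parse_ret models the return value of the ",low" parser:
 *   -ENOENT  option absent   -> fall back to the default size
 *   other    parse error     -> give up and propagate the error
 *   0        option present  -> use the parsed size (0 disables low memory)
 */
static int choose_low_size(int parse_ret, uint64_t parsed, uint64_t *low_size)
{
	if (parse_ret == -ENOENT) {
		*low_size = SKETCH_DEFAULT_LOW;
		return 0;
	}
	if (parse_ret)
		return parse_ret;

	*low_size = parsed;
	return 0;
}
```

An explicit crashkernel=0,low therefore still disables the low reservation, while leaving the option out now yields 128 MiB instead of nothing.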
[PATCH v4 0/2] arm64: kdump: Function supplement and performance optimization
v3 --> v4:
1. Set DEFAULT_CRASH_KERNEL_LOW_SIZE to a fixed 128M.
2. Some lightweight code adjustments based on Catalin Marinas's comments

v2 --> v3:
1. Discard patch 3 in v2, a cleanup patch.

v1 --> v2:
1. Update the commit message of Patch 1, explicitly indicates that
   "crashkernel=X,high" is specified but "crashkernel=Y,low" is not
   specified.
2. Drop Patch 4-5. Currently, focus on function integrity, performance
   optimization will be considered in later versions.
3. Patch 3 is not mandatory, it's just a cleanup now, although it is a must
   for patch 4-5. But to avoid subsequent duplication of effort, I'm glad it
   was accepted.

v1:
After the basic functions of "support reserving crashkernel above 4G on arm64
kdump"(see https://lkml.org/lkml/2022/5/6/428) are implemented, we still have
three features to be improved.
1. When crashkernel=X,high is specified but crashkernel=Y,low is not
   specified, the default crash low memory size is provided.
2. For crashkernel=X without '@offset', if the low memory fails to be
   allocated, fall back to reserve region from high memory(above DMA zones).
3. If crashkernel=X,high is used, page mapping is performed only for the
   crash high memory, and block mapping is still used for other linear
   address spaces. Compared to the previous version:
   (1) For crashkernel=X[@offset], the memory above 4G is not changed to
       block mapping, leave it to the next time.
   (2) The implementation method is modified. Now the implementation is
       simpler and clearer.

Zhen Lei (2):
  arm64: kdump: Provide default size when crashkernel=Y,low is not specified
  arm64: kdump: Support crashkernel=X fall back to reserve region above DMA zones

 .../admin-guide/kernel-parameters.txt | 15 +--
 arch/arm64/mm/init.c | 25 ---
 2 files changed, 28 insertions(+), 12 deletions(-)

-- 
2.25.1
[PATCH v22 9/9] docs: kdump: Update the crashkernel description for arm64
Now arm64 has added support for "crashkernel=X,high" and "crashkernel=Y,low",
and implements "crashkernel=X[@offset]" in the same way as x86. So update the
Documentation.

Signed-off-by: Zhen Lei
Acked-by: Baoquan He
---
 Documentation/admin-guide/kernel-parameters.txt | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 3f1cc5e317ed4a5..ae0aa63ffe82f59 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -795,7 +795,7 @@
 			memory region [offset, offset + size] for that kernel
 			image. If '@offset' is omitted, then a suitable offset
 			is selected automatically.
-			[KNL, X86-64] Select a region under 4G first, and
+			[KNL, X86-64, ARM64] Select a region under 4G first, and
 			fall back to reserve region above 4G when '@offset'
 			hasn't been specified.
 			See Documentation/admin-guide/kdump/kdump.rst for further details.
@@ -808,20 +808,20 @@
 			Documentation/admin-guide/kdump/kdump.rst for an example.

 	crashkernel=size[KMG],high
-			[KNL, X86-64] range could be above 4G. Allow kernel
+			[KNL, X86-64, ARM64] range could be above 4G. Allow kernel
 			to allocate physical memory region from top, so could
 			be above 4G if system have more than 4G ram installed.
 			Otherwise memory region will be allocated below 4G, if
 			available.
 			It will be ignored if crashkernel=X is specified.
 	crashkernel=size[KMG],low
-			[KNL, X86-64] range under 4G. When crashkernel=X,high
+			[KNL, X86-64, ARM64] range under 4G. When crashkernel=X,high
 			is passed, kernel could allocate physical memory region
 			above 4G, that cause second kernel crash on system
 			that require some amount of low memory, e.g. swiotlb
 			requires at least 64M+32K low memory, also enough extra
 			low memory is needed to make sure DMA buffers for 32-bit
-			devices won't run out. Kernel would try to allocate at
+			devices won't run out. Kernel would try to allocate at
 			least 256M below 4G automatically.
 			This one let user to specify own low range under 4G
 			for second kernel instead.
-- 
2.25.1
[PATCH v22 3/9] arm64: kdump: Remove some redundant checks in map_mem()
If the value of crashk_res.end is non-zero in this function, it indicates
that function reserve_crashkernel() has been called under condition
"if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32))" in
zone_sizes_init() before. And obviously the command line option
"crashkernel=" must also exist, so crash_mem_map is also true.

Signed-off-by: Zhen Lei
---
 arch/arm64/mm/mmu.c | 31 +--
 1 file changed, 13 insertions(+), 18 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 626ec32873c6c36..7666b4955e45cb3 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -528,14 +528,12 @@ static void __init map_mem(pgd_t *pgdp)
 	memblock_mark_nomap(kernel_start, kernel_end - kernel_start);

 #ifdef CONFIG_KEXEC_CORE
-	if (crash_mem_map) {
-		if (IS_ENABLED(CONFIG_ZONE_DMA) ||
-		    IS_ENABLED(CONFIG_ZONE_DMA32))
-			flags |= NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
-		else if (crashk_res.end)
-			memblock_mark_nomap(crashk_res.start,
-			    resource_size(&crashk_res));
-	}
+	if (crash_mem_map &&
+	    (IS_ENABLED(CONFIG_ZONE_DMA) || IS_ENABLED(CONFIG_ZONE_DMA32)))
+		flags |= NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
+	else if (crashk_res.end)
+		memblock_mark_nomap(crashk_res.start,
+				    resource_size(&crashk_res));
 #endif

 	/* map all the memory banks */
@@ -571,16 +569,13 @@ static void __init map_mem(pgd_t *pgdp)
 	 * through /sys/kernel/kexec_crash_size interface.
 	 */
 #ifdef CONFIG_KEXEC_CORE
-	if (crash_mem_map &&
-	    !IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32)) {
-		if (crashk_res.end) {
-			__map_memblock(pgdp, crashk_res.start,
-				       crashk_res.end + 1,
-				       PAGE_KERNEL,
-				       NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS);
-			memblock_clear_nomap(crashk_res.start,
-					     resource_size(&crashk_res));
-		}
+	if (crashk_res.end) {
+		__map_memblock(pgdp, crashk_res.start,
+			       crashk_res.end + 1,
+			       PAGE_KERNEL,
+			       NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS);
+		memblock_clear_nomap(crashk_res.start,
+				     resource_size(&crashk_res));
 	}
 #endif
 }
-- 
2.25.1
[PATCH v22 8/9] of: fdt: Add memory for devices by DT property "linux,usable-memory-range"
From: Chen Zhou

When reserving crashkernel in high memory, some low memory is reserved for
crash dump kernel devices and never mapped by the first kernel. This memory
range is advertised to crash dump kernel via DT property under /chosen,
	linux,usable-memory-range =

We reused the DT property linux,usable-memory-range and made the low memory
region as the second range "BASE2 SIZE2", which keeps compatibility with
existing user-space and older kdump kernels.

Crash dump kernel reads this property at boot time and call memblock_add() to
add the low memory region after memblock_cap_memory_range() has been called.

Signed-off-by: Chen Zhou
Co-developed-by: Zhen Lei
Signed-off-by: Zhen Lei
Reviewed-by: Rob Herring
Tested-by: Dave Kleikamp
---
 drivers/of/fdt.c | 33 +++--
 1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index ec315b060cd50d2..2f248d0acc04830 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -973,16 +973,24 @@ static void __init early_init_dt_check_for_elfcorehdr(unsigned long node)

 static unsigned long chosen_node_offset = -FDT_ERR_NOTFOUND;

+/*
+ * The main usage of linux,usable-memory-range is for crash dump kernel.
+ * Originally, the number of usable-memory regions is one. Now there may
+ * be two regions, low region and high region.
+ * To make compatibility with existing user-space and older kdump, the low
+ * region is always the last range of linux,usable-memory-range if exist.
+ */
+#define MAX_USABLE_RANGES		2
+
 /**
  * early_init_dt_check_for_usable_mem_range - Decode usable memory range
  * location from flat tree
  */
 void __init early_init_dt_check_for_usable_mem_range(void)
 {
-	const __be32 *prop;
-	int len;
-	phys_addr_t cap_mem_addr;
-	phys_addr_t cap_mem_size;
+	struct memblock_region rgn[MAX_USABLE_RANGES] = {0};
+	const __be32 *prop, *endp;
+	int len, i;
 	unsigned long node = chosen_node_offset;

 	if ((long)node < 0)
@@ -991,16 +999,21 @@ void __init early_init_dt_check_for_usable_mem_range(void)
 	pr_debug("Looking for usable-memory-range property... ");

 	prop = of_get_flat_dt_prop(node, "linux,usable-memory-range", &len);
-	if (!prop || (len < (dt_root_addr_cells + dt_root_size_cells)))
+	if (!prop || (len % (dt_root_addr_cells + dt_root_size_cells)))
 		return;

-	cap_mem_addr = dt_mem_next_cell(dt_root_addr_cells, &prop);
-	cap_mem_size = dt_mem_next_cell(dt_root_size_cells, &prop);
+	endp = prop + (len / sizeof(__be32));
+	for (i = 0; i < MAX_USABLE_RANGES && prop < endp; i++) {
+		rgn[i].base = dt_mem_next_cell(dt_root_addr_cells, &prop);
+		rgn[i].size = dt_mem_next_cell(dt_root_size_cells, &prop);

-	pr_debug("cap_mem_start=%pa cap_mem_size=%pa\n", &cap_mem_addr,
-		 &cap_mem_size);
+		pr_debug("cap_mem_regions[%d]: base=%pa, size=%pa\n",
+			 i, &rgn[i].base, &rgn[i].size);
+	}

-	memblock_cap_memory_range(cap_mem_addr, cap_mem_size);
+	memblock_cap_memory_range(rgn[0].base, rgn[0].size);
+	for (i = 1; i < MAX_USABLE_RANGES && rgn[i].size; i++)
+		memblock_add(rgn[i].base, rgn[i].size);
 }

 #ifdef CONFIG_SERIAL_EARLYCON
-- 
2.25.1
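The decoding loop can be tried out in user space with a simplified model. The assumptions here are mine, not the patch's: the property is given as an array of 32-bit cells already in host byte order (the kernel reads big-endian cells through dt_mem_next_cell()), the length is counted in cells, and next_value(), decode_ranges() and struct mem_range are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_RANGES 2	/* mirrors MAX_USABLE_RANGES in the patch */

struct mem_range {
	uint64_t base;
	uint64_t size;
};

/* Combine 'cells' 32-bit cells (most significant first) and advance *p. */
static uint64_t next_value(int cells, const uint32_t **p)
{
	uint64_t v = 0;

	while (cells--)
		v = (v << 32) | *(*p)++;
	return v;
}

/*
 * Decode up to SKETCH_MAX_RANGES (base, size) pairs from 'ncells' cells.
 * Returns the number of ranges decoded, or -1 if the cell count is not a
 * whole number of (address, size) pairs.
 */
static int decode_ranges(const uint32_t *prop, size_t ncells,
			 int addr_cells, int size_cells,
			 struct mem_range out[SKETCH_MAX_RANGES])
{
	const uint32_t *end;
	int i;

	if (!prop || ncells % (size_t)(addr_cells + size_cells))
		return -1;

	end = prop + ncells;
	for (i = 0; i < SKETCH_MAX_RANGES && prop < end; i++) {
		out[i].base = next_value(addr_cells, &prop);
		out[i].size = next_value(size_cells, &prop);
	}
	return i;
}
```

Following the patch, the first decoded range would be passed to the capping step and any further range with a non-zero size would be added back afterwards, so an old single-range property keeps working unchanged.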
[PATCH v22 0/9] support reserving crashkernel above 4G on arm64 kdump
sted by Rob, reuse DT property "linux,usable-memory-range" to pass the low
  memory region.
- Fix kdump broken with ZONE_DMA reintroduced.
- Update chosen schema.

Changes since [v7]
- Move x86 CRASH_ALIGN to 2M
  Suggested by Dave and do some test, move x86 CRASH_ALIGN to 2M.
- Update Documentation/devicetree/bindings/chosen.txt.
  Add corresponding documentation to
  Documentation/devicetree/bindings/chosen.txt suggested by Arnd.
- Add Tested-by from Jhon and pk.

Changes since [v6]
- Fix build errors reported by kbuild test robot.

Changes since [v5]
- Move reserve_crashkernel_low() into kernel/crash_core.c.
- Delete crashkernel=X,high.
- Modify crashkernel=X,low.
  If crashkernel=X,low is specified simultaneously, reserve the specified
  size of low memory for crash kdump kernel devices first and then reserve
  memory above 4G. In addition, rename crashk_low_res as "Crash kernel (low)"
  for arm64, and then pass to crash dump kernel by DT property
  "linux,low-memory-range".
- Update Documentation/admin-guide/kdump/kdump.rst.

Changes since [v4]
- Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.

Changes since [v3]
- Add memblock_cap_memory_ranges back for multiple ranges.
- Fix some compiling warnings.

Changes since [v2]
- Split patch "arm64: kdump: support reserving crashkernel above 4G" as two.
  Put "move reserve_crashkernel_low() into kexec_core.c" in a separate patch.

Changes since [v1]:
- Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
- Remove memblock_cap_memory_ranges() I added in v1 and implement that in
  fdt_enforce_memory_region().
  There are at most two crash kernel regions, for two crash kernel regions
  case, we cap the memory range [min(regs[*].start), max(regs[*].end)] and
  then remove the memory range in the middle.

v1:
There are following issues in arm64 kdump:
1. We use crashkernel=X to reserve crashkernel below 4G, which will fail
   when there is not enough low memory.
2. If reserving crashkernel above 4G, in this case, crash dump kernel will
   fail to boot because there is no low memory available for allocation.

To solve these issues, change the behavior of crashkernel=X.
crashkernel=X tries low allocation in DMA zone and falls back to high
allocation if it fails.

We can also use "crashkernel=X,high" to select a high region above DMA zone,
which also tries to allocate at least 256M low memory in DMA zone
automatically and "crashkernel=Y,low" can be used to allocate specified size
low memory.

When reserving crashkernel in high memory, some low memory is reserved for
crash dump kernel devices. So there may be two regions reserved for crash
dump kernel. To distinguish it from the high region without affecting the
use of existing kexec-tools, rename the low region as "Crash kernel (low)",
and pass the low region by reusing DT property "linux,usable-memory-range".
We made the low memory region as the last range of
"linux,usable-memory-range" to keep compatibility with existing user-space
and older kdump kernels.

Besides, we need to modify kexec-tools:
arm64: support more than one crash kernel regions(see [1])

Another update is document about DT property 'linux,usable-memory-range':
schemas: update 'linux,usable-memory-range' node schema(see [2])

[1]: https://www.spinics.net/lists/kexec/msg28226.html
[2]: https://github.com/robherring/dt-schema/pull/19
[v1]: https://lkml.org/lkml/2019/4/2/1174
[v2]: https://lkml.org/lkml/2019/4/9/86
[v3]: https://lkml.org/lkml/2019/4/9/306
[v4]: https://lkml.org/lkml/2019/4/15/273
[v5]: https://lkml.org/lkml/2019/5/6/1360
[v6]: https://lkml.org/lkml/2019/8/30/142
[v7]: https://lkml.org/lkml/2019/12/23/411
[v8]: https://lkml.org/lkml/2020/5/21/213
[v9]: https://lkml.org/lkml/2020/6/28/73
[v10]: https://lkml.org/lkml/2020/7/2/1443
[v11]: https://lkml.org/lkml/2020/8/1/150
[v12]: https://lkml.org/lkml/2020/9/7/1037
[v13]: https://lkml.org/lkml/2020/10/31/34
[v14]: https://lkml.org/lkml/2021/1/30/53
[v15]: https://lkml.org/lkml/2021/10/19/1405
[v16]: https://lkml.org/lkml/2021/11/23/435
[v17]: https://lkml.org/lkml/2021/12/10/38
[v18]: https://lkml.org/lkml/2021/12/22/424
[v19]: https://lkml.org/lkml/2021/12/28/203
[v20]: https://lkml.org/lkml/2022/1/24/167
[v21]: https://lkml.org/lkml/2022/2/26/350

Chen Zhou (2):
  arm64: kdump: Reimplement crashkernel=X
  of: fdt: Add memory for devices by DT property "linux,usable-memory-range"

Zhen Lei (7):
  kdump: return -ENOENT if required cmdline option does not exist
  arm64: Use insert_resource() to simplify code
  arm64: kdump: Remove some redundant checks in map_mem()
  arm64: kdump: Don't force page-level mappings for memory above 4G
  arm64: kdump: Use page-level mapping for the high memory of crashkernel
  arm64: kdump: Try not to use NO_BLOCK_MAPPINGS for memory under 4G
  docs: kdump: Update the crashkernel description for arm64

 .../admin-guide/kern
[PATCH v22 5/9] arm64: kdump: Reimplement crashkernel=X
From: Chen Zhou There are following issues in arm64 kdump: 1. We use crashkernel=X to reserve crashkernel below 4G, which will fail when there is not enough low memory. 2. If reserving crashkernel above 4G, in this case, crash dump kernel will fail to boot because there is no low memory available for allocation. To solve these issues, change the behavior of crashkernel=X and introduce crashkernel=X,[high,low]. crashkernel=X tries low allocation in DMA zone, and fall back to high allocation if it fails. We can also use "crashkernel=X,high" to select a region above DMA zone, which also tries to allocate at least 256M in DMA zone automatically. "crashkernel=Y,low" can be used to allocate specified size low memory. Signed-off-by: Chen Zhou Co-developed-by: Zhen Lei Signed-off-by: Zhen Lei --- arch/arm64/kernel/machine_kexec.c | 9 +- arch/arm64/kernel/machine_kexec_file.c | 12 ++- arch/arm64/mm/init.c | 116 +++-- 3 files changed, 124 insertions(+), 13 deletions(-) diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c index e16b248699d5c3c..19c2d487cb08feb 100644 --- a/arch/arm64/kernel/machine_kexec.c +++ b/arch/arm64/kernel/machine_kexec.c @@ -329,8 +329,13 @@ bool crash_is_nosave(unsigned long pfn) /* in reserved memory? 
*/ addr = __pfn_to_phys(pfn); - if ((addr < crashk_res.start) || (crashk_res.end < addr)) - return false; + if ((addr < crashk_res.start) || (crashk_res.end < addr)) { + if (!crashk_low_res.end) + return false; + + if ((addr < crashk_low_res.start) || (crashk_low_res.end < addr)) + return false; + } if (!kexec_crash_image) return true; diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c index 59c648d51848886..889951291cc0f9c 100644 --- a/arch/arm64/kernel/machine_kexec_file.c +++ b/arch/arm64/kernel/machine_kexec_file.c @@ -65,10 +65,18 @@ static int prepare_elf_headers(void **addr, unsigned long *sz) /* Exclude crashkernel region */ ret = crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end); + if (ret) + goto out; + + if (crashk_low_res.end) { + ret = crash_exclude_mem_range(cmem, crashk_low_res.start, crashk_low_res.end); + if (ret) + goto out; + } - if (!ret) - ret = crash_prepare_elf64_headers(cmem, true, addr, sz); + ret = crash_prepare_elf64_headers(cmem, true, addr, sz); +out: kfree(cmem); return ret; } diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index f670bca160992b9..99d5539c13de3b1 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -90,43 +90,138 @@ phys_addr_t __ro_after_init arm64_dma_phys_limit; phys_addr_t __ro_after_init arm64_dma_phys_limit = PHYS_MASK + 1; #endif +/* Current arm64 boot protocol requires 2MB alignment */ +#define CRASH_ALIGN SZ_2M + +#define CRASH_ADDR_LOW_MAX arm64_dma_phys_limit +#define CRASH_ADDR_HIGH_MAX memblock.current_limit + +/* + * This is an empirical value in x86_64 and taken here directly. Please + * refer to the code comment in reserve_crashkernel_low() of x86_64 for more + * details. + */ +#define DEFAULT_CRASH_KERNEL_LOW_SIZE \ + max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20) + +static int __init reserve_crashkernel_low(unsigned long long low_size) +{ + unsigned long long low_base; + + /* passed with crashkernel=0,low ? 
*/ + if (!low_size) + return 0; + + low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, CRASH_ADDR_LOW_MAX); + if (!low_base) { + pr_err("cannot allocate crashkernel low memory (size:0x%llx).\n", low_size); + return -ENOMEM; + } + + pr_info("crashkernel low memory reserved: 0x%08llx - 0x%08llx (%lld MB)\n", + low_base, low_base + low_size, low_size >> 20); + + crashk_low_res.start = low_base; + crashk_low_res.end = low_base + low_size - 1; + insert_resource(&iomem_resource, &crashk_low_res); + + return 0; +} + /* * reserve_crashkernel() - reserves memory for crash kernel * * This function reserves memory area given in "crashkernel=" kernel command * line parameter. The memory reserved is used by dump capture kernel when * primary kernel is crashing. + * + * NOTE: Reservation of crashkernel,low is special since its existence + * is not independent, need rely on the existence of crashkernel,high. + * Here, four cases of crashkernel low memory reservation are summarized: + * 1) crashkernel=Y,low is specified explicitly, the size of crashkernel low + *memory takes
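As a rough illustration of the reservation order described in this patch (try low memory first for crashkernel=X, fall back to high memory, and add a separate low region whenever the main region ends up high), here is a user-space sketch. It is not the kernel code: toy_alloc() is a hypothetical stand-in for memblock_phys_alloc_range() that does not record allocations, LOW_MAX is an assumed 4G stand-in for CRASH_ADDR_LOW_MAX, and the low-region check looks only at the resulting address, whereas the patch's condition is "(crash_base >= CRASH_ADDR_LOW_MAX) || high".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SZ_2M   (2ULL << 20)
#define LOW_MAX (4ULL << 30)	/* assumed stand-in for CRASH_ADDR_LOW_MAX */

struct toy_region { uint64_t start, end; };	/* free RAM, [start, end) */
struct toy_plan { uint64_t base, low_base; };	/* 0 means "not reserved" */

/*
 * Hypothetical top-down allocator: returns the highest 2M-aligned base
 * inside [lo, hi) that fits 'size', or 0 on failure.
 */
static uint64_t toy_alloc(const struct toy_region *mem, size_t n,
			  uint64_t size, uint64_t lo, uint64_t hi)
{
	uint64_t best = 0;

	assert(size > 0);
	for (size_t i = 0; i < n; i++) {
		uint64_t s = mem[i].start > lo ? mem[i].start : lo;
		uint64_t e = mem[i].end < hi ? mem[i].end : hi;
		uint64_t base;

		if (e <= s || e - s < size)
			continue;
		base = (e - size) & ~(SZ_2M - 1);	/* round down to 2M */
		if (base >= s && base > best)
			best = base;
	}
	return best;
}

/* high == false models crashkernel=X, high == true models crashkernel=X,high */
static bool toy_plan_crashkernel(const struct toy_region *mem, size_t n,
				 uint64_t size, uint64_t low_size, bool high,
				 struct toy_plan *out)
{
	out->low_base = 0;
	out->base = toy_alloc(mem, n, size, 0, high ? UINT64_MAX : LOW_MAX);

	/* crashkernel=X: the low attempt failed, retry without the limit */
	if (!out->base && !high)
		out->base = toy_alloc(mem, n, size, 0, UINT64_MAX);
	if (!out->base)
		return false;

	/* A high main region needs some low memory for DMA-capable devices */
	if (out->base >= LOW_MAX) {
		out->low_base = toy_alloc(mem, n, low_size, 0, LOW_MAX);
		if (!out->low_base)
			return false;
	}
	return true;
}
```

With 128M free below 4G and 4G free above it, a 512M request cannot be satisfied low, so the sketch places it high and adds a low region; a 64M request stays low and needs no second region.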
[PATCH v22 6/9] arm64: kdump: Use page-level mapping for the high memory of crashkernel
If the crashkernel has both high memory above 4G and low memory under 4G, kexec always loads content such as the Image and dtb into the high memory rather than the low memory. This means that only the high memory requires write protection based on page-level mapping. The allocation of high memory does not depend on the DMA boundary, so we can reserve the high memory first even if the crashkernel reservation is deferred. Signed-off-by: Zhen Lei --- arch/arm64/mm/init.c | 84 arch/arm64/mm/mmu.c | 3 +- 2 files changed, 86 insertions(+), 1 deletion(-) diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index 99d5539c13de3b1..b1b40b900fae170 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -90,6 +90,9 @@ phys_addr_t __ro_after_init arm64_dma_phys_limit; phys_addr_t __ro_after_init arm64_dma_phys_limit = PHYS_MASK + 1; #endif +static bool crash_high_mem_reserved __initdata; +static struct resource crashk_res_high; + /* Current arm64 boot protocol requires 2MB alignment */ #define CRASH_ALIGN SZ_2M @@ -128,6 +131,66 @@ static int __init reserve_crashkernel_low(unsigned long long low_size) return 0; } +static void __init reserve_crashkernel_high(void) +{ + unsigned long long crash_base, crash_size; + char *cmdline = boot_command_line; + int ret; + + if (!IS_ENABLED(CONFIG_KEXEC_CORE)) + return; + + /* crashkernel=X[@offset] */ + ret = parse_crashkernel(cmdline, memblock_phys_mem_size(), + &crash_size, &crash_base); + if (ret || !crash_size) { + ret = parse_crashkernel_high(cmdline, 0, &crash_size, &crash_base); + if (ret || !crash_size) + return; + } + + crash_size = PAGE_ALIGN(crash_size); + + /* +* For the case crashkernel=X, may fall back to reserve memory above +* 4G, make reservations here in advance. It will be released later if +* the region is successfully reserved under 4G. 
+*/ + if (!crash_base) { + crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN, + crash_base, CRASH_ADDR_HIGH_MAX); + if (!crash_base) + return; + + crash_high_mem_reserved = true; + } + + /* Mark the memory range that requires page-level mappings */ + crashk_res.start = crash_base; + crashk_res.end = crash_base + crash_size - 1; +} + +static void __init hand_over_reserved_high_mem(void) +{ + crashk_res_high.start = crashk_res.start; + crashk_res_high.end = crashk_res.end; + + crashk_res.start = 0; + crashk_res.end = 0; +} + +static void __init take_reserved_high_mem(unsigned long long *crash_base, + unsigned long long *crash_size) +{ + *crash_base = crashk_res_high.start; + *crash_size = resource_size(&crashk_res_high); +} + +static void __init free_reserved_high_mem(void) +{ + memblock_phys_free(crashk_res_high.start, resource_size(&crashk_res_high)); +} + /* * reserve_crashkernel() - reserves memory for crash kernel * @@ -159,6 +222,8 @@ static void __init reserve_crashkernel(void) if (!IS_ENABLED(CONFIG_KEXEC_CORE)) return; + hand_over_reserved_high_mem(); + /* crashkernel=X[@offset] */ ret = parse_crashkernel(cmdline, memblock_phys_mem_size(), &crash_size, &crash_base); @@ -177,6 +242,11 @@ static void __init reserve_crashkernel(void) high = true; crash_max = CRASH_ADDR_HIGH_MAX; + + if (crash_high_mem_reserved) { + take_reserved_high_mem(&crash_base, &crash_size); + goto reserve_low; + } } fixed_base = !!crash_base; @@ -195,6 +265,11 @@ static void __init reserve_crashkernel(void) * reserved later. */ if (!fixed_base && (crash_max == CRASH_ADDR_LOW_MAX)) { + if (crash_high_mem_reserved) { + take_reserved_high_mem(&crash_base, &crash_size); + goto reserve_low; + } + crash_max = CRASH_ADDR_HIGH_MAX; goto retry; } @@ -212,6 +287,7 @@ static void __init reserve_crashkernel(void) * condition to make sure the crash low memory will be reserved. 
*/ if ((crash_base >= CRASH_ADDR_LOW_MAX) || high) { +reserve_low: /* case #3 of crashkernel,low reservation */ if (!high) crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE; @@ -2
[PATCH v22 2/9] arm64: Use insert_resource() to simplify code
insert_resource() traverses the subtree layer by layer from the root node until a proper location is found. Compared with request_resource(), the parent node does not need to be determined in advance. In addition, move the insertion of node 'crashk_res' into function reserve_crashkernel() to make the associated code close together. Signed-off-by: Zhen Lei Acked-by: John Donnelly Acked-by: Baoquan He --- arch/arm64/kernel/setup.c | 17 +++-- arch/arm64/mm/init.c | 1 + 2 files changed, 4 insertions(+), 14 deletions(-) diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c index 3505789cf4bd92a..fea3223704b6339 100644 --- a/arch/arm64/kernel/setup.c +++ b/arch/arm64/kernel/setup.c @@ -225,6 +225,8 @@ static void __init request_standard_resources(void) kernel_code.end = __pa_symbol(__init_begin - 1); kernel_data.start = __pa_symbol(_sdata); kernel_data.end = __pa_symbol(_end - 1); + insert_resource(&iomem_resource, &kernel_code); + insert_resource(&iomem_resource, &kernel_data); num_standard_resources = memblock.memory.cnt; res_size = num_standard_resources * sizeof(*standard_resources); @@ -246,20 +248,7 @@ static void __init request_standard_resources(void) res->end = __pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1; } - request_resource(&iomem_resource, res); - - if (kernel_code.start >= res->start && - kernel_code.end <= res->end) - request_resource(res, &kernel_code); - if (kernel_data.start >= res->start && - kernel_data.end <= res->end) - request_resource(res, &kernel_data); -#ifdef CONFIG_KEXEC_CORE - /* Userspace will find "Crash kernel" region in /proc/iomem. 
*/ - if (crashk_res.end && crashk_res.start >= res->start && - crashk_res.end <= res->end) - request_resource(res, &crashk_res); -#endif + insert_resource(&iomem_resource, res); } } diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index 8ac25f19084e898..f670bca160992b9 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -137,6 +137,7 @@ static void __init reserve_crashkernel(void) kmemleak_ignore_phys(crash_base); crashk_res.start = crash_base; crashk_res.end = crash_base + crash_size - 1; + insert_resource(&iomem_resource, &crashk_res); } /* -- 2.25.1 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
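The point about not having to pick the parent in advance can be illustrated with a toy resource tree in user space. This is only a sketch of the idea, not the kernel's implementation: struct toy_res and toy_insert() are hypothetical names, locking is ignored, and error reporting is simplified. The function descends from the root into whichever child fully contains the new range, and adopts existing nodes that the new range fully covers, which is what lets "Kernel code" be inserted before the "System RAM" region that later becomes its parent.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified resource node; ranges are inclusive. */
struct toy_res {
	unsigned long start, end;
	const char *name;
	struct toy_res *parent, *child, *sibling;
};

static int toy_contains(const struct toy_res *a, const struct toy_res *b)
{
	return a->start <= b->start && b->end <= a->end;
}

/* Insert 'new' somewhere below 'parent', finding the right level itself. */
static int toy_insert(struct toy_res *parent, struct toy_res *new)
{
	assert(new->start <= new->end);
	if (!toy_contains(parent, new))
		return -EINVAL;

	for (;;) {
		struct toy_res **link = &parent->child;
		struct toy_res *first, *last = NULL, *c;

		/* Children are sorted: skip those entirely below 'new' */
		while (*link && (*link)->end < new->start)
			link = &(*link)->sibling;
		first = *link;

		/* Find the run of children that overlap 'new' */
		for (c = first; c && c->start <= new->end; c = c->sibling)
			last = c;

		if (!last) {			/* free slot at this level */
			new->parent = parent;
			new->sibling = first;
			*link = new;
			return 0;
		}

		/* One strictly larger child contains 'new': go one level down */
		if (first == last && toy_contains(first, new) &&
		    !toy_contains(new, first)) {
			parent = first;
			continue;
		}

		/* Otherwise 'new' must fully cover every overlapping child */
		for (c = first; ; c = c->sibling) {
			if (!toy_contains(new, c))
				return -EBUSY;
			if (c == last)
				break;
		}

		/* Adopt the covered children and take their place */
		new->parent = parent;
		new->child = first;
		new->sibling = last->sibling;
		last->sibling = NULL;
		*link = new;
		for (c = first; c; c = c->sibling)
			c->parent = new;
		return 0;
	}
}
```

Every call passes the root, much as the patch always passes &iomem_resource, and the caller never needs to work out which "System RAM" entry a region belongs to.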
[PATCH v22 1/9] kdump: return -ENOENT if required cmdline option does not exist
According to the current crashkernel=Y,low support in other ARCHes, it's an optional command-line option. When it doesn't exist, kernel will try to allocate minimum required memory below 4G automatically. However, __parse_crashkernel() returns '-EINVAL' for all error cases. It can't distinguish the nonexistent option from invalid option. Change __parse_crashkernel() to return '-ENOENT' for the nonexistent option case. With this change, crashkernel,low memory will take the default value if crashkernel=,low is not specified; while crashkernel reservation will fail and bail out if an invalid option is specified. Signed-off-by: Zhen Lei --- kernel/crash_core.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/kernel/crash_core.c b/kernel/crash_core.c index 256cf6db573cd09..4d57c03714f4e13 100644 --- a/kernel/crash_core.c +++ b/kernel/crash_core.c @@ -243,9 +243,8 @@ static int __init __parse_crashkernel(char *cmdline, *crash_base = 0; ck_cmdline = get_last_crashkernel(cmdline, name, suffix); - if (!ck_cmdline) - return -EINVAL; + return -ENOENT; ck_cmdline += strlen(name); -- 2.25.1 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
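To show why a distinct error code helps the caller, here is a small user-space sketch. It is an illustration under assumptions, not the kernel parser: toy_parse_low() and toy_pick_low_size() are hypothetical names, the accepted syntax is reduced to "crashkernel=<size>[KMG],low", and TOY_DEFAULT_LOW_SIZE is an assumed 256M default.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define TOY_DEFAULT_LOW_SIZE (256ULL << 20)	/* assumed default, 256M */

/*
 * Hypothetical parser for "crashkernel=<size>[KMG],low".
 * Returns 0 on success, -ENOENT if no such option is present,
 * and -EINVAL if the option is present but malformed.
 */
static int toy_parse_low(const char *cmdline, unsigned long long *size)
{
	static const char name[] = "crashkernel=";
	const char *p = cmdline, *found = NULL;
	unsigned long long val;
	char *tail;

	assert(cmdline && size);
	while ((p = strstr(p, name)) != NULL) {
		const char *end = p + strcspn(p, " ");

		if (strncmp(end - 4, ",low", 4) == 0)
			found = p;	/* the last matching token wins */
		p = end;
	}
	if (!found)
		return -ENOENT;

	found += strlen(name);
	val = strtoull(found, &tail, 0);
	if (tail == found)
		return -EINVAL;		/* no digits at all */
	switch (*tail) {
	case 'G': val <<= 10; /* fall through */
	case 'M': val <<= 10; /* fall through */
	case 'K': val <<= 10; tail++; break;
	default: break;
	}
	if (strncmp(tail, ",low", 4) != 0)
		return -EINVAL;
	*size = val;
	return 0;
}

/* Caller side: a missing option selects the default, a bad one is an error. */
static int toy_pick_low_size(const char *cmdline, unsigned long long *low_size)
{
	int ret = toy_parse_low(cmdline, low_size);

	if (ret == -ENOENT) {
		*low_size = TOY_DEFAULT_LOW_SIZE;
		return 0;
	}
	return ret;
}
```

If the parser returned -EINVAL for both situations, toy_pick_low_size() could not tell "use the default" apart from "bail out".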
[PATCH v22 7/9] arm64: kdump: Try not to use NO_BLOCK_MAPPINGS for memory under 4G
For the case crashkernel=X@offset and crashkernel=X,high, we've explicitly used 'crashk_res' to mark the scope of the page-level mapping required, so NO_BLOCK_MAPPINGS should not be required for other areas. Otherwise, system performance will be affected. In fact, only the case crashkernel=X requires page-level mapping for all low memory under 4G because it attempts high memory after it fails to request low memory first, and we cannot predict its final location. Signed-off-by: Zhen Lei --- arch/arm64/include/asm/kexec.h | 2 ++ arch/arm64/mm/init.c | 3 +++ arch/arm64/mm/mmu.c| 18 +- 3 files changed, 6 insertions(+), 17 deletions(-) diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h index 9839bfc163d7147..8caf64065383844 100644 --- a/arch/arm64/include/asm/kexec.h +++ b/arch/arm64/include/asm/kexec.h @@ -80,6 +80,8 @@ static inline void crash_setup_regs(struct pt_regs *newregs, } } +extern bool crash_low_mem_page_map; + #if defined(CONFIG_KEXEC_CORE) && defined(CONFIG_HIBERNATION) extern bool crash_is_nosave(unsigned long pfn); extern void crash_prepare_suspend(void); diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index b1b40b900fae170..d9676e30f9b657a 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -90,6 +90,7 @@ phys_addr_t __ro_after_init arm64_dma_phys_limit; phys_addr_t __ro_after_init arm64_dma_phys_limit = PHYS_MASK + 1; #endif +bool crash_low_mem_page_map __initdata; static bool crash_high_mem_reserved __initdata; static struct resource crashk_res_high; @@ -147,6 +148,8 @@ static void __init reserve_crashkernel_high(void) ret = parse_crashkernel_high(cmdline, 0, &crash_size, &crash_base); if (ret || !crash_size) return; + } else if (!crash_base) { + crash_low_mem_page_map = true; } crash_size = PAGE_ALIGN(crash_size); diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index f84eca55b103d0c..56a973cb4c9cae6 100644 --- a/arch/arm64/mm/mmu.c +++ b/arch/arm64/mm/mmu.c @@ -483,21 +483,6 @@ void __init 
mark_linear_text_alias_ro(void) PAGE_KERNEL_RO); } -static bool crash_mem_map __initdata; - -static int __init enable_crash_mem_map(char *arg) -{ - /* -* Proper parameter parsing is done by reserve_crashkernel(). We only -* need to know if the linear map has to avoid block mappings so that -* the crashkernel reservations can be unmapped later. -*/ - crash_mem_map = true; - - return 0; -} -early_param("crashkernel", enable_crash_mem_map); - static void __init map_mem(pgd_t *pgdp) { static const u64 direct_map_end = _PAGE_END(VA_BITS_MIN); @@ -528,8 +513,7 @@ static void __init map_mem(pgd_t *pgdp) memblock_mark_nomap(kernel_start, kernel_end - kernel_start); #ifdef CONFIG_KEXEC_CORE - if (crash_mem_map && - (IS_ENABLED(CONFIG_ZONE_DMA) || IS_ENABLED(CONFIG_ZONE_DMA32))) + if (crash_low_mem_page_map) eflags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS; if (crashk_res.end) -- 2.25.1 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
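The three cases discussed in this patch can be summarized as a small decision function. This is only a sketch of the reasoning in the commit message; struct toy_ck_opt, enum toy_map_scope and toy_page_map_scope() are hypothetical names and do not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Which part of the linear map has to use page-level mappings. */
enum toy_map_scope {
	TOY_MAP_NONE,			/* no crashkernel reservation at all */
	TOY_MAP_CRASHK_RES_ONLY,	/* only the range marked by crashk_res */
	TOY_MAP_ALL_LOW_MEM,		/* all memory under 4G */
};

/* Hypothetical summary of the parsed crashkernel= option. */
struct toy_ck_opt {
	bool present;			/* a crashkernel= option was given */
	bool high;			/* it was crashkernel=X,high */
	unsigned long long size;
	unsigned long long base;	/* non-zero for crashkernel=X@offset */
};

static enum toy_map_scope toy_page_map_scope(const struct toy_ck_opt *o)
{
	assert(o != NULL);
	if (!o->present || !o->size)
		return TOY_MAP_NONE;

	/* X@offset and X,high: the final location is already known */
	if (o->high || o->base)
		return TOY_MAP_CRASHK_RES_ONLY;

	/* Plain crashkernel=X may end up low or high, so cover all low memory */
	return TOY_MAP_ALL_LOW_MEM;
}
```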
[PATCH v22 4/9] arm64: kdump: Don't force page-level mappings for memory above 4G
If the crashkernel reservation is deferred, its boundaries are not known when the linear mapping is created. But its upper limit is fixed: it cannot be above 4G. Therefore, unless otherwise required, block mappings should be used for memory above 4G to improve performance. Signed-off-by: Zhen Lei --- arch/arm64/mm/mmu.c | 24 +--- 1 file changed, 21 insertions(+), 3 deletions(-) diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index 7666b4955e45cb3..8ccbc7f2216 100644 --- a/arch/arm64/mm/mmu.c +++ b/arch/arm64/mm/mmu.c @@ -504,7 +504,7 @@ static void __init map_mem(pgd_t *pgdp) phys_addr_t kernel_start = __pa_symbol(_stext); phys_addr_t kernel_end = __pa_symbol(__init_begin); phys_addr_t start, end; - int flags = NO_EXEC_MAPPINGS; + int flags = NO_EXEC_MAPPINGS, eflags = 0; u64 i; /* @@ -530,7 +530,7 @@ static void __init map_mem(pgd_t *pgdp) #ifdef CONFIG_KEXEC_CORE if (crash_mem_map && (IS_ENABLED(CONFIG_ZONE_DMA) || IS_ENABLED(CONFIG_ZONE_DMA32))) - flags |= NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS; + eflags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS; else if (crashk_res.end) memblock_mark_nomap(crashk_res.start, resource_size(&crashk_res)); @@ -540,13 +540,31 @@ static void __init map_mem(pgd_t *pgdp) for_each_mem_range(i, &start, &end) { if (start >= end) break; + +#ifdef CONFIG_KEXEC_CORE + if (eflags && (end >= SZ_4G)) { + /* +* The memory block cross the 4G boundary. +* Forcibly use page-level mappings for memory under 4G. +*/ + if (start < SZ_4G) { + __map_memblock(pgdp, start, SZ_4G - 1, + pgprot_tagged(PAGE_KERNEL), flags | eflags); + start = SZ_4G; + } + + /* Page-level mappings is not mandatory for memory above 4G */ + eflags = 0; + } +#endif + /* * The linear map must allow allocation tags reading/writing * if MTE is present. Otherwise, it has the same attributes as * PAGE_KERNEL. 
*/ __map_memblock(pgdp, start, end, pgprot_tagged(PAGE_KERNEL), - flags); + flags | eflags); } /* -- 2.25.1 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
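The idea of splitting a memory block at the 4G boundary can be sketched in isolation as follows. This is a simplified user-space model, not the patch's code: toy_split_at_4g() is a hypothetical helper, the TOY_* flag values are illustrative rather than the kernel's definitions, ranges are half-open [start, end), and each range is handled independently instead of clearing the extra flags once the first block reaching 4G has been seen.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SZ_4G	(1ULL << 32)
#define TOY_NO_BLOCK	(1u << 0)	/* illustrative flag values */
#define TOY_NO_CONT	(1u << 1)
#define TOY_NO_EXEC	(1u << 2)

struct toy_piece {
	uint64_t start, end;		/* half-open range [start, end) */
	unsigned int flags;
};

/*
 * Split [start, end) at 4G. The part below 4G gets base_flags plus
 * low_flags (page-level mappings), the part above gets base_flags only.
 * Returns the number of pieces written to out[] (1 or 2).
 */
static size_t toy_split_at_4g(uint64_t start, uint64_t end,
			      unsigned int base_flags, unsigned int low_flags,
			      struct toy_piece out[2])
{
	size_t n = 0;

	assert(start < end);
	if (start < TOY_SZ_4G) {
		uint64_t low_end = end < TOY_SZ_4G ? end : TOY_SZ_4G;

		out[n++] = (struct toy_piece){ start, low_end,
					       base_flags | low_flags };
		start = low_end;
	}
	if (start < end)
		out[n++] = (struct toy_piece){ start, end, base_flags };
	return n;
}
```

A block spanning 2G to 6G yields two pieces: 2G to 4G with the page-level flags added, and 4G to 6G with the base flags only, so block mappings remain possible above 4G.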
[PATCH v21 4/5] of: fdt: Add memory for devices by DT property "linux,usable-memory-range"
From: Chen Zhou When the crashkernel is reserved in high memory, some low memory is reserved for crash dump kernel devices and is never mapped by the first kernel. This memory range is advertised to the crash dump kernel via the DT property linux,usable-memory-range under /chosen. We reuse this property and make the low memory region its second range, "BASE2 SIZE2", which keeps compatibility with existing user-space and older kdump kernels. The crash dump kernel reads this property at boot time and calls memblock_add() to add the low memory region after memblock_cap_memory_range() has been called. Signed-off-by: Chen Zhou Co-developed-by: Zhen Lei Signed-off-by: Zhen Lei Reviewed-by: Rob Herring Tested-by: Dave Kleikamp --- drivers/of/fdt.c | 33 +++-- 1 file changed, 23 insertions(+), 10 deletions(-) diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c index ec315b060cd50d2..2f248d0acc04830 100644 --- a/drivers/of/fdt.c +++ b/drivers/of/fdt.c @@ -973,16 +973,24 @@ static void __init early_init_dt_check_for_elfcorehdr(unsigned long node) static unsigned long chosen_node_offset = -FDT_ERR_NOTFOUND; +/* + * The main usage of linux,usable-memory-range is for crash dump kernel. + * Originally, the number of usable-memory regions is one. Now there may + * be two regions, low region and high region. + * To make compatibility with existing user-space and older kdump, the low + * region is always the last range of linux,usable-memory-range if exist. 
+ */ +#define MAX_USABLE_RANGES 2 + /** * early_init_dt_check_for_usable_mem_range - Decode usable memory range * location from flat tree */ void __init early_init_dt_check_for_usable_mem_range(void) { - const __be32 *prop; - int len; - phys_addr_t cap_mem_addr; - phys_addr_t cap_mem_size; + struct memblock_region rgn[MAX_USABLE_RANGES] = {0}; + const __be32 *prop, *endp; + int len, i; unsigned long node = chosen_node_offset; if ((long)node < 0) @@ -991,16 +999,21 @@ void __init early_init_dt_check_for_usable_mem_range(void) pr_debug("Looking for usable-memory-range property... "); prop = of_get_flat_dt_prop(node, "linux,usable-memory-range", &len); - if (!prop || (len < (dt_root_addr_cells + dt_root_size_cells))) + if (!prop || (len % (dt_root_addr_cells + dt_root_size_cells))) return; - cap_mem_addr = dt_mem_next_cell(dt_root_addr_cells, &prop); - cap_mem_size = dt_mem_next_cell(dt_root_size_cells, &prop); + endp = prop + (len / sizeof(__be32)); + for (i = 0; i < MAX_USABLE_RANGES && prop < endp; i++) { + rgn[i].base = dt_mem_next_cell(dt_root_addr_cells, &prop); + rgn[i].size = dt_mem_next_cell(dt_root_size_cells, &prop); - pr_debug("cap_mem_start=%pa cap_mem_size=%pa\n", &cap_mem_addr, -&cap_mem_size); + pr_debug("cap_mem_regions[%d]: base=%pa, size=%pa\n", +i, &rgn[i].base, &rgn[i].size); + } - memblock_cap_memory_range(cap_mem_addr, cap_mem_size); + memblock_cap_memory_range(rgn[0].base, rgn[0].size); + for (i = 1; i < MAX_USABLE_RANGES && rgn[i].size; i++) + memblock_add(rgn[i].base, rgn[i].size); } #ifdef CONFIG_SERIAL_EARLYCON -- 2.25.1 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
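How a two-range property might be decoded can be sketched in user space as follows. This is an illustration under assumptions rather than the patch's code: toy_read_cells() and toy_parse_usable_ranges() are hypothetical helpers, the property is taken as a raw big-endian byte buffer, and the entry size is measured in bytes. In the patch, the first decoded range is passed to memblock_cap_memory_range() and any further range is added with memblock_add().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_USABLE_RANGES 2

struct toy_range { uint64_t base, size; };

/* Read 'cells' big-endian 32-bit cells (1 or 2) and advance the cursor. */
static uint64_t toy_read_cells(const uint8_t **p, int cells)
{
	uint64_t v = 0;

	for (int i = 0; i < cells * 4; i++)
		v = (v << 8) | *(*p)++;
	return v;
}

/*
 * Decode up to TOY_MAX_USABLE_RANGES <base size> pairs from a property of
 * 'len' bytes. Returns the number of ranges found, or -1 if the length is
 * not a whole number of entries.
 */
static int toy_parse_usable_ranges(const uint8_t *prop, size_t len,
				   int addr_cells, int size_cells,
				   struct toy_range out[TOY_MAX_USABLE_RANGES])
{
	size_t entry = (size_t)(addr_cells + size_cells) * 4;
	const uint8_t *end;
	int n = 0;

	assert(addr_cells >= 1 && addr_cells <= 2);
	assert(size_cells >= 1 && size_cells <= 2);
	if (!prop || !len || len % entry)
		return -1;

	end = prop + len;
	while (n < TOY_MAX_USABLE_RANGES && prop < end) {
		out[n].base = toy_read_cells(&prop, addr_cells);
		out[n].size = toy_read_cells(&prop, size_cells);
		n++;
	}
	return n;
}
```

An older crash dump kernel that only understands one range would still read the first pair and ignore the rest, which is why the low region is placed last.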
[PATCH v21 0/5] support reserving crashkernel above 4G on arm64 kdump
low memory available for allocation. To solve these issues, change the behavior of crashkernel=X. crashkernel=X tries low allocation in DMA zone and fall back to high allocation if it fails. We can also use "crashkernel=X,high" to select a high region above DMA zone, which also tries to allocate at least 256M low memory in DMA zone automatically and "crashkernel=Y,low" can be used to allocate specified size low memory. When reserving crashkernel in high memory, some low memory is reserved for crash dump kernel devices. So there may be two regions reserved for crash dump kernel. In order to distinct from the high region and make no effect to the use of existing kexec-tools, rename the low region as "Crash kernel (low)", and pass the low region by reusing DT property "linux,usable-memory-range". We made the low memory region as the last range of "linux,usable-memory-range" to keep compatibility with existing user-space and older kdump kernels. Besides, we need to modify kexec-tools: arm64: support more than one crash kernel regions(see [1]) Another update is document about DT property 'linux,usable-memory-range': schemas: update 'linux,usable-memory-range' node schema(see [2]) [1]: https://www.spinics.net/lists/kexec/msg28226.html [2]: https://github.com/robherring/dt-schema/pull/19 [v1]: https://lkml.org/lkml/2019/4/2/1174 [v2]: https://lkml.org/lkml/2019/4/9/86 [v3]: https://lkml.org/lkml/2019/4/9/306 [v4]: https://lkml.org/lkml/2019/4/15/273 [v5]: https://lkml.org/lkml/2019/5/6/1360 [v6]: https://lkml.org/lkml/2019/8/30/142 [v7]: https://lkml.org/lkml/2019/12/23/411 [v8]: https://lkml.org/lkml/2020/5/21/213 [v9]: https://lkml.org/lkml/2020/6/28/73 [v10]: https://lkml.org/lkml/2020/7/2/1443 [v11]: https://lkml.org/lkml/2020/8/1/150 [v12]: https://lkml.org/lkml/2020/9/7/1037 [v13]: https://lkml.org/lkml/2020/10/31/34 [v14]: https://lkml.org/lkml/2021/1/30/53 [v15]: https://lkml.org/lkml/2021/10/19/1405 [v16]: https://lkml.org/lkml/2021/11/23/435 [v17]: 
https://lkml.org/lkml/2021/12/10/38 [v18]: https://lkml.org/lkml/2021/12/22/424 [v19]: https://lkml.org/lkml/2021/12/28/203 [v20]: https://lkml.org/lkml/2022/1/24/167 Chen Zhou (2): arm64: kdump: reimplement crashkernel=X of: fdt: Add memory for devices by DT property "linux,usable-memory-range" Zhen Lei (3): kdump: return -ENOENT if required cmdline option does not exist arm64: Use insert_resource() to simplify code docs: kdump: Update the crashkernel description for arm64 .../admin-guide/kernel-parameters.txt | 8 +- arch/arm64/kernel/machine_kexec.c | 9 +- arch/arm64/kernel/machine_kexec_file.c| 12 +- arch/arm64/kernel/setup.c | 17 +-- arch/arm64/mm/init.c | 107 -- drivers/of/fdt.c | 33 -- kernel/crash_core.c | 3 +- 7 files changed, 147 insertions(+), 42 deletions(-) -- 2.25.1 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH v21 2/5] arm64: Use insert_resource() to simplify code
insert_resource() traverses the subtree layer by layer from the root node until a proper location is found. Compared with request_resource(), the parent node does not need to be determined in advance. In addition, move the insertion of node 'crashk_res' into function reserve_crashkernel() to make the associated code close together. Signed-off-by: Zhen Lei Acked-by: John Donnelly Acked-by: Baoquan He --- arch/arm64/kernel/setup.c | 17 +++-- arch/arm64/mm/init.c | 1 + 2 files changed, 4 insertions(+), 14 deletions(-) diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c index f70573928f1bff0..a81efcc359e4e78 100644 --- a/arch/arm64/kernel/setup.c +++ b/arch/arm64/kernel/setup.c @@ -225,6 +225,8 @@ static void __init request_standard_resources(void) kernel_code.end = __pa_symbol(__init_begin - 1); kernel_data.start = __pa_symbol(_sdata); kernel_data.end = __pa_symbol(_end - 1); + insert_resource(&iomem_resource, &kernel_code); + insert_resource(&iomem_resource, &kernel_data); num_standard_resources = memblock.memory.cnt; res_size = num_standard_resources * sizeof(*standard_resources); @@ -246,20 +248,7 @@ static void __init request_standard_resources(void) res->end = __pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1; } - request_resource(&iomem_resource, res); - - if (kernel_code.start >= res->start && - kernel_code.end <= res->end) - request_resource(res, &kernel_code); - if (kernel_data.start >= res->start && - kernel_data.end <= res->end) - request_resource(res, &kernel_data); -#ifdef CONFIG_KEXEC_CORE - /* Userspace will find "Crash kernel" region in /proc/iomem. 
*/ - if (crashk_res.end && crashk_res.start >= res->start && - crashk_res.end <= res->end) - request_resource(res, &crashk_res); -#endif + insert_resource(&iomem_resource, res); } } diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index db63cc885771a52..90f276d46b93bc6 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -109,6 +109,7 @@ static void __init reserve_crashkernel(void) kmemleak_ignore_phys(crash_base); crashk_res.start = crash_base; crashk_res.end = crash_base + crash_size - 1; + insert_resource(&iomem_resource, &crashk_res); } #else static void __init reserve_crashkernel(void) -- 2.25.1 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH v21 1/5] kdump: return -ENOENT if required cmdline option does not exist
The crashkernel=Y,low is an optional command-line option. When it doesn't exist, kernel will try to allocate minimum required memory below 4G automatically. Give it a unique error code to distinguish it from other error scenarios. Signed-off-by: Zhen Lei --- kernel/crash_core.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/kernel/crash_core.c b/kernel/crash_core.c index 256cf6db573cd09..4d57c03714f4e13 100644 --- a/kernel/crash_core.c +++ b/kernel/crash_core.c @@ -243,9 +243,8 @@ static int __init __parse_crashkernel(char *cmdline, *crash_base = 0; ck_cmdline = get_last_crashkernel(cmdline, name, suffix); - if (!ck_cmdline) - return -EINVAL; + return -ENOENT; ck_cmdline += strlen(name); -- 2.25.1 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH v21 5/5] docs: kdump: Update the crashkernel description for arm64
Now arm64 has added support for "crashkernel=X,high" and "crashkernel=Y,low", and implements "crashkernel=X[@offset]" in the same way as x86. So update the Documentation. Signed-off-by: Zhen Lei --- Documentation/admin-guide/kernel-parameters.txt | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index f5a27f067db9ed9..63098786c93828c 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -789,7 +789,7 @@ memory region [offset, offset + size] for that kernel image. If '@offset' is omitted, then a suitable offset is selected automatically. - [KNL, X86-64] Select a region under 4G first, and + [KNL, X86-64, ARM64] Select a region under 4G first, and fall back to reserve region above 4G when '@offset' hasn't been specified. See Documentation/admin-guide/kdump/kdump.rst for further details. @@ -802,20 +802,20 @@ Documentation/admin-guide/kdump/kdump.rst for an example. crashkernel=size[KMG],high - [KNL, X86-64] range could be above 4G. Allow kernel + [KNL, X86-64, ARM64] range could be above 4G. Allow kernel to allocate physical memory region from top, so could be above 4G if system have more than 4G ram installed. Otherwise memory region will be allocated below 4G, if available. It will be ignored if crashkernel=X is specified. crashkernel=size[KMG],low - [KNL, X86-64] range under 4G. When crashkernel=X,high + [KNL, X86-64, ARM64] range under 4G. When crashkernel=X,high is passed, kernel could allocate physical memory region above 4G, that cause second kernel crash on system that require some amount of low memory, e.g. swiotlb requires at least 64M+32K low memory, also enough extra low memory is needed to make sure DMA buffers for 32-bit - devices won't run out. Kernel would try to allocate at + devices won't run out. Kernel would try to allocate at least 256M below 4G automatically. 
This one let user to specify own low range under 4G for second kernel instead. -- 2.25.1 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH v21 3/5] arm64: kdump: reimplement crashkernel=X
From: Chen Zhou

There are the following issues in arm64 kdump:
1. We use crashkernel=X to reserve crashkernel below 4G, which
   will fail when there is not enough low memory.
2. If reserving crashkernel above 4G, the crash dump kernel will
   fail to boot because there is no low memory available for
   allocation.

To solve these issues, change the behavior of crashkernel=X and
introduce crashkernel=X,[high,low]. crashkernel=X tries low allocation
in DMA zone, and falls back to high allocation if it fails.
We can also use "crashkernel=X,high" to select a region above DMA zone,
which also tries to allocate at least 256M in DMA zone automatically.
"crashkernel=Y,low" can be used to allocate a specified size of low memory.

Signed-off-by: Chen Zhou
Co-developed-by: Zhen Lei
Signed-off-by: Zhen Lei
---
 arch/arm64/kernel/machine_kexec.c      |   9 ++-
 arch/arm64/kernel/machine_kexec_file.c |  12 ++-
 arch/arm64/mm/init.c                   | 106 +++--
 3 files changed, 115 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index e16b248699d5c3c..19c2d487cb08feb 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -329,8 +329,13 @@ bool crash_is_nosave(unsigned long pfn)
 
 	/* in reserved memory? */
 	addr = __pfn_to_phys(pfn);
-	if ((addr < crashk_res.start) || (crashk_res.end < addr))
-		return false;
+	if ((addr < crashk_res.start) || (crashk_res.end < addr)) {
+		if (!crashk_low_res.end)
+			return false;
+
+		if ((addr < crashk_low_res.start) || (crashk_low_res.end < addr))
+			return false;
+	}
 
 	if (!kexec_crash_image)
 		return true;
diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
index 59c648d51848886..889951291cc0f9c 100644
--- a/arch/arm64/kernel/machine_kexec_file.c
+++ b/arch/arm64/kernel/machine_kexec_file.c
@@ -65,10 +65,18 @@ static int prepare_elf_headers(void **addr, unsigned long *sz)
 
 	/* Exclude crashkernel region */
 	ret = crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end);
+	if (ret)
+		goto out;
+
+	if (crashk_low_res.end) {
+		ret = crash_exclude_mem_range(cmem, crashk_low_res.start, crashk_low_res.end);
+		if (ret)
+			goto out;
+	}
 
-	if (!ret)
-		ret = crash_prepare_elf64_headers(cmem, true, addr, sz);
+	ret = crash_prepare_elf64_headers(cmem, true, addr, sz);
 
+out:
 	kfree(cmem);
 	return ret;
 }
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 90f276d46b93bc6..30ae6638ff54c47 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -65,6 +65,44 @@ EXPORT_SYMBOL(memstart_addr);
 phys_addr_t arm64_dma_phys_limit __ro_after_init;
 
 #ifdef CONFIG_KEXEC_CORE
+/* Current arm64 boot protocol requires 2MB alignment */
+#define CRASH_ALIGN			SZ_2M
+
+#define CRASH_ADDR_LOW_MAX		arm64_dma_phys_limit
+#define CRASH_ADDR_HIGH_MAX		memblock.current_limit
+
+/*
+ * This is an empirical value in x86_64 and taken here directly. Please
+ * refer to the code comment in reserve_crashkernel_low() of x86_64 for more
+ * details.
+ */
+#define DEFAULT_CRASH_KERNEL_LOW_SIZE	\
+	max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20)
+
+static int __init reserve_crashkernel_low(unsigned long long low_size)
+{
+	unsigned long long low_base;
+
+	/* passed with crashkernel=0,low ? */
+	if (!low_size)
+		return 0;
+
+	low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, CRASH_ADDR_LOW_MAX);
+	if (!low_base) {
+		pr_err("cannot allocate crashkernel low memory (size:0x%llx).\n", low_size);
+		return -ENOMEM;
+	}
+
+	pr_info("crashkernel low memory reserved: 0x%08llx - 0x%08llx (%lld MB)\n",
+		low_base, low_base + low_size, low_size >> 20);
+
+	crashk_low_res.start = low_base;
+	crashk_low_res.end = low_base + low_size - 1;
+	insert_resource(&iomem_resource, &crashk_low_res);
+
+	return 0;
+}
+
 /*
  * reserve_crashkernel() - reserves memory for crash kernel
  *
@@ -75,30 +113,79 @@ phys_addr_t arm64_dma_phys_limit __ro_after_init;
 static void __init reserve_crashkernel(void)
 {
 	unsigned long long crash_base, crash_size;
-	unsigned long long crash_max = arm64_dma_phys_limit;
+	unsigned long long crash_low_size;
+	unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
 	int ret;
+	bool fixed_base, high = false;
+	char *cmdline = boot_command_line;
 
-	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
+	/* crash
[PATCH v20 2/5] arm64: kdump: introduce some macros for crash kernel reservation
From: Chen Zhou

Introduce macro CRASH_ALIGN for alignment, macro CRASH_ADDR_LOW_MAX
for upper bound of low crash memory, macro CRASH_ADDR_HIGH_MAX for
upper bound of high crash memory, use macros instead.

Signed-off-by: Chen Zhou
Signed-off-by: Zhen Lei
Tested-by: John Donnelly
Tested-by: Dave Kleikamp
---
 arch/arm64/mm/init.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 90f276d46b93bc6..6c653a2c7cff052 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -65,6 +65,12 @@ EXPORT_SYMBOL(memstart_addr);
 phys_addr_t arm64_dma_phys_limit __ro_after_init;
 
 #ifdef CONFIG_KEXEC_CORE
+/* Current arm64 boot protocol requires 2MB alignment */
+#define CRASH_ALIGN			SZ_2M
+
+#define CRASH_ADDR_LOW_MAX		arm64_dma_phys_limit
+#define CRASH_ADDR_HIGH_MAX		MEMBLOCK_ALLOC_ACCESSIBLE
+
 /*
  * reserve_crashkernel() - reserves memory for crash kernel
  *
@@ -75,7 +81,7 @@ phys_addr_t arm64_dma_phys_limit __ro_after_init;
 static void __init reserve_crashkernel(void)
 {
 	unsigned long long crash_base, crash_size;
-	unsigned long long crash_max = arm64_dma_phys_limit;
+	unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
 	int ret;
 
 	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
@@ -90,8 +96,7 @@ static void __init reserve_crashkernel(void)
 	if (crash_base)
 		crash_max = crash_base + crash_size;
 
-	/* Current arm64 boot protocol requires 2MB alignment */
-	crash_base = memblock_phys_alloc_range(crash_size, SZ_2M,
+	crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
 					       crash_base, crash_max);
 	if (!crash_base) {
 		pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
--
2.25.1
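As a rough illustration of what the 2MB CRASH_ALIGN constraint means for a candidate base address, here is a small userspace sketch. It is not kernel code: SZ_2M is redefined locally, and the helpers crash_align_up() and crash_is_aligned() are hypothetical names used only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the kernel macros, so the sketch builds on its own. */
#define SZ_2M		0x200000ULL
#define CRASH_ALIGN	SZ_2M

/* Hypothetical helper: round addr up to the next multiple of align (power of two). */
static uint64_t crash_align_up(uint64_t addr, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~(align - 1);
}

/* Hypothetical helper: true if addr is a multiple of align (power of two). */
static bool crash_is_aligned(uint64_t addr, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr & (align - 1)) == 0;
}
```

With these helpers, a base such as 0x80100000 would be rejected for a fixed "crashkernel=Y@X" request, while a dynamic allocation would simply be placed at the next aligned address.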
[PATCH v20 4/5] of: fdt: Add memory for devices by DT property "linux,usable-memory-range"
From: Chen Zhou

When reserving crashkernel in high memory, some low memory is reserved
for crash dump kernel devices and never mapped by the first kernel.
This memory range is advertised to crash dump kernel via DT property
under /chosen,
	linux,usable-memory-range = <BASE1 SIZE1 [BASE2 SIZE2]>

We reused the DT property linux,usable-memory-range and made the low
memory region the second range "BASE2 SIZE2", which keeps compatibility
with existing user-space and older kdump kernels.

Crash dump kernel reads this property at boot time and calls
memblock_add() to add the low memory region after
memblock_cap_memory_range() has been called.

Signed-off-by: Chen Zhou
Co-developed-by: Zhen Lei
Signed-off-by: Zhen Lei
Reviewed-by: Rob Herring
Tested-by: Dave Kleikamp
---
 drivers/of/fdt.c | 33 +++--
 1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index ad85ff6474ff139..df4b9d2418a13d4 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -973,16 +973,24 @@ static void __init early_init_dt_check_for_elfcorehdr(unsigned long node)
 
 static unsigned long chosen_node_offset = -FDT_ERR_NOTFOUND;
 
+/*
+ * The main usage of linux,usable-memory-range is for crash dump kernel.
+ * Originally, the number of usable-memory regions is one. Now there may
+ * be two regions, low region and high region.
+ * To make compatibility with existing user-space and older kdump, the low
+ * region is always the last range of linux,usable-memory-range if exist.
+ */
+#define MAX_USABLE_RANGES		2
+
 /**
  * early_init_dt_check_for_usable_mem_range - Decode usable memory range
  * location from flat tree
  */
 void __init early_init_dt_check_for_usable_mem_range(void)
 {
-	const __be32 *prop;
-	int len;
-	phys_addr_t cap_mem_addr;
-	phys_addr_t cap_mem_size;
+	struct memblock_region rgn[MAX_USABLE_RANGES] = {0};
+	const __be32 *prop, *endp;
+	int len, i;
 	unsigned long node = chosen_node_offset;
 
 	if ((long)node < 0)
@@ -991,16 +999,21 @@ void __init early_init_dt_check_for_usable_mem_range(void)
 	pr_debug("Looking for usable-memory-range property... ");
 
 	prop = of_get_flat_dt_prop(node, "linux,usable-memory-range", &len);
-	if (!prop || (len < (dt_root_addr_cells + dt_root_size_cells)))
+	if (!prop || (len % (dt_root_addr_cells + dt_root_size_cells)))
 		return;
 
-	cap_mem_addr = dt_mem_next_cell(dt_root_addr_cells, &prop);
-	cap_mem_size = dt_mem_next_cell(dt_root_size_cells, &prop);
+	endp = prop + (len / sizeof(__be32));
+	for (i = 0; i < MAX_USABLE_RANGES && prop < endp; i++) {
+		rgn[i].base = dt_mem_next_cell(dt_root_addr_cells, &prop);
+		rgn[i].size = dt_mem_next_cell(dt_root_size_cells, &prop);
 
-	pr_debug("cap_mem_start=%pa cap_mem_size=%pa\n", &cap_mem_addr,
-		 &cap_mem_size);
+		pr_debug("cap_mem_regions[%d]: base=%pa, size=%pa\n",
+			 i, &rgn[i].base, &rgn[i].size);
+	}
 
-	memblock_cap_memory_range(cap_mem_addr, cap_mem_size);
+	memblock_cap_memory_range(rgn[0].base, rgn[0].size);
+	for (i = 1; i < MAX_USABLE_RANGES && rgn[i].size; i++)
+		memblock_add(rgn[i].base, rgn[i].size);
 }
 
 #ifdef CONFIG_SERIAL_EARLYCON
--
2.25.1
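The decoding of the property into at most two (base, size) pairs can be sketched in plain userspace C. This is only an illustration under stated assumptions: struct usable_range, next_cells() and parse_usable_ranges() are hypothetical names, the property is passed as a raw big-endian byte buffer, and the length check is done in bytes, which is a choice of this sketch rather than something taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_USABLE_RANGES 2

/* Hypothetical container for one decoded range. */
struct usable_range {
	uint64_t base;
	uint64_t size;
};

/* Read `cells` big-endian 32-bit cells at *p and advance *p past them. */
static uint64_t next_cells(const uint8_t **p, int cells)
{
	uint64_t v = 0;

	for (int i = 0; i < cells * 4; i++)
		v = (v << 8) | *(*p)++;
	return v;
}

/*
 * Decode up to MAX_USABLE_RANGES (base, size) pairs from a raw property.
 * Returns the number of ranges decoded, or -1 if the buffer is missing or
 * its length is not a whole number of (base, size) entries.
 */
static int parse_usable_ranges(const uint8_t *prop, size_t len_bytes,
			       int addr_cells, int size_cells,
			       struct usable_range out[MAX_USABLE_RANGES])
{
	size_t entry_bytes = (size_t)(addr_cells + size_cells) * 4;
	const uint8_t *end;
	int n = 0;

	if (!prop || addr_cells <= 0 || size_cells <= 0 ||
	    len_bytes == 0 || len_bytes % entry_bytes)
		return -1;

	end = prop + len_bytes;
	while (n < MAX_USABLE_RANGES && prop < end) {
		out[n].base = next_cells(&prop, addr_cells);
		out[n].size = next_cells(&prop, size_cells);
		n++;
	}
	return n;
}
```

In this model the first decoded range plays the role of the region passed to memblock_cap_memory_range(), and a second range, if present, is the low region that would be added back with memblock_add().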
[PATCH v20 0/5] support reserving crashkernel above 4G on arm64 kdump
org/lkml/2021/12/10/38
[v18]: https://lkml.org/lkml/2021/12/22/424
[v19]: https://lkml.org/lkml/2021/12/28/203

Chen Zhou (4):
  arm64: kdump: introduce some macros for crash kernel reservation
  arm64: kdump: reimplement crashkernel=X
  of: fdt: Add memory for devices by DT property "linux,usable-memory-range"
  kdump: update Documentation about crashkernel

Zhen Lei (1):
  arm64: Use insert_resource() to simplify code

 Documentation/admin-guide/kdump/kdump.rst     | 11 ++-
 .../admin-guide/kernel-parameters.txt         | 11 ++-
 arch/arm64/kernel/machine_kexec.c             |  9 ++-
 arch/arm64/kernel/machine_kexec_file.c        | 12 ++-
 arch/arm64/kernel/setup.c                     | 17 +---
 arch/arm64/mm/init.c                          | 80 +--
 drivers/of/fdt.c                              | 33 +---
 7 files changed, 134 insertions(+), 39 deletions(-)

--
2.25.1
[PATCH v20 5/5] kdump: update Documentation about crashkernel
From: Chen Zhou

For arm64, the behavior of crashkernel=X has been changed: it tries
low allocation in DMA zone and falls back to high allocation if that fails.

We can also use "crashkernel=X,high" to select a high region above
DMA zone, which also tries to allocate at least 256M low memory in
DMA zone automatically, and "crashkernel=Y,low" can be used to allocate
a specified size of low memory.

So update the Documentation.

Signed-off-by: Chen Zhou
Signed-off-by: Zhen Lei
---
 Documentation/admin-guide/kdump/kdump.rst       | 11 +--
 Documentation/admin-guide/kernel-parameters.txt | 11 +--
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/kdump/kdump.rst b/Documentation/admin-guide/kdump/kdump.rst
index cb30ca3df27c9b2..d4c287044be0c70 100644
--- a/Documentation/admin-guide/kdump/kdump.rst
+++ b/Documentation/admin-guide/kdump/kdump.rst
@@ -361,8 +361,15 @@ Boot into System Kernel
    kernel will automatically locate the crash kernel image within the
    first 512MB of RAM if X is not given.
 
-   On arm64, use "crashkernel=Y[@X]". Note that the start address of
-   the kernel, X if explicitly specified, must be aligned to 2MiB (0x200000).
+   On arm64, use "crashkernel=X" to try low allocation in DMA zone and
+   fall back to high allocation if it fails.
+   We can also use "crashkernel=X,high" to select a high region above
+   DMA zone, which also tries to allocate at least 256M low memory in
+   DMA zone automatically.
+   "crashkernel=Y,low" can be used to allocate specified size low memory.
+   Use "crashkernel=Y@X" if you really have to reserve memory from
+   specified start address X. Note that the start address of the kernel,
+   X if explicitly specified, must be aligned to 2MiB (0x200000).
 
 Load the Dump-capture Kernel
 ============================
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index f5a27f067db9ed9..65780c2ca830be0 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -792,6 +792,9 @@
 			[KNL, X86-64] Select a region under 4G first, and
 			fall back to reserve region above 4G when '@offset'
 			hasn't been specified.
+			[KNL, ARM64] Try low allocation in DMA zone and fall back
+			to high allocation if it fails when '@offset' hasn't been
+			specified.
 			See Documentation/admin-guide/kdump/kdump.rst for further details.
 
 	crashkernel=range1:size1[,range2:size2,...][@offset]
@@ -808,6 +811,8 @@
 			Otherwise memory region will be allocated below 4G, if
 			available.
 			It will be ignored if crashkernel=X is specified.
+			[KNL, ARM64] range in high memory.
+			Allow kernel to allocate physical memory region from top.
 	crashkernel=size[KMG],low
 			[KNL, X86-64] range under 4G. When crashkernel=X,high
 			is passed, kernel could allocate physical memory region
@@ -816,13 +821,15 @@
 			requires at least 64M+32K low memory, also enough extra
 			low memory is needed to make sure DMA buffers for 32-bit
 			devices won't run out. Kernel would try to allocate at
-			at least 256M below 4G automatically.
+			least 256M below 4G automatically.
 			This one let user to specify own low range under 4G
 			for second kernel instead.
 			0: to disable low allocation.
 			It will be ignored when crashkernel=X,high is not used
 			or memory reserved is below 4G.
-
+			[KNL, ARM64] range in low memory.
+			This one let user to specify a low range in DMA zone for
+			crash dump kernel.
 
 	cryptomgr.notests
 			[KNL] Disable crypto self-tests
--
2.25.1
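The documented policy (try the DMA zone first, fall back to high memory, and pair a high region with some low memory) can be modelled with a toy decision function. This is a sketch only, not the kernel implementation: plan_crashkernel(), struct crash_plan and the "largest free block" inputs are hypothetical, and it assumes that any high placement is paired with a low reservation (256M by default, or the "crashkernel=Y,low" size when ",high" is used).

```c
#include <assert.h>
#include <stdint.h>

#define SZ_256M (256ULL << 20)

enum crash_where { CRASH_NONE, CRASH_LOW, CRASH_HIGH };

/* Hypothetical result type: where the main region lands and how much extra low memory. */
struct crash_plan {
	enum crash_where where;
	uint64_t low_extra;	/* extra low memory for DMA, 0 if none */
};

/*
 * Toy model of the documented behaviour.
 *   size      : requested crashkernel size
 *   want_high : non-zero for "crashkernel=X,high"
 *   low_size  : "crashkernel=Y,low" size, negative if not given
 *   low_free  : largest free block in the DMA zone (assumed input)
 *   high_free : largest free block above the DMA zone (assumed input)
 */
static struct crash_plan plan_crashkernel(uint64_t size, int want_high, int64_t low_size,
					  uint64_t low_free, uint64_t high_free)
{
	struct crash_plan plan = { CRASH_NONE, 0 };
	uint64_t low;

	/* Plain crashkernel=X: prefer the DMA zone. */
	if (!want_high && size <= low_free) {
		plan.where = CRASH_LOW;
		return plan;
	}

	/* Fall back to (or explicitly select) high memory. */
	if (size > high_free)
		return plan;

	/* Assumption of this sketch: Y,low is honoured only together with X,high. */
	low = (want_high && low_size >= 0) ? (uint64_t)low_size : SZ_256M;
	if (low > low_free)
		return plan;	/* no low memory for DMA: treat as failure */

	plan.where = CRASH_HIGH;
	plan.low_extra = low;
	return plan;
}
```

For example, a 512M request on a machine with 1G free in the DMA zone stays low, while a 2G request on the same machine lands high with 256M of low memory set aside.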
[PATCH v20 3/5] arm64: kdump: reimplement crashkernel=X
From: Chen Zhou

There are the following issues in arm64 kdump:
1. We use crashkernel=X to reserve crashkernel below 4G, which
   will fail when there is not enough low memory.
2. If reserving crashkernel above 4G, the crash dump kernel will
   fail to boot because there is no low memory available for
   allocation.

To solve these issues, change the behavior of crashkernel=X and
introduce crashkernel=X,[high,low]. crashkernel=X tries low allocation
in DMA zone, and falls back to high allocation if it fails.
We can also use "crashkernel=X,high" to select a region above DMA zone,
which also tries to allocate at least 256M in DMA zone automatically.
"crashkernel=Y,low" can be used to allocate a specified size of low memory.

Signed-off-by: Chen Zhou
Co-developed-by: Zhen Lei
Signed-off-by: Zhen Lei
---
 arch/arm64/kernel/machine_kexec.c      |  9 +++-
 arch/arm64/kernel/machine_kexec_file.c | 12 -
 arch/arm64/mm/init.c                   | 68 --
 3 files changed, 81 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index e16b248699d5c3c..19c2d487cb08feb 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -329,8 +329,13 @@ bool crash_is_nosave(unsigned long pfn)
 
 	/* in reserved memory? */
 	addr = __pfn_to_phys(pfn);
-	if ((addr < crashk_res.start) || (crashk_res.end < addr))
-		return false;
+	if ((addr < crashk_res.start) || (crashk_res.end < addr)) {
+		if (!crashk_low_res.end)
+			return false;
+
+		if ((addr < crashk_low_res.start) || (crashk_low_res.end < addr))
+			return false;
+	}
 
 	if (!kexec_crash_image)
 		return true;
diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
index 59c648d51848886..889951291cc0f9c 100644
--- a/arch/arm64/kernel/machine_kexec_file.c
+++ b/arch/arm64/kernel/machine_kexec_file.c
@@ -65,10 +65,18 @@ static int prepare_elf_headers(void **addr, unsigned long *sz)
 
 	/* Exclude crashkernel region */
 	ret = crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end);
+	if (ret)
+		goto out;
+
+	if (crashk_low_res.end) {
+		ret = crash_exclude_mem_range(cmem, crashk_low_res.start, crashk_low_res.end);
+		if (ret)
+			goto out;
+	}
 
-	if (!ret)
-		ret = crash_prepare_elf64_headers(cmem, true, addr, sz);
+	ret = crash_prepare_elf64_headers(cmem, true, addr, sz);
 
+out:
 	kfree(cmem);
 	return ret;
 }
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 6c653a2c7cff052..a5d43feac0d7d96 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -71,6 +71,30 @@ phys_addr_t arm64_dma_phys_limit __ro_after_init;
 #define CRASH_ADDR_LOW_MAX		arm64_dma_phys_limit
 #define CRASH_ADDR_HIGH_MAX		MEMBLOCK_ALLOC_ACCESSIBLE
 
+static int __init reserve_crashkernel_low(unsigned long long low_size)
+{
+	unsigned long long low_base;
+
+	/* passed with crashkernel=0,low ? */
+	if (!low_size)
+		return 0;
+
+	low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, CRASH_ADDR_LOW_MAX);
+	if (!low_base) {
+		pr_err("cannot allocate crashkernel low memory (size:0x%llx).\n", low_size);
+		return -ENOMEM;
+	}
+
+	pr_info("crashkernel low memory reserved: 0x%llx - 0x%llx (%lld MB)\n",
+		low_base, low_base + low_size, low_size >> 20);
+
+	crashk_low_res.start = low_base;
+	crashk_low_res.end = low_base + low_size - 1;
+	insert_resource(&iomem_resource, &crashk_low_res);
+
+	return 0;
+}
+
 /*
  * reserve_crashkernel() - reserves memory for crash kernel
  *
@@ -81,29 +105,62 @@ phys_addr_t arm64_dma_phys_limit __ro_after_init;
 static void __init reserve_crashkernel(void)
 {
 	unsigned long long crash_base, crash_size;
+	unsigned long long crash_low_size = SZ_256M;
 	unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
 	int ret;
+	bool fixed_base;
+	char *cmdline = boot_command_line;
 
-	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
+	/* crashkernel=X[@offset] */
+	ret = parse_crashkernel(cmdline, memblock_phys_mem_size(),
 				&crash_size, &crash_base);
-	/* no crashkernel= or invalid value specified */
-	if (ret || !crash_size)
-		return;
+	if (ret || !crash_size) {
+		unsigned long long low_size;
+		/* crashkernel=X,high */
+		ret = parse_crashkernel_high(cmdline, 0, &crash_size, &crash_base);
+		if (ret || !crash_size)
+			retur
[PATCH v20 1/5] arm64: Use insert_resource() to simplify code
insert_resource() traverses the subtree layer by layer from the root node
until a proper location is found. Compared with request_resource(), the
parent node does not need to be determined in advance.

In addition, move the insertion of node 'crashk_res' into function
reserve_crashkernel() to make the associated code close together.

Signed-off-by: Zhen Lei
---
 arch/arm64/kernel/setup.c | 17 +++--
 arch/arm64/mm/init.c      |  1 +
 2 files changed, 4 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index f70573928f1bff0..a81efcc359e4e78 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -225,6 +225,8 @@ static void __init request_standard_resources(void)
 	kernel_code.end = __pa_symbol(__init_begin - 1);
 	kernel_data.start = __pa_symbol(_sdata);
 	kernel_data.end = __pa_symbol(_end - 1);
+	insert_resource(&iomem_resource, &kernel_code);
+	insert_resource(&iomem_resource, &kernel_data);
 
 	num_standard_resources = memblock.memory.cnt;
 	res_size = num_standard_resources * sizeof(*standard_resources);
@@ -246,20 +248,7 @@ static void __init request_standard_resources(void)
 			res->end = __pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1;
 		}
 
-		request_resource(&iomem_resource, res);
-
-		if (kernel_code.start >= res->start &&
-		    kernel_code.end <= res->end)
-			request_resource(res, &kernel_code);
-		if (kernel_data.start >= res->start &&
-		    kernel_data.end <= res->end)
-			request_resource(res, &kernel_data);
-#ifdef CONFIG_KEXEC_CORE
-		/* Userspace will find "Crash kernel" region in /proc/iomem. */
-		if (crashk_res.end && crashk_res.start >= res->start &&
-		    crashk_res.end <= res->end)
-			request_resource(res, &crashk_res);
-#endif
+		insert_resource(&iomem_resource, res);
 	}
 }
 
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index db63cc885771a52..90f276d46b93bc6 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -109,6 +109,7 @@ static void __init reserve_crashkernel(void)
 	kmemleak_ignore_phys(crash_base);
 	crashk_res.start = crash_base;
 	crashk_res.end = crash_base + crash_size - 1;
+	insert_resource(&iomem_resource, &crashk_res);
 }
 #else
 static void __init reserve_crashkernel(void)
--
2.25.1
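The point that the parent does not need to be known in advance can be illustrated with a simplified userspace model of a resource tree. This is a sketch, not the kernel's insert_resource(): struct res and res_insert() are hypothetical, only "descend into the child that contains the new range" is modelled, and the case where a new node would have to adopt existing nodes is rejected rather than handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct resource. */
struct res {
	const char *name;
	uint64_t start, end;			/* inclusive range */
	struct res *parent, *child, *sibling;
};

/*
 * Descend from root into whichever child fully contains the new range,
 * then link the node into that parent's sorted child list.
 * Returns 0 on success, -1 if the range is invalid, lies outside root,
 * or overlaps an existing sibling (adopting children is not modelled).
 */
static int res_insert(struct res *root, struct res *new)
{
	struct res *parent = root;
	struct res **link;

	if (new->start > new->end ||
	    new->start < root->start || new->end > root->end)
		return -1;

	for (;;) {
		struct res *c, *container = NULL;

		for (c = parent->child; c; c = c->sibling) {
			if (c->start <= new->start && new->end <= c->end) {
				container = c;
				break;
			}
		}
		if (!container)
			break;
		parent = container;	/* go one layer deeper */
	}

	/* Skip siblings that end before the new range starts. */
	link = &parent->child;
	while (*link && (*link)->end < new->start)
		link = &(*link)->sibling;
	if (*link && (*link)->start <= new->end)
		return -1;		/* overlaps an existing sibling */

	new->parent = parent;
	new->child = NULL;
	new->sibling = *link;
	*link = new;
	return 0;
}
```

Inserting a "Crash kernel" range through the root in this model places it under the "System RAM" node that contains it, which is the convenience the commit message describes; the caller never names the parent.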
[PATCH v5] arm64: support more than one crash kernel regions
From: Chen Zhou When crashkernel is reserved above 4G in memory, kernel should reserve some amount of low memory for swiotlb and some DMA buffers. So there may be two crash kernel regions, one is below 4G, the other is above 4G. Currently, there is only one crash kernel region on arm64, and pass "linux,usable-memory-range = " property to crash dump kernel. Now, we pass "linux,usable-memory-range = " to crash dump kernel to support two crash kernel regions and load crash kernel high. Make the low memory region as the second range "BASE2 SIZE2" to keep compatibility with existing user-space and older kdump kernels. Signed-off-by: Chen Zhou Co-developed-by: Zhen Lei Signed-off-by: Zhen Lei --- kexec/arch/arm64/crashdump-arm64.c | 44 +--- kexec/arch/arm64/crashdump-arm64.h | 5 +- kexec/arch/arm64/kexec-arm64.c | 84 -- 3 files changed, 86 insertions(+), 47 deletions(-) Changes since [v4]: 1. Repalce "Crash kernel (low)" with "Crash kernel", consistent with the x86 implementation. See: e25e6e7593ca ("kdump, x86: Process multiple Crash kernel in /proc/iomem") 2. Expand fdt_setprop_range() to fdt_setprop_ranges() instead of adding the latter. Parameter 'reverse' is added to control whether to reversely output data to fdt. Changes since [v3]: - Reuse DT property "linux,usable-memory-range". Reuse DT property "linux,usable-memory-range" to pass the low memory region. Changes since [v2]: - Rebase to latest kexec-tools code. Changes since [v1]: - Add another DT property "linux,low-memory-range" to crash dump kernel's dtb to pass the low region instead of reusing "linux,usable-memory-range". 
[1]: http://lists.infradead.org/pipermail/kexec/2019-April/022792.html [2]: http://lists.infradead.org/pipermail/kexec/2019-August/023569.html [3]: http://lists.infradead.org/pipermail/kexec/2020-May/025128.html [4]: http://lists.infradead.org/pipermail/kexec/2020-June/020737.html diff --git a/kexec/arch/arm64/crashdump-arm64.c b/kexec/arch/arm64/crashdump-arm64.c index 03d6204c9611dab..8558f260dc2b6b4 100644 --- a/kexec/arch/arm64/crashdump-arm64.c +++ b/kexec/arch/arm64/crashdump-arm64.c @@ -27,11 +27,11 @@ static struct memory_ranges system_memory_rgns; /* memory range reserved for crashkernel */ -struct memory_range crash_reserved_mem; +struct memory_range crash_reserved_mem[CRASH_MAX_RESERVED_RANGES]; struct memory_ranges usablemem_rgns = { .size = 0, - .max_size = 1, - .ranges = &crash_reserved_mem, + .max_size = CRASH_MAX_RESERVED_RANGES, + .ranges = crash_reserved_mem, }; struct memory_range elfcorehdr_mem; @@ -111,7 +111,7 @@ int is_crashkernel_mem_reserved(void) if (!usablemem_rgns.size) kexec_iomem_for_each_line(NULL, iomem_range_callback, NULL); - return crash_reserved_mem.start != crash_reserved_mem.end; + return usablemem_rgns.size; } /* @@ -125,6 +125,8 @@ int is_crashkernel_mem_reserved(void) */ static int crash_get_memory_ranges(void) { + int i; + /* * First read all memory regions that can be considered as * system memory including the crash area. 
@@ -132,16 +134,19 @@ static int crash_get_memory_ranges(void) if (!usablemem_rgns.size) kexec_iomem_for_each_line(NULL, iomem_range_callback, NULL); - /* allow only a single region for crash dump kernel */ - if (usablemem_rgns.size != 1) + /* allow one or two regions for crash dump kernel */ + if (!usablemem_rgns.size) return -EINVAL; - dbgprint_mem_range("Reserved memory range", &crash_reserved_mem, 1); + dbgprint_mem_range("Reserved memory range", + usablemem_rgns.ranges, usablemem_rgns.size); - if (mem_regions_alloc_and_exclude(&system_memory_rgns, - &crash_reserved_mem)) { - fprintf(stderr, "Cannot allocate memory for ranges\n"); - return -ENOMEM; + for (i = 0; i < usablemem_rgns.size; i++) { + if (mem_regions_alloc_and_exclude(&system_memory_rgns, + &crash_reserved_mem[i])) { + fprintf(stderr, "Cannot allocate memory for ranges\n"); + return -ENOMEM; + } } /* @@ -202,7 +207,8 @@ int load_crashdump_segments(struct kexec_info *info) return EFAILED; elfcorehdr = add_buffer_phys_virt(info, buf, bufsz, bufsz, 0, - crash_reserved_mem.start, crash_reserved_mem.end, + crash_reserved_mem[usablemem_rgns.size - 1].start, + crash_reserved_mem[usablemem_rgns.size - 1].end, -1, 0); elfcorehdr_mem.sta
[PATCH v19 05/13] x86/setup: Add and use CRASH_BASE_ALIGN
Add macro CRASH_BASE_ALIGN to indicate the alignment for crash kernel
fixed region, in preparation for making partial implementation of
reserve_crashkernel[_low]() generic.

Signed-off-by: Zhen Lei
---
 arch/x86/kernel/setup.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 93d78aae1937db3..cb7f237a2ae0dfa 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -392,9 +392,12 @@ static void __init memblock_x86_reserve_range_setup_data(void)
 
 #ifdef CONFIG_KEXEC_CORE
 
-/* 16M alignment for crash kernel regions */
+/* alignment for crash kernel dynamic regions */
 #define CRASH_ALIGN		SZ_16M
 
+/* alignment for crash kernel fixed region */
+#define CRASH_BASE_ALIGN	SZ_1M
+
 /*
  * Keep the crash kernel below this limit.
  *
@@ -509,7 +512,7 @@ static void __init reserve_crashkernel(void)
 	} else {
 		unsigned long long start;
 
-		start = memblock_phys_alloc_range(crash_size, SZ_1M, crash_base,
+		start = memblock_phys_alloc_range(crash_size, CRASH_BASE_ALIGN, crash_base,
 						  crash_base + crash_size);
 		if (start != crash_base) {
 			pr_info("crashkernel reservation failed - memory is in use.\n");
--
2.25.1
[PATCH v19 09/13] x86/setup: Use generic reserve_crashkernel_mem[_low]()
Use generic reserve_crashkernel_mem[_low]() to replace arch-specific reserve_crashkernel_low() and a partial implementation of reserve_crashkernel(). The only difference is that "insert_resource(&iomem_resource, &crashk_low_res);" is moved into reserve_crashkernel(), no functional change. Signed-off-by: Zhen Lei --- arch/x86/kernel/setup.c | 93 ++--- 1 file changed, 4 insertions(+), 89 deletions(-) diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index 22d63dbf5db0a58..ee2606b3b9da662 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -391,52 +391,6 @@ static void __init memblock_x86_reserve_range_setup_data(void) */ #ifdef CONFIG_KEXEC_CORE - -static int __init reserve_crashkernel_low(unsigned long long low_size) -{ -#ifdef CONFIG_X86_64 - unsigned long long low_base = 0; - unsigned long low_mem_limit; - - low_mem_limit = min(memblock_phys_mem_size(), CRASH_ADDR_LOW_MAX); - - /* crashkernel=Y,low is not specified */ - if ((long)low_size < 0) { - /* -* two parts from kernel/dma/swiotlb.c: -* -swiotlb size: user-specified with swiotlb= or default. -* -* -swiotlb overflow buffer: now hardcoded to 32k. We round it -* to 8M for other buffers that may need to stay low too. Also -* make sure we allocate enough extra low memory so that we -* don't run out of DMA buffers for 32-bit devices. -*/ - low_size = max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20); - } else { - /* passed with crashkernel=0,low ? 
*/ - if (!low_size) - return 0; - } - - low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, CRASH_ADDR_LOW_MAX); - if (!low_base) { - pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n", - (unsigned long)(low_size >> 20)); - return -ENOMEM; - } - - pr_info("Reserving %ldMB of low memory at %ldMB for crashkernel (low RAM limit: %ldMB)\n", - (unsigned long)(low_size >> 20), - (unsigned long)(low_base >> 20), - (unsigned long)(low_mem_limit >> 20)); - - crashk_low_res.start = low_base; - crashk_low_res.end = low_base + low_size - 1; - insert_resource(&iomem_resource, &crashk_low_res); -#endif - return 0; -} - static void __init reserve_crashkernel(void) { unsigned long long crash_size, crash_base, total_mem, low_size; @@ -460,51 +414,12 @@ static void __init reserve_crashkernel(void) return; } - /* 0 means: find the address automatically */ - if (!crash_base) { - /* -* Set CRASH_ADDR_LOW_MAX upper bound for crash memory, -* crashkernel=x,high reserves memory over 4G, also allocates -* 256M extra low memory for DMA buffers and swiotlb. -* But the extra memory is not required for all machines. -* So try low memory first and fall back to high memory -* unless "crashkernel=size[KMG],high" is specified. 
-*/ - if (!high) - crash_base = memblock_phys_alloc_range(crash_size, - CRASH_ALIGN, CRASH_ALIGN, - CRASH_ADDR_LOW_MAX); - if (!crash_base) - crash_base = memblock_phys_alloc_range(crash_size, - CRASH_ALIGN, CRASH_ALIGN, - CRASH_ADDR_HIGH_MAX); - if (!crash_base) { - pr_info("crashkernel reservation failed - No suitable area found.\n"); - return; - } - } else { - unsigned long long start; - - start = memblock_phys_alloc_range(crash_size, CRASH_BASE_ALIGN, crash_base, - crash_base + crash_size); - if (start != crash_base) { - pr_info("crashkernel reservation failed - memory is in use.\n"); - return; - } - } - - if (crash_base >= (1ULL << 32) && reserve_crashkernel_low(low_size)) { - memblock_phys_free(crash_base, crash_size); + ret = reserve_crashkernel_mem(total_mem, crash_size, crash_base, low_size, high); + if (ret) return; - } - - pr_info("Reserving %ldMB of memory at %ldMB for crashkernel (System RAM: %ldMB)\n", - (unsigned long)(
[PATCH v19 10/13] arm64: kdump: introduce some macros for crash kernel reservation
From: Chen Zhou

Introduce macro CRASH_ALIGN for alignment, macro CRASH_ADDR_LOW_MAX
for upper bound of low crash memory, macro CRASH_ADDR_HIGH_MAX for
upper bound of high crash memory, use macros instead.

Signed-off-by: Chen Zhou
Signed-off-by: Zhen Lei
Tested-by: John Donnelly
Tested-by: Dave Kleikamp
---
 arch/arm64/include/asm/kexec.h | 6 ++
 arch/arm64/mm/init.c           | 4 ++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index 9839bfc163d7147..f019e78dede02dc 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -25,6 +25,12 @@
 
 #define KEXEC_ARCH KEXEC_ARCH_AARCH64
 
+/* alignment for crash kernel dynamic regions */
+#define CRASH_ALIGN		SZ_2M
+
+#define CRASH_ADDR_LOW_MAX	arm64_dma_phys_limit
+#define CRASH_ADDR_HIGH_MAX	MEMBLOCK_ALLOC_ACCESSIBLE
+
 #ifndef __ASSEMBLY__
 
 /**
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index a8834434af99ae0..be4595dc7459115 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -75,7 +75,7 @@ phys_addr_t arm64_dma_phys_limit __ro_after_init;
 static void __init reserve_crashkernel(void)
 {
 	unsigned long long crash_base, crash_size;
-	unsigned long long crash_max = arm64_dma_phys_limit;
+	unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
 	int ret;
 
 	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
@@ -91,7 +91,7 @@ static void __init reserve_crashkernel(void)
 		crash_max = crash_base + crash_size;
 
 	/* Current arm64 boot protocol requires 2MB alignment */
-	crash_base = memblock_phys_alloc_range(crash_size, SZ_2M,
+	crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
 					       crash_base, crash_max);
 	if (!crash_base) {
 		pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
--
2.25.1
[PATCH v19 11/13] arm64: kdump: reimplement crashkernel=X
From: Chen Zhou There are following issues in arm64 kdump: 1. We use crashkernel=X to reserve crashkernel below 4G, which will fail when there is no enough low memory. 2. If reserving crashkernel above 4G, in this case, crash dump kernel will boot failure because there is no low memory available for allocation. To solve these issues, change the behavior of crashkernel=X and introduce crashkernel=X,[high,low]. crashkernel=X tries low allocation in DMA zone, and fall back to high allocation if it fails. We can also use "crashkernel=X,high" to select a region above DMA zone, which also tries to allocate at least 256M in DMA zone automatically. "crashkernel=Y,low" can be used to allocate specified size low memory. Another minor change, there may be two regions reserved for crash dump kernel, in order to distinct from the high region and make no effect to the use of existing kexec-tools, rename the low region as "Crash kernel (low)". Signed-off-by: Chen Zhou Co-developed-by: Zhen Lei Signed-off-by: Zhen Lei --- arch/arm64/kernel/machine_kexec.c | 5 +++- arch/arm64/kernel/machine_kexec_file.c | 12 ++-- arch/arm64/kernel/setup.c | 13 +++- arch/arm64/mm/init.c | 41 ++ 4 files changed, 42 insertions(+), 29 deletions(-) diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c index 6fb31c117ebe08c..6665bf31f6b6a19 100644 --- a/arch/arm64/kernel/machine_kexec.c +++ b/arch/arm64/kernel/machine_kexec.c @@ -327,7 +327,10 @@ bool crash_is_nosave(unsigned long pfn) /* in reserved memory? 
*/ addr = __pfn_to_phys(pfn); - if ((addr < crashk_res.start) || (crashk_res.end < addr)) + if (((addr < crashk_res.start) || (crashk_res.end < addr)) && !crashk_low_res.end) + return false; + + if ((addr < crashk_low_res.start) || (crashk_low_res.end < addr)) return false; if (!kexec_crash_image) diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c index 59c648d51848886..889951291cc0f9c 100644 --- a/arch/arm64/kernel/machine_kexec_file.c +++ b/arch/arm64/kernel/machine_kexec_file.c @@ -65,10 +65,18 @@ static int prepare_elf_headers(void **addr, unsigned long *sz) /* Exclude crashkernel region */ ret = crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end); + if (ret) + goto out; + + if (crashk_low_res.end) { + ret = crash_exclude_mem_range(cmem, crashk_low_res.start, crashk_low_res.end); + if (ret) + goto out; + } - if (!ret) - ret = crash_prepare_elf64_headers(cmem, true, addr, sz); + ret = crash_prepare_elf64_headers(cmem, true, addr, sz); +out: kfree(cmem); return ret; } diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c index be5f85b0a24de69..4bb2e55366be64d 100644 --- a/arch/arm64/kernel/setup.c +++ b/arch/arm64/kernel/setup.c @@ -248,7 +248,18 @@ static void __init request_standard_resources(void) kernel_data.end <= res->end) request_resource(res, &kernel_data); #ifdef CONFIG_KEXEC_CORE - /* Userspace will find "Crash kernel" region in /proc/iomem. */ + /* +* Userspace will find "Crash kernel" or "Crash kernel (low)" +* region in /proc/iomem. +* In order to distinct from the high region and make no effect +* to the use of existing kexec-tools, rename the low region as +* "Crash kernel (low)". 
+*/ + if (crashk_low_res.end && crashk_low_res.start >= res->start && + crashk_low_res.end <= res->end) { + crashk_low_res.name = "Crash kernel (low)"; + request_resource(res, &crashk_low_res); + } if (crashk_res.end && crashk_res.start >= res->start && crashk_res.end <= res->end) request_resource(res, &crashk_res); diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index be4595dc7459115..91b8038a1529068 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -74,41 +74,32 @@ phys_addr_t arm64_dma_phys_limit __ro_after_init; */ static void __init reserve_crashkernel(void) { - unsigned long long crash_base, crash_size; - unsigned long long crash_max = CRASH_ADDR_LOW_MAX; + unsigned long long crash_size, crash_base, total_mem, low_size; + bool high = false; int ret; - ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(), - &crash_size, &cr
[PATCH v19 12/13] of: fdt: Add memory for devices by DT property "linux,usable-memory-range"
From: Chen Zhou When reserving crashkernel in high memory, some low memory is reserved for crash dump kernel devices and never mapped by the first kernel. This memory range is advertised to crash dump kernel via DT property under /chosen, linux,usable-memory-range = We reused the DT property linux,usable-memory-range and made the low memory region as the second range "BASE2 SIZE2", which keeps compatibility with existing user-space and older kdump kernels. Crash dump kernel reads this property at boot time and call memblock_add() to add the low memory region after memblock_cap_memory_range() has been called. Signed-off-by: Chen Zhou Co-developed-by: Zhen Lei Signed-off-by: Zhen Lei Reviewed-by: Rob Herring Tested-by: Dave Kleikamp --- drivers/of/fdt.c | 33 +++-- 1 file changed, 23 insertions(+), 10 deletions(-) diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c index 65af475dfa9508f..20e6281b2201ff5 100644 --- a/drivers/of/fdt.c +++ b/drivers/of/fdt.c @@ -967,16 +967,24 @@ static void __init early_init_dt_check_for_elfcorehdr(unsigned long node) static unsigned long chosen_node_offset = -FDT_ERR_NOTFOUND; +/* + * The main usage of linux,usable-memory-range is for crash dump kernel. + * Originally, the number of usable-memory regions is one. Now there may + * be two regions, low region and high region. + * To make compatibility with existing user-space and older kdump, the low + * region is always the last range of linux,usable-memory-range if exist. 
+ */ +#define MAX_USABLE_RANGES 2 + /** * early_init_dt_check_for_usable_mem_range - Decode usable memory range * location from flat tree */ void __init early_init_dt_check_for_usable_mem_range(void) { - const __be32 *prop; - int len; - phys_addr_t cap_mem_addr; - phys_addr_t cap_mem_size; + struct memblock_region rgn[MAX_USABLE_RANGES] = {0}; + const __be32 *prop, *endp; + int len, i; unsigned long node = chosen_node_offset; if ((long)node < 0) @@ -985,16 +993,21 @@ void __init early_init_dt_check_for_usable_mem_range(void) pr_debug("Looking for usable-memory-range property... "); prop = of_get_flat_dt_prop(node, "linux,usable-memory-range", &len); - if (!prop || (len < (dt_root_addr_cells + dt_root_size_cells))) + if (!prop || (len % (dt_root_addr_cells + dt_root_size_cells))) return; - cap_mem_addr = dt_mem_next_cell(dt_root_addr_cells, &prop); - cap_mem_size = dt_mem_next_cell(dt_root_size_cells, &prop); + endp = prop + (len / sizeof(__be32)); + for (i = 0; i < MAX_USABLE_RANGES && prop < endp; i++) { + rgn[i].base = dt_mem_next_cell(dt_root_addr_cells, &prop); + rgn[i].size = dt_mem_next_cell(dt_root_size_cells, &prop); - pr_debug("cap_mem_start=%pa cap_mem_size=%pa\n", &cap_mem_addr, -&cap_mem_size); + pr_debug("cap_mem_regions[%d]: base=%pa, size=%pa\n", +i, &rgn[i].base, &rgn[i].size); + } - memblock_cap_memory_range(cap_mem_addr, cap_mem_size); + memblock_cap_memory_range(rgn[0].base, rgn[0].size); + for (i = 1; i < MAX_USABLE_RANGES && rgn[i].size; i++) + memblock_add(rgn[i].base, rgn[i].size); } #ifdef CONFIG_SERIAL_EARLYCON -- 2.25.1 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
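For readers who want to try the decoding loop from this patch outside the kernel, here is a minimal userspace sketch. It is not the kernel code: `struct mem_range`, `be32_at()`, `next_cell()` and `decode_usable_ranges()` are hypothetical names standing in for `struct memblock_region`, `dt_mem_next_cell()` and the body of `early_init_dt_check_for_usable_mem_range()`. It assumes 1 or 2 cells per value and, as an assumption about the intent of the length check, validates the property length in bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_USABLE_RANGES 2

/* Hypothetical stand-in for struct memblock_region. */
struct mem_range {
	uint64_t base;
	uint64_t size;
};

/* Read one big-endian 32-bit cell from raw property bytes. */
static uint32_t be32_at(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Combine 'cells' big-endian cells into one value and advance the cursor. */
static uint64_t next_cell(int cells, const uint8_t **pp)
{
	uint64_t v = 0;

	while (cells--) {
		v = (v << 32) | be32_at(*pp);
		*pp += 4;
	}
	return v;
}

/*
 * Decode up to MAX_USABLE_RANGES (base, size) pairs from a property of
 * 'len' bytes. Returns the number of ranges decoded, or -1 if the
 * property is missing or is not a whole number of (base, size) pairs.
 */
static int decode_usable_ranges(const uint8_t *prop, size_t len,
				int addr_cells, int size_cells,
				struct mem_range *out)
{
	size_t pair_bytes;
	const uint8_t *end;
	int i;

	assert(addr_cells >= 1 && addr_cells <= 2);
	assert(size_cells >= 1 && size_cells <= 2);

	pair_bytes = (size_t)(addr_cells + size_cells) * 4;
	if (!prop || len == 0 || len % pair_bytes)
		return -1;

	end = prop + len;
	for (i = 0; i < MAX_USABLE_RANGES && prop < end; i++) {
		out[i].base = next_cell(addr_cells, &prop);
		out[i].size = next_cell(size_cells, &prop);
	}
	return i;
}
```

In the patch, the first decoded range is passed to memblock_cap_memory_range() and any further non-empty range to memblock_add(); the sketch stops at decoding so that it stays independent of memblock.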
[PATCH v19 13/13] kdump: update Documentation about crashkernel
From: Chen Zhou For arm64, the behavior of crashkernel=X has been changed: it now tries a low allocation in the DMA zone and falls back to a high allocation if that fails. We can also use "crashkernel=X,high" to select a high region above the DMA zone, which also tries to allocate at least 256M of low memory in the DMA zone automatically, and "crashkernel=Y,low" can be used to allocate low memory of a specified size. So update the Documentation. Signed-off-by: Chen Zhou Signed-off-by: Zhen Lei --- Documentation/admin-guide/kdump/kdump.rst | 11 +-- Documentation/admin-guide/kernel-parameters.txt | 11 +-- 2 files changed, 18 insertions(+), 4 deletions(-) diff --git a/Documentation/admin-guide/kdump/kdump.rst b/Documentation/admin-guide/kdump/kdump.rst index cb30ca3df27c9b2..d4c287044be0c70 100644 --- a/Documentation/admin-guide/kdump/kdump.rst +++ b/Documentation/admin-guide/kdump/kdump.rst @@ -361,8 +361,15 @@ Boot into System Kernel kernel will automatically locate the crash kernel image within the first 512MB of RAM if X is not given. - On arm64, use "crashkernel=Y[@X]". Note that the start address of - the kernel, X if explicitly specified, must be aligned to 2MiB (0x200000). + On arm64, use "crashkernel=X" to try low allocation in DMA zone and + fall back to high allocation if it fails. + We can also use "crashkernel=X,high" to select a high region above + DMA zone, which also tries to allocate at least 256M low memory in + DMA zone automatically. + "crashkernel=Y,low" can be used to allocate specified size low memory. + Use "crashkernel=Y@X" if you really have to reserve memory from + specified start address X. Note that the start address of the kernel, + X if explicitly specified, must be aligned to 2MiB (0x200000).
Load the Dump-capture Kernel diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index fc34332c8d9a6df..5fafeea70f8f14d 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -783,6 +783,9 @@ [KNL, X86-64] Select a region under 4G first, and fall back to reserve region above 4G when '@offset' hasn't been specified. + [KNL, ARM64] Try low allocation in DMA zone and fall back + to high allocation if it fails when '@offset' hasn't been + specified. See Documentation/admin-guide/kdump/kdump.rst for further details. crashkernel=range1:size1[,range2:size2,...][@offset] @@ -799,6 +802,8 @@ Otherwise memory region will be allocated below 4G, if available. It will be ignored if crashkernel=X is specified. + [KNL, ARM64] range in high memory. + Allow kernel to allocate physical memory region from top. crashkernel=size[KMG],low [KNL, X86-64] range under 4G. When crashkernel=X,high is passed, kernel could allocate physical memory region @@ -807,13 +812,15 @@ requires at least 64M+32K low memory, also enough extra low memory is needed to make sure DMA buffers for 32-bit devices won't run out. Kernel would try to allocate at - at least 256M below 4G automatically. + least 256M below 4G automatically. This one let user to specify own low range under 4G for second kernel instead. 0: to disable low allocation. It will be ignored when crashkernel=X,high is not used or memory reserved is below 4G. - + [KNL, ARM64] range in low memory. + This one let user to specify a low range in DMA zone for + crash dump kernel. cryptomgr.notests [KNL] Disable crypto self-tests -- 2.25.1 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
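The kdump.rst hunk in this patch requires that an explicit start address X in "crashkernel=Y@X" be aligned to 2MiB on arm64. As a small illustration of what that constraint means, here is a userspace sketch; `SKETCH_SZ_2M`, `crash_base_is_aligned()` and `crash_base_round_up()` are hypothetical names used only for this example and are not kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 2MiB, the alignment the arm64 documentation asks for (assumed constant name). */
#define SKETCH_SZ_2M (2ULL << 20)

/* True if 'base' is a multiple of 2MiB (valid because 2MiB is a power of two). */
static bool crash_base_is_aligned(uint64_t base)
{
	return (base & (SKETCH_SZ_2M - 1)) == 0;
}

/* Round 'base' up to the next 2MiB boundary; already aligned values are unchanged. */
static uint64_t crash_base_round_up(uint64_t base)
{
	assert(base <= UINT64_MAX - (SKETCH_SZ_2M - 1));
	return (base + SKETCH_SZ_2M - 1) & ~(SKETCH_SZ_2M - 1);
}
```

For example, a start address of 0x80100000 would not satisfy the requirement, while 0x80200000 would.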
[PATCH v19 08/13] x86/setup: Move CRASH[_BASE]_ALIGN and CRASH_ADDR_{LOW|HIGH}_MAX to asm/kexec.h
From: Chen Zhou Move CRASH[_BASE]_ALIGN and CRASH_ADDR_{LOW|HIGH}_MAX to the arch-specific header in preparation for using generic reserve_crashkernel_mem[_low](). Signed-off-by: Chen Zhou Co-developed-by: Zhen Lei Signed-off-by: Zhen Lei --- arch/x86/include/asm/kexec.h | 27 +++ arch/x86/kernel/setup.c | 27 --- 2 files changed, 27 insertions(+), 27 deletions(-) diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h index 11b7c06e2828c30..452c35ce3e3fc54 100644 --- a/arch/x86/include/asm/kexec.h +++ b/arch/x86/include/asm/kexec.h @@ -18,6 +18,33 @@ # define KEXEC_CONTROL_CODE_MAX_SIZE 2048 +/* alignment for crash kernel dynamic regions */ +#define CRASH_ALIGN SZ_16M + +/* alignment for crash kernel fixed region */ +#define CRASH_BASE_ALIGN SZ_1M + +/* + * Keep the crash kernel below this limit. + * + * Earlier 32-bits kernels would limit the kernel to the low 512 MB range + * due to mapping restrictions. + * + * 64-bit kdump kernels need to be restricted to be under 64 TB, which is + * the upper limit of system RAM in 4-level paging mode. Since the kdump + * jump could be from 5-level paging to 4-level paging, the jump will fail if + * the kernel is put above 64 TB, and during the 1st kernel bootup there's + * no good way to detect the paging mode of the target kernel which will be + * loaded for dumping.
+ */ +#ifdef CONFIG_X86_32 +# define CRASH_ADDR_LOW_MAX SZ_512M +# define CRASH_ADDR_HIGH_MAX SZ_512M +#else +# define CRASH_ADDR_LOW_MAX SZ_4G +# define CRASH_ADDR_HIGH_MAX SZ_64T +#endif + #ifndef __ASSEMBLY__ #include diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index cb7f237a2ae0dfa..22d63dbf5db0a58 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -392,33 +392,6 @@ static void __init memblock_x86_reserve_range_setup_data(void) #ifdef CONFIG_KEXEC_CORE -/* alignment for crash kernel dynamic regions */ -#define CRASH_ALIGN SZ_16M - -/* alignment for crash kernel fixed region */ -#define CRASH_BASE_ALIGN SZ_1M - -/* - * Keep the crash kernel below this limit. - * - * Earlier 32-bits kernels would limit the kernel to the low 512 MB range - * due to mapping restrictions. - * - * 64-bit kdump kernels need to be restricted to be under 64 TB, which is - * the upper limit of system RAM in 4-level paging mode. Since the kdump - * jump could be from 5-level paging to 4-level paging, the jump will fail if - * the kernel is put above 64 TB, and during the 1st kernel bootup there's - * no good way to detect the paging mode of the target kernel which will be - * loaded for dumping. - */ -#ifdef CONFIG_X86_32 -# define CRASH_ADDR_LOW_MAX SZ_512M -# define CRASH_ADDR_HIGH_MAX SZ_512M -#else -# define CRASH_ADDR_LOW_MAX SZ_4G -# define CRASH_ADDR_HIGH_MAX SZ_64T -#endif - static int __init reserve_crashkernel_low(unsigned long long low_size) { #ifdef CONFIG_X86_64 -- 2.25.1 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH v19 06/13] kexec: move crashk[_low]_res to crash_core module
From: Chen Zhou Move the definition and declaration of global variable crashk[_low]_res from kexec module to crash_core module, in preparation of adding generic reserve_crashkernel_mem[_low]() to crash_core.c, the latter refers to variable crashk[_low]_res. Due to the config KEXEC automatically selects CRASH_CORE, and the header crash_core.h is included by kexec.h, so there is no functional change. Signed-off-by: Chen Zhou Signed-off-by: Zhen Lei --- include/linux/crash_core.h | 4 include/linux/kexec.h | 4 kernel/crash_core.c| 16 kernel/kexec_core.c| 17 - 4 files changed, 20 insertions(+), 21 deletions(-) diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h index 598fd55d83c169e..f5437c9c9411fce 100644 --- a/include/linux/crash_core.h +++ b/include/linux/crash_core.h @@ -73,6 +73,10 @@ extern unsigned char *vmcoreinfo_data; extern size_t vmcoreinfo_size; extern u32 *vmcoreinfo_note; +/* Location of a reserved region to hold the crash kernel. */ +extern struct resource crashk_res; +extern struct resource crashk_low_res; + Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type, void *data, size_t data_len); void final_note(Elf_Word *buf); diff --git a/include/linux/kexec.h b/include/linux/kexec.h index 0c994ae37729e1e..47e784d66ea8645 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -350,10 +350,6 @@ extern int kexec_load_disabled; #define KEXEC_FILE_FLAGS (KEXEC_FILE_UNLOAD | KEXEC_FILE_ON_CRASH | \ KEXEC_FILE_NO_INITRAMFS) -/* Location of a reserved region to hold the crash kernel. - */ -extern struct resource crashk_res; -extern struct resource crashk_low_res; extern note_buf_t __percpu *crash_notes; /* flag to track if kexec reboot is in progress */ diff --git a/kernel/crash_core.c b/kernel/crash_core.c index b7d024eb464d0ae..686d8a65e12a337 100644 --- a/kernel/crash_core.c +++ b/kernel/crash_core.c @@ -22,6 +22,22 @@ u32 *vmcoreinfo_note; /* trusted vmcoreinfo, e.g. 
we can make a copy in the crash memory */ static unsigned char *vmcoreinfo_data_safecopy; +/* Location of the reserved area for the crash kernel */ +struct resource crashk_res = { + .name = "Crash kernel", + .start = 0, + .end = 0, + .flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM, + .desc = IORES_DESC_CRASH_KERNEL +}; +struct resource crashk_low_res = { + .name = "Crash kernel", + .start = 0, + .end = 0, + .flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM, + .desc = IORES_DESC_CRASH_KERNEL +}; + /* * parsing the "crashkernel" commandline * diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c index 5a5d192a89ac307..1e0d4909bbb6b77 100644 --- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -54,23 +54,6 @@ note_buf_t __percpu *crash_notes; /* Flag to indicate we are going to kexec a new kernel */ bool kexec_in_progress = false; - -/* Location of the reserved area for the crash kernel */ -struct resource crashk_res = { - .name = "Crash kernel", - .start = 0, - .end = 0, - .flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM, - .desc = IORES_DESC_CRASH_KERNEL -}; -struct resource crashk_low_res = { - .name = "Crash kernel", - .start = 0, - .end = 0, - .flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM, - .desc = IORES_DESC_CRASH_KERNEL -}; - int kexec_should_crash(struct task_struct *p) { /* -- 2.25.1 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH v19 00/13] support reserving crashkernel above 4G on arm64 kdump
the architecture-specific code, extend the property "linux,usable-memory-range"
in the platform-agnostic FDT core code. See patch 9.
- Discard the x86 description update in the document, because the description
  has been updated by commit b1f4c363666c ("Documentation: kdump: update kdump guide").
- Change "arm64" to "ARM64" in Doc.

Changes since [v13]
- Rebased on top of 5.11-rc5.
- Introduce config CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL.
  Since the reserve_crashkernel[_low]() implementations are quite similar on
  other architectures, have CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL in
  arch/Kconfig and select it from X86 and ARM64.
- Some minor cleanup.

Changes since [v12]
- Rebased on top of 5.10-rc1.
- Keep CRASH_ALIGN as 16M, as suggested by Dave.
- Drop patch "kdump: add threshold for the required memory".
- Add Tested-by from John.

Changes since [v11]
- Rebased on top of 5.9-rc4.
- Make the function reserve_crashkernel() of x86 generic.
  As suggested by Catalin, make the function reserve_crashkernel() of x86
  generic and have arm64 use the generic version to reimplement crashkernel=X.

Changes since [v10]
- Reimplement crashkernel=X as suggested by Catalin. Many thanks to Catalin.

Changes since [v9]
- Patch 1: add Acked-by from Dave.
- Update patch 5 according to Dave's comments.
- Update chosen schema.

Changes since [v8]
- Reuse DT property "linux,usable-memory-range".
  As suggested by Rob, reuse DT property "linux,usable-memory-range" to pass
  the low memory region.
- Fix kdump broken with ZONE_DMA reintroduced.
- Update chosen schema.

Changes since [v7]
- Move x86 CRASH_ALIGN to 2M.
  As suggested by Dave, and after some testing, move x86 CRASH_ALIGN to 2M.
- Update Documentation/devicetree/bindings/chosen.txt.
  Add corresponding documentation to
  Documentation/devicetree/bindings/chosen.txt, as suggested by Arnd.
- Add Tested-by from Jhon and pk.

Changes since [v6]
- Fix build errors reported by kbuild test robot.

Changes since [v5]
- Move reserve_crashkernel_low() into kernel/crash_core.c.
- Delete crashkernel=X,high.
- Modify crashkernel=X,low.
  If crashkernel=X,low is specified simultaneously, first reserve low memory
  of the specified size for the crash dump kernel's devices and then reserve
  memory above 4G. In addition, rename crashk_low_res as "Crash kernel (low)"
  for arm64, and then pass it to the crash dump kernel by DT property
  "linux,low-memory-range".
- Update Documentation/admin-guide/kdump/kdump.rst.

Changes since [v4]
- Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.

Changes since [v3]
- Add memblock_cap_memory_ranges back for multiple ranges.
- Fix some compiling warnings.

Changes since [v2]
- Split patch "arm64: kdump: support reserving crashkernel above 4G" in two.
  Put "move reserve_crashkernel_low() into kexec_core.c" in a separate patch.

Changes since [v1]:
- Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
- Remove memblock_cap_memory_ranges() I added in v1 and implement that
  in fdt_enforce_memory_region().
  There are at most two crash kernel regions. For the two-region case, we cap
  the memory range [min(regs[*].start), max(regs[*].end)] and then remove the
  memory range in the middle.
[1]: http://lists.infradead.org/pipermail/kexec/2020-June/020737.html [2]: https://github.com/robherring/dt-schema/pull/19 [v1]: https://lkml.org/lkml/2019/4/2/1174 [v2]: https://lkml.org/lkml/2019/4/9/86 [v3]: https://lkml.org/lkml/2019/4/9/306 [v4]: https://lkml.org/lkml/2019/4/15/273 [v5]: https://lkml.org/lkml/2019/5/6/1360 [v6]: https://lkml.org/lkml/2019/8/30/142 [v7]: https://lkml.org/lkml/2019/12/23/411 [v8]: https://lkml.org/lkml/2020/5/21/213 [v9]: https://lkml.org/lkml/2020/6/28/73 [v10]: https://lkml.org/lkml/2020/7/2/1443 [v11]: https://lkml.org/lkml/2020/8/1/150 [v12]: https://lkml.org/lkml/2020/9/7/1037 [v13]: https://lkml.org/lkml/2020/10/31/34 [v14]: https://lkml.org/lkml/2021/1/30/53 [v15]: https://lkml.org/lkml/2021/10/19/1405 [v16]: https://lkml.org/lkml/2021/11/23/435 [v17]: https://lkml.org/lkml/2021/12/10/38 [v18]: https://lkml.org/lkml/2021/12/22/424 Chen Zhou (6): kexec: move crashk[_low]_res to crash_core module x86/setup: Move CRASH[_BASE]_ALIGN and CRASH_ADDR_{LOW|HIGH}_MAX to asm/kexec.h arm64: kdump: introduce some macros for crash kernel reservation arm64: kdump: reimplement crashkernel=X of: fdt: Add memory for devices by DT property "linux,usable-memory-range" kdump: update Documentation about crashkernel Zhen Lei (7): kdump: add helper parse_crashkernel_high_low() x86/setup: Use parse_crashkernel_high_low() to simplify code kdump: make parse_crashkernel_{high|low}() static kdump: reduce unnecessary parameters of parse_crashkernel_{high|low}() x86/setup: Add and use CRASH_BASE_ALIGN kdump: Add helper reserve_crashkernel_mem[_low]() x86/setup: Use generic reserve_crashkernel_mem[_low]() Documentation/admin-guide/
[PATCH v19 02/13] x86/setup: Use parse_crashkernel_high_low() to simplify code
Use parse_crashkernel_high_low() to bring the parsing of "crashkernel=X,high" and the parsing of "crashkernel=Y,low" together, they are strongly dependent, make code logic clear and more readable. Suggested-by: Borislav Petkov Signed-off-by: Zhen Lei --- arch/x86/kernel/setup.c | 21 + 1 file changed, 9 insertions(+), 12 deletions(-) diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index 6a190c7f4d71b05..93d78aae1937db3 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -416,18 +416,16 @@ static void __init memblock_x86_reserve_range_setup_data(void) # define CRASH_ADDR_HIGH_MAX SZ_64T #endif -static int __init reserve_crashkernel_low(void) +static int __init reserve_crashkernel_low(unsigned long long low_size) { #ifdef CONFIG_X86_64 - unsigned long long base, low_base = 0, low_size = 0; + unsigned long long low_base = 0; unsigned long low_mem_limit; - int ret; low_mem_limit = min(memblock_phys_mem_size(), CRASH_ADDR_LOW_MAX); - /* crashkernel=Y,low */ - ret = parse_crashkernel_low(boot_command_line, low_mem_limit, &low_size, &base); - if (ret) { + /* crashkernel=Y,low is not specified */ + if ((long)low_size < 0) { /* * two parts from kernel/dma/swiotlb.c: * -swiotlb size: user-specified with swiotlb= or default. 
@@ -465,7 +463,7 @@ static int __init reserve_crashkernel_low(void) static void __init reserve_crashkernel(void) { - unsigned long long crash_size, crash_base, total_mem; + unsigned long long crash_size, crash_base, total_mem, low_size; bool high = false; int ret; @@ -474,10 +472,9 @@ static void __init reserve_crashkernel(void) /* crashkernel=XM */ ret = parse_crashkernel(boot_command_line, total_mem, &crash_size, &crash_base); if (ret != 0 || crash_size <= 0) { - /* crashkernel=X,high */ - ret = parse_crashkernel_high(boot_command_line, total_mem, -&crash_size, &crash_base); - if (ret != 0 || crash_size <= 0) + /* crashkernel=X,high and possible crashkernel=Y,low */ + ret = parse_crashkernel_high_low(boot_command_line, &crash_size, &low_size); + if (ret) return; high = true; } @@ -520,7 +517,7 @@ static void __init reserve_crashkernel(void) } } - if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) { + if (crash_base >= (1ULL << 32) && reserve_crashkernel_low(low_size)) { memblock_phys_free(crash_base, crash_size); return; } -- 2.25.1 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
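After this patch, reserve_crashkernel_low() receives low_size from the caller and treats a negative value (the "-1" stored by parse_crashkernel_high_low()) as "crashkernel=Y,low was not specified". The following userspace sketch shows that three-way decision under stated assumptions: `effective_low_size()`, `SKETCH_MB()` and `LOW_SIZE_UNSPECIFIED` are hypothetical names, the swiotlb size is passed in as a plain parameter instead of calling swiotlb_size_or_default(), and the default follows the CRASH_LOW_SIZE_MIN definition from patch 07/13 of this series.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MB(x) ((uint64_t)(x) << 20)

/* "Not specified" sentinel: all bits set, i.e. (unsigned long long)-1. */
#define LOW_SIZE_UNSPECIFIED UINT64_MAX

/*
 * Return the number of bytes of low memory to reserve.
 *  - sentinel (top bit set): use max(swiotlb size + 8M, 256M)
 *  - 0: low reservation explicitly disabled (crashkernel=0,low)
 *  - anything else: the size the user asked for
 */
static uint64_t effective_low_size(uint64_t low_size, uint64_t swiotlb_bytes)
{
	uint64_t dflt;

	/* Same meaning as the kernel's "(long)low_size < 0" test. */
	if (low_size <= (uint64_t)INT64_MAX)
		return low_size;

	assert(swiotlb_bytes <= UINT64_MAX - SKETCH_MB(8));
	dflt = swiotlb_bytes + SKETCH_MB(8);
	return dflt > SKETCH_MB(256) ? dflt : SKETCH_MB(256);
}
```

A return value of 0 corresponds to the early "return 0" path in the kernel function, where no low region is reserved at all.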
[PATCH v19 01/13] kdump: add helper parse_crashkernel_high_low()
The bootup command line option crashkernel=Y,low is valid only when crashkernel=X,high is specified. Putting their parsing into a separate function makes the code logic clearer and the strong dependency between them easier to understand. Signed-off-by: Zhen Lei --- include/linux/crash_core.h | 3 +++ kernel/crash_core.c | 35 +++ 2 files changed, 38 insertions(+) diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h index de62a722431e7db..2d3a64761d18998 100644 --- a/include/linux/crash_core.h +++ b/include/linux/crash_core.h @@ -83,5 +83,8 @@ int parse_crashkernel_high(char *cmdline, unsigned long long system_ram, unsigned long long *crash_size, unsigned long long *crash_base); int parse_crashkernel_low(char *cmdline, unsigned long long system_ram, unsigned long long *crash_size, unsigned long long *crash_base); +int __init parse_crashkernel_high_low(char *cmdline, + unsigned long long *high_size, + unsigned long long *low_size); #endif /* LINUX_CRASH_CORE_H */ diff --git a/kernel/crash_core.c b/kernel/crash_core.c index eb53f5ec62c900f..8966beaf7c4fd52 100644 --- a/kernel/crash_core.c +++ b/kernel/crash_core.c @@ -295,6 +295,41 @@ int __init parse_crashkernel_low(char *cmdline, "crashkernel=", suffix_tbl[SUFFIX_LOW]); } +/** + * parse_crashkernel_high_low - Parsing "crashkernel=X,high" and possible + * "crashkernel=Y,low". + * @cmdline: The bootup command line. + * @high_size: Save the memory size specified by "crashkernel=X,high". + * @low_size: Save the memory size specified by "crashkernel=Y,low" or "-1" + * if it's not specified. + * + * Returns 0 on success, else a negative status code.
+ */ +int __init parse_crashkernel_high_low(char *cmdline, + unsigned long long *high_size, + unsigned long long *low_size) +{ + int ret; + unsigned long long base; + + BUG_ON(!high_size || !low_size); + + /* crashkernel=X,high */ + ret = parse_crashkernel_high(cmdline, 0, high_size, &base); + if (ret) + return ret; + + if (*high_size <= 0) + return -EINVAL; + + /* crashkernel=Y,low */ + ret = parse_crashkernel_low(cmdline, 0, low_size, &base); + if (ret) + *low_size = -1; + + return 0; +} + Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type, void *data, size_t data_len) { -- 2.25.1 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
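To make the dependency between the two options concrete, here is a userspace sketch of the same control flow: ",high" is mandatory and must be non-zero, ",low" is optional and falls back to the "-1" sentinel. This is only an approximation. `find_crashkernel_suffix()` and `parse_high_low()` are hypothetical helpers, not the kernel's __parse_crashkernel(); the sketch accepts only a plain size with an optional K/M/G suffix and takes the first matching occurrence, whereas the kernel parser handles more syntax and, as far as I know, prefers the last occurrence on the command line.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: look for "crashkernel=<size>[KMG]<suffix>" in
 * 'cmdline' and store the size in bytes. Returns 0 if found, -EINVAL
 * otherwise.
 */
static int find_crashkernel_suffix(const char *cmdline, const char *suffix,
				   unsigned long long *size)
{
	static const char key[] = "crashkernel=";
	const char *p = cmdline;

	while ((p = strstr(p, key)) != NULL) {
		unsigned long long val;
		char *end;
		char after;

		p += sizeof(key) - 1;
		val = strtoull(p, &end, 0);
		if (end == p)
			continue;	/* no number after '=' */

		switch (*end) {
		case 'G': case 'g':
			val <<= 10;
			/* fall through */
		case 'M': case 'm':
			val <<= 10;
			/* fall through */
		case 'K': case 'k':
			val <<= 10;
			end++;
			break;
		default:
			break;
		}

		if (strncmp(end, suffix, strlen(suffix)) != 0)
			continue;
		after = end[strlen(suffix)];
		if (after == '\0' || after == ' ') {
			*size = val;
			return 0;
		}
	}
	return -EINVAL;
}

/* Mirrors the flow of parse_crashkernel_high_low() from this patch. */
static int parse_high_low(const char *cmdline,
			  unsigned long long *high_size,
			  unsigned long long *low_size)
{
	int ret;

	assert(high_size && low_size);

	/* crashkernel=X,high is required */
	ret = find_crashkernel_suffix(cmdline, ",high", high_size);
	if (ret)
		return ret;
	if (*high_size == 0)
		return -EINVAL;

	/* crashkernel=Y,low is optional; -1 means "not specified" */
	if (find_crashkernel_suffix(cmdline, ",low", low_size))
		*low_size = (unsigned long long)-1;

	return 0;
}
```

With this shape, a caller only has to check one return value and can hand low_size on unchanged, which is how the x86 conversion in patch 02/13 uses the real helper.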
[PATCH v19 07/13] kdump: Add helper reserve_crashkernel_mem[_low]()
Add helper reserve_crashkernel_mem[_low]() to reserve high and low memory for crash kernel. The implementation of these two functions is based on function reserve_crashkernel[_low]() in arch/x86/kernel/setup.c. But the following adaptations are made: 1. To avoid compilation problems on other architectures, provide default values for macro CRASH[_BASE]_ALIGN, CRASH_ADDR_{LOW|HIGH}_MAX, and add new macro CRASH_LOW_SIZE_MIN. 2. Only functions that reserve crash memory are extracted from reserve_crashkernel(), excluding functions such as parse_crashkernel() and insert_resource(). 3. Change "return;" in reserve_crashkernel() to "return -ENOMEM;". 4. Change call reserve_crashkernel_low() to call reserve_crashkernel_mem_low(). 5. Change CONFIG_X86_64 to CONFIG_64BIT. Signed-off-by: Zhen Lei --- include/linux/crash_core.h | 6 ++ kernel/crash_core.c| 154 - 2 files changed, 159 insertions(+), 1 deletion(-) diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h index f5437c9c9411fce..2e19632f8d45a60 100644 --- a/include/linux/crash_core.h +++ b/include/linux/crash_core.h @@ -86,5 +86,11 @@ int __init parse_crashkernel(char *cmdline, unsigned long long system_ram, int __init parse_crashkernel_high_low(char *cmdline, unsigned long long *high_size, unsigned long long *low_size); +int __init reserve_crashkernel_mem_low(unsigned long long low_size); +int __init reserve_crashkernel_mem(unsigned long long system_ram, + unsigned long long crash_size, + unsigned long long crash_base, + unsigned long long low_size, + bool high); #endif /* LINUX_CRASH_CORE_H */ diff --git a/kernel/crash_core.c b/kernel/crash_core.c index 686d8a65e12a337..4bd30098534a184 100644 --- a/kernel/crash_core.c +++ b/kernel/crash_core.c @@ -5,7 +5,9 @@ */ #include -#include +#include +#include +#include #include #include @@ -345,6 +347,156 @@ int __init parse_crashkernel_high_low(char *cmdline, return 0; } +/* alignment for crash kernel dynamic regions */ +#ifndef CRASH_ALIGN +#define 
CRASH_ALIGN SZ_2M +#endif + +/* alignment for crash kernel fixed region */ +#ifndef CRASH_BASE_ALIGN +#define CRASH_BASE_ALIGN SZ_2M +#endif + +/* upper bound for crash low memory */ +#ifndef CRASH_ADDR_LOW_MAX +#ifdef CONFIG_PHYS_ADDR_T_64BIT +#define CRASH_ADDR_LOW_MAX SZ_4G +#else +#define CRASH_ADDR_LOW_MAX MEMBLOCK_ALLOC_ACCESSIBLE +#endif +#endif + +/* upper bound for crash high memory */ +#ifndef CRASH_ADDR_HIGH_MAX +#define CRASH_ADDR_HIGH_MAX MEMBLOCK_ALLOC_ACCESSIBLE +#endif + +#ifdef CONFIG_SWIOTLB +/* + * two parts from kernel/dma/swiotlb.c: + * -swiotlb size: user-specified with swiotlb= or default. + * + * -swiotlb overflow buffer: now hardcoded to 32k. We round it + * to 8M for other buffers that may need to stay low too. Also + * make sure we allocate enough extra low memory so that we + * don't run out of DMA buffers for 32-bit devices. + */ +#define CRASH_LOW_SIZE_MIN max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20) +#else +#define CRASH_LOW_SIZE_MIN (256UL << 20) +#endif + +/** + * reserve_crashkernel_mem_low - Reserve crash kernel low memory. + * + * @low_size: The memory size specified by "crashkernel=Y,low" or "-1" + * if it's not specified. + * + * Returns 0 on success, else a negative status code. + */ +int __init reserve_crashkernel_mem_low(unsigned long long low_size) +{ +#ifdef CONFIG_64BIT + unsigned long long low_base = 0; + unsigned long low_mem_limit; + + low_mem_limit = min(memblock_phys_mem_size(), CRASH_ADDR_LOW_MAX); + + /* crashkernel=Y,low is not specified */ + if ((long)low_size < 0) { + low_size = CRASH_LOW_SIZE_MIN; + } else { + /* passed with crashkernel=0,low ?
*/ + if (!low_size) + return 0; + } + + low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, CRASH_ADDR_LOW_MAX); + if (!low_base) { + pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n", + (unsigned long)(low_size >> 20)); + return -ENOMEM; + } + + pr_info("Reserving %ldMB of low memory at %ldMB for crashkernel (low RAM limit: %ldMB)\n", + (unsigned long)(low_size >> 20), + (unsigned long)(low_base >> 20), + (unsigned long)(low_mem_limit >> 20)); + + crashk_low_res.start = low_base; + crashk_low_res.end = low_base + low_size - 1; +#endif + return 0; +} + +/** + * reserve_crashkernel_mem - Reserve crash kernel
[PATCH v19 04/13] kdump: reduce unnecessary parameters of parse_crashkernel_{high|low}()
Delete confusing parameters 'system_ram' and 'crash_base' of parse_crashkernel_{high|low}(), they are only needed by the case of "crashkernel=X@[offset]". Signed-off-by: Zhen Lei --- kernel/crash_core.c | 21 ++--- 1 file changed, 10 insertions(+), 11 deletions(-) diff --git a/kernel/crash_core.c b/kernel/crash_core.c index 3b9e01fc450b2a4..b7d024eb464d0ae 100644 --- a/kernel/crash_core.c +++ b/kernel/crash_core.c @@ -278,20 +278,20 @@ int __init parse_crashkernel(char *cmdline, } static int __init parse_crashkernel_high(char *cmdline, -unsigned long long system_ram, -unsigned long long *crash_size, -unsigned long long *crash_base) +unsigned long long *crash_size) { - return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base, + unsigned long long base; + + return __parse_crashkernel(cmdline, 0, crash_size, &base, "crashkernel=", suffix_tbl[SUFFIX_HIGH]); } static int __init parse_crashkernel_low(char *cmdline, -unsigned long long system_ram, -unsigned long long *crash_size, -unsigned long long *crash_base) + unsigned long long *crash_size) { - return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base, + unsigned long long base; + + return __parse_crashkernel(cmdline, 0, crash_size, &base, "crashkernel=", suffix_tbl[SUFFIX_LOW]); } @@ -310,12 +310,11 @@ int __init parse_crashkernel_high_low(char *cmdline, unsigned long long *low_size) { int ret; - unsigned long long base; BUG_ON(!high_size || !low_size); /* crashkernel=X,high */ - ret = parse_crashkernel_high(cmdline, 0, high_size, &base); + ret = parse_crashkernel_high(cmdline, high_size); if (ret) return ret; @@ -323,7 +322,7 @@ int __init parse_crashkernel_high_low(char *cmdline, return -EINVAL; /* crashkernel=Y,low */ - ret = parse_crashkernel_low(cmdline, 0, low_size, &base); + ret = parse_crashkernel_low(cmdline, low_size); if (ret) *low_size = -1; -- 2.25.1 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH v19 03/13] kdump: make parse_crashkernel_{high|low}() static
Make parse_crashkernel_{high|low}() static, they are only referenced by parse_crashkernel_high_low() in the same file. The latter is recommended. Signed-off-by: Zhen Lei --- include/linux/crash_core.h | 4 kernel/crash_core.c| 4 ++-- 2 files changed, 2 insertions(+), 6 deletions(-) diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h index 2d3a64761d18998..598fd55d83c169e 100644 --- a/include/linux/crash_core.h +++ b/include/linux/crash_core.h @@ -79,10 +79,6 @@ void final_note(Elf_Word *buf); int __init parse_crashkernel(char *cmdline, unsigned long long system_ram, unsigned long long *crash_size, unsigned long long *crash_base); -int parse_crashkernel_high(char *cmdline, unsigned long long system_ram, - unsigned long long *crash_size, unsigned long long *crash_base); -int parse_crashkernel_low(char *cmdline, unsigned long long system_ram, - unsigned long long *crash_size, unsigned long long *crash_base); int __init parse_crashkernel_high_low(char *cmdline, unsigned long long *high_size, unsigned long long *low_size); diff --git a/kernel/crash_core.c b/kernel/crash_core.c index 8966beaf7c4fd52..3b9e01fc450b2a4 100644 --- a/kernel/crash_core.c +++ b/kernel/crash_core.c @@ -277,7 +277,7 @@ int __init parse_crashkernel(char *cmdline, "crashkernel=", NULL); } -int __init parse_crashkernel_high(char *cmdline, +static int __init parse_crashkernel_high(char *cmdline, unsigned long long system_ram, unsigned long long *crash_size, unsigned long long *crash_base) @@ -286,7 +286,7 @@ int __init parse_crashkernel_high(char *cmdline, "crashkernel=", suffix_tbl[SUFFIX_HIGH]); } -int __init parse_crashkernel_low(char *cmdline, +static int __init parse_crashkernel_low(char *cmdline, unsigned long long system_ram, unsigned long long *crash_size, unsigned long long *crash_base) -- 2.25.1 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH 2/4] x86/setup: Use parse_crashkernel_high_low() to simplify code
Use parse_crashkernel_high_low() to bring the parsing of "crashkernel=X,high" and "crashkernel=Y,low" together. The two options are strongly dependent, so parsing them in one place makes the code logic clearer and more readable.

Suggested-by: Borislav Petkov
Signed-off-by: Zhen Lei
---
 arch/x86/kernel/setup.c | 21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 6a190c7f4d71b05..93d78aae1937db3 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -416,18 +416,16 @@ static void __init memblock_x86_reserve_range_setup_data(void)
 # define CRASH_ADDR_HIGH_MAX	SZ_64T
 #endif
 
-static int __init reserve_crashkernel_low(void)
+static int __init reserve_crashkernel_low(unsigned long long low_size)
 {
 #ifdef CONFIG_X86_64
-	unsigned long long base, low_base = 0, low_size = 0;
+	unsigned long long low_base = 0;
 	unsigned long low_mem_limit;
-	int ret;
 
 	low_mem_limit = min(memblock_phys_mem_size(), CRASH_ADDR_LOW_MAX);
 
-	/* crashkernel=Y,low */
-	ret = parse_crashkernel_low(boot_command_line, low_mem_limit, &low_size, &base);
-	if (ret) {
+	/* crashkernel=Y,low is not specified */
+	if ((long)low_size < 0) {
 		/*
 		 * two parts from kernel/dma/swiotlb.c:
 		 * -swiotlb size: user-specified with swiotlb= or default.
@@ -465,7 +463,7 @@ static int __init reserve_crashkernel_low(void)
 
 static void __init reserve_crashkernel(void)
 {
-	unsigned long long crash_size, crash_base, total_mem;
+	unsigned long long crash_size, crash_base, total_mem, low_size;
 	bool high = false;
 	int ret;
 
@@ -474,10 +472,9 @@ static void __init reserve_crashkernel(void)
 	/* crashkernel=XM */
 	ret = parse_crashkernel(boot_command_line, total_mem, &crash_size, &crash_base);
 	if (ret != 0 || crash_size <= 0) {
-		/* crashkernel=X,high */
-		ret = parse_crashkernel_high(boot_command_line, total_mem,
-					     &crash_size, &crash_base);
-		if (ret != 0 || crash_size <= 0)
+		/* crashkernel=X,high and possible crashkernel=Y,low */
+		ret = parse_crashkernel_high_low(boot_command_line, &crash_size, &low_size);
+		if (ret)
 			return;
 		high = true;
 	}
@@ -520,7 +517,7 @@ static void __init reserve_crashkernel(void)
 		}
 	}
 
-	if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) {
+	if (crash_base >= (1ULL << 32) && reserve_crashkernel_low(low_size)) {
 		memblock_phys_free(crash_base, crash_size);
 		return;
 	}
-- 
2.25.1
[PATCH 0/4] kdump: add parse_crashkernel_high_low() to replace parse_crashkernel_{high|low}()
The boot command-line option crashkernel=Y,low is valid only when crashkernel=X,high is also specified. Parsing both in a single function makes the code logic clearer and makes the strong dependency between them easier to understand. So add parse_crashkernel_high_low() and use it to replace parse_crashkernel_{high|low}(). Then make the latter static and drop their two confusing parameters, 'system_ram' and 'crash_base'.

All four patches in this series are cleanups with no functional change.

This patchset also prepares for supporting crashkernel reservation above 4G on arm64, and for sharing that code with x86.

The main proposal was made by Borislav Petkov:

> As I've already alluded to in another mail, ontop of this there should
> be a patch or multiple patches which clean this up more and perhaps even
> split it into separate functions doing stuff in this order:
>
> 1. Parse all crashkernel= cmdline options
> 2. Do all crash_base, crash_size etc checks
> 3. Do the memory reservations
>
> And all that supplied with comments explaining why stuff is being done.

Zhen Lei (4):
  kdump: add helper parse_crashkernel_high_low()
  x86/setup: Use parse_crashkernel_high_low() to simplify code
  kdump: make parse_crashkernel_{high|low}() static
  kdump: reduce unnecessary parameters of parse_crashkernel_{high|low}()

 arch/x86/kernel/setup.c    | 21 +++
 include/linux/crash_core.h |  7 +++--
 kernel/crash_core.c        | 54 +++---
 3 files changed, 56 insertions(+), 26 deletions(-)

-- 
2.25.1
[PATCH 1/4] kdump: add helper parse_crashkernel_high_low()
The boot command-line option crashkernel=Y,low is valid only when crashkernel=X,high is also specified. Parsing both in a single helper makes the code logic clearer and makes the strong dependency between them easier to understand.

Signed-off-by: Zhen Lei
---
 include/linux/crash_core.h |  3 +++
 kernel/crash_core.c        | 35 +++++++++++++++++++++++++++++++++++
 2 files changed, 38 insertions(+)

diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
index de62a722431e7db..2d3a64761d18998 100644
--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -83,5 +83,8 @@ int parse_crashkernel_high(char *cmdline, unsigned long long system_ram,
 		unsigned long long *crash_size, unsigned long long *crash_base);
 int parse_crashkernel_low(char *cmdline, unsigned long long system_ram,
 		unsigned long long *crash_size, unsigned long long *crash_base);
+int __init parse_crashkernel_high_low(char *cmdline,
+				      unsigned long long *high_size,
+				      unsigned long long *low_size);
 
 #endif /* LINUX_CRASH_CORE_H */
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index eb53f5ec62c900f..8ab59a0e04f178f 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -295,6 +295,41 @@ int __init parse_crashkernel_low(char *cmdline,
 				"crashkernel=", suffix_tbl[SUFFIX_LOW]);
 }
 
+/**
+ * parse_crashkernel_high_low - Parsing "crashkernel=X,high" and possible
+ *				"crashkernel=Y,low".
+ * @cmdline:	The bootup command line.
+ * @high_size:	Save the memory size specified by "crashkernel=X,high".
+ * @low_size:	Save the memory size specified by "crashkernel=Y,low" or "-1"
+		if it's not specified.
+ *
+ * Returns 0 on success, else a negative status code.
+ */
+int __init parse_crashkernel_high_low(char *cmdline,
+				      unsigned long long *high_size,
+				      unsigned long long *low_size)
+{
+	int ret;
+	unsigned long long base;
+
+	BUG_ON(!high_size || !low_size);
+
+	/* crashkernel=X,high */
+	ret = parse_crashkernel_high(cmdline, 0, high_size, &base);
+	if (ret)
+		return ret;
+
+	if (*high_size <= 0)
+		return -EINVAL;
+
+	/* crashkernel=Y,low */
+	ret = parse_crashkernel_low(cmdline, 0, low_size, &base);
+	if (ret)
+		*low_size = -1;
+
+	return 0;
+}
+
 Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
 			  void *data, size_t data_len)
 {
-- 
2.25.1
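The contract of the helper in this patch (",high" is mandatory, ",low" is optional and reported as -1 when absent) can be modelled in user space. What follows is only a rough sketch under stated assumptions: parse_size(), parse_suffixed(), model_parse_high_low() and low_is_unspecified() are hypothetical names invented for illustration, not kernel functions, and the matching rules are far looser than those of __parse_crashkernel().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for the kernel's memparse(). */
static unsigned long long parse_size(const char *s, char **end)
{
	unsigned long long v = strtoull(s, end, 0);

	switch (**end) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*end)++;
		break;
	default:
		break;
	}
	return v;
}

/* Look for "crashkernel=<size><suffix>"; store the size and return 0 if found. */
static int parse_suffixed(const char *cmdline, const char *suffix,
			  unsigned long long *size)
{
	const char *p = cmdline;
	size_t slen = strlen(suffix);

	while ((p = strstr(p, "crashkernel=")) != NULL) {
		char *end;
		unsigned long long v;

		p += strlen("crashkernel=");
		v = parse_size(p, &end);
		if (end != p && strncmp(end, suffix, slen) == 0 &&
		    (end[slen] == '\0' || end[slen] == ' ')) {
			*size = v;
			return 0;
		}
	}
	return -EINVAL;
}

/* Model of the helper's contract: ",high" is required, ",low" is optional. */
int model_parse_high_low(const char *cmdline,
			 unsigned long long *high_size,
			 unsigned long long *low_size)
{
	int ret;

	assert(high_size && low_size);	/* stands in for BUG_ON() */

	ret = parse_suffixed(cmdline, ",high", high_size);
	if (ret)
		return ret;
	if (*high_size == 0)
		return -EINVAL;

	/* ",low" not given: report -1, which wraps to the maximum value. */
	if (parse_suffixed(cmdline, ",low", low_size))
		*low_size = (unsigned long long)-1;

	return 0;
}

/* Sentinel test in the spirit of the x86 caller's "(long)low_size < 0". */
int low_is_unspecified(unsigned long long low_size)
{
	return (long long)low_size < 0;
}
```

With this model, "crashkernel=1G,high" alone succeeds and leaves the low size at the sentinel, "crashkernel=1G,high crashkernel=0,low" succeeds with an explicit low size of 0, and "crashkernel=128M,low" on its own is rejected with -EINVAL.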
[PATCH 4/4] kdump: reduce unnecessary parameters of parse_crashkernel_{high|low}()
Delete the confusing parameters 'system_ram' and 'crash_base' of parse_crashkernel_{high|low}(); they are only needed for the "crashkernel=X@[offset]" case.

Signed-off-by: Zhen Lei
---
 kernel/crash_core.c | 21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 97001820396295e..0ebf5efce3119c5 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -278,20 +278,20 @@ int __init parse_crashkernel(char *cmdline,
 }
 
 static int __init parse_crashkernel_high(char *cmdline,
-			     unsigned long long system_ram,
-			     unsigned long long *crash_size,
-			     unsigned long long *crash_base)
+					 unsigned long long *crash_size)
 {
-	return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base,
+	unsigned long long base;
+
+	return __parse_crashkernel(cmdline, 0, crash_size, &base,
 				"crashkernel=", suffix_tbl[SUFFIX_HIGH]);
 }
 
 static int __init parse_crashkernel_low(char *cmdline,
-			     unsigned long long system_ram,
-			     unsigned long long *crash_size,
-			     unsigned long long *crash_base)
+					unsigned long long *crash_size)
 {
-	return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base,
+	unsigned long long base;
+
+	return __parse_crashkernel(cmdline, 0, crash_size, &base,
 				"crashkernel=", suffix_tbl[SUFFIX_LOW]);
 }
 
@@ -310,12 +310,11 @@ int __init parse_crashkernel_high_low(char *cmdline,
 				      unsigned long long *low_size)
 {
 	int ret;
-	unsigned long long base;
 
 	BUG_ON(!high_size || !low_size);
 
 	/* crashkernel=X,high */
-	ret = parse_crashkernel_high(cmdline, 0, high_size, &base);
+	ret = parse_crashkernel_high(cmdline, high_size);
 	if (ret)
 		return ret;
 
@@ -323,7 +322,7 @@ int __init parse_crashkernel_high_low(char *cmdline,
 		return -EINVAL;
 
 	/* crashkernel=Y,low */
-	ret = parse_crashkernel_low(cmdline, 0, low_size, &base);
+	ret = parse_crashkernel_low(cmdline, low_size);
 	if (ret)
 		*low_size = -1;
 
-- 
2.25.1
[PATCH 3/4] kdump: make parse_crashkernel_{high|low}() static
Make parse_crashkernel_{high|low}() static, because they are only referenced by parse_crashkernel_high_low() in the same file. The latter is the recommended interface.

Signed-off-by: Zhen Lei
---
 include/linux/crash_core.h | 4 ----
 kernel/crash_core.c        | 4 ++--
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
index 2d3a64761d18998..598fd55d83c169e 100644
--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -79,10 +79,6 @@ void final_note(Elf_Word *buf);
 
 int __init parse_crashkernel(char *cmdline, unsigned long long system_ram,
 		unsigned long long *crash_size, unsigned long long *crash_base);
-int parse_crashkernel_high(char *cmdline, unsigned long long system_ram,
-		unsigned long long *crash_size, unsigned long long *crash_base);
-int parse_crashkernel_low(char *cmdline, unsigned long long system_ram,
-		unsigned long long *crash_size, unsigned long long *crash_base);
 int __init parse_crashkernel_high_low(char *cmdline,
 				      unsigned long long *high_size,
 				      unsigned long long *low_size);
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 8ab59a0e04f178f..97001820396295e 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -277,7 +277,7 @@ int __init parse_crashkernel(char *cmdline,
 					"crashkernel=", NULL);
 }
 
-int __init parse_crashkernel_high(char *cmdline,
+static int __init parse_crashkernel_high(char *cmdline,
 			     unsigned long long system_ram,
 			     unsigned long long *crash_size,
 			     unsigned long long *crash_base)
@@ -286,7 +286,7 @@ int __init parse_crashkernel_high(char *cmdline,
 				"crashkernel=", suffix_tbl[SUFFIX_HIGH]);
 }
 
-int __init parse_crashkernel_low(char *cmdline,
+static int __init parse_crashkernel_low(char *cmdline,
 			     unsigned long long system_ram,
 			     unsigned long long *crash_size,
 			     unsigned long long *crash_base)
-- 
2.25.1
[PATCH v18 17/17] kdump: update Documentation about crashkernel
From: Chen Zhou For arm64, the behavior of crashkernel=X has changed: it now tries a low allocation in the DMA zone and falls back to a high allocation if that fails. We can also use "crashkernel=X,high" to select a high region above the DMA zone, which also tries to allocate at least 256M of low memory in the DMA zone automatically, and "crashkernel=Y,low" can be used to allocate a specified size of low memory. So update the Documentation. Signed-off-by: Chen Zhou Signed-off-by: Zhen Lei --- Documentation/admin-guide/kdump/kdump.rst | 11 +-- Documentation/admin-guide/kernel-parameters.txt | 13 ++--- 2 files changed, 19 insertions(+), 5 deletions(-) diff --git a/Documentation/admin-guide/kdump/kdump.rst b/Documentation/admin-guide/kdump/kdump.rst index cb30ca3df27c9b2..d4c287044be0c70 100644 --- a/Documentation/admin-guide/kdump/kdump.rst +++ b/Documentation/admin-guide/kdump/kdump.rst @@ -361,8 +361,15 @@ Boot into System Kernel kernel will automatically locate the crash kernel image within the first 512MB of RAM if X is not given. - On arm64, use "crashkernel=Y[@X]". Note that the start address of - the kernel, X if explicitly specified, must be aligned to 2MiB (0x200000). + On arm64, use "crashkernel=X" to try low allocation in DMA zone and + fall back to high allocation if it fails. + We can also use "crashkernel=X,high" to select a high region above + DMA zone, which also tries to allocate at least 256M low memory in + DMA zone automatically. + "crashkernel=Y,low" can be used to allocate specified size low memory. + Use "crashkernel=Y@X" if you really have to reserve memory from + specified start address X. Note that the start address of the kernel, + X if explicitly specified, must be aligned to 2MiB (0x200000). 
Load the Dump-capture Kernel diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index fc34332c8d9a6df..ff5f15008707cab 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -783,6 +783,9 @@ [KNL, X86-64] Select a region under 4G first, and fall back to reserve region above 4G when '@offset' hasn't been specified. + [KNL, ARM64] Try low allocation in DMA zone and fall back + to high allocation if it fails when '@offset' hasn't been + specified. See Documentation/admin-guide/kdump/kdump.rst for further details. crashkernel=range1:size1[,range2:size2,...][@offset] @@ -798,7 +801,9 @@ be above 4G if system have more than 4G ram installed. Otherwise memory region will be allocated below 4G, if available. - It will be ignored if crashkernel=X is specified. + It will be ignored if crashkernel=X is correctly specified. + [KNL, ARM64] range in high memory. + Allow kernel to allocate physical memory region from top. crashkernel=size[KMG],low [KNL, X86-64] range under 4G. When crashkernel=X,high is passed, kernel could allocate physical memory region @@ -807,13 +812,15 @@ requires at least 64M+32K low memory, also enough extra low memory is needed to make sure DMA buffers for 32-bit devices won't run out. Kernel would try to allocate at - at least 256M below 4G automatically. + least 256M below 4G automatically. This one let user to specify own low range under 4G for second kernel instead. 0: to disable low allocation. It will be ignored when crashkernel=X,high is not used or memory reserved is below 4G. - + [KNL, ARM64] range in low memory. + This one let user to specify a low range in DMA zone for + crash dump kernel. cryptomgr.notests [KNL] Disable crypto self-tests -- 2.25.1 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH v18 13/17] arm64: kdump: introduce some macros for crash kernel reservation
From: Chen Zhou

Introduce the macros CRASH_ALIGN for alignment, CRASH_ADDR_LOW_MAX for the upper bound of low crash memory, and CRASH_ADDR_HIGH_MAX for the upper bound of high crash memory, and use them in place of the open-coded values.

In addition, to stay consistent with x86, use CRASH_ALIGN as the lower bound of the crash kernel reservation.

Signed-off-by: Chen Zhou
Signed-off-by: Zhen Lei
Tested-by: John Donnelly
Tested-by: Dave Kleikamp
---
 arch/arm64/include/asm/kexec.h | 6 ++++++
 arch/arm64/mm/init.c           | 4 ++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index 9839bfc163d7147..1b9edc69f0244ca 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -25,6 +25,12 @@
 
 #define KEXEC_ARCH KEXEC_ARCH_AARCH64
 
+/* 2M alignment for crash kernel regions */
+#define CRASH_ALIGN		SZ_2M
+
+#define CRASH_ADDR_LOW_MAX	arm64_dma_phys_limit
+#define CRASH_ADDR_HIGH_MAX	MEMBLOCK_ALLOC_ACCESSIBLE
+
 #ifndef __ASSEMBLY__
 
 /**
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index a8834434af99ae0..be4595dc7459115 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -75,7 +75,7 @@ phys_addr_t arm64_dma_phys_limit __ro_after_init;
 static void __init reserve_crashkernel(void)
 {
 	unsigned long long crash_base, crash_size;
-	unsigned long long crash_max = arm64_dma_phys_limit;
+	unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
 	int ret;
 
 	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
@@ -91,7 +91,7 @@ static void __init reserve_crashkernel(void)
 		crash_max = crash_base + crash_size;
 
 	/* Current arm64 boot protocol requires 2MB alignment */
-	crash_base = memblock_phys_alloc_range(crash_size, SZ_2M,
+	crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
 					       crash_base, crash_max);
 	if (!crash_base) {
 		pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
-- 
2.25.1
[PATCH v18 16/17] of: fdt: Add memory for devices by DT property "linux,usable-memory-range"
From: Chen Zhou

When reserving crashkernel in high memory, some low memory is reserved for crash dump kernel devices and never mapped by the first kernel. This memory range is advertised to the crash dump kernel via a DT property under /chosen:

	linux,usable-memory-range = <BASE1 SIZE1 [BASE2 SIZE2]>

We reuse the DT property linux,usable-memory-range and make the low memory region the second range "BASE2 SIZE2", which keeps compatibility with existing user-space and older kdump kernels.

The crash dump kernel reads this property at boot time and calls memblock_add() to add the low memory region after memblock_cap_memory_range() has been called.

Signed-off-by: Chen Zhou
Co-developed-by: Zhen Lei
Signed-off-by: Zhen Lei
Reviewed-by: Rob Herring
Tested-by: Dave Kleikamp
---
 drivers/of/fdt.c | 33 +++++++++++++++++++++++----------
 1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 37b477a51175359..f7b72fa773250ad 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -967,6 +967,15 @@ static void __init early_init_dt_check_for_elfcorehdr(unsigned long node)
 
 static unsigned long chosen_node_offset = -FDT_ERR_NOTFOUND;
 
+/*
+ * The main usage of linux,usable-memory-range is for crash dump kernel.
+ * Originally, the number of usable-memory regions is one. Now there may
+ * be two regions, low region and high region.
+ * To make compatibility with existing user-space and older kdump, the low
+ * region is always the last range of linux,usable-memory-range if exist.
+ */
+#define MAX_USABLE_RANGES		2
+
 /**
  * early_init_dt_check_for_usable_mem_range - Decode usable memory range
  * location from flat tree
@@ -974,10 +983,9 @@ static unsigned long chosen_node_offset = -FDT_ERR_NOTFOUND;
  */
 static void __init early_init_dt_check_for_usable_mem_range(unsigned long node)
 {
-	const __be32 *prop;
-	int len;
-	phys_addr_t cap_mem_addr;
-	phys_addr_t cap_mem_size;
+	struct memblock_region rgn[MAX_USABLE_RANGES] = {0};
+	const __be32 *prop, *endp;
+	int len, i;
 
 	if ((long)node < 0)
 		return;
@@ -985,16 +993,21 @@ static void __init early_init_dt_check_for_usable_mem_range(unsigned long node)
 	pr_debug("Looking for usable-memory-range property... ");
 
 	prop = of_get_flat_dt_prop(node, "linux,usable-memory-range", &len);
-	if (!prop || (len < (dt_root_addr_cells + dt_root_size_cells)))
+	if (!prop || (len % (dt_root_addr_cells + dt_root_size_cells)))
 		return;
 
-	cap_mem_addr = dt_mem_next_cell(dt_root_addr_cells, &prop);
-	cap_mem_size = dt_mem_next_cell(dt_root_size_cells, &prop);
+	endp = prop + (len / sizeof(__be32));
+	for (i = 0; i < MAX_USABLE_RANGES && prop < endp; i++) {
+		rgn[i].base = dt_mem_next_cell(dt_root_addr_cells, &prop);
+		rgn[i].size = dt_mem_next_cell(dt_root_size_cells, &prop);
 
-	pr_debug("cap_mem_start=%pa cap_mem_size=%pa\n", &cap_mem_addr,
-		 &cap_mem_size);
+		pr_debug("cap_mem_regions[%d]: base=%pa, size=%pa\n",
+			 i, &rgn[i].base, &rgn[i].size);
+	}
 
-	memblock_cap_memory_range(cap_mem_addr, cap_mem_size);
+	memblock_cap_memory_range(rgn[0].base, rgn[0].size);
+	for (i = 1; i < MAX_USABLE_RANGES && rgn[i].size; i++)
+		memblock_add(rgn[i].base, rgn[i].size);
 }
 
 #ifdef CONFIG_SERIAL_EARLYCON
-- 
2.25.1
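The decoding loop in this patch can be illustrated outside the kernel. The sketch below is only an approximation under stated assumptions: struct usable_range, next_cells() and decode_usable_ranges() are hypothetical names, the property is taken as a raw byte buffer of big-endian 32-bit cells, and the length check (done here in bytes) is a choice made for the sketch rather than a copy of the check in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_USABLE_RANGES 2

struct usable_range {
	uint64_t base;
	uint64_t size;
};

/*
 * Read `cells` big-endian 32-bit cells as one value and advance *p,
 * loosely modelled on dt_mem_next_cell().
 */
static uint64_t next_cells(int cells, const uint8_t **p)
{
	uint64_t v = 0;

	while (cells-- > 0) {
		uint32_t cell = ((uint32_t)(*p)[0] << 24) |
				((uint32_t)(*p)[1] << 16) |
				((uint32_t)(*p)[2] << 8) |
				(uint32_t)(*p)[3];

		v = (v << 32) | cell;
		*p += 4;
	}
	return v;
}

/* Decode up to MAX_USABLE_RANGES (base, size) pairs; return how many were read. */
int decode_usable_ranges(const uint8_t *prop, size_t len_bytes,
			 int addr_cells, int size_cells,
			 struct usable_range out[MAX_USABLE_RANGES])
{
	size_t entry_bytes;
	const uint8_t *end;
	int n = 0;

	assert(out);
	if (!prop || addr_cells <= 0 || size_cells <= 0)
		return 0;

	entry_bytes = (size_t)(addr_cells + size_cells) * 4;
	/* Sketch-only check: reject empty or partially filled properties. */
	if (len_bytes == 0 || len_bytes % entry_bytes)
		return 0;

	end = prop + len_bytes;
	while (n < MAX_USABLE_RANGES && prop < end) {
		out[n].base = next_cells(addr_cells, &prop);
		out[n].size = next_cells(size_cells, &prop);
		n++;
	}
	return n;
}
```

In the patch itself, the first decoded range is handed to memblock_cap_memory_range() and any further non-empty range is added back with memblock_add(); a caller of this sketch would treat out[0] and out[1] the same way.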
[PATCH v18 15/17] of: fdt: Aggregate the processing of "linux,usable-memory-range"
Currently, we parse the "linux,usable-memory-range" property in early_init_dt_scan_chosen() to obtain the specified memory range of the crash kernel. We then reserve the required memory after early_init_dt_scan_memory() has identified all available physical memory. Because the two pieces of code are far apart, readability and maintainability suffer. So bring them together.

Suggested-by: Rob Herring
Signed-off-by: Zhen Lei
Reviewed-by: Rob Herring
Tested-by: Dave Kleikamp
---
 drivers/of/fdt.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index bdca35284cebd56..37b477a51175359 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -965,8 +965,7 @@ static void __init early_init_dt_check_for_elfcorehdr(unsigned long node)
 		 elfcorehdr_addr, elfcorehdr_size);
 }
 
-static phys_addr_t cap_mem_addr;
-static phys_addr_t cap_mem_size;
+static unsigned long chosen_node_offset = -FDT_ERR_NOTFOUND;
 
 /**
  * early_init_dt_check_for_usable_mem_range - Decode usable memory range
@@ -977,6 +976,11 @@ static void __init early_init_dt_check_for_usable_mem_range(unsigned long node)
 {
 	const __be32 *prop;
 	int len;
+	phys_addr_t cap_mem_addr;
+	phys_addr_t cap_mem_size;
+
+	if ((long)node < 0)
+		return;
 
 	pr_debug("Looking for usable-memory-range property... ");
 
@@ -989,6 +993,8 @@ static void __init early_init_dt_check_for_usable_mem_range(unsigned long node)
 
 	pr_debug("cap_mem_start=%pa cap_mem_size=%pa\n", &cap_mem_addr,
 		 &cap_mem_size);
+
+	memblock_cap_memory_range(cap_mem_addr, cap_mem_size);
 }
 
 #ifdef CONFIG_SERIAL_EARLYCON
@@ -1137,9 +1143,10 @@ int __init early_init_dt_scan_chosen(unsigned long node, const char *uname,
 	    (strcmp(uname, "chosen") != 0 && strcmp(uname, "chosen@0") != 0))
 		return 0;
 
+	chosen_node_offset = node;
+
 	early_init_dt_check_for_initrd(node);
 	early_init_dt_check_for_elfcorehdr(node);
-	early_init_dt_check_for_usable_mem_range(node);
 
 	/* Retrieve command line */
 	p = of_get_flat_dt_prop(node, "bootargs", &l);
@@ -1275,7 +1282,7 @@ void __init early_init_dt_scan_nodes(void)
 	of_scan_flat_dt(early_init_dt_scan_memory, NULL);
 
 	/* Handle linux,usable-memory-range property */
-	memblock_cap_memory_range(cap_mem_addr, cap_mem_size);
+	early_init_dt_check_for_usable_mem_range(chosen_node_offset);
 }
 
 bool __init early_init_dt_scan(void *params)
-- 
2.25.1
[PATCH v18 14/17] arm64: kdump: reimplement crashkernel=X
From: Chen Zhou

There are the following issues in arm64 kdump:
1. We use crashkernel=X to reserve crashkernel memory below 4G, which fails when there is not enough low memory.
2. If crashkernel memory is reserved above 4G, the crash dump kernel fails to boot because there is no low memory available for allocation.

To solve these issues, change the behavior of crashkernel=X and introduce crashkernel=X,[high,low]. crashkernel=X tries a low allocation in the DMA zone and falls back to a high allocation if that fails. We can also use "crashkernel=X,high" to select a region above the DMA zone, which also tries to allocate at least 256M in the DMA zone automatically. "crashkernel=Y,low" can be used to allocate a specified size of low memory.

Another minor change: there may now be two regions reserved for the crash dump kernel. To distinguish the low region from the high region without affecting existing kexec-tools, name the low region "Crash kernel (low)".

Signed-off-by: Chen Zhou
Signed-off-by: Zhen Lei
Tested-by: John Donnelly
Tested-by: Dave Kleikamp
---
 arch/arm64/Kconfig                     |  1 +
 arch/arm64/include/asm/kexec.h         |  4 ++
 arch/arm64/kernel/machine_kexec_file.c | 12 +-
 arch/arm64/kernel/setup.c              | 13 +-
 arch/arm64/mm/init.c                   | 59 +-
 5 files changed, 38 insertions(+), 51 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c4207cf9bb17ffb..4b99efa36da3793 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -95,6 +95,7 @@ config ARM64
 	select ARCH_WANT_FRAME_POINTERS
 	select ARCH_WANT_HUGE_PMD_SHARE if ARM64_4K_PAGES || (ARM64_16K_PAGES && !ARM64_VA_BITS_36)
 	select ARCH_WANT_LD_ORPHAN_WARN
+	select ARCH_WANT_RESERVE_CRASH_KERNEL if KEXEC_CORE
 	select ARCH_WANTS_NO_INSTR
 	select ARCH_HAS_UBSAN_SANITIZE_ALL
 	select ARM_AMBA
diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index 1b9edc69f0244ca..3bde0079925d771 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -96,6 +96,10 @@ static inline void crash_prepare_suspend(void) {}
 static inline void crash_post_resume(void) {}
 #endif
 
+#ifdef CONFIG_KEXEC_CORE
+extern void __init reserve_crashkernel(void);
+#endif
+
 #if defined(CONFIG_KEXEC_CORE)
 void cpu_soft_restart(unsigned long el2_switch, unsigned long entry,
 		      unsigned long arg0, unsigned long arg1,
diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
index 59c648d51848886..889951291cc0f9c 100644
--- a/arch/arm64/kernel/machine_kexec_file.c
+++ b/arch/arm64/kernel/machine_kexec_file.c
@@ -65,10 +65,18 @@ static int prepare_elf_headers(void **addr, unsigned long *sz)
 
 	/* Exclude crashkernel region */
 	ret = crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end);
+	if (ret)
+		goto out;
+
+	if (crashk_low_res.end) {
+		ret = crash_exclude_mem_range(cmem, crashk_low_res.start, crashk_low_res.end);
+		if (ret)
+			goto out;
+	}
 
-	if (!ret)
-		ret = crash_prepare_elf64_headers(cmem, true, addr, sz);
+	ret = crash_prepare_elf64_headers(cmem, true, addr, sz);
 
+out:
 	kfree(cmem);
 	return ret;
 }
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index be5f85b0a24de69..4bb2e55366be64d 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -248,7 +248,18 @@ static void __init request_standard_resources(void)
 		    kernel_data.end <= res->end)
 			request_resource(res, &kernel_data);
 #ifdef CONFIG_KEXEC_CORE
-		/* Userspace will find "Crash kernel" region in /proc/iomem. */
+		/*
+		 * Userspace will find "Crash kernel" or "Crash kernel (low)"
+		 * region in /proc/iomem.
+		 * In order to distinct from the high region and make no effect
+		 * to the use of existing kexec-tools, rename the low region as
+		 * "Crash kernel (low)".
+		 */
+		if (crashk_low_res.end && crashk_low_res.start >= res->start &&
+		    crashk_low_res.end <= res->end) {
+			crashk_low_res.name = "Crash kernel (low)";
+			request_resource(res, &crashk_low_res);
+		}
 		if (crashk_res.end && crashk_res.start >= res->start &&
 		    crashk_res.end <= res->end)
 			request_resource(res, &crashk_res);
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index be4595dc7459115..85c83e4eff2b6c4 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
[PATCH v18 12/17] kdump: Reduce unused parameters of parse_crashkernel_{high|low}
The parameters 'system_ram' and 'crash_base' are only needed for the "crashkernel=X@[offset]" case. The argument list of parse_crashkernel_suffix() helps to show this.

Signed-off-by: Zhen Lei
---
 kernel/crash_core.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index a037076b89a9bb2..67f5065e3c3cfcc 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -288,19 +288,19 @@ int __init parse_crashkernel(char *cmdline,
 
 #ifdef CONFIG_64BIT
 static int __init parse_crashkernel_high(char *cmdline,
-			     unsigned long long system_ram,
-			     unsigned long long *crash_size,
-			     unsigned long long *crash_base)
+					 unsigned long long *crash_size)
 {
-	return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base, SUFFIX_HIGH);
+	unsigned long long base;
+
+	return __parse_crashkernel(cmdline, 0, crash_size, &base, SUFFIX_HIGH);
 }
 
 static int __init parse_crashkernel_low(char *cmdline,
-			     unsigned long long system_ram,
-			     unsigned long long *crash_size,
-			     unsigned long long *crash_base)
+					unsigned long long *crash_size)
 {
-	return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base, SUFFIX_LOW);
+	unsigned long long base;
+
+	return __parse_crashkernel(cmdline, 0, crash_size, &base, SUFFIX_LOW);
 }
 
 static int __init reserve_crashkernel_low(unsigned long long low_size)
@@ -368,14 +368,14 @@ static int __init parse_crashkernel_in_order(char *cmdline,
 
 #ifdef CONFIG_64BIT
 	/* crashkernel=X,high */
-	ret = parse_crashkernel_high(cmdline, system_ram, crash_size, crash_base);
+	ret = parse_crashkernel_high(cmdline, crash_size);
 	if (ret || crash_size <= 0)
 		return CRASHKERNEL_MEM_NONE;
 
 	flag = CRASHKERNEL_MEM_HIGH;
 
 	/* crashkernel=Y,low */
-	ret = parse_crashkernel_low(cmdline, system_ram, low_size, crash_base);
+	ret = parse_crashkernel_low(cmdline, low_size);
 	if (!ret)
 		flag |= CRASHKERNEL_MEM_LOW;
 #endif
-- 
2.25.1
[PATCH v18 07/17] x86/setup: Eliminate a magic number in reserve_crashkernel()
From: Chen Zhou

Replace '(1ULL << 32)' with CRASH_ADDR_LOW_MAX to improve readability; the two are equal.

Signed-off-by: Chen Zhou
Signed-off-by: Zhen Lei
---
 arch/x86/kernel/setup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 52aa925877ca787..abff57ffbe92884 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -523,7 +523,7 @@ static void __init reserve_crashkernel(void)
 	}
 
 #ifdef CONFIG_X86_64
-	if (crash_base >= (1ULL << 32)) {
+	if (crash_base >= CRASH_ADDR_LOW_MAX) {
 		/*
 		 * Ensure that at least 256M extra low memory is allocated for
 		 * DMA buffers and swiotlb, if low memory size is not specified.
-- 
2.25.1
[PATCH v18 09/17] x86/setup: Move reserve_crashkernel[_low]() into crash_core.c
From: Chen Zhou Make the functions reserve_crashkernel[_low]() as generic. Since reserve_crashkernel[_low]() implementations are quite similar on other architectures as well, we can have more users of this later. Signed-off-by: Chen Zhou Co-developed-by: Zhen Lei Signed-off-by: Zhen Lei --- arch/x86/include/asm/kexec.h | 4 + arch/x86/kernel/setup.c | 172 -- kernel/crash_core.c | 176 ++- 3 files changed, 179 insertions(+), 173 deletions(-) diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h index a0223a6c0238a15..21fd1f2796e1057 100644 --- a/arch/x86/include/asm/kexec.h +++ b/arch/x86/include/asm/kexec.h @@ -216,6 +216,10 @@ typedef void crash_vmclear_fn(void); extern crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss; extern void kdump_nmi_shootdown_cpus(void); +#ifdef CONFIG_KEXEC_CORE +extern void __init reserve_crashkernel(void); +#endif + #endif /* __ASSEMBLY__ */ #endif /* _ASM_X86_KEXEC_H */ diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index beb73cce4b8b826..ba56e410f01811d 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -386,178 +386,6 @@ static void __init memblock_x86_reserve_range_setup_data(void) } } -/* - * - Crashkernel reservation -- - */ - -#ifdef CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL - -#ifdef CONFIG_64BIT -static int __init reserve_crashkernel_low(unsigned long long low_size) -{ - unsigned long long low_base = 0; - unsigned long low_mem_limit; - - low_mem_limit = min(memblock_phys_mem_size(), CRASH_ADDR_LOW_MAX); - - /* passed with crashkernel=0,low ? 
*/ - if (!low_size) - return 0; - - low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, CRASH_ADDR_LOW_MAX); - if (!low_base) { - pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n", - (unsigned long)(low_size >> 20)); - return -ENOMEM; - } - - pr_info("Reserving %ldMB of low memory at %ldMB for crashkernel (low RAM limit: %ldMB)\n", - (unsigned long)(low_size >> 20), - (unsigned long)(low_base >> 20), - (unsigned long)(low_mem_limit >> 20)); - - crashk_low_res.start = low_base; - crashk_low_res.end = low_base + low_size - 1; - - return 0; -} -#endif - -#define CRASHKERNEL_MEM_NONE 0x0 /* crashkernel= is not exist or invalid */ -#define CRASHKERNEL_MEM_CLASSIC0x1 /* crashkernel=X[@offset] is valid */ -#define CRASHKERNEL_MEM_HIGH 0x2 /* crashkernel=X,high is valid */ -#define CRASHKERNEL_MEM_LOW0x4 /* crashkernel=X,low is valid */ - -/** - * parse_crashkernel_in_order - Parse all "crashkernel=" configurations in - * priority order until a valid combination is found. - * @cmdline: The bootup command line. - * @system_ram: Total system memory size. - * @crash_size: Save the memory size specified by "crashkernel=X[@offset]" or - * "crashkernel=X,high". - * @crash_base: Save the base address specified by "crashkernel=X@offset" - * @low_size: Save the memory size specified by "crashkernel=X,low" - * - * Returns the status flag of the parsing result of "crashkernel=", such as - * CRASHKERNEL_MEM_NONE, CRASHKERNEL_MEM_HIGH. 
- */ -static int __init parse_crashkernel_in_order(char *cmdline, -unsigned long long system_ram, -unsigned long long *crash_size, -unsigned long long *crash_base, -unsigned long long *low_size) -{ - int ret, flag = CRASHKERNEL_MEM_NONE; - - BUG_ON(!crash_size || !crash_base || !low_size); - - /* crashkernel=X[@offset] */ - ret = parse_crashkernel(cmdline, system_ram, crash_size, crash_base); - if (!ret && crash_size > 0) - return CRASHKERNEL_MEM_CLASSIC; - -#ifdef CONFIG_64BIT - /* crashkernel=X,high */ - ret = parse_crashkernel_high(cmdline, system_ram, crash_size, crash_base); - if (ret || crash_size <= 0) - return CRASHKERNEL_MEM_NONE; - - flag = CRASHKERNEL_MEM_HIGH; - - /* crashkernel=Y,low */ - ret = parse_crashkernel_low(cmdline, system_ram, low_size, crash_base); - if (!ret) - flag |= CRASHKERNEL_MEM_LOW; -#endif - - return flag; -} - -static void __init reserve_crashkernel(void) -{ - unsigned long long crash_size, crash_base, total_mem, low_size; - int flag; - - total_mem = memblock_phys_mem_size(); - - flag = parse_crashkernel_in_order(boot_com
[PATCH v18 10/17] kdump: Simplify the parameters of __parse_crashkernel()
After commit adbc742bf786 ("x86, kdump: Change crashkernel_high/low= to
crashkernel=,high/low"), all kdump bootup parameters start with
"crashkernel=". Therefore, it is better for __parse_crashkernel() to use
it directly than for the caller to pass it. So the parameter 'name' can
be omitted. Similarly, we can pass the suffix type instead of the suffix
name to avoid the global variable 'suffix_tbl' appearing in multiple
places.

There is no change in functionality, but it makes the code look a little
more concise.

Signed-off-by: Zhen Lei
---
 kernel/crash_core.c | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 64ed082382f3f18..496dae2718cf026 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -233,11 +233,12 @@ static int __init __parse_crashkernel(char *cmdline,
 			     unsigned long long system_ram,
 			     unsigned long long *crash_size,
 			     unsigned long long *crash_base,
-			     const char *name,
-			     const char *suffix)
+			     int suffix_type)
 {
 	char	*first_colon, *first_space;
 	char	*ck_cmdline;
+	const char *name = "crashkernel=";
+	const char *suffix = suffix_tbl[suffix_type];
 
 	BUG_ON(!crash_size || !crash_base);
 	*crash_size = 0;
@@ -275,8 +276,7 @@ int __init parse_crashkernel(char *cmdline,
 			     unsigned long long *crash_size,
 			     unsigned long long *crash_base)
 {
-	return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base,
-					"crashkernel=", NULL);
+	return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base, SUFFIX_NULL);
 }
 
 int __init parse_crashkernel_high(char *cmdline,
@@ -284,8 +284,7 @@ int __init parse_crashkernel_high(char *cmdline,
 			     unsigned long long *crash_size,
 			     unsigned long long *crash_base)
 {
-	return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base,
-					"crashkernel=", suffix_tbl[SUFFIX_HIGH]);
+	return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base, SUFFIX_HIGH);
 }
 
 int __init parse_crashkernel_low(char *cmdline,
@@ -293,8 +292,7 @@ int __init parse_crashkernel_low(char *cmdline,
 			     unsigned long long *crash_size,
 			     unsigned long long *crash_base)
 {
-	return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base,
-					"crashkernel=", suffix_tbl[SUFFIX_LOW]);
+	return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base, SUFFIX_LOW);
 }
 
 /*
-- 
2.25.1

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
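For readers following the thread, the idea of passing a suffix type instead of a suffix string can be sketched in user space as below. This is only an illustration, not the kernel code: the enum names and suffix_tbl follow the identifiers used in the patch, while sketch_match_suffix() and its matching rule are assumptions invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Index values follow the names used in the patch. */
enum { SUFFIX_HIGH = 0, SUFFIX_LOW = 1, SUFFIX_NULL = 2 };

static const char *const suffix_tbl[] = {
	[SUFFIX_HIGH] = ",high",
	[SUFFIX_LOW]  = ",low",
	[SUFFIX_NULL] = NULL,
};

/*
 * Hypothetical helper: return 1 if 'arg' (one command-line word) is a
 * "crashkernel=" option that carries exactly the suffix selected by
 * suffix_type, and 0 otherwise.
 */
static int sketch_match_suffix(const char *arg, int suffix_type)
{
	const char *name = "crashkernel=";
	const char *suffix;
	size_t name_len = strlen(name);
	size_t arg_len, suffix_len;
	int i;

	assert(suffix_type >= SUFFIX_HIGH && suffix_type <= SUFFIX_NULL);
	suffix = suffix_tbl[suffix_type];

	if (strncmp(arg, name, name_len) != 0)
		return 0;

	arg_len = strlen(arg);
	if (!suffix) {
		/* Plain form: the word must not end in any known suffix. */
		for (i = SUFFIX_HIGH; i < SUFFIX_NULL; i++) {
			suffix_len = strlen(suffix_tbl[i]);
			if (arg_len >= suffix_len &&
			    !strcmp(arg + arg_len - suffix_len, suffix_tbl[i]))
				return 0;
		}
		return 1;
	}

	suffix_len = strlen(suffix);
	return arg_len >= name_len + suffix_len &&
	       !strcmp(arg + arg_len - suffix_len, suffix);
}
```

With this shape, callers only name the variant they want (for example SUFFIX_HIGH), and the table stays private to the one function that consults it.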
[PATCH v18 11/17] kdump: Make parse_crashkernel_{high|low} static
Currently, parse_crashkernel_{high|low} are only referenced by
parse_crashkernel_in_order(), and they are in the same file. So make them
static, and move them into "#ifdef CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL".

Signed-off-by: Zhen Lei
---
 include/linux/crash_core.h |  4 
 kernel/crash_core.c        | 19 ++-
 2 files changed, 10 insertions(+), 13 deletions(-)

diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
index de62a722431e7db..529a0a783190bfe 100644
--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -79,9 +79,5 @@ void final_note(Elf_Word *buf);
 
 int __init parse_crashkernel(char *cmdline, unsigned long long system_ram,
 		unsigned long long *crash_size, unsigned long long *crash_base);
-int parse_crashkernel_high(char *cmdline, unsigned long long system_ram,
-		unsigned long long *crash_size, unsigned long long *crash_base);
-int parse_crashkernel_low(char *cmdline, unsigned long long system_ram,
-		unsigned long long *crash_size, unsigned long long *crash_base);
 
 #endif /* LINUX_CRASH_CORE_H */
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 496dae2718cf026..a037076b89a9bb2 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -279,7 +279,15 @@ int __init parse_crashkernel(char *cmdline,
 	return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base, SUFFIX_NULL);
 }
 
-int __init parse_crashkernel_high(char *cmdline,
+
+/*
+ * - Crashkernel reservation --
+ */
+
+#ifdef CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL
+
+#ifdef CONFIG_64BIT
+static int __init parse_crashkernel_high(char *cmdline,
 			     unsigned long long system_ram,
 			     unsigned long long *crash_size,
 			     unsigned long long *crash_base)
@@ -287,7 +295,7 @@ int __init parse_crashkernel_high(char *cmdline,
 	return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base, SUFFIX_HIGH);
 }
 
-int __init parse_crashkernel_low(char *cmdline,
+static int __init parse_crashkernel_low(char *cmdline,
 			     unsigned long long system_ram,
 			     unsigned long long *crash_size,
 			     unsigned long long *crash_base)
@@ -295,13 +303,6 @@ int __init parse_crashkernel_low(char *cmdline,
 	return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base, SUFFIX_LOW);
 }
 
-/*
- * - Crashkernel reservation --
- */
-
-#ifdef CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL
-
-#ifdef CONFIG_64BIT
 static int __init reserve_crashkernel_low(unsigned long long low_size)
 {
 	unsigned long long low_base = 0;
-- 
2.25.1
[PATCH v18 08/17] x86/setup: Add build option ARCH_WANT_RESERVE_CRASH_KERNEL
From: Chen Zhou

Multiple architectures implement reserve_crashkernel(), and all of them
mark it as static. We want to combine the implementations on x86 and
arm64 into one, and move it to the public crash_core.c. To avoid symbol
conflicts on other platforms, add a new build option
ARCH_WANT_RESERVE_CRASH_KERNEL. Also change CONFIG_X86_64 to CONFIG_64BIT,
so the code can be shared with arm64, or with other users in the future.

Signed-off-by: Chen Zhou
Co-developed-by: Zhen Lei
Signed-off-by: Zhen Lei
---
 arch/Kconfig            |  3 +++
 arch/x86/Kconfig        |  2 ++
 arch/x86/kernel/setup.c | 10 +-
 3 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index d3c4ab249e9c275..f53dd7852290b9a 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -11,6 +11,9 @@ source "arch/$(SRCARCH)/Kconfig"
 
 menu "General architecture-dependent options"
 
+config ARCH_WANT_RESERVE_CRASH_KERNEL
+	bool
+
 config CRASH_CORE
 	bool
 
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 5c2ccb85f2efb86..bd78ed8193079b9 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -12,6 +12,7 @@ config X86_32
 	depends on !64BIT
 	# Options that are inherently 32-bit kernel only:
 	select ARCH_WANT_IPC_PARSE_VERSION
+	select ARCH_WANT_RESERVE_CRASH_KERNEL if KEXEC_CORE
 	select CLKSRC_I8253
 	select CLONE_BACKWARDS
 	select GENERIC_VDSO_32
@@ -28,6 +29,7 @@ config X86_64
 	select ARCH_HAS_GIGANTIC_PAGE
 	select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
 	select ARCH_USE_CMPXCHG_LOCKREF
+	select ARCH_WANT_RESERVE_CRASH_KERNEL if KEXEC_CORE
 	select HAVE_ARCH_SOFT_DIRTY
 	select MODULES_USE_ELF_RELA
 	select NEED_DMA_MAP_STATE
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index abff57ffbe92884..beb73cce4b8b826 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -390,9 +390,9 @@ static void __init memblock_x86_reserve_range_setup_data(void)
  * - Crashkernel reservation --
  */
 
-#ifdef CONFIG_KEXEC_CORE
+#ifdef CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL
 
-#ifdef CONFIG_X86_64
+#ifdef CONFIG_64BIT
 static int __init reserve_crashkernel_low(unsigned long long low_size)
 {
 	unsigned long long low_base = 0;
@@ -456,7 +456,7 @@ static int __init parse_crashkernel_in_order(char *cmdline,
 	if (!ret && crash_size > 0)
 		return CRASHKERNEL_MEM_CLASSIC;
 
-#ifdef CONFIG_X86_64
+#ifdef CONFIG_64BIT
 	/* crashkernel=X,high */
 	ret = parse_crashkernel_high(cmdline, system_ram, crash_size, crash_base);
 	if (ret || crash_size <= 0)
@@ -522,7 +522,7 @@ static void __init reserve_crashkernel(void)
 		}
 	}
 
-#ifdef CONFIG_X86_64
+#ifdef CONFIG_64BIT
 	if (crash_base >= CRASH_ADDR_LOW_MAX) {
 		/*
 		 * Ensure that at least 256M extra low memory is allocated for
@@ -556,7 +556,7 @@ static void __init reserve_crashkernel(void)
 	crashk_res.start = crash_base;
 	crashk_res.end   = crash_base + crash_size - 1;
 }
-#endif
+#endif /* CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL */
 
 static struct resource standard_io_resources[] = {
 	{ .name = "dma1", .start = 0x00, .end = 0x1f,
-- 
2.25.1
[PATCH v18 06/17] x86/setup: Update comments in reserve_crashkernel()
Add comments to describe which bootup parameters are processed by the
code, and make comments close to the code being commented.

Signed-off-by: Zhen Lei
---
 arch/x86/kernel/setup.c | 22 +++---
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 07a58313db5c5f7..52aa925877ca787 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -485,20 +485,20 @@ static void __init reserve_crashkernel(void)
 	if (flag == CRASHKERNEL_MEM_NONE)
 		return;
 
-	/* 0 means: find the address automatically */
 	if (!crash_base) {
 		/*
-		 * Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
-		 * crashkernel=x,high reserves memory over 4G, also allocates
-		 * 256M extra low memory for DMA buffers and swiotlb.
-		 * But the extra memory is not required for all machines.
-		 * So try low memory first and fall back to high memory
-		 * unless "crashkernel=size[KMG],high" is specified.
+		 * For the case of crashkernel=X[@offset] and offset is omitted,
+		 * try the low memory first.
 		 */
 		if (!(flag & CRASHKERNEL_MEM_HIGH))
 			crash_base = memblock_phys_alloc_range(crash_size,
 						CRASH_ALIGN, CRASH_ALIGN,
 						CRASH_ADDR_LOW_MAX);
+
+		/*
+		 * If low memory allocation failed above, or for the case of
+		 * crashkernel=X,high, try the high memory.
+		 */
 		if (!crash_base)
 			crash_base = memblock_phys_alloc_range(crash_size,
 						CRASH_ALIGN, CRASH_ALIGN,
@@ -510,6 +510,10 @@ static void __init reserve_crashkernel(void)
 	} else {
 		unsigned long long start;
 
+		/*
+		 * The case of crashkernel=X@offset and offset is specified.
+		 * Only user-specified space can be reserved.
+		 */
 		start = memblock_phys_alloc_range(crash_size, SZ_1M, crash_base,
 						  crash_base + crash_size);
 		if (start != crash_base) {
@@ -520,6 +524,10 @@ static void __init reserve_crashkernel(void)
 
 #ifdef CONFIG_X86_64
 	if (crash_base >= (1ULL << 32)) {
+		/*
+		 * Ensure that at least 256M extra low memory is allocated for
+		 * DMA buffers and swiotlb, if low memory size is not specified.
+		 */
 		if (!(flag & CRASHKERNEL_MEM_LOW)) {
 			/*
 			 * two parts from kernel/dma/swiotlb.c:
-- 
2.25.1
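The rule described in the new comment (reserve extra low memory only when the crash region landed at or above 4G, and default to at least 256M when no ",low" size was given) can be sketched in user space roughly as follows. This is a hedged illustration, not the kernel code: sketch_low_size() and its parameters are made up for this example, and swiotlb_bytes merely stands in for the value of swiotlb_size_or_default() that the series uses.

```c
#define SKETCH_SZ_1M (1ULL << 20)
#define SKETCH_SZ_4G (1ULL << 32)

/*
 * Hypothetical helper: decide how much low memory to reserve.
 *
 * crash_base    - where the main crash region was placed
 * low_given     - non-zero if "crashkernel=Y,low" parsed successfully
 * low_size      - the Y from "crashkernel=Y,low" (0 disables low memory)
 * swiotlb_bytes - stand-in for swiotlb_size_or_default()
 */
static unsigned long long sketch_low_size(unsigned long long crash_base,
					  int low_given,
					  unsigned long long low_size,
					  unsigned long long swiotlb_bytes)
{
	unsigned long long need, minimum;

	/* A crash region below 4G needs no separate low reservation. */
	if (crash_base < SKETCH_SZ_4G)
		return 0;

	/* An explicit size is used as-is; 0 means "no low memory". */
	if (low_given)
		return low_size;

	/* Default: swiotlb plus 8M of slack, but never below 256M. */
	need = swiotlb_bytes + 8 * SKETCH_SZ_1M;
	minimum = 256 * SKETCH_SZ_1M;
	return need > minimum ? need : minimum;
}
```

For instance, with a 64M swiotlb and no ",low" option the sketch yields 256M, while a 512M swiotlb yields 520M.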
[PATCH v18 05/17] x86/setup: Use parse_crashkernel_in_order() to make code logic clear
Currently, the parsing of "crashkernel=X,high" and the parsing of "crashkernel=X,low" are not in the same function, but they are strongly dependent, which affects readability. Use parse_crashkernel_in_order() to bring them together. In addition, the operation to ensure at least 256M low memory is moved out of reserve_craskernel_low() so that it only needs to focus on low memory allocation. Signed-off-by: Zhen Lei --- arch/x86/kernel/setup.c | 69 ++--- 1 file changed, 30 insertions(+), 39 deletions(-) diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index f997074d36f2484..07a58313db5c5f7 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -393,32 +393,16 @@ static void __init memblock_x86_reserve_range_setup_data(void) #ifdef CONFIG_KEXEC_CORE #ifdef CONFIG_X86_64 -static int __init reserve_crashkernel_low(void) +static int __init reserve_crashkernel_low(unsigned long long low_size) { - unsigned long long base, low_base = 0, low_size = 0; + unsigned long long low_base = 0; unsigned long low_mem_limit; - int ret; low_mem_limit = min(memblock_phys_mem_size(), CRASH_ADDR_LOW_MAX); - /* crashkernel=Y,low */ - ret = parse_crashkernel_low(boot_command_line, low_mem_limit, &low_size, &base); - if (ret) { - /* -* two parts from kernel/dma/swiotlb.c: -* -swiotlb size: user-specified with swiotlb= or default. -* -* -swiotlb overflow buffer: now hardcoded to 32k. We round it -* to 8M for other buffers that may need to stay low too. Also -* make sure we allocate enough extra low memory so that we -* don't run out of DMA buffers for 32-bit devices. -*/ - low_size = max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20); - } else { - /* passed with crashkernel=0,low ? */ - if (!low_size) - return 0; - } + /* passed with crashkernel=0,low ? 
*/ + if (!low_size) + return 0; low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, CRASH_ADDR_LOW_MAX); if (!low_base) { @@ -457,7 +441,6 @@ static int __init reserve_crashkernel_low(void) * Returns the status flag of the parsing result of "crashkernel=", such as * CRASHKERNEL_MEM_NONE, CRASHKERNEL_MEM_HIGH. */ -__maybe_unused static int __init parse_crashkernel_in_order(char *cmdline, unsigned long long system_ram, unsigned long long *crash_size, @@ -492,22 +475,15 @@ static int __init parse_crashkernel_in_order(char *cmdline, static void __init reserve_crashkernel(void) { - unsigned long long crash_size, crash_base, total_mem; - bool high = false; - int ret; + unsigned long long crash_size, crash_base, total_mem, low_size; + int flag; total_mem = memblock_phys_mem_size(); - /* crashkernel=XM */ - ret = parse_crashkernel(boot_command_line, total_mem, &crash_size, &crash_base); - if (ret != 0 || crash_size <= 0) { - /* crashkernel=X,high */ - ret = parse_crashkernel_high(boot_command_line, total_mem, -&crash_size, &crash_base); - if (ret != 0 || crash_size <= 0) - return; - high = true; - } + flag = parse_crashkernel_in_order(boot_command_line, total_mem, + &crash_size, &crash_base, &low_size); + if (flag == CRASHKERNEL_MEM_NONE) + return; /* 0 means: find the address automatically */ if (!crash_base) { @@ -519,7 +495,7 @@ static void __init reserve_crashkernel(void) * So try low memory first and fall back to high memory * unless "crashkernel=size[KMG],high" is specified. */ - if (!high) + if (!(flag & CRASHKERNEL_MEM_HIGH)) crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN, CRASH_ALIGN, CRASH_ADDR_LOW_MAX); @@ -543,9 +519,24 @@ static void __init reserve_crashkernel(void) } #ifdef CONFIG_X86_64 - if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) { - memblock_phys_free(crash_base, crash_size); - return; + if (crash_base >= (1ULL << 32)) { + if (!(flag & CRASHKERNEL_MEM_LOW)) { + /* +* two parts from kernel/dma/swiotlb.c: +* -s
[PATCH v18 04/17] x86/setup: Add helper parse_crashkernel_in_order()
Currently, there are two possible combinations of configurations.
(1) crashkernel=X[@offset]
(2) crashkernel=X,high, with or without crashkernel=X,low

(1) has the highest priority, if it is configured correctly, (2) will be
ignored. Similarly, in combination (2), crashkernel=X,low is valid only
when crashkernel=X,high is valid.

Putting the operations of parsing all "crashkernel=" configurations in
one function helps to sort out the strong dependency.

So add helper parse_crashkernel_in_order(). The "__maybe_unused" will be
removed in the next patch.

Signed-off-by: Zhen Lei
---
 arch/x86/kernel/setup.c | 51 +
 1 file changed, 51 insertions(+)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index d9080bfa131a654..f997074d36f2484 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -439,6 +439,57 @@ static int __init reserve_crashkernel_low(void)
 }
 #endif
 
+#define CRASHKERNEL_MEM_NONE		0x0	/* crashkernel= is not exist or invalid */
+#define CRASHKERNEL_MEM_CLASSIC		0x1	/* crashkernel=X[@offset] is valid */
+#define CRASHKERNEL_MEM_HIGH		0x2	/* crashkernel=X,high is valid */
+#define CRASHKERNEL_MEM_LOW		0x4	/* crashkernel=X,low is valid */
+
+/**
+ * parse_crashkernel_in_order - Parse all "crashkernel=" configurations in
+ *				priority order until a valid combination is found.
+ * @cmdline:	The bootup command line.
+ * @system_ram:	Total system memory size.
+ * @crash_size:	Save the memory size specified by "crashkernel=X[@offset]" or
+ *		"crashkernel=X,high".
+ * @crash_base:	Save the base address specified by "crashkernel=X@offset"
+ * @low_size:	Save the memory size specified by "crashkernel=X,low"
+ *
+ * Returns the status flag of the parsing result of "crashkernel=", such as
+ * CRASHKERNEL_MEM_NONE, CRASHKERNEL_MEM_HIGH.
+ */
+__maybe_unused
+static int __init parse_crashkernel_in_order(char *cmdline,
+					     unsigned long long system_ram,
+					     unsigned long long *crash_size,
+					     unsigned long long *crash_base,
+					     unsigned long long *low_size)
+{
+	int ret, flag = CRASHKERNEL_MEM_NONE;
+
+	BUG_ON(!crash_size || !crash_base || !low_size);
+
+	/* crashkernel=X[@offset] */
+	ret = parse_crashkernel(cmdline, system_ram, crash_size, crash_base);
+	if (!ret && crash_size > 0)
+		return CRASHKERNEL_MEM_CLASSIC;
+
+#ifdef CONFIG_X86_64
+	/* crashkernel=X,high */
+	ret = parse_crashkernel_high(cmdline, system_ram, crash_size, crash_base);
+	if (ret || crash_size <= 0)
+		return CRASHKERNEL_MEM_NONE;
+
+	flag = CRASHKERNEL_MEM_HIGH;
+
+	/* crashkernel=Y,low */
+	ret = parse_crashkernel_low(cmdline, system_ram, low_size, crash_base);
+	if (!ret)
+		flag |= CRASHKERNEL_MEM_LOW;
+#endif
+
+	return flag;
+}
+
 static void __init reserve_crashkernel(void)
 {
 	unsigned long long crash_size, crash_base, total_mem;
-- 
2.25.1
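The priority order described in this patch can be tried out in user space with a sketch like the one below. It is an illustration under stated assumptions, not the kernel implementation: struct sketch_cmdline is a hypothetical stand-in for what the three parse_crashkernel*() calls would report, and sketch_parse_in_order() is an invented name. The sketch checks the parsed size value (what '*crash_size' would hold) rather than the pointer.

```c
#include <assert.h>
#include <stddef.h>

#define CRASHKERNEL_MEM_NONE    0x0
#define CRASHKERNEL_MEM_CLASSIC 0x1
#define CRASHKERNEL_MEM_HIGH    0x2
#define CRASHKERNEL_MEM_LOW     0x4

/* Hypothetical stand-in for the results of the three parsers (0 = success). */
struct sketch_cmdline {
	int classic_ret;
	unsigned long long classic_size, classic_base;
	int high_ret;
	unsigned long long high_size;
	int low_ret;
	unsigned long long low_size;
};

static int sketch_parse_in_order(const struct sketch_cmdline *c,
				 unsigned long long *crash_size,
				 unsigned long long *crash_base,
				 unsigned long long *low_size)
{
	int flag;

	assert(c && crash_size && crash_base && low_size);
	*crash_size = *crash_base = *low_size = 0;

	/* (1) crashkernel=X[@offset] wins when it yields a non-zero size. */
	if (!c->classic_ret && c->classic_size > 0) {
		*crash_size = c->classic_size;
		*crash_base = c->classic_base;
		return CRASHKERNEL_MEM_CLASSIC;
	}

	/* (2) crashkernel=X,high must be valid before ",low" is considered. */
	if (c->high_ret || c->high_size == 0)
		return CRASHKERNEL_MEM_NONE;
	*crash_size = c->high_size;
	flag = CRASHKERNEL_MEM_HIGH;

	/* crashkernel=Y,low is optional and only adds a flag bit. */
	if (!c->low_ret) {
		*low_size = c->low_size;
		flag |= CRASHKERNEL_MEM_LOW;
	}
	return flag;
}
```

Returning a bit mask rather than a boolean lets the caller distinguish "high only" from "high plus explicit low" with a single test such as `flag & CRASHKERNEL_MEM_LOW`.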
[PATCH v18 02/17] x86/setup: Move xen_pv_domain() check and insert_resource() to setup_arch()
From: Chen Zhou

We will make the function reserve_crashkernel() generic. The
xen_pv_domain() check in reserve_crashkernel() is relevant only to x86,
as is insert_resource() in reserve_crashkernel[_low](). So move the
xen_pv_domain() check and insert_resource() to setup_arch() to keep them
in x86.

Suggested-by: Mike Rapoport
Signed-off-by: Chen Zhou
Co-developed-by: Zhen Lei
Signed-off-by: Zhen Lei
---
 arch/x86/kernel/setup.c | 23 +++
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index ae8f63661363e25..acf2f2eedfe3415 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -434,7 +434,6 @@ static int __init reserve_crashkernel_low(void)
 
 	crashk_low_res.start = low_base;
 	crashk_low_res.end   = low_base + low_size - 1;
-	insert_resource(&iomem_resource, &crashk_low_res);
 #endif
 	return 0;
 }
@@ -458,11 +457,6 @@ static void __init reserve_crashkernel(void)
 		high = true;
 	}
 
-	if (xen_pv_domain()) {
-		pr_info("Ignoring crashkernel for a Xen PV domain\n");
-		return;
-	}
-
 	/* 0 means: find the address automatically */
 	if (!crash_base) {
 		/*
@@ -508,11 +502,6 @@ static void __init reserve_crashkernel(void)
 
 	crashk_res.start = crash_base;
 	crashk_res.end   = crash_base + crash_size - 1;
-	insert_resource(&iomem_resource, &crashk_res);
-}
-#else
-static void __init reserve_crashkernel(void)
-{
 }
 #endif
 
@@ -1120,7 +1109,17 @@ void __init setup_arch(char **cmdline_p)
 	 * Reserve memory for crash kernel after SRAT is parsed so that it
 	 * won't consume hotpluggable memory.
 	 */
-	reserve_crashkernel();
+#ifdef CONFIG_KEXEC_CORE
+	if (xen_pv_domain())
+		pr_info("Ignoring crashkernel for a Xen PV domain\n");
+	else {
+		reserve_crashkernel();
+		if (crashk_res.end > crashk_res.start)
+			insert_resource(&iomem_resource, &crashk_res);
+		if (crashk_low_res.end > crashk_low_res.start)
+			insert_resource(&iomem_resource, &crashk_low_res);
+	}
+#endif
 
 	memblock_find_dma_reserve();
 
-- 
2.25.1
[PATCH v18 00/17] support reserving crashkernel above 4G on arm64 kdump
ave's comments.
- Update chosen schema.

Changes since [v8]
- Reuse DT property "linux,usable-memory-range".
  As suggested by Rob, reuse DT property "linux,usable-memory-range" to
  pass the low memory region.
- Fix kdump broken with ZONE_DMA reintroduced.
- Update chosen schema.

Changes since [v7]
- Move x86 CRASH_ALIGN to 2M.
  As suggested by Dave, and after some testing, move x86 CRASH_ALIGN to 2M.
- Update Documentation/devicetree/bindings/chosen.txt.
  Add corresponding documentation to
  Documentation/devicetree/bindings/chosen.txt, as suggested by Arnd.
- Add Tested-by from Jhon and pk.

Changes since [v6]
- Fix build errors reported by kbuild test robot.

Changes since [v5]
- Move reserve_crashkernel_low() into kernel/crash_core.c.
- Delete crashkernel=X,high.
- Modify crashkernel=X,low.
  If crashkernel=X,low is specified simultaneously, reserve the specified
  size of low memory for crash kdump kernel devices first and then reserve
  memory above 4G. In addition, rename crashk_low_res as
  "Crash kernel (low)" for arm64, and then pass it to the crash dump kernel
  by DT property "linux,low-memory-range".
- Update Documentation/admin-guide/kdump/kdump.rst.

Changes since [v4]
- Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.

Changes since [v3]
- Add memblock_cap_memory_ranges back for multiple ranges.
- Fix some compiling warnings.

Changes since [v2]
- Split patch "arm64: kdump: support reserving crashkernel above 4G" as
  two. Put "move reserve_crashkernel_low() into kexec_core.c" in a
  separate patch.

Changes since [v1]:
- Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
- Remove memblock_cap_memory_ranges() I added in v1 and implement that
  in fdt_enforce_memory_region().
  There are at most two crash kernel regions. For the two crash kernel
  regions case, we cap the memory range
  [min(regs[*].start), max(regs[*].end)] and then remove the memory range
  in the middle.

[1]: http://lists.infradead.org/pipermail/kexec/2020-June/020737.html
[2]: https://github.com/robherring/dt-schema/pull/19
[v1]: https://lkml.org/lkml/2019/4/2/1174
[v2]: https://lkml.org/lkml/2019/4/9/86
[v3]: https://lkml.org/lkml/2019/4/9/306
[v4]: https://lkml.org/lkml/2019/4/15/273
[v5]: https://lkml.org/lkml/2019/5/6/1360
[v6]: https://lkml.org/lkml/2019/8/30/142
[v7]: https://lkml.org/lkml/2019/12/23/411
[v8]: https://lkml.org/lkml/2020/5/21/213
[v9]: https://lkml.org/lkml/2020/6/28/73
[v10]: https://lkml.org/lkml/2020/7/2/1443
[v11]: https://lkml.org/lkml/2020/8/1/150
[v12]: https://lkml.org/lkml/2020/9/7/1037
[v13]: https://lkml.org/lkml/2020/10/31/34
[v14]: https://lkml.org/lkml/2021/1/30/53
[v15]: https://lkml.org/lkml/2021/10/19/1405
[v16]: https://lkml.org/lkml/2021/11/23/435
[v17]: https://lkml.org/lkml/2021/12/10/38

Chen Zhou (9):
  x86/setup: Move CRASH_ALIGN and CRASH_ADDR_{LOW|HIGH}_MAX to asm/kexec.h
  x86/setup: Move xen_pv_domain() check and insert_resource() to setup_arch()
  x86/setup: Eliminate a magic number in reserve_crashkernel()
  x86/setup: Add build option ARCH_WANT_RESERVE_CRASH_KERNEL
  x86/setup: Move reserve_crashkernel[_low]() into crash_core.c
  arm64: kdump: introduce some macros for crash kernel reservation
  arm64: kdump: reimplement crashkernel=X
  of: fdt: Add memory for devices by DT property "linux,usable-memory-range"
  kdump: update Documentation about crashkernel

Zhen Lei (8):
  x86/setup: Adjust the range of codes separated by CONFIG_X86_64
  x86/setup: Add helper parse_crashkernel_in_order()
  x86/setup: Use parse_crashkernel_in_order() to make code logic clear
  x86/setup: Update comments in reserve_crashkernel()
  kdump: Simplify the parameters of __parse_crashkernel()
  kdump: Make parse_crashkernel_{high|low} static
  kdump: Reduce unused parameters of parse_crashkernel_{high|low}
  of: fdt: Aggregate the processing of "linux,usable-memory-range"

 Documentation/admin-guide/kdump/kdump.rst     |  11 +-
 .../admin-guide/kernel-parameters.txt         |  13 +-
 arch/Kconfig                                  |   3 +
 arch/arm64/Kconfig                            |   1 +
 arch/arm64/include/asm/kexec.h                |  10 +
 arch/arm64/kernel/machine_kexec_file.c        |  12 +-
 arch/arm64/kernel/setup.c                     |  13 +-
 arch/arm64/mm/init.c                          |  59 +
 arch/x86/Kconfig                              |   2 +
 arch/x86/include/asm/kexec.h                  |  28 +++
 arch/x86/kernel/setup.c                       | 166 +-
 drivers/of/fdt.c                              |  42 +++-
 include/linux/crash_core.h                    |   4 -
 kernel/crash_core.c                           | 207 --
 14 files changed, 328 insertions(+), 243 deletions(-)

-- 
2.25.1
[PATCH v18 03/17] x86/setup: Adjust the range of codes separated by CONFIG_X86_64
Currently, only X86_64 requires that at least 256M low memory be reserved.
X86_32 does not have this requirement. So move all the code related to
reserve_crashkernel_low() into macro CONFIG_X86_64.

Signed-off-by: Zhen Lei
---
 arch/x86/kernel/setup.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index acf2f2eedfe3415..d9080bfa131a654 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -392,9 +392,9 @@ static void __init memblock_x86_reserve_range_setup_data(void)
 
 #ifdef CONFIG_KEXEC_CORE
 
+#ifdef CONFIG_X86_64
 static int __init reserve_crashkernel_low(void)
 {
-#ifdef CONFIG_X86_64
 	unsigned long long base, low_base = 0, low_size = 0;
 	unsigned long low_mem_limit;
 	int ret;
@@ -434,9 +434,10 @@ static int __init reserve_crashkernel_low(void)
 
 	crashk_low_res.start = low_base;
 	crashk_low_res.end   = low_base + low_size - 1;
-#endif
+
 	return 0;
 }
+#endif
 
 static void __init reserve_crashkernel(void)
 {
@@ -490,10 +491,12 @@ static void __init reserve_crashkernel(void)
 		}
 	}
 
+#ifdef CONFIG_X86_64
 	if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) {
 		memblock_phys_free(crash_base, crash_size);
 		return;
 	}
+#endif
 
 	pr_info("Reserving %ldMB of memory at %ldMB for crashkernel (System RAM: %ldMB)\n",
 		(unsigned long)(crash_size >> 20),
-- 
2.25.1
[PATCH v18 01/17] x86/setup: Move CRASH_ALIGN and CRASH_ADDR_{LOW|HIGH}_MAX to asm/kexec.h
From: Chen Zhou

We want to make function reserve_crashkernel[_low](), which is implemented
by X86, available to other architectures. It references macro CRASH_ALIGN
and will be moved to public crash_core.c. But the defined values of
CRASH_ALIGN may be different in different architectures. So moving the
definition of CRASH_ALIGN to asm/kexec.h is a good choice.

The reason for moving CRASH_ADDR_{LOW|HIGH}_MAX is the same as above.

Signed-off-by: Chen Zhou
Co-developed-by: Zhen Lei
Signed-off-by: Zhen Lei
---
 arch/x86/include/asm/kexec.h | 24 
 arch/x86/kernel/setup.c      | 24 
 2 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 11b7c06e2828c30..a0223a6c0238a15 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -18,6 +18,30 @@
 
 # define KEXEC_CONTROL_CODE_MAX_SIZE	2048
 
+/* 16M alignment for crash kernel regions */
+#define CRASH_ALIGN		SZ_16M
+
+/*
+ * Keep the crash kernel below this limit.
+ *
+ * Earlier 32-bits kernels would limit the kernel to the low 512 MB range
+ * due to mapping restrictions.
+ *
+ * 64-bit kdump kernels need to be restricted to be under 64 TB, which is
+ * the upper limit of system RAM in 4-level paging mode. Since the kdump
+ * jump could be from 5-level paging to 4-level paging, the jump will fail if
+ * the kernel is put above 64 TB, and during the 1st kernel bootup there's
+ * no good way to detect the paging mode of the target kernel which will be
+ * loaded for dumping.
+ */
+#ifdef CONFIG_X86_32
+# define CRASH_ADDR_LOW_MAX	SZ_512M
+# define CRASH_ADDR_HIGH_MAX	SZ_512M
+#else
+# define CRASH_ADDR_LOW_MAX	SZ_4G
+# define CRASH_ADDR_HIGH_MAX	SZ_64T
+#endif
+
 #ifndef __ASSEMBLY__
 
 #include
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 6a190c7f4d71b05..ae8f63661363e25 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -392,30 +392,6 @@ static void __init memblock_x86_reserve_range_setup_data(void)
 
 #ifdef CONFIG_KEXEC_CORE
 
-/* 16M alignment for crash kernel regions */
-#define CRASH_ALIGN		SZ_16M
-
-/*
- * Keep the crash kernel below this limit.
- *
- * Earlier 32-bits kernels would limit the kernel to the low 512 MB range
- * due to mapping restrictions.
- *
- * 64-bit kdump kernels need to be restricted to be under 64 TB, which is
- * the upper limit of system RAM in 4-level paging mode. Since the kdump
- * jump could be from 5-level paging to 4-level paging, the jump will fail if
- * the kernel is put above 64 TB, and during the 1st kernel bootup there's
- * no good way to detect the paging mode of the target kernel which will be
- * loaded for dumping.
- */
-#ifdef CONFIG_X86_32
-# define CRASH_ADDR_LOW_MAX	SZ_512M
-# define CRASH_ADDR_HIGH_MAX	SZ_512M
-#else
-# define CRASH_ADDR_LOW_MAX	SZ_4G
-# define CRASH_ADDR_HIGH_MAX	SZ_64T
-#endif
-
 static int __init reserve_crashkernel_low(void)
 {
 #ifdef CONFIG_X86_64
-- 
2.25.1
[PATCH v17 09/10] of: fdt: Add memory for devices by DT property "linux,usable-memory-range"
From: Chen Zhou

When reserving crashkernel in high memory, some low memory is reserved
for crash dump kernel devices and never mapped by the first kernel.
This memory range is advertised to crash dump kernel via DT property
under /chosen,
	linux,usable-memory-range =

We reused the DT property linux,usable-memory-range and made the low
memory region the second range "BASE2 SIZE2", which keeps compatibility
with existing user-space and older kdump kernels.

The crash dump kernel reads this property at boot time and calls
memblock_add() to add the low memory region after
memblock_cap_memory_range() has been called.

Signed-off-by: Chen Zhou
Signed-off-by: Zhen Lei
Tested-by: Dave Kleikamp
---
 drivers/of/fdt.c | 33 +++--
 1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 37b477a51175359..f7b72fa773250ad 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -967,6 +967,15 @@ static void __init early_init_dt_check_for_elfcorehdr(unsigned long node)
 
 static unsigned long chosen_node_offset = -FDT_ERR_NOTFOUND;
 
+/*
+ * The main usage of linux,usable-memory-range is for crash dump kernel.
+ * Originally, the number of usable-memory regions is one. Now there may
+ * be two regions, low region and high region.
+ * To make compatibility with existing user-space and older kdump, the low
+ * region is always the last range of linux,usable-memory-range if exist.
+ */
+#define MAX_USABLE_RANGES		2
+
 /**
  * early_init_dt_check_for_usable_mem_range - Decode usable memory range
  * location from flat tree
@@ -974,10 +983,9 @@ static unsigned long chosen_node_offset = -FDT_ERR_NOTFOUND;
  */
 static void __init early_init_dt_check_for_usable_mem_range(unsigned long node)
 {
-	const __be32 *prop;
-	int len;
-	phys_addr_t cap_mem_addr;
-	phys_addr_t cap_mem_size;
+	struct memblock_region rgn[MAX_USABLE_RANGES] = {0};
+	const __be32 *prop, *endp;
+	int len, i;
 
 	if ((long)node < 0)
 		return;
@@ -985,16 +993,21 @@ static void __init early_init_dt_check_for_usable_mem_range(unsigned long node)
 	pr_debug("Looking for usable-memory-range property... ");
 
 	prop = of_get_flat_dt_prop(node, "linux,usable-memory-range", &len);
-	if (!prop || (len < (dt_root_addr_cells + dt_root_size_cells)))
+	if (!prop || (len % (dt_root_addr_cells + dt_root_size_cells)))
 		return;
 
-	cap_mem_addr = dt_mem_next_cell(dt_root_addr_cells, &prop);
-	cap_mem_size = dt_mem_next_cell(dt_root_size_cells, &prop);
+	endp = prop + (len / sizeof(__be32));
+	for (i = 0; i < MAX_USABLE_RANGES && prop < endp; i++) {
+		rgn[i].base = dt_mem_next_cell(dt_root_addr_cells, &prop);
+		rgn[i].size = dt_mem_next_cell(dt_root_size_cells, &prop);
 
-	pr_debug("cap_mem_start=%pa cap_mem_size=%pa\n", &cap_mem_addr,
-		 &cap_mem_size);
+		pr_debug("cap_mem_regions[%d]: base=%pa, size=%pa\n",
+			 i, &rgn[i].base, &rgn[i].size);
+	}
 
-	memblock_cap_memory_range(cap_mem_addr, cap_mem_size);
+	memblock_cap_memory_range(rgn[0].base, rgn[0].size);
+	for (i = 1; i < MAX_USABLE_RANGES && rgn[i].size; i++)
+		memblock_add(rgn[i].base, rgn[i].size);
 }
 
 #ifdef CONFIG_SERIAL_EARLYCON
-- 
2.25.1
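As a rough aid for reviewers, the decoding loop in this patch can be exercised in user space with the sketch below. It is not the kernel code: sketch_next_cell() only imitates what dt_mem_next_cell() does with big-endian cells, struct sketch_region and sketch_decode_ranges() are invented names, and the length check here is done in bytes against whole (base, size) pairs, which is an assumption of this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_USABLE_RANGES 2

/* Hypothetical stand-in for the base/size pair of a memblock region. */
struct sketch_region {
	uint64_t base;
	uint64_t size;
};

/* Read 'cells' big-endian 32-bit cells as one number and advance *p. */
static uint64_t sketch_next_cell(int cells, const uint8_t **p)
{
	uint64_t v = 0;

	while (cells-- > 0) {
		v = (v << 32) |
		    ((uint64_t)(*p)[0] << 24) | ((uint64_t)(*p)[1] << 16) |
		    ((uint64_t)(*p)[2] << 8)  | (uint64_t)(*p)[3];
		*p += 4;
	}
	return v;
}

/*
 * Decode up to MAX_USABLE_RANGES (base, size) pairs from a property of
 * 'len' bytes. Returns the number of ranges decoded, or -1 if the length
 * is zero or not a whole number of pairs.
 */
static int sketch_decode_ranges(const uint8_t *prop, size_t len,
				int addr_cells, int size_cells,
				struct sketch_region rgn[MAX_USABLE_RANGES])
{
	size_t pair_bytes;
	const uint8_t *end = prop + len;
	int i;

	assert(prop && rgn && addr_cells > 0 && size_cells > 0);
	pair_bytes = (size_t)(addr_cells + size_cells) * 4;
	if (len == 0 || len % pair_bytes)
		return -1;

	for (i = 0; i < MAX_USABLE_RANGES && prop < end; i++) {
		rgn[i].base = sketch_next_cell(addr_cells, &prop);
		rgn[i].size = sketch_next_cell(size_cells, &prop);
	}
	return i;
}
```

In the patch, the first decoded range is the one handed to memblock_cap_memory_range(), and any further range with a non-zero size is added back with memblock_add().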
[PATCH v17 10/10] kdump: update Documentation about crashkernel
From: Chen Zhou For arm64, the behavior of crashkernel=X has been changed, which tries low allocation in DMA zone and fall back to high allocation if it fails. We can also use "crashkernel=X,high" to select a high region above DMA zone, which also tries to allocate at least 256M low memory in DMA zone automatically and "crashkernel=Y,low" can be used to allocate specified size low memory. So update the Documentation. Signed-off-by: Chen Zhou Signed-off-by: Zhen Lei --- Documentation/admin-guide/kdump/kdump.rst | 11 +-- Documentation/admin-guide/kernel-parameters.txt | 11 +-- 2 files changed, 18 insertions(+), 4 deletions(-) diff --git a/Documentation/admin-guide/kdump/kdump.rst b/Documentation/admin-guide/kdump/kdump.rst index cb30ca3df27c9b2..d4c287044be0c70 100644 --- a/Documentation/admin-guide/kdump/kdump.rst +++ b/Documentation/admin-guide/kdump/kdump.rst @@ -361,8 +361,15 @@ Boot into System Kernel kernel will automatically locate the crash kernel image within the first 512MB of RAM if X is not given. - On arm64, use "crashkernel=Y[@X]". Note that the start address of - the kernel, X if explicitly specified, must be aligned to 2MiB (0x20). + On arm64, use "crashkernel=X" to try low allocation in DMA zone and + fall back to high allocation if it fails. + We can also use "crashkernel=X,high" to select a high region above + DMA zone, which also tries to allocate at least 256M low memory in + DMA zone automatically. + "crashkernel=Y,low" can be used to allocate specified size low memory. + Use "crashkernel=Y@X" if you really have to reserve memory from + specified start address X. Note that the start address of the kernel, + X if explicitly specified, must be aligned to 2MiB (0x20). 
Load the Dump-capture Kernel diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 9725c546a0d46db..91f3a8dc537d404 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -783,6 +783,9 @@ [KNL, X86-64] Select a region under 4G first, and fall back to reserve region above 4G when '@offset' hasn't been specified. + [KNL, ARM64] Try low allocation in DMA zone and fall back + to high allocation if it fails when '@offset' hasn't been + specified. See Documentation/admin-guide/kdump/kdump.rst for further details. crashkernel=range1:size1[,range2:size2,...][@offset] @@ -799,6 +802,8 @@ Otherwise memory region will be allocated below 4G, if available. It will be ignored if crashkernel=X is specified. + [KNL, ARM64] range in high memory. + Allow kernel to allocate physical memory region from top. crashkernel=size[KMG],low [KNL, X86-64] range under 4G. When crashkernel=X,high is passed, kernel could allocate physical memory region @@ -807,13 +812,15 @@ requires at least 64M+32K low memory, also enough extra low memory is needed to make sure DMA buffers for 32-bit devices won't run out. Kernel would try to allocate at - at least 256M below 4G automatically. + least 256M below 4G automatically. This one let user to specify own low range under 4G for second kernel instead. 0: to disable low allocation. It will be ignored when crashkernel=X,high is not used or memory reserved is below 4G. - + [KNL, ARM64] range in low memory. + This one let user to specify a low range in DMA zone for + crash dump kernel. cryptomgr.notests [KNL] Disable crypto self-tests -- 2.25.1 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
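To make the three documented spellings concrete, here is a small user-space sketch that classifies a crashkernel= value as plain, ",high" or ",low" and converts size[KMG] to bytes. It is only an illustration, not the kernel's parser: `enum ck_kind` and `parse_crashkernel_value()` are hypothetical names, and the "Y@X" and range-list forms are deliberately reported as unsupported.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical classification of a crashkernel= value. */
enum ck_kind { CK_PLAIN, CK_HIGH, CK_LOW, CK_INVALID };

/*
 * Parse "size[KMG]", "size[KMG],high" or "size[KMG],low".
 * On success, store the size in bytes in *size and return the kind.
 * Anything else (including "Y@X") yields CK_INVALID in this sketch.
 */
static enum ck_kind parse_crashkernel_value(const char *val,
					    unsigned long long *size)
{
	unsigned long long n;
	unsigned int shift = 0;
	enum ck_kind kind;
	char *end;

	if (!val || !size)
		return CK_INVALID;

	n = strtoull(val, &end, 0);
	if (end == val)
		return CK_INVALID;

	/* Optional binary suffix: K = 2^10, M = 2^20, G = 2^30. */
	switch (*end) {
	case 'K': case 'k': shift = 10; break;
	case 'M': case 'm': shift = 20; break;
	case 'G': case 'g': shift = 30; break;
	default: break;
	}
	if (shift)
		end++;

	if (*end == '\0')
		kind = CK_PLAIN;
	else if (!strcmp(end, ",high"))
		kind = CK_HIGH;
	else if (!strcmp(end, ",low"))
		kind = CK_LOW;
	else
		return CK_INVALID;

	*size = n << shift;
	return kind;
}
```

For example, "1G,high" is classified as the high variant with a size of 2^30 bytes, while "512M@0x80000000" is rejected by this sketch because fixed-offset reservations are outside its scope.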
[PATCH v17 08/10] of: fdt: Aggregate the processing of "linux,usable-memory-range"
Currently, we parse the "linux,usable-memory-range" property in early_init_dt_scan_chosen(), to obtain the specified memory range of the crash kernel. We then reserve the required memory after early_init_dt_scan_memory() has identified all available physical memory. Because the two pieces of code are separated far, the readability and maintainability are reduced. So bring them together. Suggested-by: Rob Herring Signed-off-by: Zhen Lei Tested-by: Dave Kleikamp --- drivers/of/fdt.c | 15 +++ 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c index bdca35284cebd56..37b477a51175359 100644 --- a/drivers/of/fdt.c +++ b/drivers/of/fdt.c @@ -965,8 +965,7 @@ static void __init early_init_dt_check_for_elfcorehdr(unsigned long node) elfcorehdr_addr, elfcorehdr_size); } -static phys_addr_t cap_mem_addr; -static phys_addr_t cap_mem_size; +static unsigned long chosen_node_offset = -FDT_ERR_NOTFOUND; /** * early_init_dt_check_for_usable_mem_range - Decode usable memory range @@ -977,6 +976,11 @@ static void __init early_init_dt_check_for_usable_mem_range(unsigned long node) { const __be32 *prop; int len; + phys_addr_t cap_mem_addr; + phys_addr_t cap_mem_size; + + if ((long)node < 0) + return; pr_debug("Looking for usable-memory-range property... 
"); @@ -989,6 +993,8 @@ static void __init early_init_dt_check_for_usable_mem_range(unsigned long node) pr_debug("cap_mem_start=%pa cap_mem_size=%pa\n", &cap_mem_addr, &cap_mem_size); + + memblock_cap_memory_range(cap_mem_addr, cap_mem_size); } #ifdef CONFIG_SERIAL_EARLYCON @@ -1137,9 +1143,10 @@ int __init early_init_dt_scan_chosen(unsigned long node, const char *uname, (strcmp(uname, "chosen") != 0 && strcmp(uname, "chosen@0") != 0)) return 0; + chosen_node_offset = node; + early_init_dt_check_for_initrd(node); early_init_dt_check_for_elfcorehdr(node); - early_init_dt_check_for_usable_mem_range(node); /* Retrieve command line */ p = of_get_flat_dt_prop(node, "bootargs", &l); @@ -1275,7 +1282,7 @@ void __init early_init_dt_scan_nodes(void) of_scan_flat_dt(early_init_dt_scan_memory, NULL); /* Handle linux,usable-memory-range property */ - memblock_cap_memory_range(cap_mem_addr, cap_mem_size); + early_init_dt_check_for_usable_mem_range(chosen_node_offset); } bool __init early_init_dt_scan(void *params) -- 2.25.1 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH v17 05/10] x86: kdump: move reserve_crashkernel[_low]() into crash_core.c
From: Chen Zhou Make the functions reserve_crashkernel[_low]() as generic. Since reserve_crashkernel[_low]() implementations are quite similar on other architectures as well, we can have more users of this later. So have CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL in arch/Kconfig and select this by X86. Signed-off-by: Chen Zhou Signed-off-by: Zhen Lei Tested-by: John Donnelly Tested-by: Dave Kleikamp --- arch/Kconfig | 3 + arch/x86/Kconfig | 2 + arch/x86/include/asm/elf.h | 3 + arch/x86/include/asm/kexec.h | 28 ++- arch/x86/kernel/setup.c | 143 +--- include/linux/crash_core.h | 3 + include/linux/kexec.h| 2 - kernel/crash_core.c | 156 +++ kernel/kexec_core.c | 17 9 files changed, 194 insertions(+), 163 deletions(-) diff --git a/arch/Kconfig b/arch/Kconfig index d3c4ab249e9c275..7bdb32c41985dc5 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -24,6 +24,9 @@ config KEXEC_ELF config HAVE_IMA_KEXEC bool +config ARCH_WANT_RESERVE_CRASH_KERNEL + bool + config SET_FS bool diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 5c2ccb85f2efb86..bd78ed8193079b9 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -12,6 +12,7 @@ config X86_32 depends on !64BIT # Options that are inherently 32-bit kernel only: select ARCH_WANT_IPC_PARSE_VERSION + select ARCH_WANT_RESERVE_CRASH_KERNEL if KEXEC_CORE select CLKSRC_I8253 select CLONE_BACKWARDS select GENERIC_VDSO_32 @@ -28,6 +29,7 @@ config X86_64 select ARCH_HAS_GIGANTIC_PAGE select ARCH_SUPPORTS_INT128 if CC_HAS_INT128 select ARCH_USE_CMPXCHG_LOCKREF + select ARCH_WANT_RESERVE_CRASH_KERNEL if KEXEC_CORE select HAVE_ARCH_SOFT_DIRTY select MODULES_USE_ELF_RELA select NEED_DMA_MAP_STATE diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h index 29fea180a6658e8..7a6c36cff8331f5 100644 --- a/arch/x86/include/asm/elf.h +++ b/arch/x86/include/asm/elf.h @@ -94,6 +94,9 @@ extern unsigned int vdso32_enabled; #define elf_check_arch(x) elf_check_arch_ia32(x) +/* We can also handle crash dumps from 64 bit kernel. 
*/ +# define vmcore_elf_check_arch_cross(x) ((x)->e_machine == EM_X86_64) + /* SVR4/i386 ABI (pages 3-31, 3-32) says that when the program starts %edx contains a pointer to a function which might be registered using `atexit'. This provides a mean for the dynamic linker to call DT_FINI functions for diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h index 3a22e65262aa70b..3ff38a1353a2b86 100644 --- a/arch/x86/include/asm/kexec.h +++ b/arch/x86/include/asm/kexec.h @@ -21,6 +21,27 @@ /* 16M alignment for crash kernel regions */ #define CRASH_ALIGNSZ_16M +/* + * Keep the crash kernel below this limit. + * + * Earlier 32-bits kernels would limit the kernel to the low 512 MB range + * due to mapping restrictions. + * + * 64-bit kdump kernels need to be restricted to be under 64 TB, which is + * the upper limit of system RAM in 4-level paging mode. Since the kdump + * jump could be from 5-level paging to 4-level paging, the jump will fail if + * the kernel is put above 64 TB, and during the 1st kernel bootup there's + * no good way to detect the paging mode of the target kernel which will be + * loaded for dumping. + */ +#ifdef CONFIG_X86_32 +# define CRASH_ADDR_LOW_MAXSZ_512M +# define CRASH_ADDR_HIGH_MAX SZ_512M +#else +# define CRASH_ADDR_LOW_MAXSZ_4G +# define CRASH_ADDR_HIGH_MAX SZ_64T +#endif + #ifndef __ASSEMBLY__ #include @@ -51,9 +72,6 @@ struct kimage; /* The native architecture */ # define KEXEC_ARCH KEXEC_ARCH_386 - -/* We can also handle crash dumps from 64 bit kernel. 
*/ -# define vmcore_elf_check_arch_cross(x) ((x)->e_machine == EM_X86_64) #else /* Maximum physical address we can use pages from */ # define KEXEC_SOURCE_MEMORY_LIMIT (MAXMEM-1) @@ -195,6 +213,10 @@ typedef void crash_vmclear_fn(void); extern crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss; extern void kdump_nmi_shootdown_cpus(void); +#ifdef CONFIG_KEXEC_CORE +extern void __init reserve_crashkernel(void); +#endif + #endif /* __ASSEMBLY__ */ #endif /* _ASM_X86_KEXEC_H */ diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index 7ae00716a208f82..5519baa7f4b964e 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -39,6 +39,7 @@ #include #include #include +#include #include #include #include @@ -386,147 +387,7 @@ static void __init memblock_x86_reserve_range_setup_data(void) } } -/* - * - Crashkernel reservation -- - */ - -#ifdef CONFIG_KEXEC_CORE - -/* - * Keep the crash kernel below this limit. - * - * Earlier 32-bits kernels would limit the kernel to the low 512 MB range - * due to mapping restrict
[PATCH v17 06/10] arm64: kdump: introduce some macros for crash kernel reservation
From: Chen Zhou

Introduce macro CRASH_ALIGN for alignment, macro CRASH_ADDR_LOW_MAX for upper bound of low crash memory, macro CRASH_ADDR_HIGH_MAX for upper bound of high crash memory, use macros instead.

Besides, keep consistent with x86, use CRASH_ALIGN as the lower bound of crash kernel reservation.

Signed-off-by: Chen Zhou
Signed-off-by: Zhen Lei
Tested-by: John Donnelly
Tested-by: Dave Kleikamp
---
 arch/arm64/include/asm/kexec.h | 6 ++++++
 arch/arm64/mm/init.c           | 4 ++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index 9839bfc163d7147..1b9edc69f0244ca 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -25,6 +25,12 @@
 
 #define KEXEC_ARCH KEXEC_ARCH_AARCH64
 
+/* 2M alignment for crash kernel regions */
+#define CRASH_ALIGN		SZ_2M
+
+#define CRASH_ADDR_LOW_MAX	arm64_dma_phys_limit
+#define CRASH_ADDR_HIGH_MAX	MEMBLOCK_ALLOC_ACCESSIBLE
+
 #ifndef __ASSEMBLY__
 
 /**
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index a8834434af99ae0..be4595dc7459115 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -75,7 +75,7 @@ phys_addr_t arm64_dma_phys_limit __ro_after_init;
 static void __init reserve_crashkernel(void)
 {
 	unsigned long long crash_base, crash_size;
-	unsigned long long crash_max = arm64_dma_phys_limit;
+	unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
 	int ret;
 
 	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
@@ -91,7 +91,7 @@ static void __init reserve_crashkernel(void)
 		crash_max = crash_base + crash_size;
 
 	/* Current arm64 boot protocol requires 2MB alignment */
-	crash_base = memblock_phys_alloc_range(crash_size, SZ_2M,
+	crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
 					       crash_base, crash_max);
 	if (!crash_base) {
 		pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
-- 
2.25.1
[PATCH v17 07/10] arm64: kdump: reimplement crashkernel=X
From: Chen Zhou

There are the following issues in arm64 kdump:
1. We use crashkernel=X to reserve crashkernel memory below 4G, which fails when there is not enough low memory.
2. If the crashkernel is reserved above 4G, the crash dump kernel fails to boot because there is no low memory available for allocation.

To solve these issues, change the behavior of crashkernel=X and introduce crashkernel=X,[high,low]. crashkernel=X tries low allocation in the DMA zone and falls back to high allocation if that fails. We can also use "crashkernel=X,high" to select a region above the DMA zone, which also tries to allocate at least 256M in the DMA zone automatically. "crashkernel=Y,low" can be used to allocate low memory of a specified size.

Another minor change: there may be two regions reserved for the crash dump kernel. To distinguish the low region from the high region without affecting existing kexec-tools, rename the low region to "Crash kernel (low)".

Signed-off-by: Chen Zhou
Signed-off-by: Zhen Lei
Tested-by: John Donnelly
Tested-by: Dave Kleikamp
--- arch/arm64/Kconfig | 1 + arch/arm64/include/asm/kexec.h | 4 ++ arch/arm64/kernel/machine_kexec_file.c | 12 +- arch/arm64/kernel/setup.c | 13 +- arch/arm64/mm/init.c | 59 +- 5 files changed, 38 insertions(+), 51 deletions(-) diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index c4207cf9bb17ffb..4b99efa36da3793 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -95,6 +95,7 @@ config ARM64 select ARCH_WANT_FRAME_POINTERS select ARCH_WANT_HUGE_PMD_SHARE if ARM64_4K_PAGES || (ARM64_16K_PAGES && !ARM64_VA_BITS_36) select ARCH_WANT_LD_ORPHAN_WARN + select ARCH_WANT_RESERVE_CRASH_KERNEL if KEXEC_CORE select ARCH_WANTS_NO_INSTR select ARCH_HAS_UBSAN_SANITIZE_ALL select ARM_AMBA diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h index 1b9edc69f0244ca..3bde0079925d771 100644 --- a/arch/arm64/include/asm/kexec.h +++ b/arch/arm64/include/asm/kexec.h @@ -96,6 +96,10 @@ static inline void 
crash_prepare_suspend(void) {} static inline void crash_post_resume(void) {} #endif +#ifdef CONFIG_KEXEC_CORE +extern void __init reserve_crashkernel(void); +#endif + #if defined(CONFIG_KEXEC_CORE) void cpu_soft_restart(unsigned long el2_switch, unsigned long entry, unsigned long arg0, unsigned long arg1, diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c index 63634b4d72c158f..6f3fa059ca4e816 100644 --- a/arch/arm64/kernel/machine_kexec_file.c +++ b/arch/arm64/kernel/machine_kexec_file.c @@ -65,10 +65,18 @@ static int prepare_elf_headers(void **addr, unsigned long *sz) /* Exclude crashkernel region */ ret = crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end); + if (ret) + goto out; + + if (crashk_low_res.end) { + ret = crash_exclude_mem_range(cmem, crashk_low_res.start, crashk_low_res.end); + if (ret) + goto out; + } - if (!ret) - ret = crash_prepare_elf64_headers(cmem, true, addr, sz); + ret = crash_prepare_elf64_headers(cmem, true, addr, sz); +out: kfree(cmem); return ret; } diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c index be5f85b0a24de69..4bb2e55366be64d 100644 --- a/arch/arm64/kernel/setup.c +++ b/arch/arm64/kernel/setup.c @@ -248,7 +248,18 @@ static void __init request_standard_resources(void) kernel_data.end <= res->end) request_resource(res, &kernel_data); #ifdef CONFIG_KEXEC_CORE - /* Userspace will find "Crash kernel" region in /proc/iomem. */ + /* +* Userspace will find "Crash kernel" or "Crash kernel (low)" +* region in /proc/iomem. +* In order to distinct from the high region and make no effect +* to the use of existing kexec-tools, rename the low region as +* "Crash kernel (low)". 
+*/ + if (crashk_low_res.end && crashk_low_res.start >= res->start && + crashk_low_res.end <= res->end) { + crashk_low_res.name = "Crash kernel (low)"; + request_resource(res, &crashk_low_res); + } if (crashk_res.end && crashk_res.start >= res->start && crashk_res.end <= res->end) request_resource(res, &crashk_res); diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index be4595dc7459115..85c83e4eff2b6c4 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c
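As an illustration of the low-first, high-fallback policy described in this patch, here is a toy user-space model. It is a sketch under stated assumptions, not the memblock-based implementation: `struct free_range`, `alloc_range()` and `reserve_crash()` are hypothetical, the allocator is a bottom-up first-fit search over a static list of free ranges (memblock behaves differently), and a return value of 0 from `alloc_range()` means failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical free physical range; 'end' is exclusive. */
struct free_range {
	uint64_t start;
	uint64_t end;
};

/* Result of the toy reservation; low_base is 0 when no low region is needed. */
struct crash_reservation {
	uint64_t base;
	uint64_t low_base;
};

/*
 * Find the lowest 'align'-aligned base such that [base, base + size) lies
 * inside one free range and inside [min, max). Returns 0 on failure.
 */
static uint64_t alloc_range(const struct free_range *fr, size_t n,
			    uint64_t size, uint64_t align,
			    uint64_t min, uint64_t max)
{
	for (size_t i = 0; i < n; i++) {
		uint64_t lo = fr[i].start > min ? fr[i].start : min;
		uint64_t hi = fr[i].end < max ? fr[i].end : max;
		uint64_t base = (lo + align - 1) / align * align;

		if (base < hi && hi - base >= size)
			return base;
	}
	return 0;
}

/*
 * Try the low window first; if that fails, search up to high_max and,
 * when the result lies at or above low_max, also reserve a separate
 * low region of low_size bytes. Returns 0 on success, -1 on failure.
 */
static int reserve_crash(const struct free_range *fr, size_t n,
			 uint64_t size, uint64_t low_size, uint64_t align,
			 uint64_t low_max, uint64_t high_max,
			 struct crash_reservation *out)
{
	out->low_base = 0;

	out->base = alloc_range(fr, n, size, align, align, low_max);
	if (out->base)
		return 0;

	out->base = alloc_range(fr, n, size, align, align, high_max);
	if (!out->base)
		return -1;

	if (out->base >= low_max) {
		out->low_base = alloc_range(fr, n, low_size, align,
					    align, low_max);
		if (!out->low_base)
			return -1;
	}
	return 0;
}
```

With 512M free below 4G and 4G free above it, a 1G request in this model ends up above 4G with an extra 256M low region, while a 256M request is satisfied entirely from low memory.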
[PATCH v17 01/10] x86: kdump: replace the hard-coded alignment with macro CRASH_ALIGN
From: Chen Zhou

Move CRASH_ALIGN to header asm/kexec.h for later use.

Suggested-by: Dave Young
Suggested-by: Baoquan He
Signed-off-by: Chen Zhou
Signed-off-by: Zhen Lei
Tested-by: John Donnelly
Tested-by: Dave Kleikamp
---
 arch/x86/include/asm/kexec.h | 3 +++
 arch/x86/kernel/setup.c      | 3 ---
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 11b7c06e2828c30..3a22e65262aa70b 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -18,6 +18,9 @@
 
 # define KEXEC_CONTROL_CODE_MAX_SIZE	2048
 
+/* 16M alignment for crash kernel regions */
+#define CRASH_ALIGN		SZ_16M
+
 #ifndef __ASSEMBLY__
 
 #include
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 6a190c7f4d71b05..5cc60996eac56d6 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -392,9 +392,6 @@ static void __init memblock_x86_reserve_range_setup_data(void)
 
 #ifdef CONFIG_KEXEC_CORE
 
-/* 16M alignment for crash kernel regions */
-#define CRASH_ALIGN		SZ_16M
-
 /*
  * Keep the crash kernel below this limit.
  *
-- 
2.25.1
[PATCH v17 03/10] x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions reserve_crashkernel()
From: Chen Zhou

To make the functions reserve_crashkernel() as generic, replace some hard-coded numbers with macro CRASH_ADDR_LOW_MAX.

Signed-off-by: Chen Zhou
Signed-off-by: Zhen Lei
Tested-by: John Donnelly
Tested-by: Dave Kleikamp
Acked-by: Baoquan He
---
 arch/x86/kernel/setup.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 6424ee4f23da2cf..bb2a0973b98059e 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -489,8 +489,9 @@ static void __init reserve_crashkernel(void)
 	if (!crash_base) {
 		/*
 		 * Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
-		 * crashkernel=x,high reserves memory over 4G, also allocates
-		 * 256M extra low memory for DMA buffers and swiotlb.
+		 * crashkernel=x,high reserves memory over CRASH_ADDR_LOW_MAX,
+		 * also allocates 256M extra low memory for DMA buffers
+		 * and swiotlb.
 		 * But the extra memory is not required for all machines.
 		 * So try low memory first and fall back to high memory
 		 * unless "crashkernel=size[KMG],high" is specified.
@@ -518,7 +519,7 @@ static void __init reserve_crashkernel(void)
 		}
 	}
 
-	if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) {
+	if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) {
 		memblock_phys_free(crash_base, crash_size);
 		return;
 	}
-- 
2.25.1
[PATCH v17 00/10] support reserving crashkernel above 4G on arm64 kdump
ve_crashkernel_low() into kernel/crash_core.c.
- Delete crashkernel=X,high.
- Modify crashkernel=X,low.
  If crashkernel=X,low is specified simultaneously, reserve the specified size of low memory for crash kdump kernel devices first and then reserve memory above 4G. In addition, rename crashk_low_res as "Crash kernel (low)" for arm64, and then pass it to the crash dump kernel by DT property "linux,low-memory-range".
- Update Documentation/admin-guide/kdump/kdump.rst.

Changes since [v4]
- Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.

Changes since [v3]
- Add memblock_cap_memory_ranges back for multiple ranges.
- Fix some compiling warnings.

Changes since [v2]
- Split patch "arm64: kdump: support reserving crashkernel above 4G" as two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate patch.

Changes since [v1]:
- Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
- Remove memblock_cap_memory_ranges() I added in v1 and implement that in fdt_enforce_memory_region().
  There are at most two crash kernel regions. For the two-region case, we cap the memory range [min(regs[*].start), max(regs[*].end)] and then remove the memory range in the middle.

[1]: http://lists.infradead.org/pipermail/kexec/2020-June/020737.html
[2]: https://github.com/robherring/dt-schema/pull/19
[v1]: https://lkml.org/lkml/2019/4/2/1174
[v2]: https://lkml.org/lkml/2019/4/9/86
[v3]: https://lkml.org/lkml/2019/4/9/306
[v4]: https://lkml.org/lkml/2019/4/15/273
[v5]: https://lkml.org/lkml/2019/5/6/1360
[v6]: https://lkml.org/lkml/2019/8/30/142
[v7]: https://lkml.org/lkml/2019/12/23/411
[v8]: https://lkml.org/lkml/2020/5/21/213
[v9]: https://lkml.org/lkml/2020/6/28/73
[v10]: https://lkml.org/lkml/2020/7/2/1443
[v11]: https://lkml.org/lkml/2020/8/1/150
[v12]: https://lkml.org/lkml/2020/9/7/1037
[v13]: https://lkml.org/lkml/2020/10/31/34
[v14]: https://lkml.org/lkml/2021/1/30/53
[v15]: https://lkml.org/lkml/2021/10/19/1405
[v16]: https://lkml.org/lkml/2021/11/23/435

Chen Zhou (9):
  x86: kdump: replace the hard-coded alignment with macro CRASH_ALIGN
  x86: kdump: make the lower bound of crash kernel reservation consistent
  x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions reserve_crashkernel()
  x86: kdump: move xen_pv_domain() check and insert_resource() to setup_arch()
  x86: kdump: move reserve_crashkernel[_low]() into crash_core.c
  arm64: kdump: introduce some macros for crash kernel reservation
  arm64: kdump: reimplement crashkernel=X
  of: fdt: Add memory for devices by DT property "linux,usable-memory-range"
  kdump: update Documentation about crashkernel

Zhen Lei (1):
  of: fdt: Aggregate the processing of "linux,usable-memory-range"

 Documentation/admin-guide/kdump/kdump.rst |  11 +-
 .../admin-guide/kernel-parameters.txt     |  11 +-
 arch/Kconfig                              |   3 +
 arch/arm64/Kconfig                        |   1 +
 arch/arm64/include/asm/kexec.h            |  10 ++
 arch/arm64/kernel/machine_kexec_file.c    |  12 +-
 arch/arm64/kernel/setup.c                 |  13 +-
 arch/arm64/mm/init.c                      |  59 ++-
 arch/x86/Kconfig                          |   2 +
 arch/x86/include/asm/elf.h                |   3 +
 arch/x86/include/asm/kexec.h              |  31 +++-
 arch/x86/kernel/setup.c                   | 163 ++
 drivers/of/fdt.c                          |  42 +++--
 include/linux/crash_core.h                |   3 +
 include/linux/kexec.h                     |   2 -
 kernel/crash_core.c                       | 156 +
 kernel/kexec_core.c                       |  17 --
 17 files changed, 301 insertions(+), 238 deletions(-)

-- 
2.25.1
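The v1 changelog entry about capping to [min(regs[*].start), max(regs[*].end)] and then removing the middle can be sketched as follows. This is only an illustration of that arithmetic: `struct region` and `cap_and_hole()` are hypothetical names, ranges use an exclusive end, and the function merely computes the cap range and the hole rather than touching any memory map.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical region with an exclusive end address. */
struct region {
	uint64_t start;
	uint64_t end;
};

/*
 * Given one or two crash kernel regions, compute the covering range
 * [min start, max end) and the hole between the two regions, if any.
 * Returns 1 when a non-empty hole exists, 0 when there is none, and
 * -1 when n is neither 1 nor 2.
 */
static int cap_and_hole(const struct region *regs, int n,
			struct region *cap, struct region *hole)
{
	const struct region *lo, *hi;

	if (n == 1) {
		*cap = regs[0];
		hole->start = hole->end = 0;
		return 0;
	}
	if (n != 2)
		return -1;

	/* Order the two regions by start address. */
	lo = regs[0].start <= regs[1].start ? &regs[0] : &regs[1];
	hi = lo == &regs[0] ? &regs[1] : &regs[0];

	cap->start = lo->start;
	cap->end = hi->end > lo->end ? hi->end : lo->end;

	if (lo->end < hi->start) {
		/* Everything between the two regions must be removed. */
		hole->start = lo->end;
		hole->end = hi->start;
		return 1;
	}
	hole->start = hole->end = 0;
	return 0;
}
```

For a high region at [4G, 5G) and a low region at [1G, 1G+256M), the cap range is [1G, 5G) and the hole to remove is [1G+256M, 4G).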
[PATCH v17 02/10] x86: kdump: make the lower bound of crash kernel reservation consistent
From: Chen Zhou

The lower bounds of crash kernel reservation and crash kernel low reservation are different, use the consistent value CRASH_ALIGN.

Suggested-by: Dave Young
Signed-off-by: Chen Zhou
Signed-off-by: Zhen Lei
Tested-by: John Donnelly
Tested-by: Dave Kleikamp
---
 arch/x86/kernel/setup.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 5cc60996eac56d6..6424ee4f23da2cf 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -441,7 +441,8 @@ static int __init reserve_crashkernel_low(void)
 		return 0;
 	}
 
-	low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, CRASH_ADDR_LOW_MAX);
+	low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, CRASH_ALIGN,
+			CRASH_ADDR_LOW_MAX);
 	if (!low_base) {
 		pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
 		       (unsigned long)(low_size >> 20));
-- 
2.25.1
[PATCH v17 04/10] x86: kdump: move xen_pv_domain() check and insert_resource() to setup_arch()
From: Chen Zhou We will make the functions reserve_crashkernel() as generic, the xen_pv_domain() check in reserve_crashkernel() is relevant only to x86, the same as insert_resource() in reserve_crashkernel[_low](). So move xen_pv_domain() check and insert_resource() to setup_arch() to keep them in x86. Suggested-by: Mike Rapoport Signed-off-by: Chen Zhou Signed-off-by: Zhen Lei Tested-by: John Donnelly Tested-by: Dave Kleikamp Acked-by: Baoquan He --- arch/x86/kernel/setup.c | 19 +++ 1 file changed, 11 insertions(+), 8 deletions(-) diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index bb2a0973b98059e..7ae00716a208f82 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -456,7 +456,6 @@ static int __init reserve_crashkernel_low(void) crashk_low_res.start = low_base; crashk_low_res.end = low_base + low_size - 1; - insert_resource(&iomem_resource, &crashk_low_res); #endif return 0; } @@ -480,11 +479,6 @@ static void __init reserve_crashkernel(void) high = true; } - if (xen_pv_domain()) { - pr_info("Ignoring crashkernel for a Xen PV domain\n"); - return; - } - /* 0 means: find the address automatically */ if (!crash_base) { /* @@ -531,7 +525,6 @@ static void __init reserve_crashkernel(void) crashk_res.start = crash_base; crashk_res.end = crash_base + crash_size - 1; - insert_resource(&iomem_resource, &crashk_res); } #else static void __init reserve_crashkernel(void) @@ -1143,7 +1136,17 @@ void __init setup_arch(char **cmdline_p) * Reserve memory for crash kernel after SRAT is parsed so that it * won't consume hotpluggable memory. 
*/ - reserve_crashkernel(); + if (xen_pv_domain()) + pr_info("Ignoring crashkernel for a Xen PV domain\n"); + else { + reserve_crashkernel(); +#ifdef CONFIG_KEXEC_CORE + if (crashk_res.end > crashk_res.start) + insert_resource(&iomem_resource, &crashk_res); + if (crashk_low_res.end > crashk_low_res.start) + insert_resource(&iomem_resource, &crashk_low_res); +#endif + } memblock_find_dma_reserve(); -- 2.25.1 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH v16 05/11] x86: kdump: move reserve_crashkernel[_low]() into crash_core.c
From: Chen Zhou Make the functions reserve_crashkernel[_low]() as generic. Arm64 will use these to reimplement crashkernel=X. Signed-off-by: Chen Zhou Tested-by: John Donnelly --- arch/x86/include/asm/elf.h | 3 + arch/x86/include/asm/kexec.h | 28 +- arch/x86/kernel/setup.c | 143 +-- include/linux/crash_core.h | 3 + include/linux/kexec.h| 2 - kernel/crash_core.c | 159 +++ kernel/kexec_core.c | 17 7 files changed, 192 insertions(+), 163 deletions(-) diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h index 29fea180a6658e8..7a6c36cff8331f5 100644 --- a/arch/x86/include/asm/elf.h +++ b/arch/x86/include/asm/elf.h @@ -94,6 +94,9 @@ extern unsigned int vdso32_enabled; #define elf_check_arch(x) elf_check_arch_ia32(x) +/* We can also handle crash dumps from 64 bit kernel. */ +# define vmcore_elf_check_arch_cross(x) ((x)->e_machine == EM_X86_64) + /* SVR4/i386 ABI (pages 3-31, 3-32) says that when the program starts %edx contains a pointer to a function which might be registered using `atexit'. This provides a mean for the dynamic linker to call DT_FINI functions for diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h index 3a22e65262aa70b..3ff38a1353a2b86 100644 --- a/arch/x86/include/asm/kexec.h +++ b/arch/x86/include/asm/kexec.h @@ -21,6 +21,27 @@ /* 16M alignment for crash kernel regions */ #define CRASH_ALIGNSZ_16M +/* + * Keep the crash kernel below this limit. + * + * Earlier 32-bits kernels would limit the kernel to the low 512 MB range + * due to mapping restrictions. + * + * 64-bit kdump kernels need to be restricted to be under 64 TB, which is + * the upper limit of system RAM in 4-level paging mode. Since the kdump + * jump could be from 5-level paging to 4-level paging, the jump will fail if + * the kernel is put above 64 TB, and during the 1st kernel bootup there's + * no good way to detect the paging mode of the target kernel which will be + * loaded for dumping. 
+ */ +#ifdef CONFIG_X86_32 +# define CRASH_ADDR_LOW_MAXSZ_512M +# define CRASH_ADDR_HIGH_MAX SZ_512M +#else +# define CRASH_ADDR_LOW_MAXSZ_4G +# define CRASH_ADDR_HIGH_MAX SZ_64T +#endif + #ifndef __ASSEMBLY__ #include @@ -51,9 +72,6 @@ struct kimage; /* The native architecture */ # define KEXEC_ARCH KEXEC_ARCH_386 - -/* We can also handle crash dumps from 64 bit kernel. */ -# define vmcore_elf_check_arch_cross(x) ((x)->e_machine == EM_X86_64) #else /* Maximum physical address we can use pages from */ # define KEXEC_SOURCE_MEMORY_LIMIT (MAXMEM-1) @@ -195,6 +213,10 @@ typedef void crash_vmclear_fn(void); extern crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss; extern void kdump_nmi_shootdown_cpus(void); +#ifdef CONFIG_KEXEC_CORE +extern void __init reserve_crashkernel(void); +#endif + #endif /* __ASSEMBLY__ */ #endif /* _ASM_X86_KEXEC_H */ diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index 63ed089f9778fc3..4b5c75eb88b9969 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -39,6 +39,7 @@ #include #include #include +#include #include #include #include @@ -386,147 +387,7 @@ static void __init memblock_x86_reserve_range_setup_data(void) } } -/* - * - Crashkernel reservation -- - */ - -#ifdef CONFIG_KEXEC_CORE - -/* - * Keep the crash kernel below this limit. - * - * Earlier 32-bits kernels would limit the kernel to the low 512 MB range - * due to mapping restrictions. - * - * 64-bit kdump kernels need to be restricted to be under 64 TB, which is - * the upper limit of system RAM in 4-level paging mode. Since the kdump - * jump could be from 5-level paging to 4-level paging, the jump will fail if - * the kernel is put above 64 TB, and during the 1st kernel bootup there's - * no good way to detect the paging mode of the target kernel which will be - * loaded for dumping. 
- */ -#ifdef CONFIG_X86_32 -# define CRASH_ADDR_LOW_MAXSZ_512M -# define CRASH_ADDR_HIGH_MAX SZ_512M -#else -# define CRASH_ADDR_LOW_MAXSZ_4G -# define CRASH_ADDR_HIGH_MAX SZ_64T -#endif - -static int __init reserve_crashkernel_low(void) -{ -#ifdef CONFIG_X86_64 - unsigned long long base, low_base = 0, low_size = 0; - unsigned long low_mem_limit; - int ret; - - low_mem_limit = min(memblock_phys_mem_size(), CRASH_ADDR_LOW_MAX); - - /* crashkernel=Y,low */ - ret = parse_crashkernel_low(boot_command_line, low_mem_limit, &low_size, &base); - if (ret) { - /* -* two parts from kernel/dma/swiotlb.c: -* -swiotlb size: user-specified with swiotlb= or default. -* -* -swiotlb overflow buffer: now hardcoded to 32k. We round it -* to 8M for other buffers that may need to stay low too. Also -* make sure w
[PATCH v16 06/11] arm64: kdump: introduce some macros for crash kernel reservation
From: Chen Zhou

Introduce macro CRASH_ALIGN for alignment, macro CRASH_ADDR_LOW_MAX for upper bound of low crash memory, macro CRASH_ADDR_HIGH_MAX for upper bound of high crash memory, use macros instead.

Besides, keep consistent with x86, use CRASH_ALIGN as the lower bound of crash kernel reservation.

Signed-off-by: Chen Zhou
Tested-by: John Donnelly
---
 arch/arm64/include/asm/kexec.h | 6 ++++++
 arch/arm64/mm/init.c           | 4 ++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index 9839bfc163d7147..1b9edc69f0244ca 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -25,6 +25,12 @@
 
 #define KEXEC_ARCH KEXEC_ARCH_AARCH64
 
+/* 2M alignment for crash kernel regions */
+#define CRASH_ALIGN		SZ_2M
+
+#define CRASH_ADDR_LOW_MAX	arm64_dma_phys_limit
+#define CRASH_ADDR_HIGH_MAX	MEMBLOCK_ALLOC_ACCESSIBLE
+
 #ifndef __ASSEMBLY__
 
 /**
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index a8834434af99ae0..be4595dc7459115 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -75,7 +75,7 @@ phys_addr_t arm64_dma_phys_limit __ro_after_init;
 static void __init reserve_crashkernel(void)
 {
 	unsigned long long crash_base, crash_size;
-	unsigned long long crash_max = arm64_dma_phys_limit;
+	unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
 	int ret;
 
 	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
@@ -91,7 +91,7 @@ static void __init reserve_crashkernel(void)
 		crash_max = crash_base + crash_size;
 
 	/* Current arm64 boot protocol requires 2MB alignment */
-	crash_base = memblock_phys_alloc_range(crash_size, SZ_2M,
+	crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
 					       crash_base, crash_max);
 	if (!crash_base) {
 		pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
-- 
2.25.1
[PATCH v16 09/11] of: fdt: Aggregate the processing of "linux,usable-memory-range"
Currently, we parse the "linux,usable-memory-range" property in
early_init_dt_scan_chosen() to obtain the specified memory range of the
crash kernel. We then reserve the required memory after
early_init_dt_scan_memory() has identified all available physical memory.
Because the two pieces of code are far apart, readability and
maintainability suffer. So bring them together.

Suggested-by: Rob Herring
Signed-off-by: Zhen Lei
---
 drivers/of/fdt.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index bdca35284cebd56..37b477a51175359 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -965,8 +965,7 @@ static void __init early_init_dt_check_for_elfcorehdr(unsigned long node)
 		 elfcorehdr_addr, elfcorehdr_size);
 }
 
-static phys_addr_t cap_mem_addr;
-static phys_addr_t cap_mem_size;
+static unsigned long chosen_node_offset = -FDT_ERR_NOTFOUND;
 
 /**
  * early_init_dt_check_for_usable_mem_range - Decode usable memory range
@@ -977,6 +976,11 @@ static void __init early_init_dt_check_for_usable_mem_range(unsigned long node)
 {
 	const __be32 *prop;
 	int len;
+	phys_addr_t cap_mem_addr;
+	phys_addr_t cap_mem_size;
+
+	if ((long)node < 0)
+		return;
 
 	pr_debug("Looking for usable-memory-range property... ");
 
@@ -989,6 +993,8 @@ static void __init early_init_dt_check_for_usable_mem_range(unsigned long node)
 
 	pr_debug("cap_mem_start=%pa cap_mem_size=%pa\n", &cap_mem_addr,
 		 &cap_mem_size);
+
+	memblock_cap_memory_range(cap_mem_addr, cap_mem_size);
 }
 
 #ifdef CONFIG_SERIAL_EARLYCON
@@ -1137,9 +1143,10 @@ int __init early_init_dt_scan_chosen(unsigned long node, const char *uname,
 	    (strcmp(uname, "chosen") != 0 && strcmp(uname, "chosen@0") != 0))
 		return 0;
 
+	chosen_node_offset = node;
+
 	early_init_dt_check_for_initrd(node);
 	early_init_dt_check_for_elfcorehdr(node);
-	early_init_dt_check_for_usable_mem_range(node);
 
 	/* Retrieve command line */
 	p = of_get_flat_dt_prop(node, "bootargs", &l);
@@ -1275,7 +1282,7 @@ void __init early_init_dt_scan_nodes(void)
 	of_scan_flat_dt(early_init_dt_scan_memory, NULL);
 
 	/* Handle linux,usable-memory-range property */
-	memblock_cap_memory_range(cap_mem_addr, cap_mem_size);
+	early_init_dt_check_for_usable_mem_range(chosen_node_offset);
 }
 
 bool __init early_init_dt_scan(void *params)
--
2.25.1
[PATCH v16 10/11] of: fdt: Add memory for devices by DT property "linux,usable-memory-range"
From: Chen Zhou

When reserving crashkernel in high memory, some low memory is reserved
for crash dump kernel devices and never mapped by the first kernel.
This memory range is advertised to the crash dump kernel via a DT property
under /chosen,
	linux,usable-memory-range = <BASE1 SIZE1 [BASE2 SIZE2]>

We reuse the DT property linux,usable-memory-range and make the low
memory region the second range "BASE2 SIZE2", which keeps compatibility
with existing user-space and older kdump kernels.

The crash dump kernel reads this property at boot time and calls
memblock_add() to add the low memory region after
memblock_cap_memory_range() has been called.

Signed-off-by: Chen Zhou
Signed-off-by: Zhen Lei
---
 drivers/of/fdt.c | 36 ++++++++++++++++++++++++++----------
 1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 37b477a51175359..1ea2a0b1657e3a9 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -967,6 +967,15 @@ static void __init early_init_dt_check_for_elfcorehdr(unsigned long node)
 
 static unsigned long chosen_node_offset = -FDT_ERR_NOTFOUND;
 
+/*
+ * The main usage of linux,usable-memory-range is for crash dump kernel.
+ * Originally, the number of usable-memory regions is one. Now there may
+ * be two regions, low region and high region.
+ * To make compatibility with existing user-space and older kdump, the low
+ * region is always the last range of linux,usable-memory-range if exist.
+ */
+#define MAX_USABLE_RANGES		2
+
 /**
  * early_init_dt_check_for_usable_mem_range - Decode usable memory range
  * location from flat tree
@@ -974,10 +983,9 @@ static unsigned long chosen_node_offset = -FDT_ERR_NOTFOUND;
  */
 static void __init early_init_dt_check_for_usable_mem_range(unsigned long node)
 {
-	const __be32 *prop;
-	int len;
-	phys_addr_t cap_mem_addr;
-	phys_addr_t cap_mem_size;
+	struct memblock_region rgn[MAX_USABLE_RANGES] = {0};
+	const __be32 *prop, *endp;
+	int len, i = 0;
 
 	if ((long)node < 0)
 		return;
@@ -985,16 +993,24 @@ static void __init early_init_dt_check_for_usable_mem_range(unsigned long node)
 	pr_debug("Looking for usable-memory-range property... ");
 
 	prop = of_get_flat_dt_prop(node, "linux,usable-memory-range", &len);
-	if (!prop || (len < (dt_root_addr_cells + dt_root_size_cells)))
+	if (!prop)
 		return;
 
-	cap_mem_addr = dt_mem_next_cell(dt_root_addr_cells, &prop);
-	cap_mem_size = dt_mem_next_cell(dt_root_size_cells, &prop);
+	endp = prop + (len / sizeof(__be32));
+	while ((endp - prop) >= (dt_root_addr_cells + dt_root_size_cells)) {
+		rgn[i].base = dt_mem_next_cell(dt_root_addr_cells, &prop);
+		rgn[i].size = dt_mem_next_cell(dt_root_size_cells, &prop);
+
+		pr_debug("cap_mem_regions[%d]: base=%pa, size=%pa\n",
+			 i, &rgn[i].base, &rgn[i].size);
 
-	pr_debug("cap_mem_start=%pa cap_mem_size=%pa\n", &cap_mem_addr,
-		 &cap_mem_size);
+		if (++i >= MAX_USABLE_RANGES)
+			break;
+	}
 
-	memblock_cap_memory_range(cap_mem_addr, cap_mem_size);
+	memblock_cap_memory_range(rgn[0].base, rgn[0].size);
+	for (i = 1; i < MAX_USABLE_RANGES && rgn[i].size; i++)
+		memblock_add(rgn[i].base, rgn[i].size);
 }
 
 #ifdef CONFIG_SERIAL_EARLYCON
--
2.25.1
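For readers who want to experiment with the range decoding outside the kernel, here is a minimal user-space sketch of the same idea: walk a big-endian cell array and collect at most two (base, size) pairs, ignoring a trailing partial pair. It is an illustration only, not the kernel code. The names `mem_range`, `be32_decode`, `next_cells` and `parse_usable_ranges` are made up for this example, and the property is taken as a raw byte buffer rather than through `of_get_flat_dt_prop()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_USABLE_RANGES 2

/* Hypothetical stand-in for the (base, size) part of a memblock region. */
struct mem_range {
	uint64_t base;
	uint64_t size;
};

/* Decode one big-endian 32-bit cell from raw bytes. */
static uint32_t be32_decode(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Combine `cells` consecutive cells into one value and advance *pp. */
static uint64_t next_cells(int cells, const uint8_t **pp)
{
	uint64_t v = 0;

	while (cells-- > 0) {
		v = (v << 32) | be32_decode(*pp);
		*pp += 4;
	}
	return v;
}

/*
 * Parse up to MAX_USABLE_RANGES (base, size) pairs from a property of
 * `len` bytes. Returns the number of complete pairs decoded; a trailing
 * partial pair is ignored.
 */
static int parse_usable_ranges(const uint8_t *prop, size_t len,
			       int addr_cells, int size_cells,
			       struct mem_range out[MAX_USABLE_RANGES])
{
	const uint8_t *end = prop + len;
	size_t pair_bytes = (size_t)(addr_cells + size_cells) * 4;
	int n = 0;

	while (n < MAX_USABLE_RANGES && (size_t)(end - prop) >= pair_bytes) {
		out[n].base = next_cells(addr_cells, &prop);
		out[n].size = next_cells(size_cells, &prop);
		n++;
	}
	return n;
}
```

In this sketch the caller would treat `out[0]` as the range to cap memory to and `out[1]`, if present, as the extra low region to add back, mirroring the order described in the commit message.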
[PATCH v16 07/11] arm64: kdump: reimplement crashkernel=X
From: Chen Zhou There are following issues in arm64 kdump: 1. We use crashkernel=X to reserve crashkernel below 4G, which will fail when there is no enough low memory. 2. If reserving crashkernel above 4G, in this case, crash dump kernel will boot failure because there is no low memory available for allocation. To solve these issues, change the behavior of crashkernel=X and introduce crashkernel=X,[high,low]. crashkernel=X tries low allocation in DMA zone, and fall back to high allocation if it fails. We can also use "crashkernel=X,high" to select a region above DMA zone, which also tries to allocate at least 256M in DMA zone automatically. "crashkernel=Y,low" can be used to allocate specified size low memory. Another minor change, there may be two regions reserved for crash dump kernel, in order to distinct from the high region and make no effect to the use of existing kexec-tools, rename the low region as "Crash kernel (low)". Signed-off-by: Chen Zhou Tested-by: John Donnelly --- arch/arm64/include/asm/kexec.h | 4 ++ arch/arm64/kernel/machine_kexec_file.c | 12 +- arch/arm64/kernel/setup.c | 13 +- arch/arm64/mm/init.c | 59 +- kernel/crash_core.c| 6 +-- 5 files changed, 40 insertions(+), 54 deletions(-) diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h index 1b9edc69f0244ca..3bde0079925d771 100644 --- a/arch/arm64/include/asm/kexec.h +++ b/arch/arm64/include/asm/kexec.h @@ -96,6 +96,10 @@ static inline void crash_prepare_suspend(void) {} static inline void crash_post_resume(void) {} #endif +#ifdef CONFIG_KEXEC_CORE +extern void __init reserve_crashkernel(void); +#endif + #if defined(CONFIG_KEXEC_CORE) void cpu_soft_restart(unsigned long el2_switch, unsigned long entry, unsigned long arg0, unsigned long arg1, diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c index 63634b4d72c158f..6f3fa059ca4e816 100644 --- a/arch/arm64/kernel/machine_kexec_file.c +++ b/arch/arm64/kernel/machine_kexec_file.c @@ 
-65,10 +65,18 @@ static int prepare_elf_headers(void **addr, unsigned long *sz) /* Exclude crashkernel region */ ret = crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end); + if (ret) + goto out; + + if (crashk_low_res.end) { + ret = crash_exclude_mem_range(cmem, crashk_low_res.start, crashk_low_res.end); + if (ret) + goto out; + } - if (!ret) - ret = crash_prepare_elf64_headers(cmem, true, addr, sz); + ret = crash_prepare_elf64_headers(cmem, true, addr, sz); +out: kfree(cmem); return ret; } diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c index be5f85b0a24de69..4bb2e55366be64d 100644 --- a/arch/arm64/kernel/setup.c +++ b/arch/arm64/kernel/setup.c @@ -248,7 +248,18 @@ static void __init request_standard_resources(void) kernel_data.end <= res->end) request_resource(res, &kernel_data); #ifdef CONFIG_KEXEC_CORE - /* Userspace will find "Crash kernel" region in /proc/iomem. */ + /* +* Userspace will find "Crash kernel" or "Crash kernel (low)" +* region in /proc/iomem. +* In order to distinct from the high region and make no effect +* to the use of existing kexec-tools, rename the low region as +* "Crash kernel (low)". 
+*/ + if (crashk_low_res.end && crashk_low_res.start >= res->start && + crashk_low_res.end <= res->end) { + crashk_low_res.name = "Crash kernel (low)"; + request_resource(res, &crashk_low_res); + } if (crashk_res.end && crashk_res.start >= res->start && crashk_res.end <= res->end) request_resource(res, &crashk_res); diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index be4595dc7459115..85c83e4eff2b6c4 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -36,6 +36,7 @@ #include #include #include +#include #include #include #include @@ -64,57 +65,11 @@ EXPORT_SYMBOL(memstart_addr); */ phys_addr_t arm64_dma_phys_limit __ro_after_init; -#ifdef CONFIG_KEXEC_CORE -/* - * reserve_crashkernel() - reserves memory for crash kernel - * - * This function reserves memory area given in "crashkernel=" kernel command - * line parameter. The memory reserved is used by dump capture kernel when - * primary kernel is crashing. - */ +#ifndef CONFIG_KEXEC_CORE static void __init reserve_crashkernel(void) { - unsigned long long crash_base, crash_size; - unsigned long long crash_max = CRASH_ADDR_LOW_MAX; - int ret; - - ret = parse_crashkernel(boot_command_line, memb
[PATCH v16 11/11] kdump: update Documentation about crashkernel
From: Chen Zhou

For arm64, the behavior of crashkernel=X has been changed: it tries low
allocation in the DMA zone and falls back to high allocation if that fails.

We can also use "crashkernel=X,high" to select a high region above the DMA
zone, which also tries to allocate at least 256M of low memory in the DMA
zone automatically, and "crashkernel=Y,low" can be used to allocate low
memory of a specified size.

So update the Documentation.

Signed-off-by: Chen Zhou
---
 Documentation/admin-guide/kdump/kdump.rst       | 11 +++++++++--
 Documentation/admin-guide/kernel-parameters.txt | 11 +++++++++--
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/kdump/kdump.rst b/Documentation/admin-guide/kdump/kdump.rst
index cb30ca3df27c9b2..d4c287044be0c70 100644
--- a/Documentation/admin-guide/kdump/kdump.rst
+++ b/Documentation/admin-guide/kdump/kdump.rst
@@ -361,8 +361,15 @@ Boot into System Kernel
    kernel will automatically locate the crash kernel image within the
    first 512MB of RAM if X is not given.
 
-   On arm64, use "crashkernel=Y[@X]". Note that the start address of
-   the kernel, X if explicitly specified, must be aligned to 2MiB (0x200000).
+   On arm64, use "crashkernel=X" to try low allocation in DMA zone and
+   fall back to high allocation if it fails.
+   We can also use "crashkernel=X,high" to select a high region above
+   DMA zone, which also tries to allocate at least 256M low memory in
+   DMA zone automatically.
+   "crashkernel=Y,low" can be used to allocate specified size low memory.
+   Use "crashkernel=Y@X" if you really have to reserve memory from
+   specified start address X. Note that the start address of the kernel,
+   X if explicitly specified, must be aligned to 2MiB (0x200000).
 
 Load the Dump-capture Kernel
 ============================
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 9725c546a0d46db..91f3a8dc537d404 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -783,6 +783,9 @@
 			[KNL, X86-64] Select a region under 4G first, and
 			fall back to reserve region above 4G when '@offset'
 			hasn't been specified.
+			[KNL, ARM64] Try low allocation in DMA zone and fall back
+			to high allocation if it fails when '@offset' hasn't been
+			specified.
 			See Documentation/admin-guide/kdump/kdump.rst for further details.
 
 	crashkernel=range1:size1[,range2:size2,...][@offset]
@@ -799,6 +802,8 @@
 			Otherwise memory region will be allocated below 4G, if
 			available.
 			It will be ignored if crashkernel=X is specified.
+			[KNL, ARM64] range in high memory.
+			Allow kernel to allocate physical memory region from top.
 	crashkernel=size[KMG],low
 			[KNL, X86-64] range under 4G. When crashkernel=X,high
 			is passed, kernel could allocate physical memory region
@@ -807,13 +812,15 @@
 			requires at least 64M+32K low memory, also enough extra
 			low memory is needed to make sure DMA buffers for 32-bit
 			devices won't run out. Kernel would try to allocate at
-			at least 256M below 4G automatically.
+			least 256M below 4G automatically.
 			This one let user to specify own low range under 4G
 			for second kernel instead.
 			0: to disable low allocation.
 			It will be ignored when crashkernel=X,high is not used
 			or memory reserved is below 4G.
-
+			[KNL, ARM64] range in low memory.
+			This one let user to specify a low range in DMA zone for
+			crash dump kernel.
 	cryptomgr.notests
 			[KNL] Disable crypto self-tests
 
--
2.25.1
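The arm64 policy documented above (try the DMA zone first, fall back to high memory, and pair any high reservation with a low one) can be modelled with a small toy planner. This is a hedged sketch under simplifying assumptions, not the kernel implementation: `crash_request`, `crash_plan` and `plan_crashkernel` are hypothetical names, the command line is assumed to be parsed already, and free memory is reduced to a single largest free block below and above the DMA limit. The 256M default comes from the "at least 256M" wording in the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_1M			(1ULL << 20)
#define DEFAULT_LOW_SIZE	(256 * SZ_1M)	/* "at least 256M" in the text */

/* Hypothetical request, as if already parsed from the command line. */
struct crash_request {
	uint64_t size;		/* X in crashkernel=X[,high] */
	bool force_high;	/* true for crashkernel=X,high */
	uint64_t low_size;	/* Y in crashkernel=Y,low, 0 if not given */
};

struct crash_plan {
	bool ok;
	uint64_t low_size;	/* bytes to reserve in the DMA zone */
	uint64_t high_size;	/* bytes to reserve above the DMA zone */
};

/*
 * Toy model: free_low and free_high are the largest free blocks below
 * and above the DMA limit. Real reservations also depend on alignment
 * and on what is already reserved, which this sketch ignores.
 */
static struct crash_plan plan_crashkernel(struct crash_request req,
					  uint64_t free_low, uint64_t free_high)
{
	struct crash_plan plan = { false, 0, 0 };
	uint64_t low = req.low_size ? req.low_size : DEFAULT_LOW_SIZE;

	/* crashkernel=X: try the DMA zone first. */
	if (!req.force_high && req.size <= free_low) {
		plan.ok = true;
		plan.low_size = req.size;
		return plan;
	}

	/* Fall back (or go straight) to high memory plus a low region. */
	if (req.size <= free_high && low <= free_low) {
		plan.ok = true;
		plan.high_size = req.size;
		plan.low_size = low;
	}
	return plan;
}
```

In this model a high reservation without a matching low region is treated as a failure, which matches the series freeing the high region when the low reservation cannot be made.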
[PATCH v16 08/11] x86, arm64: Add ARCH_WANT_RESERVE_CRASH_KERNEL config
From: Chen Zhou We make the functions reserve_crashkernel[_low]() as generic for x86 and arm64. Since reserve_crashkernel[_low]() implementations are quite similar on other architectures as well, we can have more users of this later. So have CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL in arch/Kconfig and select this by X86 and ARM64. Suggested-by: Mike Rapoport Signed-off-by: Chen Zhou Acked-by: Baoquan He --- arch/Kconfig| 3 +++ arch/arm64/Kconfig | 1 + arch/x86/Kconfig| 2 ++ kernel/crash_core.c | 7 ++- 4 files changed, 8 insertions(+), 5 deletions(-) diff --git a/arch/Kconfig b/arch/Kconfig index 26b8ed11639da46..19256aa924c3b2c 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -24,6 +24,9 @@ config KEXEC_ELF config HAVE_IMA_KEXEC bool +config ARCH_WANT_RESERVE_CRASH_KERNEL + bool + config SET_FS bool diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index c4207cf9bb17ffb..4b99efa36da3793 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -95,6 +95,7 @@ config ARM64 select ARCH_WANT_FRAME_POINTERS select ARCH_WANT_HUGE_PMD_SHARE if ARM64_4K_PAGES || (ARM64_16K_PAGES && !ARM64_VA_BITS_36) select ARCH_WANT_LD_ORPHAN_WARN + select ARCH_WANT_RESERVE_CRASH_KERNEL if KEXEC_CORE select ARCH_WANTS_NO_INSTR select ARCH_HAS_UBSAN_SANITIZE_ALL select ARM_AMBA diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 7399327d1eff79d..528034b4276ecf8 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -12,6 +12,7 @@ config X86_32 depends on !64BIT # Options that are inherently 32-bit kernel only: select ARCH_WANT_IPC_PARSE_VERSION + select ARCH_WANT_RESERVE_CRASH_KERNEL if KEXEC_CORE select CLKSRC_I8253 select CLONE_BACKWARDS select GENERIC_VDSO_32 @@ -28,6 +29,7 @@ config X86_64 select ARCH_HAS_GIGANTIC_PAGE select ARCH_SUPPORTS_INT128 if CC_HAS_INT128 select ARCH_USE_CMPXCHG_LOCKREF + select ARCH_WANT_RESERVE_CRASH_KERNEL if KEXEC_CORE select HAVE_ARCH_SOFT_DIRTY select MODULES_USE_ELF_RELA select NEED_DMA_MAP_STATE diff --git a/kernel/crash_core.c 
b/kernel/crash_core.c index 4dc2643fcbccf99..b23cfc0ca8905fd 100644 --- a/kernel/crash_core.c +++ b/kernel/crash_core.c @@ -321,9 +321,7 @@ int __init parse_crashkernel_low(char *cmdline, * - Crashkernel reservation -- */ -#ifdef CONFIG_KEXEC_CORE - -#if defined(CONFIG_X86) || defined(CONFIG_ARM64) +#ifdef CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL static int __init reserve_crashkernel_low(void) { #ifdef CONFIG_64BIT @@ -451,8 +449,7 @@ void __init reserve_crashkernel(void) crashk_res.start = crash_base; crashk_res.end = crash_base + crash_size - 1; } -#endif -#endif /* CONFIG_KEXEC_CORE */ +#endif /* CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL */ Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type, void *data, size_t data_len) -- 2.25.1 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH v16 04/11] x86: kdump: move xen_pv_domain() check and insert_resource() to setup_arch()
From: Chen Zhou

We will make the function reserve_crashkernel() generic. The
xen_pv_domain() check in reserve_crashkernel() is relevant only to x86,
and the same is true of insert_resource() in reserve_crashkernel[_low]().
So move the xen_pv_domain() check and insert_resource() to setup_arch()
to keep them in x86.

Suggested-by: Mike Rapoport
Signed-off-by: Chen Zhou
Tested-by: John Donnelly
Acked-by: Baoquan He
---
 arch/x86/kernel/setup.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 2330dcb83e8f06a..63ed089f9778fc3 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -456,7 +456,6 @@ static int __init reserve_crashkernel_low(void)
 
 	crashk_low_res.start = low_base;
 	crashk_low_res.end   = low_base + low_size - 1;
-	insert_resource(&iomem_resource, &crashk_low_res);
 #endif
 	return 0;
 }
@@ -480,11 +479,6 @@ static void __init reserve_crashkernel(void)
 		high = true;
 	}
 
-	if (xen_pv_domain()) {
-		pr_info("Ignoring crashkernel for a Xen PV domain\n");
-		return;
-	}
-
 	/* 0 means: find the address automatically */
 	if (!crash_base) {
 		/*
@@ -531,7 +525,6 @@ static void __init reserve_crashkernel(void)
 
 	crashk_res.start = crash_base;
 	crashk_res.end   = crash_base + crash_size - 1;
-	insert_resource(&iomem_resource, &crashk_res);
 }
 #else
 static void __init reserve_crashkernel(void)
@@ -1143,7 +1136,17 @@ void __init setup_arch(char **cmdline_p)
 	 * Reserve memory for crash kernel after SRAT is parsed so that it
 	 * won't consume hotpluggable memory.
 	 */
-	reserve_crashkernel();
+	if (xen_pv_domain())
+		pr_info("Ignoring crashkernel for a Xen PV domain\n");
+	else {
+		reserve_crashkernel();
+#ifdef CONFIG_KEXEC_CORE
+		if (crashk_res.end > crashk_res.start)
+			insert_resource(&iomem_resource, &crashk_res);
+		if (crashk_low_res.end > crashk_low_res.start)
+			insert_resource(&iomem_resource, &crashk_low_res);
+#endif
+	}
 
 	memblock_find_dma_reserve();
 
--
2.25.1
[PATCH v16 02/11] x86: kdump: make the lower bound of crash kernel reservation consistent
From: Chen Zhou

The lower bounds of the crash kernel reservation and the crash kernel low
reservation are different. Use the consistent value CRASH_ALIGN for both.

Suggested-by: Dave Young
Signed-off-by: Chen Zhou
Tested-by: John Donnelly
---
 arch/x86/kernel/setup.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index b7286d4c389dd33..a31352d8c404f6c 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -441,7 +441,8 @@ static int __init reserve_crashkernel_low(void)
 		return 0;
 	}
 
-	low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, CRASH_ADDR_LOW_MAX);
+	low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, CRASH_ALIGN,
+			CRASH_ADDR_LOW_MAX);
 	if (!low_base) {
 		pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
 		       (unsigned long)(low_size >> 20));
--
2.25.1
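To make the effect of the lower bound concrete, here is a small sketch of an aligned, top-down search inside a window [start, end). It is a simplified stand-in, not memblock: `pick_aligned_base` is a hypothetical helper that assumes the whole window is free and that `align` is a power of two. Like the allocator call in the patch, it uses 0 as the failure value.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the highest `align`-aligned base such that
 * [base, base + size) lies within [start, end), or 0 if nothing fits.
 * `align` must be a power of two. The window is assumed entirely free.
 */
static uint64_t pick_aligned_base(uint64_t size, uint64_t align,
				  uint64_t start, uint64_t end)
{
	uint64_t base;

	if (!size || end <= start || end - start < size)
		return 0;

	base = (end - size) & ~(align - 1);	/* round down to alignment */
	return base >= start ? base : 0;
}
```

One property of this sketch: when `start` is 0, a successful result of 0 cannot be told apart from failure, while with `start` equal to the alignment every successful result is non-zero. Whether that is part of the motivation for the patch is not stated in the commit message, which only mentions consistency.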
[PATCH v16 00/11] support reserving crashkernel above 4G on arm64 kdump
.rst.

Changes since [v4]
- Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.

Changes since [v3]
- Add memblock_cap_memory_ranges back for multiple ranges.
- Fix some compiling warnings.

Changes since [v2]
- Split patch "arm64: kdump: support reserving crashkernel above 4G" as
  two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate
  patch.

Changes since [v1]:
- Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
- Remove memblock_cap_memory_ranges() I added in v1 and implement that
  in fdt_enforce_memory_region().
  There are at most two crash kernel regions. For the two crash kernel
  regions case, we cap the memory range [min(regs[*].start), max(regs[*].end)]
  and then remove the memory range in the middle.

[1]: http://lists.infradead.org/pipermail/kexec/2020-June/020737.html
[2]: https://github.com/robherring/dt-schema/pull/19
[v1]: https://lkml.org/lkml/2019/4/2/1174
[v2]: https://lkml.org/lkml/2019/4/9/86
[v3]: https://lkml.org/lkml/2019/4/9/306
[v4]: https://lkml.org/lkml/2019/4/15/273
[v5]: https://lkml.org/lkml/2019/5/6/1360
[v6]: https://lkml.org/lkml/2019/8/30/142
[v7]: https://lkml.org/lkml/2019/12/23/411
[v8]: https://lkml.org/lkml/2020/5/21/213
[v9]: https://lkml.org/lkml/2020/6/28/73
[v10]: https://lkml.org/lkml/2020/7/2/1443
[v11]: https://lkml.org/lkml/2020/8/1/150
[v12]: https://lkml.org/lkml/2020/9/7/1037
[v13]: https://lkml.org/lkml/2020/10/31/34
[v14]: https://lkml.org/lkml/2021/1/30/53
[v15]: https://lkml.org/lkml/2021/10/19/1405

Chen Zhou (10):
  x86: kdump: replace the hard-coded alignment with macro CRASH_ALIGN
  x86: kdump: make the lower bound of crash kernel reservation consistent
  x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions reserve_crashkernel()
  x86: kdump: move xen_pv_domain() check and insert_resource() to setup_arch()
  x86: kdump: move reserve_crashkernel[_low]() into crash_core.c
  arm64: kdump: introduce some macros for crash kernel reservation
  arm64: kdump: reimplement crashkernel=X
  x86, arm64: Add ARCH_WANT_RESERVE_CRASH_KERNEL config
  of: fdt: Add memory for devices by DT property "linux,usable-memory-range"
  kdump: update Documentation about crashkernel

Zhen Lei (1):
  of: fdt: Aggregate the processing of "linux,usable-memory-range"

 Documentation/admin-guide/kdump/kdump.rst     |  11 +-
 .../admin-guide/kernel-parameters.txt         |  11 +-
 arch/Kconfig                                  |   3 +
 arch/arm64/Kconfig                            |   1 +
 arch/arm64/include/asm/kexec.h                |  10 ++
 arch/arm64/kernel/machine_kexec_file.c        |  12 +-
 arch/arm64/kernel/setup.c                     |  13 +-
 arch/arm64/mm/init.c                          |  59 ++-
 arch/x86/Kconfig                              |   2 +
 arch/x86/include/asm/elf.h                    |   3 +
 arch/x86/include/asm/kexec.h                  |  31 +++-
 arch/x86/kernel/setup.c                       | 163 ++
 drivers/of/fdt.c                              |  45 +++--
 include/linux/crash_core.h                    |   3 +
 include/linux/kexec.h                         |   2 -
 kernel/crash_core.c                           | 156 +
 kernel/kexec_core.c                           |  17 --
 17 files changed, 304 insertions(+), 238 deletions(-)

--
2.25.1
[PATCH v16 03/11] x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions reserve_crashkernel()
From: Chen Zhou

To make the function reserve_crashkernel() generic, replace some
hard-coded numbers with the macro CRASH_ADDR_LOW_MAX.

Signed-off-by: Chen Zhou
Tested-by: John Donnelly
Acked-by: Baoquan He
---
 arch/x86/kernel/setup.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index a31352d8c404f6c..2330dcb83e8f06a 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -489,8 +489,9 @@ static void __init reserve_crashkernel(void)
 	if (!crash_base) {
 		/*
 		 * Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
-		 * crashkernel=x,high reserves memory over 4G, also allocates
-		 * 256M extra low memory for DMA buffers and swiotlb.
+		 * crashkernel=x,high reserves memory over CRASH_ADDR_LOW_MAX,
+		 * also allocates 256M extra low memory for DMA buffers
+		 * and swiotlb.
 		 * But the extra memory is not required for all machines.
 		 * So try low memory first and fall back to high memory
 		 * unless "crashkernel=size[KMG],high" is specified.
@@ -518,7 +519,7 @@ static void __init reserve_crashkernel(void)
 		}
 	}
 
-	if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) {
+	if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) {
 		memblock_phys_free(crash_base, crash_size);
 		return;
 	}
--
2.25.1
[PATCH v16 01/11] x86: kdump: replace the hard-coded alignment with macro CRASH_ALIGN
From: Chen Zhou

Move CRASH_ALIGN to header asm/kexec.h for later use.

Suggested-by: Dave Young
Suggested-by: Baoquan He
Signed-off-by: Chen Zhou
Tested-by: John Donnelly
---
 arch/x86/include/asm/kexec.h | 3 +++
 arch/x86/kernel/setup.c      | 3 ---
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 11b7c06e2828c30..3a22e65262aa70b 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -18,6 +18,9 @@
 
 # define KEXEC_CONTROL_CODE_MAX_SIZE	2048
 
+/* 16M alignment for crash kernel regions */
+#define CRASH_ALIGN		SZ_16M
+
 #ifndef __ASSEMBLY__
 
 #include
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index c410be738ae78e0..b7286d4c389dd33 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -392,9 +392,6 @@ static void __init memblock_x86_reserve_range_setup_data(void)
 
 #ifdef CONFIG_KEXEC_CORE
 
-/* 16M alignment for crash kernel regions */
-#define CRASH_ALIGN		SZ_16M
-
 /*
  * Keep the crash kernel below this limit.
  *
--
2.25.1
[PATCH v15 07/10] arm64: kdump: reimplement crashkernel=X
From: Chen Zhou There are following issues in arm64 kdump: 1. We use crashkernel=X to reserve crashkernel below 4G, which will fail when there is no enough low memory. 2. If reserving crashkernel above 4G, in this case, crash dump kernel will boot failure because there is no low memory available for allocation. To solve these issues, change the behavior of crashkernel=X and introduce crashkernel=X,[high,low]. crashkernel=X tries low allocation in DMA zone, and fall back to high allocation if it fails. We can also use "crashkernel=X,high" to select a region above DMA zone, which also tries to allocate at least 256M in DMA zone automatically. "crashkernel=Y,low" can be used to allocate specified size low memory. Another minor change, there may be two regions reserved for crash dump kernel, in order to distinct from the high region and make no effect to the use of existing kexec-tools, rename the low region as "Crash kernel (low)". Signed-off-by: Chen Zhou Tested-by: John Donnelly --- arch/arm64/include/asm/kexec.h | 4 ++ arch/arm64/kernel/machine_kexec_file.c | 12 +- arch/arm64/kernel/setup.c | 13 +- arch/arm64/mm/init.c | 59 +- kernel/crash_core.c| 6 +-- 5 files changed, 40 insertions(+), 54 deletions(-) diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h index b51ceb143cbbdb0..fa17fc8a5a2701b 100644 --- a/arch/arm64/include/asm/kexec.h +++ b/arch/arm64/include/asm/kexec.h @@ -96,6 +96,10 @@ static inline void crash_prepare_suspend(void) {} static inline void crash_post_resume(void) {} #endif +#ifdef CONFIG_KEXEC_CORE +extern void __init reserve_crashkernel(void); +#endif + #define ARCH_HAS_KIMAGE_ARCH struct kimage_arch { diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c index 63634b4d72c158f..6f3fa059ca4e816 100644 --- a/arch/arm64/kernel/machine_kexec_file.c +++ b/arch/arm64/kernel/machine_kexec_file.c @@ -65,10 +65,18 @@ static int prepare_elf_headers(void **addr, unsigned long *sz) /* Exclude 
crashkernel region */ ret = crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end); + if (ret) + goto out; + + if (crashk_low_res.end) { + ret = crash_exclude_mem_range(cmem, crashk_low_res.start, crashk_low_res.end); + if (ret) + goto out; + } - if (!ret) - ret = crash_prepare_elf64_headers(cmem, true, addr, sz); + ret = crash_prepare_elf64_headers(cmem, true, addr, sz); +out: kfree(cmem); return ret; } diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c index be5f85b0a24de69..4bb2e55366be64d 100644 --- a/arch/arm64/kernel/setup.c +++ b/arch/arm64/kernel/setup.c @@ -248,7 +248,18 @@ static void __init request_standard_resources(void) kernel_data.end <= res->end) request_resource(res, &kernel_data); #ifdef CONFIG_KEXEC_CORE - /* Userspace will find "Crash kernel" region in /proc/iomem. */ + /* +* Userspace will find "Crash kernel" or "Crash kernel (low)" +* region in /proc/iomem. +* In order to distinct from the high region and make no effect +* to the use of existing kexec-tools, rename the low region as +* "Crash kernel (low)". +*/ + if (crashk_low_res.end && crashk_low_res.start >= res->start && + crashk_low_res.end <= res->end) { + crashk_low_res.name = "Crash kernel (low)"; + request_resource(res, &crashk_low_res); + } if (crashk_res.end && crashk_res.start >= res->start && crashk_res.end <= res->end) request_resource(res, &crashk_res); diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index 2c94ae13b160834..cde26d49f76cfa0 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -36,6 +36,7 @@ #include #include #include +#include #include #include #include @@ -64,57 +65,11 @@ EXPORT_SYMBOL(memstart_addr); */ phys_addr_t arm64_dma_phys_limit __ro_after_init; -#ifdef CONFIG_KEXEC_CORE -/* - * reserve_crashkernel() - reserves memory for crash kernel - * - * This function reserves memory area given in "crashkernel=" kernel command - * line parameter. 
The memory reserved is used by dump capture kernel when - * primary kernel is crashing. - */ +#ifndef CONFIG_KEXEC_CORE static void __init reserve_crashkernel(void) { - unsigned long long crash_base, crash_size; - unsigned long long crash_max = CRASH_ADDR_LOW_MAX; - int ret; - - ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(), - &crash_size, &crash_base); - /* no crashkernel= or
[PATCH v15 10/10] kdump: update Documentation about crashkernel
From: Chen Zhou For arm64, the behavior of crashkernel=X has been changed, which tries low allocation in DMA zone and fall back to high allocation if it fails. We can also use "crashkernel=X,high" to select a high region above DMA zone, which also tries to allocate at least 256M low memory in DMA zone automatically and "crashkernel=Y,low" can be used to allocate specified size low memory. So update the Documentation. Signed-off-by: Chen Zhou --- Documentation/admin-guide/kdump/kdump.rst | 11 +-- Documentation/admin-guide/kernel-parameters.txt | 11 +-- 2 files changed, 18 insertions(+), 4 deletions(-) diff --git a/Documentation/admin-guide/kdump/kdump.rst b/Documentation/admin-guide/kdump/kdump.rst index cb30ca3df27c9b2..d4c287044be0c70 100644 --- a/Documentation/admin-guide/kdump/kdump.rst +++ b/Documentation/admin-guide/kdump/kdump.rst @@ -361,8 +361,15 @@ Boot into System Kernel kernel will automatically locate the crash kernel image within the first 512MB of RAM if X is not given. - On arm64, use "crashkernel=Y[@X]". Note that the start address of - the kernel, X if explicitly specified, must be aligned to 2MiB (0x20). + On arm64, use "crashkernel=X" to try low allocation in DMA zone and + fall back to high allocation if it fails. + We can also use "crashkernel=X,high" to select a high region above + DMA zone, which also tries to allocate at least 256M low memory in + DMA zone automatically. + "crashkernel=Y,low" can be used to allocate specified size low memory. + Use "crashkernel=Y@X" if you really have to reserve memory from + specified start address X. Note that the start address of the kernel, + X if explicitly specified, must be aligned to 2MiB (0x20). 
Load the Dump-capture Kernel diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 43dc35fe5bc038e..98b87e82321413b 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -783,6 +783,9 @@ [KNL, X86-64] Select a region under 4G first, and fall back to reserve region above 4G when '@offset' hasn't been specified. + [KNL, ARM64] Try low allocation in DMA zone and fall back + to high allocation if it fails when '@offset' hasn't been + specified. See Documentation/admin-guide/kdump/kdump.rst for further details. crashkernel=range1:size1[,range2:size2,...][@offset] @@ -799,6 +802,8 @@ Otherwise memory region will be allocated below 4G, if available. It will be ignored if crashkernel=X is specified. + [KNL, ARM64] range in high memory. + Allow kernel to allocate physical memory region from top. crashkernel=size[KMG],low [KNL, X86-64] range under 4G. When crashkernel=X,high is passed, kernel could allocate physical memory region @@ -807,13 +812,15 @@ requires at least 64M+32K low memory, also enough extra low memory is needed to make sure DMA buffers for 32-bit devices won't run out. Kernel would try to allocate at - at least 256M below 4G automatically. + least 256M below 4G automatically. This one let user to specify own low range under 4G for second kernel instead. 0: to disable low allocation. It will be ignored when crashkernel=X,high is not used or memory reserved is below 4G. - + [KNL, ARM64] range in low memory. + This one let user to specify a low range in DMA zone for + crash dump kernel. cryptomgr.notests [KNL] Disable crypto self-tests -- 2.25.1 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH v15 08/10] x86, arm64: Add ARCH_WANT_RESERVE_CRASH_KERNEL config
From: Chen Zhou

We make the functions reserve_crashkernel[_low]() generic for x86 and
arm64. Since the reserve_crashkernel[_low]() implementations are quite
similar on other architectures as well, we can have more users of this
later.

So add CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL in arch/Kconfig and select
it from X86 and ARM64.

Suggested-by: Mike Rapoport
Signed-off-by: Chen Zhou
Acked-by: Baoquan He
---
 arch/Kconfig        | 3 +++
 arch/arm64/Kconfig  | 1 +
 arch/x86/Kconfig    | 2 ++
 kernel/crash_core.c | 7 ++-----
 4 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 8df1c71026435df..d0585ce1b81b9cb 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -24,6 +24,9 @@ config KEXEC_ELF
 config HAVE_IMA_KEXEC
 	bool
 
+config ARCH_WANT_RESERVE_CRASH_KERNEL
+	bool
+
 config SET_FS
 	bool
 
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index fee914c716aa262..0ddf06afe625584 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -94,6 +94,7 @@ config ARM64
 	select ARCH_WANT_FRAME_POINTERS
 	select ARCH_WANT_HUGE_PMD_SHARE if ARM64_4K_PAGES || (ARM64_16K_PAGES && !ARM64_VA_BITS_36)
 	select ARCH_WANT_LD_ORPHAN_WARN
+	select ARCH_WANT_RESERVE_CRASH_KERNEL if KEXEC_CORE
 	select ARCH_WANTS_NO_INSTR
 	select ARCH_HAS_UBSAN_SANITIZE_ALL
 	select ARM_AMBA
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index d9830e7e1060f7c..66eb5d088695c77 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -12,6 +12,7 @@ config X86_32
 	depends on !64BIT
 	# Options that are inherently 32-bit kernel only:
 	select ARCH_WANT_IPC_PARSE_VERSION
+	select ARCH_WANT_RESERVE_CRASH_KERNEL if KEXEC_CORE
 	select CLKSRC_I8253
 	select CLONE_BACKWARDS
 	select GENERIC_VDSO_32
@@ -28,6 +29,7 @@ config X86_64
 	select ARCH_HAS_GIGANTIC_PAGE
 	select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
 	select ARCH_USE_CMPXCHG_LOCKREF
+	select ARCH_WANT_RESERVE_CRASH_KERNEL if KEXEC_CORE
 	select HAVE_ARCH_SOFT_DIRTY
 	select MODULES_USE_ELF_RELA
 	select NEED_DMA_MAP_STATE
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 4d81b9ff42db88b..4d5bf55ed71c253 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -321,9 +321,7 @@ int __init parse_crashkernel_low(char *cmdline,
  * - Crashkernel reservation --
  */
 
-#ifdef CONFIG_KEXEC_CORE
-
-#if defined(CONFIG_X86) || defined(CONFIG_ARM64)
+#ifdef CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL
 static int __init reserve_crashkernel_low(void)
 {
 #ifdef CONFIG_64BIT
@@ -451,8 +449,7 @@ void __init reserve_crashkernel(void)
 	crashk_res.start = crash_base;
 	crashk_res.end = crash_base + crash_size - 1;
 }
-#endif
-#endif /* CONFIG_KEXEC_CORE */
+#endif /* CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL */
 
 Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
 			  void *data, size_t data_len)
-- 
2.25.1
[PATCH v15 09/10] of: fdt: Add memory for devices by DT property "linux,usable-memory-range"
From: Chen Zhou

When reserving crashkernel in high memory, some low memory is reserved
for crash dump kernel devices and never mapped by the first kernel.
This memory range is advertised to the crash dump kernel via a DT property
under /chosen,
	linux,usable-memory-range = <BASE1 SIZE1 [BASE2 SIZE2]>

We reused the DT property linux,usable-memory-range and made the low
memory region the second range "BASE2 SIZE2", which keeps compatibility
with existing user-space and older kdump kernels.

The crash dump kernel reads this property at boot time and calls
memblock_add() to add the low memory region after
memblock_cap_memory_range() has been called.

Signed-off-by: Chen Zhou
Signed-off-by: Zhen Lei
---
 drivers/of/fdt.c | 47 ---
 1 file changed, 36 insertions(+), 11 deletions(-)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 4546572af24bbf1..cf59c847b2c28a5 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -969,8 +969,16 @@ static void __init early_init_dt_check_for_elfcorehdr(unsigned long node)
 		 elfcorehdr_addr, elfcorehdr_size);
 }
 
-static phys_addr_t cap_mem_addr;
-static phys_addr_t cap_mem_size;
+/*
+ * The main usage of linux,usable-memory-range is for crash dump kernel.
+ * Originally, the number of usable-memory regions is one. Now there may
+ * be two regions, low region and high region.
+ * To make compatibility with existing user-space and older kdump, the low
+ * region is always the last range of linux,usable-memory-range if exist.
+ */
+#define MAX_USABLE_RANGES		2
+
+static struct memblock_region cap_mem_regions[MAX_USABLE_RANGES];
 
 /**
  * early_init_dt_check_for_usable_mem_range - Decode usable memory range
@@ -979,20 +987,30 @@ static phys_addr_t cap_mem_size;
  */
 static void __init early_init_dt_check_for_usable_mem_range(unsigned long node)
 {
-	const __be32 *prop;
-	int len;
+	const __be32 *prop, *endp;
+	int len, nr = 0;
+	struct memblock_region *rgn = &cap_mem_regions[0];
 
 	pr_debug("Looking for usable-memory-range property... ");
 
 	prop = of_get_flat_dt_prop(node, "linux,usable-memory-range", &len);
-	if (!prop || (len < (dt_root_addr_cells + dt_root_size_cells)))
+	if (!prop)
 		return;
 
-	cap_mem_addr = dt_mem_next_cell(dt_root_addr_cells, &prop);
-	cap_mem_size = dt_mem_next_cell(dt_root_size_cells, &prop);
+	endp = prop + (len / sizeof(__be32));
+	while ((endp - prop) >= (dt_root_addr_cells + dt_root_size_cells)) {
+		rgn->base = dt_mem_next_cell(dt_root_addr_cells, &prop);
+		rgn->size = dt_mem_next_cell(dt_root_size_cells, &prop);
+
+		pr_debug("cap_mem_regions[%d]: base=%pa, size=%pa\n",
+			 nr, &rgn->base, &rgn->size);
+
+		if (++nr >= MAX_USABLE_RANGES)
+			break;
+
+		rgn++;
+	}
 
-	pr_debug("cap_mem_start=%pa cap_mem_size=%pa\n", &cap_mem_addr,
-		 &cap_mem_size);
 }
 
 #ifdef CONFIG_SERIAL_EARLYCON
@@ -1265,7 +1283,8 @@ bool __init early_init_dt_verify(void *params)
 
 void __init early_init_dt_scan_nodes(void)
 {
-	int rc = 0;
+	int i, rc = 0;
+	struct memblock_region *rgn = &cap_mem_regions[0];
 
 	/* Initialize {size,address}-cells info */
 	of_scan_flat_dt(early_init_dt_scan_root, NULL);
@@ -1279,7 +1298,13 @@ void __init early_init_dt_scan_nodes(void)
 	of_scan_flat_dt(early_init_dt_scan_memory, NULL);
 
 	/* Handle linux,usable-memory-range property */
-	memblock_cap_memory_range(cap_mem_addr, cap_mem_size);
+	memblock_cap_memory_range(rgn->base, rgn->size);
+	for (i = 1; i < MAX_USABLE_RANGES; i++) {
+		rgn++;
+
+		if (rgn->size)
+			memblock_add(rgn->base, rgn->size);
+	}
 }
 
 bool __init early_init_dt_scan(void *params)
-- 
2.25.1
[PATCH v15 06/10] arm64: kdump: introduce some macros for crash kernel reservation
From: Chen Zhou

Introduce macro CRASH_ALIGN for alignment, macro CRASH_ADDR_LOW_MAX for
the upper bound of low crash memory, and macro CRASH_ADDR_HIGH_MAX for
the upper bound of high crash memory, and use these macros instead of the
hard-coded values.

Besides, to keep consistent with x86, use CRASH_ALIGN as the lower bound
of the crash kernel reservation.

Signed-off-by: Chen Zhou
Tested-by: John Donnelly
---
 arch/arm64/include/asm/kexec.h | 6 ++++++
 arch/arm64/mm/init.c           | 4 ++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index 00dbcc71aeb2918..b51ceb143cbbdb0 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -25,6 +25,12 @@
 
 #define KEXEC_ARCH KEXEC_ARCH_AARCH64
 
+/* 2M alignment for crash kernel regions */
+#define CRASH_ALIGN		SZ_2M
+
+#define CRASH_ADDR_LOW_MAX	arm64_dma_phys_limit
+#define CRASH_ADDR_HIGH_MAX	MEMBLOCK_ALLOC_ACCESSIBLE
+
 #ifndef __ASSEMBLY__
 
 /**
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 37a81754d9b61f7..2c94ae13b160834 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -75,7 +75,7 @@ phys_addr_t arm64_dma_phys_limit __ro_after_init;
 static void __init reserve_crashkernel(void)
 {
 	unsigned long long crash_base, crash_size;
-	unsigned long long crash_max = arm64_dma_phys_limit;
+	unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
 	int ret;
 
 	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
@@ -91,7 +91,7 @@ static void __init reserve_crashkernel(void)
 		crash_max = crash_base + crash_size;
 
 	/* Current arm64 boot protocol requires 2MB alignment */
-	crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
+	crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
 					       crash_base, crash_max);
 	if (!crash_base) {
 		pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
-- 
2.25.1
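To make the role of the new macros concrete, here is a small user-space sketch of how the alignment and the search window passed to the allocator could be derived. It is an illustration under assumptions: struct search_window and both helper functions are hypothetical, low_max stands in for CRASH_ADDR_LOW_MAX (arm64_dma_phys_limit is only known at run time), and the "explicit base narrows the window" rule is inferred from the context line crash_max = crash_base + crash_size in the diff.

```c
#include <stdbool.h>
#include <stdint.h>

#define SZ_2M		0x200000ULL
#define CRASH_ALIGN	SZ_2M	/* arm64 value introduced by the patch */

/* Hypothetical [start, end) window handed to the range allocator. */
struct search_window {
	uint64_t start;
	uint64_t end;
};

/* True if a user-supplied base meets the CRASH_ALIGN requirement. */
bool crash_base_aligned(uint64_t base)
{
	return (base & (CRASH_ALIGN - 1)) == 0;
}

/*
 * With no explicit base (crash_base == 0) the whole range below low_max
 * is searched; with crashkernel=Y@X the window is exactly [X, X + Y).
 */
struct search_window crash_search_window(uint64_t crash_base,
					 uint64_t crash_size,
					 uint64_t low_max)
{
	struct search_window w = { crash_base, low_max };

	if (crash_base)
		w.end = crash_base + crash_size;
	return w;
}
```

Taking the limit as a parameter keeps the sketch independent of any particular platform's DMA zone size.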
[PATCH v15 03/10] x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions reserve_crashkernel()
From: Chen Zhou

To make the function reserve_crashkernel() generic, replace some
hard-coded numbers with the macro CRASH_ADDR_LOW_MAX.

Signed-off-by: Chen Zhou
Tested-by: John Donnelly
Acked-by: Baoquan He
---
 arch/x86/kernel/setup.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 5bebd46c7ce81f5..1b2c9f5c71a870e 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -489,8 +489,9 @@ static void __init reserve_crashkernel(void)
 	if (!crash_base) {
 		/*
 		 * Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
-		 * crashkernel=x,high reserves memory over 4G, also allocates
-		 * 256M extra low memory for DMA buffers and swiotlb.
+		 * crashkernel=x,high reserves memory over CRASH_ADDR_LOW_MAX,
+		 * also allocates 256M extra low memory for DMA buffers
+		 * and swiotlb.
 		 * But the extra memory is not required for all machines.
 		 * So try low memory first and fall back to high memory
 		 * unless "crashkernel=size[KMG],high" is specified.
@@ -518,7 +519,7 @@ static void __init reserve_crashkernel(void)
 		}
 	}
 
-	if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) {
+	if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) {
 		memblock_free(crash_base, crash_size);
 		return;
 	}
-- 
2.25.1
[PATCH v15 05/10] x86: kdump: move reserve_crashkernel[_low]() into crash_core.c
From: Chen Zhou

Make the functions reserve_crashkernel[_low]() generic.
Arm64 will use these to reimplement crashkernel=X.

Signed-off-by: Chen Zhou
Tested-by: John Donnelly
---
 arch/x86/include/asm/elf.h   |   3 +
 arch/x86/include/asm/kexec.h |  28 +-
 arch/x86/kernel/setup.c      | 143 +--
 include/linux/crash_core.h   |   3 +
 include/linux/kexec.h        |   2 -
 kernel/crash_core.c          | 159 +++
 kernel/kexec_core.c          |  17 
 7 files changed, 192 insertions(+), 163 deletions(-)

diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index 29fea180a6658e8..7a6c36cff8331f5 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -94,6 +94,9 @@ extern unsigned int vdso32_enabled;
 
 #define elf_check_arch(x)	elf_check_arch_ia32(x)
 
+/* We can also handle crash dumps from 64 bit kernel. */
+# define vmcore_elf_check_arch_cross(x) ((x)->e_machine == EM_X86_64)
+
 /* SVR4/i386 ABI (pages 3-31, 3-32) says that when the program starts %edx
    contains a pointer to a function which might be registered using `atexit'.
    This provides a mean for the dynamic linker to call DT_FINI functions for
diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 5f63ad6b6e74b15..3533ede83b42158 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -21,6 +21,27 @@
 /* 16M alignment for crash kernel regions */
 #define CRASH_ALIGN		SZ_16M
 
+/*
+ * Keep the crash kernel below this limit.
+ *
+ * Earlier 32-bits kernels would limit the kernel to the low 512 MB range
+ * due to mapping restrictions.
+ *
+ * 64-bit kdump kernels need to be restricted to be under 64 TB, which is
+ * the upper limit of system RAM in 4-level paging mode. Since the kdump
+ * jump could be from 5-level paging to 4-level paging, the jump will fail if
+ * the kernel is put above 64 TB, and during the 1st kernel bootup there's
+ * no good way to detect the paging mode of the target kernel which will be
+ * loaded for dumping.
+ */
+#ifdef CONFIG_X86_32
+# define CRASH_ADDR_LOW_MAX	SZ_512M
+# define CRASH_ADDR_HIGH_MAX	SZ_512M
+#else
+# define CRASH_ADDR_LOW_MAX	SZ_4G
+# define CRASH_ADDR_HIGH_MAX	SZ_64T
+#endif
+
 #ifndef __ASSEMBLY__
 
 #include
@@ -51,9 +72,6 @@ struct kimage;
 
 /* The native architecture */
 # define KEXEC_ARCH KEXEC_ARCH_386
-
-/* We can also handle crash dumps from 64 bit kernel. */
-# define vmcore_elf_check_arch_cross(x) ((x)->e_machine == EM_X86_64)
 #else
 /* Maximum physical address we can use pages from */
 # define KEXEC_SOURCE_MEMORY_LIMIT (MAXMEM-1)
@@ -195,6 +213,10 @@ typedef void crash_vmclear_fn(void);
 extern crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss;
 extern void kdump_nmi_shootdown_cpus(void);
 
+#ifdef CONFIG_KEXEC_CORE
+extern void __init reserve_crashkernel(void);
+#endif
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_X86_KEXEC_H */
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 657ec7fb62da37c..bef6340e0e32441 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -39,6 +39,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -386,147 +387,7 @@ static void __init memblock_x86_reserve_range_setup_data(void)
 	}
 }
 
-/*
- * - Crashkernel reservation --
- */
-
-#ifdef CONFIG_KEXEC_CORE
-
-/*
- * Keep the crash kernel below this limit.
- *
- * Earlier 32-bits kernels would limit the kernel to the low 512 MB range
- * due to mapping restrictions.
- *
- * 64-bit kdump kernels need to be restricted to be under 64 TB, which is
- * the upper limit of system RAM in 4-level paging mode. Since the kdump
- * jump could be from 5-level paging to 4-level paging, the jump will fail if
- * the kernel is put above 64 TB, and during the 1st kernel bootup there's
- * no good way to detect the paging mode of the target kernel which will be
- * loaded for dumping.
- */
-#ifdef CONFIG_X86_32
-# define CRASH_ADDR_LOW_MAX	SZ_512M
-# define CRASH_ADDR_HIGH_MAX	SZ_512M
-#else
-# define CRASH_ADDR_LOW_MAX	SZ_4G
-# define CRASH_ADDR_HIGH_MAX	SZ_64T
-#endif
-
-static int __init reserve_crashkernel_low(void)
-{
-#ifdef CONFIG_X86_64
-	unsigned long long base, low_base = 0, low_size = 0;
-	unsigned long low_mem_limit;
-	int ret;
-
-	low_mem_limit = min(memblock_phys_mem_size(), CRASH_ADDR_LOW_MAX);
-
-	/* crashkernel=Y,low */
-	ret = parse_crashkernel_low(boot_command_line, low_mem_limit, &low_size, &base);
-	if (ret) {
-		/*
-		 * two parts from kernel/dma/swiotlb.c:
-		 * -swiotlb size: user-specified with swiotlb= or default.
-		 *
-		 * -swiotlb overflow buffer: now hardcoded to 32k. We round it
-		 * to 8M for other buffers that may need to stay low too. Also
-		 * make sure w