Re: [PATCH v3 0/2] arm64: kdump: Function supplement and performance optimization
On 8/1/22 9:47 PM, Leizhen (ThunderTown) wrote:

On 2022/8/1 16:20, Baoquan He wrote:

Hi Catalin,

On 07/11/22 at 05:03pm, Zhen Lei wrote:

v2 --> v3:
1. Discard patch 3 in v2, a cleanup patch.

v1 --> v2:
1. Update the commit message of Patch 1 to explicitly indicate that "crashkernel=X,high" is specified but "crashkernel=Y,low" is not specified.
2. Drop Patch 4-5. For now, focus on functional integrity; performance optimization will be considered in later versions.
3. Patch 3 is not mandatory. It is just a cleanup now, although it is required for patch 4-5. To avoid duplicated effort later, I would be glad if it were accepted.

v1:
After the basic functions of "support reserving crashkernel above 4G on arm64 kdump" (see https://lkml.org/lkml/2022/5/6/428) are implemented, we still have three features to be improved.
1. When crashkernel=X,high is specified but crashkernel=Y,low is not specified, the default crash low memory size is provided.
2. For crashkernel=X without '@offset', if the low memory fails to be allocated, fall back to reserving a region from high memory (above DMA zones).
3. If crashkernel=X,high is used, page mapping is performed only for the crash high memory, and block mapping is still used for other linear address spaces.

Compared to the previous version:
(1) For crashkernel=X[@offset], the memory above 4G is not changed to block mapping; that is left for a later version.
(2) The implementation method is modified. Now the implementation is simpler and clearer.

Do you have a plan to pick this series so that it can be taken into 5.20 rc-1~3?

Hi, Catalin:

Only function reserve_crashkernel() is modified in these two patches. The core process of the arm64 architecture is not affected. I remember you suggested that arm64 and x86 share the same kdump code, so these two subfeatures are needed. Maybe we can lay the foundation first for the people who build the road. Unifying the external interfaces of kdump on arm64 and x86 does not seem to hurt.

We have backported the basic crashkernel=,high and crashkernel=,low support into our distros and have tested it widely on arm64 servers. We need this patchset so that we can backport it for more testing.

Thanks
Baoquan

Zhen Lei (2):
  arm64: kdump: Provide default size when crashkernel=Y,low is not specified
  arm64: kdump: Support crashkernel=X fall back to reserve region above DMA zones

 .../admin-guide/kernel-parameters.txt | 10 ++-
 arch/arm64/mm/init.c                  | 28 +--
 2 files changed, 28 insertions(+), 10 deletions(-)

--
2.25.1

Hi,

What is the progress of this series?

Without this patch set we are seeing larger crashkernel=896M failures on Arm with Linux-6.0.rc7. This larger value is needed for iSCSI booted systems with certain network adapters.

Thank you,
John.

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
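For readers following the thread, the reservation policy described in the cover letter (items 1 and 2) can be modelled in user space roughly as below. This is only a sketch under stated assumptions, not the code in arch/arm64/mm/init.c: the names plan_crashkernel(), struct crash_request and struct crash_plan are hypothetical, free memory is reduced to two byte counts, the default low size is passed in by the caller rather than taken from the kernel, and it is assumed that a fallback to high memory also reserves the default low region.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Where the main crash kernel region ends up in this model. */
enum crash_where { CRASH_FAILED, CRASH_LOW, CRASH_HIGH };

/* Hypothetical, simplified view of the parsed command line. */
struct crash_request {
	unsigned long long size;     /* X in crashkernel=X or crashkernel=X,high */
	bool high;                   /* true for crashkernel=X,high */
	bool low_given;              /* true if crashkernel=Y,low was specified */
	unsigned long long low_size; /* Y, only meaningful if low_given */
};

struct crash_plan {
	enum crash_where where;
	unsigned long long low_size; /* extra low region paired with a high one */
};

/*
 * free_low/free_high stand in for "can the allocator find this much memory
 * below/above the DMA zones". default_low is an assumption supplied by the
 * caller; the real default lives in the kernel, not here.
 */
struct crash_plan plan_crashkernel(const struct crash_request *req,
				   unsigned long long free_low,
				   unsigned long long free_high,
				   unsigned long long default_low)
{
	struct crash_plan plan = { CRASH_FAILED, 0 };
	unsigned long long low;

	assert(req != NULL);

	/* Plain crashkernel=X: try low memory first. */
	if (!req->high && free_low >= req->size) {
		plan.where = CRASH_LOW;
		return plan;
	}

	/* ",high" was requested, or the low attempt failed: go high. */
	low = req->low_given ? req->low_size : default_low;
	if (free_high < req->size || free_low < low)
		return plan;

	plan.where = CRASH_HIGH;
	plan.low_size = low;
	return plan;
}
```

In this model a plain crashkernel=X request that does not fit in low memory ends up with the same layout as crashkernel=X,high without an explicit ,low value, which is the convergence the two patches aim for.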
Re: [PATCH v3 0/2] arm64: kdump: Function supplement and performance optimization
On 10/14/22 11:29 AM, Catalin Marinas wrote:

On Thu, Oct 13, 2022 at 06:46:35PM +0800, Baoquan He wrote:

On 10/06/22 at 09:55am, john.p.donne...@oracle.com wrote:

What is the progress of this series?

Without this patch set we are seeing larger crashkernel=896M failures on Arm with Linux-6.0.rc7. This larger value is needed for iSCSI booted systems with certain network adapters.

This change is located in the arch/arm64 folder. I have pinged the arm64 maintainers to consider merging this patchset. Not sure if they are still thinking about it, or ignoring it.

Hi Catalin, Will,

Ping again! Do you have a plan to accept this patchset? It's very important for crashkernel setting on arm64 with a simple and default syntax.

I'll look at it once the merging window closes. I saw discussions on this thread and I ignored it until you all agreed ;).

Hi,

Do you have a timeline for this? This crashkernel > 4G for Arm item has been lingering for 3 years. I think it is time for it to be incorporated.

Thanks,
John.
Re: [PATCH 1/1] kernel/crash_core.c - Add crashkernel=auto for x86 and ARM
On 11/22/20 9:47 PM, Dave Young wrote:

Hi Guilherme,

On 11/22/20 at 12:32pm, Guilherme Piccoli wrote:

Hi Dave and Kairui, thanks for your responses! OK, if that makes sense to you I'm fine with it. I'd just recommend testing recent kernels in multiple distros with the minimum "range" to see if 64M is enough for crashkernel; maybe we'd need to bump that.

Given the different kernel configs and the different userspace initramfs setups, it is hard to get a uniform value for all distributions, but we can have an interface/kconfig-option for them to provide a value, like this patch is doing. And it could be improved later, as Kairui said, by adding extra values for some known kernels, and probably with some more improvements if doable.

Thanks
Dave

Hi.

Are we going to move forward with implementing this for X86 and Arm? If other platform maintainers want to include this CONFIG option in their configuration settings, they have a starting point.

Thank you,
John.

(I am not currently on many of the dist lists included in this email, so hopefully key contributors are included in this exchange.)
Re: [PATCH v3 1/1] kernel/crash_core: Add crashkernel=auto for vmcore creation
On 2/11/21 12:08 PM, Saeed Mirzamohammadi wrote:

This adds crashkernel=auto feature to configure reserved memory for
vmcore creation. CONFIG_CRASH_AUTO_STR is defined to be set for
different kernel distributions and different archs based on their
needs.

Signed-off-by: Saeed Mirzamohammadi
Signed-off-by: John Donnelly
Tested-by: John Donnelly
---
 Documentation/admin-guide/kdump/kdump.rst | 3 ++-
 .../admin-guide/kernel-parameters.txt     | 6 +
 arch/Kconfig                              | 24 +++
 kernel/crash_core.c                       | 7 ++
 4 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kdump/kdump.rst b/Documentation/admin-guide/kdump/kdump.rst
index 2da65fef2a1c..e55cdc404c6b 100644
--- a/Documentation/admin-guide/kdump/kdump.rst
+++ b/Documentation/admin-guide/kdump/kdump.rst
@@ -285,7 +285,8 @@ This would mean:
     2) if the RAM size is between 512M and 2G (exclusive), then reserve 64M
     3) if the RAM size is larger than 2G, then reserve 128M
 
-
+Or you can use crashkernel=auto to choose the crash kernel memory size
+based on the recommended configuration set for each arch.
 
 Boot into System Kernel
 ===
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 7d4e523646c3..aa2099465458 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -736,6 +736,12 @@
 			a memory unit (amount[KMG]). See also
 			Documentation/admin-guide/kdump/kdump.rst for an example.
 
+	crashkernel=auto
+			[KNL] This parameter will set the reserved memory for
+			the crash kernel based on the value of the CRASH_AUTO_STR
+			that is the best effort estimation for each arch. See also
+			arch/Kconfig for further details.
+
 	crashkernel=size[KMG],high
 			[KNL, X86-64] range could be above 4G. Allow kernel
 			to allocate physical memory region from top, so could
diff --git a/arch/Kconfig b/arch/Kconfig
index af14a567b493..f87c88ffa2f8 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -14,6 +14,30 @@ menu "General architecture-dependent options"
 config CRASH_CORE
 	bool
 
+if CRASH_CORE
+
+config CRASH_AUTO_STR
+	string "Memory reserved for crash kernel"
+	depends on CRASH_CORE
+	default "1G-64G:128M,64G-1T:256M,1T-:512M"
+	help
+	  This configures the reserved memory dependent
+	  on the value of System RAM. The syntax is:
+	  crashkernel=<range1>:<size1>[,<range2>:<size2>,...][@offset]
+	              range=start-[end]
+
+	  For example:
+	      crashkernel=512M-2G:64M,2G-:128M
+
+	  This would mean:
+
+	      1) if the RAM is smaller than 512M, then don't reserve anything
+	         (this is the "rescue" case)
+	      2) if the RAM size is between 512M and 2G (exclusive), then reserve 64M
+	      3) if the RAM size is larger than 2G, then reserve 128M
+
+endif # CRASH_CORE
+
 config KEXEC_CORE
 	select CRASH_CORE
 	bool
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 106e4500fd53..ab0a2b4b1ffa 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -7,6 +7,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -250,6 +251,12 @@ static int __init __parse_crashkernel(char *cmdline,
 	if (suffix)
 		return parse_crashkernel_suffix(ck_cmdline, crash_size,
 				suffix);
+#ifdef CONFIG_CRASH_AUTO_STR
+	if (strncmp(ck_cmdline, "auto", 4) == 0) {
+		ck_cmdline = CONFIG_CRASH_AUTO_STR;
+		pr_info("Using crashkernel=auto, the size chosen is a best effort estimation.\n");
+	}
+#endif
 	/*
 	 * if the commandline contains a ':', then that's the extended
 	 * syntax -- if not, it must be the classic syntax

Hello.

Ping.

Can we get this reviewed and staged?

Thank you.
John.
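To make the effect of crashkernel=auto concrete, here is a small user-space sketch of how a range string such as the proposed CRASH_AUTO_STR default selects a size for a given amount of RAM. It is not the kernel parser: parse_mem(), crash_size_for_ram() and resolve_crashkernel() are hypothetical names, the suffix handling only imitates the K/M/G/T convention, a trailing @offset is ignored, and ranges are assumed to be [start, end) with an empty end meaning "no upper bound".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Parse "<number>[K|M|G|T]" and report where parsing stopped. */
static unsigned long long parse_mem(const char *s, const char **end)
{
	char *p;
	unsigned long long v = strtoull(s, &p, 0);

	switch (*p) {
	case 'T': case 't': v <<= 10; /* fall through */
	case 'G': case 'g': v <<= 10; /* fall through */
	case 'M': case 'm': v <<= 10; /* fall through */
	case 'K': case 'k': v <<= 10; p++; break;
	default: break;
	}
	*end = p;
	return v;
}

/* Map "auto" to the configured range string, as the patch does. */
const char *resolve_crashkernel(const char *arg, const char *auto_str)
{
	return strncmp(arg, "auto", 4) == 0 ? auto_str : arg;
}

/*
 * Return the size selected by a spec like "512M-2G:64M,2G-:128M" for a
 * machine with `ram` bytes, or 0 if no range matches or the spec is
 * malformed.
 */
unsigned long long crash_size_for_ram(const char *spec, unsigned long long ram)
{
	const char *cur = spec;

	assert(spec != NULL);
	while (*cur) {
		const char *p;
		unsigned long long start, end = ~0ULL, size;

		start = parse_mem(cur, &p);
		if (p == cur || *p != '-')
			return 0;
		cur = p + 1;

		if (*cur != ':') {              /* explicit upper bound */
			end = parse_mem(cur, &p);
			if (p == cur)
				return 0;
			cur = p;
		}
		if (*cur != ':')
			return 0;
		cur++;

		size = parse_mem(cur, &p);
		if (p == cur)
			return 0;
		cur = p;

		if (ram >= start && ram < end)
			return size;
		if (*cur != ',')
			break;
		cur++;
	}
	return 0;
}
```

With the default string from the patch, a 32G machine would fall in the first range and a 2T machine in the last one; a machine below 1G matches no range and gets no reservation in this model.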
Re: [PATCH v3 1/1] kernel/crash_core: Add crashkernel=auto for vmcore creation
On 2/17/21 8:02 PM, Baoquan He wrote:

On 02/11/21 at 10:08am, Saeed Mirzamohammadi wrote:

This adds crashkernel=auto feature to configure reserved memory for
vmcore creation. CONFIG_CRASH_AUTO_STR is defined to be set for
different kernel distributions and different archs based on their
needs.

Signed-off-by: Saeed Mirzamohammadi
Signed-off-by: John Donnelly
Tested-by: John Donnelly
---
 Documentation/admin-guide/kdump/kdump.rst | 3 ++-
 .../admin-guide/kernel-parameters.txt     | 6 +
 arch/Kconfig                              | 24 +++
 kernel/crash_core.c                       | 7 ++
 4 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kdump/kdump.rst b/Documentation/admin-guide/kdump/kdump.rst
index 2da65fef2a1c..e55cdc404c6b 100644
--- a/Documentation/admin-guide/kdump/kdump.rst
+++ b/Documentation/admin-guide/kdump/kdump.rst
@@ -285,7 +285,8 @@ This would mean:
     2) if the RAM size is between 512M and 2G (exclusive), then reserve 64M
     3) if the RAM size is larger than 2G, then reserve 128M
 
-
+Or you can use crashkernel=auto to choose the crash kernel memory size
+based on the recommended configuration set for each arch.
 
 Boot into System Kernel
 ===
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 7d4e523646c3..aa2099465458 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -736,6 +736,12 @@
 			a memory unit (amount[KMG]). See also
 			Documentation/admin-guide/kdump/kdump.rst for an example.
 
+	crashkernel=auto
+			[KNL] This parameter will set the reserved memory for
+			the crash kernel based on the value of the CRASH_AUTO_STR
+			that is the best effort estimation for each arch. See also
+			arch/Kconfig for further details.
+
 	crashkernel=size[KMG],high
 			[KNL, X86-64] range could be above 4G. Allow kernel
 			to allocate physical memory region from top, so could
diff --git a/arch/Kconfig b/arch/Kconfig
index af14a567b493..f87c88ffa2f8 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -14,6 +14,30 @@ menu "General architecture-dependent options"
 config CRASH_CORE
 	bool
 
+if CRASH_CORE
+
+config CRASH_AUTO_STR
+	string "Memory reserved for crash kernel"
+	depends on CRASH_CORE
+	default "1G-64G:128M,64G-1T:256M,1T-:512M"
+	help
+	  This configures the reserved memory dependent
+	  on the value of System RAM. The syntax is:
+	  crashkernel=<range1>:<size1>[,<range2>:<size2>,...][@offset]
+	              range=start-[end]
+
+	  For example:
+	      crashkernel=512M-2G:64M,2G-:128M
+
+	  This would mean:
+
+	      1) if the RAM is smaller than 512M, then don't reserve anything
+	         (this is the "rescue" case)
+	      2) if the RAM size is between 512M and 2G (exclusive), then reserve 64M
+	      3) if the RAM size is larger than 2G, then reserve 128M
+
+endif # CRASH_CORE

Wondering if this CRASH_CORE ifdeffery is a little redundant here, since the CRASH_CORE dependency has already been added.

Apart from this, I like this patch. As we discussed in private threads, we can try to push it into mainline and continue improving later.

Hi,

Are we good to move forward with this and apply it now? Dave Young acked it.

Thank you,
John.

(Note: I am currently not on any vger.kernel.org dlist, so please cc me.)

+
 config KEXEC_CORE
 	select CRASH_CORE
 	bool
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 106e4500fd53..ab0a2b4b1ffa 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -7,6 +7,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -250,6 +251,12 @@ static int __init __parse_crashkernel(char *cmdline,
 	if (suffix)
 		return parse_crashkernel_suffix(ck_cmdline, crash_size,
 				suffix);
+#ifdef CONFIG_CRASH_AUTO_STR
+	if (strncmp(ck_cmdline, "auto", 4) == 0) {
+		ck_cmdline = CONFIG_CRASH_AUTO_STR;
+		pr_info("Using crashkernel=auto, the size chosen is a best effort estimation.\n");
+	}
+#endif
 	/*
 	 * if the commandline contains a ':', then that's the extended
 	 * syntax -- if not, it must be the classic syntax
--
2.27.0
Re: [PATCH v4 1/1] kernel/crash_core: Add crashkernel=auto for vmcore creation
On 2/25/21 6:38 PM, Dave Young wrote:

On 02/23/21 at 09:41am, Saeed Mirzamohammadi wrote:

This adds crashkernel=auto feature to configure reserved memory for
vmcore creation. CONFIG_CRASH_AUTO_STR is defined to be set for
different kernel distributions and different archs based on their
needs.

Signed-off-by: Saeed Mirzamohammadi
Signed-off-by: John Donnelly
Tested-by: John Donnelly
---
 Documentation/admin-guide/kdump/kdump.rst | 3 ++-
 .../admin-guide/kernel-parameters.txt     | 6 ++
 arch/Kconfig                              | 20 +++
 kernel/crash_core.c                       | 7 +++
 4 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kdump/kdump.rst b/Documentation/admin-guide/kdump/kdump.rst
index 75a9dd98e76e..ae030111e22a 100644
--- a/Documentation/admin-guide/kdump/kdump.rst
+++ b/Documentation/admin-guide/kdump/kdump.rst
@@ -285,7 +285,8 @@ This would mean:
     2) if the RAM size is between 512M and 2G (exclusive), then reserve 64M
     3) if the RAM size is larger than 2G, then reserve 128M
 
-
+Or you can use crashkernel=auto to choose the crash kernel memory size
+based on the recommended configuration set for each arch.
 
 Boot into System Kernel
 ===
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 9e3cdb271d06..a5deda5c85fe 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -747,6 +747,12 @@
 			a memory unit (amount[KMG]). See also
 			Documentation/admin-guide/kdump/kdump.rst for an example.
 
+	crashkernel=auto
+			[KNL] This parameter will set the reserved memory for
+			the crash kernel based on the value of the CRASH_AUTO_STR
+			that is the best effort estimation for each arch. See also
+			arch/Kconfig for further details.
+
 	crashkernel=size[KMG],high
 			[KNL, X86-64] range could be above 4G. Allow kernel
 			to allocate physical memory region from top, so could
diff --git a/arch/Kconfig b/arch/Kconfig
index 24862d15f3a3..23d047548772 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -14,6 +14,26 @@ menu "General architecture-dependent options"
 config CRASH_CORE
 	bool
 
+config CRASH_AUTO_STR
+	string "Memory reserved for crash kernel"
+	depends on CRASH_CORE
+	default "1G-64G:128M,64G-1T:256M,1T-:512M"
+	help
+	  This configures the reserved memory dependent
+	  on the value of System RAM. The syntax is:
+	  crashkernel=<range1>:<size1>[,<range2>:<size2>,...][@offset]
+	              range=start-[end]
+
+	  For example:
+	      crashkernel=512M-2G:64M,2G-:128M
+
+	  This would mean:
+
+	      1) if the RAM is smaller than 512M, then don't reserve anything
+	         (this is the "rescue" case)
+	      2) if the RAM size is between 512M and 2G (exclusive), then reserve 64M
+	      3) if the RAM size is larger than 2G, then reserve 128M
+
 config KEXEC_CORE
 	select CRASH_CORE
 	bool
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 825284baaf46..90f9e4bb6704 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -7,6 +7,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -250,6 +251,12 @@ static int __init __parse_crashkernel(char *cmdline,
 	if (suffix)
 		return parse_crashkernel_suffix(ck_cmdline, crash_size,
 				suffix);
+#ifdef CONFIG_CRASH_AUTO_STR
+	if (strncmp(ck_cmdline, "auto", 4) == 0) {
+		ck_cmdline = CONFIG_CRASH_AUTO_STR;
+		pr_info("Using crashkernel=auto, the size chosen is a best effort estimation.\n");
+	}
+#endif
 	/*
 	 * if the commandline contains a ':', then that's the extended
 	 * syntax -- if not, it must be the classic syntax
--
2.27.0

Acked-by: Dave Young

Thanks
Dave

Hi,

Thank you.

When can we expect this to be applied in a future build?
Re: [PATCH v16 00/11] support reserving crashkernel above 4G on arm64 kdump
On 11/23/21 6:46 AM, Zhen Lei wrote:

There are the following issues in arm64 kdump:
1. We use crashkernel=X to reserve crashkernel below 4G, which will fail when there is not enough low memory.
2. If the crashkernel is reserved above 4G, the crash dump kernel will fail to boot because there is no low memory available for allocation.

To solve these issues, change the behavior of crashkernel=X. crashkernel=X tries low allocation in the DMA zone and falls back to high allocation if it fails.

We can also use "crashkernel=X,high" to select a high region above the DMA zone, which also tries to allocate at least 256M of low memory in the DMA zone automatically, and "crashkernel=Y,low" can be used to allocate low memory of a specified size.

When reserving crashkernel in high memory, some low memory is reserved for crash dump kernel devices. So there may be two regions reserved for the crash dump kernel. To distinguish it from the high region and to avoid affecting existing kexec-tools, rename the low region as "Crash kernel (low)", and pass the low region by reusing the DT property "linux,usable-memory-range". We made the low memory region the last range of "linux,usable-memory-range" to keep compatibility with existing user-space and older kdump kernels.

Besides, we need to modify kexec-tools:
arm64: support more than one crash kernel regions (see [1])

Another update is the documentation of the DT property 'linux,usable-memory-range':
schemas: update 'linux,usable-memory-range' node schema (see [2])

This patchset contains the following 11 patches:
0001-0004 are some x86 cleanups which prepare for making functions reserve_crashkernel[_low]() generic.
0005 makes functions reserve_crashkernel[_low]() generic.
0006-0008 reimplement arm64 crashkernel=X.
0009-0010 add memory for devices by DT property linux,usable-memory-range.
0011 updates the doc.

Changes since [v15]
- Aggregate the processing of "linux,usable-memory-range" into one function. Only patch 9-10 have been updated.

Changes since [v14]
- Restore the requirement that the CrashKernel memory regions on X86 only require 1 MiB alignment.
- Combine patches 5 and 6 in v14 into one. The compilation warning fixed by patch 6 was introduced by patch 5 in v14.
- As with crashk_res, crashk_low_res is also processed by crash_exclude_mem_range() in patch 7.
- Because commit b261dba2fdb2 ("arm64: kdump: Remove custom linux,usable-memory-range handling") has removed the architecture-specific code, extend the property "linux,usable-memory-range" in the platform-agnostic FDT core code. See patch 9.
- Discard the x86 description update in the document, because the description has been updated by commit b1f4c363666c ("Documentation: kdump: update kdump guide").
- Change "arm64" to "ARM64" in Doc.

Changes since [v13]
- Rebased on top of 5.11-rc5.
- Introduce config CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL. Since the reserve_crashkernel[_low]() implementations are quite similar on other architectures, have CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL in arch/Kconfig and select it from X86 and ARM64.
- Some minor cleanup.

Changes since [v12]
- Rebased on top of 5.10-rc1.
- Keep CRASH_ALIGN as 16M, as suggested by Dave.
- Drop patch "kdump: add threshold for the required memory".
- Add Tested-by from John.

Changes since [v11]
- Rebased on top of 5.9-rc4.
- Make the function reserve_crashkernel() of x86 generic. As suggested by Catalin, make the function reserve_crashkernel() of x86 generic, and have arm64 use the generic version to reimplement crashkernel=X.

Changes since [v10]
- Reimplement crashkernel=X as suggested by Catalin. Many thanks to Catalin.

Changes since [v9]
- Patch 1: add Acked-by from Dave.
- Update patch 5 according to Dave's comments.
- Update chosen schema.

Changes since [v8]
- Reuse DT property "linux,usable-memory-range". As suggested by Rob, reuse DT property "linux,usable-memory-range" to pass the low memory region.
- Fix kdump broken with ZONE_DMA reintroduced.
- Update chosen schema.

Changes since [v7]
- Move x86 CRASH_ALIGN to 2M. As suggested by Dave, and after some testing, move x86 CRASH_ALIGN to 2M.
- Update Documentation/devicetree/bindings/chosen.txt. Add corresponding documentation to Documentation/devicetree/bindings/chosen.txt, as suggested by Arnd.
- Add Tested-by from John and pk.

Changes since [v6]
- Fix build errors reported by kbuild test robot.

Changes since [v5]
- Move reserve_crashkernel_low() into kernel/crash_core.c.
- Delete crashkernel=X,high.
- Modify crashkernel=X,low. If crashkernel=X,low is specified simultaneously, first reserve low memory of the specified size for crash kdump kernel devices, and then reserve memory above 4G. In addition, rename crashk_low_res as "Crash kernel (low)" for arm64, and then pass it to the crash dump kernel by DT property "linux,low-memory-range".
- Update Documentation/admin-guide/kdump/kdump.rst.

Changes since [v4]
- Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.

Changes since [v3]
- Add memblock_cap_memory_ra
Re: [PATCH v16 00/11] support reserving crashkernel above 4G on arm64 kdump
On 12/8/21 11:13 AM, Catalin Marinas wrote:

On Tue, Nov 23, 2021 at 08:46:35PM +0800, Zhen Lei wrote:

Chen Zhou (10):
  x86: kdump: replace the hard-coded alignment with macro CRASH_ALIGN
  x86: kdump: make the lower bound of crash kernel reservation consistent
  x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions reserve_crashkernel()
  x86: kdump: move xen_pv_domain() check and insert_resource() to setup_arch()
  x86: kdump: move reserve_crashkernel[_low]() into crash_core.c
  arm64: kdump: introduce some macros for crash kernel reservation
  arm64: kdump: reimplement crashkernel=X
  x86, arm64: Add ARCH_WANT_RESERVE_CRASH_KERNEL config
  of: fdt: Add memory for devices by DT property "linux,usable-memory-range"
  kdump: update Documentation about crashkernel

Zhen Lei (1):
  of: fdt: Aggregate the processing of "linux,usable-memory-range"

Apart from a minor comment I made on patch 8 and some comments from Rob that need addressing, the rest looks fine to me.

Ingo stated in the past that he's happy to ack the x86 changes as long as there's no functional change (and that's the case AFAICT). Ingo, does your conditional ack still stand?

In terms of merging, I'm happy to take it all through the arm64 tree with acks from the x86 maintainers. Alternatively, with the change I mentioned for patch 8, the first 5 patches could be queued via the tip tree on a stable branch and I can base the rest of the arm64 on top. Thomas, Ingo, Peter, any preference?

Thanks.

Hi,

If you look at the trend over the past year, some of the additional review requests came up because the submitter had to rebase to the next version.

Can we get this acked and placed in a build so others can test and start using it?

Thank you,
JD
Re: [PATCH v3 2/5] dma-pool: allow user to disable atomic pool
On 12/13/21 6:27 AM, Baoquan He wrote:

In the current code, three atomic memory pools are always created,
atomic_pool_kernel|dma|dma32, even though 'coherent_pool=0' is
specified in kernel command line. In fact, atomic pool is only
necessary when CONFIG_DMA_DIRECT_REMAP=y or mem_encrypt_active=y
which are needed on few ARCHes.

So change code to allow user to disable atomic pool by specifying
'coherent_pool=0'.

Meanwhile, update the relevant document in kernel-parameters.txt.

Signed-off-by: Baoquan He
Acked-by: John Donnelly
Tested-by: John Donnelly
---
 Documentation/admin-guide/kernel-parameters.txt | 3 ++-
 kernel/dma/pool.c                               | 7 +--
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index ec4d25e854a8..d7015309614b 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -664,7 +664,8 @@
 	coherent_pool=nn[KMG]	[ARM,KNL]
 			Sets the size of memory pool for coherent, atomic dma
-			allocations. Otherwise the default size will be scaled
+			allocations. A value of 0 disables the three atomic
+			memory pool. Otherwise the default size will be scaled
 			with memory capacity, while clamped between 128K and
 			1 << (PAGE_SHIFT + MAX_ORDER-1).
diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index 5f84e6cdb78e..5a85804b5beb 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -21,7 +21,7 @@ static struct gen_pool *atomic_pool_kernel __ro_after_init;
 static unsigned long pool_size_kernel;
 
 /* Size can be defined by the coherent_pool command line */
-static size_t atomic_pool_size;
+static unsigned long atomic_pool_size = -1;
 
 /* Dynamic background expansion when the atomic pool is near capacity */
 static struct work_struct atomic_pool_work;
@@ -188,11 +188,14 @@ static int __init dma_atomic_pool_init(void)
 {
 	int ret = 0;
 
+	if (!atomic_pool_size)
+		return 0;
+
 	/*
 	 * If coherent_pool was not used on the command line, default the pool
 	 * sizes to 128KB per 1GB of memory, min 128KB, max MAX_ORDER-1.
 	 */
-	if (!atomic_pool_size) {
+	if (atomic_pool_size == -1) {
 		unsigned long pages = totalram_pages() / (SZ_1G / SZ_128K);
 		pages = min_t(unsigned long, pages, MAX_ORDER_NR_PAGES);
 		atomic_pool_size = max_t(size_t, pages << PAGE_SHIFT, SZ_128K);
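The three-way meaning of atomic_pool_size after this patch (unset, zero, explicit) may be easier to see in isolation. The following is a user-space sketch only, assuming 4K pages (SKETCH_PAGE_SHIFT 12) and SKETCH_MAX_ORDER 11; those two values, the POOL_SIZE_UNSET name and the atomic_pool_bytes() helper are illustrative assumptions and are not taken from the patch.

```c
#include <assert.h>

/* Assumed values for illustration; real kernels derive these per arch. */
#define SKETCH_PAGE_SHIFT          12
#define SKETCH_MAX_ORDER           11
#define SKETCH_MAX_ORDER_NR_PAGES  (1UL << (SKETCH_MAX_ORDER - 1))
#define SKETCH_SZ_128K             (128UL << 10)
#define SKETCH_SZ_1G               (1UL << 30)

/* Stand-in for the patch's "-1" initial value: coherent_pool= not given. */
#define POOL_SIZE_UNSET            (~0UL)

/*
 * Return the per-pool size in bytes, or 0 when the pools are disabled.
 *   requested == 0               -> user passed coherent_pool=0, no pools
 *   requested == POOL_SIZE_UNSET -> scale with RAM: 128K per 1G, clamped
 *   anything else                -> use the requested size as is
 */
unsigned long atomic_pool_bytes(unsigned long requested,
				unsigned long totalram_pages)
{
	unsigned long pages, bytes;

	assert(totalram_pages > 0);

	if (requested == 0)
		return 0;
	if (requested != POOL_SIZE_UNSET)
		return requested;

	pages = totalram_pages / (SKETCH_SZ_1G / SKETCH_SZ_128K);
	if (pages > SKETCH_MAX_ORDER_NR_PAGES)
		pages = SKETCH_MAX_ORDER_NR_PAGES;

	bytes = pages << SKETCH_PAGE_SHIFT;
	return bytes < SKETCH_SZ_128K ? SKETCH_SZ_128K : bytes;
}
```

The point of the sentinel is that 0 can no longer double as "not specified", which is what made coherent_pool=0 indistinguishable from the default before the patch.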
Re: [PATCH v3 1/5] docs: kernel-parameters: Update to reflect the current default size of atomic pool
On 12/13/21 6:27 AM, Baoquan He wrote:

Since commit 1d659236fb43 ("dma-pool: scale the default DMA coherent
pool size with memory capacity"), the default size of the atomic pool
is scaled with system memory capacity. So update the document in
kernel-parameters.txt accordingly.

Signed-off-by: Baoquan He
Acked-by: John Donnelly
Tested-by: John Donnelly
---
 Documentation/admin-guide/kernel-parameters.txt | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 9725c546a0d4..ec4d25e854a8 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -664,7 +664,9 @@
 	coherent_pool=nn[KMG]	[ARM,KNL]
 			Sets the size of memory pool for coherent, atomic dma
-			allocations, by default set to 256K.
+			allocations. Otherwise the default size will be scaled
+			with memory capacity, while clamped between 128K and
+			1 << (PAGE_SHIFT + MAX_ORDER-1).
 
 	com20020=	[HW,NET] ARCnet - COM20020 chipset
 			Format:
Re: [PATCH v3 3/5] mm_zone: add function to check if managed dma zone exists
On 12/13/21 6:27 AM, Baoquan He wrote:

In some places the current kernel assumes that the DMA zone must have
managed pages if CONFIG_ZONE_DMA is enabled. However, this is not
always true. E.g. in the kdump kernel of x86_64, only the low 1M is
presented and locked down at a very early stage of boot, so that there
are no managed pages at all in the DMA zone. This exception will always
cause page allocation failure if a page is requested from the DMA zone.

Here add function has_managed_dma() and the relevant helper functions to
check if there's a DMA zone with managed pages. It will be used in later
patches.

Fixes: 6f599d84231f ("x86/kdump: Always reserve the low 1M when the crashkernel option is specified")
Cc: sta...@vger.kernel.org
Signed-off-by: Baoquan He
Acked-by: John Donnelly
Tested-by: John Donnelly
---
v2->v3:
 Rewrite has_managed_dma() in a simpler and more efficient way, as
 suggested by DavidH.

 include/linux/mmzone.h | 9 +
 mm/page_alloc.c        | 15 +++
 2 files changed, 24 insertions(+)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 58e744b78c2c..6e1b726e9adf 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1046,6 +1046,15 @@ static inline int is_highmem_idx(enum zone_type idx)
 #endif
 }
 
+#ifdef CONFIG_ZONE_DMA
+bool has_managed_dma(void);
+#else
+static inline bool has_managed_dma(void)
+{
+	return false;
+}
+#endif
+
 /**
  * is_highmem - helper function to quickly check if a struct zone is a
  *              highmem zone or not.  This is an attempt to keep references
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c5952749ad40..7c7a0b5de2ff 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -9460,3 +9460,18 @@ bool take_page_off_buddy(struct page *page)
 	return ret;
 }
 #endif
+
+#ifdef CONFIG_ZONE_DMA
+bool has_managed_dma(void)
+{
+	struct pglist_data *pgdat;
+
+	for_each_online_pgdat(pgdat) {
+		struct zone *zone = &pgdat->node_zones[ZONE_DMA];
+
+		if (managed_zone(zone))
+			return true;
+	}
+	return false;
+}
+#endif /* CONFIG_ZONE_DMA */
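As a rough illustration of what has_managed_dma() checks, the same loop can be written against a toy model of nodes and zones. Everything below is a user-space stand-in: struct sketch_node, struct sketch_zone and any_node_has_managed_dma() are hypothetical, and a zone counts as "managed" simply when its managed_pages counter is non-zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sketch_zone_type { SK_ZONE_DMA, SK_ZONE_DMA32, SK_ZONE_NORMAL, SK_NR_ZONES };

/* Toy zone: only the count of pages handed to the page allocator. */
struct sketch_zone {
	unsigned long managed_pages;
};

/* Toy NUMA node holding one zone of each type. */
struct sketch_node {
	struct sketch_zone zones[SK_NR_ZONES];
};

/*
 * True if at least one node has a DMA zone with managed pages. A kernel
 * built with a DMA zone can still return false here, e.g. the x86_64 kdump
 * case from the commit message where the low 1M is reserved early.
 */
bool any_node_has_managed_dma(const struct sketch_node *nodes, size_t nr_nodes)
{
	assert(nodes != NULL || nr_nodes == 0);

	for (size_t i = 0; i < nr_nodes; i++) {
		if (nodes[i].zones[SK_ZONE_DMA].managed_pages > 0)
			return true;
	}
	return false;
}
```

The distinction the patch relies on is between the zone being configured (a build-time fact) and the zone actually owning pages (a boot-time fact).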
Re: [PATCH v3 4/5] dma/pool: create dma atomic pool only if dma zone has managed pages
On 12/13/21 6:27 AM, Baoquan He wrote:

Currently three dma atomic pools are initialized as long as the relevant
kernel codes are built in. However, in the kdump kernel of x86_64, this
is not right when trying to create atomic_pool_dma, because there are no
managed pages in the DMA zone. In that case, the DMA zone only has the
low 1M of memory presented and locked down by the memblock allocator, so
no pages are added into the buddy allocator for the DMA zone. Please
check commit f1d4d47c5851 ("x86/setup: Always reserve the first 1M of
RAM").

Then in the kdump kernel of x86_64, it always prints the failure message
below:

DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
swapper/0: page allocation failure: order:5, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-0.rc5.20210611git929d931f2b40.42.fc35.x86_64 #1
Hardware name: Dell Inc. PowerEdge R910/0P658H, BIOS 2.12.0 06/04/2018
Call Trace:
 dump_stack+0x7f/0xa1
 warn_alloc.cold+0x72/0xd6
 ? _raw_spin_unlock_irq+0x24/0x40
 ? __alloc_pages_direct_compact+0x90/0x1b0
 __alloc_pages_slowpath.constprop.0+0xf29/0xf50
 ? __cond_resched+0x16/0x50
 ? prepare_alloc_pages.constprop.0+0x19d/0x1b0
 __alloc_pages+0x24d/0x2c0
 ? __dma_atomic_pool_init+0x93/0x93
 alloc_page_interleave+0x13/0xb0
 atomic_pool_expand+0x118/0x210
 ? __dma_atomic_pool_init+0x93/0x93
 __dma_atomic_pool_init+0x45/0x93
 dma_atomic_pool_init+0xdb/0x176
 do_one_initcall+0x67/0x320
 ? rcu_read_lock_sched_held+0x3f/0x80
 kernel_init_freeable+0x290/0x2dc
 ? rest_init+0x24f/0x24f
 kernel_init+0xa/0x111
 ret_from_fork+0x22/0x30
Mem-Info:
..
DMA: failed to allocate 128 KiB GFP_KERNEL|GFP_DMA pool for atomic allocation
DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations

Here, let's check if the DMA zone has managed pages, and create
atomic_pool_dma only if it does. Otherwise just skip it.

Fixes: 6f599d84231f ("x86/kdump: Always reserve the low 1M when the crashkernel option is specified")
Cc: sta...@vger.kernel.org
Signed-off-by: Baoquan He
Acked-by: John Donnelly
Tested-by: John Donnelly
Cc: Christoph Hellwig
Cc: Marek Szyprowski
Cc: Robin Murphy
Cc: io...@lists.linux-foundation.org
---
 kernel/dma/pool.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index 5a85804b5beb..00df3edd6c5d 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -206,7 +206,7 @@ static int __init dma_atomic_pool_init(void)
 						    GFP_KERNEL);
 	if (!atomic_pool_kernel)
 		ret = -ENOMEM;
-	if (IS_ENABLED(CONFIG_ZONE_DMA)) {
+	if (has_managed_dma()) {
 		atomic_pool_dma = __dma_atomic_pool_init(atomic_pool_size,
 						GFP_KERNEL | GFP_DMA);
 		if (!atomic_pool_dma)
@@ -229,7 +229,7 @@ static inline struct gen_pool *dma_guess_pool(struct gen_pool *prev, gfp_t gfp)
 	if (prev == NULL) {
 		if (IS_ENABLED(CONFIG_ZONE_DMA32) && (gfp & GFP_DMA32))
 			return atomic_pool_dma32;
-		if (IS_ENABLED(CONFIG_ZONE_DMA) && (gfp & GFP_DMA))
+		if (atomic_pool_dma && (gfp & GFP_DMA))
 			return atomic_pool_dma;
 		return atomic_pool_kernel;
 	}
Re: [PATCH v3 5/5] mm/slub: do not create dma-kmalloc if no managed pages in DMA zone
On 12/13/21 6:27 AM, Baoquan He wrote: Dma-kmalloc will be created as long as CONFIG_ZONE_DMA is enabled. However, it will fail if DMA zone has no managed pages. The failure can be seen in kdump kernel of x86_64 as below: CPU: 0 PID: 65 Comm: kworker/u2:1 Not tainted 5.14.0-rc2+ #9 Hardware name: Intel Corporation SandyBridge Platform/To be filled by O.E.M., BIOS RMLSDP.86I.R2.28.D690.1306271008 06/27/2013 Workqueue: events_unbound async_run_entry_fn Call Trace: dump_stack_lvl+0x57/0x72 warn_alloc.cold+0x72/0xd6 __alloc_pages_slowpath.constprop.0+0xf56/0xf70 __alloc_pages+0x23b/0x2b0 allocate_slab+0x406/0x630 ___slab_alloc+0x4b1/0x7e0 ? sr_probe+0x200/0x600 ? lock_acquire+0xc4/0x2e0 ? fs_reclaim_acquire+0x4d/0xe0 ? lock_is_held_type+0xa7/0x120 ? sr_probe+0x200/0x600 ? __slab_alloc+0x67/0x90 __slab_alloc+0x67/0x90 ? sr_probe+0x200/0x600 ? sr_probe+0x200/0x600 kmem_cache_alloc_trace+0x259/0x270 sr_probe+0x200/0x600 .. bus_probe_device+0x9f/0xb0 device_add+0x3d2/0x970 .. __scsi_add_device+0xea/0x100 ata_scsi_scan_host+0x97/0x1d0 async_run_entry_fn+0x30/0x130 process_one_work+0x2b0/0x5c0 worker_thread+0x55/0x3c0 ? process_one_work+0x5c0/0x5c0 kthread+0x149/0x170 ? set_kthread_struct+0x40/0x40 ret_from_fork+0x22/0x30 Mem-Info: .. The above failure happened when calling kmalloc() to allocate buffer with GFP_DMA. It requests to allocate slab page from DMA zone while no managed pages in there. sr_probe() --> get_capabilities() --> buffer = kmalloc(512, GFP_KERNEL | GFP_DMA); The DMA zone should be checked if it has managed pages, then try to create dma-kmalloc. 
Fixes: 6f599d84231f ("x86/kdump: Always reserve the low 1M when the crashkernel option is specified") Cc: sta...@vger.kernel.org Signed-off-by: Baoquan He Acked-by: John Donnelly Tested-by: John Donnelly Cc: Christoph Lameter Cc: Pekka Enberg Cc: David Rientjes Cc: Joonsoo Kim Cc: Vlastimil Babka --- mm/slab_common.c | 9 + 1 file changed, 9 insertions(+) diff --git a/mm/slab_common.c b/mm/slab_common.c index e5d080a93009..ae4ef0f8903a 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -878,6 +878,9 @@ void __init create_kmalloc_caches(slab_flags_t flags) { int i; enum kmalloc_cache_type type; +#ifdef CONFIG_ZONE_DMA + bool managed_dma; +#endif /* * Including KMALLOC_CGROUP if CONFIG_MEMCG_KMEM defined @@ -905,10 +908,16 @@ void __init create_kmalloc_caches(slab_flags_t flags) slab_state = UP; #ifdef CONFIG_ZONE_DMA + managed_dma = has_managed_dma(); + for (i = 0; i <= KMALLOC_SHIFT_HIGH; i++) { struct kmem_cache *s = kmalloc_caches[KMALLOC_NORMAL][i]; if (s) { + if (!managed_dma) { + kmalloc_caches[KMALLOC_DMA][i] = kmalloc_caches[KMALLOC_NORMAL][i]; + continue; + } kmalloc_caches[KMALLOC_DMA][i] = create_kmalloc_cache( kmalloc_info[i].name[KMALLOC_DMA], kmalloc_info[i].size, ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
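The aliasing done in create_kmalloc_caches() above can be illustrated with a small userspace C model. It is a sketch under assumptions: cache_model, the four sizes and the model_* helpers are hypothetical and only mirror the shape of the patch, where the DMA slot of each size points at the normal cache when the DMA zone has no managed pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum cache_type { CACHE_NORMAL, CACHE_DMA, NR_CACHE_TYPES };
#define NR_SIZES 4

/* Hypothetical stand-in for struct kmem_cache. */
struct cache_model { int size; bool dma; };

static struct cache_model normal_caches[NR_SIZES] = {
	{ 64, false }, { 128, false }, { 256, false }, { 512, false },
};
static struct cache_model dma_caches[NR_SIZES];
static struct cache_model *caches[NR_CACHE_TYPES][NR_SIZES];

/* Build the table; alias DMA slots to normal caches if DMA is unusable. */
static void model_create_caches(bool managed_dma)
{
	for (int i = 0; i < NR_SIZES; i++) {
		struct cache_model *s = &normal_caches[i];

		caches[CACHE_NORMAL][i] = s;
		if (!managed_dma) {
			/* No managed pages in the DMA zone: reuse the normal cache. */
			caches[CACHE_DMA][i] = s;
			continue;
		}
		dma_caches[i].size = s->size;
		dma_caches[i].dma = true;
		caches[CACHE_DMA][i] = &dma_caches[i];
	}
}

/* Look up the cache for a type and size index. */
static struct cache_model *model_pick_cache(enum cache_type type, int idx)
{
	assert(idx >= 0 && idx < NR_SIZES);
	return caches[type][idx];
}
```

In this model a GFP_DMA request in the kdump kernel is simply served from the normal cache, which matches the intent of the `continue` branch in the hunk.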
Re: [PATCH v17 01/10] x86: kdump: replace the hard-coded alignment with macro CRASH_ALIGN
On 12/10/21 12:55 AM, Zhen Lei wrote:
From: Chen Zhou

Move CRASH_ALIGN to header asm/kexec.h for later use.

Suggested-by: Dave Young
Suggested-by: Baoquan He
Signed-off-by: Chen Zhou
Signed-off-by: Zhen Lei
Tested-by: John Donnelly
Tested-by: Dave Kleikamp
Acked-by: John Donnelly
---
 arch/x86/include/asm/kexec.h | 3 +++
 arch/x86/kernel/setup.c      | 3 ---
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 11b7c06e2828c30..3a22e65262aa70b 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -18,6 +18,9 @@

 # define KEXEC_CONTROL_CODE_MAX_SIZE	2048

+/* 16M alignment for crash kernel regions */
+#define CRASH_ALIGN		SZ_16M
+
 #ifndef __ASSEMBLY__

 #include
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 6a190c7f4d71b05..5cc60996eac56d6 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -392,9 +392,6 @@ static void __init memblock_x86_reserve_range_setup_data(void)

 #ifdef CONFIG_KEXEC_CORE

-/* 16M alignment for crash kernel regions */
-#define CRASH_ALIGN		SZ_16M
-
 /*
  * Keep the crash kernel below this limit.
  *
Re: [PATCH v17 03/10] x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions reserve_crashkernel()
On 12/10/21 12:55 AM, Zhen Lei wrote: From: Chen Zhou To make the functions reserve_crashkernel() as generic, replace some hard-coded numbers with macro CRASH_ADDR_LOW_MAX. Signed-off-by: Chen Zhou Signed-off-by: Zhen Lei Tested-by: John Donnelly Tested-by: Dave Kleikamp Acked-by: Baoquan He Acked-by: John Donnelly --- arch/x86/kernel/setup.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index 6424ee4f23da2cf..bb2a0973b98059e 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -489,8 +489,9 @@ static void __init reserve_crashkernel(void) if (!crash_base) { /* * Set CRASH_ADDR_LOW_MAX upper bound for crash memory, -* crashkernel=x,high reserves memory over 4G, also allocates -* 256M extra low memory for DMA buffers and swiotlb. +* crashkernel=x,high reserves memory over CRASH_ADDR_LOW_MAX, +* also allocates 256M extra low memory for DMA buffers +* and swiotlb. * But the extra memory is not required for all machines. * So try low memory first and fall back to high memory * unless "crashkernel=size[KMG],high" is specified. @@ -518,7 +519,7 @@ static void __init reserve_crashkernel(void) } } - if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) { + if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) { memblock_phys_free(crash_base, crash_size); return; } ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v17 02/10] x86: kdump: make the lower bound of crash kernel reservation consistent
On 12/10/21 12:55 AM, Zhen Lei wrote:
From: Chen Zhou

The lower bounds of the crash kernel reservation and the crash kernel low reservation are different; use the consistent value CRASH_ALIGN for both.

Suggested-by: Dave Young
Signed-off-by: Chen Zhou
Signed-off-by: Zhen Lei
Tested-by: John Donnelly
Tested-by: Dave Kleikamp
Acked-by: John Donnelly
---
 arch/x86/kernel/setup.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 5cc60996eac56d6..6424ee4f23da2cf 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -441,7 +441,8 @@ static int __init reserve_crashkernel_low(void)
 		return 0;
 	}

-	low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, CRASH_ADDR_LOW_MAX);
+	low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, CRASH_ALIGN,
+			CRASH_ADDR_LOW_MAX);
 	if (!low_base) {
 		pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
 		       (unsigned long)(low_size >> 20));
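The "try low memory first and fall back to high memory" behaviour described in the reserve_crashkernel() comment, together with CRASH_ALIGN as the lower bound, can be sketched in userspace C. This is only a model under assumptions: free_region, model_alloc_range() and model_reserve_crashkernel() are hypothetical names, memory is modelled as one free range, and the allocator is assumed to search top-down. The constants follow the x86_64 values quoted in this thread (16M, 4G, 64T).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* x86_64 values as quoted in the series; the MODEL_ names are hypothetical. */
#define MODEL_CRASH_ALIGN         (16ULL << 20)
#define MODEL_CRASH_ADDR_LOW_MAX  (4ULL << 30)
#define MODEL_CRASH_ADDR_HIGH_MAX (64ULL << 40)

/* A single free physical range [start, end); a stand-in for memblock. */
struct free_region { uint64_t start, end; };

/*
 * Top-down search for an aligned block of 'size' bytes inside both the
 * free region and the window [lo, hi). Returns 0 on failure.
 */
static uint64_t model_alloc_range(const struct free_region *r, uint64_t size,
				  uint64_t align, uint64_t lo, uint64_t hi)
{
	assert(r != NULL && size > 0 && align && (align & (align - 1)) == 0);

	uint64_t lowest = lo > r->start ? lo : r->start;
	uint64_t limit = hi < r->end ? hi : r->end;

	if (limit < size)
		return 0;

	uint64_t base = (limit - size) & ~(align - 1);

	return base >= lowest ? base : 0;
}

/* Try below the low limit first unless ",high" was given, then go high. */
static uint64_t model_reserve_crashkernel(const struct free_region *r,
					  uint64_t size, bool high)
{
	uint64_t base = 0;

	if (!high)
		base = model_alloc_range(r, size, MODEL_CRASH_ALIGN,
					 MODEL_CRASH_ALIGN,
					 MODEL_CRASH_ADDR_LOW_MAX);
	if (!base)
		base = model_alloc_range(r, size, MODEL_CRASH_ALIGN,
					 MODEL_CRASH_ALIGN,
					 MODEL_CRASH_ADDR_HIGH_MAX);
	return base;
}
```

Because the lower bound is CRASH_ALIGN rather than 0, a return value of 0 can only mean failure in this model, which is one practical reason to keep both reservations on the same non-zero lower bound.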
Re: [PATCH v17 05/10] x86: kdump: move reserve_crashkernel[_low]() into crash_core.c
On 12/10/21 12:55 AM, Zhen Lei wrote: From: Chen Zhou Make the functions reserve_crashkernel[_low]() as generic. Since reserve_crashkernel[_low]() implementations are quite similar on other architectures as well, we can have more users of this later. So have CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL in arch/Kconfig and select this by X86. Signed-off-by: Chen Zhou Signed-off-by: Zhen Lei Tested-by: John Donnelly Tested-by: Dave Kleikamp Acked-by: John Donnelly --- arch/Kconfig | 3 + arch/x86/Kconfig | 2 + arch/x86/include/asm/elf.h | 3 + arch/x86/include/asm/kexec.h | 28 ++- arch/x86/kernel/setup.c | 143 +--- include/linux/crash_core.h | 3 + include/linux/kexec.h| 2 - kernel/crash_core.c | 156 +++ kernel/kexec_core.c | 17 9 files changed, 194 insertions(+), 163 deletions(-) diff --git a/arch/Kconfig b/arch/Kconfig index d3c4ab249e9c275..7bdb32c41985dc5 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -24,6 +24,9 @@ config KEXEC_ELF config HAVE_IMA_KEXEC bool +config ARCH_WANT_RESERVE_CRASH_KERNEL + bool + config SET_FS bool diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 5c2ccb85f2efb86..bd78ed8193079b9 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -12,6 +12,7 @@ config X86_32 depends on !64BIT # Options that are inherently 32-bit kernel only: select ARCH_WANT_IPC_PARSE_VERSION + select ARCH_WANT_RESERVE_CRASH_KERNEL if KEXEC_CORE select CLKSRC_I8253 select CLONE_BACKWARDS select GENERIC_VDSO_32 @@ -28,6 +29,7 @@ config X86_64 select ARCH_HAS_GIGANTIC_PAGE select ARCH_SUPPORTS_INT128 if CC_HAS_INT128 select ARCH_USE_CMPXCHG_LOCKREF + select ARCH_WANT_RESERVE_CRASH_KERNEL if KEXEC_CORE select HAVE_ARCH_SOFT_DIRTY select MODULES_USE_ELF_RELA select NEED_DMA_MAP_STATE diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h index 29fea180a6658e8..7a6c36cff8331f5 100644 --- a/arch/x86/include/asm/elf.h +++ b/arch/x86/include/asm/elf.h @@ -94,6 +94,9 @@ extern unsigned int vdso32_enabled; #define elf_check_arch(x) elf_check_arch_ia32(x) +/* We can 
also handle crash dumps from 64 bit kernel. */ +# define vmcore_elf_check_arch_cross(x) ((x)->e_machine == EM_X86_64) + /* SVR4/i386 ABI (pages 3-31, 3-32) says that when the program starts %edx contains a pointer to a function which might be registered using `atexit'. This provides a mean for the dynamic linker to call DT_FINI functions for diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h index 3a22e65262aa70b..3ff38a1353a2b86 100644 --- a/arch/x86/include/asm/kexec.h +++ b/arch/x86/include/asm/kexec.h @@ -21,6 +21,27 @@ /* 16M alignment for crash kernel regions */ #define CRASH_ALIGN SZ_16M +/* + * Keep the crash kernel below this limit. + * + * Earlier 32-bits kernels would limit the kernel to the low 512 MB range + * due to mapping restrictions. + * + * 64-bit kdump kernels need to be restricted to be under 64 TB, which is + * the upper limit of system RAM in 4-level paging mode. Since the kdump + * jump could be from 5-level paging to 4-level paging, the jump will fail if + * the kernel is put above 64 TB, and during the 1st kernel bootup there's + * no good way to detect the paging mode of the target kernel which will be + * loaded for dumping. + */ +#ifdef CONFIG_X86_32 +# define CRASH_ADDR_LOW_MAX SZ_512M +# define CRASH_ADDR_HIGH_MAX SZ_512M +#else +# define CRASH_ADDR_LOW_MAX SZ_4G +# define CRASH_ADDR_HIGH_MAX SZ_64T +#endif + #ifndef __ASSEMBLY__ #include @@ -51,9 +72,6 @@ struct kimage; /* The native architecture */ # define KEXEC_ARCH KEXEC_ARCH_386 - -/* We can also handle crash dumps from 64 bit kernel. 
*/ -# define vmcore_elf_check_arch_cross(x) ((x)->e_machine == EM_X86_64) #else /* Maximum physical address we can use pages from */ # define KEXEC_SOURCE_MEMORY_LIMIT (MAXMEM-1) @@ -195,6 +213,10 @@ typedef void crash_vmclear_fn(void); extern crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss; extern void kdump_nmi_shootdown_cpus(void); +#ifdef CONFIG_KEXEC_CORE +extern void __init reserve_crashkernel(void); +#endif + #endif /* __ASSEMBLY__ */ #endif /* _ASM_X86_KEXEC_H */ diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index 7ae00716a208f82..5519baa7f4b964e 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -39,6 +39,7 @@ #include #include #include +#include #include #include #include @@ -386,147 +387,7 @@ static void __init memblock_x86_reserve_range_setup_data(void) } } -/* - * - Crashkernel reservation -- - */ - -#ifdef CONFIG_KEXEC_CORE - -/* - * Keep the crash kernel below this limit. - * -
Re: [PATCH v17 04/10] x86: kdump: move xen_pv_domain() check and insert_resource() to setup_arch()
On 12/10/21 12:55 AM, Zhen Lei wrote: From: Chen Zhou We will make the functions reserve_crashkernel() as generic, the xen_pv_domain() check in reserve_crashkernel() is relevant only to x86, the same as insert_resource() in reserve_crashkernel[_low](). So move xen_pv_domain() check and insert_resource() to setup_arch() to keep them in x86. Suggested-by: Mike Rapoport Signed-off-by: Chen Zhou Signed-off-by: Zhen Lei Tested-by: John Donnelly Tested-by: Dave Kleikamp Acked-by: Baoquan He Acked-by: John Donnelly --- arch/x86/kernel/setup.c | 19 +++ 1 file changed, 11 insertions(+), 8 deletions(-) diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index bb2a0973b98059e..7ae00716a208f82 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -456,7 +456,6 @@ static int __init reserve_crashkernel_low(void) crashk_low_res.start = low_base; crashk_low_res.end = low_base + low_size - 1; - insert_resource(&iomem_resource, &crashk_low_res); #endif return 0; } @@ -480,11 +479,6 @@ static void __init reserve_crashkernel(void) high = true; } - if (xen_pv_domain()) { - pr_info("Ignoring crashkernel for a Xen PV domain\n"); - return; - } - /* 0 means: find the address automatically */ if (!crash_base) { /* @@ -531,7 +525,6 @@ static void __init reserve_crashkernel(void) crashk_res.start = crash_base; crashk_res.end = crash_base + crash_size - 1; - insert_resource(&iomem_resource, &crashk_res); } #else static void __init reserve_crashkernel(void) @@ -1143,7 +1136,17 @@ void __init setup_arch(char **cmdline_p) * Reserve memory for crash kernel after SRAT is parsed so that it * won't consume hotpluggable memory. 
*/ - reserve_crashkernel(); + if (xen_pv_domain()) + pr_info("Ignoring crashkernel for a Xen PV domain\n"); + else { + reserve_crashkernel(); +#ifdef CONFIG_KEXEC_CORE + if (crashk_res.end > crashk_res.start) + insert_resource(&iomem_resource, &crashk_res); + if (crashk_low_res.end > crashk_low_res.start) + insert_resource(&iomem_resource, &crashk_low_res); +#endif + } memblock_find_dma_reserve(); ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v17 07/10] arm64: kdump: reimplement crashkernel=X
On 12/10/21 12:55 AM, Zhen Lei wrote: From: Chen Zhou There are following issues in arm64 kdump: 1. We use crashkernel=X to reserve crashkernel below 4G, which will fail when there is no enough low memory. 2. If reserving crashkernel above 4G, in this case, crash dump kernel will boot failure because there is no low memory available for allocation. To solve these issues, change the behavior of crashkernel=X and introduce crashkernel=X,[high,low]. crashkernel=X tries low allocation in DMA zone, and fall back to high allocation if it fails. We can also use "crashkernel=X,high" to select a region above DMA zone, which also tries to allocate at least 256M in DMA zone automatically. "crashkernel=Y,low" can be used to allocate specified size low memory. Another minor change, there may be two regions reserved for crash dump kernel, in order to distinct from the high region and make no effect to the use of existing kexec-tools, rename the low region as "Crash kernel (low)". Signed-off-by: Chen Zhou Signed-off-by: Zhen Lei Tested-by: John Donnelly Tested-by: Dave Kleikamp Acked-by: John Donnelly --- arch/arm64/Kconfig | 1 + arch/arm64/include/asm/kexec.h | 4 ++ arch/arm64/kernel/machine_kexec_file.c | 12 +- arch/arm64/kernel/setup.c | 13 +- arch/arm64/mm/init.c | 59 +- 5 files changed, 38 insertions(+), 51 deletions(-) diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index c4207cf9bb17ffb..4b99efa36da3793 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -95,6 +95,7 @@ config ARM64 select ARCH_WANT_FRAME_POINTERS select ARCH_WANT_HUGE_PMD_SHARE if ARM64_4K_PAGES || (ARM64_16K_PAGES && !ARM64_VA_BITS_36) select ARCH_WANT_LD_ORPHAN_WARN + select ARCH_WANT_RESERVE_CRASH_KERNEL if KEXEC_CORE select ARCH_WANTS_NO_INSTR select ARCH_HAS_UBSAN_SANITIZE_ALL select ARM_AMBA diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h index 1b9edc69f0244ca..3bde0079925d771 100644 --- a/arch/arm64/include/asm/kexec.h +++ b/arch/arm64/include/asm/kexec.h @@ 
-96,6 +96,10 @@ static inline void crash_prepare_suspend(void) {} static inline void crash_post_resume(void) {} #endif +#ifdef CONFIG_KEXEC_CORE +extern void __init reserve_crashkernel(void); +#endif + #if defined(CONFIG_KEXEC_CORE) void cpu_soft_restart(unsigned long el2_switch, unsigned long entry, unsigned long arg0, unsigned long arg1, diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c index 63634b4d72c158f..6f3fa059ca4e816 100644 --- a/arch/arm64/kernel/machine_kexec_file.c +++ b/arch/arm64/kernel/machine_kexec_file.c @@ -65,10 +65,18 @@ static int prepare_elf_headers(void **addr, unsigned long *sz) /* Exclude crashkernel region */ ret = crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end); + if (ret) + goto out; + + if (crashk_low_res.end) { + ret = crash_exclude_mem_range(cmem, crashk_low_res.start, crashk_low_res.end); + if (ret) + goto out; + } - if (!ret) - ret = crash_prepare_elf64_headers(cmem, true, addr, sz); + ret = crash_prepare_elf64_headers(cmem, true, addr, sz); +out: kfree(cmem); return ret; } diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c index be5f85b0a24de69..4bb2e55366be64d 100644 --- a/arch/arm64/kernel/setup.c +++ b/arch/arm64/kernel/setup.c @@ -248,7 +248,18 @@ static void __init request_standard_resources(void) kernel_data.end <= res->end) request_resource(res, &kernel_data); #ifdef CONFIG_KEXEC_CORE - /* Userspace will find "Crash kernel" region in /proc/iomem. */ + /* +* Userspace will find "Crash kernel" or "Crash kernel (low)" +* region in /proc/iomem. +* In order to distinct from the high region and make no effect +* to the use of existing kexec-tools, rename the low region as +* "Crash kernel (low)". 
+*/ + if (crashk_low_res.end && crashk_low_res.start >= res->start && + crashk_low_res.end <= res->end) { + crashk_low_res.name = "Crash kernel (low)"; + request_resource(res, &crashk_low_res); + } if (crashk_res.end && crashk_res.start >= res->start && crashk_res.end <= res->end) request_resource(res, &crashk_res); diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index be4595dc7459115..85c83e4eff2b6c4 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -36,6 +36,7 @@ #include #include #include +#include #include #include #includ
Re: [PATCH v17 06/10] arm64: kdump: introduce some macros for crash kernel reservation
On 12/10/21 12:55 AM, Zhen Lei wrote:
From: Chen Zhou

Introduce macro CRASH_ALIGN for alignment, macro CRASH_ADDR_LOW_MAX for the upper bound of low crash memory, and macro CRASH_ADDR_HIGH_MAX for the upper bound of high crash memory, and use these macros in place of the hard-coded values. In addition, to stay consistent with x86, use CRASH_ALIGN as the lower bound of the crash kernel reservation.

Signed-off-by: Chen Zhou
Signed-off-by: Zhen Lei
Tested-by: John Donnelly
Tested-by: Dave Kleikamp
Acked-by: John Donnelly
---
 arch/arm64/include/asm/kexec.h | 6 ++
 arch/arm64/mm/init.c           | 4 ++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index 9839bfc163d7147..1b9edc69f0244ca 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -25,6 +25,12 @@

 #define KEXEC_ARCH KEXEC_ARCH_AARCH64

+/* 2M alignment for crash kernel regions */
+#define CRASH_ALIGN		SZ_2M
+
+#define CRASH_ADDR_LOW_MAX	arm64_dma_phys_limit
+#define CRASH_ADDR_HIGH_MAX	MEMBLOCK_ALLOC_ACCESSIBLE
+
 #ifndef __ASSEMBLY__

 /**
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index a8834434af99ae0..be4595dc7459115 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -75,7 +75,7 @@ phys_addr_t arm64_dma_phys_limit __ro_after_init;
 static void __init reserve_crashkernel(void)
 {
 	unsigned long long crash_base, crash_size;
-	unsigned long long crash_max = arm64_dma_phys_limit;
+	unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
 	int ret;

 	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
@@ -91,7 +91,7 @@ static void __init reserve_crashkernel(void)
 		crash_max = crash_base + crash_size;

 	/* Current arm64 boot protocol requires 2MB alignment */
-	crash_base = memblock_phys_alloc_range(crash_size, SZ_2M,
+	crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
 					       crash_base, crash_max);
 	if (!crash_base) {
 		pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
Re: [PATCH v17 09/10] of: fdt: Add memory for devices by DT property "linux,usable-memory-range"
On 12/10/21 12:55 AM, Zhen Lei wrote: From: Chen Zhou When reserving crashkernel in high memory, some low memory is reserved for crash dump kernel devices and never mapped by the first kernel. This memory range is advertised to crash dump kernel via DT property under /chosen, linux,usable-memory-range = We reused the DT property linux,usable-memory-range and made the low memory region as the second range "BASE2 SIZE2", which keeps compatibility with existing user-space and older kdump kernels. Crash dump kernel reads this property at boot time and call memblock_add() to add the low memory region after memblock_cap_memory_range() has been called. Signed-off-by: Chen Zhou Signed-off-by: Zhen Lei Tested-by: Dave Kleikamp Acked-by: John Donnelly --- drivers/of/fdt.c | 33 +++-- 1 file changed, 23 insertions(+), 10 deletions(-) diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c index 37b477a51175359..f7b72fa773250ad 100644 --- a/drivers/of/fdt.c +++ b/drivers/of/fdt.c @@ -967,6 +967,15 @@ static void __init early_init_dt_check_for_elfcorehdr(unsigned long node) static unsigned long chosen_node_offset = -FDT_ERR_NOTFOUND; +/* + * The main usage of linux,usable-memory-range is for crash dump kernel. + * Originally, the number of usable-memory regions is one. Now there may + * be two regions, low region and high region. + * To make compatibility with existing user-space and older kdump, the low + * region is always the last range of linux,usable-memory-range if exist. 
+ */ +#define MAX_USABLE_RANGES 2 + /** * early_init_dt_check_for_usable_mem_range - Decode usable memory range * location from flat tree @@ -974,10 +983,9 @@ static unsigned long chosen_node_offset = -FDT_ERR_NOTFOUND; */ static void __init early_init_dt_check_for_usable_mem_range(unsigned long node) { - const __be32 *prop; - int len; - phys_addr_t cap_mem_addr; - phys_addr_t cap_mem_size; + struct memblock_region rgn[MAX_USABLE_RANGES] = {0}; + const __be32 *prop, *endp; + int len, i; if ((long)node < 0) return; @@ -985,16 +993,21 @@ static void __init early_init_dt_check_for_usable_mem_range(unsigned long node) pr_debug("Looking for usable-memory-range property... "); prop = of_get_flat_dt_prop(node, "linux,usable-memory-range", &len); - if (!prop || (len < (dt_root_addr_cells + dt_root_size_cells))) + if (!prop || (len % (dt_root_addr_cells + dt_root_size_cells))) return; - cap_mem_addr = dt_mem_next_cell(dt_root_addr_cells, &prop); - cap_mem_size = dt_mem_next_cell(dt_root_size_cells, &prop); + endp = prop + (len / sizeof(__be32)); + for (i = 0; i < MAX_USABLE_RANGES && prop < endp; i++) { + rgn[i].base = dt_mem_next_cell(dt_root_addr_cells, &prop); + rgn[i].size = dt_mem_next_cell(dt_root_size_cells, &prop); - pr_debug("cap_mem_start=%pa cap_mem_size=%pa\n", &cap_mem_addr, -&cap_mem_size); + pr_debug("cap_mem_regions[%d]: base=%pa, size=%pa\n", +i, &rgn[i].base, &rgn[i].size); + } - memblock_cap_memory_range(cap_mem_addr, cap_mem_size); + memblock_cap_memory_range(rgn[0].base, rgn[0].size); + for (i = 1; i < MAX_USABLE_RANGES && rgn[i].size; i++) + memblock_add(rgn[i].base, rgn[i].size); } #ifdef CONFIG_SERIAL_EARLYCON ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
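The decoding loop in early_init_dt_check_for_usable_mem_range() can be tried out in userspace with a small C model. This is a sketch under assumptions, not the kernel implementation: range_model, be32_at(), next_cells() and model_parse_usable_ranges() are hypothetical names, the property is passed as raw big-endian bytes, and the length check is done in bytes throughout, which is an assumption about the intended validation rather than a copy of the check in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_USABLE_RANGES 2

/* Hypothetical stand-in for struct memblock_region. */
struct range_model { uint64_t base, size; };

/* Read one big-endian 32-bit cell from raw property bytes. */
static uint32_t be32_at(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Combine 'cells' consecutive cells into one value and advance *p. */
static uint64_t next_cells(const uint8_t **p, int cells)
{
	uint64_t v = 0;

	while (cells-- > 0) {
		v = (v << 32) | be32_at(*p);
		*p += 4;
	}
	return v;
}

/*
 * Decode up to MODEL_MAX_USABLE_RANGES (base, size) pairs.
 * Returns the number of ranges decoded, or 0 if the property is
 * missing or its length is not a whole number of entries.
 */
static int model_parse_usable_ranges(const uint8_t *prop, size_t len_bytes,
				     int addr_cells, int size_cells,
				     struct range_model out[MODEL_MAX_USABLE_RANGES])
{
	assert(addr_cells > 0 && addr_cells <= 2);
	assert(size_cells > 0 && size_cells <= 2);

	size_t entry = (size_t)(addr_cells + size_cells) * 4;

	if (!prop || len_bytes == 0 || len_bytes % entry)
		return 0;

	const uint8_t *end = prop + len_bytes;
	int n = 0;

	while (n < MODEL_MAX_USABLE_RANGES && prop < end) {
		out[n].base = next_cells(&prop, addr_cells);
		out[n].size = next_cells(&prop, size_cells);
		n++;
	}
	return n;
}
```

Following the patch, the first decoded range would be handed to memblock_cap_memory_range() and any later range with a non-zero size to memblock_add(); the model stops at decoding and leaves that step out.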
Re: [PATCH v17 08/10] of: fdt: Aggregate the processing of "linux,usable-memory-range"
On 12/10/21 12:55 AM, Zhen Lei wrote: Currently, we parse the "linux,usable-memory-range" property in early_init_dt_scan_chosen(), to obtain the specified memory range of the crash kernel. We then reserve the required memory after early_init_dt_scan_memory() has identified all available physical memory. Because the two pieces of code are separated far, the readability and maintainability are reduced. So bring them together. Suggested-by: Rob Herring Signed-off-by: Zhen Lei Tested-by: Dave Kleikamp Acked-by: John Donnelly --- drivers/of/fdt.c | 15 +++ 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c index bdca35284cebd56..37b477a51175359 100644 --- a/drivers/of/fdt.c +++ b/drivers/of/fdt.c @@ -965,8 +965,7 @@ static void __init early_init_dt_check_for_elfcorehdr(unsigned long node) elfcorehdr_addr, elfcorehdr_size); } -static phys_addr_t cap_mem_addr; -static phys_addr_t cap_mem_size; +static unsigned long chosen_node_offset = -FDT_ERR_NOTFOUND; /** * early_init_dt_check_for_usable_mem_range - Decode usable memory range @@ -977,6 +976,11 @@ static void __init early_init_dt_check_for_usable_mem_range(unsigned long node) { const __be32 *prop; int len; + phys_addr_t cap_mem_addr; + phys_addr_t cap_mem_size; + + if ((long)node < 0) + return; pr_debug("Looking for usable-memory-range property... 
"); @@ -989,6 +993,8 @@ static void __init early_init_dt_check_for_usable_mem_range(unsigned long node) pr_debug("cap_mem_start=%pa cap_mem_size=%pa\n", &cap_mem_addr, &cap_mem_size); + + memblock_cap_memory_range(cap_mem_addr, cap_mem_size); } #ifdef CONFIG_SERIAL_EARLYCON @@ -1137,9 +1143,10 @@ int __init early_init_dt_scan_chosen(unsigned long node, const char *uname, (strcmp(uname, "chosen") != 0 && strcmp(uname, "chosen@0") != 0)) return 0; + chosen_node_offset = node; + early_init_dt_check_for_initrd(node); early_init_dt_check_for_elfcorehdr(node); - early_init_dt_check_for_usable_mem_range(node); /* Retrieve command line */ p = of_get_flat_dt_prop(node, "bootargs", &l); @@ -1275,7 +1282,7 @@ void __init early_init_dt_scan_nodes(void) of_scan_flat_dt(early_init_dt_scan_memory, NULL); /* Handle linux,usable-memory-range property */ - memblock_cap_memory_range(cap_mem_addr, cap_mem_size); + early_init_dt_check_for_usable_mem_range(chosen_node_offset); } bool __init early_init_dt_scan(void *params) ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v17 10/10] kdump: update Documentation about crashkernel
On 12/10/21 12:55 AM, Zhen Lei wrote: From: Chen Zhou For arm64, the behavior of crashkernel=X has been changed: it now tries low allocation in DMA zone and falls back to high allocation if it fails. We can also use "crashkernel=X,high" to select a high region above DMA zone, which also tries to allocate at least 256M low memory in DMA zone automatically, and "crashkernel=Y,low" can be used to allocate low memory of a specified size. So update the Documentation. Signed-off-by: Chen Zhou Signed-off-by: Zhen Lei Acked-by: John Donnelly --- Documentation/admin-guide/kdump/kdump.rst | 11 +-- Documentation/admin-guide/kernel-parameters.txt | 11 +-- 2 files changed, 18 insertions(+), 4 deletions(-) diff --git a/Documentation/admin-guide/kdump/kdump.rst b/Documentation/admin-guide/kdump/kdump.rst index cb30ca3df27c9b2..d4c287044be0c70 100644 --- a/Documentation/admin-guide/kdump/kdump.rst +++ b/Documentation/admin-guide/kdump/kdump.rst @@ -361,8 +361,15 @@ Boot into System Kernel kernel will automatically locate the crash kernel image within the first 512MB of RAM if X is not given. - On arm64, use "crashkernel=Y[@X]". Note that the start address of - the kernel, X if explicitly specified, must be aligned to 2MiB (0x200000). + On arm64, use "crashkernel=X" to try low allocation in DMA zone and + fall back to high allocation if it fails. + We can also use "crashkernel=X,high" to select a high region above + DMA zone, which also tries to allocate at least 256M low memory in + DMA zone automatically. + "crashkernel=Y,low" can be used to allocate specified size low memory. + Use "crashkernel=Y@X" if you really have to reserve memory from + specified start address X. Note that the start address of the kernel, + X if explicitly specified, must be aligned to 2MiB (0x200000). 
Load the Dump-capture Kernel diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 9725c546a0d46db..91f3a8dc537d404 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -783,6 +783,9 @@ [KNL, X86-64] Select a region under 4G first, and fall back to reserve region above 4G when '@offset' hasn't been specified. + [KNL, ARM64] Try low allocation in DMA zone and fall back + to high allocation if it fails when '@offset' hasn't been + specified. See Documentation/admin-guide/kdump/kdump.rst for further details. crashkernel=range1:size1[,range2:size2,...][@offset] @@ -799,6 +802,8 @@ Otherwise memory region will be allocated below 4G, if available. It will be ignored if crashkernel=X is specified. + [KNL, ARM64] range in high memory. + Allow kernel to allocate physical memory region from top. crashkernel=size[KMG],low [KNL, X86-64] range under 4G. When crashkernel=X,high is passed, kernel could allocate physical memory region @@ -807,13 +812,15 @@ requires at least 64M+32K low memory, also enough extra low memory is needed to make sure DMA buffers for 32-bit devices won't run out. Kernel would try to allocate at - at least 256M below 4G automatically. + least 256M below 4G automatically. This one let user to specify own low range under 4G for second kernel instead. 0: to disable low allocation. It will be ignored when crashkernel=X,high is not used or memory reserved is below 4G. - + [KNL, ARM64] range in low memory. + This one let user to specify a low range in DMA zone for + crash dump kernel. cryptomgr.notests [KNL] Disable crypto self-tests ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v17 00/10] support reserving crashkernel above 4G on arm64 kdump
On 12/10/21 12:55 AM, Zhen Lei wrote:
There are the following issues in arm64 kdump:
1. We use crashkernel=X to reserve crashkernel below 4G, which will fail when there is not enough low memory.
2. If crashkernel is reserved above 4G, the crash dump kernel will fail to boot because there is no low memory available for allocation.

To solve these issues, change the behavior of crashkernel=X. crashkernel=X tries low allocation in the DMA zone and falls back to high allocation if it fails. We can also use "crashkernel=X,high" to select a high region above the DMA zone, which also tries to allocate at least 256M of low memory in the DMA zone automatically, and "crashkernel=Y,low" can be used to allocate low memory of a specified size.

When reserving crashkernel in high memory, some low memory is reserved for crash dump kernel devices. So there may be two regions reserved for the crash dump kernel. To distinguish it from the high region without affecting existing kexec-tools, rename the low region as "Crash kernel (low)", and pass the low region by reusing the DT property "linux,usable-memory-range". We made the low memory region the last range of "linux,usable-memory-range" to keep compatibility with existing user-space and older kdump kernels.

Besides, we need to modify kexec-tools:
arm64: support more than one crash kernel regions (see [1])

Another update is the document about DT property 'linux,usable-memory-range':
schemas: update 'linux,usable-memory-range' node schema (see [2])

This patchset contains the following 10 patches:
0001-0004 are some x86 cleanups which prepare for making functions reserve_crashkernel[_low]() generic.
0005 makes functions reserve_crashkernel[_low]() generic.
0006-0007 reimplement arm64 crashkernel=X.
0008-0009 add memory for devices by DT property linux,usable-memory-range.
0010 updates the doc.
Changes since [v16]
- Because there are no functional changes in this version, add "Tested-by: Dave Kleikamp" for patches 1-9.
- Add "Reviewed-by: Rob Herring" for patch 8.
- Update patch 9 based on the review comments of Rob Herring.
- As Catalin Marinas suggested, merge the implementation of ARCH_WANT_RESERVE_CRASH_KERNEL into patch 5. Ensure that the contents of X86 and ARM64 do not overlap, and reduce unnecessary temporary differences.

Changes since [v15]
- Aggregate the processing of "linux,usable-memory-range" into one function. Only patches 9-10 have been updated.

Changes since [v14]
- Restore the requirement that the CrashKernel memory regions on X86 only need 1 MiB alignment.
- Combine patches 5 and 6 in v14 into one. The compilation warning fixed by patch 6 was introduced by patch 5 in v14.
- As with crashk_res, crashk_low_res is also processed by crash_exclude_mem_range() in patch 7.
- Because commit b261dba2fdb2 ("arm64: kdump: Remove custom linux,usable-memory-range handling") has removed the architecture-specific code, extend the property "linux,usable-memory-range" in the platform-agnostic FDT core code. See patch 9.
- Discard the x86 description update in the document, because the description has been updated by commit b1f4c363666c ("Documentation: kdump: update kdump guide").
- Change "arm64" to "ARM64" in Doc.

Changes since [v13]
- Rebased on top of 5.11-rc5.
- Introduce config CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL. Since the reserve_crashkernel[_low]() implementations are quite similar on other architectures, put CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL in arch/Kconfig and select it from X86 and ARM64.
- Some minor cleanup.

Changes since [v12]
- Rebased on top of 5.10-rc1.
- Keep CRASH_ALIGN as 16M, as suggested by Dave.
- Drop patch "kdump: add threshold for the required memory".
- Add Tested-by from John.

Changes since [v11]
- Rebased on top of 5.9-rc4.
- Make the function reserve_crashkernel() of x86 generic.
  As suggested by Catalin, make the function reserve_crashkernel() of x86 generic and have arm64 use the generic version to reimplement crashkernel=X.

Changes since [v10]
- Reimplement crashkernel=X as suggested by Catalin. Many thanks to Catalin.

Changes since [v9]
- Patch 1: add Acked-by from Dave.
- Update patch 5 according to Dave's comments.
- Update chosen schema.

Changes since [v8]
- Reuse DT property "linux,usable-memory-range".
  As suggested by Rob, reuse the DT property "linux,usable-memory-range" to pass the low memory region.
- Fix kdump broken with ZONE_DMA reintroduced.
- Update chosen schema.

Changes since [v7]
- Move x86 CRASH_ALIGN to 2M.
  As suggested by Dave, and after some testing, move x86 CRASH_ALIGN to 2M.
- Update Documentation/devicetree/bindings/chosen.txt.
  Add corresponding documentation to Documentation/devicetree/bindings/chosen.txt, as suggested by Arnd.
- Add Tested-by from John and pk.

Changes since [v6]
- Fix build errors reported by kbuild test robot.

Changes since [v5]
- Move reserve_crashkernel_low() into kernel/crash_core.c.
- Delete crashkernel=X,high.
- Modify crashkernel=X,low.
  If crashkernel=X,low is specified simulta
Re: [PATCH v3 5/5] mm/slub: do not create dma-kmalloc if no managed pages in DMA zone
On 12/14/21 10:31 AM, Christoph Hellwig wrote:
On Mon, Dec 13, 2021 at 08:27:12PM +0800, Baoquan He wrote:
Dma-kmalloc will be created as long as CONFIG_ZONE_DMA is enabled. However, it will fail if DMA zone has no managed pages. The failure can be seen in kdump kernel of x86_64 as below:

Please just switch the sr allocation to use GFP_KERNEL without GFP_DMA. The block layer will do the proper bounce buffering underneath for the very unlikely case that we're actually using the single HBA driver that has ISA DMA addressing limitations. Same for the ch driver, btw.

Hi,

Is CONFIG_ZONE_DMA even needed anymore in x86_64?

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v4 2/3] dma/pool: create dma atomic pool only if dma zone has managed pages
On 12/23/21 3:44 AM, Baoquan He wrote:

Currently, three DMA atomic pools are initialized as long as the relevant kernel code is built in. In the kdump kernel of x86_64 this is not right when trying to create atomic_pool_dma, because there are no managed pages in the DMA zone. In that case, the DMA zone only has the low 1M of memory present, and it is locked down by the memblock allocator, so no pages are added to the buddy allocator for the DMA zone. Please check commit f1d4d47c5851 ("x86/setup: Always reserve the first 1M of RAM").

As a result, the kdump kernel of x86_64 always prints the failure message below:

 DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
 swapper/0: page allocation failure: order:5, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-0.rc5.20210611git929d931f2b40.42.fc35.x86_64 #1
 Hardware name: Dell Inc. PowerEdge R910/0P658H, BIOS 2.12.0 06/04/2018
 Call Trace:
  dump_stack+0x7f/0xa1
  warn_alloc.cold+0x72/0xd6
  ? _raw_spin_unlock_irq+0x24/0x40
  ? __alloc_pages_direct_compact+0x90/0x1b0
  __alloc_pages_slowpath.constprop.0+0xf29/0xf50
  ? __cond_resched+0x16/0x50
  ? prepare_alloc_pages.constprop.0+0x19d/0x1b0
  __alloc_pages+0x24d/0x2c0
  ? __dma_atomic_pool_init+0x93/0x93
  alloc_page_interleave+0x13/0xb0
  atomic_pool_expand+0x118/0x210
  ? __dma_atomic_pool_init+0x93/0x93
  __dma_atomic_pool_init+0x45/0x93
  dma_atomic_pool_init+0xdb/0x176
  do_one_initcall+0x67/0x320
  ? rcu_read_lock_sched_held+0x3f/0x80
  kernel_init_freeable+0x290/0x2dc
  ? rest_init+0x24f/0x24f
  kernel_init+0xa/0x111
  ret_from_fork+0x22/0x30
 Mem-Info:
 ..
 DMA: failed to allocate 128 KiB GFP_KERNEL|GFP_DMA pool for atomic allocation
 DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations

Here, let's check whether the DMA zone has managed pages, and create atomic_pool_dma only if it does. Otherwise just skip it.
Fixes: 6f599d84231f ("x86/kdump: Always reserve the low 1M when the crashkernel option is specified")
Cc: sta...@vger.kernel.org
Signed-off-by: Baoquan He
Acked-by: John Donnelly
Cc: Christoph Hellwig
Cc: Marek Szyprowski
Cc: Robin Murphy
Cc: io...@lists.linux-foundation.org
---
 kernel/dma/pool.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index 5a85804b5beb..00df3edd6c5d 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -206,7 +206,7 @@ static int __init dma_atomic_pool_init(void)
 						    GFP_KERNEL);
 	if (!atomic_pool_kernel)
 		ret = -ENOMEM;
-	if (IS_ENABLED(CONFIG_ZONE_DMA)) {
+	if (has_managed_dma()) {
 		atomic_pool_dma = __dma_atomic_pool_init(atomic_pool_size,
 						GFP_KERNEL | GFP_DMA);
 		if (!atomic_pool_dma)
@@ -229,7 +229,7 @@ static inline struct gen_pool *dma_guess_pool(struct gen_pool *prev, gfp_t gfp)
 	if (prev == NULL) {
 		if (IS_ENABLED(CONFIG_ZONE_DMA32) && (gfp & GFP_DMA32))
 			return atomic_pool_dma32;
-		if (IS_ENABLED(CONFIG_ZONE_DMA) && (gfp & GFP_DMA))
+		if (atomic_pool_dma && (gfp & GFP_DMA))
 			return atomic_pool_dma;
 		return atomic_pool_kernel;
 	}
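The two hunks above work together: the pool is only created when the DMA zone has managed pages, and the lookup then keys on whether the pool pointer exists rather than on the config option. A minimal userspace sketch of that pairing follows; `struct toy_pool`, `toy_pool_init()` and `toy_guess_pool()` are hypothetical stand-ins, not the kernel's gen_pool API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a DMA atomic pool. */
struct toy_pool {
	const char *name;
};

static struct toy_pool toy_pool_kernel = { "GFP_KERNEL" };
static struct toy_pool toy_pool_dma = { "GFP_KERNEL|GFP_DMA" };

/* Stays NULL unless the DMA zone can actually back the pool. */
static struct toy_pool *toy_atomic_pool_dma;

/* Creation is gated on managed pages, not on a build-time option. */
static void toy_pool_init(bool dma_zone_has_managed_pages)
{
	toy_atomic_pool_dma = dma_zone_has_managed_pages ? &toy_pool_dma : NULL;
}

/* Lookup tests the pointer, so a skipped pool is never handed out. */
static struct toy_pool *toy_guess_pool(bool want_dma)
{
	if (toy_atomic_pool_dma && want_dma)
		return toy_atomic_pool_dma;
	return &toy_pool_kernel;
}
```

Testing the pointer in the lookup path means a caller asking for GFP_DMA in a kdump kernel falls through to the general pool instead of dereferencing a pool that was never created.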
Re: [PATCH v4 3/3] mm/page_alloc.c: do not warn allocation failure on zone DMA if no managed pages
On 12/23/21 3:44 AM, Baoquan He wrote:

In the kdump kernel of x86_64, a page allocation failure is observed:

 kworker/u2:2: page allocation failure: order:0, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
 CPU: 0 PID: 55 Comm: kworker/u2:2 Not tainted 5.16.0-rc4+ #5
 Hardware name: AMD Dinar/Dinar, BIOS RDN1505B 06/05/2013
 Workqueue: events_unbound async_run_entry_fn
 Call Trace:
  dump_stack_lvl+0x48/0x5e
  warn_alloc.cold+0x72/0xd6
  __alloc_pages_slowpath.constprop.0+0xc69/0xcd0
  __alloc_pages+0x1df/0x210
  new_slab+0x389/0x4d0
  ___slab_alloc+0x58f/0x770
  __slab_alloc.constprop.0+0x4a/0x80
  kmem_cache_alloc_trace+0x24b/0x2c0
  sr_probe+0x1db/0x620
  ..
  device_add+0x405/0x920
  ..
  __scsi_add_device+0xe5/0x100
  ata_scsi_scan_host+0x97/0x1d0
  async_run_entry_fn+0x30/0x130
  process_one_work+0x1e8/0x3c0
  worker_thread+0x50/0x3b0
  ? rescuer_thread+0x350/0x350
  kthread+0x16b/0x190
  ? set_kthread_struct+0x40/0x40
  ret_from_fork+0x22/0x30
 Mem-Info:
 ..

The above failure happened when calling kmalloc() to allocate a buffer with GFP_DMA. It requests a slab page from the DMA zone while there are no managed pages in it at all.

 sr_probe()
 --> get_capabilities()
     --> buffer = kmalloc(512, GFP_KERNEL | GFP_DMA);

This is because, in the current kernel, dma-kmalloc is created as long as CONFIG_ZONE_DMA is enabled. However, the kdump kernel of x86_64 has had no managed pages in the DMA zone since commit 6f599d84231f ("x86/kdump: Always reserve the low 1M when the crashkernel option is specified"). The failure can always be reproduced.

For now, let's mute the allocation failure warning when pages are requested from the DMA zone while it has no managed pages.
Fixes: 6f599d84231f ("x86/kdump: Always reserve the low 1M when the crashkernel option is specified")
Cc: sta...@vger.kernel.org
Signed-off-by: Baoquan He
Acked-by: John Donnelly
Cc: Christoph Lameter
Cc: Pekka Enberg
Cc: David Rientjes
Cc: Joonsoo Kim
Cc: Vlastimil Babka
---
 mm/page_alloc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7c7a0b5de2ff..843bc8e5550a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4204,7 +4204,8 @@ void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, const char *fmt, ...)
 	va_list args;
 	static DEFINE_RATELIMIT_STATE(nopage_rs, 10*HZ, 1);
 
-	if ((gfp_mask & __GFP_NOWARN) || !__ratelimit(&nopage_rs))
+	if ((gfp_mask & __GFP_NOWARN) || !__ratelimit(&nopage_rs) ||
+		(gfp_mask & __GFP_DMA) && !has_managed_dma())
 		return;
 
 	va_start(args, fmt);
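The new condition mixes `||` and `&&` without inner parentheses. In C, `&&` binds tighter than `||`, so the expression groups as intended, though compilers may suggest explicit parentheses. The sketch below spells the grouping out; the flag values and `toy_skip_warning()` are hypothetical and are not the kernel's gfp definitions.

```c
#include <stdbool.h>

/* Hypothetical flag bits, for illustration only. */
#define TOY_GFP_DMA	0x01u
#define TOY_GFP_NOWARN	0x02u

/*
 * Returns true when the allocation-failure warning should be skipped:
 * the caller asked for silence, the rate limit was hit, or the request
 * targets a DMA zone that has no managed pages.
 */
static bool toy_skip_warning(unsigned int gfp, bool ratelimit_ok,
			     bool has_managed_dma)
{
	return (gfp & TOY_GFP_NOWARN) || !ratelimit_ok ||
	       ((gfp & TOY_GFP_DMA) && !has_managed_dma);
}
```

Only the combination "DMA request and no managed DMA pages" is muted; a DMA request on a system whose DMA zone does have pages still warns on failure.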
Re: [PATCH v4 1/3] mm_zone: add function to check if managed dma zone exists
On 12/23/21 3:44 AM, Baoquan He wrote:

Some places in the current kernel assume that the DMA zone must have managed pages if CONFIG_ZONE_DMA is enabled. This is not always true. E.g. in the kdump kernel of x86_64, only the low 1M is present, and it is locked down at a very early stage of boot, so there are no managed pages at all in the DMA zone. This exception always causes a page allocation failure when a page is requested from the DMA zone.

Add the function has_managed_dma() and the relevant helper functions to check whether there is a DMA zone with managed pages. It will be used in later patches.

Fixes: 6f599d84231f ("x86/kdump: Always reserve the low 1M when the crashkernel option is specified")
Cc: sta...@vger.kernel.org
Signed-off-by: Baoquan He
Reviewed-by: David Hildenbrand
Acked-by: John Donnelly
---
 include/linux/mmzone.h | 9 +
 mm/page_alloc.c | 15 +++
 2 files changed, 24 insertions(+)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 58e744b78c2c..6e1b726e9adf 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1046,6 +1046,15 @@ static inline int is_highmem_idx(enum zone_type idx)
 #endif
 }
 
+#ifdef CONFIG_ZONE_DMA
+bool has_managed_dma(void);
+#else
+static inline bool has_managed_dma(void)
+{
+	return false;
+}
+#endif
+
 /**
  * is_highmem - helper function to quickly check if a struct zone is a
  *              highmem zone or not.  This is an attempt to keep references
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c5952749ad40..7c7a0b5de2ff 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -9460,3 +9460,18 @@ bool take_page_off_buddy(struct page *page)
 	return ret;
 }
 #endif
+
+#ifdef CONFIG_ZONE_DMA
+bool has_managed_dma(void)
+{
+	struct pglist_data *pgdat;
+
+	for_each_online_pgdat(pgdat) {
+		struct zone *zone = &pgdat->node_zones[ZONE_DMA];
+
+		if (managed_zone(zone))
+			return true;
+	}
+	return false;
+}
+#endif /* CONFIG_ZONE_DMA */
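The check boils down to "does any online node have a DMA zone with at least one managed page". A self-contained sketch of that loop over a plain array is shown below; `struct toy_zone`, `struct toy_node` and `toy_has_managed_dma()` are hypothetical userspace stand-ins for the kernel's pglist_data and zone structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical zone indices, mirroring the idea of ZONE_DMA. */
enum { TOY_ZONE_DMA, TOY_ZONE_NORMAL, TOY_MAX_ZONES };

struct toy_zone {
	unsigned long managed_pages;	/* pages handed to the allocator */
};

struct toy_node {
	struct toy_zone zones[TOY_MAX_ZONES];
};

/* True if any node's DMA zone has at least one managed page. */
static bool toy_has_managed_dma(const struct toy_node *nodes, size_t nr_nodes)
{
	size_t i;

	for (i = 0; i < nr_nodes; i++) {
		if (nodes[i].zones[TOY_ZONE_DMA].managed_pages)
			return true;
	}
	return false;
}
```

A zone can exist and span addresses while having zero managed pages, which is exactly the kdump case: the low 1M is present but reserved, so the loop finds nothing.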
Re: [PATCH v19 01/13] kdump: add helper parse_crashkernel_high_low()
On 12/28/21 7:26 AM, Zhen Lei wrote:

The bootup command line option crashkernel=Y,low is valid only when crashkernel=X,high is specified. Putting their parsing into a separate function makes the code logic clearer and makes the strong dependency between them easier to understand.

Signed-off-by: Zhen Lei
Acked-by: John Donnelly
---
 include/linux/crash_core.h | 3 +++
 kernel/crash_core.c | 35 +++
 2 files changed, 38 insertions(+)

diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
index de62a722431e7db..2d3a64761d18998 100644
--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -83,5 +83,8 @@ int parse_crashkernel_high(char *cmdline, unsigned long long system_ram,
 		unsigned long long *crash_size, unsigned long long *crash_base);
 int parse_crashkernel_low(char *cmdline, unsigned long long system_ram,
 		unsigned long long *crash_size, unsigned long long *crash_base);
+int __init parse_crashkernel_high_low(char *cmdline,
+				      unsigned long long *high_size,
+				      unsigned long long *low_size);
 
 #endif /* LINUX_CRASH_CORE_H */
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index eb53f5ec62c900f..8966beaf7c4fd52 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -295,6 +295,41 @@ int __init parse_crashkernel_low(char *cmdline,
 				"crashkernel=", suffix_tbl[SUFFIX_LOW]);
 }
 
+/**
+ * parse_crashkernel_high_low - Parsing "crashkernel=X,high" and possible
+ *				"crashkernel=Y,low".
+ * @cmdline:	The bootup command line.
+ * @high_size:	Save the memory size specified by "crashkernel=X,high".
+ * @low_size:	Save the memory size specified by "crashkernel=Y,low" or "-1"
+ *		if it's not specified.
+ *
+ * Returns 0 on success, else a negative status code.
+ */
+int __init parse_crashkernel_high_low(char *cmdline,
+				      unsigned long long *high_size,
+				      unsigned long long *low_size)
+{
+	int ret;
+	unsigned long long base;
+
+	BUG_ON(!high_size || !low_size);
+
+	/* crashkernel=X,high */
+	ret = parse_crashkernel_high(cmdline, 0, high_size, &base);
+	if (ret)
+		return ret;
+
+	if (*high_size <= 0)
+		return -EINVAL;
+
+	/* crashkernel=Y,low */
+	ret = parse_crashkernel_low(cmdline, 0, low_size, &base);
+	if (ret)
+		*low_size = -1;
+
+	return 0;
+}
+
 Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
 			  void *data, size_t data_len)
 {
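The contract of the new helper (",high" is mandatory, ",low" is optional and reported as -1 when absent) can be tried out in userspace. The sketch below is only an approximation: `toy_parse_suffix()` and `toy_parse_high_low()` are hypothetical, and the parsing is a simplified stand-in for the kernel's `__parse_crashkernel()`, handling only `crashkernel=<size>[KMG],<suffix>`.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical, simplified parser: find "crashkernel=<size>[KMG],<suffix>"
 * in cmdline and store the size in bytes. Returns 0 if found, -1 if not.
 */
static int toy_parse_suffix(const char *cmdline, const char *suffix,
			    unsigned long long *size)
{
	const char *p = cmdline;
	size_t slen = strlen(suffix);

	while ((p = strstr(p, "crashkernel=")) != NULL) {
		char *end;
		unsigned long long val;

		p += strlen("crashkernel=");
		val = strtoull(p, &end, 0);
		switch (*end) {
		case 'G':
			val <<= 10;
			/* fall through */
		case 'M':
			val <<= 10;
			/* fall through */
		case 'K':
			val <<= 10;
			end++;
			break;
		default:
			break;
		}
		if (*end == ',' && strncmp(end + 1, suffix, slen) == 0 &&
		    (end[1 + slen] == '\0' || end[1 + slen] == ' ')) {
			*size = val;
			return 0;
		}
	}
	return -1;
}

/* ",high" must be present and non-zero; ",low" defaults to the -1 sentinel. */
static int toy_parse_high_low(const char *cmdline,
			      unsigned long long *high_size,
			      unsigned long long *low_size)
{
	if (toy_parse_suffix(cmdline, "high", high_size) || *high_size == 0)
		return -1;

	if (toy_parse_suffix(cmdline, "low", low_size))
		*low_size = (unsigned long long)-1;	/* not specified */

	return 0;
}
```

Note that a command line with only `crashkernel=128M,low` is rejected, which reflects the dependency stated in the commit message.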
Re: [PATCH v19 02/13] x86/setup: Use parse_crashkernel_high_low() to simplify code
On 12/28/21 7:26 AM, Zhen Lei wrote:

Use parse_crashkernel_high_low() to bring the parsing of "crashkernel=X,high" and the parsing of "crashkernel=Y,low" together. They are strongly dependent, so this makes the code logic clearer and more readable.

Suggested-by: Borislav Petkov
Signed-off-by: Zhen Lei
Acked-by: John Donnelly
---
 arch/x86/kernel/setup.c | 21 +
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 6a190c7f4d71b05..93d78aae1937db3 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -416,18 +416,16 @@ static void __init memblock_x86_reserve_range_setup_data(void)
 # define CRASH_ADDR_HIGH_MAX	SZ_64T
 #endif
 
-static int __init reserve_crashkernel_low(void)
+static int __init reserve_crashkernel_low(unsigned long long low_size)
 {
 #ifdef CONFIG_X86_64
-	unsigned long long base, low_base = 0, low_size = 0;
+	unsigned long long low_base = 0;
 	unsigned long low_mem_limit;
-	int ret;
 
 	low_mem_limit = min(memblock_phys_mem_size(), CRASH_ADDR_LOW_MAX);
 
-	/* crashkernel=Y,low */
-	ret = parse_crashkernel_low(boot_command_line, low_mem_limit, &low_size, &base);
-	if (ret) {
+	/* crashkernel=Y,low is not specified */
+	if ((long)low_size < 0) {
 		/*
 		 * two parts from kernel/dma/swiotlb.c:
 		 * -swiotlb size: user-specified with swiotlb= or default.
@@ -465,7 +463,7 @@ static int __init reserve_crashkernel_low(void)
 
 static void __init reserve_crashkernel(void)
 {
-	unsigned long long crash_size, crash_base, total_mem;
+	unsigned long long crash_size, crash_base, total_mem, low_size;
 	bool high = false;
 	int ret;
 
@@ -474,10 +472,9 @@ static void __init reserve_crashkernel(void)
 	/* crashkernel=XM */
 	ret = parse_crashkernel(boot_command_line, total_mem, &crash_size, &crash_base);
 	if (ret != 0 || crash_size <= 0) {
-		/* crashkernel=X,high */
-		ret = parse_crashkernel_high(boot_command_line, total_mem,
-					     &crash_size, &crash_base);
-		if (ret != 0 || crash_size <= 0)
+		/* crashkernel=X,high and possible crashkernel=Y,low */
+		ret = parse_crashkernel_high_low(boot_command_line, &crash_size, &low_size);
+		if (ret)
 			return;
 		high = true;
 	}
@@ -520,7 +517,7 @@ static void __init reserve_crashkernel(void)
 		}
 	}
 
-	if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) {
+	if (crash_base >= (1ULL << 32) && reserve_crashkernel_low(low_size)) {
 		memblock_phys_free(crash_base, crash_size);
 		return;
 	}
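After this change the low size arrives as an unsigned value whose all-ones pattern means "not specified", and the callee recovers that with a signed cast. A small sketch of the three cases follows; `toy_effective_low_size()` and the 256M default are assumptions for illustration (the x86 code computes its default from the swiotlb size), and `long long` is used instead of `long` so the cast behaves the same on 32-bit hosts.

```c
/* Assumed default low-memory size, for illustration only. */
#define TOY_DEFAULT_LOW	(256ULL << 20)

/*
 * Map the value produced by the parser to the size actually reserved:
 *   -1 (all ones)  -> ",low" was not given, use the default
 *    0             -> "crashkernel=0,low", reserve nothing
 *    anything else -> the explicit size
 */
static unsigned long long toy_effective_low_size(unsigned long long low_size)
{
	if ((long long)low_size < 0)
		return TOY_DEFAULT_LOW;
	return low_size;
}
```

Keeping 0 distinct from the sentinel matters because an explicit `crashkernel=0,low` is a request to skip the low reservation, not a request for the default.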
Re: [PATCH v19 03/13] kdump: make parse_crashkernel_{high|low}() static
On 12/28/21 7:26 AM, Zhen Lei wrote:

Make parse_crashkernel_{high|low}() static; they are only referenced by parse_crashkernel_high_low() in the same file. The latter is recommended.

Signed-off-by: Zhen Lei
Acked-by: John Donnelly
---
 include/linux/crash_core.h | 4
 kernel/crash_core.c | 4 ++--
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
index 2d3a64761d18998..598fd55d83c169e 100644
--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -79,10 +79,6 @@ void final_note(Elf_Word *buf);
 
 int __init parse_crashkernel(char *cmdline, unsigned long long system_ram,
 		unsigned long long *crash_size, unsigned long long *crash_base);
-int parse_crashkernel_high(char *cmdline, unsigned long long system_ram,
-		unsigned long long *crash_size, unsigned long long *crash_base);
-int parse_crashkernel_low(char *cmdline, unsigned long long system_ram,
-		unsigned long long *crash_size, unsigned long long *crash_base);
 int __init parse_crashkernel_high_low(char *cmdline,
 				      unsigned long long *high_size,
 				      unsigned long long *low_size);
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 8966beaf7c4fd52..3b9e01fc450b2a4 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -277,7 +277,7 @@ int __init parse_crashkernel(char *cmdline,
 				"crashkernel=", NULL);
 }
 
-int __init parse_crashkernel_high(char *cmdline,
+static int __init parse_crashkernel_high(char *cmdline,
 			     unsigned long long system_ram,
 			     unsigned long long *crash_size,
 			     unsigned long long *crash_base)
@@ -286,7 +286,7 @@ int __init parse_crashkernel_high(char *cmdline,
 				"crashkernel=", suffix_tbl[SUFFIX_HIGH]);
 }
 
-int __init parse_crashkernel_low(char *cmdline,
+static int __init parse_crashkernel_low(char *cmdline,
 			     unsigned long long system_ram,
 			     unsigned long long *crash_size,
 			     unsigned long long *crash_base)
Re: [PATCH v19 04/13] kdump: reduce unnecessary parameters of parse_crashkernel_{high|low}()
On 12/28/21 7:26 AM, Zhen Lei wrote:

Delete the confusing parameters 'system_ram' and 'crash_base' of parse_crashkernel_{high|low}(); they are only needed in the "crashkernel=X[@offset]" case.

Signed-off-by: Zhen Lei
Acked-by: John Donnelly
---
 kernel/crash_core.c | 21 ++---
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 3b9e01fc450b2a4..b7d024eb464d0ae 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -278,20 +278,20 @@ int __init parse_crashkernel(char *cmdline,
 }
 
 static int __init parse_crashkernel_high(char *cmdline,
-			     unsigned long long system_ram,
-			     unsigned long long *crash_size,
-			     unsigned long long *crash_base)
+					 unsigned long long *crash_size)
 {
-	return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base,
+	unsigned long long base;
+
+	return __parse_crashkernel(cmdline, 0, crash_size, &base,
 				"crashkernel=", suffix_tbl[SUFFIX_HIGH]);
 }
 
 static int __init parse_crashkernel_low(char *cmdline,
-			     unsigned long long system_ram,
-			     unsigned long long *crash_size,
-			     unsigned long long *crash_base)
+					unsigned long long *crash_size)
 {
-	return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base,
+	unsigned long long base;
+
+	return __parse_crashkernel(cmdline, 0, crash_size, &base,
 				"crashkernel=", suffix_tbl[SUFFIX_LOW]);
 }
 
@@ -310,12 +310,11 @@ int __init parse_crashkernel_high_low(char *cmdline,
 				      unsigned long long *low_size)
 {
 	int ret;
-	unsigned long long base;
 
 	BUG_ON(!high_size || !low_size);
 
 	/* crashkernel=X,high */
-	ret = parse_crashkernel_high(cmdline, 0, high_size, &base);
+	ret = parse_crashkernel_high(cmdline, high_size);
 	if (ret)
 		return ret;
 
@@ -323,7 +322,7 @@ int __init parse_crashkernel_high_low(char *cmdline,
 		return -EINVAL;
 
 	/* crashkernel=Y,low */
-	ret = parse_crashkernel_low(cmdline, 0, low_size, &base);
+	ret = parse_crashkernel_low(cmdline, low_size);
 	if (ret)
 		*low_size = -1;
 
Re: [PATCH v19 05/13] x86/setup: Add and use CRASH_BASE_ALIGN
On 12/28/21 7:26 AM, Zhen Lei wrote:

Add the macro CRASH_BASE_ALIGN to indicate the alignment for the crash kernel fixed region, in preparation for making a partial implementation of reserve_crashkernel[_low]() generic.

Signed-off-by: Zhen Lei
Acked-by: John Donnelly
---
 arch/x86/kernel/setup.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 93d78aae1937db3..cb7f237a2ae0dfa 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -392,9 +392,12 @@ static void __init memblock_x86_reserve_range_setup_data(void)
 
 #ifdef CONFIG_KEXEC_CORE
 
-/* 16M alignment for crash kernel regions */
+/* alignment for crash kernel dynamic regions */
 #define CRASH_ALIGN		SZ_16M
 
+/* alignment for crash kernel fixed region */
+#define CRASH_BASE_ALIGN	SZ_1M
+
 /*
  * Keep the crash kernel below this limit.
  *
@@ -509,7 +512,7 @@ static void __init reserve_crashkernel(void)
 	} else {
 		unsigned long long start;
 
-		start = memblock_phys_alloc_range(crash_size, SZ_1M, crash_base,
+		start = memblock_phys_alloc_range(crash_size, CRASH_BASE_ALIGN, crash_base,
 						  crash_base + crash_size);
 		if (start != crash_base) {
 			pr_info("crashkernel reservation failed - memory is in use.\n");
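The patch distinguishes two alignments: 16M for regions whose base the kernel picks itself, and 1M for a base fixed by the user with "@offset". A rough userspace sketch of how the two might be applied follows; `toy_pick_base()` and the `TOY_` macros are hypothetical, and the real code delegates the placement to memblock rather than computing it by hand.

```c
#include <stdint.h>

#define TOY_SZ_1M		(1ULL << 20)
#define TOY_CRASH_ALIGN		(16 * TOY_SZ_1M)	/* dynamic regions */
#define TOY_CRASH_BASE_ALIGN	TOY_SZ_1M		/* fixed region */

/* Round x up to a power-of-two alignment. */
static uint64_t toy_align_up(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/*
 * A user-supplied base must already be 1M aligned, otherwise the
 * reservation fails (0 is returned). Without a user base, the region
 * is placed at the next 16M boundary at or above search_start.
 */
static uint64_t toy_pick_base(uint64_t user_base, uint64_t search_start)
{
	if (user_base)
		return (user_base % TOY_CRASH_BASE_ALIGN) ? 0 : user_base;

	return toy_align_up(search_start, TOY_CRASH_ALIGN);
}
```

For example, a search starting at 0x1234567 yields a base of 0x2000000, while a user base of 0x180000 is rejected because it is not 1M aligned.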
Re: [PATCH v19 06/13] kexec: move crashk[_low]_res to crash_core module
On 12/28/21 7:26 AM, Zhen Lei wrote:

From: Chen Zhou

Move the definition and declaration of the global variables crashk[_low]_res from the kexec module to the crash_core module, in preparation for adding a generic reserve_crashkernel_mem[_low]() to crash_core.c, which refers to crashk[_low]_res.

Because the config KEXEC automatically selects CRASH_CORE, and the header crash_core.h is included by kexec.h, there is no functional change.

Signed-off-by: Chen Zhou
Signed-off-by: Zhen Lei
Acked-by: John Donnelly
---
 include/linux/crash_core.h | 4
 include/linux/kexec.h | 4
 kernel/crash_core.c | 16
 kernel/kexec_core.c | 17 -
 4 files changed, 20 insertions(+), 21 deletions(-)

diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
index 598fd55d83c169e..f5437c9c9411fce 100644
--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -73,6 +73,10 @@ extern unsigned char *vmcoreinfo_data;
 extern size_t vmcoreinfo_size;
 extern u32 *vmcoreinfo_note;
 
+/* Location of a reserved region to hold the crash kernel. */
+extern struct resource crashk_res;
+extern struct resource crashk_low_res;
+
 Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
 			  void *data, size_t data_len);
 void final_note(Elf_Word *buf);
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 0c994ae37729e1e..47e784d66ea8645 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -350,10 +350,6 @@ extern int kexec_load_disabled;
 #define KEXEC_FILE_FLAGS	(KEXEC_FILE_UNLOAD | KEXEC_FILE_ON_CRASH | \
 				 KEXEC_FILE_NO_INITRAMFS)
 
-/* Location of a reserved region to hold the crash kernel.
- */
-extern struct resource crashk_res;
-extern struct resource crashk_low_res;
 extern note_buf_t __percpu *crash_notes;
 
 /* flag to track if kexec reboot is in progress */
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index b7d024eb464d0ae..686d8a65e12a337 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -22,6 +22,22 @@ u32 *vmcoreinfo_note;
 /* trusted vmcoreinfo, e.g. we can make a copy in the crash memory */
 static unsigned char *vmcoreinfo_data_safecopy;
 
+/* Location of the reserved area for the crash kernel */
+struct resource crashk_res = {
+	.name  = "Crash kernel",
+	.start = 0,
+	.end   = 0,
+	.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
+	.desc  = IORES_DESC_CRASH_KERNEL
+};
+struct resource crashk_low_res = {
+	.name  = "Crash kernel",
+	.start = 0,
+	.end   = 0,
+	.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
+	.desc  = IORES_DESC_CRASH_KERNEL
+};
+
 /*
  * parsing the "crashkernel" commandline
  *
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 5a5d192a89ac307..1e0d4909bbb6b77 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -54,23 +54,6 @@ note_buf_t __percpu *crash_notes;
 /* Flag to indicate we are going to kexec a new kernel */
 bool kexec_in_progress = false;
 
-
-/* Location of the reserved area for the crash kernel */
-struct resource crashk_res = {
-	.name  = "Crash kernel",
-	.start = 0,
-	.end   = 0,
-	.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
-	.desc  = IORES_DESC_CRASH_KERNEL
-};
-struct resource crashk_low_res = {
-	.name  = "Crash kernel",
-	.start = 0,
-	.end   = 0,
-	.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
-	.desc  = IORES_DESC_CRASH_KERNEL
-};
-
 int kexec_should_crash(struct task_struct *p)
 {
 	/*
Re: [PATCH v4 0/3] Handle warning of allocation failure on DMA zone w/o managed pages
On 12/23/21 3:44 AM, Baoquan He wrote:

***Problem observed:
On x86_64, when a crash is triggered and the system enters the kdump kernel, a page allocation failure can always be seen.

 DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
 swapper/0: page allocation failure: order:5, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
 CPU: 0 PID: 1 Comm: swapper/0
 Call Trace:
  dump_stack+0x7f/0xa1
  warn_alloc.cold+0x72/0xd6
  ..
  __alloc_pages+0x24d/0x2c0
  ..
  dma_atomic_pool_init+0xdb/0x176
  do_one_initcall+0x67/0x320
  ? rcu_read_lock_sched_held+0x3f/0x80
  kernel_init_freeable+0x290/0x2dc
  ? rest_init+0x24f/0x24f
  kernel_init+0xa/0x111
  ret_from_fork+0x22/0x30
 Mem-Info:

***Root cause:
The current kernel assumes that the DMA zone must have managed pages, and tries to request pages from it, whenever CONFIG_ZONE_DMA is enabled. This is not always true. E.g. in the kdump kernel of x86_64, only the low 1M is present, and it is locked down at a very early stage of boot, so this low 1M is never added to the buddy allocator to become managed pages of the DMA zone. This exception always causes a page allocation failure when a page is requested from the DMA zone.

***Investigation:
This failure has happened since the commits below were merged into Linus's tree.
  1a6a9044b967 x86/setup: Remove CONFIG_X86_RESERVE_LOW and reservelow= options
  23721c8e92f7 x86/crash: Remove crash_reserve_low_1M()
  f1d4d47c5851 x86/setup: Always reserve the first 1M of RAM
  7c321eb2b843 x86/kdump: Remove the backup region handling
  6f599d84231f x86/kdump: Always reserve the low 1M when the crashkernel option is specified

Before them, on x86_64, the low 640K area was reused by the kdump kernel. So the content of the low 640K area was copied into a backup region for dumping before jumping into the kdump kernel. Then, except for the firmware-reserved regions in [0, 640K], the remaining area was added to the buddy allocator to become available managed pages of the DMA zone.

However, after the above commits were applied, in the kdump kernel of x86_64 the low 1M is reserved by memblock but not released to the buddy allocator. So any later page allocation requested from the DMA zone fails.

At the beginning, if crashkernel is reserved, the low 1M needed to be locked down because AMD SME encrypts memory, making the old backup region mechanism impossible when switching into the kdump kernel. Later, it was also observed that some BIOSes corrupt memory under 1M. To solve this, in commit f1d4d47c5851 the entire low 1M region is always reserved after the real mode trampoline is allocated.

Besides, an Intel engineer recently mentioned that their TDX (Trusted Domain Extensions) support, which is under development in the kernel, also needs to lock down the low 1M. So we can't simply revert the above commits to fix the page allocation failure from the DMA zone, as someone suggested.

***Solution:
Currently, only the DMA atomic pool and dma-kmalloc initialize and request page allocation with GFP_DMA during bootup. So only initialize the DMA atomic pool when the DMA zone has available managed pages; otherwise just skip the initialization.

For dma-kmalloc, for the time being, let's mute the warning of allocation failure if requesting pages from the DMA zone while it has no managed pages. Meanwhile, change code to use the dma_alloc_xx/dma_map_xx API to replace kmalloc(GFP_DMA), or do not use GFP_DMA when calling kmalloc() if not necessary. Christoph is posting patches to fix those under drivers/scsi/. Finally, we can remove the need for dma-kmalloc as people suggested.

Changelog:
v3->v4:
- Split the old v3 into two separate patchsets. The first two cleanup/improvement patches in v3 have been sent out in an independent patchset. The fix patches are adapted and sent in this patchset.
- Do not change dma-kmalloc; instead, mute the warning of allocation failure if it's requesting a page from a DMA zone which has no managed pages.

v2-Resend -> v3:
- Re-implement has_managed_dma() according to David's suggestion.
- Add Fixes tag and cc stable.

v2->v2 RESEND:
- John pinged to push the repost of this patchset. So fix one typo in the subject of patch 3/5; fix a build error caused by a mixed declaration in patch 5/5. Both of them were found by John in his testing.
- Rewrite cover letter to add more information.

v1->v2:
Change to check if a managed DMA zone exists. If the DMA zone has managed pages, go further to request pages from the DMA zone to initialize. Otherwise, just skip initializing stuff which needs pages from the DMA zone.

v3: https://urldefense.com/v3/__https://lore.kernel.org/all/20211213122712.23805-1-...@redhat.com/T/*u__;Iw!!ACWV5N9M2RV99hQ!e1KjpVuZycBkxdeNxcsRUQ7MH92KQQk7FfCZs5tzEcBVusUiph0w9zpxOgKpS2Y0ecPm$
V2 RESEND post: https://urldefense.com/v3/__https://lore.kernel.org/all/20211207030750.30824-1-...@redhat.com/T/*u__;Iw!!ACWV5N9M2RV99hQ!e1KjpVuZycBkxdeNxcsRUQ7MH92KQQk7FfCZs5tzEcBVusUi
Re: [PATCH v20 1/5] arm64: Use insert_resource() to simplify code
On 1/24/22 2:47 AM, Zhen Lei wrote:

insert_resource() traverses the subtree layer by layer from the root node until a proper location is found. Compared with request_resource(), the parent node does not need to be determined in advance.

In addition, move the insertion of node 'crashk_res' into function reserve_crashkernel() to keep the associated code close together.

Signed-off-by: Zhen Lei
Acked-by: John Donnelly
---
 arch/arm64/kernel/setup.c | 17 +++--
 arch/arm64/mm/init.c | 1 +
 2 files changed, 4 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index f70573928f1bff0..a81efcc359e4e78 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -225,6 +225,8 @@ static void __init request_standard_resources(void)
 	kernel_code.end     = __pa_symbol(__init_begin - 1);
 	kernel_data.start   = __pa_symbol(_sdata);
 	kernel_data.end     = __pa_symbol(_end - 1);
+	insert_resource(&iomem_resource, &kernel_code);
+	insert_resource(&iomem_resource, &kernel_data);
 
 	num_standard_resources = memblock.memory.cnt;
 	res_size = num_standard_resources * sizeof(*standard_resources);
@@ -246,20 +248,7 @@ static void __init request_standard_resources(void)
 			res->end = __pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1;
 		}
 
-		request_resource(&iomem_resource, res);
-
-		if (kernel_code.start >= res->start &&
-		    kernel_code.end <= res->end)
-			request_resource(res, &kernel_code);
-		if (kernel_data.start >= res->start &&
-		    kernel_data.end <= res->end)
-			request_resource(res, &kernel_data);
-#ifdef CONFIG_KEXEC_CORE
-		/* Userspace will find "Crash kernel" region in /proc/iomem. */
-		if (crashk_res.end && crashk_res.start >= res->start &&
-		    crashk_res.end <= res->end)
-			request_resource(res, &crashk_res);
-#endif
+		insert_resource(&iomem_resource, res);
 	}
 }
 
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index db63cc885771a52..90f276d46b93bc6 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -109,6 +109,7 @@ static void __init reserve_crashkernel(void)
 	kmemleak_ignore_phys(crash_base);
 	crashk_res.start = crash_base;
 	crashk_res.end = crash_base + crash_size - 1;
+	insert_resource(&iomem_resource, &crashk_res);
 }
 #else
 static void __init reserve_crashkernel(void)
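The point of the change is that the caller no longer has to know which "System RAM" node should be the parent: insertion starts at the root and finds the place by range containment, in either insertion order. The toy tree below illustrates that idea only; `struct toy_res` and `toy_insert_resource()` are hypothetical, siblings are left unsorted, and the kernel's real insert_resource() differs in detail (locking, ordering, conflict handling).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy resource node; ranges are inclusive. */
struct toy_res {
	const char *name;
	uint64_t start, end;
	struct toy_res *parent, *child, *sibling;
};

static int toy_contains(const struct toy_res *outer, const struct toy_res *inner)
{
	return outer->start <= inner->start && inner->end <= outer->end;
}

static int toy_overlaps(const struct toy_res *a, const struct toy_res *b)
{
	return a->start <= b->end && b->start <= a->end;
}

/*
 * Insert new_res somewhere below root. Returns 0 on success, -1 if
 * new_res does not fit under root or partially overlaps a node.
 */
static int toy_insert_resource(struct toy_res *root, struct toy_res *new_res)
{
	struct toy_res *parent = root, *c, **pp;

	if (!toy_contains(root, new_res))
		return -1;

	/* Descend to the deepest existing node that fully contains new_res. */
	for (c = parent->child; c; ) {
		if (toy_contains(c, new_res)) {
			parent = c;
			c = parent->child;
		} else {
			c = c->sibling;
		}
	}

	/* Reject partial overlaps before touching the tree. */
	for (c = parent->child; c; c = c->sibling) {
		if (toy_overlaps(c, new_res) && !toy_contains(new_res, c))
			return -1;
	}

	/* Adopt existing children that new_res fully contains. */
	new_res->child = NULL;
	for (pp = &parent->child; (c = *pp) != NULL; ) {
		if (toy_contains(new_res, c)) {
			*pp = c->sibling;
			c->parent = new_res;
			c->sibling = new_res->child;
			new_res->child = c;
		} else {
			pp = &c->sibling;
		}
	}

	new_res->parent = parent;
	new_res->sibling = parent->child;
	parent->child = new_res;
	return 0;
}
```

This mirrors the order used in the patch: a "Kernel code" node can be inserted first, and a later "System RAM" node that covers it becomes its parent, while a "Crash kernel" node inserted afterwards lands under the same RAM node without the caller naming it.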
Re: [PATCH v20 2/5] arm64: kdump: introduce some macros for crash kernel reservation
On 1/24/22 2:47 AM, Zhen Lei wrote:

From: Chen Zhou

Introduce the macro CRASH_ALIGN for alignment, the macro CRASH_ADDR_LOW_MAX for the upper bound of low crash memory, and the macro CRASH_ADDR_HIGH_MAX for the upper bound of high crash memory, and use these macros in place of the open-coded values.

Signed-off-by: Chen Zhou
Signed-off-by: Zhen Lei
Tested-by: John Donnelly
Tested-by: Dave Kleikamp
Acked-by: John Donnelly
---
 arch/arm64/mm/init.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 90f276d46b93bc6..6c653a2c7cff052 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -65,6 +65,12 @@ EXPORT_SYMBOL(memstart_addr);
 phys_addr_t arm64_dma_phys_limit __ro_after_init;
 
 #ifdef CONFIG_KEXEC_CORE
+/* Current arm64 boot protocol requires 2MB alignment */
+#define CRASH_ALIGN			SZ_2M
+
+#define CRASH_ADDR_LOW_MAX		arm64_dma_phys_limit
+#define CRASH_ADDR_HIGH_MAX		MEMBLOCK_ALLOC_ACCESSIBLE
+
 /*
  * reserve_crashkernel() - reserves memory for crash kernel
  *
@@ -75,7 +81,7 @@ phys_addr_t arm64_dma_phys_limit __ro_after_init;
 static void __init reserve_crashkernel(void)
 {
 	unsigned long long crash_base, crash_size;
-	unsigned long long crash_max = arm64_dma_phys_limit;
+	unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
 	int ret;
 
 	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
@@ -90,8 +96,7 @@ static void __init reserve_crashkernel(void)
 	if (crash_base)
 		crash_max = crash_base + crash_size;
 
-	/* Current arm64 boot protocol requires 2MB alignment */
-	crash_base = memblock_phys_alloc_range(crash_size, SZ_2M,
+	crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
 					       crash_base, crash_max);
 	if (!crash_base) {
 		pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
Re: [PATCH v20 3/5] arm64: kdump: reimplement crashkernel=X
On 1/24/22 2:47 AM, Zhen Lei wrote:
From: Chen Zhou

There are following issues in arm64 kdump:
1. We use crashkernel=X to reserve crashkernel below 4G, which
will fail when there is no enough low memory.
2. If reserving crashkernel above 4G, in this case, crash dump
kernel will boot failure because there is no low memory available
for allocation.

To solve these issues, change the behavior of crashkernel=X and
introduce crashkernel=X,[high,low]. crashkernel=X tries low allocation
in DMA zone, and fall back to high allocation if it fails.
We can also use "crashkernel=X,high" to select a region above DMA zone,
which also tries to allocate at least 256M in DMA zone automatically.
"crashkernel=Y,low" can be used to allocate specified size low memory.

Signed-off-by: Chen Zhou
Co-developed-by: Zhen Lei
Signed-off-by: Zhen Lei
Acked-by: John Donnelly
---
 arch/arm64/kernel/machine_kexec.c      |  9
 arch/arm64/kernel/machine_kexec_file.c | 12
 arch/arm64/mm/init.c                   | 68
 3 files changed, 81 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index e16b248699d5c3c..19c2d487cb08feb 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -329,8 +329,13 @@ bool crash_is_nosave(unsigned long pfn)
 
 	/* in reserved memory? */
 	addr = __pfn_to_phys(pfn);
-	if ((addr < crashk_res.start) || (crashk_res.end < addr))
-		return false;
+	if ((addr < crashk_res.start) || (crashk_res.end < addr)) {
+		if (!crashk_low_res.end)
+			return false;
+
+		if ((addr < crashk_low_res.start) || (crashk_low_res.end < addr))
+			return false;
+	}
 
 	if (!kexec_crash_image)
 		return true;
diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
index 59c648d51848886..889951291cc0f9c 100644
--- a/arch/arm64/kernel/machine_kexec_file.c
+++ b/arch/arm64/kernel/machine_kexec_file.c
@@ -65,10 +65,18 @@ static int prepare_elf_headers(void **addr, unsigned long *sz)
 
 	/* Exclude crashkernel region */
 	ret = crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end);
+	if (ret)
+		goto out;
+
+	if (crashk_low_res.end) {
+		ret = crash_exclude_mem_range(cmem, crashk_low_res.start, crashk_low_res.end);
+		if (ret)
+			goto out;
+	}
 
-	if (!ret)
-		ret = crash_prepare_elf64_headers(cmem, true, addr, sz);
+	ret = crash_prepare_elf64_headers(cmem, true, addr, sz);
 
+out:
 	kfree(cmem);
 	return ret;
 }
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 6c653a2c7cff052..a5d43feac0d7d96 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -71,6 +71,30 @@ phys_addr_t arm64_dma_phys_limit __ro_after_init;
 #define CRASH_ADDR_LOW_MAX		arm64_dma_phys_limit
 #define CRASH_ADDR_HIGH_MAX		MEMBLOCK_ALLOC_ACCESSIBLE
 
+static int __init reserve_crashkernel_low(unsigned long long low_size)
+{
+	unsigned long long low_base;
+
+	/* passed with crashkernel=0,low ? */
+	if (!low_size)
+		return 0;
+
+	low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, CRASH_ADDR_LOW_MAX);
+	if (!low_base) {
+		pr_err("cannot allocate crashkernel low memory (size:0x%llx).\n", low_size);
+		return -ENOMEM;
+	}
+
+	pr_info("crashkernel low memory reserved: 0x%llx - 0x%llx (%lld MB)\n",
+		low_base, low_base + low_size, low_size >> 20);
+
+	crashk_low_res.start = low_base;
+	crashk_low_res.end = low_base + low_size - 1;
+	insert_resource(&iomem_resource, &crashk_low_res);
+
+	return 0;
+}
+
 /*
  * reserve_crashkernel() - reserves memory for crash kernel
  *
@@ -81,29 +105,62 @@ phys_addr_t arm64_dma_phys_limit __ro_after_init;
 static void __init reserve_crashkernel(void)
 {
 	unsigned long long crash_base, crash_size;
+	unsigned long long crash_low_size = SZ_256M;
 	unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
 	int ret;
+	bool fixed_base;
+	char *cmdline = boot_command_line;
 
-	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
+	/* crashkernel=X[@offset] */
+	ret = parse_crashkernel(cmdline, memblock_phys_mem_size(),
 				&crash_size, &crash_base);
-	/* no crashkernel= or invalid value specified */
-	if (ret || !crash_size)
-		return;
+	if (ret || !crash_size) {
+		unsigned long long low_size;
 
+		/* crashkernel=X,high */
+		ret = parse_crashkernel_high(cmdline, 0, &crash_size, &crash_base);
+		if (ret || !crash_size)
+			return;
+
+		/* crashkernel=X,lo
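Since the archived diff is cut off before the end of reserve_crashkernel(), the behaviour described in the commit message can be sketched as a toy user-space model. Everything below is hypothetical (struct toy_mem, take(), reserve_plain(), reserve_high()); memory is modelled as two byte counters rather than real regions, and releasing the high reservation when the low one fails is an assumption, not something the visible part of the patch shows. Whether plain crashkernel=X also reserves low memory after falling back to high is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_256M	(256ULL << 20)	/* default low size named in the commit message */

/* Hypothetical model: free bytes below and above the DMA limit. */
struct toy_mem {
	uint64_t low_free;
	uint64_t high_free;
};

enum toy_where { TOY_NONE, TOY_LOW, TOY_HIGH };

/* Take size bytes from a pool if it is large enough. */
static bool take(uint64_t *pool, uint64_t size)
{
	if (*pool < size)
		return false;
	*pool -= size;
	return true;
}

/* crashkernel=X: try the low (DMA zone) pool first, then fall back to high. */
static enum toy_where reserve_plain(struct toy_mem *m, uint64_t size)
{
	if (take(&m->low_free, size))
		return TOY_LOW;
	if (take(&m->high_free, size))
		return TOY_HIGH;
	return TOY_NONE;
}

/*
 * crashkernel=X,high with optional crashkernel=Y,low.
 * low_given == false models "Y,low not specified", which uses 256M.
 * low_size == 0 models crashkernel=0,low, which reserves no low memory.
 */
static bool reserve_high(struct toy_mem *m, uint64_t size,
			 bool low_given, uint64_t low_size)
{
	uint64_t low = low_given ? low_size : SZ_256M;

	if (!take(&m->high_free, size))
		return false;
	if (low && !take(&m->low_free, low)) {
		m->high_free += size;	/* assumption: undo the high reservation */
		return false;
	}
	return true;
}
```

On a system with only 128M free in the low pool, reserve_plain() for 512M ends up in the high pool, while reserve_high() with the default low size fails unless crashkernel=0,low is passed.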
Re: [PATCH v20 4/5] of: fdt: Add memory for devices by DT property "linux,usable-memory-range"
On 1/24/22 2:47 AM, Zhen Lei wrote:
From: Chen Zhou

When reserving crashkernel in high memory, some low memory is reserved
for crash dump kernel devices and never mapped by the first kernel.
This memory range is advertised to crash dump kernel via DT property
under /chosen,
        linux,usable-memory-range = <BASE1 SIZE1 [BASE2 SIZE2]>

We reused the DT property linux,usable-memory-range and made the low
memory region as the second range "BASE2 SIZE2", which keeps compatibility
with existing user-space and older kdump kernels.

Crash dump kernel reads this property at boot time and call memblock_add()
to add the low memory region after memblock_cap_memory_range() has been
called.

Signed-off-by: Chen Zhou
Co-developed-by: Zhen Lei
Signed-off-by: Zhen Lei
Reviewed-by: Rob Herring
Tested-by: Dave Kleikamp
Acked-by: John Donnelly
---
 drivers/of/fdt.c | 33 +++++++++++++++++++++++----------
 1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index ad85ff6474ff139..df4b9d2418a13d4 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -973,16 +973,24 @@ static void __init early_init_dt_check_for_elfcorehdr(unsigned long node)
 
 static unsigned long chosen_node_offset = -FDT_ERR_NOTFOUND;
 
+/*
+ * The main usage of linux,usable-memory-range is for crash dump kernel.
+ * Originally, the number of usable-memory regions is one. Now there may
+ * be two regions, low region and high region.
+ * To make compatibility with existing user-space and older kdump, the low
+ * region is always the last range of linux,usable-memory-range if exist.
+ */
+#define MAX_USABLE_RANGES		2
+
 /**
  * early_init_dt_check_for_usable_mem_range - Decode usable memory range
  * location from flat tree
  */
 void __init early_init_dt_check_for_usable_mem_range(void)
 {
-	const __be32 *prop;
-	int len;
-	phys_addr_t cap_mem_addr;
-	phys_addr_t cap_mem_size;
+	struct memblock_region rgn[MAX_USABLE_RANGES] = {0};
+	const __be32 *prop, *endp;
+	int len, i;
 	unsigned long node = chosen_node_offset;
 
 	if ((long)node < 0)
@@ -991,16 +999,21 @@ void __init early_init_dt_check_for_usable_mem_range(void)
 	pr_debug("Looking for usable-memory-range property... ");
 
 	prop = of_get_flat_dt_prop(node, "linux,usable-memory-range", &len);
-	if (!prop || (len < (dt_root_addr_cells + dt_root_size_cells)))
+	if (!prop || (len % (dt_root_addr_cells + dt_root_size_cells)))
 		return;
 
-	cap_mem_addr = dt_mem_next_cell(dt_root_addr_cells, &prop);
-	cap_mem_size = dt_mem_next_cell(dt_root_size_cells, &prop);
+	endp = prop + (len / sizeof(__be32));
+	for (i = 0; i < MAX_USABLE_RANGES && prop < endp; i++) {
+		rgn[i].base = dt_mem_next_cell(dt_root_addr_cells, &prop);
+		rgn[i].size = dt_mem_next_cell(dt_root_size_cells, &prop);
 
-	pr_debug("cap_mem_start=%pa cap_mem_size=%pa\n", &cap_mem_addr,
-		 &cap_mem_size);
+		pr_debug("cap_mem_regions[%d]: base=%pa, size=%pa\n",
+			 i, &rgn[i].base, &rgn[i].size);
+	}
 
-	memblock_cap_memory_range(cap_mem_addr, cap_mem_size);
+	memblock_cap_memory_range(rgn[0].base, rgn[0].size);
+	for (i = 1; i < MAX_USABLE_RANGES && rgn[i].size; i++)
+		memblock_add(rgn[i].base, rgn[i].size);
 }
 
 #ifdef CONFIG_SERIAL_EARLYCON
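To make the property layout concrete, the following user-space sketch decodes a raw linux,usable-memory-range blob into at most MAX_USABLE_RANGES (base, size) pairs, reading big-endian 32-bit cells the way a flattened device tree stores them. The names struct usable_range, next_cells() and parse_usable_ranges() are hypothetical. The length check here is done in bytes against the entry size, which is my own assumption and is stricter than the check in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_USABLE_RANGES 2	/* same limit as in the patch */

struct usable_range {
	uint64_t base;
	uint64_t size;
};

/* Read `cells` big-endian 32-bit cells from *p and advance the cursor. */
static uint64_t next_cells(int cells, const uint8_t **p)
{
	uint64_t v = 0;

	while (cells-- > 0) {
		uint32_t cell = ((uint32_t)(*p)[0] << 24) | ((uint32_t)(*p)[1] << 16) |
				((uint32_t)(*p)[2] << 8) | (uint32_t)(*p)[3];
		v = (v << 32) | cell;
		*p += 4;
	}
	return v;
}

/*
 * Decode up to MAX_USABLE_RANGES entries from the property bytes.
 * Returns the number of ranges decoded, or -1 if the blob is missing
 * or its length is not a whole number of (address, size) entries.
 */
static int parse_usable_ranges(const uint8_t *prop, size_t len,
			       int addr_cells, int size_cells,
			       struct usable_range out[MAX_USABLE_RANGES])
{
	size_t entry;
	const uint8_t *end;
	int n = 0;

	assert(addr_cells > 0 && size_cells > 0);
	entry = (size_t)(addr_cells + size_cells) * 4;
	if (!prop || len == 0 || len % entry)
		return -1;

	end = prop + len;
	while (n < MAX_USABLE_RANGES && prop < end) {
		out[n].base = next_cells(addr_cells, &prop);
		out[n].size = next_cells(size_cells, &prop);
		n++;
	}
	return n;
}
```

In the patch, the first decoded range is passed to memblock_cap_memory_range() and any further non-empty range is added back with memblock_add(); a caller of this sketch would treat out[0] and out[1] the same way.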
Re: [PATCH v20 5/5] kdump: update Documentation about crashkernel
On 1/24/22 2:47 AM, Zhen Lei wrote:
From: Chen Zhou

For arm64, the behavior of crashkernel=X has been changed, which
tries low allocation in DMA zone and fall back to high allocation
if it fails.

We can also use "crashkernel=X,high" to select a high region above
DMA zone, which also tries to allocate at least 256M low memory in
DMA zone automatically and "crashkernel=Y,low" can be used to allocate
specified size low memory.

So update the Documentation.

Signed-off-by: Chen Zhou
Signed-off-by: Zhen Lei
Acked-by: John Donnelly
---
 Documentation/admin-guide/kdump/kdump.rst       | 11 +++++++++--
 Documentation/admin-guide/kernel-parameters.txt | 11 +++++++++--
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/kdump/kdump.rst b/Documentation/admin-guide/kdump/kdump.rst
index cb30ca3df27c9b2..d4c287044be0c70 100644
--- a/Documentation/admin-guide/kdump/kdump.rst
+++ b/Documentation/admin-guide/kdump/kdump.rst
@@ -361,8 +361,15 @@ Boot into System Kernel
    kernel will automatically locate the crash kernel image within the
    first 512MB of RAM if X is not given.
 
-   On arm64, use "crashkernel=Y[@X]".  Note that the start address of
-   the kernel, X if explicitly specified, must be aligned to 2MiB (0x200000).
+   On arm64, use "crashkernel=X" to try low allocation in DMA zone and
+   fall back to high allocation if it fails.
+   We can also use "crashkernel=X,high" to select a high region above
+   DMA zone, which also tries to allocate at least 256M low memory in
+   DMA zone automatically.
+   "crashkernel=Y,low" can be used to allocate specified size low memory.
+   Use "crashkernel=Y@X" if you really have to reserve memory from
+   specified start address X. Note that the start address of the kernel,
+   X if explicitly specified, must be aligned to 2MiB (0x200000).
 
 Load the Dump-capture Kernel
 ============================
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index f5a27f067db9ed9..65780c2ca830be0 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -792,6 +792,9 @@
 			[KNL, X86-64] Select a region under 4G first, and
 			fall back to reserve region above 4G when '@offset'
 			hasn't been specified.
+			[KNL, ARM64] Try low allocation in DMA zone and fall back
+			to high allocation if it fails when '@offset' hasn't been
+			specified.
 			See Documentation/admin-guide/kdump/kdump.rst for further details.
 
 	crashkernel=range1:size1[,range2:size2,...][@offset]
@@ -808,6 +811,8 @@
 			Otherwise memory region will be allocated below 4G, if
 			available.
 			It will be ignored if crashkernel=X is specified.
+			[KNL, ARM64] range in high memory.
+			Allow kernel to allocate physical memory region from top.
 	crashkernel=size[KMG],low
 			[KNL, X86-64] range under 4G. When crashkernel=X,high
 			is passed, kernel could allocate physical memory region
@@ -816,13 +821,15 @@
 			requires at least 64M+32K low memory, also enough extra
 			low memory is needed to make sure DMA buffers for 32-bit
 			devices won't run out. Kernel would try to allocate at
-			at least 256M below 4G automatically.
+			least 256M below 4G automatically.
 			This one let user to specify own low range under 4G
 			for second kernel instead.
 			0: to disable low allocation.
 			It will be ignored when crashkernel=X,high is not used
 			or memory reserved is below 4G.
-
+			[KNL, ARM64] range in low memory.
+			This one let user to specify a low range in DMA zone for
+			crash dump kernel.
 	cryptomgr.notests
 			[KNL] Disable crypto self-tests
 