Re: [PATCH v3 0/2] arm64: kdump: Function supplement and performance optimization

2022-10-06 Thread john . p . donnelly

On 8/1/22 9:47 PM, Leizhen (ThunderTown) wrote:



On 2022/8/1 16:20, Baoquan He wrote:

Hi Catalin,

On 07/11/22 at 05:03pm, Zhen Lei wrote:

v2 --> v3:
1. Discard patch 3 in v2, a cleanup patch.

v1 --> v2:
1. Update the commit message of Patch 1 to state explicitly that
"crashkernel=X,high" is specified but "crashkernel=Y,low" is not.
2. Drop Patches 4-5. The focus for now is on functional completeness;
performance optimization will be considered in later versions.
3. Patch 3 is not mandatory. It is just a cleanup now, although it is a
prerequisite for Patches 4-5. To avoid duplicating effort later, I would be
glad to see it accepted.


v1:
After the basic functions of "support reserving crashkernel above 4G on arm64
kdump" (see https://lkml.org/lkml/2022/5/6/428) are implemented, we still have
three features to be improved.
1. When crashkernel=X,high is specified but crashkernel=Y,low is not specified,
the default crash low memory size is provided.
2. For crashkernel=X without '@offset', if the low memory fails to be allocated,
fall back to reserve region from high memory(above DMA zones).
3. If crashkernel=X,high is used, page mapping is performed only for the crash
high memory, and block mapping is still used for other linear address spaces.
Compared to the previous version:
(1) For crashkernel=X[@offset], the memory above 4G is not changed to block
mapping, leave it to the next time.
(2) The implementation method is modified. Now the implementation is simpler
and clearer.
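
For readers following the thread, the reservation policy in points 1 and 2 above can be modeled in a few lines of userspace C. This is only a sketch under stated assumptions, not the code from the patches: plan_crashkernel(), struct crash_plan, and the free_low/free_high/default_low parameters are hypothetical names, and it assumes that a fallback to high memory also reserves the default low region for DMA.

```c
#include <stdbool.h>
#include <stdint.h>

#define MIB(x) ((uint64_t)(x) << 20)

/* Hypothetical result type: where the crash kernel ends up. */
struct crash_plan {
    bool ok;            /* reservation succeeded */
    bool high;          /* main region placed in high memory */
    uint64_t main_size; /* size of the main region */
    uint64_t low_size;  /* extra low region for DMA, 0 if none */
};

/*
 * want_high: "crashkernel=X,high" was given.
 * low_size:  value of "crashkernel=Y,low", or 0 if it was not given.
 * free_low/free_high: free memory below/above the low limit (assumed inputs).
 * default_low: default low size used when ",low" is absent (assumed input).
 */
struct crash_plan plan_crashkernel(uint64_t size, bool want_high,
                                   uint64_t low_size,
                                   uint64_t free_low, uint64_t free_high,
                                   uint64_t default_low)
{
    struct crash_plan plan = { false, false, 0, 0 };
    uint64_t low;

    /* Plain crashkernel=X: try low memory first. */
    if (!want_high && size <= free_low) {
        plan.ok = true;
        plan.main_size = size;
        return plan;
    }

    /* Either ",high" was requested or the low attempt failed: go high. */
    if (size > free_high)
        return plan;

    /* No ",low" given: fall back to the default low size. */
    low = low_size ? low_size : default_low;
    if (low > free_low)
        return plan;

    plan.ok = true;
    plan.high = true;
    plan.main_size = size;
    plan.low_size = low;
    return plan;
}
```

With 512M free in low memory, a request for crashkernel=896M would fail without the fallback; in this model it lands in high memory with the default low region added.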


Do you plan to pick up this series so that it can be taken into 5.20
rc-1~3?


Hi, Catalin:
   Only the function reserve_crashkernel() is modified in these two patches.
The core processing of the arm64 architecture is not affected. I remember you
suggested that arm64 and x86 share the same kdump code, so these two
subfeatures are needed. Maybe we can lay the foundation first for the people
who build the road. Unifying the external interfaces of kdump on arm64 and x86
does not seem to hurt.




We have backported the basic crashkernel=,high and crashkernel=,low support
into our distros and tested it widely on arm64 servers. We need this patchset
so that we can backport it for more testing.

Thanks
Baoquan



Zhen Lei (2):
   arm64: kdump: Provide default size when crashkernel=Y,low is not
 specified
   arm64: kdump: Support crashkernel=X fall back to reserve region above
 DMA zones

  .../admin-guide/kernel-parameters.txt | 10 ++-
  arch/arm64/mm/init.c  | 28 +--
  2 files changed, 28 insertions(+), 10 deletions(-)

--
2.25.1







Hi,

What is the progress of this series?

Without this patch set we are seeing failures with larger reservations such
as crashkernel=896M on Arm with Linux 6.0-rc7. This larger value is needed
for iSCSI-booted systems with certain network adapters.


Thank you,
John.






___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v3 0/2] arm64: kdump: Function supplement and performance optimization

2022-10-26 Thread john . p . donnelly

On 10/14/22 11:29 AM, Catalin Marinas wrote:

On Thu, Oct 13, 2022 at 06:46:35PM +0800, Baoquan He wrote:

On 10/06/22 at 09:55am, john.p.donne...@oracle.com wrote:

What is the progress of this series?

Without this patch set we are seeing failures with larger reservations such
as crashkernel=896M on Arm with Linux 6.0-rc7. This larger value is needed
for iSCSI-booted systems with certain network adapters.


This change is located in the arch/arm64 folder. I have pinged the arm64
maintainers to consider merging this patchset. I am not sure whether they
are still thinking about it or are ignoring it.

Hi Catalin, Will,

Ping again!

Do you plan to accept this patchset? It is very important for setting
crashkernel on arm64 with a simple, default syntax.


I'll look at it once the merge window closes. I saw discussions on
this thread and I ignored it until you all agreed ;).




Hi,

Do you have a timeline for this? This "crashkernel above 4G for Arm" item
has been lingering for 3 years. I think it is time for it to be
incorporated.



Thanks,

John.







Re: [PATCH 1/1] kernel/crash_core.c - Add crashkernel=auto for x86 and ARM

2021-01-21 Thread john . p . donnelly

On 11/22/20 9:47 PM, Dave Young wrote:

Hi Guilherme,
On 11/22/20 at 12:32pm, Guilherme Piccoli wrote:

Hi Dave and Kairui, thanks for your responses! OK, if that makes sense
to you I'm fine with it. I'd just recommend to test recent kernels in
multiple distros with the minimum "range" to see if 64M is enough for
crashkernel, maybe we'd need to bump that.


Given the different kernel configs and the different userspace
initramfs setups, it is hard to get a uniform value for all distributions,
but we can have an interface/Kconfig option for them to provide a value, as
this patch is doing. It could be improved later, as Kairui said, by adding
extra values for some known kernel cases, and probably with some more
improvements if doable.

Thanks
Dave



Hi.

Are we going to move forward with implementing this for x86 and Arm?

If other platform maintainers want to include this CONFIG option in
their configuration settings, they have a starting point.


Thank you,

John.

(I am not currently on many of the distribution lists included in this email,
so hopefully key contributors are included in this exchange.)




Re: [PATCH v3 1/1] kernel/crash_core: Add crashkernel=auto for vmcore creation

2021-02-17 Thread john . p . donnelly

On 2/11/21 12:08 PM, Saeed Mirzamohammadi wrote:

This adds crashkernel=auto feature to configure reserved memory for
vmcore creation. CONFIG_CRASH_AUTO_STR is defined to be set for
different kernel distributions and different archs based on their
needs.

Signed-off-by: Saeed Mirzamohammadi 
Signed-off-by: John Donnelly 
Tested-by: John Donnelly 
---
  Documentation/admin-guide/kdump/kdump.rst |  3 ++-
  .../admin-guide/kernel-parameters.txt |  6 +
  arch/Kconfig  | 24 +++
  kernel/crash_core.c   |  7 ++
  4 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kdump/kdump.rst b/Documentation/admin-guide/kdump/kdump.rst
index 2da65fef2a1c..e55cdc404c6b 100644
--- a/Documentation/admin-guide/kdump/kdump.rst
+++ b/Documentation/admin-guide/kdump/kdump.rst
@@ -285,7 +285,8 @@ This would mean:
  2) if the RAM size is between 512M and 2G (exclusive), then reserve 64M
  3) if the RAM size is larger than 2G, then reserve 128M
  
-

+Or you can use crashkernel=auto to choose the crash kernel memory size
+based on the recommended configuration set for each arch.
  
  Boot into System Kernel

  ===
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 7d4e523646c3..aa2099465458 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -736,6 +736,12 @@
a memory unit (amount[KMG]). See also
Documentation/admin-guide/kdump/kdump.rst for an example.
  
+	crashkernel=auto
+   [KNL] This parameter will set the reserved memory for
+   the crash kernel based on the value of the CRASH_AUTO_STR
+   that is the best effort estimation for each arch. See also
+   arch/Kconfig for further details.
+
crashkernel=size[KMG],high
[KNL, X86-64] range could be above 4G. Allow kernel
to allocate physical memory region from top, so could
diff --git a/arch/Kconfig b/arch/Kconfig
index af14a567b493..f87c88ffa2f8 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -14,6 +14,30 @@ menu "General architecture-dependent options"
  config CRASH_CORE
bool
  
+if CRASH_CORE

+
+config CRASH_AUTO_STR
+   string "Memory reserved for crash kernel"
+   depends on CRASH_CORE
+   default "1G-64G:128M,64G-1T:256M,1T-:512M"
+   help
+ This configures the reserved memory dependent
+ on the value of System RAM. The syntax is:
+ crashkernel=<range1>:<size1>[,<range2>:<size2>,...][@offset]
+ range=start-[end]
+
+ For example:
+ crashkernel=512M-2G:64M,2G-:128M
+
+ This would mean:
+
+ 1) if the RAM is smaller than 512M, then don't reserve anything
+(this is the "rescue" case)
+ 2) if the RAM size is between 512M and 2G (exclusive), then reserve 64M
+ 3) if the RAM size is larger than 2G, then reserve 128M
+
+endif # CRASH_CORE
+
  config KEXEC_CORE
select CRASH_CORE
bool
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 106e4500fd53..ab0a2b4b1ffa 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -7,6 +7,7 @@
  #include 
  #include 
  #include 
+#include 
  
  #include 

  #include 
@@ -250,6 +251,12 @@ static int __init __parse_crashkernel(char *cmdline,
if (suffix)
return parse_crashkernel_suffix(ck_cmdline, crash_size,
suffix);
+#ifdef CONFIG_CRASH_AUTO_STR
+   if (strncmp(ck_cmdline, "auto", 4) == 0) {
+   ck_cmdline = CONFIG_CRASH_AUTO_STR;
+   pr_info("Using crashkernel=auto, the size chosen is a best effort estimation.\n");
+   }
+#endif
/*
 * if the commandline contains a ':', then that's the extended
 * syntax -- if not, it must be the classic syntax
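
To make the effect of the patch easier to follow, here is a small userspace sketch of how "auto" could be expanded and how the extended range syntax maps system RAM to a reservation size. It is an illustration only, not the kernel parser: resolve_auto(), crash_size_for_ram(), and parse_size() are hypothetical names, CRASH_AUTO_STR stands in for CONFIG_CRASH_AUTO_STR with the default from the patch, and any '@offset' part is ignored.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for CONFIG_CRASH_AUTO_STR, using the default from the patch. */
#define CRASH_AUTO_STR "1G-64G:128M,64G-1T:256M,1T-:512M"

/* Mirror the patch: a value starting with "auto" is replaced by the string. */
const char *resolve_auto(const char *arg)
{
    return strncmp(arg, "auto", 4) == 0 ? CRASH_AUTO_STR : arg;
}

/* Parse "<number>[KMGT]"; *end is left at s if no number was found. */
static uint64_t parse_size(const char *s, const char **end)
{
    char *e;
    uint64_t v = strtoull(s, &e, 0);
    unsigned int shift = 0;

    if (e == s) {
        *end = s;
        return 0;
    }
    switch (*e) {
    case 'K': case 'k': shift = 10; break;
    case 'M': case 'm': shift = 20; break;
    case 'G': case 'g': shift = 30; break;
    case 'T': case 't': shift = 40; break;
    default: break;
    }
    if (shift)
        e++;
    *end = e;
    return v << shift;
}

/* Return the size for the first range containing ram, or 0 if none matches. */
uint64_t crash_size_for_ram(const char *spec, uint64_t ram)
{
    const char *cur = spec;

    while (*cur) {
        const char *p;
        uint64_t start, end = UINT64_MAX, size;

        start = parse_size(cur, &p);
        if (p == cur || *p != '-')
            return 0;                  /* malformed range */
        cur = p + 1;
        if (*cur != ':') {             /* an explicit end was given */
            end = parse_size(cur, &p);
            if (p == cur || end <= start)
                return 0;
            cur = p;
        }
        if (*cur != ':')
            return 0;
        size = parse_size(cur + 1, &p);
        if (p == cur + 1)
            return 0;
        cur = p;

        if (ram >= start && ram < end) /* start inclusive, end exclusive */
            return size;
        if (*cur != ',')
            break;                     /* end of list or '@offset' */
        cur++;
    }
    return 0;
}
```

Under these assumptions, a machine with 16G of RAM booted with crashkernel=auto would get 128M, and a machine with less than 1G would get nothing reserved.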




Hello.

Ping.

Can we get this reviewed and staged ?

Thank you.

John.





Re: [PATCH v3 1/1] kernel/crash_core: Add crashkernel=auto for vmcore creation

2021-02-22 Thread john . p . donnelly

On 2/17/21 8:02 PM, Baoquan He wrote:

On 02/11/21 at 10:08am, Saeed Mirzamohammadi wrote:

This adds crashkernel=auto feature to configure reserved memory for
vmcore creation. CONFIG_CRASH_AUTO_STR is defined to be set for
different kernel distributions and different archs based on their
needs.

Signed-off-by: Saeed Mirzamohammadi 
Signed-off-by: John Donnelly 
Tested-by: John Donnelly 
---
  Documentation/admin-guide/kdump/kdump.rst |  3 ++-
  .../admin-guide/kernel-parameters.txt |  6 +
  arch/Kconfig  | 24 +++
  kernel/crash_core.c   |  7 ++
  4 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kdump/kdump.rst b/Documentation/admin-guide/kdump/kdump.rst
index 2da65fef2a1c..e55cdc404c6b 100644
--- a/Documentation/admin-guide/kdump/kdump.rst
+++ b/Documentation/admin-guide/kdump/kdump.rst
@@ -285,7 +285,8 @@ This would mean:
  2) if the RAM size is between 512M and 2G (exclusive), then reserve 64M
  3) if the RAM size is larger than 2G, then reserve 128M
  
-

+Or you can use crashkernel=auto to choose the crash kernel memory size
+based on the recommended configuration set for each arch.
  
  Boot into System Kernel

  ===
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 7d4e523646c3..aa2099465458 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -736,6 +736,12 @@
a memory unit (amount[KMG]). See also
Documentation/admin-guide/kdump/kdump.rst for an example.
  
+	crashkernel=auto
+   [KNL] This parameter will set the reserved memory for
+   the crash kernel based on the value of the CRASH_AUTO_STR
+   that is the best effort estimation for each arch. See also
+   arch/Kconfig for further details.
+
crashkernel=size[KMG],high
[KNL, X86-64] range could be above 4G. Allow kernel
to allocate physical memory region from top, so could
diff --git a/arch/Kconfig b/arch/Kconfig
index af14a567b493..f87c88ffa2f8 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -14,6 +14,30 @@ menu "General architecture-dependent options"
  config CRASH_CORE
bool
  
+if CRASH_CORE

+
+config CRASH_AUTO_STR
+   string "Memory reserved for crash kernel"
+   depends on CRASH_CORE
+   default "1G-64G:128M,64G-1T:256M,1T-:512M"
+   help
+ This configures the reserved memory dependent
+ on the value of System RAM. The syntax is:
+ crashkernel=<range1>:<size1>[,<range2>:<size2>,...][@offset]
+ range=start-[end]
+
+ For example:
+ crashkernel=512M-2G:64M,2G-:128M
+
+ This would mean:
+
+ 1) if the RAM is smaller than 512M, then don't reserve anything
+(this is the "rescue" case)
+ 2) if the RAM size is between 512M and 2G (exclusive), then reserve 64M
+ 3) if the RAM size is larger than 2G, then reserve 128M
+
+endif # CRASH_CORE


Wondering if this CRASH_CORE ifdeffery is a little redundant here,
since the CRASH_CORE dependency has already been added. Apart from this, I
like this patch. As we discussed in private threads, we can try to push it
into mainline and continue improving it later.


Hi,

Are we good to move forward with this and apply it now?

Dave Young acked it.

Thank you,

John.

(Note: I am not on any vger.kernel.org list at the moment, so
please cc me.)








+
  config KEXEC_CORE
select CRASH_CORE
bool
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 106e4500fd53..ab0a2b4b1ffa 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -7,6 +7,7 @@
  #include 
  #include 
  #include 
+#include 
  
  #include 

  #include 
@@ -250,6 +251,12 @@ static int __init __parse_crashkernel(char *cmdline,
if (suffix)
return parse_crashkernel_suffix(ck_cmdline, crash_size,
suffix);
+#ifdef CONFIG_CRASH_AUTO_STR
+   if (strncmp(ck_cmdline, "auto", 4) == 0) {
+   ck_cmdline = CONFIG_CRASH_AUTO_STR;
+   pr_info("Using crashkernel=auto, the size chosen is a best effort estimation.\n");
+   }
+#endif
/*
 * if the commandline contains a ':', then that's the extended
 * syntax -- if not, it must be the classic syntax
--
2.27.0








Re: [PATCH v4 1/1] kernel/crash_core: Add crashkernel=auto for vmcore creation

2021-03-03 Thread john . p . donnelly

On 2/25/21 6:38 PM, Dave Young wrote:

On 02/23/21 at 09:41am, Saeed Mirzamohammadi wrote:

This adds crashkernel=auto feature to configure reserved memory for
vmcore creation. CONFIG_CRASH_AUTO_STR is defined to be set for
different kernel distributions and different archs based on their
needs.

Signed-off-by: Saeed Mirzamohammadi 
Signed-off-by: John Donnelly 
Tested-by: John Donnelly 
---
  Documentation/admin-guide/kdump/kdump.rst |  3 ++-
  .../admin-guide/kernel-parameters.txt |  6 ++
  arch/Kconfig  | 20 +++
  kernel/crash_core.c   |  7 +++
  4 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kdump/kdump.rst b/Documentation/admin-guide/kdump/kdump.rst
index 75a9dd98e76e..ae030111e22a 100644
--- a/Documentation/admin-guide/kdump/kdump.rst
+++ b/Documentation/admin-guide/kdump/kdump.rst
@@ -285,7 +285,8 @@ This would mean:
  2) if the RAM size is between 512M and 2G (exclusive), then reserve 64M
  3) if the RAM size is larger than 2G, then reserve 128M
  
-

+Or you can use crashkernel=auto to choose the crash kernel memory size
+based on the recommended configuration set for each arch.
  
  Boot into System Kernel

  ===
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 9e3cdb271d06..a5deda5c85fe 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -747,6 +747,12 @@
a memory unit (amount[KMG]). See also
Documentation/admin-guide/kdump/kdump.rst for an example.
  
+	crashkernel=auto
+   [KNL] This parameter will set the reserved memory for
+   the crash kernel based on the value of the CRASH_AUTO_STR
+   that is the best effort estimation for each arch. See also
+   arch/Kconfig for further details.
+
crashkernel=size[KMG],high
[KNL, X86-64] range could be above 4G. Allow kernel
to allocate physical memory region from top, so could
diff --git a/arch/Kconfig b/arch/Kconfig
index 24862d15f3a3..23d047548772 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -14,6 +14,26 @@ menu "General architecture-dependent options"
  config CRASH_CORE
bool
  
+config CRASH_AUTO_STR

+   string "Memory reserved for crash kernel"
+   depends on CRASH_CORE
+   default "1G-64G:128M,64G-1T:256M,1T-:512M"
+   help
+ This configures the reserved memory dependent
+ on the value of System RAM. The syntax is:
+ crashkernel=<range1>:<size1>[,<range2>:<size2>,...][@offset]
+ range=start-[end]
+
+ For example:
+ crashkernel=512M-2G:64M,2G-:128M
+
+ This would mean:
+
+ 1) if the RAM is smaller than 512M, then don't reserve anything
+(this is the "rescue" case)
+ 2) if the RAM size is between 512M and 2G (exclusive), then reserve 64M
+ 3) if the RAM size is larger than 2G, then reserve 128M
+
  config KEXEC_CORE
select CRASH_CORE
bool
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 825284baaf46..90f9e4bb6704 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -7,6 +7,7 @@
  #include 
  #include 
  #include 
+#include 
  
  #include 

  #include 
@@ -250,6 +251,12 @@ static int __init __parse_crashkernel(char *cmdline,
if (suffix)
return parse_crashkernel_suffix(ck_cmdline, crash_size,
suffix);
+#ifdef CONFIG_CRASH_AUTO_STR
+   if (strncmp(ck_cmdline, "auto", 4) == 0) {
+   ck_cmdline = CONFIG_CRASH_AUTO_STR;
+   pr_info("Using crashkernel=auto, the size chosen is a best effort estimation.\n");
+   }
+#endif
/*
 * if the commandline contains a ':', then that's the extended
 * syntax -- if not, it must be the classic syntax
--
2.27.0




Acked-by: Dave Young 

Thanks
Dave


Hi,

  Thank you.

  When can we expect this to be applied to a future build?







Re: [PATCH v16 00/11] support reserving crashkernel above 4G on arm64 kdump

2021-12-05 Thread john . p . donnelly

On 11/23/21 6:46 AM, Zhen Lei wrote:

There are the following issues in arm64 kdump:
1. We use crashkernel=X to reserve crashkernel below 4G, which
will fail when there is not enough low memory.
2. If the crashkernel is reserved above 4G, the crash dump
kernel will fail to boot because there is no low memory available
for allocation.

To solve these issues, change the behavior of crashkernel=X.
crashkernel=X tries low allocation in the DMA zone and falls back to high
allocation if it fails.

We can also use "crashkernel=X,high" to select a high region above the
DMA zone. This also tries to allocate at least 256M of low memory in the
DMA zone automatically, and "crashkernel=Y,low" can be used to allocate
a low memory region of a specified size.

When reserving crashkernel in high memory, some low memory is reserved
for crash dump kernel devices. So there may be two regions reserved for
the crash dump kernel.
To distinguish it from the high region and to avoid affecting the use
of existing kexec-tools, rename the low region to "Crash kernel (low)",
and pass the low region by reusing the DT property
"linux,usable-memory-range". We made the low memory region the last
range of "linux,usable-memory-range" to keep compatibility with existing
user-space and older kdump kernels.

Besides, we need to modify kexec-tools:
arm64: support more than one crash kernel region (see [1])

Another update is to the documentation of the DT property 'linux,usable-memory-range':
schemas: update 'linux,usable-memory-range' node schema (see [2])

This patchset contains the following 11 patches:

0001-0004 are some x86 cleanups which prepare for making the functions
reserve_crashkernel[_low]() generic.
0005 makes the functions reserve_crashkernel[_low]() generic.
0006-0008 reimplement arm64 crashkernel=X.
0009-0010 add memory for devices by DT property linux,usable-memory-range.
0011 updates the doc.

Changes since [v15]
-  Aggregate the processing of "linux,usable-memory-range" into one function.
Only patch 9-10 have been updated.

Changes since [v14]
- Restore the requirement that the CrashKernel memory regions on X86
   only require 1 MiB alignment.
- Combine patches 5 and 6 in v14 into one. The compilation warning fixed
   by patch 6 was introduced by patch 5 in v14.
- As with crashk_res, crashk_low_res is also processed by
   crash_exclude_mem_range() in patch 7.
- Since commit b261dba2fdb2 ("arm64: kdump: Remove custom
   linux,usable-memory-range handling") has removed the architecture-specific
   code, extend the property "linux,usable-memory-range" in the
   platform-agnostic FDT core code. See patch 9.
- Discard the x86 description update in the document, because the description
   has been updated by commit b1f4c363666c ("Documentation: kdump: update
   kdump guide").
- Change "arm64" to "ARM64" in Doc.


Changes since [v13]
- Rebased on top of 5.11-rc5.
- Introduce config CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL.
Since the reserve_crashkernel[_low]() implementations are quite similar on
other architectures, add CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL to
arch/Kconfig and select it on X86 and ARM64.
- Some minor cleanup.

Changes since [v12]
- Rebased on top of 5.10-rc1.
- Keep CRASH_ALIGN as 16M suggested by Dave.
- Drop patch "kdump: add threshold for the required memory".
- Add Tested-by from John.

Changes since [v11]
- Rebased on top of 5.9-rc4.
- Make the function reserve_crashkernel() of x86 generic.
Suggested by Catalin, make the function reserve_crashkernel() of x86 generic
and arm64 use the generic version to reimplement crashkernel=X.

Changes since [v10]
- Reimplement crashkernel=X suggested by Catalin, Many thanks to Catalin.

Changes since [v9]
- Patch 1 add Acked-by from Dave.
- Update patch 5 according to Dave's comments.
- Update chosen schema.

Changes since [v8]
- Reuse DT property "linux,usable-memory-range".
Suggested by Rob, reuse DT property "linux,usable-memory-range" to pass the low
memory region.
- Fix kdump broken with ZONE_DMA reintroduced.
- Update chosen schema.

Changes since [v7]
- Move x86 CRASH_ALIGN to 2M
As suggested by Dave, and after some testing, move x86 CRASH_ALIGN to 2M.
- Update Documentation/devicetree/bindings/chosen.txt.
Add corresponding documentation to Documentation/devicetree/bindings/chosen.txt,
as suggested by Arnd.
- Add Tested-by from John and pk.

Changes since [v6]
- Fix build errors reported by kbuild test robot.

Changes since [v5]
- Move reserve_crashkernel_low() into kernel/crash_core.c.
- Delete crashkernel=X,high.
- Modify crashkernel=X,low.
If crashkernel=X,low is specified simultaneously, first reserve low memory of
the specified size for crash kdump kernel devices, and then reserve memory
above 4G. In addition, rename crashk_low_res to "Crash kernel (low)" for arm64,
and then pass it to the crash dump kernel by the DT property
"linux,low-memory-range".
- Update Documentation/admin-guide/kdump/kdump.rst.

Changes since [v4]
- Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.

Changes since [v3]
- Add memblock_cap_memory_ra

Re: [PATCH v16 00/11] support reserving crashkernel above 4G on arm64 kdump

2021-12-10 Thread john . p . donnelly

On 12/8/21 11:13 AM, Catalin Marinas wrote:

On Tue, Nov 23, 2021 at 08:46:35PM +0800, Zhen Lei wrote:

Chen Zhou (10):
   x86: kdump: replace the hard-coded alignment with macro CRASH_ALIGN
   x86: kdump: make the lower bound of crash kernel reservation
 consistent
   x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions
 reserve_crashkernel()
   x86: kdump: move xen_pv_domain() check and insert_resource() to
 setup_arch()
   x86: kdump: move reserve_crashkernel[_low]() into crash_core.c
   arm64: kdump: introduce some macros for crash kernel reservation
   arm64: kdump: reimplement crashkernel=X
   x86, arm64: Add ARCH_WANT_RESERVE_CRASH_KERNEL config
   of: fdt: Add memory for devices by DT property
 "linux,usable-memory-range"
   kdump: update Documentation about crashkernel

Zhen Lei (1):
   of: fdt: Aggregate the processing of "linux,usable-memory-range"


Apart from a minor comment I made on patch 8 and some comments from Rob
that need addressing, the rest looks fine to me.

Ingo stated in the past that he's happy to ack the x86 changes as long
as there's no functional change (and that's the case AFAICT). Ingo, does
your conditional ack still stand?

In terms of merging, I'm happy to take it all through the arm64 tree
with acks from the x86 maintainers. Alternatively, with the change I
mentioned for patch 8, the first 5 patches could be queued via the tip
tree on a stable branch and I can base the rest of the arm64 on top.

Thomas, Ingo, Peter, any preference?

Thanks.



Hi,

If you look at the trend over the past year, some of the additional review
requests arose because the submitter had to rebase onto the next version.


Can we get this acked and placed in a build so others can test and start
using it?


Thank you,
JD









Re: [PATCH v3 2/5] dma-pool: allow user to disable atomic pool

2021-12-13 Thread john . p . donnelly

On 12/13/21 6:27 AM, Baoquan He wrote:

In the current code, three atomic memory pools are always created,
atomic_pool_kernel|dma|dma32, even when 'coherent_pool=0' is
specified on the kernel command line. In fact, the atomic pool is only
necessary when CONFIG_DMA_DIRECT_REMAP=y or mem_encrypt_active=y,
which are needed on only a few ARCHes.

So change the code to allow the user to disable the atomic pool by
specifying 'coherent_pool=0'.

Meanwhile, update the relevant document in kernel-parameters.txt.

Signed-off-by: Baoquan He 

 Acked-by: John Donnelly 
 Tested-by:  John Donnelly 


---
  Documentation/admin-guide/kernel-parameters.txt | 3 ++-
  kernel/dma/pool.c   | 7 +--
  2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index ec4d25e854a8..d7015309614b 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -664,7 +664,8 @@
  
  	coherent_pool=nn[KMG]	[ARM,KNL]

Sets the size of memory pool for coherent, atomic dma
-   allocations. Otherwise the default size will be scaled
+   allocations. A value of 0 disables the three atomic
+   memory pool. Otherwise the default size will be scaled
with memory capacity, while clamped between 128K and
1 << (PAGE_SHIFT + MAX_ORDER-1).
  
diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c

index 5f84e6cdb78e..5a85804b5beb 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -21,7 +21,7 @@ static struct gen_pool *atomic_pool_kernel __ro_after_init;
  static unsigned long pool_size_kernel;
  
  /* Size can be defined by the coherent_pool command line */

-static size_t atomic_pool_size;
+static unsigned long atomic_pool_size = -1;
  
  /* Dynamic background expansion when the atomic pool is near capacity */

  static struct work_struct atomic_pool_work;
@@ -188,11 +188,14 @@ static int __init dma_atomic_pool_init(void)
  {
int ret = 0;
  
+	if (!atomic_pool_size)

+   return 0;
+
/*
 * If coherent_pool was not used on the command line, default the pool
 * sizes to 128KB per 1GB of memory, min 128KB, max MAX_ORDER-1.
 */
-   if (!atomic_pool_size) {
+   if (atomic_pool_size == -1) {
unsigned long pages = totalram_pages() / (SZ_1G / SZ_128K);
pages = min_t(unsigned long, pages, MAX_ORDER_NR_PAGES);
atomic_pool_size = max_t(size_t, pages << PAGE_SHIFT, SZ_128K);
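
The sizing rule in this hunk can be checked outside the kernel with a small model. The following is a sketch under stated assumptions, not the kernel code: atomic_pool_bytes() and POOL_SIZE_UNSET are hypothetical names, and page_shift and max_order_nr_pages are passed in as parameters because their real values depend on the architecture and kernel configuration.

```c
#include <stdint.h>

#define SZ_128K (128ULL * 1024)
#define SZ_1G   (1ULL << 30)

/* Hypothetical sentinel for "coherent_pool= not given", like the -1 above. */
#define POOL_SIZE_UNSET UINT64_MAX

/*
 * requested: value from coherent_pool=, 0 to disable, or POOL_SIZE_UNSET.
 * Returns the pool size in bytes; 0 means no atomic pools are created.
 */
uint64_t atomic_pool_bytes(uint64_t requested, uint64_t totalram_pages,
                           unsigned int page_shift,
                           uint64_t max_order_nr_pages)
{
    uint64_t pages, size;

    if (requested == 0)
        return 0;                       /* coherent_pool=0: pools disabled */
    if (requested != POOL_SIZE_UNSET)
        return requested;               /* explicit size from command line */

    /* Default: 128KB per 1GB of memory, capped at one max-order block. */
    pages = totalram_pages / (SZ_1G / SZ_128K);
    if (pages > max_order_nr_pages)
        pages = max_order_nr_pages;
    size = pages << page_shift;

    return size < SZ_128K ? SZ_128K : size; /* never below 128KB */
}
```

For example, assuming 4K pages (page_shift 12) and a cap of 1024 pages, 4G of RAM gives a 512K default, while very large systems are capped at 4M.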





Re: [PATCH v3 1/5] docs: kernel-parameters: Update to reflect the current default size of atomic pool

2021-12-13 Thread john . p . donnelly

On 12/13/21 6:27 AM, Baoquan He wrote:

Since commit 1d659236fb43 ("dma-pool: scale the default DMA coherent pool
size with memory capacity"), the default size of the atomic pool has been
scaled with system memory capacity. So update the
document in kernel-parameters.txt accordingly.

Signed-off-by: Baoquan He 

 Acked-by: John Donnelly 
 Tested-by:  John Donnelly 


---
  Documentation/admin-guide/kernel-parameters.txt | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 9725c546a0d4..ec4d25e854a8 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -664,7 +664,9 @@
  
  	coherent_pool=nn[KMG]	[ARM,KNL]

Sets the size of memory pool for coherent, atomic dma
-   allocations, by default set to 256K.
+   allocations. Otherwise the default size will be scaled
+   with memory capacity, while clamped between 128K and
+   1 << (PAGE_SHIFT + MAX_ORDER-1).
  
  	com20020=	[HW,NET] ARCnet - COM20020 chipset

Format:





Re: [PATCH v3 3/5] mm_zone: add function to check if managed dma zone exists

2021-12-13 Thread john . p . donnelly

On 12/13/21 6:27 AM, Baoquan He wrote:

Some places in the current kernel assume that the DMA zone must have
managed pages if CONFIG_ZONE_DMA is enabled. However, this is not always true.
E.g. in the kdump kernel of x86_64, only the low 1M is present and it is
locked down at a very early stage of boot, so there are no managed pages at
all in the DMA zone. This exception will always cause page allocation failure
if a page is requested from the DMA zone.

Add the function has_managed_dma() and the relevant helper functions to
check whether there is a DMA zone with managed pages. It will be used in later
patches.

Fixes: 6f599d84231f ("x86/kdump: Always reserve the low 1M when the crashkernel option is specified")
Cc: sta...@vger.kernel.org
Signed-off-by: Baoquan He 


 Acked-by: John Donnelly 
 Tested-by:  John Donnelly 


---
v2->v3:
  Rewrite has_managed_dma() in a simpler and more efficient way, as
  suggested by DavidH.

  include/linux/mmzone.h |  9 +
  mm/page_alloc.c| 15 +++
  2 files changed, 24 insertions(+)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 58e744b78c2c..6e1b726e9adf 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1046,6 +1046,15 @@ static inline int is_highmem_idx(enum zone_type idx)
  #endif
  }
  
+#ifdef CONFIG_ZONE_DMA

+bool has_managed_dma(void);
+#else
+static inline bool has_managed_dma(void)
+{
+   return false;
+}
+#endif
+
  /**
   * is_highmem - helper function to quickly check if a struct zone is a
   *  highmem zone or not.  This is an attempt to keep references
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c5952749ad40..7c7a0b5de2ff 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -9460,3 +9460,18 @@ bool take_page_off_buddy(struct page *page)
return ret;
  }
  #endif
+
+#ifdef CONFIG_ZONE_DMA
+bool has_managed_dma(void)
+{
+   struct pglist_data *pgdat;
+
+   for_each_online_pgdat(pgdat) {
+   struct zone *zone = &pgdat->node_zones[ZONE_DMA];
+
+   if (managed_zone(zone))
+   return true;
+   }
+   return false;
+}
+#endif /* CONFIG_ZONE_DMA */
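
The check added here is easy to reason about with a userspace model. The sketch below is not kernel code: struct node, struct zone, and model_has_managed_dma() are hypothetical stand-ins for pglist_data, the kernel's struct zone, and has_managed_dma(), and the zone list is reduced to three entries for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified zone list; the real set of zones depends on the kernel config. */
enum zone_type { ZONE_DMA, ZONE_DMA32, ZONE_NORMAL, MAX_NR_ZONES };

/* Stand-in for the kernel's struct zone: only the managed page count. */
struct zone {
    unsigned long managed_pages;
};

/* Stand-in for pglist_data: one set of zones per NUMA node. */
struct node {
    struct zone zones[MAX_NR_ZONES];
};

/* True if any node has a DMA zone with pages managed by the allocator. */
bool model_has_managed_dma(const struct node *nodes, size_t nr_nodes)
{
    for (size_t i = 0; i < nr_nodes; i++) {
        if (nodes[i].zones[ZONE_DMA].managed_pages)
            return true;
    }
    return false;
}
```

In this model, the x86_64 kdump case from the commit message is a node whose ZONE_DMA has zero managed pages, so the function returns false even though the zone type is compiled in.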





Re: [PATCH v3 4/5] dma/pool: create dma atomic pool only if dma zone has managed pages

2021-12-13 Thread john . p . donnelly

On 12/13/21 6:27 AM, Baoquan He wrote:

Currently three DMA atomic pools are initialized as long as the relevant
kernel code is built in. In the kdump kernel of x86_64, however, this is not
right when trying to create atomic_pool_dma, because there are no managed
pages in the DMA zone. In that case, the DMA zone only has the low 1M of
memory present, and it is locked down by the memblock allocator, so no pages
are added to the buddy allocator for the DMA zone. Please check commit
f1d4d47c5851 ("x86/setup: Always reserve the first 1M of RAM").

As a result, the kdump kernel on x86_64 always prints the failure message below:

  DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
  swapper/0: page allocation failure: order:5, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-0.rc5.20210611git929d931f2b40.42.fc35.x86_64 #1
  Hardware name: Dell Inc. PowerEdge R910/0P658H, BIOS 2.12.0 06/04/2018
  Call Trace:
   dump_stack+0x7f/0xa1
   warn_alloc.cold+0x72/0xd6
   ? _raw_spin_unlock_irq+0x24/0x40
   ? __alloc_pages_direct_compact+0x90/0x1b0
   __alloc_pages_slowpath.constprop.0+0xf29/0xf50
   ? __cond_resched+0x16/0x50
   ? prepare_alloc_pages.constprop.0+0x19d/0x1b0
   __alloc_pages+0x24d/0x2c0
   ? __dma_atomic_pool_init+0x93/0x93
   alloc_page_interleave+0x13/0xb0
   atomic_pool_expand+0x118/0x210
   ? __dma_atomic_pool_init+0x93/0x93
   __dma_atomic_pool_init+0x45/0x93
   dma_atomic_pool_init+0xdb/0x176
   do_one_initcall+0x67/0x320
   ? rcu_read_lock_sched_held+0x3f/0x80
   kernel_init_freeable+0x290/0x2dc
   ? rest_init+0x24f/0x24f
   kernel_init+0xa/0x111
   ret_from_fork+0x22/0x30
  Mem-Info:
  ..
  DMA: failed to allocate 128 KiB GFP_KERNEL|GFP_DMA pool for atomic allocation
  DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations

Here, let's check if DMA zone has managed pages, then create atomic_pool_dma
if yes. Otherwise just skip it.
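
To make the idea above concrete, here is a minimal user-space sketch of the check and of the pool selection it enables. It is only a model under stated assumptions: `struct zone_model`, `struct node_model`, `model_has_managed_dma()`, `model_guess_pool()` and the `REQ_*` flags are hypothetical stand-ins invented for illustration, not kernel APIs. The point is that the DMA pool is only a candidate when it was actually created, which in turn depends on some node having managed pages in its DMA zone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel structures; not the real kernel types. */
struct zone_model { unsigned long managed_pages; };
struct node_model { struct zone_model dma; };

/* Models the idea of has_managed_dma(): true if any node's DMA zone
 * has pages that were handed to the page allocator. */
static bool model_has_managed_dma(const struct node_model *nodes, size_t n)
{
	assert(nodes != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (nodes[i].dma.managed_pages > 0)
			return true;
	return false;
}

enum pool_id { POOL_KERNEL, POOL_DMA32, POOL_DMA };

#define REQ_DMA   0x1u	/* hypothetical request flags for this sketch */
#define REQ_DMA32 0x2u

/* Models the first-choice pool lookup: the DMA pool is used only if it
 * exists, otherwise the request falls through to the kernel pool. */
static enum pool_id model_guess_pool(bool dma_pool_exists, bool have_dma32,
				     unsigned int req)
{
	if (have_dma32 && (req & REQ_DMA32))
		return POOL_DMA32;
	if (dma_pool_exists && (req & REQ_DMA))
		return POOL_DMA;
	return POOL_KERNEL;
}
```

In this model a kdump-like layout (one node, zero managed DMA pages) yields no DMA pool, and a `REQ_DMA` request is served from the kernel pool instead of failing at init time.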

Fixes: 6f599d84231f ("x86/kdump: Always reserve the low 1M when the crashkernel option is specified")
Cc: sta...@vger.kernel.org
Signed-off-by: Baoquan He 




 Acked-by: John Donnelly 
 Tested-by:  John Donnelly 



Cc: Christoph Hellwig 
Cc: Marek Szyprowski 
Cc: Robin Murphy 
Cc: io...@lists.linux-foundation.org
---
  kernel/dma/pool.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index 5a85804b5beb..00df3edd6c5d 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -206,7 +206,7 @@ static int __init dma_atomic_pool_init(void)
GFP_KERNEL);
if (!atomic_pool_kernel)
ret = -ENOMEM;
-   if (IS_ENABLED(CONFIG_ZONE_DMA)) {
+   if (has_managed_dma()) {
atomic_pool_dma = __dma_atomic_pool_init(atomic_pool_size,
GFP_KERNEL | GFP_DMA);
if (!atomic_pool_dma)
@@ -229,7 +229,7 @@ static inline struct gen_pool *dma_guess_pool(struct 
gen_pool *prev, gfp_t gfp)
if (prev == NULL) {
if (IS_ENABLED(CONFIG_ZONE_DMA32) && (gfp & GFP_DMA32))
return atomic_pool_dma32;
-   if (IS_ENABLED(CONFIG_ZONE_DMA) && (gfp & GFP_DMA))
+   if (atomic_pool_dma && (gfp & GFP_DMA))
return atomic_pool_dma;
return atomic_pool_kernel;
}





Re: [PATCH v3 5/5] mm/slub: do not create dma-kmalloc if no managed pages in DMA zone

2021-12-13 Thread john . p . donnelly

On 12/13/21 6:27 AM, Baoquan He wrote:

Dma-kmalloc will be created as long as CONFIG_ZONE_DMA is enabled.
However, it will fail if DMA zone has no managed pages. The failure
can be seen in kdump kernel of x86_64 as below:

  CPU: 0 PID: 65 Comm: kworker/u2:1 Not tainted 5.14.0-rc2+ #9
  Hardware name: Intel Corporation SandyBridge Platform/To be filled by O.E.M., BIOS RMLSDP.86I.R2.28.D690.1306271008 06/27/2013
  Workqueue: events_unbound async_run_entry_fn
  Call Trace:
   dump_stack_lvl+0x57/0x72
   warn_alloc.cold+0x72/0xd6
   __alloc_pages_slowpath.constprop.0+0xf56/0xf70
   __alloc_pages+0x23b/0x2b0
   allocate_slab+0x406/0x630
   ___slab_alloc+0x4b1/0x7e0
   ? sr_probe+0x200/0x600
   ? lock_acquire+0xc4/0x2e0
   ? fs_reclaim_acquire+0x4d/0xe0
   ? lock_is_held_type+0xa7/0x120
   ? sr_probe+0x200/0x600
   ? __slab_alloc+0x67/0x90
   __slab_alloc+0x67/0x90
   ? sr_probe+0x200/0x600
   ? sr_probe+0x200/0x600
   kmem_cache_alloc_trace+0x259/0x270
   sr_probe+0x200/0x600
   ..
   bus_probe_device+0x9f/0xb0
   device_add+0x3d2/0x970
   ..
   __scsi_add_device+0xea/0x100
   ata_scsi_scan_host+0x97/0x1d0
   async_run_entry_fn+0x30/0x130
   process_one_work+0x2b0/0x5c0
   worker_thread+0x55/0x3c0
   ? process_one_work+0x5c0/0x5c0
   kthread+0x149/0x170
   ? set_kthread_struct+0x40/0x40
   ret_from_fork+0x22/0x30
  Mem-Info:
  ..

The failure above happens when kmalloc() is called to allocate a buffer with
GFP_DMA. It requests a slab page from the DMA zone, which has no managed
pages:
  sr_probe()
  --> get_capabilities()
  --> buffer = kmalloc(512, GFP_KERNEL | GFP_DMA);

The DMA zone should be checked if it has managed pages, then try to create
dma-kmalloc.
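
The effect on GFP_DMA callers can be sketched with a small user-space model. Everything here is hypothetical and only illustrates the aliasing idea from the diff below (`cache_table`, `model_create_caches()`, `model_lookup()` and `NR_SIZES` are made-up names, not kernel symbols): when the DMA zone has no managed pages, each DMA slot simply points at the normal cache of the same size, so a DMA request is served from normal memory instead of failing.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_SIZES 4	/* hypothetical; stands in for the kernel's size classes */

enum cache_type { CACHE_NORMAL, CACHE_DMA, NR_TYPES };

struct cache_model { int id; bool from_dma_zone; };

static struct cache_model normal_caches[NR_SIZES];
static struct cache_model dma_caches[NR_SIZES];
static struct cache_model *cache_table[NR_TYPES][NR_SIZES];

/* Build the table; DMA slots alias the normal caches when the DMA zone
 * has no managed pages, mirroring the "continue" branch in the patch. */
static void model_create_caches(bool managed_dma)
{
	for (int i = 0; i < NR_SIZES; i++) {
		normal_caches[i] = (struct cache_model){ .id = i, .from_dma_zone = false };
		cache_table[CACHE_NORMAL][i] = &normal_caches[i];

		if (!managed_dma) {
			cache_table[CACHE_DMA][i] = cache_table[CACHE_NORMAL][i];
			continue;
		}
		dma_caches[i] = (struct cache_model){ .id = i, .from_dma_zone = true };
		cache_table[CACHE_DMA][i] = &dma_caches[i];
	}
}

/* Look up the cache that would serve a request of the given type and size class. */
static const struct cache_model *model_lookup(enum cache_type type, int size_index)
{
	assert(size_index >= 0 && size_index < NR_SIZES);
	return cache_table[type][size_index];
}
```

The trade-off in this model is that a buffer requested with the DMA type may not come from DMA-capable memory when the alias is in place; callers that truly need low memory would still have to cope with that.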

Fixes: 6f599d84231f ("x86/kdump: Always reserve the low 1M when the crashkernel option is specified")
Cc: sta...@vger.kernel.org
Signed-off-by: Baoquan He 


 Acked-by: John Donnelly 
 Tested-by:  John Donnelly 


Cc: Christoph Lameter 
Cc: Pekka Enberg 
Cc: David Rientjes 
Cc: Joonsoo Kim 
Cc: Vlastimil Babka 
---
  mm/slab_common.c | 9 +
  1 file changed, 9 insertions(+)

diff --git a/mm/slab_common.c b/mm/slab_common.c
index e5d080a93009..ae4ef0f8903a 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -878,6 +878,9 @@ void __init create_kmalloc_caches(slab_flags_t flags)
  {
int i;
enum kmalloc_cache_type type;
+#ifdef CONFIG_ZONE_DMA
+   bool managed_dma;
+#endif
  
  	/*

 * Including KMALLOC_CGROUP if CONFIG_MEMCG_KMEM defined
@@ -905,10 +908,16 @@ void __init create_kmalloc_caches(slab_flags_t flags)
slab_state = UP;
  
  #ifdef CONFIG_ZONE_DMA
+	managed_dma = has_managed_dma();
+
  	for (i = 0; i <= KMALLOC_SHIFT_HIGH; i++) {
  		struct kmem_cache *s = kmalloc_caches[KMALLOC_NORMAL][i];
  
  		if (s) {
+			if (!managed_dma) {
+				kmalloc_caches[KMALLOC_DMA][i] = kmalloc_caches[KMALLOC_NORMAL][i];
+				continue;
+			}
  			kmalloc_caches[KMALLOC_DMA][i] = create_kmalloc_cache(
  				kmalloc_info[i].name[KMALLOC_DMA],
  				kmalloc_info[i].size,





Re: [PATCH v17 01/10] x86: kdump: replace the hard-coded alignment with macro CRASH_ALIGN

2021-12-13 Thread john . p . donnelly

On 12/10/21 12:55 AM, Zhen Lei wrote:

From: Chen Zhou 

Move CRASH_ALIGN to header asm/kexec.h for later use.

Suggested-by: Dave Young 
Suggested-by: Baoquan He 
Signed-off-by: Chen Zhou 
Signed-off-by: Zhen Lei 
Tested-by: John Donnelly 
Tested-by: Dave Kleikamp 

 Acked-by: John Donnelly 



---
  arch/x86/include/asm/kexec.h | 3 +++
  arch/x86/kernel/setup.c  | 3 ---
  2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 11b7c06e2828c30..3a22e65262aa70b 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -18,6 +18,9 @@
  
  # define KEXEC_CONTROL_CODE_MAX_SIZE	2048
  
+/* 16M alignment for crash kernel regions */
+#define CRASH_ALIGN		SZ_16M
+
  #ifndef __ASSEMBLY__
  
  #include 

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 6a190c7f4d71b05..5cc60996eac56d6 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -392,9 +392,6 @@ static void __init 
memblock_x86_reserve_range_setup_data(void)
  
  #ifdef CONFIG_KEXEC_CORE
  
-/* 16M alignment for crash kernel regions */
-#define CRASH_ALIGN		SZ_16M
-
  /*
   * Keep the crash kernel below this limit.
   *





Re: [PATCH v17 03/10] x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions reserve_crashkernel()

2021-12-13 Thread john . p . donnelly

On 12/10/21 12:55 AM, Zhen Lei wrote:

From: Chen Zhou 

To make reserve_crashkernel() generic,
replace some hard-coded numbers with the macro CRASH_ADDR_LOW_MAX.

Signed-off-by: Chen Zhou 
Signed-off-by: Zhen Lei 
Tested-by: John Donnelly 
Tested-by: Dave Kleikamp 
Acked-by: Baoquan He 


 Acked-by: John Donnelly 


---
  arch/x86/kernel/setup.c | 7 ---
  1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 6424ee4f23da2cf..bb2a0973b98059e 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -489,8 +489,9 @@ static void __init reserve_crashkernel(void)
if (!crash_base) {
/*
 * Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
-* crashkernel=x,high reserves memory over 4G, also allocates
-* 256M extra low memory for DMA buffers and swiotlb.
+* crashkernel=x,high reserves memory over CRASH_ADDR_LOW_MAX,
+* also allocates 256M extra low memory for DMA buffers
+* and swiotlb.
 * But the extra memory is not required for all machines.
 * So try low memory first and fall back to high memory
 * unless "crashkernel=size[KMG],high" is specified.
@@ -518,7 +519,7 @@ static void __init reserve_crashkernel(void)
}
}
  
-	if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) {

+   if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) {
memblock_phys_free(crash_base, crash_size);
return;
}





Re: [PATCH v17 02/10] x86: kdump: make the lower bound of crash kernel reservation consistent

2021-12-13 Thread john . p . donnelly

On 12/10/21 12:55 AM, Zhen Lei wrote:

From: Chen Zhou 

The lower bounds of the crash kernel reservation and the crash kernel low
reservation differ. Use the consistent value CRASH_ALIGN for both.

Suggested-by: Dave Young 
Signed-off-by: Chen Zhou 
Signed-off-by: Zhen Lei 
Tested-by: John Donnelly 
Tested-by: Dave Kleikamp 


 Acked-by: John Donnelly 


---
  arch/x86/kernel/setup.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 5cc60996eac56d6..6424ee4f23da2cf 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -441,7 +441,8 @@ static int __init reserve_crashkernel_low(void)
return 0;
}
  
-	low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, CRASH_ADDR_LOW_MAX);

+   low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, CRASH_ALIGN,
+   CRASH_ADDR_LOW_MAX);
if (!low_base) {
pr_err("Cannot reserve %ldMB crashkernel low memory, please try 
smaller size.\n",
   (unsigned long)(low_size >> 20));





Re: [PATCH v17 05/10] x86: kdump: move reserve_crashkernel[_low]() into crash_core.c

2021-12-13 Thread john . p . donnelly

On 12/10/21 12:55 AM, Zhen Lei wrote:

From: Chen Zhou 

Make reserve_crashkernel[_low]() generic. Since the
reserve_crashkernel[_low]() implementations are quite similar on other
architectures as well, more users can share them later.

To that end, add CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL to arch/Kconfig and
select it on X86.

Signed-off-by: Chen Zhou 
Signed-off-by: Zhen Lei 
Tested-by: John Donnelly 
Tested-by: Dave Kleikamp 


 Acked-by: John Donnelly 

---
  arch/Kconfig |   3 +
  arch/x86/Kconfig |   2 +
  arch/x86/include/asm/elf.h   |   3 +
  arch/x86/include/asm/kexec.h |  28 ++-
  arch/x86/kernel/setup.c  | 143 +---
  include/linux/crash_core.h   |   3 +
  include/linux/kexec.h|   2 -
  kernel/crash_core.c  | 156 +++
  kernel/kexec_core.c  |  17 
  9 files changed, 194 insertions(+), 163 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index d3c4ab249e9c275..7bdb32c41985dc5 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -24,6 +24,9 @@ config KEXEC_ELF
  config HAVE_IMA_KEXEC
bool
  
+config ARCH_WANT_RESERVE_CRASH_KERNEL

+   bool
+
  config SET_FS
bool
  
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig

index 5c2ccb85f2efb86..bd78ed8193079b9 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -12,6 +12,7 @@ config X86_32
depends on !64BIT
# Options that are inherently 32-bit kernel only:
select ARCH_WANT_IPC_PARSE_VERSION
+   select ARCH_WANT_RESERVE_CRASH_KERNEL if KEXEC_CORE
select CLKSRC_I8253
select CLONE_BACKWARDS
select GENERIC_VDSO_32
@@ -28,6 +29,7 @@ config X86_64
select ARCH_HAS_GIGANTIC_PAGE
select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
select ARCH_USE_CMPXCHG_LOCKREF
+   select ARCH_WANT_RESERVE_CRASH_KERNEL if KEXEC_CORE
select HAVE_ARCH_SOFT_DIRTY
select MODULES_USE_ELF_RELA
select NEED_DMA_MAP_STATE
diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index 29fea180a6658e8..7a6c36cff8331f5 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -94,6 +94,9 @@ extern unsigned int vdso32_enabled;
  
  #define elf_check_arch(x)	elf_check_arch_ia32(x)
  
+/* We can also handle crash dumps from 64 bit kernel. */

+# define vmcore_elf_check_arch_cross(x) ((x)->e_machine == EM_X86_64)
+
  /* SVR4/i386 ABI (pages 3-31, 3-32) says that when the program starts %edx
 contains a pointer to a function which might be registered using `atexit'.
 This provides a mean for the dynamic linker to call DT_FINI functions for
diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 3a22e65262aa70b..3ff38a1353a2b86 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -21,6 +21,27 @@
  /* 16M alignment for crash kernel regions */
  #define CRASH_ALIGN   SZ_16M
  
+/*

+ * Keep the crash kernel below this limit.
+ *
+ * Earlier 32-bits kernels would limit the kernel to the low 512 MB range
+ * due to mapping restrictions.
+ *
+ * 64-bit kdump kernels need to be restricted to be under 64 TB, which is
+ * the upper limit of system RAM in 4-level paging mode. Since the kdump
+ * jump could be from 5-level paging to 4-level paging, the jump will fail if
+ * the kernel is put above 64 TB, and during the 1st kernel bootup there's
+ * no good way to detect the paging mode of the target kernel which will be
+ * loaded for dumping.
+ */
+#ifdef CONFIG_X86_32
+# define CRASH_ADDR_LOW_MAX	SZ_512M
+# define CRASH_ADDR_HIGH_MAX	SZ_512M
+#else
+# define CRASH_ADDR_LOW_MAX	SZ_4G
+# define CRASH_ADDR_HIGH_MAX	SZ_64T
+#endif
+
  #ifndef __ASSEMBLY__
  
  #include 

@@ -51,9 +72,6 @@ struct kimage;
  
  /* The native architecture */

  # define KEXEC_ARCH KEXEC_ARCH_386
-
-/* We can also handle crash dumps from 64 bit kernel. */
-# define vmcore_elf_check_arch_cross(x) ((x)->e_machine == EM_X86_64)
  #else
  /* Maximum physical address we can use pages from */
  # define KEXEC_SOURCE_MEMORY_LIMIT  (MAXMEM-1)
@@ -195,6 +213,10 @@ typedef void crash_vmclear_fn(void);
  extern crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss;
  extern void kdump_nmi_shootdown_cpus(void);
  
+#ifdef CONFIG_KEXEC_CORE

+extern void __init reserve_crashkernel(void);
+#endif
+
  #endif /* __ASSEMBLY__ */
  
  #endif /* _ASM_X86_KEXEC_H */

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 7ae00716a208f82..5519baa7f4b964e 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -39,6 +39,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -386,147 +387,7 @@ static void __init 
memblock_x86_reserve_range_setup_data(void)
}
  }
  
-/*

- * - Crashkernel reservation --
- */
-
-#ifdef CONFIG_KEXEC_CORE
-
-/*
- * Keep the crash kernel below this limit.
- *
- 

Re: [PATCH v17 04/10] x86: kdump: move xen_pv_domain() check and insert_resource() to setup_arch()

2021-12-13 Thread john . p . donnelly

On 12/10/21 12:55 AM, Zhen Lei wrote:

From: Chen Zhou 

reserve_crashkernel() is about to become generic, but the
xen_pv_domain() check in reserve_crashkernel() is relevant only to
x86, as is insert_resource() in reserve_crashkernel[_low]().
So move the xen_pv_domain() check and insert_resource() to setup_arch()
to keep them in x86.

Suggested-by: Mike Rapoport 
Signed-off-by: Chen Zhou 
Signed-off-by: Zhen Lei 
Tested-by: John Donnelly 
Tested-by: Dave Kleikamp 
Acked-by: Baoquan He 


 Acked-by: John Donnelly 


---
  arch/x86/kernel/setup.c | 19 +++
  1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index bb2a0973b98059e..7ae00716a208f82 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -456,7 +456,6 @@ static int __init reserve_crashkernel_low(void)
  
  	crashk_low_res.start = low_base;

crashk_low_res.end   = low_base + low_size - 1;
-   insert_resource(&iomem_resource, &crashk_low_res);
  #endif
return 0;
  }
@@ -480,11 +479,6 @@ static void __init reserve_crashkernel(void)
high = true;
}
  
-	if (xen_pv_domain()) {

-   pr_info("Ignoring crashkernel for a Xen PV domain\n");
-   return;
-   }
-
/* 0 means: find the address automatically */
if (!crash_base) {
/*
@@ -531,7 +525,6 @@ static void __init reserve_crashkernel(void)
  
  	crashk_res.start = crash_base;

crashk_res.end   = crash_base + crash_size - 1;
-   insert_resource(&iomem_resource, &crashk_res);
  }
  #else
  static void __init reserve_crashkernel(void)
@@ -1143,7 +1136,17 @@ void __init setup_arch(char **cmdline_p)
 * Reserve memory for crash kernel after SRAT is parsed so that it
 * won't consume hotpluggable memory.
 */
-   reserve_crashkernel();
+   if (xen_pv_domain())
+   pr_info("Ignoring crashkernel for a Xen PV domain\n");
+   else {
+   reserve_crashkernel();
+#ifdef CONFIG_KEXEC_CORE
+   if (crashk_res.end > crashk_res.start)
+   insert_resource(&iomem_resource, &crashk_res);
+   if (crashk_low_res.end > crashk_low_res.start)
+   insert_resource(&iomem_resource, &crashk_low_res);
+#endif
+   }
  
  	memblock_find_dma_reserve();
  





Re: [PATCH v17 07/10] arm64: kdump: reimplement crashkernel=X

2021-12-13 Thread john . p . donnelly

On 12/10/21 12:55 AM, Zhen Lei wrote:

From: Chen Zhou 

arm64 kdump has the following issues:
1. crashkernel=X reserves the crash kernel below 4G, which
fails when there is not enough low memory.
2. If the crash kernel is reserved above 4G, the crash dump
kernel fails to boot because no low memory is available
for allocation.

To solve these issues, change the behavior of crashkernel=X and
introduce crashkernel=X,[high,low]. crashkernel=X tries a low allocation
in the DMA zone and falls back to a high allocation if that fails.
"crashkernel=X,high" selects a region above the DMA zone and also
tries to allocate at least 256M in the DMA zone automatically.
"crashkernel=Y,low" can be used to allocate specified size low memory.

One more minor change: two regions may now be reserved for the crash
dump kernel. To distinguish the low region from the high region without
affecting existing kexec-tools, the low region is named
"Crash kernel (low)".

Signed-off-by: Chen Zhou 
Signed-off-by: Zhen Lei 
Tested-by: John Donnelly 
Tested-by: Dave Kleikamp 


 Acked-by: John Donnelly 
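
For readers following along, the reservation policy described in the commit message (try low first, fall back to high, and pair a high region with some low memory) can be sketched as a small user-space model. This is an illustration only, under stated assumptions: `struct mem_model`, `struct request`, `struct result`, `take()` and `model_reserve()` are hypothetical names, memory is modelled as two byte counters rather than real ranges, and the 256M default is taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEFAULT_LOW_SIZE (256ULL << 20)	/* 256M default from the commit message */

/* Hypothetical request: size, ",high" flag, and an optional ",low" size. */
struct request { uint64_t size; bool high; bool low_given; uint64_t low_size; };
struct result  { bool ok; bool main_in_high; uint64_t low_reserved; };
/* Memory is modelled as two free-byte counters, not as address ranges. */
struct mem_model { uint64_t free_low; uint64_t free_high; };

static bool take(uint64_t *pool, uint64_t size)
{
	if (size > *pool)
		return false;
	*pool -= size;
	return true;
}

static struct result model_reserve(struct mem_model *m, const struct request *r)
{
	struct result res = { false, false, 0 };
	uint64_t low;

	assert(r->size > 0);

	/* Plain crashkernel=X: try low memory first. */
	if (!r->high && take(&m->free_low, r->size)) {
		res.ok = true;
		return res;
	}

	/* ",high" was requested, or the low attempt failed: go high. */
	if (!take(&m->free_high, r->size))
		return res;

	/* A high region still needs some low memory for DMA. */
	low = r->low_given ? r->low_size : DEFAULT_LOW_SIZE;
	if (low && !take(&m->free_low, low)) {
		m->free_high += r->size;	/* undo the high reservation */
		return res;
	}

	res.ok = true;
	res.main_in_high = true;
	res.low_reserved = low;
	return res;
}
```

In this model, a 512M request against 300M of free low memory lands in high memory and additionally reserves the 256M default in low memory; passing an explicit low size of 0 skips the low reservation.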


---
  arch/arm64/Kconfig |  1 +
  arch/arm64/include/asm/kexec.h |  4 ++
  arch/arm64/kernel/machine_kexec_file.c | 12 +-
  arch/arm64/kernel/setup.c  | 13 +-
  arch/arm64/mm/init.c   | 59 +-
  5 files changed, 38 insertions(+), 51 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c4207cf9bb17ffb..4b99efa36da3793 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -95,6 +95,7 @@ config ARM64
select ARCH_WANT_FRAME_POINTERS
select ARCH_WANT_HUGE_PMD_SHARE if ARM64_4K_PAGES || (ARM64_16K_PAGES 
&& !ARM64_VA_BITS_36)
select ARCH_WANT_LD_ORPHAN_WARN
+   select ARCH_WANT_RESERVE_CRASH_KERNEL if KEXEC_CORE
select ARCH_WANTS_NO_INSTR
select ARCH_HAS_UBSAN_SANITIZE_ALL
select ARM_AMBA
diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index 1b9edc69f0244ca..3bde0079925d771 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -96,6 +96,10 @@ static inline void crash_prepare_suspend(void) {}
  static inline void crash_post_resume(void) {}
  #endif
  
+#ifdef CONFIG_KEXEC_CORE

+extern void __init reserve_crashkernel(void);
+#endif
+
  #if defined(CONFIG_KEXEC_CORE)
  void cpu_soft_restart(unsigned long el2_switch, unsigned long entry,
  unsigned long arg0, unsigned long arg1,
diff --git a/arch/arm64/kernel/machine_kexec_file.c 
b/arch/arm64/kernel/machine_kexec_file.c
index 63634b4d72c158f..6f3fa059ca4e816 100644
--- a/arch/arm64/kernel/machine_kexec_file.c
+++ b/arch/arm64/kernel/machine_kexec_file.c
@@ -65,10 +65,18 @@ static int prepare_elf_headers(void **addr, unsigned long 
*sz)
  
  	/* Exclude crashkernel region */

ret = crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end);
+   if (ret)
+   goto out;
+
+   if (crashk_low_res.end) {
+   ret = crash_exclude_mem_range(cmem, crashk_low_res.start, 
crashk_low_res.end);
+   if (ret)
+   goto out;
+   }
  
-	if (!ret)

-   ret =  crash_prepare_elf64_headers(cmem, true, addr, sz);
+   ret = crash_prepare_elf64_headers(cmem, true, addr, sz);
  
+out:

kfree(cmem);
return ret;
  }
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index be5f85b0a24de69..4bb2e55366be64d 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -248,7 +248,18 @@ static void __init request_standard_resources(void)
kernel_data.end <= res->end)
request_resource(res, &kernel_data);
  #ifdef CONFIG_KEXEC_CORE
-   /* Userspace will find "Crash kernel" region in /proc/iomem. */
+   /*
+* Userspace will find "Crash kernel" or "Crash kernel (low)"
+* region in /proc/iomem.
+* In order to distinct from the high region and make no effect
+* to the use of existing kexec-tools, rename the low region as
+* "Crash kernel (low)".
+*/
+   if (crashk_low_res.end && crashk_low_res.start >= res->start &&
+   crashk_low_res.end <= res->end) {
+   crashk_low_res.name = "Crash kernel (low)";
+   request_resource(res, &crashk_low_res);
+   }
if (crashk_res.end && crashk_res.start >= res->start &&
crashk_res.end <= res->end)
request_resource(res, &crashk_res);
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index be4595dc7459115..85c83e4eff2b6c4 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -36,6 +36,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #includ

Re: [PATCH v17 06/10] arm64: kdump: introduce some macros for crash kernel reservation

2021-12-13 Thread john . p . donnelly

On 12/10/21 12:55 AM, Zhen Lei wrote:

From: Chen Zhou 

Introduce the macros CRASH_ALIGN for alignment, CRASH_ADDR_LOW_MAX
for the upper bound of low crash memory, and CRASH_ADDR_HIGH_MAX for
the upper bound of high crash memory, and use them in place of the
hard-coded values.

Also, for consistency with x86, use CRASH_ALIGN as the lower bound
of the crash kernel reservation.

Signed-off-by: Chen Zhou 
Signed-off-by: Zhen Lei 
Tested-by: John Donnelly 
Tested-by: Dave Kleikamp 


 Acked-by: John Donnelly 


---
  arch/arm64/include/asm/kexec.h | 6 ++
  arch/arm64/mm/init.c   | 4 ++--
  2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index 9839bfc163d7147..1b9edc69f0244ca 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -25,6 +25,12 @@
  
  #define KEXEC_ARCH KEXEC_ARCH_AARCH64
  
+/* 2M alignment for crash kernel regions */
+#define CRASH_ALIGN		SZ_2M
+
+#define CRASH_ADDR_LOW_MAX	arm64_dma_phys_limit
+#define CRASH_ADDR_HIGH_MAX	MEMBLOCK_ALLOC_ACCESSIBLE
+
  #ifndef __ASSEMBLY__
  
  /**

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index a8834434af99ae0..be4595dc7459115 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -75,7 +75,7 @@ phys_addr_t arm64_dma_phys_limit __ro_after_init;
  static void __init reserve_crashkernel(void)
  {
unsigned long long crash_base, crash_size;
-   unsigned long long crash_max = arm64_dma_phys_limit;
+   unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
int ret;
  
  	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),

@@ -91,7 +91,7 @@ static void __init reserve_crashkernel(void)
crash_max = crash_base + crash_size;
  
  	/* Current arm64 boot protocol requires 2MB alignment */

-   crash_base = memblock_phys_alloc_range(crash_size, SZ_2M,
+   crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
   crash_base, crash_max);
if (!crash_base) {
pr_warn("cannot allocate crashkernel (size:0x%llx)\n",





Re: [PATCH v17 09/10] of: fdt: Add memory for devices by DT property "linux,usable-memory-range"

2021-12-13 Thread john . p . donnelly

On 12/10/21 12:55 AM, Zhen Lei wrote:

From: Chen Zhou 

When reserving crashkernel in high memory, some low memory is reserved
for crash dump kernel devices and never mapped by the first kernel.
This memory range is advertised to crash dump kernel via DT property
under /chosen,
 linux,usable-memory-range = <BASE1 SIZE1 [BASE2 SIZE2]>

We reused the DT property linux,usable-memory-range and made the low
memory region as the second range "BASE2 SIZE2", which keeps compatibility
with existing user-space and older kdump kernels.

The crash dump kernel reads this property at boot time and calls memblock_add()
to add the low memory region after memblock_cap_memory_range() has been
called.
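
The decoding step can be sketched in plain user-space C. This is a model under stated assumptions, not the kernel code: `be32_at()`, `next_cells()` and `parse_usable_ranges()` are hypothetical helpers, the property is treated as a raw byte buffer of big-endian 32-bit cells, and the length check is done in bytes per (base, size) entry. It shows how up to two ranges are read, with the first one used as the cap and the second, if present, as the extra low region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_USABLE_RANGES 2

struct range { uint64_t base; uint64_t size; };

/* Read one big-endian 32-bit cell. */
static uint32_t be32_at(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Combine `cells` (1 or 2) cells into one number and advance the cursor. */
static uint64_t next_cells(int cells, const uint8_t **p)
{
	uint64_t v = 0;

	assert(cells == 1 || cells == 2);
	for (int i = 0; i < cells; i++, *p += 4)
		v = (v << 32) | be32_at(*p);
	return v;
}

/* Decode up to MAX_USABLE_RANGES (base, size) pairs from a property blob.
 * Returns the number of ranges decoded, or -1 if the blob is malformed. */
static int parse_usable_ranges(const uint8_t *prop, size_t len,
			       int addr_cells, int size_cells,
			       struct range out[MAX_USABLE_RANGES])
{
	const uint8_t *end;
	size_t entry;
	int n = 0;

	assert(addr_cells >= 1 && addr_cells <= 2);
	assert(size_cells >= 1 && size_cells <= 2);
	entry = 4u * (size_t)(addr_cells + size_cells);

	if (!prop || len == 0 || len % entry)
		return -1;

	end = prop + len;
	while (n < MAX_USABLE_RANGES && prop < end) {
		out[n].base = next_cells(addr_cells, &prop);
		out[n].size = next_cells(size_cells, &prop);
		n++;
	}
	return n;
}
```

A caller in this model would treat `out[0]` as the range to cap memory to and add `out[1]` afterwards when the return value is 2, which matches the ordering described in the commit message.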

Signed-off-by: Chen Zhou 
Signed-off-by: Zhen Lei 
Tested-by: Dave Kleikamp 


 Acked-by: John Donnelly 


---
  drivers/of/fdt.c | 33 +++--
  1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 37b477a51175359..f7b72fa773250ad 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -967,6 +967,15 @@ static void __init 
early_init_dt_check_for_elfcorehdr(unsigned long node)
  
  static unsigned long chosen_node_offset = -FDT_ERR_NOTFOUND;
  
+/*

+ * The main usage of linux,usable-memory-range is for crash dump kernel.
+ * Originally, the number of usable-memory regions is one. Now there may
+ * be two regions, low region and high region.
+ * To make compatibility with existing user-space and older kdump, the low
+ * region is always the last range of linux,usable-memory-range if exist.
+ */
+#define MAX_USABLE_RANGES  2
+
  /**
   * early_init_dt_check_for_usable_mem_range - Decode usable memory range
   * location from flat tree
@@ -974,10 +983,9 @@ static unsigned long chosen_node_offset = 
-FDT_ERR_NOTFOUND;
   */
  static void __init early_init_dt_check_for_usable_mem_range(unsigned long 
node)
  {
-   const __be32 *prop;
-   int len;
-   phys_addr_t cap_mem_addr;
-   phys_addr_t cap_mem_size;
+   struct memblock_region rgn[MAX_USABLE_RANGES] = {0};
+   const __be32 *prop, *endp;
+   int len, i;
  
  	if ((long)node < 0)

return;
@@ -985,16 +993,21 @@ static void __init 
early_init_dt_check_for_usable_mem_range(unsigned long node)
pr_debug("Looking for usable-memory-range property... ");
  
  	prop = of_get_flat_dt_prop(node, "linux,usable-memory-range", &len);

-   if (!prop || (len < (dt_root_addr_cells + dt_root_size_cells)))
+   if (!prop || (len % (dt_root_addr_cells + dt_root_size_cells)))
return;
  
-	cap_mem_addr = dt_mem_next_cell(dt_root_addr_cells, &prop);

-   cap_mem_size = dt_mem_next_cell(dt_root_size_cells, &prop);
+   endp = prop + (len / sizeof(__be32));
+   for (i = 0; i < MAX_USABLE_RANGES && prop < endp; i++) {
+   rgn[i].base = dt_mem_next_cell(dt_root_addr_cells, &prop);
+   rgn[i].size = dt_mem_next_cell(dt_root_size_cells, &prop);
  
-	pr_debug("cap_mem_start=%pa cap_mem_size=%pa\n", &cap_mem_addr,

-&cap_mem_size);
+   pr_debug("cap_mem_regions[%d]: base=%pa, size=%pa\n",
+i, &rgn[i].base, &rgn[i].size);
+   }
  
-	memblock_cap_memory_range(cap_mem_addr, cap_mem_size);

+   memblock_cap_memory_range(rgn[0].base, rgn[0].size);
+   for (i = 1; i < MAX_USABLE_RANGES && rgn[i].size; i++)
+   memblock_add(rgn[i].base, rgn[i].size);
  }
  
  #ifdef CONFIG_SERIAL_EARLYCON





Re: [PATCH v17 08/10] of: fdt: Aggregate the processing of "linux,usable-memory-range"

2021-12-13 Thread john . p . donnelly

On 12/10/21 12:55 AM, Zhen Lei wrote:

Currently, we parse the "linux,usable-memory-range" property in
early_init_dt_scan_chosen(), to obtain the specified memory range of the
crash kernel. We then reserve the required memory after
early_init_dt_scan_memory() has identified all available physical memory.
Because the two pieces of code are far apart, readability and
maintainability suffer. So bring them together.

Suggested-by: Rob Herring 
Signed-off-by: Zhen Lei 
Tested-by: Dave Kleikamp 


 Acked-by: John Donnelly 


---
  drivers/of/fdt.c | 15 +++
  1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index bdca35284cebd56..37b477a51175359 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -965,8 +965,7 @@ static void __init 
early_init_dt_check_for_elfcorehdr(unsigned long node)
 elfcorehdr_addr, elfcorehdr_size);
  }
  
-static phys_addr_t cap_mem_addr;

-static phys_addr_t cap_mem_size;
+static unsigned long chosen_node_offset = -FDT_ERR_NOTFOUND;
  
  /**

   * early_init_dt_check_for_usable_mem_range - Decode usable memory range
@@ -977,6 +976,11 @@ static void __init 
early_init_dt_check_for_usable_mem_range(unsigned long node)
  {
const __be32 *prop;
int len;
+   phys_addr_t cap_mem_addr;
+   phys_addr_t cap_mem_size;
+
+   if ((long)node < 0)
+   return;
  
  	pr_debug("Looking for usable-memory-range property... ");
  
@@ -989,6 +993,8 @@ static void __init early_init_dt_check_for_usable_mem_range(unsigned long node)
  
  	pr_debug("cap_mem_start=%pa cap_mem_size=%pa\n", &cap_mem_addr,

 &cap_mem_size);
+
+   memblock_cap_memory_range(cap_mem_addr, cap_mem_size);
  }
  
  #ifdef CONFIG_SERIAL_EARLYCON

@@ -1137,9 +1143,10 @@ int __init early_init_dt_scan_chosen(unsigned long node, 
const char *uname,
(strcmp(uname, "chosen") != 0 && strcmp(uname, "chosen@0") != 0))
return 0;
  
+	chosen_node_offset = node;

+
early_init_dt_check_for_initrd(node);
early_init_dt_check_for_elfcorehdr(node);
-   early_init_dt_check_for_usable_mem_range(node);
  
  	/* Retrieve command line */

p = of_get_flat_dt_prop(node, "bootargs", &l);
@@ -1275,7 +1282,7 @@ void __init early_init_dt_scan_nodes(void)
of_scan_flat_dt(early_init_dt_scan_memory, NULL);
  
  	/* Handle linux,usable-memory-range property */

-   memblock_cap_memory_range(cap_mem_addr, cap_mem_size);
+   early_init_dt_check_for_usable_mem_range(chosen_node_offset);
  }
  
  bool __init early_init_dt_scan(void *params)





Re: [PATCH v17 10/10] kdump: update Documentation about crashkernel

2021-12-13 Thread john . p . donnelly

On 12/10/21 12:55 AM, Zhen Lei wrote:

From: Chen Zhou 

On arm64, the behavior of crashkernel=X has changed: it now
tries a low allocation in the DMA zone and falls back to a high allocation
if that fails.

"crashkernel=X,high" selects a high region above the
DMA zone and also tries to allocate at least 256M of low memory in the
DMA zone automatically. "crashkernel=Y,low" can be used to allocate
a specified amount of low memory.

So update the Documentation.
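
As a rough illustration of the three spellings discussed here, the following user-space sketch classifies a crashkernel value as plain, ",high" or ",low" and extracts its size. It is not the kernel's parser: `parse_crashkernel_value()`, `struct ck_arg` and `enum ck_kind` are hypothetical names, and the sketch deliberately ignores the '@offset' and range-list forms (it reports them as invalid).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum ck_kind { CK_PLAIN, CK_HIGH, CK_LOW, CK_INVALID };

struct ck_arg { enum ck_kind kind; uint64_t size; };

/* Parse "size[KMG]", "size[KMG],high" or "size[KMG],low".
 * Anything else, including '@offset' forms, is reported as CK_INVALID. */
static struct ck_arg parse_crashkernel_value(const char *val)
{
	struct ck_arg out = { CK_INVALID, 0 };
	unsigned long long n;
	unsigned int shift = 0;
	char *end;

	assert(val != NULL);

	n = strtoull(val, &end, 0);
	if (end == val)
		return out;	/* no digits at all */

	switch (*end) {
	case 'K': case 'k': shift = 10; break;
	case 'M': case 'm': shift = 20; break;
	case 'G': case 'g': shift = 30; break;
	default: break;
	}
	if (shift) {
		n <<= shift;
		end++;
	}

	if (*end == '\0')
		out.kind = CK_PLAIN;
	else if (strcmp(end, ",high") == 0)
		out.kind = CK_HIGH;
	else if (strcmp(end, ",low") == 0)
		out.kind = CK_LOW;
	else
		return out;

	out.size = n;
	return out;
}
```

For example, "2G,high" is classified as a high request of 2 GiB in this sketch, while "128M,low" is a low request of 128 MiB.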

Signed-off-by: Chen Zhou 
Signed-off-by: Zhen Lei 


 Acked-by: John Donnelly 


---
  Documentation/admin-guide/kdump/kdump.rst   | 11 +--
  Documentation/admin-guide/kernel-parameters.txt | 11 +--
  2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/kdump/kdump.rst 
b/Documentation/admin-guide/kdump/kdump.rst
index cb30ca3df27c9b2..d4c287044be0c70 100644
--- a/Documentation/admin-guide/kdump/kdump.rst
+++ b/Documentation/admin-guide/kdump/kdump.rst
@@ -361,8 +361,15 @@ Boot into System Kernel
 kernel will automatically locate the crash kernel image within the
 first 512MB of RAM if X is not given.
  
-   On arm64, use "crashkernel=Y[@X]".  Note that the start address of
-   the kernel, X if explicitly specified, must be aligned to 2MiB (0x200000).
+   On arm64, use "crashkernel=X" to try low allocation in DMA zone and
+   fall back to high allocation if it fails.
+   We can also use "crashkernel=X,high" to select a high region above
+   DMA zone, which also tries to allocate at least 256M low memory in
+   DMA zone automatically.
+   "crashkernel=Y,low" can be used to allocate specified size low memory.
+   Use "crashkernel=Y@X" if you really have to reserve memory from
+   specified start address X. Note that the start address of the kernel,
+   X if explicitly specified, must be aligned to 2MiB (0x200000).
  
  Load the Dump-capture Kernel

  
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 9725c546a0d46db..91f3a8dc537d404 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -783,6 +783,9 @@
[KNL, X86-64] Select a region under 4G first, and
fall back to reserve region above 4G when '@offset'
hasn't been specified.
+   [KNL, ARM64] Try low allocation in DMA zone and fall back
+   to high allocation if it fails when '@offset' hasn't been
+   specified.
See Documentation/admin-guide/kdump/kdump.rst for further details.
  
  	crashkernel=range1:size1[,range2:size2,...][@offset]

@@ -799,6 +802,8 @@
Otherwise memory region will be allocated below 4G, if
available.
It will be ignored if crashkernel=X is specified.
+   [KNL, ARM64] range in high memory.
+   Allow kernel to allocate physical memory region from top.
crashkernel=size[KMG],low
[KNL, X86-64] range under 4G. When crashkernel=X,high
is passed, kernel could allocate physical memory region
@@ -807,13 +812,15 @@
requires at least 64M+32K low memory, also enough extra
low memory is needed to make sure DMA buffers for 32-bit
devices won't run out. Kernel would try to allocate at
-   at least 256M below 4G automatically.
+   least 256M below 4G automatically.
This one let user to specify own low range under 4G
for second kernel instead.
0: to disable low allocation.
It will be ignored when crashkernel=X,high is not used
or memory reserved is below 4G.
-
+   [KNL, ARM64] range in low memory.
+   This one let user to specify a low range in DMA zone for
+   crash dump kernel.
cryptomgr.notests
[KNL] Disable crypto self-tests
  



___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v17 00/10] support reserving crashkernel above 4G on arm64 kdump

2021-12-13 Thread john . p . donnelly

On 12/10/21 12:55 AM, Zhen Lei wrote:

There are the following issues in arm64 kdump:
1. We use crashkernel=X to reserve crashkernel below 4G, which
will fail when there is not enough low memory.
2. If crashkernel is reserved above 4G, the crash dump kernel
will fail to boot because there is no low memory available
for allocation.

To solve these issues, change the behavior of crashkernel=X.
crashkernel=X tries low allocation in DMA zone and falls back to high
allocation if it fails.

We can also use "crashkernel=X,high" to select a high region above
DMA zone, which also tries to allocate at least 256M low memory in
DMA zone automatically and "crashkernel=Y,low" can be used to allocate
specified size low memory.
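
The policy described above can be sketched in user-space C. This is only an
illustrative model, not the kernel implementation: plan_crashkernel(),
struct crash_plan, LOW_UNSPECIFIED and the low_free/high_free parameters are
hypothetical names standing in for the real memblock queries, and the 256M
default is taken from the description above.

```c
#include <assert.h>
#include <stdbool.h>

#define SZ_1M             (1024ULL * 1024ULL)
#define DEFAULT_LOW_SIZE  (256 * SZ_1M)   /* "at least 256M" from the text */
#define LOW_UNSPECIFIED   (~0ULL)         /* hypothetical: no crashkernel=Y,low */

enum crash_request {
	REQ_PLAIN,	/* crashkernel=X */
	REQ_HIGH,	/* crashkernel=X,high */
};

struct crash_plan {
	bool ok;			/* reservation can be satisfied */
	bool in_high;			/* main region lands above the DMA zone */
	unsigned long long low_size;	/* extra low region, 0 if none */
};

/*
 * low_free and high_free are assumed stand-ins for "how much contiguous
 * memory could be reserved in / above the DMA zone".
 */
static struct crash_plan plan_crashkernel(enum crash_request req,
					  unsigned long long size,
					  unsigned long long low_arg,
					  unsigned long long low_free,
					  unsigned long long high_free)
{
	struct crash_plan plan = { false, false, 0 };
	unsigned long long low = DEFAULT_LOW_SIZE;

	assert(size > 0);

	/* crashkernel=X: try the DMA zone first. */
	if (req == REQ_PLAIN && size <= low_free) {
		plan.ok = true;
		return plan;
	}

	/* Fallback for crashkernel=X, or explicit crashkernel=X,high. */
	if (size > high_free)
		return plan;

	/* crashkernel=Y,low only takes effect together with ",high". */
	if (req == REQ_HIGH && low_arg != LOW_UNSPECIFIED)
		low = low_arg;

	/* A high region without its low companion is treated as a failure. */
	if (low > low_free)
		return plan;

	plan.ok = true;
	plan.in_high = true;
	plan.low_size = low;
	return plan;
}
```

For example, under these assumptions a plain 512M request on a machine with
only 300M free in the DMA zone ends up in high memory with a 256M low
companion region.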

When reserving crashkernel in high memory, some low memory is reserved
for crash dump kernel devices. So there may be two regions reserved for
crash dump kernel.
To distinguish the low region from the high region without affecting
existing kexec-tools, rename the low region as "Crash kernel (low)",
and pass the low region by reusing DT property
"linux,usable-memory-range". We made the low memory region the last
range of "linux,usable-memory-range" to keep compatibility with existing
user-space and older kdump kernels.

Besides, we need to modify kexec-tools:
arm64: support more than one crash kernel regions(see [1])

Another update is document about DT property 'linux,usable-memory-range':
schemas: update 'linux,usable-memory-range' node schema(see [2])

This patchset contains the following 10 patches:

0001-0004 are some x86 cleanups which prepare for making functions
reserve_crashkernel[_low]() generic.
0005 makes functions reserve_crashkernel[_low]() generic.
0006-0007 reimplements arm64 crashkernel=X.
0008-0009 adds memory for devices by DT property linux,usable-memory-range.
0010 updates the doc.

Changes since [v16]
- Because there are no functional changes in this version, add
   "Tested-by: Dave Kleikamp " for patch 1-9
- Add "Reviewed-by: Rob Herring " for patch 8
- Update patch 9 based on the review comments of Rob Herring
- As suggested by Catalin Marinas, merge the implementation of
   ARCH_WANT_RESERVE_CRASH_KERNEL into patch 5. Ensure that the
   contents of X86 and ARM64 do not overlap, and reduce unnecessary
   temporary differences.

Changes since [v15]
-  Aggregate the processing of "linux,usable-memory-range" into one function.
Only patch 9-10 have been updated.

Changes since [v14]
- Restore the requirement that the crash kernel memory regions on X86
   only require 1 MiB alignment.
- Combine patches 5 and 6 in v14 into one. The compilation warning fixed
   by patch 6 was introduced by patch 5 in v14.
- As with crashk_res, crashk_low_res is also processed by
   crash_exclude_mem_range() in patch 7.
- Because commit b261dba2fdb2 ("arm64: kdump: Remove custom
   linux,usable-memory-range handling") has removed the architecture-specific
   code, extend the property "linux,usable-memory-range" in the
   platform-agnostic FDT core code. See patch 9.
- Discard the x86 description update in the document, because the description
   has been updated by commit b1f4c363666c ("Documentation: kdump: update
   kdump guide").
- Change "arm64" to "ARM64" in Doc.


Changes since [v13]
- Rebased on top of 5.11-rc5.
- Introduce config CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL.
Since reserve_crashkernel[_low]() implementations are quite similar on
other architectures, add CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL to
arch/Kconfig and select it from X86 and ARM64.
- Some minor cleanup.

Changes since [v12]
- Rebased on top of 5.10-rc1.
- Keep CRASH_ALIGN as 16M suggested by Dave.
- Drop patch "kdump: add threshold for the required memory".
- Add Tested-by from John.

Changes since [v11]
- Rebased on top of 5.9-rc4.
- Make the function reserve_crashkernel() of x86 generic.
Suggested by Catalin, make the function reserve_crashkernel() of x86 generic
and arm64 use the generic version to reimplement crashkernel=X.

Changes since [v10]
- Reimplement crashkernel=X suggested by Catalin, Many thanks to Catalin.

Changes since [v9]
- Patch 1 add Acked-by from Dave.
- Update patch 5 according to Dave's comments.
- Update chosen schema.

Changes since [v8]
- Reuse DT property "linux,usable-memory-range".
Suggested by Rob, reuse DT property "linux,usable-memory-range" to pass the low
memory region.
- Fix kdump broken with ZONE_DMA reintroduced.
- Update chosen schema.

Changes since [v7]
- Move x86 CRASH_ALIGN to 2M
As suggested by Dave and after some testing, move x86 CRASH_ALIGN to 2M.
- Update Documentation/devicetree/bindings/chosen.txt.
Add corresponding documentation to Documentation/devicetree/bindings/chosen.txt
suggested by Arnd.
- Add Tested-by from John and pk.

Changes since [v6]
- Fix build errors reported by kbuild test robot.

Changes since [v5]
- Move reserve_crashkernel_low() into kernel/crash_core.c.
- Delete crashkernel=X,high.
- Modify crashkernel=X,low.
If crashkernel=X,low is specified simulta

Re: [PATCH v3 5/5] mm/slub: do not create dma-kmalloc if no managed pages in DMA zone

2021-12-14 Thread john . p . donnelly

On 12/14/21 10:31 AM, Christoph Hellwig wrote:

On Mon, Dec 13, 2021 at 08:27:12PM +0800, Baoquan He wrote:

Dma-kmalloc will be created as long as CONFIG_ZONE_DMA is enabled.
However, it will fail if DMA zone has no managed pages. The failure
can be seen in kdump kernel of x86_64 as below:


Please just switch the sr allocation to use GFP_KERNEL without GFP_DMA.
The block layer will do the proper bounce buffering underneath for the
very unlikely case that we're actually using the single HBA driver that
has ISA DMA addressing limitations.

Same for the ch driver, btw.


Hi,

Is CONFIG_ZONE_DMA even needed anymore on x86_64?




Re: [PATCH v4 2/3] dma/pool: create dma atomic pool only if dma zone has managed pages

2021-12-23 Thread john . p . donnelly

On 12/23/21 3:44 AM, Baoquan He wrote:

Currently three dma atomic pools are initialized as long as the relevant
kernel code is built in. In kdump kernel of x86_64, however, this is not
right when trying to create atomic_pool_dma, because there are no managed
pages in DMA zone. In that case, DMA zone only has the low 1M of memory
presented, and it is locked down by memblock allocator, so no pages are
added into buddy of DMA zone. Please check commit f1d4d47c5851 ("x86/setup:
Always reserve the first 1M of RAM").

Then in kdump kernel of x86_64, it always prints below failure message:

  DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
  swapper/0: page allocation failure: order:5, mode:0xcc1(GFP_KERNEL|GFP_DMA), 
nodemask=(null),cpuset=/,mems_allowed=0
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
5.13.0-0.rc5.20210611git929d931f2b40.42.fc35.x86_64 #1
  Hardware name: Dell Inc. PowerEdge R910/0P658H, BIOS 2.12.0 06/04/2018
  Call Trace:
   dump_stack+0x7f/0xa1
   warn_alloc.cold+0x72/0xd6
   ? _raw_spin_unlock_irq+0x24/0x40
   ? __alloc_pages_direct_compact+0x90/0x1b0
   __alloc_pages_slowpath.constprop.0+0xf29/0xf50
   ? __cond_resched+0x16/0x50
   ? prepare_alloc_pages.constprop.0+0x19d/0x1b0
   __alloc_pages+0x24d/0x2c0
   ? __dma_atomic_pool_init+0x93/0x93
   alloc_page_interleave+0x13/0xb0
   atomic_pool_expand+0x118/0x210
   ? __dma_atomic_pool_init+0x93/0x93
   __dma_atomic_pool_init+0x45/0x93
   dma_atomic_pool_init+0xdb/0x176
   do_one_initcall+0x67/0x320
   ? rcu_read_lock_sched_held+0x3f/0x80
   kernel_init_freeable+0x290/0x2dc
   ? rest_init+0x24f/0x24f
   kernel_init+0xa/0x111
   ret_from_fork+0x22/0x30
  Mem-Info:
  ..
  DMA: failed to allocate 128 KiB GFP_KERNEL|GFP_DMA pool for atomic allocation
  DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations

Here, let's check if DMA zone has managed pages, then create atomic_pool_dma
if yes. Otherwise just skip it.

Fixes: 6f599d84231f ("x86/kdump: Always reserve the low 1M when the crashkernel option is specified")
Cc: sta...@vger.kernel.org
Signed-off-by: Baoquan He 

Acked-by: John Donnelly  


Cc: Christoph Hellwig 
Cc: Marek Szyprowski 
Cc: Robin Murphy 
Cc: io...@lists.linux-foundation.org
---
  kernel/dma/pool.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index 5a85804b5beb..00df3edd6c5d 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -206,7 +206,7 @@ static int __init dma_atomic_pool_init(void)
GFP_KERNEL);
if (!atomic_pool_kernel)
ret = -ENOMEM;
-   if (IS_ENABLED(CONFIG_ZONE_DMA)) {
+   if (has_managed_dma()) {
atomic_pool_dma = __dma_atomic_pool_init(atomic_pool_size,
GFP_KERNEL | GFP_DMA);
if (!atomic_pool_dma)
@@ -229,7 +229,7 @@ static inline struct gen_pool *dma_guess_pool(struct 
gen_pool *prev, gfp_t gfp)
if (prev == NULL) {
if (IS_ENABLED(CONFIG_ZONE_DMA32) && (gfp & GFP_DMA32))
return atomic_pool_dma32;
-   if (IS_ENABLED(CONFIG_ZONE_DMA) && (gfp & GFP_DMA))
+   if (atomic_pool_dma && (gfp & GFP_DMA))
return atomic_pool_dma;
return atomic_pool_kernel;
}





Re: [PATCH v4 3/3] mm/page_alloc.c: do not warn allocation failure on zone DMA if no managed pages

2021-12-23 Thread john . p . donnelly

On 12/23/21 3:44 AM, Baoquan He wrote:

In kdump kernel of x86_64, page allocation failure is observed:

  kworker/u2:2: page allocation failure: order:0, 
mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
  CPU: 0 PID: 55 Comm: kworker/u2:2 Not tainted 5.16.0-rc4+ #5
  Hardware name: AMD Dinar/Dinar, BIOS RDN1505B 06/05/2013
  Workqueue: events_unbound async_run_entry_fn
  Call Trace:
   
   dump_stack_lvl+0x48/0x5e
   warn_alloc.cold+0x72/0xd6
   __alloc_pages_slowpath.constprop.0+0xc69/0xcd0
   __alloc_pages+0x1df/0x210
   new_slab+0x389/0x4d0
   ___slab_alloc+0x58f/0x770
   __slab_alloc.constprop.0+0x4a/0x80
   kmem_cache_alloc_trace+0x24b/0x2c0
   sr_probe+0x1db/0x620
   ..
   device_add+0x405/0x920
   ..
   __scsi_add_device+0xe5/0x100
   ata_scsi_scan_host+0x97/0x1d0
   async_run_entry_fn+0x30/0x130
   process_one_work+0x1e8/0x3c0
   worker_thread+0x50/0x3b0
   ? rescuer_thread+0x350/0x350
   kthread+0x16b/0x190
   ? set_kthread_struct+0x40/0x40
   ret_from_fork+0x22/0x30
   
  Mem-Info:
  ..

The above failure happened when calling kmalloc() to allocate buffer with
GFP_DMA. It requests a slab page from DMA zone while there are no managed
pages at all in there.
  sr_probe()
  --> get_capabilities()
  --> buffer = kmalloc(512, GFP_KERNEL | GFP_DMA);

This is because in the current kernel, dma-kmalloc will be created as long
as CONFIG_ZONE_DMA is enabled. However, kdump kernel of x86_64 doesn't have
managed pages on DMA zone since commit 6f599d84231f ("x86/kdump: Always
reserve the low 1M when the crashkernel option is specified"). The failure
can always be reproduced.

For now, let's mute the warning of allocation failure if requesting pages
from DMA zone while no managed pages.
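
The muting condition described above can be modelled outside the kernel as a
small predicate. This is a sketch only: model_should_warn(), MODEL_GFP_DMA and
MODEL_GFP_NOWARN are hypothetical stand-ins for warn_alloc(), __GFP_DMA and
__GFP_NOWARN, and the rate limit and has_managed_dma() results are passed in
as plain booleans.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; the real __GFP_* values differ. */
#define MODEL_GFP_DMA     0x01u
#define MODEL_GFP_NOWARN  0x02u

/*
 * Returns true when an allocation-failure warning would still be printed.
 * ratelimit_ok models a passing rate limit check, has_managed_dma models
 * the helper added in patch 1/3.
 */
static bool model_should_warn(unsigned int gfp_mask, bool ratelimit_ok,
			      bool has_managed_dma)
{
	/* Only the two modelled bits are understood by this sketch. */
	assert((gfp_mask & ~(MODEL_GFP_DMA | MODEL_GFP_NOWARN)) == 0);

	if (gfp_mask & MODEL_GFP_NOWARN)
		return false;
	if (!ratelimit_ok)
		return false;
	/* New case: a DMA request that cannot succeed is not worth a warning. */
	if ((gfp_mask & MODEL_GFP_DMA) && !has_managed_dma)
		return false;
	return true;
}
```

With this model, a GFP_DMA failure in a kdump kernel whose DMA zone has no
managed pages stays silent, while non-DMA failures are still reported.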

Fixes: 6f599d84231f ("x86/kdump: Always reserve the low 1M when the crashkernel option is specified")
Cc: sta...@vger.kernel.org
Signed-off-by: Baoquan He 

Acked-by: John Donnelly  



Cc: Christoph Lameter 
Cc: Pekka Enberg 
Cc: David Rientjes 
Cc: Joonsoo Kim 
Cc: Vlastimil Babka 
---
  mm/page_alloc.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7c7a0b5de2ff..843bc8e5550a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4204,7 +4204,8 @@ void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, 
const char *fmt, ...)
va_list args;
static DEFINE_RATELIMIT_STATE(nopage_rs, 10*HZ, 1);
  
-	if ((gfp_mask & __GFP_NOWARN) || !__ratelimit(&nopage_rs))
+   if ((gfp_mask & __GFP_NOWARN) || !__ratelimit(&nopage_rs) ||
+   (gfp_mask & __GFP_DMA) && !has_managed_dma())
return;
  
  	va_start(args, fmt);





Re: [PATCH v4 1/3] mm_zone: add function to check if managed dma zone exists

2021-12-23 Thread john . p . donnelly

On 12/23/21 3:44 AM, Baoquan He wrote:

Some places in the current kernel assume that DMA zone must have
managed pages if CONFIG_ZONE_DMA is enabled. However, this is not always
true. E.g. in kdump kernel of x86_64, only the low 1M is presented and
locked down at a very early stage of boot, so there are no managed pages
at all in DMA zone. This exception will always cause page allocation
failure if a page is requested from DMA zone.

Here add function has_managed_dma() and the relevant helper functions to
check if there's DMA zone with managed pages. It will be used in later
patches.
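
The distinction between "present" and "managed" pages that this helper relies
on can be illustrated with a small user-space model. It is a sketch under
assumptions: struct zone_model, struct node_model and model_has_managed_dma()
are hypothetical names, and the page counts used with it are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced view of a zone. */
struct zone_model {
	unsigned long present_pages;	/* pages physically in the zone */
	unsigned long managed_pages;	/* pages handed to the buddy allocator */
};

/* Hypothetical node holding only its DMA zone. */
struct node_model {
	struct zone_model dma;
};

/* True if any node has a DMA zone with at least one managed page. */
static bool model_has_managed_dma(const struct node_model *nodes, size_t n)
{
	assert(nodes != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (nodes[i].dma.managed_pages > 0)
			return true;
	}
	return false;
}
```

A kdump kernel as described above corresponds to a node whose DMA zone has
present pages (the locked-down low 1M) but zero managed pages, for which the
model returns false.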

Fixes: 6f599d84231f ("x86/kdump: Always reserve the low 1M when the crashkernel option is specified")
Cc: sta...@vger.kernel.org
Signed-off-by: Baoquan He 
Reviewed-by: David Hildenbrand 

Acked-by: John Donnelly  


---
  include/linux/mmzone.h |  9 +
  mm/page_alloc.c| 15 +++
  2 files changed, 24 insertions(+)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 58e744b78c2c..6e1b726e9adf 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1046,6 +1046,15 @@ static inline int is_highmem_idx(enum zone_type idx)
  #endif
  }
  
+#ifdef CONFIG_ZONE_DMA
+bool has_managed_dma(void);
+#else
+static inline bool has_managed_dma(void)
+{
+   return false;
+}
+#endif
+
  /**
   * is_highmem - helper function to quickly check if a struct zone is a
   *  highmem zone or not.  This is an attempt to keep references
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c5952749ad40..7c7a0b5de2ff 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -9460,3 +9460,18 @@ bool take_page_off_buddy(struct page *page)
return ret;
  }
  #endif
+
+#ifdef CONFIG_ZONE_DMA
+bool has_managed_dma(void)
+{
+   struct pglist_data *pgdat;
+
+   for_each_online_pgdat(pgdat) {
+   struct zone *zone = &pgdat->node_zones[ZONE_DMA];
+
+   if (managed_zone(zone))
+   return true;
+   }
+   return false;
+}
+#endif /* CONFIG_ZONE_DMA */





Re: [PATCH v19 01/13] kdump: add helper parse_crashkernel_high_low()

2022-01-11 Thread john . p . donnelly

On 12/28/21 7:26 AM, Zhen Lei wrote:

The bootup command line option crashkernel=Y,low is valid only when
crashkernel=X,high is specified. Putting their parsing into a separate
function makes the code logic clearer and easier to understand the strong
dependencies between them.
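
The dependency between the two options can be sketched in user-space C. This
is not the kernel parser: sketch_memparse(), sketch_find() and
sketch_parse_high_low() are hypothetical helpers, only the K/M/G suffixes are
handled, the last matching occurrence wins, and the -ENOENT return for a
missing option is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Parse "<number>[KMG]" into bytes; *end is set past the parsed text. */
static unsigned long long sketch_memparse(const char *s, char **end)
{
	unsigned long long val = strtoull(s, end, 0);

	switch (**end) {
	case 'G': case 'g':
		val <<= 10;	/* fall through */
	case 'M': case 'm':
		val <<= 10;	/* fall through */
	case 'K': case 'k':
		val <<= 10;
		(*end)++;
		break;
	default:
		break;
	}
	return val;
}

/* Look for "crashkernel=<size><suffix>"; returns 0 and fills *size if found. */
static int sketch_find(const char *cmdline, const char *suffix,
		       unsigned long long *size)
{
	static const char key[] = "crashkernel=";
	const size_t klen = sizeof(key) - 1;
	const size_t slen = strlen(suffix);
	const char *p = cmdline;
	int ret = -ENOENT;

	while ((p = strstr(p, key)) != NULL) {
		char *end;
		unsigned long long val;

		p += klen;
		val = sketch_memparse(p, &end);
		if (end == p)
			continue;	/* no number after "crashkernel=" */
		if (strncmp(end, suffix, slen) != 0)
			continue;	/* different suffix, or none */
		if (end[slen] != ' ' && end[slen] != '\0')
			continue;	/* trailing garbage after the suffix */
		*size = val;
		ret = 0;
	}
	return ret;
}

/* ",low" is only consulted after a valid ",high" has been found. */
static int sketch_parse_high_low(const char *cmdline,
				 unsigned long long *high_size,
				 unsigned long long *low_size)
{
	int ret;

	assert(high_size && low_size);

	ret = sketch_find(cmdline, ",high", high_size);
	if (ret)
		return ret;
	if (*high_size == 0)
		return -EINVAL;

	if (sketch_find(cmdline, ",low", low_size))
		*low_size = (unsigned long long)-1;	/* not specified */

	return 0;
}
```

In this sketch a command line carrying only "crashkernel=128M,low" is
rejected, which mirrors the statement that the ",low" form is valid only when
the ",high" form is present.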

Signed-off-by: Zhen Lei 

Acked-by: John Donnelly  


---
  include/linux/crash_core.h |  3 +++
  kernel/crash_core.c| 35 +++
  2 files changed, 38 insertions(+)

diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
index de62a722431e7db..2d3a64761d18998 100644
--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -83,5 +83,8 @@ int parse_crashkernel_high(char *cmdline, unsigned long long 
system_ram,
unsigned long long *crash_size, unsigned long long *crash_base);
  int parse_crashkernel_low(char *cmdline, unsigned long long system_ram,
unsigned long long *crash_size, unsigned long long *crash_base);
+int __init parse_crashkernel_high_low(char *cmdline,
+ unsigned long long *high_size,
+ unsigned long long *low_size);
  
  #endif /* LINUX_CRASH_CORE_H */

diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index eb53f5ec62c900f..8966beaf7c4fd52 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -295,6 +295,41 @@ int __init parse_crashkernel_low(char *cmdline,
"crashkernel=", suffix_tbl[SUFFIX_LOW]);
  }
  
+/**
+ * parse_crashkernel_high_low - Parsing "crashkernel=X,high" and possible
+ * "crashkernel=Y,low".
+ * @cmdline:   The bootup command line.
+ * @high_size: Save the memory size specified by "crashkernel=X,high".
+ * @low_size:  Save the memory size specified by "crashkernel=Y,low" or "-1"
+ * if it's not specified.
+ *
+ * Returns 0 on success, else a negative status code.
+ */
+int __init parse_crashkernel_high_low(char *cmdline,
+ unsigned long long *high_size,
+ unsigned long long *low_size)
+{
+   int ret;
+   unsigned long long base;
+
+   BUG_ON(!high_size || !low_size);
+
+   /* crashkernel=X,high */
+   ret = parse_crashkernel_high(cmdline, 0, high_size, &base);
+   if (ret)
+   return ret;
+
+   if (*high_size <= 0)
+   return -EINVAL;
+
+   /* crashkernel=Y,low */
+   ret = parse_crashkernel_low(cmdline, 0, low_size, &base);
+   if (ret)
+   *low_size = -1;
+
+   return 0;
+}
+
  Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
  void *data, size_t data_len)
  {





Re: [PATCH v19 02/13] x86/setup: Use parse_crashkernel_high_low() to simplify code

2022-01-11 Thread john . p . donnelly

On 12/28/21 7:26 AM, Zhen Lei wrote:

Use parse_crashkernel_high_low() to bring the parsing of
"crashkernel=X,high" and the parsing of "crashkernel=Y,low" together. They
are strongly dependent, so this makes the code logic clearer and more readable.

Suggested-by: Borislav Petkov 
Signed-off-by: Zhen Lei 

Acked-by: John Donnelly  

---
  arch/x86/kernel/setup.c | 21 +
  1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 6a190c7f4d71b05..93d78aae1937db3 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -416,18 +416,16 @@ static void __init 
memblock_x86_reserve_range_setup_data(void)
  # define CRASH_ADDR_HIGH_MAX  SZ_64T
  #endif
  
-static int __init reserve_crashkernel_low(void)

+static int __init reserve_crashkernel_low(unsigned long long low_size)
  {
  #ifdef CONFIG_X86_64
-   unsigned long long base, low_base = 0, low_size = 0;
+   unsigned long long low_base = 0;
unsigned long low_mem_limit;
-   int ret;
  
  	low_mem_limit = min(memblock_phys_mem_size(), CRASH_ADDR_LOW_MAX);
  
-	/* crashkernel=Y,low */

-   ret = parse_crashkernel_low(boot_command_line, low_mem_limit, &low_size, 
&base);
-   if (ret) {
+   /* crashkernel=Y,low is not specified */
+   if ((long)low_size < 0) {
/*
 * two parts from kernel/dma/swiotlb.c:
 * -swiotlb size: user-specified with swiotlb= or default.
@@ -465,7 +463,7 @@ static int __init reserve_crashkernel_low(void)
  
  static void __init reserve_crashkernel(void)

  {
-   unsigned long long crash_size, crash_base, total_mem;
+   unsigned long long crash_size, crash_base, total_mem, low_size;
bool high = false;
int ret;
  
@@ -474,10 +472,9 @@ static void __init reserve_crashkernel(void)

/* crashkernel=XM */
ret = parse_crashkernel(boot_command_line, total_mem, &crash_size, 
&crash_base);
if (ret != 0 || crash_size <= 0) {
-   /* crashkernel=X,high */
-   ret = parse_crashkernel_high(boot_command_line, total_mem,
-&crash_size, &crash_base);
-   if (ret != 0 || crash_size <= 0)
+   /* crashkernel=X,high and possible crashkernel=Y,low */
+   ret = parse_crashkernel_high_low(boot_command_line, &crash_size, 
&low_size);
+   if (ret)
return;
high = true;
}
@@ -520,7 +517,7 @@ static void __init reserve_crashkernel(void)
}
}
  
-	if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) {

+   if (crash_base >= (1ULL << 32) && reserve_crashkernel_low(low_size)) {
memblock_phys_free(crash_base, crash_size);
return;
}





Re: [PATCH v19 03/13] kdump: make parse_crashkernel_{high|low}() static

2022-01-11 Thread john . p . donnelly

On 12/28/21 7:26 AM, Zhen Lei wrote:

Make parse_crashkernel_{high|low}() static, they are only referenced by
parse_crashkernel_high_low() in the same file. The latter is recommended.

Signed-off-by: Zhen Lei 

Acked-by: John Donnelly  


---
  include/linux/crash_core.h | 4 
  kernel/crash_core.c| 4 ++--
  2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
index 2d3a64761d18998..598fd55d83c169e 100644
--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -79,10 +79,6 @@ void final_note(Elf_Word *buf);
  
  int __init parse_crashkernel(char *cmdline, unsigned long long system_ram,

unsigned long long *crash_size, unsigned long long *crash_base);
-int parse_crashkernel_high(char *cmdline, unsigned long long system_ram,
-   unsigned long long *crash_size, unsigned long long *crash_base);
-int parse_crashkernel_low(char *cmdline, unsigned long long system_ram,
-   unsigned long long *crash_size, unsigned long long *crash_base);
  int __init parse_crashkernel_high_low(char *cmdline,
  unsigned long long *high_size,
  unsigned long long *low_size);
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 8966beaf7c4fd52..3b9e01fc450b2a4 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -277,7 +277,7 @@ int __init parse_crashkernel(char *cmdline,
"crashkernel=", NULL);
  }
  
-int __init parse_crashkernel_high(char *cmdline,

+static int __init parse_crashkernel_high(char *cmdline,
 unsigned long long system_ram,
 unsigned long long *crash_size,
 unsigned long long *crash_base)
@@ -286,7 +286,7 @@ int __init parse_crashkernel_high(char *cmdline,
"crashkernel=", suffix_tbl[SUFFIX_HIGH]);
  }
  
-int __init parse_crashkernel_low(char *cmdline,

+static int __init parse_crashkernel_low(char *cmdline,
 unsigned long long system_ram,
 unsigned long long *crash_size,
 unsigned long long *crash_base)





Re: [PATCH v19 04/13] kdump: reduce unnecessary parameters of parse_crashkernel_{high|low}()

2022-01-11 Thread john . p . donnelly

On 12/28/21 7:26 AM, Zhen Lei wrote:

Delete confusing parameters 'system_ram' and 'crash_base' of
parse_crashkernel_{high|low}(); they are only needed by the case of
"crashkernel=X[@offset]".

Signed-off-by: Zhen Lei 

Acked-by: John Donnelly  


---
  kernel/crash_core.c | 21 ++---
  1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 3b9e01fc450b2a4..b7d024eb464d0ae 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -278,20 +278,20 @@ int __init parse_crashkernel(char *cmdline,
  }
  
  static int __init parse_crashkernel_high(char *cmdline,

-unsigned long long system_ram,
-unsigned long long *crash_size,
-unsigned long long *crash_base)
+unsigned long long *crash_size)
  {
-   return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base,
+   unsigned long long base;
+
+   return __parse_crashkernel(cmdline, 0, crash_size, &base,
"crashkernel=", suffix_tbl[SUFFIX_HIGH]);
  }
  
  static int __init parse_crashkernel_low(char *cmdline,

-unsigned long long system_ram,
-unsigned long long *crash_size,
-unsigned long long *crash_base)
+   unsigned long long *crash_size)
  {
-   return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base,
+   unsigned long long base;
+
+   return __parse_crashkernel(cmdline, 0, crash_size, &base,
"crashkernel=", suffix_tbl[SUFFIX_LOW]);
  }
  
@@ -310,12 +310,11 @@ int __init parse_crashkernel_high_low(char *cmdline,

  unsigned long long *low_size)
  {
int ret;
-   unsigned long long base;
  
  	BUG_ON(!high_size || !low_size);
  
  	/* crashkernel=X,high */

-   ret = parse_crashkernel_high(cmdline, 0, high_size, &base);
+   ret = parse_crashkernel_high(cmdline, high_size);
if (ret)
return ret;
  
@@ -323,7 +322,7 @@ int __init parse_crashkernel_high_low(char *cmdline,

return -EINVAL;
  
  	/* crashkernel=Y,low */

-   ret = parse_crashkernel_low(cmdline, 0, low_size, &base);
+   ret = parse_crashkernel_low(cmdline, low_size);
if (ret)
*low_size = -1;
  





Re: [PATCH v19 05/13] x86/setup: Add and use CRASH_BASE_ALIGN

2022-01-11 Thread john . p . donnelly

On 12/28/21 7:26 AM, Zhen Lei wrote:

Add macro CRASH_BASE_ALIGN to indicate the alignment for crash kernel
fixed region, in preparation for making partial implementation of
reserve_crashkernel[_low]() generic.
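
The two alignments can be contrasted with a short user-space sketch. The
values SZ_16M and SZ_1M come from the patch below; the SKETCH_* macro names
and the helpers sketch_is_aligned() and sketch_align_up() are hypothetical and
only illustrate the arithmetic.

```c
#include <assert.h>
#include <stdbool.h>

#define SZ_1M   0x00100000ULL
#define SZ_16M  0x01000000ULL

#define SKETCH_CRASH_ALIGN       SZ_16M	/* dynamically placed regions */
#define SKETCH_CRASH_BASE_ALIGN  SZ_1M	/* region at a user-given base */

/* True if addr is a multiple of align; align must be a power of two. */
static bool sketch_is_aligned(unsigned long long addr, unsigned long long align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr & (align - 1)) == 0;
}

/* Round addr up to the next multiple of align (a power of two). */
static unsigned long long sketch_align_up(unsigned long long addr,
					  unsigned long long align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~(align - 1);
}
```

For instance, a user-specified base of 0x01100000 (17M) satisfies the 1M
requirement for a fixed region but not the 16M alignment used for dynamic
regions; rounding it up to 16M alignment gives 0x02000000.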

Signed-off-by: Zhen Lei 


Acked-by: John Donnelly  


---
  arch/x86/kernel/setup.c | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 93d78aae1937db3..cb7f237a2ae0dfa 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -392,9 +392,12 @@ static void __init 
memblock_x86_reserve_range_setup_data(void)
  
  #ifdef CONFIG_KEXEC_CORE
  
-/* 16M alignment for crash kernel regions */

+/* alignment for crash kernel dynamic regions */
  #define CRASH_ALIGN   SZ_16M
  
+/* alignment for crash kernel fixed region */

+#define CRASH_BASE_ALIGN   SZ_1M
+
  /*
   * Keep the crash kernel below this limit.
   *
@@ -509,7 +512,7 @@ static void __init reserve_crashkernel(void)
} else {
unsigned long long start;
  
-		start = memblock_phys_alloc_range(crash_size, SZ_1M, crash_base,

+   start = memblock_phys_alloc_range(crash_size, CRASH_BASE_ALIGN, 
crash_base,
  crash_base + crash_size);
if (start != crash_base) {
pr_info("crashkernel reservation failed - memory is in 
use.\n");





Re: [PATCH v19 06/13] kexec: move crashk[_low]_res to crash_core module

2022-01-11 Thread john . p . donnelly

On 12/28/21 7:26 AM, Zhen Lei wrote:

From: Chen Zhou 

Move the definition and declaration of global variable crashk[_low]_res
from kexec module to crash_core module, in preparation of adding generic
reserve_crashkernel_mem[_low]() to crash_core.c, the latter refers to
variable crashk[_low]_res. Because the config KEXEC automatically selects
CRASH_CORE, and the header crash_core.h is included by kexec.h, there
is no functional change.

Signed-off-by: Chen Zhou 
Signed-off-by: Zhen Lei 


Acked-by: John Donnelly  


---
  include/linux/crash_core.h |  4 
  include/linux/kexec.h  |  4 
  kernel/crash_core.c| 16 
  kernel/kexec_core.c| 17 -
  4 files changed, 20 insertions(+), 21 deletions(-)

diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
index 598fd55d83c169e..f5437c9c9411fce 100644
--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -73,6 +73,10 @@ extern unsigned char *vmcoreinfo_data;
  extern size_t vmcoreinfo_size;
  extern u32 *vmcoreinfo_note;
  
+/* Location of a reserved region to hold the crash kernel. */

+extern struct resource crashk_res;
+extern struct resource crashk_low_res;
+
  Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
  void *data, size_t data_len);
  void final_note(Elf_Word *buf);
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 0c994ae37729e1e..47e784d66ea8645 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -350,10 +350,6 @@ extern int kexec_load_disabled;
  #define KEXEC_FILE_FLAGS  (KEXEC_FILE_UNLOAD | KEXEC_FILE_ON_CRASH | \
 KEXEC_FILE_NO_INITRAMFS)
  
-/* Location of a reserved region to hold the crash kernel.

- */
-extern struct resource crashk_res;
-extern struct resource crashk_low_res;
  extern note_buf_t __percpu *crash_notes;
  
  /* flag to track if kexec reboot is in progress */

diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index b7d024eb464d0ae..686d8a65e12a337 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -22,6 +22,22 @@ u32 *vmcoreinfo_note;
  /* trusted vmcoreinfo, e.g. we can make a copy in the crash memory */
  static unsigned char *vmcoreinfo_data_safecopy;
  
+/* Location of the reserved area for the crash kernel */

+struct resource crashk_res = {
+   .name  = "Crash kernel",
+   .start = 0,
+   .end   = 0,
+   .flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
+   .desc  = IORES_DESC_CRASH_KERNEL
+};
+struct resource crashk_low_res = {
+   .name  = "Crash kernel",
+   .start = 0,
+   .end   = 0,
+   .flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
+   .desc  = IORES_DESC_CRASH_KERNEL
+};
+
  /*
   * parsing the "crashkernel" commandline
   *
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 5a5d192a89ac307..1e0d4909bbb6b77 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -54,23 +54,6 @@ note_buf_t __percpu *crash_notes;
  /* Flag to indicate we are going to kexec a new kernel */
  bool kexec_in_progress = false;
  
-

-/* Location of the reserved area for the crash kernel */
-struct resource crashk_res = {
-   .name  = "Crash kernel",
-   .start = 0,
-   .end   = 0,
-   .flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
-   .desc  = IORES_DESC_CRASH_KERNEL
-};
-struct resource crashk_low_res = {
-   .name  = "Crash kernel",
-   .start = 0,
-   .end   = 0,
-   .flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
-   .desc  = IORES_DESC_CRASH_KERNEL
-};
-
  int kexec_should_crash(struct task_struct *p)
  {
/*





Re: [PATCH v4 0/3] Handle warning of allocation failure on DMA zone w/o managed pages

2022-01-12 Thread john . p . donnelly

On 12/23/21 3:44 AM, Baoquan He wrote:

***Problem observed:
On x86_64, when a crash is triggered and the system enters the kdump
kernel, page allocation failure can always be seen.

  -
  DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
  swapper/0: page allocation failure: order:5, mode:0xcc1(GFP_KERNEL|GFP_DMA), 
nodemask=(null),cpuset=/,mems_allowed=0
  CPU: 0 PID: 1 Comm: swapper/0
  Call Trace:
   dump_stack+0x7f/0xa1
   warn_alloc.cold+0x72/0xd6
   ..
   __alloc_pages+0x24d/0x2c0
   ..
   dma_atomic_pool_init+0xdb/0x176
   do_one_initcall+0x67/0x320
   ? rcu_read_lock_sched_held+0x3f/0x80
   kernel_init_freeable+0x290/0x2dc
   ? rest_init+0x24f/0x24f
   kernel_init+0xa/0x111
   ret_from_fork+0x22/0x30
  Mem-Info:
  

***Root cause:
The current kernel assumes that DMA zone must have managed pages and
tries to request pages from it if CONFIG_ZONE_DMA is enabled. However,
this is not always true. E.g. in kdump kernel of x86_64, only the low 1M
is presented and locked down at a very early stage of boot, so this low
1M won't be added into buddy allocator to become managed pages of DMA
zone. This exception will always cause page allocation failure if a page
is requested from DMA zone.

***Investigation:
This failure has been happening since the commits below were merged into Linus's tree.
   1a6a9044b967 x86/setup: Remove CONFIG_X86_RESERVE_LOW and reservelow= options
   23721c8e92f7 x86/crash: Remove crash_reserve_low_1M()
   f1d4d47c5851 x86/setup: Always reserve the first 1M of RAM
   7c321eb2b843 x86/kdump: Remove the backup region handling
   6f599d84231f x86/kdump: Always reserve the low 1M when the crashkernel option is specified

Before these commits, on x86_64, the low 640K area was reused by the kdump
kernel. The content of the low 640K area was copied into a backup region for
dumping before jumping into the kdump kernel. Then, except for the
firmware-reserved regions in [0, 640K], the remaining area was added to the
buddy allocator to become available managed pages of the DMA zone.

However, with the above commits applied, in the kdump kernel on x86_64 the low
1M is reserved by memblock but never released to the buddy allocator. So any
later page allocation requested from the DMA zone fails.

Initially, the low 1M needed to be locked down whenever crashkernel is
reserved, because AMD SME encrypts memory, which makes the old backup region
mechanism impossible when switching into the kdump kernel.

Later, it was also observed that there are BIOSes corrupting memory
under 1M. To solve this, in commit f1d4d47c5851, the entire region of
low 1M is always reserved after the real mode trampoline is allocated.

In addition, an Intel engineer recently mentioned that TDX (Trusted Domain
Extensions), which is under development in the kernel, also needs to lock down
the low 1M. So we can't simply revert the above commits to fix the page
allocation failure from the DMA zone, as someone suggested.

***Solution:
Currently, only the DMA atomic pool and dma-kmalloc initialize and
request page allocation with GFP_DMA during bootup.

So only initialize the DMA atomic pool when the DMA zone has available managed
pages; otherwise just skip the initialization.

For dma-kmalloc(), for the time being, let's mute the warning of
allocation failure when pages are requested from a DMA zone that has no
managed pages. Meanwhile, change code to use the dma_alloc_xx/dma_map_xx API
to replace kmalloc(GFP_DMA), or do not use GFP_DMA when calling kmalloc()
if not necessary. Christoph is posting patches to fix those under
drivers/scsi/. Finally, we can remove the need for dma-kmalloc() as
people suggested.
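
The guard described above can be sketched in userspace C. This is only a model under stated assumptions: `zone_model`, `has_managed_dma_model()` and `dma_pool_init_model()` are hypothetical names invented for illustration and are not the kernel's structures or functions. It shows the idea of checking for managed pages first and skipping the pool setup, rather than attempting an allocation that is bound to fail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a memory zone; not the kernel's struct zone. */
enum zone_kind { MODEL_ZONE_DMA, MODEL_ZONE_DMA32, MODEL_ZONE_NORMAL };

struct zone_model {
	enum zone_kind kind;
	unsigned long managed_pages;	/* pages handed to the buddy allocator */
};

/* Return true if any DMA zone in the array has at least one managed page. */
static bool has_managed_dma_model(const struct zone_model *zones, size_t n)
{
	assert(zones != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (zones[i].kind == MODEL_ZONE_DMA && zones[i].managed_pages)
			return true;
	}
	return false;
}

/*
 * Model of the guarded init: skip the DMA pool when the DMA zone is empty.
 * Returns the number of pages taken for the pool (0 when skipped or when
 * no DMA zone can satisfy the request).
 */
static unsigned long dma_pool_init_model(struct zone_model *zones, size_t n,
					 unsigned long pool_pages)
{
	if (!has_managed_dma_model(zones, n))
		return 0;	/* nothing to allocate from: skip quietly */

	for (size_t i = 0; i < n; i++) {
		if (zones[i].kind == MODEL_ZONE_DMA &&
		    zones[i].managed_pages >= pool_pages) {
			zones[i].managed_pages -= pool_pages;
			return pool_pages;
		}
	}
	return 0;
}
```

In a kdump-like layout, where the DMA zone exists but has zero managed pages, `dma_pool_init_model()` returns 0 without touching any zone, which corresponds to skipping the initialization instead of triggering the warning shown in the trace.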

Changelog:
v3->v4:
  - Split the old v3 into two separate patchsets. The first two
cleanup/improvement patches in v3 have been sent out in an independent
patchset. The fix patches are adapted and sent in this patchset.
  - Do not change dma-kmalloc(); instead, mute the warning of allocation
failure if it is requesting pages from a DMA zone which has no managed
pages.

v2-Resend -> v3:
  - Re-implement has_managed_dma() according to David's suggestion.
  - Add Fixes tag and cc stable.

v2->v2 RESEND:
  - John pinged to push the repost of this patchset. So fix one typo in the
subject of patch 3/5; fix a build error caused by a mixed declaration in
patch 5/5. Both were found by John in his testing.
  - Rewrite cover letter to add more information.

v1->v2:
  Change to check whether a managed DMA zone exists. If the DMA zone has
  managed pages, go further to request pages from the DMA zone for
  initialization. Otherwise, just skip initializing the things which need
  pages from the DMA zone.

v3:
https://urldefense.com/v3/__https://lore.kernel.org/all/20211213122712.23805-1-...@redhat.com/T/*u__;Iw!!ACWV5N9M2RV99hQ!e1KjpVuZycBkxdeNxcsRUQ7MH92KQQk7FfCZs5tzEcBVusUiph0w9zpxOgKpS2Y0ecPm$

V2 RESEND post:
https://urldefense.com/v3/__https://lore.kernel.org/all/20211207030750.30824-1-...@redhat.com/T/*u__;Iw!!ACWV5N9M2RV99hQ!e1KjpVuZycBkxdeNxcsRUQ7MH92KQQk7FfCZs5tzEcBVusUi

Re: [PATCH v20 1/5] arm64: Use insert_resource() to simplify code

2022-01-26 Thread john . p . donnelly

On 1/24/22 2:47 AM, Zhen Lei wrote:

insert_resource() traverses the subtree layer by layer from the root node
until a proper location is found. Compared with request_resource(), the
parent node does not need to be determined in advance.

In addition, move the insertion of node 'crashk_res' into function
reserve_crashkernel() to make the associated code close together.
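
To illustrate the difference the commit message describes, here is a small userspace model of an insert that finds its own parent. It is a sketch under assumptions, not the kernel implementation: `res_model` and `insert_res_model()` are hypothetical names, sibling order is not kept sorted, and locking is ignored. The caller only passes the root; the function descends into whichever child fully contains the new range.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified resource node; not the kernel's struct resource. */
struct res_model {
	unsigned long long start, end;	/* inclusive range */
	struct res_model *parent, *child, *sibling;
};

static int contains(const struct res_model *a, const struct res_model *b)
{
	return a->start <= b->start && b->end <= a->end;
}

static int overlaps(const struct res_model *a, const struct res_model *b)
{
	return a->start <= b->end && b->start <= a->end;
}

/*
 * Insert @node somewhere below @root without the caller naming the parent.
 * Children of the chosen parent that lie inside @node are moved under it.
 * Returns 0 on success, -1 if @node does not fit in @root or only partially
 * overlaps an existing node.
 */
static int insert_res_model(struct res_model *root, struct res_model *node)
{
	struct res_model *parent = root, *c, **link;

	assert(node->start <= node->end);
	if (!contains(root, node))
		return -1;

	/* Descend layer by layer while some child fully contains @node. */
	for (;;) {
		for (c = parent->child; c; c = c->sibling)
			if (contains(c, node))
				break;
		if (!c)
			break;
		parent = c;
	}

	/* No child contains @node, so any overlap must be fully inside it. */
	for (c = parent->child; c; c = c->sibling)
		if (overlaps(c, node) && !contains(node, c))
			return -1;

	node->parent = parent;
	node->child = NULL;
	link = &parent->child;
	while ((c = *link) != NULL) {
		if (contains(node, c)) {
			*link = c->sibling;	/* unlink from the old parent */
			c->parent = node;
			c->sibling = node->child;
			node->child = c;
		} else {
			link = &c->sibling;
		}
	}
	node->sibling = parent->child;
	parent->child = node;
	return 0;
}
```

With this shape, a "Crash kernel" style node can be inserted from a different function than the one that registers the RAM ranges, in either order, and it still ends up under the enclosing RAM node.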

Signed-off-by: Zhen Lei 



Acked-by: John Donnelly  


---
  arch/arm64/kernel/setup.c | 17 +++--
  arch/arm64/mm/init.c  |  1 +
  2 files changed, 4 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index f70573928f1bff0..a81efcc359e4e78 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -225,6 +225,8 @@ static void __init request_standard_resources(void)
kernel_code.end = __pa_symbol(__init_begin - 1);
kernel_data.start   = __pa_symbol(_sdata);
kernel_data.end = __pa_symbol(_end - 1);
+   insert_resource(&iomem_resource, &kernel_code);
+   insert_resource(&iomem_resource, &kernel_data);
  
  	num_standard_resources = memblock.memory.cnt;

res_size = num_standard_resources * sizeof(*standard_resources);
@@ -246,20 +248,7 @@ static void __init request_standard_resources(void)
res->end = 
__pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1;
}
  
-		request_resource(&iomem_resource, res);

-
-   if (kernel_code.start >= res->start &&
-   kernel_code.end <= res->end)
-   request_resource(res, &kernel_code);
-   if (kernel_data.start >= res->start &&
-   kernel_data.end <= res->end)
-   request_resource(res, &kernel_data);
-#ifdef CONFIG_KEXEC_CORE
-   /* Userspace will find "Crash kernel" region in /proc/iomem. */
-   if (crashk_res.end && crashk_res.start >= res->start &&
-   crashk_res.end <= res->end)
-   request_resource(res, &crashk_res);
-#endif
+   insert_resource(&iomem_resource, res);
}
  }
  
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c

index db63cc885771a52..90f276d46b93bc6 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -109,6 +109,7 @@ static void __init reserve_crashkernel(void)
kmemleak_ignore_phys(crash_base);
crashk_res.start = crash_base;
crashk_res.end = crash_base + crash_size - 1;
+   insert_resource(&iomem_resource, &crashk_res);
  }
  #else
  static void __init reserve_crashkernel(void)





Re: [PATCH v20 2/5] arm64: kdump: introduce some macros for crash kernel reservation

2022-01-26 Thread john . p . donnelly

On 1/24/22 2:47 AM, Zhen Lei wrote:

From: Chen Zhou 

Introduce macro CRASH_ALIGN for alignment, macro CRASH_ADDR_LOW_MAX
for upper bound of low crash memory, macro CRASH_ADDR_HIGH_MAX for
upper bound of high crash memory, use macros instead.

Signed-off-by: Chen Zhou 
Signed-off-by: Zhen Lei 
Tested-by: John Donnelly 
Tested-by: Dave Kleikamp 



Acked-by: John Donnelly  


---
  arch/arm64/mm/init.c | 11 ---
  1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 90f276d46b93bc6..6c653a2c7cff052 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -65,6 +65,12 @@ EXPORT_SYMBOL(memstart_addr);
  phys_addr_t arm64_dma_phys_limit __ro_after_init;
  
  #ifdef CONFIG_KEXEC_CORE

+/* Current arm64 boot protocol requires 2MB alignment */
+#define CRASH_ALIGN            SZ_2M
+
+#define CRASH_ADDR_LOW_MAX     arm64_dma_phys_limit
+#define CRASH_ADDR_HIGH_MAX    MEMBLOCK_ALLOC_ACCESSIBLE
+
  /*
   * reserve_crashkernel() - reserves memory for crash kernel
   *
@@ -75,7 +81,7 @@ phys_addr_t arm64_dma_phys_limit __ro_after_init;
  static void __init reserve_crashkernel(void)
  {
unsigned long long crash_base, crash_size;
-   unsigned long long crash_max = arm64_dma_phys_limit;
+   unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
int ret;
  
  	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),

@@ -90,8 +96,7 @@ static void __init reserve_crashkernel(void)
if (crash_base)
crash_max = crash_base + crash_size;
  
-	/* Current arm64 boot protocol requires 2MB alignment */

-   crash_base = memblock_phys_alloc_range(crash_size, SZ_2M,
+   crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
   crash_base, crash_max);
if (!crash_base) {
pr_warn("cannot allocate crashkernel (size:0x%llx)\n",





Re: [PATCH v20 3/5] arm64: kdump: reimplement crashkernel=X

2022-01-26 Thread john . p . donnelly

On 1/24/22 2:47 AM, Zhen Lei wrote:

From: Chen Zhou 

There are the following issues in arm64 kdump:
1. We use crashkernel=X to reserve crashkernel below 4G, which
will fail when there is not enough low memory.
2. If crashkernel is reserved above 4G, the crash dump
kernel will fail to boot because there is no low memory available
for allocation.

To solve these issues, change the behavior of crashkernel=X and
introduce crashkernel=X,[high,low]. crashkernel=X tries low allocation
in the DMA zone, and falls back to high allocation if that fails.
We can also use "crashkernel=X,high" to select a region above the DMA zone,
which also tries to allocate at least 256M in the DMA zone automatically.
"crashkernel=Y,low" can be used to allocate low memory of a specified size.

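The policy in the commit message can be modelled roughly as follows. This is a userspace sketch, not the patch's code: `free_range`, `model_alloc()` and `reserve_crash_model()` are hypothetical names, the allocator is a toy top-down allocator over a sorted array, and reserving a separate low chunk whenever the main region lands at or above the DMA limit is an assumption made for the model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_ALIGN		(2ULL << 20)	/* 2 MiB alignment */
#define MODEL_DEFAULT_LOW	(256ULL << 20)	/* default low size for ",high" */

struct free_range { uint64_t base, size; };	/* sorted by ascending base */
struct crash_result { uint64_t base, low_base; };
enum crash_mode { CRASH_PLAIN, CRASH_HIGH };

/*
 * Toy top-down allocator limited to addresses below @limit.
 * Returns the allocated base, or 0 on failure. For simplicity, the part of
 * the range above the allocation is dropped from the model.
 */
static uint64_t model_alloc(struct free_range *fr, size_t n,
			    uint64_t size, uint64_t limit)
{
	for (size_t i = n; i-- > 0; ) {
		uint64_t end = fr[i].base + fr[i].size;
		uint64_t base;

		if (end > limit)
			end = limit;
		if (end < size)
			continue;
		base = (end - size) & ~(MODEL_ALIGN - 1);	/* align down */
		if (base < fr[i].base || base == 0)
			continue;
		fr[i].size = base - fr[i].base;
		return base;
	}
	return 0;
}

/*
 * CRASH_PLAIN models crashkernel=X: try below @dma_limit, then anywhere.
 * CRASH_HIGH models crashkernel=X,high: allocate from the top directly.
 * @low_size models crashkernel=Y,low (0 disables the low chunk).
 */
static int reserve_crash_model(struct free_range *fr, size_t n,
			       enum crash_mode mode, uint64_t size,
			       uint64_t low_size, uint64_t dma_limit,
			       struct crash_result *out)
{
	out->base = out->low_base = 0;
	if (!size)
		return -1;

	if (mode == CRASH_PLAIN)
		out->base = model_alloc(fr, n, size, dma_limit);
	if (!out->base)
		out->base = model_alloc(fr, n, size, UINT64_MAX);
	if (!out->base)
		return -1;

	/* Assumption: a high region needs a separate low chunk for DMA. */
	if (out->base >= dma_limit && low_size) {
		out->low_base = model_alloc(fr, n, low_size, dma_limit);
		if (!out->low_base)
			return -1;	/* real code would release the main region */
	}
	return 0;
}
```

With 1 GiB free below 4G and 4 GiB free above it, a 512M request in plain mode stays low, while a 2G request cannot fit low and falls back to the high range with an additional low chunk.
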
Signed-off-by: Chen Zhou 
Co-developed-by: Zhen Lei 
Signed-off-by: Zhen Lei 



Acked-by: John Donnelly  


---
  arch/arm64/kernel/machine_kexec.c  |  9 +++-
  arch/arm64/kernel/machine_kexec_file.c | 12 -
  arch/arm64/mm/init.c   | 68 --
  3 files changed, 81 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/kernel/machine_kexec.c 
b/arch/arm64/kernel/machine_kexec.c
index e16b248699d5c3c..19c2d487cb08feb 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -329,8 +329,13 @@ bool crash_is_nosave(unsigned long pfn)
  
  	/* in reserved memory? */

addr = __pfn_to_phys(pfn);
-   if ((addr < crashk_res.start) || (crashk_res.end < addr))
-   return false;
+   if ((addr < crashk_res.start) || (crashk_res.end < addr)) {
+   if (!crashk_low_res.end)
+   return false;
+
+   if ((addr < crashk_low_res.start) || (crashk_low_res.end < 
addr))
+   return false;
+   }
  
  	if (!kexec_crash_image)

return true;
diff --git a/arch/arm64/kernel/machine_kexec_file.c 
b/arch/arm64/kernel/machine_kexec_file.c
index 59c648d51848886..889951291cc0f9c 100644
--- a/arch/arm64/kernel/machine_kexec_file.c
+++ b/arch/arm64/kernel/machine_kexec_file.c
@@ -65,10 +65,18 @@ static int prepare_elf_headers(void **addr, unsigned long 
*sz)
  
  	/* Exclude crashkernel region */

ret = crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end);
+   if (ret)
+   goto out;
+
+   if (crashk_low_res.end) {
+   ret = crash_exclude_mem_range(cmem, crashk_low_res.start, 
crashk_low_res.end);
+   if (ret)
+   goto out;
+   }
  
-	if (!ret)

-   ret =  crash_prepare_elf64_headers(cmem, true, addr, sz);
+   ret = crash_prepare_elf64_headers(cmem, true, addr, sz);
  
+out:

kfree(cmem);
return ret;
  }
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 6c653a2c7cff052..a5d43feac0d7d96 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -71,6 +71,30 @@ phys_addr_t arm64_dma_phys_limit __ro_after_init;
  #define CRASH_ADDR_LOW_MAX    arm64_dma_phys_limit
  #define CRASH_ADDR_HIGH_MAX   MEMBLOCK_ALLOC_ACCESSIBLE
  
+static int __init reserve_crashkernel_low(unsigned long long low_size)

+{
+   unsigned long long low_base;
+
+   /* passed with crashkernel=0,low ? */
+   if (!low_size)
+   return 0;
+
+   low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, 
CRASH_ADDR_LOW_MAX);
+   if (!low_base) {
+   pr_err("cannot allocate crashkernel low memory 
(size:0x%llx).\n", low_size);
+   return -ENOMEM;
+   }
+
+   pr_info("crashkernel low memory reserved: 0x%llx - 0x%llx (%lld MB)\n",
+   low_base, low_base + low_size, low_size >> 20);
+
+   crashk_low_res.start = low_base;
+   crashk_low_res.end   = low_base + low_size - 1;
+   insert_resource(&iomem_resource, &crashk_low_res);
+
+   return 0;
+}
+
  /*
   * reserve_crashkernel() - reserves memory for crash kernel
   *
@@ -81,29 +105,62 @@ phys_addr_t arm64_dma_phys_limit __ro_after_init;
  static void __init reserve_crashkernel(void)
  {
unsigned long long crash_base, crash_size;
+   unsigned long long crash_low_size = SZ_256M;
unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
int ret;
+   bool fixed_base;
+   char *cmdline = boot_command_line;
  
-	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),

+   /* crashkernel=X[@offset] */
+   ret = parse_crashkernel(cmdline, memblock_phys_mem_size(),
&crash_size, &crash_base);
-   /* no crashkernel= or invalid value specified */
-   if (ret || !crash_size)
-   return;
+   if (ret || !crash_size) {
+   unsigned long long low_size;
  
+		/* crashkernel=X,high */

+   ret = parse_crashkernel_high(cmdline, 0, &crash_size, 
&crash_base);
+   if (ret || !crash_size)
+   return;
+
+   /* crashkernel=X,lo

Re: [PATCH v20 4/5] of: fdt: Add memory for devices by DT property "linux,usable-memory-range"

2022-01-26 Thread john . p . donnelly

On 1/24/22 2:47 AM, Zhen Lei wrote:

From: Chen Zhou 

When reserving crashkernel in high memory, some low memory is reserved
for crash dump kernel devices and never mapped by the first kernel.
This memory range is advertised to crash dump kernel via DT property
under /chosen,
 linux,usable-memory-range = <BASE1 SIZE1 [BASE2 SIZE2]>

We reuse the DT property linux,usable-memory-range and make the low
memory region the second range "BASE2 SIZE2", which keeps compatibility
with existing user-space and older kdump kernels.

The crash dump kernel reads this property at boot time and calls
memblock_add() to add the low memory region after
memblock_cap_memory_range() has been called.
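
As a rough illustration of decoding such a property, the following userspace sketch reads big-endian 32-bit cells into at most two (base, size) pairs. It is a model only: `range_model`, `next_cell_model()` and `parse_usable_ranges_model()` are hypothetical names, and the length check here is done in bytes, which is a choice made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_USABLE_RANGES 2

struct range_model { uint64_t base, size; };

/* Read @cells big-endian 32-bit cells from *@p as one number, advance *@p. */
static uint64_t next_cell_model(int cells, const uint8_t **p)
{
	uint64_t v = 0;

	while (cells-- > 0) {
		v = (v << 32) |
		    ((uint64_t)(*p)[0] << 24) | ((uint64_t)(*p)[1] << 16) |
		    ((uint64_t)(*p)[2] << 8) | (uint64_t)(*p)[3];
		*p += 4;
	}
	return v;
}

/*
 * Decode up to MODEL_MAX_USABLE_RANGES (base, size) pairs from a raw
 * property of @len bytes. Returns the number of ranges decoded, or -1 if
 * @len is zero or not a whole number of pairs.
 */
static int parse_usable_ranges_model(const uint8_t *prop, size_t len,
				     int addr_cells, int size_cells,
				     struct range_model *out)
{
	size_t pair_bytes = (size_t)(addr_cells + size_cells) * 4;
	const uint8_t *end = prop + len;
	int n = 0;

	assert(addr_cells >= 1 && addr_cells <= 2);
	assert(size_cells >= 1 && size_cells <= 2);
	if (!len || len % pair_bytes)
		return -1;

	while (n < MODEL_MAX_USABLE_RANGES && prop < end) {
		out[n].base = next_cell_model(addr_cells, &prop);
		out[n].size = next_cell_model(size_cells, &prop);
		n++;
	}
	return n;
}
```

Following the description above, a consumer would treat the first decoded range as the cap on usable memory and add any further range back afterwards, so an older kernel that only reads the first pair still sees a valid single range.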

Signed-off-by: Chen Zhou 
Co-developed-by: Zhen Lei 
Signed-off-by: Zhen Lei 
Reviewed-by: Rob Herring 
Tested-by: Dave Kleikamp 


Acked-by: John Donnelly  


---
  drivers/of/fdt.c | 33 +++--
  1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index ad85ff6474ff139..df4b9d2418a13d4 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -973,16 +973,24 @@ static void __init 
early_init_dt_check_for_elfcorehdr(unsigned long node)
  
  static unsigned long chosen_node_offset = -FDT_ERR_NOTFOUND;
  
+/*

+ * The main usage of linux,usable-memory-range is for crash dump kernel.
+ * Originally, the number of usable-memory regions is one. Now there may
+ * be two regions, low region and high region.
+ * To make compatibility with existing user-space and older kdump, the low
+ * region is always the last range of linux,usable-memory-range if exist.
+ */
+#define MAX_USABLE_RANGES  2
+
  /**
   * early_init_dt_check_for_usable_mem_range - Decode usable memory range
   * location from flat tree
   */
  void __init early_init_dt_check_for_usable_mem_range(void)
  {
-   const __be32 *prop;
-   int len;
-   phys_addr_t cap_mem_addr;
-   phys_addr_t cap_mem_size;
+   struct memblock_region rgn[MAX_USABLE_RANGES] = {0};
+   const __be32 *prop, *endp;
+   int len, i;
unsigned long node = chosen_node_offset;
  
  	if ((long)node < 0)

@@ -991,16 +999,21 @@ void __init early_init_dt_check_for_usable_mem_range(void)
pr_debug("Looking for usable-memory-range property... ");
  
  	prop = of_get_flat_dt_prop(node, "linux,usable-memory-range", &len);

-   if (!prop || (len < (dt_root_addr_cells + dt_root_size_cells)))
+   if (!prop || (len % (dt_root_addr_cells + dt_root_size_cells)))
return;
  
-	cap_mem_addr = dt_mem_next_cell(dt_root_addr_cells, &prop);

-   cap_mem_size = dt_mem_next_cell(dt_root_size_cells, &prop);
+   endp = prop + (len / sizeof(__be32));
+   for (i = 0; i < MAX_USABLE_RANGES && prop < endp; i++) {
+   rgn[i].base = dt_mem_next_cell(dt_root_addr_cells, &prop);
+   rgn[i].size = dt_mem_next_cell(dt_root_size_cells, &prop);
  
-	pr_debug("cap_mem_start=%pa cap_mem_size=%pa\n", &cap_mem_addr,

-&cap_mem_size);
+   pr_debug("cap_mem_regions[%d]: base=%pa, size=%pa\n",
+i, &rgn[i].base, &rgn[i].size);
+   }
  
-	memblock_cap_memory_range(cap_mem_addr, cap_mem_size);

+   memblock_cap_memory_range(rgn[0].base, rgn[0].size);
+   for (i = 1; i < MAX_USABLE_RANGES && rgn[i].size; i++)
+   memblock_add(rgn[i].base, rgn[i].size);
  }
  
  #ifdef CONFIG_SERIAL_EARLYCON





Re: [PATCH v20 5/5] kdump: update Documentation about crashkernel

2022-01-26 Thread john . p . donnelly

On 1/24/22 2:47 AM, Zhen Lei wrote:

From: Chen Zhou 

For arm64, the behavior of crashkernel=X has been changed: it now
tries low allocation in the DMA zone and falls back to high allocation
if that fails.

We can also use "crashkernel=X,high" to select a high region above the
DMA zone, which also tries to allocate at least 256M of low memory in the
DMA zone automatically, and "crashkernel=Y,low" can be used to allocate
low memory of a specified size.

So update the Documentation.

Signed-off-by: Chen Zhou 
Signed-off-by: Zhen Lei 


Acked-by: John Donnelly  


---
  Documentation/admin-guide/kdump/kdump.rst   | 11 +--
  Documentation/admin-guide/kernel-parameters.txt | 11 +--
  2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/kdump/kdump.rst 
b/Documentation/admin-guide/kdump/kdump.rst
index cb30ca3df27c9b2..d4c287044be0c70 100644
--- a/Documentation/admin-guide/kdump/kdump.rst
+++ b/Documentation/admin-guide/kdump/kdump.rst
@@ -361,8 +361,15 @@ Boot into System Kernel
 kernel will automatically locate the crash kernel image within the
 first 512MB of RAM if X is not given.
  
-   On arm64, use "crashkernel=Y[@X]".  Note that the start address of

-   the kernel, X if explicitly specified, must be aligned to 2MiB (0x200000).
+   On arm64, use "crashkernel=X" to try low allocation in DMA zone and
+   fall back to high allocation if it fails.
+   We can also use "crashkernel=X,high" to select a high region above
+   DMA zone, which also tries to allocate at least 256M low memory in
+   DMA zone automatically.
+   "crashkernel=Y,low" can be used to allocate specified size low memory.
+   Use "crashkernel=Y@X" if you really have to reserve memory from
+   specified start address X. Note that the start address of the kernel,
+   X if explicitly specified, must be aligned to 2MiB (0x200000).
  
  Load the Dump-capture Kernel

  
diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index f5a27f067db9ed9..65780c2ca830be0 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -792,6 +792,9 @@
[KNL, X86-64] Select a region under 4G first, and
fall back to reserve region above 4G when '@offset'
hasn't been specified.
+   [KNL, ARM64] Try low allocation in DMA zone and fall 
back
+   to high allocation if it fails when '@offset' hasn't 
been
+   specified.
See Documentation/admin-guide/kdump/kdump.rst for 
further details.
  
  	crashkernel=range1:size1[,range2:size2,...][@offset]

@@ -808,6 +811,8 @@
Otherwise memory region will be allocated below 4G, if
available.
It will be ignored if crashkernel=X is specified.
+   [KNL, ARM64] range in high memory.
+   Allow kernel to allocate physical memory region from 
top.
crashkernel=size[KMG],low
[KNL, X86-64] range under 4G. When crashkernel=X,high
is passed, kernel could allocate physical memory region
@@ -816,13 +821,15 @@
requires at least 64M+32K low memory, also enough extra
low memory is needed to make sure DMA buffers for 32-bit
devices won't run out. Kernel would try to allocate at
-   at least 256M below 4G automatically.
+   least 256M below 4G automatically.
This one let user to specify own low range under 4G
for second kernel instead.
0: to disable low allocation.
It will be ignored when crashkernel=X,high is not used
or memory reserved is below 4G.
-
+   [KNL, ARM64] range in low memory.
+   This one let user to specify a low range in DMA zone for
+   crash dump kernel.
cryptomgr.notests
[KNL] Disable crypto self-tests
  


