kexec does not work for kernel version with patch level >= 256

2021-03-23 Thread Patrick Sung
Hello all,

I am using the 4.9 long term kernel which is currently at 4.9.262.
When using this kernel with kexec-tools it prints out this error

  Unsupported utsname.release: 4.9.262
  Cannot load 

A quick search in the code shows that kexec/kernel_version.c doing this check:

  if (major >= 256 || minor >= 256 || patch >= 256) {

and also in kexec/kexec.h
  #define KERNEL_VERSION(major, minor, patch) \
(((major) << 16) | ((minor) << 8) | patch)

which explains the reason for the range check in kernel_version.c

Increasing the number of bits allowed in "patch" seems to fix the issue.

Thanks,
Patrick

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH v2 12/12] kdump: Use vmlinux_build_id() to simplify

2021-03-23 Thread Stephen Boyd
We can use the vmlinux_build_id() helper here now instead of open coding
it. This consolidates code and possibly avoids calculating the build ID
twice in the case of a crash with a stacktrace.

Cc: Jiri Olsa 
Cc: Alexei Starovoitov 
Cc: Jessica Yu 
Cc: Evan Green 
Cc: Hsin-Yi Wang 
Cc: Dave Young 
Cc: Baoquan He 
Cc: Vivek Goyal 
Cc: 
Signed-off-by: Stephen Boyd 
---
 include/linux/crash_core.h |  6 +-
 kernel/crash_core.c| 41 ++
 2 files changed, 3 insertions(+), 44 deletions(-)

diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
index 206bde8308b2..fb8ab99bb2ee 100644
--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -39,7 +39,7 @@ phys_addr_t paddr_vmcoreinfo_note(void);
 #define VMCOREINFO_OSRELEASE(value) \
vmcoreinfo_append_str("OSRELEASE=%s\n", value)
 #define VMCOREINFO_BUILD_ID(value) \
-   vmcoreinfo_append_str("BUILD-ID=%s\n", value)
+   vmcoreinfo_append_str("BUILD-ID=%20phN\n", value)
 #define VMCOREINFO_PAGESIZE(value) \
vmcoreinfo_append_str("PAGESIZE=%ld\n", value)
 #define VMCOREINFO_SYMBOL(name) \
@@ -69,10 +69,6 @@ extern unsigned char *vmcoreinfo_data;
 extern size_t vmcoreinfo_size;
 extern u32 *vmcoreinfo_note;
 
-/* raw contents of kernel .notes section */
-extern const void __start_notes __weak;
-extern const void __stop_notes __weak;
-
 Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
  void *data, size_t data_len);
 void final_note(Elf_Word *buf);
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 825284baaf46..0b0e24668697 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -4,6 +4,7 @@
  * Copyright (C) 2002-2004 Eric Biederman  
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -378,51 +379,13 @@ phys_addr_t __weak paddr_vmcoreinfo_note(void)
 }
 EXPORT_SYMBOL(paddr_vmcoreinfo_note);
 
-#define NOTES_SIZE (&__stop_notes - &__start_notes)
-#define BUILD_ID_MAX SHA1_DIGEST_SIZE
-#define NT_GNU_BUILD_ID 3
-
-struct elf_note_section {
-   struct elf_note n_hdr;
-   u8 n_data[];
-};
-
 /*
  * Add build ID from .notes section as generated by the GNU ld(1)
  * or LLVM lld(1) --build-id option.
  */
 static void add_build_id_vmcoreinfo(void)
 {
-   char build_id[BUILD_ID_MAX * 2 + 1];
-   int n_remain = NOTES_SIZE;
-
-   while (n_remain >= sizeof(struct elf_note)) {
-   const struct elf_note_section *note_sec =
-   &__start_notes + NOTES_SIZE - n_remain;
-   const u32 n_namesz = note_sec->n_hdr.n_namesz;
-
-   if (note_sec->n_hdr.n_type == NT_GNU_BUILD_ID &&
-   n_namesz != 0 &&
-   !strcmp((char *)¬e_sec->n_data[0], "GNU")) {
-   if (note_sec->n_hdr.n_descsz <= BUILD_ID_MAX) {
-   const u32 n_descsz = note_sec->n_hdr.n_descsz;
-   const u8 *s = ¬e_sec->n_data[n_namesz];
-
-   s = PTR_ALIGN(s, 4);
-   bin2hex(build_id, s, n_descsz);
-   build_id[2 * n_descsz] = '\0';
-   VMCOREINFO_BUILD_ID(build_id);
-   return;
-   }
-   pr_warn("Build ID is too large to include in 
vmcoreinfo: %u > %u\n",
-   note_sec->n_hdr.n_descsz,
-   BUILD_ID_MAX);
-   return;
-   }
-   n_remain -= sizeof(struct elf_note) +
-   ALIGN(note_sec->n_hdr.n_namesz, 4) +
-   ALIGN(note_sec->n_hdr.n_descsz, 4);
-   }
+   VMCOREINFO_BUILD_ID(vmlinux_build_id());
 }
 
 static int __init crash_save_vmcoreinfo_init(void)
-- 
https://chromeos.dev


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH v2 00/12] Add build ID to stacktraces

2021-03-23 Thread Stephen Boyd
This series adds the kernel's build ID[1] to the stacktrace header printed
in oops messages, warnings, etc. and the build ID for any module that
appears in the stacktrace after the module name. The goal is to make the
stacktrace more self-contained and descriptive by including the relevant
build IDs in the kernel logs when something goes wrong. This can be used
by post processing tools like script/decode_stacktrace.sh and kernel
developers to easily locate the debug info associated with a kernel
crash and line up what line and file things started falling apart at.

To show how this can be used I've included a patch to
decode_stacktrace.sh that downloads the debuginfo from a debuginfod
server.

This also includes some patches to make the buildid.c file use more
const arguments and consolidate logic into buildid.c from kdump. These
are left to the end as they were mostly cleanup patches. I don't know
who exactly maintains this so I guess Andrew is the best option to merge
all this code.

Here's an example lkdtm stacktrace on arm64.

 WARNING: CPU: 4 PID: 3255 at drivers/misc/lkdtm/bugs.c:83 
lkdtm_WARNING+0x28/0x30 [lkdtm]
 Modules linked in: lkdtm rfcomm algif_hash algif_skcipher af_alg xt_cgroup 
uinput xt_MASQUERADE
 CPU: 4 PID: 3255 Comm: bash Not tainted 5.11 #3 
aa23f7a1231c229de205662d5a9e0d4c580f19a1
 Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
 pstate: 0049 (nzcv daif +PAN -UAO -TCO BTYPE=--)
 pc : lkdtm_WARNING+0x28/0x30 [lkdtm]
 lr : lkdtm_do_action+0x24/0x40 [lkdtm]
 sp : ffc0134fbca0
 x29: ffc0134fbca0 x28: ff92d53ba240
 x27:  x26: 
 x25:  x24: ffe3622352c0
 x23: 0020 x22: ffe362233366
 x21: ffe3622352e0 x20: ffc0134fbde0
 x19: 0008 x18: 
 x17: ff929b6536fc x16: 
 x15:  x14: 0012
 x13: ffe380ed892c x12: ffe381d05068
 x11:  x10: 
 x9 : 0001 x8 : ffe362237000
 x7 :  x6 : 
 x5 :  x4 : 0001
 x3 : 0008 x2 : ff93fef25a70
 x1 : ff93fef15788 x0 : ffe3622352e0
 Call trace:
  lkdtm_WARNING+0x28/0x30 [lkdtm ed5019fdf5e53be37cb1ba7899292d7e143b259e]
  direct_entry+0x16c/0x1b4 [lkdtm ed5019fdf5e53be37cb1ba7899292d7e143b259e]
  full_proxy_write+0x74/0xa4
  vfs_write+0xec/0x2e8
  ksys_write+0x84/0xf0
  __arm64_sys_write+0x24/0x30
  el0_svc_common+0xf4/0x1c0
  do_el0_svc_compat+0x28/0x3c
  el0_svc_compat+0x10/0x1c
  el0_sync_compat_handler+0xa8/0xcc
  el0_sync_compat+0x178/0x180
 ---[ end trace 3d95032303e59e68 ]---

Changes from v1 
(https://lore.kernel.org/r/20210301174749.1269154-1-swb...@chromium.org):
 * New printk format %pSb and %pSr
 * Return binary format instead of hex format string from build ID APIs
 * Some new patches to cleanup buildid/decode_stacktrace.sh
 * A new patch to decode_stacktrace.sh to parse output

[1] https://fedoraproject.org/wiki/Releases/FeatureBuildId

Cc: Alexei Starovoitov 
Cc: Andy Shevchenko 
Cc: Baoquan He 
Cc: Borislav Petkov 
Cc: Catalin Marinas 
Cc: Dave Young 
Cc: Evan Green 
Cc: Hsin-Yi Wang 
Cc: Ingo Molnar 
Cc: Jessica Yu 
Cc: Jiri Olsa 
Cc: 
Cc: Konstantin Khlebnikov 
Cc: 
Cc: 
Cc: 
Cc: Matthew Wilcox 
Cc: Petr Mladek 
Cc: Rasmus Villemoes 
Cc: Sasha Levin 
Cc: Sergey Senozhatsky 
Cc: Steven Rostedt 
Cc: Thomas Gleixner 
Cc: Vivek Goyal 
Cc: Will Deacon 
Cc: 

Stephen Boyd (12):
  buildid: Add API to parse build ID out of buffer
  buildid: Add method to get running kernel's build ID
  dump_stack: Add vmlinux build ID to stack traces
  module: Add printk format to add module build ID to stacktraces
  arm64: stacktrace: Use %pSb for backtrace printing
  x86/dumpstack: Use %pSb for backtrace printing
  scripts/decode_stacktrace.sh: Support debuginfod
  scripts/decode_stacktrace.sh: Silence stderr messages from
addr2line/nm
  scripts/decode_stacktrace.sh: Indicate 'auto' can be used for base
path
  buildid: Mark some arguments const
  buildid: Fix kernel-doc notation
  kdump: Use vmlinux_build_id() to simplify

 Documentation/core-api/printk-formats.rst |  9 +++
 arch/arm64/kernel/stacktrace.c|  2 +-
 arch/x86/kernel/dumpstack.c   |  4 +-
 include/linux/buildid.h   |  3 +
 include/linux/crash_core.h|  6 +-
 include/linux/kallsyms.h  | 13 +++-
 include/linux/module.h|  6 +-
 kernel/crash_core.c   | 41 +--
 kernel/kallsyms.c | 73 ++-
 kernel/module.c   | 24 +-
 lib/buildid.c | 75 +++
 lib/dump_stack.c  |  5 +-
 lib/vsprintf.c|  6 +-
 scripts/decode_stacktrace.sh  | 89 +++
 14 files changed, 251 insertions(+), 105 deletions(-)


base-commi

[PATCH v2 02/12] buildid: Add method to get running kernel's build ID

2021-03-23 Thread Stephen Boyd
Add vmlinux_build_id() so that callers can print a hex format string
representation of the running kernel's build ID. This will be used in
the kdump and dump_stack code so that developers can easily locate the
vmlinux debug symbols for a crash/stacktrace.

Cc: Jiri Olsa 
Cc: Alexei Starovoitov 
Cc: Jessica Yu 
Cc: Evan Green 
Cc: Hsin-Yi Wang 
Cc: Dave Young 
Cc: Baoquan He 
Cc: Vivek Goyal 
Cc: 
Signed-off-by: Stephen Boyd 
---
 include/linux/buildid.h |  2 ++
 lib/buildid.c   | 19 +++
 2 files changed, 21 insertions(+)

diff --git a/include/linux/buildid.h b/include/linux/buildid.h
index ebce93f26d06..2ff6b1b7cc9b 100644
--- a/include/linux/buildid.h
+++ b/include/linux/buildid.h
@@ -10,4 +10,6 @@ int build_id_parse(struct vm_area_struct *vma, unsigned char 
*build_id,
   __u32 *size);
 int build_id_parse_buf(const void *buf, unsigned char *build_id, u32 buf_size);
 
+const unsigned char *vmlinux_build_id(void);
+
 #endif
diff --git a/lib/buildid.c b/lib/buildid.c
index 010ab0674cb9..fa1b6466b4b8 100644
--- a/lib/buildid.c
+++ b/lib/buildid.c
@@ -4,6 +4,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define BUILD_ID 3
 
@@ -171,3 +172,21 @@ int build_id_parse_buf(const void *buf, unsigned char 
*build_id, u32 buf_size)
 {
return parse_build_id_buf(build_id, NULL, buf, buf_size);
 }
+
+/**
+ * vmlinux_build_id - Get the running kernel's build ID
+ *
+ * Return: Running kernel's build ID
+ */
+const unsigned char *vmlinux_build_id(void)
+{
+   extern const void __start_notes __weak;
+   extern const void __stop_notes __weak;
+   unsigned int size = &__stop_notes - &__start_notes;
+   static unsigned char vmlinux_build_id[BUILD_ID_SIZE_MAX];
+
+   if (!memchr_inv(vmlinux_build_id, 0, BUILD_ID_SIZE_MAX))
+   build_id_parse_buf(&__start_notes, vmlinux_build_id, size);
+
+   return vmlinux_build_id;
+}
-- 
https://chromeos.dev


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH] kexec-xen: Use correct image type for Live Update

2021-03-23 Thread Raphael Ning
From: Raphael Ning 

Unlike xen_kexec_load(), xen_kexec_unload() and xen_kexec_status()
fail to distinguish between normal kexec and Xen Live Update image
types.

Fix that by introducing a new helper function that maps internal
flags to KEXEC_TYPE_*, and using it throughout kexec-xen.c.

Signed-off-by: Raphael Ning 
---
 kexec/kexec-xen.c | 23 ++-
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/kexec/kexec-xen.c b/kexec/kexec-xen.c
index da514d052e3d..47da3da466f0 100644
--- a/kexec/kexec-xen.c
+++ b/kexec/kexec-xen.c
@@ -91,6 +91,17 @@ out:
return rc;
 }
 
+static uint8_t xen_get_kexec_type(unsigned long kexec_flags)
+{
+   if (kexec_flags & KEXEC_ON_CRASH)
+   return KEXEC_TYPE_CRASH;
+
+   if (kexec_flags & KEXEC_LIVE_UPDATE)
+   return KEXEC_TYPE_LIVE_UPDATE;
+
+   return KEXEC_TYPE_DEFAULT;
+}
+
 #define IDENTMAP_1MiB (1024 * 1024)
 
 int xen_kexec_load(struct kexec_info *info)
@@ -177,12 +188,7 @@ int xen_kexec_load(struct kexec_info *info)
seg++;
}
 
-   if (info->kexec_flags & KEXEC_ON_CRASH)
-   type = KEXEC_TYPE_CRASH;
-   else if (info->kexec_flags & KEXEC_LIVE_UPDATE )
-   type = KEXEC_TYPE_LIVE_UPDATE;
-   else
-   type = KEXEC_TYPE_DEFAULT;
+   type = xen_get_kexec_type(info->kexec_flags);
 
arch = (info->kexec_flags & KEXEC_ARCH_MASK) >> 16;
 #if defined(__i386__) || defined(__x86_64__)
@@ -211,8 +217,7 @@ int xen_kexec_unload(uint64_t kexec_flags)
if (!xch)
return -1;
 
-   type = (kexec_flags & KEXEC_ON_CRASH) ? KEXEC_TYPE_CRASH
-   : KEXEC_TYPE_DEFAULT;
+   type = xen_get_kexec_type(kexec_flags);
 
ret = xc_kexec_unload(xch, type);
 
@@ -232,7 +237,7 @@ int xen_kexec_status(uint64_t kexec_flags)
if (!xch)
return -1;
 
-   type = (kexec_flags & KEXEC_ON_CRASH) ? KEXEC_TYPE_CRASH : 
KEXEC_TYPE_DEFAULT;
+   type = xen_get_kexec_type(kexec_flags);
 
ret = xc_kexec_status(xch, type);
 
-- 
2.23.3


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH] kexec: Make --status work with normal kexec images

2021-03-23 Thread Raphael Ning
From: Raphael Ning 

According to kexec(8) manpage, --status (-S) works with both
normal kexec (loaded by -l) and crash kernel (loaded by -p) image
types, and defaults to the latter. However, the implementation does
not match the description: `kexec -l -S` queries the -p image type
as if -l were not specified. This is because there is no internal
flag defined for the normal kexec type, and -S treats the zero flag
as the trigger for the default behaviour (-p).

Fix that by making sure the default behaviour for -S is not applied
when the -l option is present.

Signed-off-by: Raphael Ning 
---
 kexec/kexec.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kexec/kexec.c b/kexec/kexec.c
index ffc689ac3d7f..0c4b9619bfd7 100644
--- a/kexec/kexec.c
+++ b/kexec/kexec.c
@@ -1336,6 +1336,7 @@ static void print_crashkernel_region_size(void)
 
 int main(int argc, char *argv[])
 {
+   int has_opt_load = 0;
int do_load = 1;
int do_exec = 0;
int do_load_jump_back_helper = 0;
@@ -1393,6 +1394,7 @@ int main(int argc, char *argv[])
do_exec = 1;
break;
case OPT_LOAD:
+   has_opt_load = 1;
do_load = 1;
do_exec = 0;
do_shutdown = 0;
@@ -1512,7 +1514,7 @@ int main(int argc, char *argv[])
do_sync = 0;
 
if (do_status) {
-   if (kexec_flags == 0)
+   if (kexec_flags == 0 && !has_opt_load)
kexec_flags = KEXEC_ON_CRASH;
do_load = 0;
do_reuse_initrd = 0;
-- 
2.23.3


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH] kexec: Fix description of --status exit code

2021-03-23 Thread Raphael Ning
From: Raphael Ning 

On both Linux and Xen, an exit code of 0 from `kexec --status`
indicates that the kexec image being queried is NOT loaded, which
is contrary to what the man page and usage() say.

Signed-off-by: Raphael Ning 
---
 kexec/kexec.8 | 6 +++---
 kexec/kexec.c | 3 ++-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/kexec/kexec.8 b/kexec/kexec.8
index 258072587cd0..3ebede67388e 100644
--- a/kexec/kexec.8
+++ b/kexec/kexec.8
@@ -108,9 +108,9 @@ command:
 Enable debugging messages.
 .TP
 .B \-S\ (\-\-status)
-Return 0 if the type (by default crash) is loaded. Can be used in conjuction
-with -l or -p to toggle the type. Note this option supersedes other options
-and it will
+Return 1 if the type (by default crash) is loaded, 0 if not. Can be used in
+conjuction with -l or -p to toggle the type. Note this option supersedes other
+options and it will
 .BR not\ load\ or\ unload\ the\ kernel.
 .TP
 .B \-e\ (\-\-exec)
diff --git a/kexec/kexec.c b/kexec/kexec.c
index fd7c8d2b7a79..ffc689ac3d7f 100644
--- a/kexec/kexec.c
+++ b/kexec/kexec.c
@@ -1040,7 +1040,8 @@ void usage(void)
   "  syscall is not supported or the kernel 
did not\n"
   "  understand the image\n"
   " -d, --debug  Enable debugging to help spot a 
failure.\n"
-  " -S, --status Return 0 if the type (by default crash) 
is loaded.\n"
+  " -S, --status Return 1 if the type (by default crash) 
is loaded,\n"
+  "  0 if not.\n"
   "\n"
   "Supported kernel file types and options: \n");
for (i = 0; i < file_types; i++) {
-- 
2.23.3


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v1 1/3] kernel/resource: make walk_system_ram_res() find all busy IORESOURCE_SYSTEM_RAM resources

2021-03-23 Thread Baoquan He
On 03/22/21 at 05:01pm, David Hildenbrand wrote:
> It used to be true that we can have busy system RAM only on the first level
> in the resourc tree. However, this is no longer holds for driver-managed
> system RAM (i.e., added via dax/kmem and virtio-mem), which gets added on
> lower levels.
> 
> We have two users of walk_system_ram_res(), which currently only
> consideres the first level:
> a) kernel/kexec_file.c:kexec_walk_resources() -- We properly skip
>IORESOURCE_SYSRAM_DRIVER_MANAGED resources via
>locate_mem_hole_callback(), so even after this change, we won't be
>placing kexec images onto dax/kmem and virtio-mem added memory. No
>change.
> b) arch/x86/kernel/crash.c:fill_up_crash_elf_data() -- we're currently
>not adding relevant ranges to the crash elf info, resulting in them
>not getting dumped via kdump.
> 
> This change fixes loading a crashkernel via kexec_file_load() and including
> dax/kmem and virtio-mem added System RAM in the crashdump on x86-64. Note
> that e.g,, arm64 relies on memblock data and, therefore, always considers
> all added System RAM already.
> 
> Let's find all busy IORESOURCE_SYSTEM_RAM resources, making the function
> behave like walk_system_ram_range().
> 
> Cc: Andrew Morton 
> Cc: Greg Kroah-Hartman 
> Cc: Dan Williams 
> Cc: Daniel Vetter 
> Cc: Andy Shevchenko 
> Cc: Mauro Carvalho Chehab 
> Cc: Signed-off-by: David Hildenbrand 
> Cc: Dave Young 
> Cc: Baoquan He 
> Cc: Vivek Goyal 
> Cc: Dave Hansen 
> Cc: Keith Busch 
> Cc: Michal Hocko 
> Cc: Qian Cai 
> Cc: Oscar Salvador 
> Cc: Eric Biederman 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Borislav Petkov 
> Cc: "H. Peter Anvin" 
> Cc: Tom Lendacky 
> Cc: Brijesh Singh 
> Cc: x...@kernel.org
> Cc: kexec@lists.infradead.org
> Signed-off-by: David Hildenbrand 
> ---
>  kernel/resource.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/resource.c b/kernel/resource.c
> index 627e61b0c124..4efd6e912279 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -457,7 +457,7 @@ int walk_system_ram_res(u64 start, u64 end, void *arg,
>  {
>   unsigned long flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
>  
> - return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, true,
> + return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, false,
>arg, func);

Thanks, David, this is a good fix.

Acked-by: Baoquan He 

>  }
>  
> -- 
> 2.29.2
> 


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v1 1/3] kernel/resource: make walk_system_ram_res() find all busy IORESOURCE_SYSTEM_RAM resources

2021-03-23 Thread David Hildenbrand

On 23.03.21 12:06, Andy Shevchenko wrote:

On Mon, Mar 22, 2021 at 05:01:58PM +0100, David Hildenbrand wrote:

It used to be true that we can have busy system RAM only on the first level
in the resourc tree. However, this is no longer holds for driver-managed
system RAM (i.e., added via dax/kmem and virtio-mem), which gets added on
lower levels.

We have two users of walk_system_ram_res(), which currently only
consideres the first level:
a) kernel/kexec_file.c:kexec_walk_resources() -- We properly skip
IORESOURCE_SYSRAM_DRIVER_MANAGED resources via
locate_mem_hole_callback(), so even after this change, we won't be
placing kexec images onto dax/kmem and virtio-mem added memory. No
change.
b) arch/x86/kernel/crash.c:fill_up_crash_elf_data() -- we're currently
not adding relevant ranges to the crash elf info, resulting in them
not getting dumped via kdump.

This change fixes loading a crashkernel via kexec_file_load() and including


"...fixes..." effectively means to me that Fixes tag should be provided.


We can certainly add, although it doesn't really affect the running 
kernel, but only crashdumps taken in the kdump kernel:


Fixes: ebf71552bb0e ("virtio-mem: Add parent resource for all added 
"System RAM"")
Fixes: c221c0b0308f ("device-dax: "Hotplug" persistent memory for use 
like normal RAM")


Thanks

--
Thanks,

David / dhildenb


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v1 2/3] kernel/resource: make walk_mem_res() find all busy IORESOURCE_MEM resources

2021-03-23 Thread David Hildenbrand

On 23.03.21 12:08, Andy Shevchenko wrote:

On Mon, Mar 22, 2021 at 05:01:59PM +0100, David Hildenbrand wrote:

It used to be true that we can have system RAM only on the first level
in the resourc tree. However, this is no longer holds for driver-managed
system RAM (i.e., dax/kmem and virtio-mem).

The function walk_mem_res() only consideres the first level and is
used in arch/x86/mm/ioremap.c:__ioremap_check_mem() only. We currently
fail to identify System RAM added by dax/kmem and virtio-mem as
"IORES_MAP_SYSTEM_RAM", for example, allowing for remapping of such
"normal RAM" in __ioremap_caller().


Here I dunno, but consider to add Fixes tag if it fixes known bad behaviour.


Haven't observed it in the wild. We can add the same fixes tags as to 
patch #1.


--
Thanks,

David / dhildenb


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v1 3/3] kernel/resource: remove first_lvl / siblings_only logic

2021-03-23 Thread David Hildenbrand

On 23.03.21 12:11, Andy Shevchenko wrote:

On Mon, Mar 22, 2021 at 05:02:00PM +0100, David Hildenbrand wrote:

All IORESOURCE_SYSTEM_RAM and IORESOURCE_MEM now properly consider the
whole resource tree, not just the first level. Let's drop the unused
first_lvl / siblings_only logic.

All functions properly search the whole tree, so remove documentation
that indicates that some functions behave differently.



Like this clean up!
Reviewed-by: Andy Shevchenko 

Although a few nit-picks below.


Cc: Andrew Morton 
Cc: Greg Kroah-Hartman 
Cc: Dan Williams 
Cc: Daniel Vetter 
Cc: Andy Shevchenko 
Cc: Mauro Carvalho Chehab 
Cc: Signed-off-by: David Hildenbrand 
Cc: Dave Young 
Cc: Baoquan He 
Cc: Vivek Goyal 
Cc: Dave Hansen 
Cc: Keith Busch 
Cc: Michal Hocko 
Cc: Qian Cai 
Cc: Oscar Salvador 
Cc: Eric Biederman 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: "H. Peter Anvin" 
Cc: Tom Lendacky 
Cc: Brijesh Singh 
Cc: x...@kernel.org
Cc: kexec@lists.infradead.org
Signed-off-by: David Hildenbrand 
---
  kernel/resource.c | 45 -
  1 file changed, 12 insertions(+), 33 deletions(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index 16e0c7e8ed24..7e00239a023a 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -64,12 +64,8 @@ static DEFINE_RWLOCK(resource_lock);
  static struct resource *bootmem_resource_free;
  static DEFINE_SPINLOCK(bootmem_resource_lock);
  
-static struct resource *next_resource(struct resource *p, bool sibling_only)

+static struct resource *next_resource(struct resource *p)
  {
-   /* Caller wants to traverse through siblings only */
-   if (sibling_only)
-   return p->sibling;
-
if (p->child)
return p->child;
while (!p->sibling && p->parent)
@@ -81,7 +77,7 @@ static void *r_next(struct seq_file *m, void *v, loff_t *pos)
  {
struct resource *p = v;
(*pos)++;
-   return (void *)next_resource(p, false);
+   return (void *)next_resource(p);
  }
  
  #ifdef CONFIG_PROC_FS

@@ -330,14 +326,10 @@ EXPORT_SYMBOL(release_resource);
   * of the resource that's within [@start..@end]; if none is found, returns
   * -ENODEV.  Returns -EINVAL for invalid parameters.
   *
- * This function walks the whole tree and not just first level children
- * unless @first_lvl is true.
- *
   * @start:start address of the resource searched for
   * @end:  end address of same resource
   * @flags:flags which the resource must have
   * @desc: descriptor the resource must have
- * @first_lvl: walk only the first level children, if set
   * @res:  return ptr, if resource found
   *
   * The caller must specify @start, @end, @flags, and @desc
@@ -345,9 +337,8 @@ EXPORT_SYMBOL(release_resource);
   */
  static int find_next_iomem_res(resource_size_t start, resource_size_t end,
   unsigned long flags, unsigned long desc,
-  bool first_lvl, struct resource *res)
+  struct resource *res)
  {
-   bool siblings_only = true;
struct resource *p;
  
  	if (!res)

@@ -358,7 +349,7 @@ static int find_next_iomem_res(resource_size_t start, 
resource_size_t end,
  
  	read_lock(&resource_lock);
  
-	for (p = iomem_resource.child; p; p = next_resource(p, siblings_only)) {

+   for (p = iomem_resource.child; p; p = next_resource(p)) {
/* If we passed the resource we are looking for, stop */
if (p->start > end) {
p = NULL;
@@ -369,13 +360,6 @@ static int find_next_iomem_res(resource_size_t start, 
resource_size_t end,
if (p->end < start)
continue;
  
-		/*

-* Now that we found a range that matches what we look for,
-* check the flags and the descriptor. If we were not asked to
-* use only the first level, start looking at children as well.
-*/
-   siblings_only = first_lvl;
-
if ((p->flags & flags) != flags)
continue;
if ((desc != IORES_DESC_NONE) && (desc != p->desc))
@@ -402,14 +386,14 @@ static int find_next_iomem_res(resource_size_t start, 
resource_size_t end,
  
  static int __walk_iomem_res_desc(resource_size_t start, resource_size_t end,

 unsigned long flags, unsigned long desc,
-bool first_lvl, void *arg,



+void *arg,
 int (*func)(struct resource *, void *))


Can it be one line?


  {
struct resource res;
int ret = -EINVAL;
  
  	while (start < end &&

-  !find_next_iomem_res(start, end, flags, desc, first_lvl, &res)) {
+  !find_next_iomem_res(start, end, flags, desc, &res)) {
ret = (*func)(&res, arg);
if (ret)
break;
@@

Re: [PATCH v1 3/3] kernel/resource: remove first_lvl / siblings_only logic

2021-03-23 Thread Andy Shevchenko
On Mon, Mar 22, 2021 at 05:02:00PM +0100, David Hildenbrand wrote:
> All IORESOURCE_SYSTEM_RAM and IORESOURCE_MEM now properly consider the
> whole resource tree, not just the first level. Let's drop the unused
> first_lvl / siblings_only logic.
> 
> All functions properly search the whole tree, so remove documentation
> that indicates that some functions behave differently.


Like this clean up!
Reviewed-by: Andy Shevchenko 

Although a few nit-picks below.

> Cc: Andrew Morton 
> Cc: Greg Kroah-Hartman 
> Cc: Dan Williams 
> Cc: Daniel Vetter 
> Cc: Andy Shevchenko 
> Cc: Mauro Carvalho Chehab 
> Cc: Signed-off-by: David Hildenbrand 
> Cc: Dave Young 
> Cc: Baoquan He 
> Cc: Vivek Goyal 
> Cc: Dave Hansen 
> Cc: Keith Busch 
> Cc: Michal Hocko 
> Cc: Qian Cai 
> Cc: Oscar Salvador 
> Cc: Eric Biederman 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Borislav Petkov 
> Cc: "H. Peter Anvin" 
> Cc: Tom Lendacky 
> Cc: Brijesh Singh 
> Cc: x...@kernel.org
> Cc: kexec@lists.infradead.org
> Signed-off-by: David Hildenbrand 
> ---
>  kernel/resource.c | 45 -
>  1 file changed, 12 insertions(+), 33 deletions(-)
> 
> diff --git a/kernel/resource.c b/kernel/resource.c
> index 16e0c7e8ed24..7e00239a023a 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -64,12 +64,8 @@ static DEFINE_RWLOCK(resource_lock);
>  static struct resource *bootmem_resource_free;
>  static DEFINE_SPINLOCK(bootmem_resource_lock);
>  
> -static struct resource *next_resource(struct resource *p, bool sibling_only)
> +static struct resource *next_resource(struct resource *p)
>  {
> - /* Caller wants to traverse through siblings only */
> - if (sibling_only)
> - return p->sibling;
> -
>   if (p->child)
>   return p->child;
>   while (!p->sibling && p->parent)
> @@ -81,7 +77,7 @@ static void *r_next(struct seq_file *m, void *v, loff_t 
> *pos)
>  {
>   struct resource *p = v;
>   (*pos)++;
> - return (void *)next_resource(p, false);
> + return (void *)next_resource(p);
>  }
>  
>  #ifdef CONFIG_PROC_FS
> @@ -330,14 +326,10 @@ EXPORT_SYMBOL(release_resource);
>   * of the resource that's within [@start..@end]; if none is found, returns
>   * -ENODEV.  Returns -EINVAL for invalid parameters.
>   *
> - * This function walks the whole tree and not just first level children
> - * unless @first_lvl is true.
> - *
>   * @start:   start address of the resource searched for
>   * @end: end address of same resource
>   * @flags:   flags which the resource must have
>   * @desc:descriptor the resource must have
> - * @first_lvl:   walk only the first level children, if set
>   * @res: return ptr, if resource found
>   *
>   * The caller must specify @start, @end, @flags, and @desc
> @@ -345,9 +337,8 @@ EXPORT_SYMBOL(release_resource);
>   */
>  static int find_next_iomem_res(resource_size_t start, resource_size_t end,
>  unsigned long flags, unsigned long desc,
> -bool first_lvl, struct resource *res)
> +struct resource *res)
>  {
> - bool siblings_only = true;
>   struct resource *p;
>  
>   if (!res)
> @@ -358,7 +349,7 @@ static int find_next_iomem_res(resource_size_t start, 
> resource_size_t end,
>  
>   read_lock(&resource_lock);
>  
> - for (p = iomem_resource.child; p; p = next_resource(p, siblings_only)) {
> + for (p = iomem_resource.child; p; p = next_resource(p)) {
>   /* If we passed the resource we are looking for, stop */
>   if (p->start > end) {
>   p = NULL;
> @@ -369,13 +360,6 @@ static int find_next_iomem_res(resource_size_t start, 
> resource_size_t end,
>   if (p->end < start)
>   continue;
>  
> - /*
> -  * Now that we found a range that matches what we look for,
> -  * check the flags and the descriptor. If we were not asked to
> -  * use only the first level, start looking at children as well.
> -  */
> - siblings_only = first_lvl;
> -
>   if ((p->flags & flags) != flags)
>   continue;
>   if ((desc != IORES_DESC_NONE) && (desc != p->desc))
> @@ -402,14 +386,14 @@ static int find_next_iomem_res(resource_size_t start, 
> resource_size_t end,
>  
>  static int __walk_iomem_res_desc(resource_size_t start, resource_size_t end,
>unsigned long flags, unsigned long desc,
> -  bool first_lvl, void *arg,

> +  void *arg,
>int (*func)(struct resource *, void *))

Can it be one line?

>  {
>   struct resource res;
>   int ret = -EINVAL;
>  
>   while (start < end &&
> -!find_next_iomem_res(start, end, flags, desc, first_lvl, &res)) {
> +!find_next_iomem_res(start, end, f

Re: [PATCH v1 2/3] kernel/resource: make walk_mem_res() find all busy IORESOURCE_MEM resources

2021-03-23 Thread Andy Shevchenko
On Mon, Mar 22, 2021 at 05:01:59PM +0100, David Hildenbrand wrote:
> It used to be true that we can have system RAM only on the first level
> in the resourc tree. However, this is no longer holds for driver-managed
> system RAM (i.e., dax/kmem and virtio-mem).
> 
> The function walk_mem_res() only consideres the first level and is
> used in arch/x86/mm/ioremap.c:__ioremap_check_mem() only. We currently
> fail to identify System RAM added by dax/kmem and virtio-mem as
> "IORES_MAP_SYSTEM_RAM", for example, allowing for remapping of such
> "normal RAM" in __ioremap_caller().

Here I dunno, but consider to add Fixes tag if it fixes known bad behaviour.

> Let's find all busy IORESOURCE_MEM resources, making the function
> behave similar to walk_system_ram_res().
> 
> Cc: Andrew Morton 
> Cc: Greg Kroah-Hartman 
> Cc: Dan Williams 
> Cc: Daniel Vetter 
> Cc: Andy Shevchenko 
> Cc: Mauro Carvalho Chehab 
> Cc: Signed-off-by: David Hildenbrand 
> Cc: Dave Young 
> Cc: Baoquan He 
> Cc: Vivek Goyal 
> Cc: Dave Hansen 
> Cc: Keith Busch 
> Cc: Michal Hocko 
> Cc: Qian Cai 
> Cc: Oscar Salvador 
> Cc: Eric Biederman 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Borislav Petkov 
> Cc: "H. Peter Anvin" 
> Cc: Tom Lendacky 
> Cc: Brijesh Singh 
> Cc: x...@kernel.org
> Cc: kexec@lists.infradead.org
> Signed-off-by: David Hildenbrand 
> ---
>  kernel/resource.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/resource.c b/kernel/resource.c
> index 4efd6e912279..16e0c7e8ed24 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -470,7 +470,7 @@ int walk_mem_res(u64 start, u64 end, void *arg,
>  {
>   unsigned long flags = IORESOURCE_MEM | IORESOURCE_BUSY;
>  
> - return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, true,
> + return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, false,
>arg, func);
>  }
>  
> -- 
> 2.29.2
> 

-- 
With Best Regards,
Andy Shevchenko



___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v1 1/3] kernel/resource: make walk_system_ram_res() find all busy IORESOURCE_SYSTEM_RAM resources

2021-03-23 Thread Andy Shevchenko
On Tue, Mar 23, 2021 at 10:40:33AM +0100, David Hildenbrand wrote:
> On 22.03.21 17:01, David Hildenbrand wrote:
> > It used to be true that we can have busy system RAM only on the first level
> > in the resourc tree. However, this is no longer holds for driver-managed
> > system RAM (i.e., added via dax/kmem and virtio-mem), which gets added on
> > lower levels.
> > 
> > We have two users of walk_system_ram_res(), which currently only
> > consideres the first level:
> > a) kernel/kexec_file.c:kexec_walk_resources() -- We properly skip
> > IORESOURCE_SYSRAM_DRIVER_MANAGED resources via
> > locate_mem_hole_callback(), so even after this change, we won't be
> > placing kexec images onto dax/kmem and virtio-mem added memory. No
> > change.
> > b) arch/x86/kernel/crash.c:fill_up_crash_elf_data() -- we're currently
> > not adding relevant ranges to the crash elf info, resulting in them
> > not getting dumped via kdump.
> > 
> > This change fixes loading a crashkernel via kexec_file_load() and including
> > dax/kmem and virtio-mem added System RAM in the crashdump on x86-64. Note
> > that e.g,, arm64 relies on memblock data and, therefore, always considers
> > all added System RAM already.
> > 
> > Let's find all busy IORESOURCE_SYSTEM_RAM resources, making the function
> > behave like walk_system_ram_range().
> > 
> > Cc: Andrew Morton 
> > Cc: Greg Kroah-Hartman 
> > Cc: Dan Williams 
> > Cc: Daniel Vetter 
> > Cc: Andy Shevchenko 
> > Cc: Mauro Carvalho Chehab 
> > Cc: Signed-off-by: David Hildenbrand 
> 
> ^ My copy-paste action when creating the cc list slipped in a duplicate  SO
> in all 3 patches. I can resend if desired.

I think to address my comments you will need to resend anyway (as v2).

-- 
With Best Regards,
Andy Shevchenko



___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v1 1/3] kernel/resource: make walk_system_ram_res() find all busy IORESOURCE_SYSTEM_RAM resources

2021-03-23 Thread Andy Shevchenko
On Mon, Mar 22, 2021 at 05:01:58PM +0100, David Hildenbrand wrote:
> It used to be true that we can have busy system RAM only on the first level
> in the resourc tree. However, this is no longer holds for driver-managed
> system RAM (i.e., added via dax/kmem and virtio-mem), which gets added on
> lower levels.
> 
> We have two users of walk_system_ram_res(), which currently only
> consideres the first level:
> a) kernel/kexec_file.c:kexec_walk_resources() -- We properly skip
>IORESOURCE_SYSRAM_DRIVER_MANAGED resources via
>locate_mem_hole_callback(), so even after this change, we won't be
>placing kexec images onto dax/kmem and virtio-mem added memory. No
>change.
> b) arch/x86/kernel/crash.c:fill_up_crash_elf_data() -- we're currently
>not adding relevant ranges to the crash elf info, resulting in them
>not getting dumped via kdump.
> 
> This change fixes loading a crashkernel via kexec_file_load() and including

"...fixes..." effectively means to me that Fixes tag should be provided.

> dax/kmem and virtio-mem added System RAM in the crashdump on x86-64. Note
> that e.g,, arm64 relies on memblock data and, therefore, always considers
> all added System RAM already.
> 
> Let's find all busy IORESOURCE_SYSTEM_RAM resources, making the function
> behave like walk_system_ram_range().
> 
> Cc: Andrew Morton 
> Cc: Greg Kroah-Hartman 
> Cc: Dan Williams 
> Cc: Daniel Vetter 
> Cc: Andy Shevchenko 
> Cc: Mauro Carvalho Chehab 
> Cc: Signed-off-by: David Hildenbrand 
> Cc: Dave Young 
> Cc: Baoquan He 
> Cc: Vivek Goyal 
> Cc: Dave Hansen 
> Cc: Keith Busch 
> Cc: Michal Hocko 
> Cc: Qian Cai 
> Cc: Oscar Salvador 
> Cc: Eric Biederman 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Borislav Petkov 
> Cc: "H. Peter Anvin" 
> Cc: Tom Lendacky 
> Cc: Brijesh Singh 
> Cc: x...@kernel.org
> Cc: kexec@lists.infradead.org
> Signed-off-by: David Hildenbrand 
> ---
>  kernel/resource.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/resource.c b/kernel/resource.c
> index 627e61b0c124..4efd6e912279 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -457,7 +457,7 @@ int walk_system_ram_res(u64 start, u64 end, void *arg,
>  {
>   unsigned long flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
>  
> - return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, true,
> + return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, false,
>arg, func);
>  }
>  
> -- 
> 2.29.2
> 

-- 
With Best Regards,
Andy Shevchenko



___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH next v1 2/3] printk: remove safe buffers

2021-03-23 Thread Petr Mladek
On Wed 2021-03-17 00:33:25, John Ogness wrote:
> With @logbuf_lock removed, the high level printk functions for
> storing messages are lockless. Messages can be stored from any
> context, so there is no need for the NMI and safe buffers anymore.
> Remove the NMI and safe buffers.
> 
> Although the safe buffers are removed, the NMI and safe context
> tracking is still in place. In these contexts, store the message
> immediately but still use irq_work to defer the console printing.
> 
> Since printk recursion tracking is in place, safe context tracking
> for most of printk is not needed. Remove it. Only safe context
> tracking relating to the console lock is left in place. This is
> because the console lock is needed for the actual printing.

I have two more questions after actually checking the entire patch.
See below.

> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1084,7 +1069,6 @@ void __init setup_log_buf(int early)
>   struct printk_record r;
>   size_t new_descs_size;
>   size_t new_infos_size;
> - unsigned long flags;
>   char *new_log_buf;
>   unsigned int free;
>   u64 seq;
> @@ -1142,8 +1126,6 @@ void __init setup_log_buf(int early)
>new_descs, ilog2(new_descs_count),
>new_infos);
>  
> - printk_safe_enter_irqsave(flags);
> -
>   log_buf_len = new_log_buf_len;
>   log_buf = new_log_buf;
>   new_log_buf_len = 0;
> @@ -1159,8 +1141,6 @@ void __init setup_log_buf(int early)
>*/
>   prb = &printk_rb_dynamic;
>  
> - printk_safe_exit_irqrestore(flags);

This will allow to add new messages from the IRQ context when we
are copying them to the new buffer. They might get lost in
the small race window.

Also the messages from NMI might get lost because they are not
longer stored in the per-CPU buffer.

A possible solution might be to do something like this:

prb_for_each_record(0, &printk_rb_static, seq, &r)
free -= add_to_rb(&printk_rb_dynamic, &r);

prb = &printk_rb_dynamic;

/*
 * Copy the remaining messages that might have appeared
 * from IRQ or NMI context after we ended copying and
 * before we switched the buffers. They must be finalized
 * because only one CPU is up at this stage.
 */
prb_for_each_record(seq, &printk_rb_static, seq, &r)
free -= add_to_rb(&printk_rb_dynamic, &r);


> -
>   if (seq != prb_next_seq(&printk_rb_static)) {
>   pr_err("dropped %llu messages\n",
>  prb_next_seq(&printk_rb_static) - seq);
> @@ -2666,7 +2631,6 @@ void console_unlock(void)
>   size_t ext_len = 0;
>   size_t len;
>  
> - printk_safe_enter_irqsave(flags);
>  skip:
>   if (!prb_read_valid(prb, console_seq, &r))
>   break;
> @@ -2711,6 +2675,8 @@ void console_unlock(void)
>   printk_time);
>   console_seq++;
>  
> + printk_safe_enter_irqsave(flags);

What is the purpose of the printk_safe context here, please?

I guess that you wanted to prevent calling console drivers
recursively. But it is already serialized by console_lock().

IMHO, the only risk is when manipulating console_sem->lock
or console_owner_lock. But they are already guarded by
printk_safe context, for example, in console_lock() or
console_lock_spinning_enable().

Do I miss something, please?


> +
>   /*
>* While actively printing out messages, if another printk()
>* were to occur on another CPU, it may wait for this one to
> @@ -2745,8 +2711,6 @@ void console_unlock(void)
>* flush, no worries.
>*/
>   retry = prb_read_valid(prb, console_seq, NULL);
> - printk_safe_exit_irqrestore(flags);
> -
>   if (retry && console_trylock())
>   goto again;
>  }

Heh, all these patches feels like stripping printk of an armour. I hope
that we trained it enough to be flexible and avoid any damage.

Best Regards,
Petr

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH v1 2/3] crashdump/x86: iterate only over actual crash memory ranges

2021-03-23 Thread David Hildenbrand
No need to iterate over empty entries.

Cc: Simon Horman 
Signed-off-by: David Hildenbrand 
---
 kexec/arch/i386/crashdump-x86.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kexec/arch/i386/crashdump-x86.c b/kexec/arch/i386/crashdump-x86.c
index a301ac8..43e830a 100644
--- a/kexec/arch/i386/crashdump-x86.c
+++ b/kexec/arch/i386/crashdump-x86.c
@@ -988,7 +988,7 @@ int load_crashdump_segments(struct kexec_info *info, char* 
mod_cmdline,
cmdline_add_elfcorehdr(mod_cmdline, elfcorehdr);
 
/* Inform second kernel about the presence of ACPI tables. */
-   for (i = 0; i < CRASH_MAX_MEMORY_RANGES; i++) {
+   for (i = 0; i < nr_ranges; i++) {
unsigned long start, end, size, type;
if ( !( mem_range[i].type == RANGE_ACPI
|| mem_range[i].type == RANGE_ACPI_NVS
-- 
2.29.2


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH v1 0/3] crashdump/x86: dump dax/kmem and virito-mem added System RAM

2021-03-23 Thread David Hildenbrand
Let's properly support dumping any kind of "System RAM" that is not on the
top level of the kernel resource tree in /proc/iomem, processing any
/proc/iomem entry that contains "System RAM". In addition, increase
the maximum number of crash memory ranges to fully support virtio-mem
even in corner cases where we end up with a lot of individual memory
ranges.

David Hildenbrand (3):
  crashdump/x86: dump any kind of "System RAM"
  crashdump/x86: iterate only over actual crash memory ranges
  crashdump/x86: increase CRASH_MAX_MEMORY_RANGES to 32k

 kexec/arch/i386/crashdump-x86.c | 18 +-
 kexec/arch/i386/crashdump-x86.h |  3 ++-
 2 files changed, 15 insertions(+), 6 deletions(-)

-- 
2.29.2


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH v1 3/3] crashdump/x86: increase CRASH_MAX_MEMORY_RANGES to 32k

2021-03-23 Thread David Hildenbrand
virtio-mem in Linux adds/removes individual memory blocks (e.g., 128 MB
each). Linux merges adjacent memory blocks added by virtio-mem devices, but
we can still end up with a very sparse memory layout when unplugging
memory in corner cases.

Let's increase the maximum number of crash memory ranges from ~2k to 32k.
32k should be sufficient for a very long time.

e_phnum field in the header is 16 bits wide, so we can fit a maximum of
~64k entries in there, shared with other entries (i.e., CPU). Therefore,
using up to 32k memory ranges is fine. (if we ever need more than ~64k,
we can switch to the sh_info field)

Move the temporary xen ranges off the stack, dynamically allocating
memory for them.

Note: We don't have to increase MAX_MEMORY_RANGES, because virtio-mem
added memory is driver managed and always detected and added by a
driver in the kexec'ed kernel; for ordinary kexec, we must not expose
these ranges in the firmware-provided memmap.

Cc: Simon Horman 
Signed-off-by: David Hildenbrand 
---
 kexec/arch/i386/crashdump-x86.c | 6 --
 kexec/arch/i386/crashdump-x86.h | 3 ++-
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/kexec/arch/i386/crashdump-x86.c b/kexec/arch/i386/crashdump-x86.c
index 43e830a..df84185 100644
--- a/kexec/arch/i386/crashdump-x86.c
+++ b/kexec/arch/i386/crashdump-x86.c
@@ -355,8 +355,8 @@ static int get_crash_memory_ranges(struct memory_range 
**range, int *ranges,
 static int get_crash_memory_ranges_xen(struct memory_range **range,
int *ranges, unsigned long lowmem_limit)
 {
+   struct e820entry *e820entries;
int j, rc, ret = -1;
-   struct e820entry e820entries[CRASH_MAX_MEMORY_RANGES];
unsigned int i;
xc_interface *xc;
 
@@ -367,6 +367,8 @@ static int get_crash_memory_ranges_xen(struct memory_range 
**range,
return -1;
}
 
+   e820entries = xmalloc(sizeof(*e820entries) * CRASH_MAX_MEMORY_RANGES);
+
rc = xc_get_machine_memory_map(xc, e820entries, 
CRASH_MAX_MEMORY_RANGES);
 
if (rc < 0) {
@@ -395,7 +397,7 @@ static int get_crash_memory_ranges_xen(struct memory_range 
**range,
 
 err:
xc_interface_close(xc);
-
+   free(e820entries);
return ret;
 }
 #else
diff --git a/kexec/arch/i386/crashdump-x86.h b/kexec/arch/i386/crashdump-x86.h
index e4fdc82..479a549 100644
--- a/kexec/arch/i386/crashdump-x86.h
+++ b/kexec/arch/i386/crashdump-x86.h
@@ -22,7 +22,8 @@ int load_crashdump_segments(struct kexec_info *info, char 
*mod_cmdline,
 #define X86_64_KERNEL_TEXT_SIZE  (512UL*1024*1024)
 
 #define CRASH_MAX_MEMMAP_NR1024
-#define CRASH_MAX_MEMORY_RANGES(MAX_MEMORY_RANGES + 2)
+
+#define CRASH_MAX_MEMORY_RANGES32768
 
 /* Backup Region, First 640K of System RAM. */
 #define BACKUP_SRC_START   0x
-- 
2.29.2


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH v1 1/3] crashdump/x86: dump any kind of "System RAM"

2021-03-23 Thread David Hildenbrand
Traditionally, we had "System RAM" only on the top level of in the
kernel resource tree (-> /proc/iomem). Nowadays, we can also have
"System RAM" on lower levels of the tree -- driver-managed device memory
that is always detected and added via drivers. Current examples are
memory added via dax/kmem -- ("System RAM (kmem)") and virtio-mem ("System
RAM (virtio_mem)"). Note that in some kernel versions "System RAM
(kmem)" was exposed as "System RAM", but similarly, on lower levels of
the resource tree.

Let's add anything that contains "System RAM" to the elf core header, so
it will be dumped for kexec_load(). Handling kexec_file_load() in the
kernel is similarly getting fixed [1].

Loading a kdump kernel via "kexec -p -c" ... will result in the kdump
kernel to also dump dax/kmem and virtio-mem added System RAM now.

Note: We only want to dump this memory, we don't want to add this memory to
the memmap of an ordinary kexec'ed kernel ("fast system reboot").

[1] https://lkml.kernel.org/r/20210322160200.19633-1-da...@redhat.com

Cc: Dan Williams 
Cc: Dave Hansen 
Cc: Baoquan He 
Cc: Simon Horman 
Signed-off-by: David Hildenbrand 
---
 kexec/arch/i386/crashdump-x86.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/kexec/arch/i386/crashdump-x86.c b/kexec/arch/i386/crashdump-x86.c
index d5b5b68..a301ac8 100644
--- a/kexec/arch/i386/crashdump-x86.c
+++ b/kexec/arch/i386/crashdump-x86.c
@@ -271,8 +271,14 @@ static int get_crash_memory_ranges(struct memory_range 
**range, int *ranges,
str = line + consumed;
dbgprintf("%016llx-%016llx : %s",
start, end, str);
-   /* Only Dumping memory of type System RAM. */
-   if (memcmp(str, "System RAM\n", 11) == 0) {
+   /*
+* We want to dump any System RAM -- memory regions currently
+* used by the kernel. In the usual case, this is "System RAM"
+* on the top level. However, we can also have "System RAM
+* (virtio_mem)" below virtio devices or "System RAM (kmem)"
+* below "Persistent Memory".
+*/
+   if (strstr(str, "System RAM")) {
type = RANGE_RAM;
} else if (memcmp(str, "ACPI Tables\n", 12) == 0) {
/*
-- 
2.29.2


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH next v1 2/3] printk: remove safe buffers

2021-03-23 Thread Petr Mladek
On Mon 2021-03-22 22:58:47, John Ogness wrote:
> On 2021-03-22, Petr Mladek  wrote:
> > On Mon 2021-03-22 12:16:15, John Ogness wrote:
> >> On 2021-03-21, Sergey Senozhatsky  wrote:
> >> >> @@ -369,7 +70,10 @@ __printf(1, 0) int vprintk_func(const char *fmt, 
> >> >> va_list args)
> >> >>  * Use the main logbuf even in NMI. But avoid calling console
> >> >>  * drivers that might have their own locks.
> >> >>  */
> >> >> -   if ((this_cpu_read(printk_context) & 
> >> >> PRINTK_NMI_DIRECT_CONTEXT_MASK)) {
> >> >> +   if (this_cpu_read(printk_context) &
> >> >> +   (PRINTK_NMI_DIRECT_CONTEXT_MASK |
> >> >> +PRINTK_NMI_CONTEXT_MASK |
> >> >> +PRINTK_SAFE_CONTEXT_MASK)) {
> >> >

> >> But I suppose I could switch
> >> the 1 printk_nmi_direct_enter() user to printk_nmi_enter() so that
> >> PRINTK_NMI_DIRECT_CONTEXT_MASK can be removed now. I would do this in a
> >> 4th patch of the series.
> >
> > Yes, please unify the PRINTK_NMI_CONTEXT. One is enough.
> 
> Agreed. (But I'll go even further. See below.)
> 
> > I wonder if it would make sense to go even further at this stage.
> > What is possible?
> >
> > 1. We could get rid of printk_nmi_enter()/exit() and
> >PRINTK_NMI_CONTEXT completely already now. It is enough
> >to check in_nmi() in printk_func().
> >
> 
> Agreed. in_nmi() within vprintk_emit() is enough to detect if the
> console code should be skipped:
> 
> if (!in_sched && !in_nmi()) {
> ...
> }

Well, we also need to make sure that the irq work is scheduled to
call console later. We should keep this dicision in
printk_func(). I mean to replace the current

if (this_cpu_read(printk_context) &
(PRINTK_NMI_DIRECT_CONTEXT_MASK |
 PRINTK_NMI_CONTEXT_MASK |
 PRINTK_SAFE_CONTEXT_MASK)) {

with

/*
 * Avoid calling console drivers in recursive printk()
 * and in NMI context.
 */
if (this_cpu_read(printk_context) || in_nmi() {

That said, I am not sure how this fits your further rework.
I do not want to complicate it too much.

I am just afraid that the discussion about console rework might
take some time. And this would remove some complexity before we
started the more complicated or controversial changes.


> > 2. I thought about unifying printk_safe_enter()/exit() and
> >printk_enter()/exit(). They both count recursion with
> >IRQs disabled, have similar name. But they are used
> >different way.
> >
> >But better might be to rename printk_safe_enter()/exit() to
> >console_enter()/exit() or to printk_deferred_enter()/exit().
> >It would make more clear what it does now. And it might help
> >to better distinguish it from the new printk_enter()/exit().
> >
> >I am not sure if it is worth it.
> 
> I am also not sure if it is worth the extra "noise" just to give the
> function a more appropriate name. The plan is to remove it completely
> soon anyway. My vote is to leave the name as it is.

OK, let's keep printk_safe() name. It was just an idea. I wrote it
primary to sort my thoughts.

Best Regards,
Petr

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v1 1/3] kernel/resource: make walk_system_ram_res() find all busy IORESOURCE_SYSTEM_RAM resources

2021-03-23 Thread David Hildenbrand

On 22.03.21 17:01, David Hildenbrand wrote:

It used to be true that we can have busy system RAM only on the first level
in the resourc tree. However, this is no longer holds for driver-managed
system RAM (i.e., added via dax/kmem and virtio-mem), which gets added on
lower levels.

We have two users of walk_system_ram_res(), which currently only
consideres the first level:
a) kernel/kexec_file.c:kexec_walk_resources() -- We properly skip
IORESOURCE_SYSRAM_DRIVER_MANAGED resources via
locate_mem_hole_callback(), so even after this change, we won't be
placing kexec images onto dax/kmem and virtio-mem added memory. No
change.
b) arch/x86/kernel/crash.c:fill_up_crash_elf_data() -- we're currently
not adding relevant ranges to the crash elf info, resulting in them
not getting dumped via kdump.

This change fixes loading a crashkernel via kexec_file_load() and including
dax/kmem and virtio-mem added System RAM in the crashdump on x86-64. Note
that e.g,, arm64 relies on memblock data and, therefore, always considers
all added System RAM already.

Let's find all busy IORESOURCE_SYSTEM_RAM resources, making the function
behave like walk_system_ram_range().

Cc: Andrew Morton 
Cc: Greg Kroah-Hartman 
Cc: Dan Williams 
Cc: Daniel Vetter 
Cc: Andy Shevchenko 
Cc: Mauro Carvalho Chehab 
Cc: Signed-off-by: David Hildenbrand 


^ My copy-paste action when creating the cc list slipped in a duplicate 
 SO in all 3 patches. I can resend if desired.


--
Thanks,

David / dhildenb


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] include: linux: Remove duplicate include of pgtable.h

2021-03-23 Thread Baoquan He
On 03/23/21 at 11:13am, Wan Jiabing wrote:
> linux/pgtable.h has been included at line 11 with annotation.
> So we remove the duplicate one at line 8.
> 
> Signed-off-by: Wan Jiabing 

Thanks for your posting. But this resend is still not good. I have
pasted the suggested log, wondering why you just ignore it and send v2
without updating, and also not marking this is v2. Please read
Documentation/process/submitting-patches.rst before you post next time.
Anyway, I have ack-ed Tian Tao's patch since his patch log is good
enough.

Thanks
Baoquan

> ---
>  include/linux/crash_dump.h | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h
> index a5192b718dbe..be79a45d7aa3 100644
> --- a/include/linux/crash_dump.h
> +++ b/include/linux/crash_dump.h
> @@ -5,7 +5,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
>  
>  #include  /* for pgprot_t */
> -- 
> 2.25.1
> 


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] crash_dump: Fix duplicate included pgtable.h

2021-03-23 Thread Baoquan He
On 03/19/21 at 08:03pm, Tian Tao wrote:
> linux/pgtable.h is included more than once, Remove the one that isn't
> necessary.
> 
> Fixes: ca5999fde0a1 ("mm: introduce include/linux/pgtable.h")
> Signed-off-by: Tian Tao 

Thanks, this looks good to me. But the Fixes tag is unnecessary since
this patch is a trivial cleanup, no need to add to stable kernel.
Otherwise,

Acked-by: Baoquan He 

> ---
>  include/linux/crash_dump.h | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h
> index a5192b7..6bd8a33 100644
> --- a/include/linux/crash_dump.h
> +++ b/include/linux/crash_dump.h
> @@ -8,8 +8,6 @@
>  #include 
>  #include 
>  
> -#include  /* for pgprot_t */
> -
>  #ifdef CONFIG_CRASH_DUMP
>  #define ELFCORE_ADDR_MAX (-1ULL)
>  #define ELFCORE_ADDR_ERR (-2ULL)
> -- 
> 2.7.4
> 
> 
> ___
> kexec mailing list
> kexec@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec
> 


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec