[PATCH 2/3] LoongArch: Add kdump support

2022-08-28 Thread Youling Tang
This patch adds support for kdump: the kernel reserves a region for the
crash kernel and jumps there on panic.

Arch-specific functions are added to allow for implementing a crash
dump file interface, /proc/vmcore, which can be viewed as an ELF file.

A user-space tool, such as kexec-tools, is responsible for allocating a
separate region for the core's ELF header within the crash dump kernel's
memory and filling it in when executing kexec_load().

Its location is then advertised to the crash dump kernel via a new
device-tree property, "linux,elfcorehdr", and the crash dump kernel
preserves the region for later use with fdt_reserve_elfcorehdr() at boot
time.

At the same time, the crash dump kernel is limited to the crashkernel
area via a new device-tree property, "linux,usable-memory-range", so as
not to destroy the original kernel's dump data.
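
As a rough illustration of what limiting the crash dump kernel to a usable
range means, the following user-space sketch clips a memory region to a
usable window. The names mem_range and clip_to_usable are hypothetical and
are not taken from this patch; the kernel's own handling of the property is
more involved.

```c
#include <stdint.h>

/* Half-open physical address range [start, end). */
struct mem_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Clip *r so that it lies entirely inside *usable.
 * Returns 1 if some memory remains, 0 if the ranges do not overlap
 * (in which case *r is left unchanged).
 */
int clip_to_usable(struct mem_range *r, const struct mem_range *usable)
{
	if (r->end <= usable->start || r->start >= usable->end)
		return 0;
	if (r->start < usable->start)
		r->start = usable->start;
	if (r->end > usable->end)
		r->end = usable->end;
	return 1;
}
```

With a usable window matching "crashkernel=512M@2320M", a region covering
all of RAM would be reduced to exactly that 512M window, so the original
kernel's memory outside the window is never handed to the allocator.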

In the crash dump kernel, /proc/vmcore accesses the primary kernel's memory
with copy_oldmem_page().

I tested this on a LoongArch 3A5000 machine and it works as expected (the
suggested crashkernel parameter is "crashkernel=512M@2320M"). You may test
it by triggering a crash through /proc/sysrq-trigger:

 $ sudo kexec -p /boot/vmlinux-kdump --reuse-cmdline --append="nr_cpus=1"
 # echo c > /proc/sysrq-trigger
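
For readers unfamiliar with the "crashkernel=size@base" form used above, here
is a minimal user-space sketch that turns a string such as "512M@2320M" into
byte values. The helpers parse_size and parse_crashkernel_simple are
hypothetical names for illustration; this is not the kernel's parser, which
accepts more syntaxes than this one.

```c
#include <stdlib.h>

/*
 * Hypothetical helper: parse a decimal number with an optional K/M/G
 * suffix. On return, *end points just past the consumed text.
 */
unsigned long long parse_size(const char *s, char **end)
{
	unsigned long long v = strtoull(s, end, 10);

	switch (**end) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*end)++;	/* step over the suffix character */
		break;
	default:
		break;
	}
	return v;
}

/* Parse "size@base". Returns 0 on success, -1 on malformed input. */
int parse_crashkernel_simple(const char *arg, unsigned long long *size,
			     unsigned long long *base)
{
	char *end;
	const char *base_str;

	*size = parse_size(arg, &end);
	if (*size == 0 || *end != '@')
		return -1;

	base_str = end + 1;
	*base = parse_size(base_str, &end);
	if (end == base_str || *end != '\0')
		return -1;
	return 0;
}
```

For "512M@2320M" this yields a size of 512 MiB and a base of 2320 MiB, that
is, the "Y" and "X" values of "crashkernel=YM@XM".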

Signed-off-by: Youling Tang 
---
 arch/loongarch/Kconfig  |  22 ++
 arch/loongarch/Makefile |   4 +
 arch/loongarch/kernel/Makefile  |   3 +-
 arch/loongarch/kernel/crash.c   | 100 
 arch/loongarch/kernel/crash_dump.c  |  19 +
 arch/loongarch/kernel/machine_kexec.c   |  12 ++-
 arch/loongarch/kernel/mem.c |   6 ++
 arch/loongarch/kernel/relocate_kernel.S |   6 ++
 arch/loongarch/kernel/setup.c   |  49 
 arch/loongarch/kernel/traps.c   |   4 +
 10 files changed, 217 insertions(+), 8 deletions(-)
 create mode 100644 arch/loongarch/kernel/crash.c
 create mode 100644 arch/loongarch/kernel/crash_dump.c

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 903c82fa958d..7c1b07a5b5bd 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -420,6 +420,28 @@ config KEXEC
 
  The name comes from the similarity to the exec system call.
 
+config CRASH_DUMP
+   bool "Build kdump crash kernel"
+   help
+ Generate crash dump after being started by kexec. This should
+ be normally only set in special crash dump kernels which are
+ loaded in the main kernel with kexec-tools into a specially
+ reserved region and then later executed after a crash by
+ kdump/kexec.
+
+ For more details see Documentation/admin-guide/kdump/kdump.rst
+
+config PHYSICAL_START
+   hex "Physical address where the kernel is loaded"
+   default "0x90009100" if 64BIT
+   depends on CRASH_DUMP
+   help
+ This gives the XKPRANGE address where the kernel is loaded.
+ If you plan to use kernel for capturing the crash dump change
+ this value to start of the reserved region (the "X" value as
+ specified in the "crashkernel=YM@XM" command line boot parameter
+ passed to the panic-ed kernel).
+
 config SECCOMP
bool "Enable seccomp to safely compute untrusted bytecode"
depends on PROC_FS
diff --git a/arch/loongarch/Makefile b/arch/loongarch/Makefile
index 4bc47f47cfd8..7dabd580426d 100644
--- a/arch/loongarch/Makefile
+++ b/arch/loongarch/Makefile
@@ -48,7 +48,11 @@ KBUILD_CFLAGS_MODULE += -fplt -Wa,-mla-global-with-abs,-mla-local-with-abs
 cflags-y += -ffreestanding
 cflags-y += $(call cc-option, -mno-check-zero-division)
 
+ifdef CONFIG_PHYSICAL_START
+load-y = $(CONFIG_PHYSICAL_START)
+else
 load-y = 0x9020
+endif
 bootvars-y = VMLINUX_LOAD_ADDRESS=$(load-y)
 
 drivers-$(CONFIG_PCI)  += arch/loongarch/pci/
diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
index 20b64ac3f128..df5aea129364 100644
--- a/arch/loongarch/kernel/Makefile
+++ b/arch/loongarch/kernel/Makefile
@@ -17,7 +17,8 @@ obj-$(CONFIG_CPU_HAS_FPU) += fpu.o
 obj-$(CONFIG_MODULES)  += module.o module-sections.o
 obj-$(CONFIG_STACKTRACE)   += stacktrace.o
 
-obj-$(CONFIG_KEXEC) += machine_kexec.o relocate_kernel.o
+obj-$(CONFIG_KEXEC) += machine_kexec.o relocate_kernel.o crash.o
+obj-$(CONFIG_CRASH_DUMP)+= crash_dump.o
 
 obj-$(CONFIG_PROC_FS)  += proc.o
 
diff --git a/arch/loongarch/kernel/crash.c b/arch/loongarch/kernel/crash.c
new file mode 100644
index ..b4f249ec6301
--- /dev/null
+++ b/arch/loongarch/kernel/crash.c
@@ -0,0 +1,100 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2022 Loongson Technology Corporation Limited
+ *
+ * Derived from MIPS
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static cpumask_t cpus_in_crash = 

[PATCH 0/3] LoongArch: Add kexec/kdump support

2022-08-28 Thread Youling Tang
This patch series adds kexec/kdump support (64-bit only).

Kexec is a system call that enables you to load and boot into another kernel
from the currently running kernel. This is useful for kernel developers or
other people who need to reboot very quickly without waiting for the whole
BIOS boot process to finish. 

Kdump uses kexec to quickly boot to a dump-capture kernel whenever a
dump of the system kernel's memory needs to be taken (for example, when
the system panics). The system kernel's memory image is preserved across
the reboot and is accessible to the dump-capture kernel.

For details, see Documentation/admin-guide/kdump/kdump.rst.

For the user-space kexec-tools, see link [1].

TODO:
Currently kdump cannot use the same binary image for both kernels, so the
production kernel and the capture kernel must be built with different
configurations. I plan to add kernel relocation support in the near future,
and then implement same-binary support on top of it.

[1] Link: https://github.com/tangyouling/kexec-tools


Youling Tang (3):
  LoongArch: Add kexec support
  LoongArch: Add kdump support
  LoongArch: Enable CONFIG_KEXEC

 arch/loongarch/Kconfig |  33 
 arch/loongarch/Makefile|   4 +
 arch/loongarch/configs/loongson3_defconfig |   1 +
 arch/loongarch/include/asm/kexec.h |  58 +++
 arch/loongarch/kernel/Makefile |   3 +
 arch/loongarch/kernel/crash.c  | 100 
 arch/loongarch/kernel/crash_dump.c |  19 +++
 arch/loongarch/kernel/head.S   |   7 +-
 arch/loongarch/kernel/machine_kexec.c  | 176 +
 arch/loongarch/kernel/mem.c|   6 +
 arch/loongarch/kernel/relocate_kernel.S| 131 +++
 arch/loongarch/kernel/setup.c  |  49 ++
 arch/loongarch/kernel/traps.c  |   4 +
 13 files changed, 590 insertions(+), 1 deletion(-)
 create mode 100644 arch/loongarch/include/asm/kexec.h
 create mode 100644 arch/loongarch/kernel/crash.c
 create mode 100644 arch/loongarch/kernel/crash_dump.c
 create mode 100644 arch/loongarch/kernel/machine_kexec.c
 create mode 100644 arch/loongarch/kernel/relocate_kernel.S

-- 
2.36.0


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 1/3] LoongArch: Add kexec support

2022-08-28 Thread Youling Tang
Add three new files, kexec.h, machine_kexec.c and relocate_kernel.S, to the
LoongArch architecture to support the kexec reboot mechanism
(CONFIG_KEXEC) on LoongArch platforms.

Loading vmlinux (vmlinux.elf) in ELF format and vmlinux.efi in PE format
is supported.

I tested this on a LoongArch 3A5000 machine and it works as expected:

 $ sudo kexec -l /boot/vmlinux.efi --reuse-cmdline
 $ sudo kexec -e

Signed-off-by: Youling Tang 
---
 arch/loongarch/Kconfig  |  11 ++
 arch/loongarch/include/asm/kexec.h  |  58 
 arch/loongarch/kernel/Makefile  |   2 +
 arch/loongarch/kernel/head.S|   7 +-
 arch/loongarch/kernel/machine_kexec.c   | 178 
 arch/loongarch/kernel/relocate_kernel.S | 125 +
 6 files changed, 380 insertions(+), 1 deletion(-)
 create mode 100644 arch/loongarch/include/asm/kexec.h
 create mode 100644 arch/loongarch/kernel/machine_kexec.c
 create mode 100644 arch/loongarch/kernel/relocate_kernel.S

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 45364cffc793..903c82fa958d 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -409,6 +409,17 @@ config FORCE_MAX_ZONEORDER
  The page size is not necessarily 4KB.  Keep this in mind
  when choosing a value for this option.
 
+config KEXEC
+   bool "Kexec system call"
+   select KEXEC_CORE
+   help
+ kexec is a system call that implements the ability to shutdown your
+ current kernel, and to start another kernel.  It is like a reboot
+ but it is independent of the system firmware.   And like a reboot
+ you can start any kernel with it, not just Linux.
+
+ The name comes from the similarity to the exec system call.
+
 config SECCOMP
bool "Enable seccomp to safely compute untrusted bytecode"
depends on PROC_FS
diff --git a/arch/loongarch/include/asm/kexec.h b/arch/loongarch/include/asm/kexec.h
new file mode 100644
index ..5c9e7b5eccb8
--- /dev/null
+++ b/arch/loongarch/include/asm/kexec.h
@@ -0,0 +1,58 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * kexec.h for kexec
+ *
+ * Copyright (C) 2022 Loongson Technology Corporation Limited
+ */
+
+#ifndef _ASM_KEXEC_H
+#define _ASM_KEXEC_H
+
+#include 
+#include 
+
+/* Maximum physical address we can use pages from */
+#define KEXEC_SOURCE_MEMORY_LIMIT (-1UL)
+/* Maximum address we can reach in physical address mode */
+#define KEXEC_DESTINATION_MEMORY_LIMIT (-1UL)
+ /* Maximum address we can use for the control code buffer */
+#define KEXEC_CONTROL_MEMORY_LIMIT (-1UL)
+
+/* Reserve a page for the control code buffer */
+#define KEXEC_CONTROL_PAGE_SIZE PAGE_SIZE
+
+/* The native architecture */
+#define KEXEC_ARCH KEXEC_ARCH_LOONGARCH
+
+static inline void crash_setup_regs(struct pt_regs *newregs,
+   struct pt_regs *oldregs)
+{
+   if (oldregs)
+   memcpy(newregs, oldregs, sizeof(*newregs));
+   else
+   prepare_frametrace(newregs);
+}
+
+#define ARCH_HAS_KIMAGE_ARCH
+
+struct kimage_arch {
+   unsigned long boot_flag;
+   unsigned long fdt_addr;
+};
+
+typedef void (*do_kexec_t)(unsigned long boot_flag,
+  unsigned long fdt_addr,
+  unsigned long first_ind_entry,
+  unsigned long jump_addr);
+
+struct kimage;
+extern const unsigned char relocate_new_kernel[];
+extern const size_t relocate_new_kernel_size;
+
+#ifdef CONFIG_SMP
+extern atomic_t kexec_ready_to_reboot;
+extern const unsigned char kexec_smp_wait[];
+extern void kexec_reboot(void);
+#endif
+
+#endif /* !_ASM_KEXEC_H */
diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
index a213e994db68..20b64ac3f128 100644
--- a/arch/loongarch/kernel/Makefile
+++ b/arch/loongarch/kernel/Makefile
@@ -17,6 +17,8 @@ obj-$(CONFIG_CPU_HAS_FPU) += fpu.o
 obj-$(CONFIG_MODULES)  += module.o module-sections.o
 obj-$(CONFIG_STACKTRACE)   += stacktrace.o
 
+obj-$(CONFIG_KEXEC) += machine_kexec.o relocate_kernel.o
+
 obj-$(CONFIG_PROC_FS)  += proc.o
 
 obj-$(CONFIG_SMP)  += smp.o
diff --git a/arch/loongarch/kernel/head.S b/arch/loongarch/kernel/head.S
index 01bac62a6442..22bdf4928325 100644
--- a/arch/loongarch/kernel/head.S
+++ b/arch/loongarch/kernel/head.S
@@ -20,7 +20,12 @@
 
 _head:
.word   MZ_MAGIC/* "MZ", MS-DOS header */
-   .org0x3c/* 0x04 ~ 0x3b reserved */
+   .org0x8
+   .quad   0   /* Image load offset from start of RAM */
+   .dword  _end - _text/* Effective size of kernel image */
+   .quad   0
+   .dword  kernel_entry/* Kernel entry point */
+   .org0x3c/* 0x28 ~ 0x3b reserved */
.long   pe_header - _head   /* Offset to the PE header */
 
 pe_header:
diff --git 

[PATCH 3/3] LoongArch: Enable CONFIG_KEXEC

2022-08-28 Thread Youling Tang
Enable CONFIG_KEXEC by default to make kexec operations convenient.

Signed-off-by: Youling Tang 
---
 arch/loongarch/configs/loongson3_defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/loongarch/configs/loongson3_defconfig b/arch/loongarch/configs/loongson3_defconfig
index 68c9609670d4..52db7a3a79f3 100644
--- a/arch/loongarch/configs/loongson3_defconfig
+++ b/arch/loongarch/configs/loongson3_defconfig
@@ -45,6 +45,7 @@ CONFIG_SMP=y
 CONFIG_HOTPLUG_CPU=y
 CONFIG_NR_CPUS=64
 CONFIG_NUMA=y
+CONFIG_KEXEC=y
 CONFIG_PAGE_SIZE_16KB=y
 CONFIG_HZ_250=y
 CONFIG_ACPI=y
-- 
2.36.0




Re: [PATCH RFC 1/2] coding-style.rst: document BUG() and WARN() rules ("do not crash the kernel")

2022-08-28 Thread Linus Torvalds
On Sun, Aug 28, 2022 at 6:56 PM Dave Young  wrote:
>
> > John mentioned PANIC_ON().
>
> I would vote for PANIC_ON(), it sounds like a good idea, because
> BUG_ON() is not obvious and, PANIC_ON() can alert the code author that
> this will cause a kernel panic and one will be more careful before
> using it.

People, NO.

We're trying to get rid of BUG_ON() because it kills the machine.

Not replace it with another bogus thing that kills a machine.

So no PANIC_ON(). We used to have "panic()" many many years ago, we
got rid of it. We're not re-introducing it.

People who want to panic on warnings can do so. WARN_ON() _becomes_
PANIC for those people. But those people are the "we have a million
machines, we want to just fail things on any sign of trouble, and we
have MIS people who can look at the logs".

And it's not like we need to get rid of _all_ BUG_ON() cases. If you
have a "this is major internal corruption, there's no way we can
continue", then BUG_ON() is appropriate. It will try to kill that
process and try to keep the machine running, and again, the kind of
people who don't care about one machine (because - again - they have
millions of them) can just turn that into a panic-and-reboot
situation.

But the kind of people for whom the machine they are on IS THEIR ONLY
MACHINE - whether it be a workstation, a laptop, or a cellphone -
there is absolutely zero situation where "let's just kill the machine"
is *EVER* appropriate. Even a BUG_ON() will try to continue as well as
it can after killing the current thread, but it's going to be iffy,
because locking etc.

So WARN_ON_ONCE() is the thing to aim for. BUG_ON() is the thing for
"oops, I really don't know what to do, and I physically *cannot*
continue" (and that is *not* "I'm too lazy to do error handling").

There is no room for PANIC. None. Ever.

The only thing there is are "I don't care about this machine because
I've got 999,999 other machines, so I'd rather take one machine
offline for analysis".

Understand? The "should I panic and reboot" is fundamentally not about
the code, and it's not a choice that the kernel code gets to make.
It's purely about the choice of the person maintaining the machine.

As a kernel developer, you do not EVER get to say "panic" or "kill the machine".

End of story.

 Linus
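
The "warn once and keep going" pattern argued for above can be sketched in
user-space C. This is only an illustration under stated assumptions:
warn_on_once() here is a simplified stand-in written for this example, not
the kernel's WARN_ON_ONCE() macro, and set_queue_depth() is a hypothetical
routine invented to show the error-handling path.

```c
#include <errno.h>
#include <stdio.h>

/* Number of warnings actually printed; kept only for demonstration. */
int warn_count;

/*
 * Simplified stand-in for a warn-once check: print a message the first
 * time cond is true for the given call site flag, and always return cond
 * so the caller can branch on it.
 */
int warn_on_once(int cond, int *warned, const char *what)
{
	if (cond && !*warned) {
		*warned = 1;
		warn_count++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/*
 * Hypothetical routine: on an unexpected argument, warn once and return
 * an error code to the caller instead of taking the machine down.
 */
int set_queue_depth(int depth)
{
	static int warned;	/* one flag per call site */

	if (warn_on_once(depth <= 0, &warned, "invalid queue depth"))
		return -EINVAL;
	return 0;
}
```

Calling set_queue_depth() repeatedly with a bad value returns -EINVAL every
time but logs only once; whether such a warning should escalate to a panic
is then left to the administrator's configuration, not to the code.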



Re: [PATCH RFC 1/2] coding-style.rst: document BUG() and WARN() rules ("do not crash the kernel")

2022-08-28 Thread Dave Young
Hi David,

On Sat, 27 Aug 2022 at 01:02, David Hildenbrand  wrote:
>
> On 26.08.22 03:43, Dave Young wrote:
> > Hi David,
> >
> > [Added more people in cc]
> >
>
> Hi Dave,
>
> thanks for your input!

You are welcome :)

>
> [...]
>
> >> Side note: especially with kdump() I feel like we might see much more
> >> widespread use of panic_on_warn to be able to actually extract debug
> >> information in a controlled manner -- for example on enterprise distros.
> >> ... which would then make these systems more likely to crash, because
> >> there is no way to distinguish a rather harmless warning from a severe
> >> warning :/ . But let's see if some kdump() folks will share their
> >> opinion as reply to the cover letter.
> >
> > I can understand the intention of this patch, and I totally agree that
> > BUG() should be used carefully, this is a good proposal if we can
> > clearly define the standard about when to use BUG().  But I do have
>
> Essentially, the general rule from Linus is "absolutely no new BUG_ON()
> calls ever" -- but I think the consensus in that thread was that there
> are corner cases when it comes to unavoidable data corruption/security
> issues. And these are rare cases, not the usual case where we'd have
> used BUG_ON()/VM_BUG_ON().

Yes, probably. (I say "probably" because those cases are sometimes hidden
and not clear.)

>
> > some worries,  I think this standard is different for different sub
> > components, it is not clear to me at least,  so this may introduce an
> > unstable running kernel and cause troubles (eg. data corruption) with
> > a WARN instead of a BUG. Probably it would be better to say "Do not
> > WARN lightly, and do not hesitate to use BUG if it is really needed"?
>
>
> Well, I don't make the rules, I document them and share them for general
> awareness/comments :) Documenting this is valuable, because there seem
> to be quite some different opinions floating around in the community --
> and I've been learning different rules from different people over the years.

Understood.

>
> >
> > About "panic_on_warn", it will depend on the admin/end user to set it,
> > it is not a good idea for distribution to set it. It seems we are
> > leaving it to end users to take the risk of a kernel panic even with
> > all kernel WARN even if it is sometimes not necessary.
>
> My question would be what we could add/improve to keep systems with
> kdump armed running as expected for end users, that is most probably:
>
> 1) don't crash on harmless WARN() that can just be reported and the
>machine will continue running mostly fine without real issues.
> 2) crash on severe issues (previously BUG) such that we can properly
>capture a system dump via kdump. Then restart the machine.
>
> Of course, once one would run into 2), one could try reproducing with
> "panic_on_warn" to get a reasonable system dump. But I guess that's not
> what enterprise customers expect.
>

Sometimes the bug cannot be easily reproduced again, so that does not seem
to be an easy or good approach.

>
> One wild idea (in the cover letter) was to add something new that can be
> configured by user space and that expresses that something is more
> severe than just some warning that can be recovered easily. But it can
> eventually be recovered to keep the system running to some degree. But
> still, it's configurable if we want to trigger a panic or let the system
> run.
>
> John mentioned PANIC_ON().
>

I would vote for PANIC_ON(), it sounds like a good idea, because
BUG_ON() is not obvious and, PANIC_ON() can alert the code author that
this will cause a kernel panic and one will be more careful before
using it.

>
> What would be your expectation for kdump users under which conditions we
> want to trigger kdump and when not?
>
> Regarding panic_on_warn, how often do e.g., RHEL users observe warnings
> that we're not able to catch during testing, such that "panic_on_warn"
> would be a real no-go?

Well, I'm not sure how to answer these questions. When to panic should
be decided by kernel developers rather than kdump users, but I think
the panic behaviour does impact the support team. I added Stephen,
who is from the RH support team; maybe he has some input.

BTW, I vaguely remember that Prarit introduced panic_on_warn; let's see if
he has any comments here.

Thanks
Dave



>
> --
> Thanks,
>
> David / dhildenb
>




Re: [PATCH 0/5] arm64/mm: remap crash kernel with base pages even if rodata_full disabled

2022-08-28 Thread Baoquan He
On 08/25/22 at 10:48am, Mike Rapoport wrote:
.. 
> > > There were several rounds of discussion how to remap with base pages only
> > > the crash kernel area, the latest one here:
> > > 
> > > https://lore.kernel.org/all/1656777473-73887-1-git-send-email-guanghuif...@linux.alibaba.com
> > > 
> > > and this is my attempt to allow having both large pages in the linear map
> > > and protection for the crash kernel memory.
> > > 
> > > For server systems it is important to protect crash kernel memory for
> > > post-mortem analysis, and for that protection to work the crash kernel
> > > memory should be mapped with base pages in the linear map. 
> > > 
> > > On the systems with ZONE_DMA/DMA32 enabled, crash kernel reservation
> > > happens after the linear map is created and the current code forces using
> > > base pages for the entire linear map, which results in performance
> > > degradation.
> > > 
> > > These patches enable remapping of the crash kernel area with base pages
> > > while keeping large pages in the rest of the linear map.
> > > 
> > > The idea is to align crash kernel reservation to PUD boundaries, remap
> > > that PUD and then free the extra memory.
> > 
> > Hi Mike,
> > 
> > Thanks for the effort to work on this issue. However, I have to say this
> > isn't good, because it relies on the prerequisite that there is enough
> > memory. On a system with, say, 2G of memory, it's not easy to get one
> > 1G region, while we only require a far smaller region than 1G, e.g.
> > about 200M, which should be easy to get. So the approach taken in this
> > patchset is too quirky and will cause regressions on systems with small
> > memory. Such systems are widespread among virt guest instances.
> 
> I don't agree there is a regression. If the PUD-aligned allocation fails,
> there is a fallback to the allocation of the exact size requested for crash
> kernel. This allocation just won't get protected.

Sorry, I misunderstood it. I just went through the log and didn't
look into the code.

But honestly, if we accept the fallback which doesn't do the protection,
we should be able to take off the protection completely, right?
Otherwise, the reservation code is a little complicated.
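
To make the two cases under discussion concrete, here is a small user-space
sketch of a PUD-aligned reservation with an unprotected fallback. It is only
a model of the idea as described in this thread, not code from the patchset:
the 1 GiB PUD_SIZE assumes 4 KiB pages, and the crash_res structure, the
single free range, and reserve_crashkernel_sketch() are all hypothetical.

```c
#include <stdint.h>

#define PUD_SIZE (1ULL << 30)	/* assumption: 1 GiB, as with 4 KiB pages */

struct crash_res {
	uint64_t base;
	uint64_t size;
	int is_protected;	/* 1 if the region could be PUD-aligned */
};

/* Round x up to a multiple of a (a must be a power of two). */
static uint64_t align_up(uint64_t x, uint64_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/*
 * Model: a single free range [free_start, free_end). First try a
 * PUD-aligned block large enough to hold `want` bytes; if that does
 * not fit, fall back to an exact-size reservation that is not protected.
 * Returns 0 on success, -1 if even the exact size does not fit.
 */
int reserve_crashkernel_sketch(uint64_t free_start, uint64_t free_end,
			       uint64_t want, struct crash_res *out)
{
	uint64_t base = align_up(free_start, PUD_SIZE);
	uint64_t span = align_up(want, PUD_SIZE);

	if (want == 0 || free_end <= free_start)
		return -1;

	if (base < free_end && span <= free_end - base) {
		/* Aligned block fits: keep `want` bytes, the rest is freed. */
		out->base = base;
		out->size = want;
		out->is_protected = 1;
		return 0;
	}

	if (want <= free_end - free_start) {
		/* Fallback: exact size, no base-page remap, no protection. */
		out->base = free_start;
		out->size = want;
		out->is_protected = 0;
		return 0;
	}
	return -1;
}
```

In this model a 200M request on a small system still succeeds through the
fallback, but without protection, which is the trade-off being questioned
here.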

> 
> Also please note, that the changes are only for the case when user didn't
> force base-size pages in the linear map, so anything that works now will
> work the same way with this set applied.
>  
> > The crashkernel reservation happens after linear map because the
> > reservation needs to know the dma zone boundary, arm64_dma_phys_limit.
> > If we can deduce that before bootmem_init(), the reservation can be
> > done before linear map. I will make an attempt on that. If still can't
> > be accepted, we would like to take off the crashkernel region protection
> > on arm64 for now.
> 
> I doubt it would be easy because arm64_dma_phys_limit is determined after
> parsing of the device tree and there might be memory allocations of
> possibly unmapped memory during the parsing.

I have sent out patches with an attempt; it's pretty straightforward
and simple, because arm64 only has one exception, namely Raspberry Pi 4,
on which some peripherals can only address a 30-bit range. That is a corner
case, to be honest. Kdump is a necessary feature on servers, but may
not be so expected on Raspberry Pi 4, a system for computer education
and hobbyists. Also, kdump only cares whether the dump target devices,
namely the storage device or network card on a server, can address a
32-bit range. If it is finally confirmed that storage devices can only
address a 30-bit range on Raspberry Pi 4, people can still use the
crashkernel=xM@yM method to reserve crashkernel regions.

Thanks
Baoquan

