Re: [PATCH v2 8/8] crash_core.c: remove unneeded functions
On 2023/8/29 20:16, Baoquan He wrote: > So far, nobody calls functions parse_crashkernel_high() and > parse_crashkernel_low(), remove both of them. Reviewed-by: Zhen Lei > > Signed-off-by: Baoquan He > --- > include/linux/crash_core.h | 4 > kernel/crash_core.c| 18 -- > 2 files changed, 22 deletions(-) > > diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h > index 86e22e6a039f..d64006c4bd43 100644 > --- a/include/linux/crash_core.h > +++ b/include/linux/crash_core.h > @@ -83,10 +83,6 @@ void final_note(Elf_Word *buf); > int __init parse_crashkernel(char *cmdline, unsigned long long system_ram, > unsigned long long *crash_size, unsigned long long *crash_base, > unsigned long long *low_size, bool *high); > -int parse_crashkernel_high(char *cmdline, unsigned long long system_ram, > - unsigned long long *crash_size, unsigned long long *crash_base); > -int parse_crashkernel_low(char *cmdline, unsigned long long system_ram, > - unsigned long long *crash_size, unsigned long long *crash_base); > > #ifdef CONFIG_ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION > #ifndef DEFAULT_CRASH_KERNEL_LOW_SIZE > diff --git a/kernel/crash_core.c b/kernel/crash_core.c > index 6bc00cc390b5..61a8ea3b23a2 100644 > --- a/kernel/crash_core.c > +++ b/kernel/crash_core.c > @@ -323,24 +323,6 @@ int __init parse_crashkernel(char *cmdline, > return 0; > } > > -int __init parse_crashkernel_high(char *cmdline, > - unsigned long long system_ram, > - unsigned long long *crash_size, > - unsigned long long *crash_base) > -{ > - return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base, > - suffix_tbl[SUFFIX_HIGH]); > -} > - > -int __init parse_crashkernel_low(char *cmdline, > - unsigned long long system_ram, > - unsigned long long *crash_size, > - unsigned long long *crash_base) > -{ > - return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base, > - suffix_tbl[SUFFIX_LOW]); > -} > - > /* > * Add a dummy early_param handler to mark crashkernel= as a known command > line > * parameter and suppress incorrect warnings in init/main.c. > -- Regards, Zhen Lei ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v2 7/8] arm64: kdump: use generic interface to simplify crashkernel reservation
On 2023/8/29 20:16, Baoquan He wrote: > With the help of newly changed function parse_crashkernel() and > generic reserve_crashkernel_generic(), crashkernel reservation can be > simplified by steps: > > 1) Add a new header file , and define CRASH_ALIGN, >CRASH_ADDR_LOW_MAX, CRASH_ADDR_HIGH_MAX and >DEFAULT_CRASH_KERNEL_LOW_SIZE in ; > > 2) Add arch_reserve_crashkernel() to call parse_crashkernel() and >reserve_crashkernel_generic(); > > 3) Add ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION Kconfig in >arch/arm64/Kconfig. > > The old reserve_crashkernel_low() and reserve_crashkernel() can be > removed. Reviewed-by: Zhen Lei > > Signed-off-by: Baoquan He > --- > arch/arm64/Kconfig | 3 + > arch/arm64/include/asm/crash_core.h | 10 ++ > arch/arm64/mm/init.c| 140 ++-- > 3 files changed, 21 insertions(+), 132 deletions(-) > create mode 100644 arch/arm64/include/asm/crash_core.h > > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig > index 29db061db9bb..07fb8c71339d 100644 > --- a/arch/arm64/Kconfig > +++ b/arch/arm64/Kconfig > @@ -1481,6 +1481,9 @@ config KEXEC_FILE > for kernel and initramfs as opposed to list of segments as > accepted by previous system call. > > +config ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION > + def_bool CRASH_CORE > + > config KEXEC_SIG > bool "Verify kernel signature during kexec_file_load() syscall" > depends on KEXEC_FILE > diff --git a/arch/arm64/include/asm/crash_core.h > b/arch/arm64/include/asm/crash_core.h > new file mode 100644 > index ..9f5c8d339f44 > --- /dev/null > +++ b/arch/arm64/include/asm/crash_core.h > @@ -0,0 +1,10 @@ > +/* SPDX-License-Identifier: GPL-2.0-only */ > +#ifndef _ARM64_CRASH_CORE_H > +#define _ARM64_CRASH_CORE_H > + > +/* Current arm64 boot protocol requires 2MB alignment */ > +#define CRASH_ALIGN SZ_2M > + > +#define CRASH_ADDR_LOW_MAX arm64_dma_phys_limit > +#define CRASH_ADDR_HIGH_MAX (PHYS_MASK + 1) > +#endif > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c > index 4ad637508b75..48ab23531bb6 100644 > --- a/arch/arm64/mm/init.c > +++ b/arch/arm64/mm/init.c > @@ -64,15 +64,6 @@ EXPORT_SYMBOL(memstart_addr); > */ > phys_addr_t __ro_after_init arm64_dma_phys_limit; > > -/* Current arm64 boot protocol requires 2MB alignment */ > -#define CRASH_ALIGN SZ_2M > - > -#define CRASH_ADDR_LOW_MAX arm64_dma_phys_limit > -#define CRASH_ADDR_HIGH_MAX (PHYS_MASK + 1) > -#define CRASH_HIGH_SEARCH_BASE SZ_4G > - > -#define DEFAULT_CRASH_KERNEL_LOW_SIZE(128UL << 20) > - > /* > * To make optimal use of block mappings when laying out the linear > * mapping, round down the base of physical memory to a size that can > @@ -100,140 +91,25 @@ phys_addr_t __ro_after_init arm64_dma_phys_limit; > #define ARM64_MEMSTART_ALIGN (1UL << ARM64_MEMSTART_SHIFT) > #endif > > -static int __init reserve_crashkernel_low(unsigned long long low_size) > -{ > - unsigned long long low_base; > - > - low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, > CRASH_ADDR_LOW_MAX); > - if (!low_base) { > - pr_err("cannot allocate crashkernel low memory > (size:0x%llx).\n", low_size); > - return -ENOMEM; > - } > - > - pr_info("crashkernel low memory reserved: 0x%08llx - 0x%08llx (%lld > MB)\n", > - low_base, low_base + low_size, low_size >> 20); > - > - crashk_low_res.start = low_base; > - crashk_low_res.end = low_base + low_size - 1; > - insert_resource(&iomem_resource, &crashk_low_res); > - > - return 0; > -} > - > -/* > - * reserve_crashkernel() - reserves memory for crash kernel > - * > - * This function reserves memory area given in "crashkernel=" kernel command > - * line parameter. The memory reserved is used by dump capture kernel when > - * primary kernel is crashing. > - */ > -static void __init reserve_crashkernel(void) > +static void __init arch_reserve_crashkernel(void) > { > - unsigned long long crash_low_size = 0, search_base = 0; > - unsigned long long crash_max = CRASH_ADDR_LOW_MAX; > + unsigned long long low_size = 0; > unsigned long long crash_base, crash_size; > char *cmdline = boot_command_line; > - bool fixed_base = false; > bool high = false; > int ret; > > if (!IS_ENABLED(CONFIG_KEXEC_CORE)) > return; > > - /* crashkernel=X[@offset] */ > ret = parse_crashkernel(cmdline, memblock_phys_mem_size(), > - &crash_size, &crash_base, NULL, NULL); > - if (ret == -ENOENT) { > - ret = parse_crashkernel_high(cmdline, 0, &crash_size, > &crash_base); > - if (ret || !crash_size) > - return; > - > - /* > - * crashkernel=Y,low can be specified or not, but invalid value > -
Re: [PATCH v2 6/8] x86: kdump: use generic interface to simplify crashkernel reservation code
On 2023/8/29 20:16, Baoquan He wrote: > With the help of newly changed function parse_crashkernel() and > generic reserve_crashkernel_generic(), crashkernel reservation can be > simplified by steps: > > 1) Add a new header file , and define CRASH_ALIGN, >CRASH_ADDR_LOW_MAX, CRASH_ADDR_HIGH_MAX and >DEFAULT_CRASH_KERNEL_LOW_SIZE in ; > > 2) Add arch_reserve_crashkernel() to call parse_crashkernel() and >reserve_crashkernel_generic(), and do the ARCH specific work if >needed. > > 3) Add ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION Kconfig in >arch/x86/Kconfig. > > When adding DEFAULT_CRASH_KERNEL_LOW_SIZE, add crash_low_size_default() > to calculate crashkernel low memory because x86_64 has special > requirement. > > The old reserve_crashkernel_low() and reserve_crashkernel() can be > removed. > > Signed-off-by: Baoquan He > --- > arch/x86/Kconfig | 3 + > arch/x86/include/asm/crash_core.h | 34 +++ > arch/x86/kernel/setup.c | 144 -- > 3 files changed, 53 insertions(+), 128 deletions(-) > create mode 100644 arch/x86/include/asm/crash_core.h > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > index 8d9e4b362572..c4539dc35985 100644 > --- a/arch/x86/Kconfig > +++ b/arch/x86/Kconfig > @@ -2037,6 +2037,9 @@ config KEXEC_FILE > config ARCH_HAS_KEXEC_PURGATORY > def_bool KEXEC_FILE > > +config ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION > + def_bool CRASH_CORE > + > config KEXEC_SIG > bool "Verify kernel signature during kexec_file_load() syscall" > depends on KEXEC_FILE > diff --git a/arch/x86/include/asm/crash_core.h > b/arch/x86/include/asm/crash_core.h > new file mode 100644 > index ..5fc5e4f94521 > --- /dev/null > +++ b/arch/x86/include/asm/crash_core.h > @@ -0,0 +1,34 @@ > +/* SPDX-License-Identifier: GPL-2.0 */ > +#ifndef _X86_CRASH_CORE_H > +#define _X86_CRASH_CORE_H > + > +/* 16M alignment for crash kernel regions */ > +#define CRASH_ALIGN SZ_16M > + > +/* > + * Keep the crash kernel below this limit. > + * > + * Earlier 32-bits kernels would limit the kernel to the low 512 MB range > + * due to mapping restrictions. > + * > + * 64-bit kdump kernels need to be restricted to be under 64 TB, which is > + * the upper limit of system RAM in 4-level paging mode. Since the kdump > + * jump could be from 5-level paging to 4-level paging, the jump will fail if > + * the kernel is put above 64 TB, and during the 1st kernel bootup there's > + * no good way to detect the paging mode of the target kernel which will be > + * loaded for dumping. > + */ > + > +#ifdef CONFIG_X86_32 > +# define CRASH_ADDR_LOW_MAX SZ_512M > +# define CRASH_ADDR_HIGH_MAXSZ_512M > +#else > +# define CRASH_ADDR_LOW_MAX SZ_4G > +# define CRASH_ADDR_HIGH_MAXSZ_64T > +#endif > + > +# define DEFAULT_CRASH_KERNEL_LOW_SIZE crash_low_size_default() > + > +extern unsigned long crash_low_size_default(void); > + > +#endif /* _X86_CRASH_CORE_H */ > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c > index 382c66d2cf71..559a5c4141db 100644 > --- a/arch/x86/kernel/setup.c > +++ b/arch/x86/kernel/setup.c > @@ -474,152 +474,40 @@ static void __init > memblock_x86_reserve_range_setup_data(void) > /* > * - Crashkernel reservation -- > */ > - > -/* 16M alignment for crash kernel regions */ > -#define CRASH_ALIGN SZ_16M > - > -/* > - * Keep the crash kernel below this limit. > - * > - * Earlier 32-bits kernels would limit the kernel to the low 512 MB range > - * due to mapping restrictions. > - * > - * 64-bit kdump kernels need to be restricted to be under 64 TB, which is > - * the upper limit of system RAM in 4-level paging mode. Since the kdump > - * jump could be from 5-level paging to 4-level paging, the jump will fail if > - * the kernel is put above 64 TB, and during the 1st kernel bootup there's > - * no good way to detect the paging mode of the target kernel which will be > - * loaded for dumping. > - */ > -#ifdef CONFIG_X86_32 > -# define CRASH_ADDR_LOW_MAX SZ_512M > -# define CRASH_ADDR_HIGH_MAX SZ_512M > -#else > -# define CRASH_ADDR_LOW_MAX SZ_4G > -# define CRASH_ADDR_HIGH_MAX SZ_64T > -#endif > - > -static int __init reserve_crashkernel_low(void) > +unsigned long crash_low_size_default(void) > { > #ifdef CONFIG_X86_64 > - unsigned long long base, low_base = 0, low_size = 0; > - unsigned long low_mem_limit; > - int ret; > - > - low_mem_limit = min(memblock_phys_mem_size(), CRASH_ADDR_LOW_MAX); > - > - /* crashkernel=Y,low */ > - ret = parse_crashkernel_low(boot_command_line, low_mem_limit, > &low_size, &base); > - if (ret) { > - /* > - * two parts from kernel/dma/swiotlb.c: > - * -swiotlb size: user-specified with swiotlb= or default. > - * > - * -swiotlb overflow buffer: now hardcoded to 32k. We round it > - * to 8M for
Re: [PATCH v2 4/8] crash_core: add generic function to do reservation
On 2023/8/29 20:16, Baoquan He wrote: > In architecture like x86_64, arm64 and riscv, they have vast virtual > address space and usually have huge physical memory RAM. Their > crashkernel reservation doesn't have to be limited under 4G RAM, > but can be extended to the whole physical memory via crashkernel=,high > support. > > Now add function reserve_crashkernel_generic() to reserve crashkernel > memory if users specify any case of kernel pamameters, like > crashkernel=xM[@offset] or crashkernel=,high|low. > > This is preparation to simplify code of crashkernel=,high support > in architecutures. > > Signed-off-by: Baoquan He > --- > include/linux/crash_core.h | 34 ++-- > kernel/crash_core.c| 109 - > 2 files changed, 136 insertions(+), 7 deletions(-) > > diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h > index 85260bf4a734..2f732493e922 100644 > --- a/include/linux/crash_core.h > +++ b/include/linux/crash_core.h > @@ -77,12 +77,6 @@ Elf_Word *append_elf_note(Elf_Word *buf, char *name, > unsigned int type, > void *data, size_t data_len); > void final_note(Elf_Word *buf); > > -#ifdef CONFIG_ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION > -#ifndef DEFAULT_CRASH_KERNEL_LOW_SIZE > -#define DEFAULT_CRASH_KERNEL_LOW_SIZE (128UL << 20) > -#endif > -#endif > - > int __init parse_crashkernel(char *cmdline, unsigned long long system_ram, > unsigned long long *crash_size, unsigned long long *crash_base, > unsigned long long *low_size, bool *high); > @@ -91,4 +85,32 @@ int parse_crashkernel_high(char *cmdline, unsigned long > long system_ram, > int parse_crashkernel_low(char *cmdline, unsigned long long system_ram, > unsigned long long *crash_size, unsigned long long *crash_base); > > +#ifdef CONFIG_ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION > +#ifndef DEFAULT_CRASH_KERNEL_LOW_SIZE > +#define DEFAULT_CRASH_KERNEL_LOW_SIZE(128UL << 20) > +#endif > +#ifndef CRASH_ALIGN > +#define CRASH_ALIGN SZ_2M > +#endif > +#ifndef CRASH_ADDR_LOW_MAX > +#define CRASH_ADDR_LOW_MAX SZ_4G > +#endif > +#ifndef CRASH_ADDR_HIGH_MAX > +#define CRASH_ADDR_HIGH_MAX memblock_end_of_DRAM() > +#endif > + > +void __init reserve_crashkernel_generic(char *cmdline, > + unsigned long long crash_size, > + unsigned long long crash_base, > + unsigned long long crash_low_size, > + bool high); > +#else > +static inline void __init reserve_crashkernel_generic(char *cmdline, > + unsigned long long crash_size, > + unsigned long long crash_base, > + unsigned long long crash_low_size, > + bool high) > +{} > +#endif > + > #endif /* LINUX_CRASH_CORE_H */ > diff --git a/kernel/crash_core.c b/kernel/crash_core.c > index 355b0ab5189c..6bc00cc390b5 100644 > --- a/kernel/crash_core.c > +++ b/kernel/crash_core.c > @@ -5,11 +5,13 @@ > */ > > #include > -#include > #include > #include > #include > #include > +#include > +#include > +#include > > #include > #include > @@ -349,6 +351,111 @@ static int __init parse_crashkernel_dummy(char *arg) > } > early_param("crashkernel", parse_crashkernel_dummy); > > +#ifdef CONFIG_ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION > +static int __init reserve_crashkernel_low(unsigned long long low_size) > +{ > +#ifdef CONFIG_64BIT > + unsigned long long low_base; > + > + low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, > CRASH_ADDR_LOW_MAX); > + if (!low_base) { > + pr_err("cannot allocate crashkernel low memory > (size:0x%llx).\n", low_size); > + return -ENOMEM; > + } > + > + pr_info("crashkernel low memory reserved: 0x%08llx - 0x%08llx (%lld > MB)\n", > + low_base, low_base + low_size, low_size >> 20); > + > + crashk_low_res.start = low_base; > + crashk_low_res.end = low_base + low_size - 1; > + insert_resource(&iomem_resource, &crashk_low_res); > +#endif > + return 0; > +} > + > +void __init reserve_crashkernel_generic(char *cmdline, > + unsigned long long crash_size, > + unsigned long long crash_base, > + unsigned long long crash_low_size, > + bool high) > +{ > + unsigned long long search_end = CRASH_ADDR_LOW_MAX, search_base = 0; > + bool fixed_base = false; > + > + /* User specifies base address explicitly. */ > + if (crash_base) { > + fixed_base = true; > + search_base = crash_base; > + search_end = crash_base + crash_size; > + } > + > + if (high) { It might be a little clearer to use "else if (high) {" > + search_base = CRASH_ADDR_LOW_MAX; > + search_end = CRASH_ADDR_HIGH_MAX; > + } > + > +retry: > + crash_base = memblock_phys_alloc_range(cr
Re: [PATCH v2 3/8] crash_core: change parse_crashkernel() to support crashkernel=,high|low parsing
On 2023/8/29 20:16, Baoquan He wrote: > Now parse_crashkernel() is a real entry point for all kinds of > crahskernel parsing on any architecture. > > And wrap the crahskernel=,high|low handling inside > CONFIG_ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION ifdeffery scope. > > Signed-off-by: Baoquan He > --- > include/linux/crash_core.h | 6 ++ > kernel/crash_core.c| 28 +++- > 2 files changed, 33 insertions(+), 1 deletion(-) > > diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h > index 2e76289699ff..85260bf4a734 100644 > --- a/include/linux/crash_core.h > +++ b/include/linux/crash_core.h > @@ -77,6 +77,12 @@ Elf_Word *append_elf_note(Elf_Word *buf, char *name, > unsigned int type, > void *data, size_t data_len); > void final_note(Elf_Word *buf); > > +#ifdef CONFIG_ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION > +#ifndef DEFAULT_CRASH_KERNEL_LOW_SIZE > +#define DEFAULT_CRASH_KERNEL_LOW_SIZE (128UL << 20) > +#endif > +#endif > + > int __init parse_crashkernel(char *cmdline, unsigned long long system_ram, > unsigned long long *crash_size, unsigned long long *crash_base, > unsigned long long *low_size, bool *high); > diff --git a/kernel/crash_core.c b/kernel/crash_core.c > index f6a5c219e2e1..355b0ab5189c 100644 > --- a/kernel/crash_core.c > +++ b/kernel/crash_core.c > @@ -276,6 +276,9 @@ static int __init __parse_crashkernel(char *cmdline, > /* > * That function is the entry point for command line parsing and should be > * called from the arch-specific code. > + * > + * If crashkernel=,high|low is supported on architecture, non-NULL values > + * should be passed to parameters 'low_size' and 'high'. > */ > int __init parse_crashkernel(char *cmdline, >unsigned long long system_ram, > @@ -291,7 +294,30 @@ int __init parse_crashkernel(char *cmdline, > crash_base, NULL); > if (!high) > return ret; > - > +#ifdef CONFIG_ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION > + else if (ret == -ENOENT) { > + ret = __parse_crashkernel(cmdline, 0, crash_size, > + crash_base, suffix_tbl[SUFFIX_HIGH]); > + if (ret || !*crash_size) > + return -1; Change -1 to -EINVAL? > + > + /* > + * crashkernel=Y,low can be specified or not, but invalid value > + * is not allowed. > + */ > + ret = __parse_crashkernel(cmdline, 0, low_size, > + crash_base, suffix_tbl[SUFFIX_LOW]); > + if (ret == -ENOENT) > + *low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE; > + else if (ret) > + return -1; return ret; > + > + *high = true; > + } else if (ret || !*crash_size) { This check can be moved outside of #ifdef. Because even '!high', it's completely applicable. The overall adjustment is as follows: - if (!high) - return ret; #ifdef CONFIG_ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION if (high && ret == -ENOENT) { ... ... if (ret || !*crash_size)//parse HIGH ... ... } //At this point, *crash_size is not 0 and ret is 0. //We can also delete if (!*crash_size) above because it will be checked later. #endif if (!*crash_size) ret = -EINVAL; return ret; - return 0; > + /* The specified value is invalid */ > + return -1; > + } > +#endif > return 0; > } > > -- Regards, Zhen Lei ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v2 2/8] crash_core: change the prototype of function parse_crashkernel()
On 2023/8/29 20:16, Baoquan He wrote: > Add two parameters 'low_size' and 'high' to function parse_crashkernel(), > later crashkernel=,high|low parsing will be added. Make adjustments in all > call sites of parse_crashkernel() in arch. Reviewed-by: Zhen Lei > > Signed-off-by: Baoquan He > --- > arch/arm/kernel/setup.c | 3 ++- > arch/arm64/mm/init.c | 2 +- > arch/ia64/kernel/setup.c | 2 +- > arch/loongarch/kernel/setup.c| 4 +++- > arch/mips/kernel/setup.c | 3 ++- > arch/powerpc/kernel/fadump.c | 2 +- > arch/powerpc/kexec/core.c| 2 +- > arch/powerpc/mm/nohash/kaslr_booke.c | 2 +- > arch/riscv/mm/init.c | 2 +- > arch/s390/kernel/setup.c | 4 ++-- > arch/sh/kernel/machine_kexec.c | 2 +- > arch/x86/kernel/setup.c | 3 ++- > include/linux/crash_core.h | 3 ++- > kernel/crash_core.c | 15 --- > 14 files changed, 32 insertions(+), 17 deletions(-) > > diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c > index c66b560562b3..e2bb7afd0683 100644 > --- a/arch/arm/kernel/setup.c > +++ b/arch/arm/kernel/setup.c > @@ -1010,7 +1010,8 @@ static void __init reserve_crashkernel(void) > > total_mem = get_total_mem(); > ret = parse_crashkernel(boot_command_line, total_mem, > - &crash_size, &crash_base); > + &crash_size, &crash_base, > + NULL, NULL); > /* invalid value specified or crashkernel=0 */ > if (ret || !crash_size) > return; > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c > index 4fcb88a445ef..4ad637508b75 100644 > --- a/arch/arm64/mm/init.c > +++ b/arch/arm64/mm/init.c > @@ -142,7 +142,7 @@ static void __init reserve_crashkernel(void) > > /* crashkernel=X[@offset] */ > ret = parse_crashkernel(cmdline, memblock_phys_mem_size(), > - &crash_size, &crash_base); > + &crash_size, &crash_base, NULL, NULL); > if (ret == -ENOENT) { > ret = parse_crashkernel_high(cmdline, 0, &crash_size, > &crash_base); > if (ret || !crash_size) > diff --git a/arch/ia64/kernel/setup.c b/arch/ia64/kernel/setup.c > index 5a55ac82c13a..4faea2d2cf07 100644 > --- a/arch/ia64/kernel/setup.c > +++ b/arch/ia64/kernel/setup.c > @@ -277,7 +277,7 @@ static void __init setup_crashkernel(unsigned long total, > int *n) > int ret; > > ret = parse_crashkernel(boot_command_line, total, > - &size, &base); > + &size, &base, NULL, NULL); > if (ret == 0 && size > 0) { > if (!base) { > sort_regions(rsvd_region, *n); > diff --git a/arch/loongarch/kernel/setup.c b/arch/loongarch/kernel/setup.c > index 9d830ab4e302..776a068d8718 100644 > --- a/arch/loongarch/kernel/setup.c > +++ b/arch/loongarch/kernel/setup.c > @@ -267,7 +267,9 @@ static void __init arch_parse_crashkernel(void) > unsigned long long crash_base, crash_size; > > total_mem = memblock_phys_mem_size(); > - ret = parse_crashkernel(boot_command_line, total_mem, &crash_size, > &crash_base); > + ret = parse_crashkernel(boot_command_line, total_mem, > + &crash_size, &crash_base, > + NULL, NULL); > if (ret < 0 || crash_size <= 0) > return; > > diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c > index cb871eb784a7..08321c945ac4 100644 > --- a/arch/mips/kernel/setup.c > +++ b/arch/mips/kernel/setup.c > @@ -460,7 +460,8 @@ static void __init mips_parse_crashkernel(void) > > total_mem = memblock_phys_mem_size(); > ret = parse_crashkernel(boot_command_line, total_mem, > - &crash_size, &crash_base); > + &crash_size, &crash_base, > + NULL, NULL); > if (ret != 0 || crash_size <= 0) > return; > > diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c > index ea0a073abd96..7dbdeba56e74 100644 > --- a/arch/powerpc/kernel/fadump.c > +++ b/arch/powerpc/kernel/fadump.c > @@ -313,7 +313,7 @@ static __init u64 fadump_calculate_reserve_size(void) >* memory at a predefined offset. >*/ > ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(), > - &size, &base); > + &size, &base, NULL, NULL); > if (ret == 0 && size > 0) { > unsigned long max_size; > > diff --git a/arch/powerpc/kexec/core.c b/arch/powerpc/kexec/core.c > index de64c7962991..9346c960b296 100644 > --- a/arch/powerpc/kexec/core.c > +++ b/arch/powerpc/kexec/core.c > @@ -109,7 +109,7 @@ void __init reserve_crashkernel(void) > total_mem_sz = memory_limit ? memory_limit :
Re: [PATCH v2 1/8] crash_core.c: remove unnecessary parameter of function
On 2023/8/29 20:16, Baoquan He wrote: > In all call sites of __parse_crashkernel(), the parameter 'name' is > hardcoded as "crashkernel=". So remove the unnecessary parameter 'name', > add local varibale 'name' inside __parse_crashkernel() instead. Reviewed-by: Zhen Lei > > Signed-off-by: Baoquan He > --- > kernel/crash_core.c | 8 > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/kernel/crash_core.c b/kernel/crash_core.c > index 90ce1dfd591c..f27b4e45d410 100644 > --- a/kernel/crash_core.c > +++ b/kernel/crash_core.c > @@ -241,11 +241,11 @@ static int __init __parse_crashkernel(char *cmdline, >unsigned long long system_ram, >unsigned long long *crash_size, >unsigned long long *crash_base, > - const char *name, >const char *suffix) > { > char*first_colon, *first_space; > char*ck_cmdline; > + char*name = "crashkernel="; > > BUG_ON(!crash_size || !crash_base); > *crash_size = 0; > @@ -283,7 +283,7 @@ int __init parse_crashkernel(char *cmdline, >unsigned long long *crash_base) > { > return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base, > - "crashkernel=", NULL); > + NULL); > } > > int __init parse_crashkernel_high(char *cmdline, > @@ -292,7 +292,7 @@ int __init parse_crashkernel_high(char *cmdline, >unsigned long long *crash_base) > { > return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base, > - "crashkernel=", suffix_tbl[SUFFIX_HIGH]); > + suffix_tbl[SUFFIX_HIGH]); > } > > int __init parse_crashkernel_low(char *cmdline, > @@ -301,7 +301,7 @@ int __init parse_crashkernel_low(char *cmdline, >unsigned long long *crash_base) > { > return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base, > - "crashkernel=", suffix_tbl[SUFFIX_LOW]); > + suffix_tbl[SUFFIX_LOW]); > } > > /* > -- Regards, Zhen Lei ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [RFC PATCH] Introduce persistent memory pool
On Wed, Aug 30, 2023, at 03:20, Alexander Graf wrote: > On 30.08.23 00:07, Stanislav Kinsburskii wrote: >> On Mon, Aug 28, 2023 at 10:50:19PM +0200, Alexander Graf wrote: >> Device tree or ACPI are options indeed. However AFAIU in case of DT user >> space has to involved into the picture to modify and complie it, while >> ACPI isn't flexible or easily extendable. >> Also, AFAIU both these standards were designed with passing >> hardware-specific data in mind from bootstrap software to an OS kernel >> and thus were never really intended to be used for creating a persistent >> state accross kexec. >> To me, an attempt to use either of them to pass kernel-specific data looks >> like an abuse (or misuse) excused by the simplicity of implementation. > > > What I was describing above is that the Linux boot protocol already has > natural ways to pass a DT (arm) or set of ACPI tables (x86) to the > target kernel. Whatever we do here should either piggy back on top of > those natural mechanisms (e.g. /chosen node in DT) or be on the same > level (e.g. pass DT in one register, pass metadata structure in another > register). > > When it comes to the actual content of the metadata, I'm personally also > leaning towards DT. We already have libfdt inside the kernel. It gives > is a very simple, well understood structured file format that you can > extend, version, etc etc. And the kernel has mechanisms to modify fdt > contents. Agreed. This also makes a lot of sense since the fdt format was originally introduced for this exact purpose, to be a key-value store to pass data from the running kernel to the next one after kexec when the original source of the data (originally open firmware) is gone. It only turned into the generic way to describe embedded systems later on, but both the fdt binary format and the kexec infrastructure for manipulating and passing the blob should be easy to reuse for additional purposes as long as the contents are put into appropriate namespaces that don't clash with existing usage. Arnd ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [RFC] IMA Log Snapshotting Design Proposal
On Wed, Aug 30, 2023 at 7:07 PM Mimi Zohar wrote: > On Wed, 2023-08-30 at 18:23 -0400, Paul Moore wrote: > > On Wed, Aug 30, 2023 at 6:21 PM Paul Moore wrote: > > > On Wed, Aug 30, 2023 at 5:50 PM Mimi Zohar wrote: > > > > On Wed, 2023-08-30 at 16:47 -0400, Paul Moore wrote: > > > > > On Wed, Aug 30, 2023 at 4:25 PM Mimi Zohar > > > > > wrote: > > > > > > Your initial question was "what happens if the file/filesystem > > > > > > becomes > > > > > > inaccessible at some point and an attestation client attempts to > > > > > > read > > > > > > the entire log?". For what reason would it be inaccessible? For > > > > > > the > > > > > > original single tmpfs file, what would make it inaccessible? > > > > > > > > > > In your reply that I had responded to you had mentioned that the > > > > > kernel was simply being passed a fd and taking ownership of it, the fd > > > > > could either be a tmpfs backed file or some form of persistent storage > > > > > as both were discussed in this thread. I imagine a tmpfs filesystem > > > > > could still be forcibly unmounted, resulting in problems, but I can't > > > > > say that for certain. However, there are definitely cases where a fd > > > > > backed against an arbitrary filesystem could run into problems: > > > > > storage device issues for local filesystems, networking issues for > > > > > network filesystems, and good old fashioned user/admin intervention in > > > > > both cases. > > > > > > > > "I imagine tmpfs filesystem could still be forcibly unmounted" sounds > > > > like an attack. Not being able to verify the measurement list against a > > > > quote is probably a good thing. > > > > > > Okay, can you answer the question for an arbitrary persistent > > > filesystem? That was always the more important question, ... > > The original proposal, not mine, suggested using a tmpfs file. The > idea of writing the measurements to a file on a persistent filesystem > wasn't mine either. Sush/Tushar were pushing for writing to a > persistent file(s). No argument from me that writing to a file on an > arbitrary persistent filesystem is not a good idea. Progress. Okay, so we all now agree that the kernel writing to an arbitrary filesystem is not a good idea, and using tmpfs doesn't solve the problem of general memory pressure (from previously in this thread), that's all helpful and good to clarify. Assuming Sush and Tushar rework the document to clarify the motivation/purpose for the work, as you suggested earlier, I'm assuming we can revisit this problem and solutions? -- paul-moore.com ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [RFC] IMA Log Snapshotting Design Proposal
On Wed, 2023-08-30 at 18:23 -0400, Paul Moore wrote: > On Wed, Aug 30, 2023 at 6:21 PM Paul Moore wrote: > > On Wed, Aug 30, 2023 at 5:50 PM Mimi Zohar wrote: > > > On Wed, 2023-08-30 at 16:47 -0400, Paul Moore wrote: > > > > On Wed, Aug 30, 2023 at 4:25 PM Mimi Zohar wrote: > > > > > Your initial question was "what happens if the file/filesystem becomes > > > > > inaccessible at some point and an attestation client attempts to read > > > > > the entire log?". For what reason would it be inaccessible? For the > > > > > original single tmpfs file, what would make it inaccessible? > > > > > > > > In your reply that I had responded to you had mentioned that the > > > > kernel was simply being passed a fd and taking ownership of it, the fd > > > > could either be a tmpfs backed file or some form of persistent storage > > > > as both were discussed in this thread. I imagine a tmpfs filesystem > > > > could still be forcibly unmounted, resulting in problems, but I can't > > > > say that for certain. However, there are definitely cases where a fd > > > > backed against an arbitrary filesystem could run into problems: > > > > storage device issues for local filesystems, networking issues for > > > > network filesystems, and good old fashioned user/admin intervention in > > > > both cases. > > > > > > "I imagine tmpfs filesystem could still be forcibly unmounted" sounds > > > like an attack. Not being able to verify the measurement list against a > > > quote is probably a good thing. > > > > Okay, can you answer the question for an arbitrary persistent > > filesystem? That was always the more important question, ... The original proposal, not mine, suggested using a tmpfs file. The idea of writing the measurements to a file on a persistent filesystem wasn't mine either. Sush/Tushar were pushing for writing to a persistent file(s). No argument from me that writing to a file on an arbitrary persistent filesystem is not a good idea. -- thanks, Mimi ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [RFC] IMA Log Snapshotting Design Proposal
On Wed, Aug 30, 2023 at 6:21 PM Paul Moore wrote: > On Wed, Aug 30, 2023 at 5:50 PM Mimi Zohar wrote: > > On Wed, 2023-08-30 at 16:47 -0400, Paul Moore wrote: > > > On Wed, Aug 30, 2023 at 4:25 PM Mimi Zohar wrote: > > > > Your initial question was "what happens if the file/filesystem becomes > > > > inaccessible at some point and an attestation client attempts to read > > > > the entire log?". For what reason would it be inaccessible? For the > > > > original single tmpfs file, what would make it inaccessible? > > > > > > In your reply that I had responded to you had mentioned that the > > > kernel was simply being passed a fd and taking ownership of it, the fd > > > could either be a tmpfs backed file or some form of persistent storage > > > as both were discussed in this thread. I imagine a tmpfs filesystem > > > could still be forcibly unmounted, resulting in problems, but I can't > > > say that for certain. However, there are definitely cases where a fd > > > backed against an arbitrary filesystem could run into problems: > > > storage device issues for local filesystems, networking issues for > > > network filesystems, and good old fashioned user/admin intervention in > > > both cases. > > > > "I imagine tmpfs filesystem could still be forcibly unmounted" sounds > > like an attack. Not being able to verify the measurement list against a > > quote is probably a good thing. > > Okay, can you answer the question for an arbitrary persistent > filesystem? That was always the more important question, and your > continued avoidance is getting me increasingly annoyed. Speaking of being annoyed, I'm fixing Tushar's email as the bounces are also driving me nuts. -- paul-moore.com ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [RFC] IMA Log Snapshotting Design Proposal
On Wed, Aug 30, 2023 at 5:50 PM Mimi Zohar wrote: > On Wed, 2023-08-30 at 16:47 -0400, Paul Moore wrote: > > On Wed, Aug 30, 2023 at 4:25 PM Mimi Zohar wrote: > > > Your initial question was "what happens if the file/filesystem becomes > > > inaccessible at some point and an attestation client attempts to read > > > the entire log?". For what reason would it be inaccessible? For the > > > original single tmpfs file, what would make it inaccessible? > > > > In your reply that I had responded to you had mentioned that the > > kernel was simply being passed a fd and taking ownership of it, the fd > > could either be a tmpfs backed file or some form of persistent storage > > as both were discussed in this thread. I imagine a tmpfs filesystem > > could still be forcibly unmounted, resulting in problems, but I can't > > say that for certain. However, there are definitely cases where a fd > > backed against an arbitrary filesystem could run into problems: > > storage device issues for local filesystems, networking issues for > > network filesystems, and good old fashioned user/admin intervention in > > both cases. > > "I imagine tmpfs filesystem could still be forcibly unmounted" sounds > like an attack. Not being able to verify the measurement list against a > quote is probably a good thing. Okay, can you answer the question for an arbitrary persistent filesystem? That was always the more important question, and your continued avoidance is getting me increasingly annoyed. -- paul-moore.com ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [RFC] IMA Log Snapshotting Design Proposal
On Wed, 2023-08-30 at 16:47 -0400, Paul Moore wrote: > On Wed, Aug 30, 2023 at 4:25 PM Mimi Zohar wrote: > > Your initial question was "what happens if the file/filesystem becomes > > inaccessible at some point and an attestation client attempts to read > > the entire log?". For what reason would it be inaccessible? For the > > original single tmpfs file, what would make it inaccessible? > > In your reply that I had responded to you had mentioned that the > kernel was simply being passed a fd and taking ownership of it, the fd > could either be a tmpfs backed file or some form of persistent storage > as both were discussed in this thread. I imagine a tmpfs filesystem > could still be forcibly unmounted, resulting in problems, but I can't > say that for certain. However, there are definitely cases where a fd > backed against an arbitrary filesystem could run into problems: > storage device issues for local filesystems, networking issues for > network filesystems, and good old fashioned user/admin intervention in > both cases. "I imagine tmpfs filesystem could still be forcibly unmounted" sounds like an attack. Not being able to verify the measurement list against a quote is probably a good thing. > > > In the > > "snapshotting" design this problem becomes a userspace issue. > > Yes, exactly. Userspace is almost always going to have an easier time > recovering from these types of errors ... or at least I believe so, > perhaps you have a clever solution for how the kernel can handle the > file/filesystem disappearing from under the fd? Nothing changes. New measurements are initially stored in kernel memory, until they are successfully copied. > > The first sentence of the cover letter is "Depending on the IMA policy, > > the IMA log can consume a lot of Kernel memory on the device." ... > > As I'm still looking for an answer to my question, let's stay focused > on that before we start worrying too much about the phrasing of the > design proposal that was submitted. It's more than just phrasing, but the purpose/motivation of the proposed changes. -- thanks, Mimi ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [RFC] IMA Log Snapshotting Design Proposal
On Wed, Aug 30, 2023 at 4:25 PM Mimi Zohar wrote: > Your initial question was "what happens if the file/filesystem becomes > inaccessible at some point and an attestation client attempts to read > the entire log?". For what reason would it be inaccessible? For the > original single tmpfs file, what would make it inaccessible? In your reply that I had responded to you had mentioned that the kernel was simply being passed a fd and taking ownership of it, the fd could either be a tmpfs backed file or some form of persistent storage as both were discussed in this thread. I imagine a tmpfs filesystem could still be forcibly unmounted, resulting in problems, but I can't say that for certain. However, there are definitely cases where a fd backed against an arbitrary filesystem could run into problems: storage device issues for local filesystems, networking issues for network filesystems, and good old fashioned user/admin intervention in both cases. > In the > "snapshotting" design this problem becomes a userspace issue. Yes, exactly. Userspace is almost always going to have an easier time recovering from these types of errors ... or at least I believe so, perhaps you have a clever solution for how the kernel can handle the file/filesystem disappearing from under the fd? > The first sentence of the cover letter is "Depending on the IMA policy, > the IMA log can consume a lot of Kernel memory on the device." ... As I'm still looking for an answer to my question, let's stay focused on that before we start worrying too much about the phrasing of the design proposal that was submitted. -- paul-moore.com ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [RFC] IMA Log Snapshotting Design Proposal
On Tue, 2023-08-29 at 19:15 -0400, Paul Moore wrote: > On Tue, Aug 29, 2023 at 5:54 PM Mimi Zohar wrote: > > On Tue, 2023-08-29 at 17:30 -0400, Paul Moore wrote: > > > On Tue, Aug 29, 2023 at 5:05 PM Mimi Zohar wrote: > > > > On Tue, 2023-08-29 at 15:34 -0400, Paul Moore wrote: > > > > > On Mon, Aug 21, 2023 at 7:08 PM Mimi Zohar > > > > > wrote: > > > > > > On Mon, 2023-08-21 at 15:05 -0700, Sush Shringarputale wrote: > > > > > > > On 8/14/2023 3:02 PM, Mimi Zohar wrote: > > > > > > > > On Mon, 2023-08-14 at 14:42 -0700, Sush Shringarputale wrote: > > > > > > > >>> This design seems overly complex and requires synchronization > > > > > > > >>> between > > > > > > > >>> the "snapshot" record and exporting the records from the > > > > > > > >>> measurement > > > > > > > >>> list. None of this would be necessary if the measurements > > > > > > > >>> were copied > > > > > > > >>> from kernel memory to a backing file (e.g. tmpfs), as > > > > > > > >>> described in [1]. > > > > > > > Even if the Kernel maintains the link between a tmpfs exported > > > > > > > and an > > > > > > > in-memory IMA log - it still has to copy the tmpfs portion to the > > > > > > > Kernel memory during kexec soft boot. tmpfs is cleared during > > > > > > > kexec, > > > > > > > so this copying of tmpfs back to kernel memory is necessary to > > > > > > > preserve > > > > > > > the integrity of the log during kexec. But the copying would add > > > > > > > back > > > > > > > the memory pressure on the node during kexec (which may result in > > > > > > > out-of-memory), defeating the purpose of the overall > > > > > > > effort/feature. > > > > > > > Copying to a regular *persistent* protected file seems a cleaner > > > > > > > approach, compared to tmpfs. > > > > > > > > > > > > From a kernel perspective, it doesn't make a difference if userspace > > > > > > provides a tmpfs or persistent file. As per the discussion > > > > > > https://lore.kernel.org/linux-integrity/caoq4uxj4pv2wr1wgvbcdr-tna5dszt3rvddzkgah1aev_-r...@mail.gmail.com/#t > > > > > > , userspace provides the kernel with the file descriptor of the > > > > > > opened > > > > > > file. > > > > > > > > > > > > > We prototyped this solution, however it > > > > > > > does not seem to be a common pattern within the Kernel to write > > > > > > > state > > > > > > > directly to files on disk file systems. We considered two > > > > > > > potential > > > > > > > options: > > > > > > > > > > > > If no file descriptor is provided, then the measurements aren't > > > > > > copied > > > > > > and removed from the securityfs file. If there are write errors, > > > > > > the > > > > > > measurements aren't removed from the securityfs file until the write > > > > > > errors are resolved. > > > > > > > > > > It sounds like this approach would require the file/filesystem to be > > > > > continuously available for the life of the system once the log was > > > > > snapshotted/overflowed to persistent storage, yes? Assuming that is > > > > > the case, what happens if the file/filesystem becomes inaccessible at > > > > > some point and an attestation client attempts to read the entire log? > > > > > > > > The main purpose of the change is to addres kernel memory pressure. > > > > Two designs are being discussed: Sush's "snapshotting" design and > > > > Amir's original suggestion of continously exporting the measurement > > > > records to a tmpfs or regular file. Both designs require verifying the > > > > initial attestation quote by walking the entire measurement list, > > > > calculating the expected TPM PCR value(s). That doesn't change. > > > > > > Sure, but my question is about what happens if portions of the > > > measurement list disappear due to file/filesystem problems? How is > > > that handled? > > > > With the "snapshotting" solution there could be multiple files, so > > portions could be missing. The other solution, the preferred solution, > > would be one file. > > With the snapshotting approach the kernel doesn't need to maintain > ongoing access to a file, that is left up to the user process > performing the attestation (or simply inspecting the logs). I have to > ask, for the third time now in as many hours, how does the proposed > kernel-holds-an-fd-open solution handle the case where the > file/filesystem is no longer accessible? The snapshotting approach > should be much more resilient here as the error recovery paths can be > much more involved than what we would have available in the kernel, > not to mention the flexibility of allowing a user process to determine > how to store and manage the snapshotted log. > > Considering that the snapshotting approach is opt-in (userspace has to > initiate the snapshot), I'm not sure the concern over log offsets is a > significant objection: existing userspace will never trigger a > snapshot, and new userspace that could potentially trigger a snapshot > should be written to take that into account. Your initial
Re: [PATCH 1/5] string.h: add array-wrappers for (v)memdup_user()
On Wed, 2023-08-30 at 17:29 +0300, Andy Shevchenko wrote: > On Wed, Aug 30, 2023 at 5:19 PM wrote: > > On Wed, 2023-08-30 at 17:11 +0300, Andy Shevchenko wrote: > > > On Wed, Aug 30, 2023 at 4:46 PM Philipp Stanner > > > > > > wrote: > > > > > --- a/include/linux/string.h > > > > +++ b/include/linux/string.h > > > > > > I'm wondering if this has no side-effects as string.h/string.c > > > IIRC > > > is > > > used also for early stages where some of the APIs are not > > > available. > > > > > > > @@ -6,6 +6,8 @@ > > > > #include /* for size_t */ > > > > #include /* for NULL */ > > > > #include /* for E2BIG */ > > > > +#include /* for check_mul_overflow() */ > > > > +#include /* for ERR_PTR() */ > > > > > > Can we preserve order (to some extent)? > > > > Sure. I just put it there so the comments build a congruent block. > > Which order would you prefer? > > Alphabetical. > > compiler.h > err.h > overflow.h > ...the rest that is a bit unordered... > > > > > #include > > > > #include > > ... I mean I could include my own in a sorted manner – but the existing ones are not sorted: /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _LINUX_STRING_H_ #define _LINUX_STRING_H_ #include /* for inline */ #include /* for size_t */ #include/* for NULL */ #include /* for E2BIG */ #include #include extern char *strndup_user(const char __user *, long); We could sort them all, but I'd prefer to do that in a separate patch so that this commit does not make the impression of doing anything else than including the two new headers Such a separate patch could also unify the docstring style, see below > > > > > +/** > > > > + * memdup_array_user - duplicate array from user space > > > > > > > + * > > > > > > Do we need this blank line? > > > > I more or less directly copied the docstring format from the > > original > > functions (v)memdup_user() in mm/util.c > > I guess this is common style? > > I think it's not. But you may grep kernel source tree and tell which > one occurs more often with or without this (unneeded) blank line. It seems to be very much mixed. string.h itself is mixed. When you look at the bottom of string.h, you'll find functions such as kbasename() that have the extra line. That's not really a super decisive point for me, though. We can remove the line I guess P. > > > > > + * @src: source address in user space > > > > + * @n: number of array members to copy > > > > + * @size: size of one array member > > > > + * > > > > + * Return: an ERR_PTR() on failure. Result is physically > > > > + * contiguous, to be freed by kfree(). > > > > + */ > > ... > > > > > +/** > > > > + * vmemdup_array_user - duplicate array from user space > > > > > > > + * > > > > > > Redundant? > > > > No, there are two functions: > > * memdup_array_user() > > * vmemdup_array_user() > > > > On the deeper layers they utilize kmalloc() or kvmalloc(), > > respectively. > > I guess you misunderstood my comment. I was talking about kernel doc > (as in the previous function). > > > > > + * @src: source address in user space > > > > + * @n: number of array members to copy > > > > + * @size: size of one array member > > > > + * > > > > + * Return: an ERR_PTR() on failure. Result may be not > > > > + * physically contiguous. Use kvfree() to free. > > > > + */ > > ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [RFC] IMA Log Snapshotting Design Proposal - unseal
On 8/1/2023 3:12 PM, Sush Shringarputale wrote: For remote attestation to work, the service will need to know how to validate the snapshot_aggregate entry in the IMA log. It will have to read the PCR values present in the template data of snapshot_aggregate event in the latest IMA log, and ensure that the PCR quotes align with the contents of the past UM_snapshot_file(s). This will re-establish the chain of trust needed for the device to pass remote attestation. This will also maintain the ability of the remote-attestation-service to seal the secrets, if the client-server use TPM unseal mechanism to attest the state of the device. I think that seal/unseal to IMA PCRs is futile. Since boot is multi-threaded, the IMA PCR is unpredictable even when valid. ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [RFC] IMA Log Snapshotting Design Proposal - aggregate
On 8/1/2023 3:12 PM, Sush Shringarputale wrote: - A user-mode process will trigger the snapshot by opening a file in SysFS say /sys/kernel/security/ima/snapshot (referred to as sysk_ima_snapshot_file here onwards). - The Kernel will get the current TPM PCR values and PCR update counter [2] and store them as template data in a new IMA event "snapshot_aggregate". If this is relying on a user-mode process, is there a concern that the process doesn't run. Might it be safer to have the kernel trigger the snapshot. PCR reads are not atomic, with each other and with event log appends. Is this an issue? The PCR update counter can change between PCR reads. What is its purpose? What is the purpose of the snapshot aggregate? Since the entire event log has to be retained and sent to the verifier, is the aggregate redundant? ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [RFC] IMA Log Snapshotting Design Proposal - network bandwidth
On 8/1/2023 3:12 PM, Sush Shringarputale wrote: In addition, a large IMA log can add pressure on the network bandwidth when the attestation client sends it to remote-attestation-service. I would not worry too much about network bandwidth. 1. Every solution eventually realizes that sending the entire log each time hurts performance. The verifier will ask the attestor, "give me everything since record n", and the number of new entries approaches zero. 2. My benchmarks show that On the client, the TPM quote time swamps everything else. On the server, verifying the IMA entry signatures swamps everything else. The network transfer time is negligible. ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH 1/5] string.h: add array-wrappers for (v)memdup_user()
On Wed, Aug 30, 2023 at 4:46 PM Philipp Stanner wrote: > + if (unlikely(check_mul_overflow(n, size, &nbytes))) > + return ERR_PTR(-EINVAL); > + if (unlikely(check_mul_overflow(n, size, &nbytes))) > + return ERR_PTR(-EINVAL); Btw, why not -EOVERFLOW ? -- With Best Regards, Andy Shevchenko ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH 1/5] string.h: add array-wrappers for (v)memdup_user()
On Wed, Aug 30, 2023 at 5:19 PM wrote: > On Wed, 2023-08-30 at 17:11 +0300, Andy Shevchenko wrote: > > On Wed, Aug 30, 2023 at 4:46 PM Philipp Stanner > > wrote: > > > --- a/include/linux/string.h > > > +++ b/include/linux/string.h > > > > I'm wondering if this has no side-effects as string.h/string.c IIRC > > is > > used also for early stages where some of the APIs are not available. > > > > > @@ -6,6 +6,8 @@ > > > #include/* for size_t */ > > > #include /* for NULL */ > > > #include/* for E2BIG */ > > > +#include /* for check_mul_overflow() */ > > > +#include /* for ERR_PTR() */ > > > > Can we preserve order (to some extent)? > > Sure. I just put it there so the comments build a congruent block. > Which order would you prefer? Alphabetical. compiler.h err.h overflow.h ...the rest that is a bit unordered... > > > #include > > > #include ... > > > +/** > > > + * memdup_array_user - duplicate array from user space > > > > > + * > > > > Do we need this blank line? > > I more or less directly copied the docstring format from the original > functions (v)memdup_user() in mm/util.c > I guess this is common style? I think it's not. But you may grep kernel source tree and tell which one occurs more often with or without this (unneeded) blank line. > > > + * @src: source address in user space > > > + * @n: number of array members to copy > > > + * @size: size of one array member > > > + * > > > + * Return: an ERR_PTR() on failure. Result is physically > > > + * contiguous, to be freed by kfree(). > > > + */ ... > > > +/** > > > + * vmemdup_array_user - duplicate array from user space > > > > > + * > > > > Redundant? > > No, there are two functions: > * memdup_array_user() > * vmemdup_array_user() > > On the deeper layers they utilize kmalloc() or kvmalloc(), > respectively. I guess you misunderstood my comment. I was talking about kernel doc (as in the previous function). > > > + * @src: source address in user space > > > + * @n: number of array members to copy > > > + * @size: size of one array member > > > + * > > > + * Return: an ERR_PTR() on failure. Result may be not > > > + * physically contiguous. Use kvfree() to free. > > > + */ -- With Best Regards, Andy Shevchenko ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH 1/5] string.h: add array-wrappers for (v)memdup_user()
On Wed, 2023-08-30 at 17:15 +0300, Andy Shevchenko wrote: > On Wed, Aug 30, 2023 at 4:46 PM Philipp Stanner > wrote: > > > + if (unlikely(check_mul_overflow(n, size, &nbytes))) > > + return ERR_PTR(-EINVAL); > > > + if (unlikely(check_mul_overflow(n, size, &nbytes))) > > + return ERR_PTR(-EINVAL); > > Btw, why not -EOVERFLOW ? > Good question, actually. To be honest I wasn't quite sure which code to pick (-E2BIG was also once I candidate). -EINVAL was picked because the idea was that a request overflowing a size_t could surely be expected to contain an invalid parameter, because no one would ever request an array _that_ large ? ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH 1/5] string.h: add array-wrappers for (v)memdup_user()
On Wed, 2023-08-30 at 17:11 +0300, Andy Shevchenko wrote: > On Wed, Aug 30, 2023 at 4:46 PM Philipp Stanner > wrote: > > > > Currently, user array duplications are sometimes done without an > > overflow check. Sometimes the checks are done manually; sometimes > > the > > array size is calculated with array_size() and sometimes by > > calculating > > n * size directly in code. > > > > Introduce wrappers for arrays for memdup_user() and vmemdup_user() > > to > > provide a standardized and safe way for duplicating user arrays. > > > > This is both for new code as well as replacing usage of > > (v)memdup_user() > > in existing code that uses, e.g., n * size to calculate array > > sizes. > > ... > > > --- a/include/linux/string.h > > +++ b/include/linux/string.h > > I'm wondering if this has no side-effects as string.h/string.c IIRC > is > used also for early stages where some of the APIs are not available. > > > @@ -6,6 +6,8 @@ > > #include /* for size_t */ > > #include /* for NULL */ > > #include /* for E2BIG */ > > +#include /* for check_mul_overflow() */ > > +#include /* for ERR_PTR() */ > > Can we preserve order (to some extent)? Sure. I just put it there so the comments build a congruent block. Which order would you prefer? > > > #include > > #include > > ... > > > +/** > > + * memdup_array_user - duplicate array from user space > > > + * > > Do we need this blank line? I more or less directly copied the docstring format from the original functions (v)memdup_user() in mm/util.c I guess this is common style? > > > + * @src: source address in user space > > + * @n: number of array members to copy > > + * @size: size of one array member > > + * > > + * Return: an ERR_PTR() on failure. Result is physically > > + * contiguous, to be freed by kfree(). > > + */ > > ... > > > +/** > > + * vmemdup_array_user - duplicate array from user space > > > + * > > Redundant? No, there are two functions: * memdup_array_user() * vmemdup_array_user() On the deeper layers they utilize kmalloc() or kvmalloc(), respectively. Greetings, P. > > > + * @src: source address in user space > > + * @n: number of array members to copy > > + * @size: size of one array member > > + * > > + * Return: an ERR_PTR() on failure. Result may be not > > + * physically contiguous. Use kvfree() to free. > > + */ > ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH 1/5] string.h: add array-wrappers for (v)memdup_user()
On Wed, Aug 30, 2023 at 4:46 PM Philipp Stanner wrote: > > Currently, user array duplications are sometimes done without an > overflow check. Sometimes the checks are done manually; sometimes the > array size is calculated with array_size() and sometimes by calculating > n * size directly in code. > > Introduce wrappers for arrays for memdup_user() and vmemdup_user() to > provide a standardized and safe way for duplicating user arrays. > > This is both for new code as well as replacing usage of (v)memdup_user() > in existing code that uses, e.g., n * size to calculate array sizes. ... > --- a/include/linux/string.h > +++ b/include/linux/string.h I'm wondering if this has no side-effects as string.h/string.c IIRC is used also for early stages where some of the APIs are not available. > @@ -6,6 +6,8 @@ > #include/* for size_t */ > #include /* for NULL */ > #include/* for E2BIG */ > +#include /* for check_mul_overflow() */ > +#include /* for ERR_PTR() */ Can we preserve order (to some extent)? > #include > #include ... > +/** > + * memdup_array_user - duplicate array from user space > + * Do we need this blank line? > + * @src: source address in user space > + * @n: number of array members to copy > + * @size: size of one array member > + * > + * Return: an ERR_PTR() on failure. Result is physically > + * contiguous, to be freed by kfree(). > + */ ... > +/** > + * vmemdup_array_user - duplicate array from user space > + * Redundant? > + * @src: source address in user space > + * @n: number of array members to copy > + * @size: size of one array member > + * > + * Return: an ERR_PTR() on failure. Result may be not > + * physically contiguous. Use kvfree() to free. > + */ -- With Best Regards, Andy Shevchenko ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH 3/5] kernel: watch_queue: copy user-array safely
Currently, there is no overflow-check with memdup_user(). Use the new function memdup_array_user() instead of memdup_user() for duplicating the user-space array safely. Suggested-by: David Airlie Signed-off-by: Philipp Stanner --- kernel/watch_queue.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/watch_queue.c b/kernel/watch_queue.c index d0b6b390ee42..778b4056700f 100644 --- a/kernel/watch_queue.c +++ b/kernel/watch_queue.c @@ -331,7 +331,7 @@ long watch_queue_set_filter(struct pipe_inode_info *pipe, filter.__reserved != 0) return -EINVAL; - tf = memdup_user(_filter->filters, filter.nr_filters * sizeof(*tf)); + tf = memdup_array_user(_filter->filters, filter.nr_filters, sizeof(*tf)); if (IS_ERR(tf)) return PTR_ERR(tf); -- 2.41.0 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH 4/5] drm_lease.c: copy user-array safely
Currently, there is no overflow-check with memdup_user(). Use the new function memdup_array_user() instead of memdup_user() for duplicating the user-space array safely. Suggested-by: David Airlie Signed-off-by: Philipp Stanner --- drivers/gpu/drm/drm_lease.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/drm_lease.c b/drivers/gpu/drm/drm_lease.c index 150fe1555068..94375c6a5425 100644 --- a/drivers/gpu/drm/drm_lease.c +++ b/drivers/gpu/drm/drm_lease.c @@ -510,8 +510,8 @@ int drm_mode_create_lease_ioctl(struct drm_device *dev, /* Handle leased objects, if any */ idr_init(&leases); if (object_count != 0) { - object_ids = memdup_user(u64_to_user_ptr(cl->object_ids), -array_size(object_count, sizeof(__u32))); + object_ids = memdup_array_user(u64_to_user_ptr(cl->object_ids), + object_count, sizeof(__u32)); if (IS_ERR(object_ids)) { ret = PTR_ERR(object_ids); idr_destroy(&leases); -- 2.41.0 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH 2/5] kernel: kexec: copy user-array safely
Currently, there is no overflow-check with memdup_user(). Use the new function memdup_array_user() instead of memdup_user() for duplicating the user-space array safely. Suggested-by: David Airlie Signed-off-by: Philipp Stanner --- kernel/kexec.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/kexec.c b/kernel/kexec.c index 92d301f98776..f6067c1bb089 100644 --- a/kernel/kexec.c +++ b/kernel/kexec.c @@ -242,7 +242,7 @@ SYSCALL_DEFINE4(kexec_load, unsigned long, entry, unsigned long, nr_segments, ((flags & KEXEC_ARCH_MASK) != KEXEC_ARCH_DEFAULT)) return -EINVAL; - ksegments = memdup_user(segments, nr_segments * sizeof(ksegments[0])); + ksegments = memdup_array_user(segments, nr_segments, sizeof(ksegments[0])); if (IS_ERR(ksegments)) return PTR_ERR(ksegments); -- 2.41.0 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH 1/5] string.h: add array-wrappers for (v)memdup_user()
Currently, user array duplications are sometimes done without an overflow check. Sometimes the checks are done manually; sometimes the array size is calculated with array_size() and sometimes by calculating n * size directly in code. Introduce wrappers for arrays for memdup_user() and vmemdup_user() to provide a standardized and safe way for duplicating user arrays. This is both for new code as well as replacing usage of (v)memdup_user() in existing code that uses, e.g., n * size to calculate array sizes. Suggested-by: David Airlie Signed-off-by: Philipp Stanner --- include/linux/string.h | 42 ++ 1 file changed, 42 insertions(+) diff --git a/include/linux/string.h b/include/linux/string.h index dbfc66400050..0e8e7a40bae7 100644 --- a/include/linux/string.h +++ b/include/linux/string.h @@ -6,6 +6,8 @@ #include/* for size_t */ #include /* for NULL */ #include/* for E2BIG */ +#include /* for check_mul_overflow() */ +#include /* for ERR_PTR() */ #include #include @@ -14,6 +16,46 @@ extern void *memdup_user(const void __user *, size_t); extern void *vmemdup_user(const void __user *, size_t); extern void *memdup_user_nul(const void __user *, size_t); +/** + * memdup_array_user - duplicate array from user space + * + * @src: source address in user space + * @n: number of array members to copy + * @size: size of one array member + * + * Return: an ERR_PTR() on failure. Result is physically + * contiguous, to be freed by kfree(). + */ +static inline void *memdup_array_user(const void __user *src, size_t n, size_t size) +{ + size_t nbytes; + + if (unlikely(check_mul_overflow(n, size, &nbytes))) + return ERR_PTR(-EINVAL); + + return memdup_user(src, nbytes); +} + +/** + * vmemdup_array_user - duplicate array from user space + * + * @src: source address in user space + * @n: number of array members to copy + * @size: size of one array member + * + * Return: an ERR_PTR() on failure. Result may be not + * physically contiguous. Use kvfree() to free. + */ +static inline void *vmemdup_array_user(const void __user *src, size_t n, size_t size) +{ + size_t nbytes; + + if (unlikely(check_mul_overflow(n, size, &nbytes))) + return ERR_PTR(-EINVAL); + + return vmemdup_user(src, nbytes); +} + /* * Include machine specific inline routines */ -- 2.41.0 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH 5/5] drm: vmgfx_surface.c: copy user-array safely
Currently, there is no overflow-check with memdup_user(). Use the new function memdup_array_user() instead of memdup_user() for duplicating the user-space array safely. Suggested-by: David Airlie Signed-off-by: Philipp Stanner --- drivers/gpu/drm/vmwgfx/vmwgfx_surface.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c b/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c index 5db403ee8261..9be185b094cb 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c @@ -777,9 +777,9 @@ int vmw_surface_define_ioctl(struct drm_device *dev, void *data, sizeof(metadata->mip_levels)); metadata->num_sizes = num_sizes; metadata->sizes = - memdup_user((struct drm_vmw_size __user *)(unsigned long) + memdup_array_user((struct drm_vmw_size __user *)(unsigned long) req->size_addr, - sizeof(*metadata->sizes) * metadata->num_sizes); + metadata->num_sizes, sizeof(*metadata->sizes)); if (IS_ERR(metadata->sizes)) { ret = PTR_ERR(metadata->sizes); goto out_no_sizes; -- 2.41.0 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH 0/5] Introduce new wrappers to copy user-arrays
Hi! David Airlie suggested that we could implement new wrappers around (v)memdup_user() for duplicating user arrays. This small patch series first implements the two new wrapper functions memdup_array_user() and vmemdup_array_user(). They calculate the array-sizes safely, i.e., they return an error in case of an overflow. It then implements the new wrappers in two components in kernel/ and two in the drm-subsystem. In total, there are 18 files in the kernel that use (v)memdup_user() to duplicate arrays. My plan is to provide patches for the other 14 successively once this series has been merged. P. Philipp Stanner (5): string.h: add array-wrappers for (v)memdup_user() kernel: kexec: copy user-array safely kernel: watch_queue: copy user-array safely drm_lease.c: copy user-array safely drm: vmgfx_surface.c: copy user-array safely drivers/gpu/drm/drm_lease.c | 4 +-- drivers/gpu/drm/vmwgfx/vmwgfx_surface.c | 4 +-- include/linux/string.h | 42 + kernel/kexec.c | 2 +- kernel/watch_queue.c| 2 +- 5 files changed, 48 insertions(+), 6 deletions(-) -- 2.41.0 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH -next v9 0/2] support allocating crashkernel above 4G explicitly on riscv
Hello: This series was applied to riscv/linux.git (for-next) by Palmer Dabbelt : On Wed, 26 Jul 2023 17:49:58 + you wrote: > On riscv, the current crash kernel allocation logic is trying to > allocate within 32bit addressible memory region by default, if > failed, try to allocate without 4G restriction. > > In need of saving DMA zone memory while allocating a relatively large > crash kernel region, allocating the reserved memory top down in > high memory, without overlapping the DMA zone, is a mature solution. > Hence this patchset introduces the parameter option crashkernel=X,[high,low]. > > [...] Here is the summary with links: - [-next,v9,1/2] riscv: kdump: Implement crashkernel=X,[high,low] https://git.kernel.org/riscv/c/5882e5acf18d - [-next,v9,2/2] docs: kdump: Update the crashkernel description for riscv https://git.kernel.org/riscv/c/33f0dd973d4e You are awesome, thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/patchwork/pwbot.html ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v2 6/8] x86: kdump: use generic interface to simplify crashkernel reservation code
On 08/30/23 at 09:49am, kernel test robot wrote: > Hi Baoquan, > > kernel test robot noticed the following build errors: > > [auto build test ERROR on arm64/for-next/core] > [also build test ERROR on tip/x86/core powerpc/next powerpc/fixes v6.5] > [cannot apply to linus/master next-20230829] > [If your patch is applied to the wrong git tree, kindly drop us a note. > And when submitting patch, we suggest to use '--base' as documented in > https://git-scm.com/docs/git-format-patch#_base_tree_information] > > url: > https://github.com/intel-lab-lkp/linux/commits/Baoquan-He/crash_core-c-remove-unnecessary-parameter-of-function/20230829-201942 > base: https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git > for-next/core > patch link: > https://lore.kernel.org/r/20230829121610.138107-7-bhe%40redhat.com > patch subject: [PATCH v2 6/8] x86: kdump: use generic interface to simplify > crashkernel reservation code > config: x86_64-randconfig-r022-20230830 > (https://download.01.org/0day-ci/archive/20230830/202308300910.e0i4pijt-...@intel.com/config) > compiler: gcc-12 (Debian 12.2.0-14) 12.2.0 > reproduce (this is a W=1 build): > (https://download.01.org/0day-ci/archive/20230830/202308300910.e0i4pijt-...@intel.com/reproduce) > > If you fix the issue in a separate patch/commit (i.e. not just a new version > of > the same patch/commit), kindly add following tags > | Reported-by: kernel test robot > | Closes: > https://lore.kernel.org/oe-kbuild-all/202308300910.e0i4pijt-...@intel.com/ > > All errors (new ones prefixed by >>): > >ld: vmlinux.o: in function `reserve_crashkernel_low': >kernel/crash_core.c:369: undefined reference to `crashk_low_res' >ld: kernel/crash_core.c:369: undefined reference to `crashk_low_res' >ld: kernel/crash_core.c:370: undefined reference to `crashk_low_res' >ld: kernel/crash_core.c:369: undefined reference to `crashk_low_res' >ld: kernel/crash_core.c:370: undefined reference to `crashk_low_res' >ld: vmlinux.o:kernel/crash_core.c:371: more undefined references to > `crashk_low_res' follow >ld: vmlinux.o: in function `reserve_crashkernel_generic': > >> kernel/crash_core.c:453: undefined reference to `crashk_res' > >> ld: kernel/crash_core.c:453: undefined reference to `crashk_res' >ld: kernel/crash_core.c:454: undefined reference to `crashk_res' > >> ld: kernel/crash_core.c:453: undefined reference to `crashk_res' >ld: kernel/crash_core.c:454: undefined reference to `crashk_res' >ld: vmlinux.o:kernel/crash_core.c:454: more undefined references to > `crashk_res' follow Thanks for reporting. This one has the same root cause as the i386-randconfig building. It's because crashk_res|crashk_low_res are defined in kernel/kexec_core.c, but referenced in generic reservation code in crash_core.c, while in this lkp's randconfig, CONFIG_KEXEC_CORE is unset, CONFIG_CRASH_CORE=on. Then undefined reference to `crashk_res' and `crashk_low_res' are reported. Below patch can fix it. I will post v3 to contain this. >From 5a5bda3b5b1e370b789489d87516c8f1fb966c46 Mon Sep 17 00:00:00 2001 From: Baoquan He Date: Tue, 29 Aug 2023 22:38:15 -0400 Subject: [PATCH] crash_core: move crashk_res and crashk_low_res definition into crash_core.c Content-type: text/plain Both crashk_res and crashk_low_res are used to mark the reserved crashkernel regions in iomem_resource tree. And later the generic crashkernel resrvation which references them will be added into crash_core.c. So move crashk_res and crashk_low_res definition into crash_core.c to avoid compiling error when CONFIG_CRASH_CORE=on while CONFIG_KEXEC_CORE is unset. Signed-off-by: Baoquan He --- include/linux/crash_core.h | 5 + include/linux/kexec.h | 4 kernel/crash_core.c| 16 kernel/kexec_core.c| 17 - 4 files changed, 21 insertions(+), 21 deletions(-) diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h index d64006c4bd43..daa87b8730d3 100644 --- a/include/linux/crash_core.h +++ b/include/linux/crash_core.h @@ -9,6 +9,11 @@ #include #endif +/* Location of a reserved region to hold the crash kernel. + */ +extern struct resource crashk_res; +extern struct resource crashk_low_res; + #define CRASH_CORE_NOTE_NAME "CORE" #define CRASH_CORE_NOTE_HEAD_BYTES ALIGN(sizeof(struct elf_note), 4) #define CRASH_CORE_NOTE_NAME_BYTES ALIGN(sizeof(CRASH_CORE_NOTE_NAME), 4) diff --git a/include/linux/kexec.h b/include/linux/kexec.h index 22b5cd24f581..e762b0435c39 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -22,10 +22,6 @@ #include #include -/* Location of a reserved region to hold the cras
Re: [RFC PATCH] Introduce persistent memory pool
Hi Stanislav, On 30.08.23 00:07, Stanislav Kinsburskii wrote: On Mon, Aug 28, 2023 at 10:50:19PM +0200, Alexander Graf wrote: +kexec, iommu, kvm On 23.08.23 04:45, Stanislav Kinsburskii wrote: +akpm, +linux-mm On Fri, Aug 25, 2023 at 01:32:40PM +, Gowans, James wrote: On Fri, 2023-08-25 at 10:05 +0200, Greg Kroah-Hartman wrote: Thanks for adding me to this thread Greg! On Tue, Aug 22, 2023 at 11:34:34AM -0700, Stanislav Kinsburskii wrote: This patch addresses the need for a memory allocator dedicated to persistent memory within the kernel. This allocator will preserve kernel-specific states like DMA passthrough device states, IOMMU state, and more across kexec. The proposed solution offers a foundational implementation for potential custom solutions that might follow. Though the implementation is intentionally kept concise and straightforward to foster discussion and feedback, it's fully functional in its current state. Hi Stanislav, it looks like we're working on similar things. I'm looking to develop a mechanism to support hypervisor live update for when KVM is running VMs with PCI device passthrough. VMs with device passthrough also necessitates passing and re-hydrating IOMMU state so that DMA can continue during live update. Planning on having an LPC session on this topic: https://lpc.events/event/17/abstracts/1629/ (currently it's only a submitted abstract so not sure if visible, hopefully it will be soon). We are looking at implementing persistence across kexec via an in-memory filesystem on top of reserved memory. This would have files for anything that needs to be persisted. That includes files for IOMMU pgtables, for guest memory or userspace-accessible memory. It may be nice to solve all kexec persistence requirements with one solution, but we can consider IOMMU separately. There are at least three ways that this can be done: a) carving out reserved memory for pgtables. This is done by your proposal here, as well as my suggestion of a filesystem. b) pre/post kexec hooks for drivers to serialise state and pass it across in a structured format from old to new kernel. c) Reconstructing IOMMU state in the new kernel by starting at the hardware registers and walking the page tables. No state passing needed. Have you considered option (b) and (c) here? One of the implications of (b) and (c) are that they would need to hook into the buddy allocator really early to be able to carve out the reconstructed page tables before the allocator is used. Similar to how pkram [0] hooks in early to carve out pages used for its filesystem. Hi James, We are indeed working on similar things, so thanks for chiming in. I've seen pkram proposal as well as your comments there. I think (b) will need some persistent-over-kexec memory to pass the state across kexec as well as some key-value store persisted as well. And the proposed persistent memory pool is aimed exactly for this purpose. Or do you imply some other way to pass driver's data accross kexec? If I had to build this, I'd probably do it just like device tree passing on ARM. It's a single, physically contiguous blob of data whose entry point you pass to the target kernel. IIRC ACPI passing works similarly. This would just be one more opaque data structure that then needs very strict versioning and forward/backward compat guarantees. Device tree or ACPI are options indeed. However AFAIU in case of DT user space has to involved into the picture to modify and complie it, while ACPI isn't flexible or easily extendable. Also, AFAIU both these standards were designed with passing hardware-specific data in mind from bootstrap software to an OS kernel and thus were never really intended to be used for creating a persistent state accross kexec. To me, an attempt to use either of them to pass kernel-specific data looks like an abuse (or misuse) excused by the simplicity of implementation. What I was describing above is that the Linux boot protocol already has natural ways to pass a DT (arm) or set of ACPI tables (x86) to the target kernel. Whatever we do here should either piggy back on top of those natural mechanisms (e.g. /chosen node in DT) or be on the same level (e.g. pass DT in one register, pass metadata structure in another register). When it comes to the actual content of the metadata, I'm personally also leaning towards DT. We already have libfdt inside the kernel. It gives is a very simple, well understood structured file format that you can extend, version, etc etc. And the kernel has mechanisms to modify fdt contents. I dind't consider (c) yet, thanks for for the pointer. I have a question in this scope: how is PCI devices registers state is persisted across kexec with the files system you are working on? I.e. how does driver know, that the device shouldn't not be reinitialized? The easiest way to do it initially would be kernel command line options that hack up the drivers. But I suppose depending on the optio