Re: [PATCH v3 1/2] x86/mm: rename the confusing local variable in early_memremap_is_setup_data()
On 11/06/24 at 12:20pm, Borislav Petkov wrote: > On Sat, Nov 02, 2024 at 12:06:18PM +0100, Borislav Petkov wrote: > > Ok, I'll take your 2/2 next week and you can then send the cleanup ontop. > > OMG what a mess this is. Please test the below before I apply it. Just got a machine and building kernel, will report here when testing is done. > > Then, when you do the cleanup, do the following: > > - merge early_memremap_is_setup_data() with memremap_is_setup_data() into > a common __memremap_is_setup_data() and then add a bool early which > determines which memremap variant is called. > > - unify the @size argument by dropping it and using a function local size. > What we have there now is the definition of bitrot. :-\ > > - replace all sizeof(*data), sizeof(struct setup_data) with a macro definition > above the functions to unify it properly. > > What an ugly mess... :-\ Will clean them all up as suggested. Thanks. ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
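The cleanup Boris outlines above (one common helper, a bool that selects the mapping variant, a function-local size, and a single size macro) can be sketched in user space as follows. This is only an illustrative model, not the kernel code: struct sd_node, SD_SIZE, map_early(), map_late() and common_is_setup_data() are hypothetical names, and the chain is walked by table index instead of by mapping physical addresses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct setup_data: 'next' is a table index
 * here (assumption), -1 terminates the chain. */
struct sd_node {
	uint64_t addr;	/* address where this node starts */
	uint32_t len;	/* payload length following the header */
	int next;	/* index of the next node, or -1 */
};

/* One macro instead of scattered sizeof(*data)/sizeof(struct ...) uses. */
#define SD_SIZE sizeof(struct sd_node)

/* Counters so a caller can observe which mapping variant was used. */
static unsigned int early_maps, late_maps;

/* Hypothetical stand-ins for the early and regular memremap variants. */
static const struct sd_node *map_early(const struct sd_node *tbl, int idx)
{
	early_maps++;
	return &tbl[idx];
}

static const struct sd_node *map_late(const struct sd_node *tbl, int idx)
{
	late_maps++;
	return &tbl[idx];
}

/* Common walker: 'early' only selects how a node is mapped. */
static bool common_is_setup_data(const struct sd_node *tbl, int head,
				 uint64_t phys_addr, bool early)
{
	int idx = head;

	while (idx >= 0) {
		const struct sd_node *data = early ? map_early(tbl, idx)
						   : map_late(tbl, idx);
		/* Function-local size; no 'size' argument is needed. */
		uint64_t size = SD_SIZE + data->len;

		if (phys_addr >= data->addr && phys_addr < data->addr + size)
			return true;
		idx = data->next;
	}
	return false;
}

/* Thin wrappers keep the two existing entry points. */
static bool early_is_setup_data(const struct sd_node *tbl, int head,
				uint64_t phys_addr)
{
	return common_is_setup_data(tbl, head, phys_addr, true);
}

static bool is_setup_data(const struct sd_node *tbl, int head,
			  uint64_t phys_addr)
{
	return common_is_setup_data(tbl, head, phys_addr, false);
}
```

With a two-node table, both wrappers give the same answer for the same address; only the mapping helper that gets called differs.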
Re: [PATCH v1 00/11] fs/proc/vmcore: kdump support for virtio-mem on s390
On 10/25/24 at 05:11pm, David Hildenbrand wrote: > This is based on "[PATCH v3 0/7] virtio-mem: s390 support" [1], which adds > virtio-mem support on s390. > > The only "different than everything else" thing about virtio-mem on s390 > is kdump: The crash (2nd) kernel allocates+prepares the elfcore hdr > during fs_init()->vmcore_init()->elfcorehdr_alloc(). Consequently, the > crash kernel must detect memory ranges of the crashed/panicked kernel to > include via PT_LOAD in the vmcore. > > On other architectures, all RAM regions (boot + hotplugged) can easily be > observed on the old (to crash) kernel (e.g., using /proc/iomem) to create > the elfcore hdr. > > On s390, information about "ordinary" memory (heh, "storage") can be > obtained by querying the hypervisor/ultravisor via SCLP/diag260, and > that information is stored early during boot in the "physmem" memblock > data structure. > > But virtio-mem memory is always detected by as device driver, which is > usually build as a module. So in the crash kernel, this memory can only be > properly detected once the virtio-mem driver started up. > > The virtio-mem driver already supports the "kdump mode", where it won't > hotplug any memory but instead queries the device to implement the > pfn_is_ram() callback, to avoid reading unplugged memory holes when reading > the vmcore. > > With this series, if the virtio-mem driver is included in the kdump > initrd -- which dracut already takes care of under Fedora/RHEL -- it will > now detect the device RAM ranges on s390 once it probes the devices, to add > them to the vmcore using the same callback mechanism we already have for > pfn_is_ram(). > > To add these device RAM ranges to the vmcore ("patch the vmcore"), we will > add new PT_LOAD entries that describe these memory ranges, and update > all offsets vmcore size so it is all consistent. 
> > Note that makedumpfile is shaky with v6.12-rcX, I made the "obvious" things > (e.g., free page detection) work again while testing as documented in [2]. > > Creating the dumps using makedumpfile seems to work fine, and the > dump regions (PT_LOAD) are as expected. I have yet to check in more detail > if the created dumps are good (IOW, the right memory was dumped, but it > looks like makedumpfile reads the right memory when interpreting the > kernel data structures, which is promising). > > Patch #1 -- #6 are vmcore preparations and cleanups Thanks for CC-ing me, I will review patches 1-6, the vmcore part, next week.
Re: [PATCH v6 0/7] Support kdump with LUKS encryption by reusing LUKS volume keys
Hi Coiby, On 10/29/24 at 01:52pm, Coiby Xu wrote: > LUKS is the standard for Linux disk encryption, widely adopted by users, > and in some cases, such as Confidential VMs, it is a requirement. With > kdump enabled, when the first kernel crashes, the system can boot into > the kdump/crash kernel to dump the memory image (i.e., /proc/vmcore) > to a specified target. However, there are two challenges when dumping > vmcore to a LUKS-encrypted device: I am currently rebasing RHEL code onto the upstream kernel and will review this next week. Thanks Baoquan
Re: [PATCH v3 1/2] x86/mm: rename the confusing local variable in early_memremap_is_setup_data()
On 11/01/24 at 05:18pm, Borislav Petkov wrote: > On Thu, Oct 31, 2024 at 11:41:12AM +0800, Baoquan He wrote: > > Should I send the fixing patch alone and clean up the useless argument > > 'size' later, or squash them into one patch? > > First the fix, then the cleanup. Sure, will do. Thanks a lot. > > Btw, that fix wants to go to stable no? Seeing how it breaks certain machines > with IMA and kdump and SME... Yeah, it should be added to stable. Distros may not have enabled both SME and IMA as early as the bug was introduced, but anyone doing so on an earlier kernel will see the problem.
Re: [PATCH v3 1/2] x86/mm: rename the confusing local variable in early_memremap_is_setup_data()
On 10/30/24 at 07:49am, Tom Lendacky wrote: > On 10/29/24 19:53, Baoquan He wrote: > > On 10/29/24 at 07:11pm, Borislav Petkov wrote: > >> On Wed, Sep 11, 2024 at 04:16:14PM +0800, Baoquan He wrote: > >>> In function early_memremap_is_setup_data(), parameter 'size' passed has > >>> the same name as the local variable inside the while loop. That > >>> confuses people who sometime mix up them when reading code. > >>> > >>> Here rename the local variable 'size' inside while loop to 'sd_size'. > >>> > >>> And also add one local variable 'sd_size' likewise in function > >>> memremap_is_setup_data() to simplify code. In later patch, this can also > >>> be used. > >>> > >>> Signed-off-by: Baoquan He > >>> Acked-by: Tom Lendacky > >>> --- > >>> arch/x86/mm/ioremap.c | 18 +++--- > >>> 1 file changed, 11 insertions(+), 7 deletions(-) > >>> > >>> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c > >>> index aa7d279321ea..f1ee8822ddf1 100644 > >>> --- a/arch/x86/mm/ioremap.c > >>> +++ b/arch/x86/mm/ioremap.c > >>> @@ -640,7 +640,7 @@ static bool memremap_is_setup_data(resource_size_t > >>> phys_addr, > >> > >> Huh? > > > > Thanks for looking into this. > > > > I ever doubted this, guess it could use the unused 'size' to avoid > > warning? Noticed Tom introduced it at the beginning. It's better idea to > > remove it if it's useless. > > > > commit 8f716c9b5febf6ed0f5fedb7c9407cd0c25b2796 > > Author: Tom Lendacky > > Date: Mon Jul 17 16:10:16 2017 -0500 > > > > x86/mm: Add support to access boot related data in the clear > > > > Hi Tom, > > > > Can you help check and tell your intention why the argument 'size' is > > added into early_memremap_is_setup_data() and memremap_is_setup_data(). > > That was a long time ago... I probably used it while I was developing the > support and then never removed it in the final version where it wasn't used. Thanks for confirming. Then we can remove it to avoid confusion. 
Hi Boris, Should I send the fixing patch alone and clean up the useless argument 'size' later, or squash them into one patch? Thanks Baoquan
Re: [PATCH v3 0/2] x86/mm/sme: fix the kdump kernel breakage on SME system when CONFIG_IMA_KEXEC=y
On 10/29/24 at 06:23pm, Andrew Morton wrote: > On Wed, 11 Sep 2024 16:16:13 +0800 Baoquan He wrote: > > > Currently, distros like Fedora/RHEL have enabled CONFIG_IMA_KEXEC by > > default. This makes kexec/kdump kernel always fail to boot up on SME > > platform because of a code bug. By debugging, the root cause is found > > out and bug is fixed with this patchset. > > [1/2] is a cleanup. [2/2] fixes a bug which appears to go all the way > back to 5.10. The bugfix patch has a dependency on the cleanup, which > is unfortunate. > > We could add the Fixes: to [1/2] and add cc:stable to both patches so > they get backported into -stable kernels together. But I think it's > nicer to just concentrate on the single bugfix patch (with Fixes: and > cc:stable) and do the cleanup later, in the usual fashion. Totally agree, thanks a lot. > > So can I suggest a resend please? Will send a standalone patch to fix the bug, and send the cleanup patch after the fix is settled.
Re: [PATCH v3 1/2] x86/mm: rename the confusing local variable in early_memremap_is_setup_data()
On 10/29/24 at 07:11pm, Borislav Petkov wrote: > On Wed, Sep 11, 2024 at 04:16:14PM +0800, Baoquan He wrote: > > In function early_memremap_is_setup_data(), parameter 'size' passed has > > the same name as the local variable inside the while loop. That > > confuses people who sometime mix up them when reading code. > > > > Here rename the local variable 'size' inside while loop to 'sd_size'. > > > > And also add one local variable 'sd_size' likewise in function > > memremap_is_setup_data() to simplify code. In later patch, this can also > > be used. > > > > Signed-off-by: Baoquan He > > Acked-by: Tom Lendacky > > --- > > arch/x86/mm/ioremap.c | 18 +++--- > > 1 file changed, 11 insertions(+), 7 deletions(-) > > > > diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c > > index aa7d279321ea..f1ee8822ddf1 100644 > > --- a/arch/x86/mm/ioremap.c > > +++ b/arch/x86/mm/ioremap.c > > @@ -640,7 +640,7 @@ static bool memremap_is_setup_data(resource_size_t > > phys_addr, > > Huh? Thanks for looking into this. I ever doubted this, guess it could use the unused 'size' to avoid warning? Noticed Tom introduced it at the beginning. It's better idea to remove it if it's useless. commit 8f716c9b5febf6ed0f5fedb7c9407cd0c25b2796 Author: Tom Lendacky Date: Mon Jul 17 16:10:16 2017 -0500 x86/mm: Add support to access boot related data in the clear Hi Tom, Can you help check and tell your intention why the argument 'size' is added into early_memremap_is_setup_data() and memremap_is_setup_data(). Thanks Baoquan > > --- > diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c > index 70b02fc61d93..e461d8e26871 100644 > --- a/arch/x86/mm/ioremap.c > +++ b/arch/x86/mm/ioremap.c > @@ -632,8 +632,7 @@ static bool memremap_is_efi_data(resource_size_t > phys_addr, > * Examine the physical address to determine if it is boot data by checking > * it against the boot params setup_data chain. 
> */ > -static bool memremap_is_setup_data(resource_size_t phys_addr, > -unsigned long size) > +static bool memremap_is_setup_data(resource_size_t phys_addr) > { > struct setup_indirect *indirect; > struct setup_data *data; > @@ -769,7 +768,7 @@ bool arch_memremap_can_ram_remap(resource_size_t > phys_addr, unsigned long size, > return false; > > if (cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT)) { > - if (memremap_is_setup_data(phys_addr, size) || > + if (memremap_is_setup_data(phys_addr) || > memremap_is_efi_data(phys_addr, size)) > return false; > } > > -- > Regards/Gruss, > Boris. > > https://people.kernel.org/tglx/notes-about-netiquette > ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v3 0/2] x86/mm/sme: fix the kdump kernel breakage on SME system when CONFIG_IMA_KEXEC=y
On 09/30/24 at 10:59am, Baoquan He wrote: > Hi, > > On 09/11/24 at 04:16pm, Baoquan He wrote: > > Currently, distros like Fedora/RHEL have enabled CONFIG_IMA_KEXEC by > > default. This makes kexec/kdump kernel always fail to boot up on SME > > platform because of a code bug. By debugging, the root cause is found > > out and bug is fixed with this patchset. > > PING. > > Can this be added into 6.12 so that SME system is available? Please > tell if there's any concern or comment. Ping again.
Re: [linus:master] [kexec] 816d334afa: Mem-Info
Hi Yuntao, On 10/23/24 at 04:30pm, kernel test robot wrote: > > > Hello, > > kernel test robot noticed "Mem-Info" on: > > commit: 816d334afa85c836080b41bb6238aea845615ad9 ("kexec: modify the meaning > of the end parameter in kimage_is_destination_range()") Can you check what happened with your patch? Thanks Baoquan > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master > > [test failed on linus/master c2ee9f594da826bea183ed14f2cc029c719bf4da] > [test failed on linux-next/master 7436324ebd147598f940dde1335b7979dbccc339] > > in testcase: trinity > version: > with following parameters: > > runtime: 600s > > > > config: x86_64-randconfig-r032-20220801 > compiler: gcc-12 > test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G > > (please refer to attached dmesg/kmsg for entire log/backtrace) > > > we observed the issue happen randomly but keep clean on parent > > > 5c28913ed04b29ef 816d334afa85c836080b41bb623 > --- >fail:runs %reproductionfail:runs >| | | >:100 41% 41:100 dmesg.Mem-Info > > > and from below Call Trace, seems be related to changes in 816d334afa. FYI > > > If you fix the issue in a separate patch/commit (i.e. 
not just a new version > of > the same patch/commit), kindly add following tags > | Reported-by: kernel test robot > | Closes: https://lore.kernel.org/oe-lkp/202410231602.fecf96df-...@intel.com > > > > [ 183.284967][ T2438] trinity-c2: page allocation failure: order:1, > mode:0x10cc0(GFP_KERNEL|__GFP_NORETRY), > nodemask=(null),cpuset=/,mems_allowed=0 > [ 183.287021][ T2438] CPU: 0 PID: 2438 Comm: trinity-c2 Not tainted > 6.7.0-rc4-00178-g816d334afa85 #1 > [ 183.288291][ T2438] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), > BIOS 1.16.2-debian-1.16.2-1 04/01/2014 > [ 183.289719][ T2438] Call Trace: > [ 183.290202][ T2438] > [ 183.290556][ T2438] dump_stack_lvl (lib/dump_stack.c:107) > [ 183.291233][ T2438] dump_stack (lib/dump_stack.c:114) > [ 183.291773][ T2438] warn_alloc (mm/page_alloc.c:3391) > [ 183.292370][ T2438] ? zone_watermark_ok_safe (mm/page_alloc.c:3370) > [ 183.293136][ T2438] ? get_page_from_freelist (mm/page_alloc.c:3513) > [ 183.293882][ T2438] ? __this_cpu_preempt_check (lib/smp_processor_id.c:67) > [ 183.294602][ T2438] ? lockdep_hardirqs_on (kernel/locking/lockdep.c:4423) > [ 183.295385][ T2438] __alloc_pages_slowpath+0x17d2/0x1b00 > [ 183.296285][ T2438] ? __zone_watermark_ok (mm/page_alloc.c:2968) > [ 183.297011][ T2438] ? ftrace_likely_update (arch/x86/include/asm/smap.h:56 > kernel/trace/trace_branch.c:229) > [ 183.297719][ T2438] ? warn_alloc (mm/page_alloc.c:4041) > [ 183.298306][ T2438] ? get_page_from_freelist (include/linux/mmzone.h:1651 > mm/page_alloc.c:3187) > [ 183.299017][ T2438] ? ftrace_likely_update (arch/x86/include/asm/smap.h:56 > kernel/trace/trace_branch.c:229) > [ 183.299639][ T2438] __alloc_pages (mm/page_alloc.c:4581) > [ 183.300162][ T2438] ? asm_sysvec_apic_timer_interrupt > (arch/x86/include/asm/idtentry.h:645) > [ 183.300851][ T2438] ? __alloc_pages_slowpath+0x1b00/0x1b00 > [ 183.301627][ T2438] ? 
kimage_alloc_pages (arch/x86/include/asm/bitops.h:55 > (discriminator 3) include/asm-generic/bitops/instrumented-atomic.h:29 > (discriminator 3) include/linux/page-flags.h:492 (discriminator 3) > kernel/kexec_core.c:303 (discriminator 3)) > [ 183.302230][ T2438] ? ftrace_likely_update (arch/x86/include/asm/smap.h:56 > kernel/trace/trace_branch.c:229) > [ 183.302840][ T2438] kimage_alloc_pages (include/linux/gfp.h:238 > include/linux/gfp.h:261 include/linux/gfp.h:274 kernel/kexec_core.c:295) > [ 183.303436][ T2438] kimage_alloc_control_pages (kernel/kexec_core.c:369 > kernel/kexec_core.c:480) > [ 183.304126][ T2438] ? kimage_free_page_list (kernel/kexec_core.c:475) > [ 183.304774][ T2438] kimage_alloc_init (kernel/kexec.c:63) > [ 183.305362][ T2438] do_kexec_load (kernel/kexec.c:125) > [ 183.305910][ T2438] ? kimage_alloc_init (kernel/kexec.c:89) > [ 183.306496][ T2438] ? ftrace_likely_update (arch/x86/include/asm/smap.h:56 > kernel/trace/trace_branch.c:229) > [ 183.307107][ T2438] __x64_sys_kexec_load (kernel/kexec.c:255 > kernel/kexec.c:235 kernel/kexec.c:235) > [ 183.307750][ T2438] do_syscall_64 (arch/x86/entry/common.c:51 > arch/x86/entry/common.c:82) > [ 183.308284][ T2438] entry_SYSCALL_64_after_hwframe > (arch/x86/entry/entry_64.S:129) > [ 183.308968][ T2438] RIP: 0033:0x463519 > [ 183.309465][ T2438] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 > 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 > <48> 3d 01 f0 ff ff 0f 83 db 59 00 00 c3 66 2e 0f 1f 84 00 00 00 00 > All code > >0: 00 f3 add%dh,%bl >2: c3 ret >3: 66 2e 0f 1f 84 00 00cs nopw 0x0(%rax,%rax,1) >a: 00 00 00 >
Re: [PATCH] resource,kexec: walk_system_ram_res_rev must retain resource flags
On 10/18/24 at 05:52pm, Andy Shevchenko wrote: > On Fri, Oct 18, 2024 at 05:51:09PM +0300, Andy Shevchenko wrote: > > On Fri, Oct 18, 2024 at 09:52:47PM +0800, Baoquan He wrote: > > > On 10/18/24 at 03:22pm, Andy Shevchenko wrote: > > > > On Fri, Oct 18, 2024 at 10:18:42AM +0800, Baoquan He wrote: > > > > > On 10/17/24 at 03:03pm, Gregory Price wrote: > > > > > > walk_system_ram_res_rev() erroneously discards resource flags when > > > > > > passing the information to the callback. > > > > > > > > > > > > This causes systems with IORESOURCE_SYSRAM_DRIVER_MANAGED memory to > > > > > > have these resources selected during kexec to store kexec buffers > > > > > > if that memory happens to be at placed above normal system ram. > > > > > > > > > > Sorry about that. I haven't checked IORESOURCE_SYSRAM_DRIVER_MANAGED > > > > > memory carefully, wondering if res could be set as > > > > > 'IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY' plus > > > > > IORESOURCE_SYSRAM_DRIVER_MANAGED in iomem_resource tree. > > > > > > > > > > Anyway, the change in this patch is certainly better. Thanks. > > > > > > > > Can we get more test cases in the respective module, please? > > > > > > Do you mean testing CXL memory in kexec/kdump? No, we can't. Kexec/kdump > > > test cases basically is system testing, not unit test or module test. It > > > needs run system and then jump to 2nd kernel, vm can be used but it > > > can't cover many cases existing only on baremetal. Currenly, Redhat's > > > CKI is heavily relied on to test them, however I am not sure if system > > > with CXL support is available in our LAB. > > > > > > Not sure if I got you right. > > > > I meant since we touch resource.c, we should really touch resource_kunit.c > > *in addition to*. > > And to be more clear, there is no best time to add test cases than > as early as possible. So, can we add the test cases to the (new) APIs, > so we want have an issue like the one this patch fixes? 
I will have a look at kernel/resource_kunit.c to see if I can add something for walk_system_ram_res_rev(). Thanks.
Re: [PATCH] resource,kexec: walk_system_ram_res_rev must retain resource flags
On 10/18/24 at 03:22pm, Andy Shevchenko wrote: > On Fri, Oct 18, 2024 at 10:18:42AM +0800, Baoquan He wrote: > > Hi Gregory, > > > > On 10/17/24 at 03:03pm, Gregory Price wrote: > > > walk_system_ram_res_rev() erroneously discards resource flags when > > > passing the information to the callback. > > > > > > This causes systems with IORESOURCE_SYSRAM_DRIVER_MANAGED memory to > > > have these resources selected during kexec to store kexec buffers > > > if that memory happens to be placed above normal system ram. > > > > Sorry about that. I haven't checked IORESOURCE_SYSRAM_DRIVER_MANAGED > > memory carefully, wondering if res could be set as > > 'IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY' plus > > IORESOURCE_SYSRAM_DRIVER_MANAGED in iomem_resource tree. > > > > Anyway, the change in this patch is certainly better. Thanks. > > Can we get more test cases in the respective module, please? Do you mean testing CXL memory in kexec/kdump? No, we can't. Kexec/kdump test cases are basically system tests, not unit or module tests. They need to run a system and then jump to the 2nd kernel; a VM can be used, but it can't cover the many cases that exist only on bare metal. Currently, Red Hat's CKI is heavily relied on to test them; however, I am not sure whether a system with CXL support is available in our lab. Not sure if I got you right.
Re: [RFC PATCH v1 19/57] crash: Remove PAGE_SIZE compile-time constant assumption
On 10/15/24 at 12:13pm, Ryan Roberts wrote: > On 15/10/2024 04:47, Baoquan He wrote: > > On 10/14/24 at 11:58am, Ryan Roberts wrote: > >> To prepare for supporting boot-time page size selection, refactor code > >> to remove assumptions about PAGE_SIZE being compile-time constant. Code > >> intended to be equivalent when compile-time page size is active. > >> > >> Updated BUILD_BUG_ON() to test against limit. > >> > >> Signed-off-by: Ryan Roberts > >> --- > >> > >> ***NOTE*** > >> Any confused maintainers may want to read the cover note here for context: > >> https://lore.kernel.org/all/20241014105514.3206191-1-ryan.robe...@arm.com/ > >> > >> kernel/crash_core.c | 2 +- > >> 1 file changed, 1 insertion(+), 1 deletion(-) > >> > >> diff --git a/kernel/crash_core.c b/kernel/crash_core.c > >> index 63cf89393c6eb..978c600a47ac8 100644 > >> --- a/kernel/crash_core.c > >> +++ b/kernel/crash_core.c > >> @@ -465,7 +465,7 @@ static int __init crash_notes_memory_init(void) > >> * Break compile if size is bigger than PAGE_SIZE since crash_notes > >> * definitely will be in 2 pages with that. > >> */ > >> - BUILD_BUG_ON(size > PAGE_SIZE); > >> + BUILD_BUG_ON(size > PAGE_SIZE_MIN); > > > > This should be OK. While one thing which could happen is if selected size > > is 64K, PAGE_SIZE_MIN is 4K, it will issue a false-positive warning when > > compiling while actual it's not a problem during running. > > PAGE_SIZE can only ever be bigger than PAGE_SIZE_MIN if compiling a "boot-time > page size" build. And in this case, you need to know that size is small enough > to work with any of the boot-time selectable page sizes. Since size > (=sizeof(note_buf_t)) is invariant to PAGE_SIZE, we can do this by checking > against PAGE_SIZE_MIN. > > So I don't think this could ever lead to a false-positive. Makes sense, thanks for your explanation. > > > Not sure if > > that could happen on arm64. Anyway, we can check the crash_notes to get > > why it's so big when it really happens. 
So, > > > > Acked-by: Baoquan He > > Thanks! > > > > >> > >>crash_notes = __alloc_percpu(size, align); > >>if (!crash_notes) { > >> -- > >> 2.43.0
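Ryan's argument in this thread is that a size which does not depend on the page size only has to be checked against the smallest selectable page size. A minimal user-space sketch of that reasoning follows, using C11 _Static_assert in place of BUILD_BUG_ON(). The MODEL_* values, model_note_buf_t and the list of selectable sizes are assumptions made for illustration and are not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical limits for a build with boot-time page size selection. */
#define MODEL_PAGE_SIZE_MIN 4096UL
#define MODEL_PAGE_SIZE_MAX 65536UL

/* Stand-in for note_buf_t; its size does not depend on the page size. */
typedef struct {
	unsigned int words[128];
} model_note_buf_t;

/* Compile-time check against the smallest page size, in the spirit of
 * BUILD_BUG_ON(size > PAGE_SIZE_MIN). */
_Static_assert(sizeof(model_note_buf_t) <= MODEL_PAGE_SIZE_MIN,
	       "note buffer must fit in the smallest selectable page");

static bool fits_in_one_page(size_t size, unsigned long page_size)
{
	return size <= page_size;
}

/* If the size fits the minimum, it fits every larger selectable size. */
static bool fits_all_selectable(size_t size)
{
	static const unsigned long sizes[] = { 4096UL, 16384UL, 65536UL };
	size_t i;

	for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++)
		if (!fits_in_one_page(size, sizes[i]))
			return false;
	return true;
}
```

The check is conservative by construction: a size that only fits the larger page sizes is rejected at build time, because the page size actually selected at boot is not known yet.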
Re: [PATCH] resource,kexec: walk_system_ram_res_rev must retain resource flags
HI Gregory, On 10/17/24 at 03:03pm, Gregory Price wrote: > walk_system_ram_res_rev() erroneously discards resource flags when > passing the information to the callback. > > This causes systems with IORESOURCE_SYSRAM_DRIVER_MANAGED memory to > have these resources selected during kexec to store kexec buffers > if that memory happens to be at placed above normal system ram. Sorry about that. I haven't checked IORESOURCE_SYSRAM_DRIVER_MANAGED memory carefully, wondering if res could be set as 'IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY' plus IORESOURCE_SYSRAM_DRIVER_MANAGED in iomem_resource tree. Anyway, the change in this patch is certainly better. Thanks. Acked-by: Baoquan He > > This leads to undefined behavior after reboot. If the kexec buffer > is never touched, nothing happens. If the kexec buffer is touched, > it could lead to a crash (like below) or undefined behavior. > > Tested on a system with CXL memory expanders with driver managed > memory, TPM enabled, and CONFIG_IMA_KEXEC=y. Adding printk's > showed the flags were being discarded and as a result the check > for IORESOURCE_SYSRAM_DRIVER_MANAGED passes. > > find_next_iomem_res: name(System RAM (kmem)) >start(100) >end(1034fff) >flags(83000200) > > locate_mem_hole_top_down: start(100) end(1034fff) flags(0) > > [.] BUG: unable to handle page fault for address: 89834000 > [.] #PF: supervisor read access in kernel mode > [.] #PF: error_code(0x) - not-present page > [.] PGD c04c8bf067 P4D c04c8bf067 PUD c04c8be067 PMD 0 > [.] Oops: [#1] SMP > [.] RIP: 0010:ima_restore_measurement_list+0x95/0x4b0 > [.] RSP: 0018:c90d3a80 EFLAGS: 00010286 > [.] RAX: 1000 RBX: RCX: 89834000 > [.] RDX: 0018 RSI: 89834000 RDI: 89834018 > [.] RBP: c90d3ba0 R08: 0020 R09: 888132b8a900 > [.] R10: 4000 R11: 3a616d69 R12: > [.] R13: 8404ac28 R14: R15: 89834000 > [.] FS: () GS:893d4464() > knlGS: > [.] CS: 0010 DS: ES: CR0: 80050033 > [.] ata5: SATA link down (SStatus 0 SControl 300) > [.] 
CR2: 89834000 CR3: 01034d00f001 CR4: 00770ef0 > [.] PKRU: 5554 > [.] Call Trace: > [.] > [.] ? __die+0x78/0xc0 > [.] ? page_fault_oops+0x2a8/0x3a0 > [.] ? exc_page_fault+0x84/0x130 > [.] ? asm_exc_page_fault+0x22/0x30 > [.] ? ima_restore_measurement_list+0x95/0x4b0 > [.] ? template_desc_init_fields+0x317/0x410 > [.] ? crypto_alloc_tfm_node+0x9c/0xc0 > [.] ? init_ima_lsm+0x30/0x30 > [.] ima_load_kexec_buffer+0x72/0xa0 > [.] ima_init+0x44/0xa0 > [.] __initstub__kmod_ima__373_1201_init_ima7+0x1e/0xb0 > [.] ? init_ima_lsm+0x30/0x30 > [.] do_one_initcall+0xad/0x200 > [.] ? idr_alloc_cyclic+0xaa/0x110 > [.] ? new_slab+0x12c/0x420 > [.] ? new_slab+0x12c/0x420 > [.] ? number+0x12a/0x430 > [.] ? sysvec_apic_timer_interrupt+0xa/0x80 > [.] ? asm_sysvec_apic_timer_interrupt+0x16/0x20 > [.] ? parse_args+0xd4/0x380 > [.] ? parse_args+0x14b/0x380 > [.] kernel_init_freeable+0x1c1/0x2b0 > [.] ? rest_init+0xb0/0xb0 > [.] kernel_init+0x16/0x1a0 > [.] ret_from_fork+0x2f/0x40 > [.] ? rest_init+0xb0/0xb0 > [.] ret_from_fork_asm+0x11/0x20 > [.] > > Link: https://lore.kernel.org/all/20231114091658.228030-1-...@redhat.com/ > Fixes: 7acf164b259d ("resource: add walk_system_ram_res_rev()") > Cc: sta...@vger.kernel.org > Signed-off-by: Gregory Price > --- > kernel/resource.c | 4 +--- > 1 file changed, 1 insertion(+), 3 deletions(-) > > diff --git a/kernel/resource.c b/kernel/resource.c > index b730bd28b422..4101016e8b20 100644 > --- a/kernel/resource.c > +++ b/kernel/resource.c > @@ -459,9 +459,7 @@ int walk_system_ram_res_rev(u64 start, u64 end, void *arg, > rams_size += 16; > } > > - rams[i].start = res.start; > - rams[i++].end = res.end; > - > + rams[i++] = res; > start = res.end + 1; > } > > -- > 2.43.0 > ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
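The effect of the one-line change in Gregory's patch (copying the whole resource instead of only start and end) can be modelled in user space. The sketch below is an assumption-laden illustration: struct res_model, the MODEL_* flag values and pick_top_down() are hypothetical, and the zeroed flags in the lossy copy mirror the flags(0) shown in the log above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct resource. */
struct res_model {
	uint64_t start;
	uint64_t end;
	unsigned long flags;
};

/* Illustrative flag bits, not the kernel's IORESOURCE_* values. */
#define MODEL_SYSRAM		0x1UL
#define MODEL_DRIVER_MANAGED	0x2UL

/* Models the old behaviour: only the bounds are copied, flags stay 0. */
static void collect_bounds_only(const struct res_model *src, size_t n,
				struct res_model *dst)
{
	size_t i;

	for (i = 0; i < n; i++) {
		dst[i].start = src[i].start;
		dst[i].end = src[i].end;
		dst[i].flags = 0;	/* flags are lost */
	}
}

/* Models the fix: the whole structure is copied, flags included. */
static void collect_whole(const struct res_model *src, size_t n,
			  struct res_model *dst)
{
	size_t i;

	for (i = 0; i < n; i++)
		dst[i] = src[i];
}

/* Callback-side model: walk from the top and take the first range that
 * is not driver managed. Returns its index, or -1 if none qualifies. */
static int pick_top_down(const struct res_model *rams, size_t n)
{
	size_t i;

	for (i = n; i-- > 0; )
		if (!(rams[i].flags & MODEL_DRIVER_MANAGED))
			return (int)i;
	return -1;
}
```

With the lossy copy the topmost range is picked even though it is driver managed; with the full copy the walker skips it and falls back to ordinary system RAM.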
Re: [RFC PATCH v1 19/57] crash: Remove PAGE_SIZE compile-time constant assumption
On 10/14/24 at 11:58am, Ryan Roberts wrote: > To prepare for supporting boot-time page size selection, refactor code > to remove assumptions about PAGE_SIZE being compile-time constant. Code > intended to be equivalent when compile-time page size is active. > > Updated BUILD_BUG_ON() to test against limit. > > Signed-off-by: Ryan Roberts > --- > > ***NOTE*** > Any confused maintainers may want to read the cover note here for context: > https://lore.kernel.org/all/20241014105514.3206191-1-ryan.robe...@arm.com/ > > kernel/crash_core.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/kernel/crash_core.c b/kernel/crash_core.c > index 63cf89393c6eb..978c600a47ac8 100644 > --- a/kernel/crash_core.c > +++ b/kernel/crash_core.c > @@ -465,7 +465,7 @@ static int __init crash_notes_memory_init(void) >* Break compile if size is bigger than PAGE_SIZE since crash_notes >* definitely will be in 2 pages with that. >*/ > - BUILD_BUG_ON(size > PAGE_SIZE); > + BUILD_BUG_ON(size > PAGE_SIZE_MIN); This should be OK. One thing that could happen, though: if the selected size is 64K and PAGE_SIZE_MIN is 4K, it will issue a false-positive warning at compile time even though it's actually not a problem at run time. Not sure if that could happen on arm64. Anyway, we can check crash_notes to find out why it's so big if that really happens. So, Acked-by: Baoquan He > > crash_notes = __alloc_percpu(size, align); > if (!crash_notes) { > -- > 2.43.0
Re: [PATCH] x86/e820: update code comment about e820_table_kexec
On 09/14/24 at 07:20pm, Dave Young wrote: > The setup_data ranges are not reserved for kexec any more after > commit fc7f27cda843 ("x86/kexec: Do not update E820 kexec table > for setup_data"), so update the code comment here. > > Signed-off-by: Dave Young > --- > arch/x86/kernel/e820.c |6 ++ > 1 file changed, 2 insertions(+), 4 deletions(-) > > Index: linux-x86/arch/x86/kernel/e820.c > === > --- linux-x86.orig/arch/x86/kernel/e820.c 2024-09-14 10:39:57.423551301 > +0800 > +++ linux-x86/arch/x86/kernel/e820.c 2024-09-14 18:56:30.158316496 +0800 > @@ -36,10 +36,8 @@ > * > * - 'e820_table_kexec': a slightly modified (by the kernel) firmware version > * passed to us by the bootloader - the major difference between > - * e820_table_firmware[] and this one is that, the latter marks the > setup_data > - * list created by the EFI boot stub as reserved, so that kexec can reuse > the > - * setup_data information in the second kernel. Besides, e820_table_kexec[] > - * might also be modified by the kexec itself to fake a mptable. > + * e820_table_firmware[] and this one is that e820_table_kexec[] > + * might be modified by the kexec itself to fake a mptable. > * We use this to: LGTM, Acked-by: Baoquan He > * > * - kexec, which is a bootloader in disguise, uses the original E820
Re: [PATCH v3 0/2] x86/mm/sme: fix the kdump kernel breakage on SME system when CONFIG_IMA_KEXEC=y
Hi, On 09/11/24 at 04:16pm, Baoquan He wrote: > Currently, distros like Fedora/RHEL have enabled CONFIG_IMA_KEXEC by > default. This makes kexec/kdump kernel always fail to boot up on SME > platform because of a code bug. By debugging, the root cause is found > out and bug is fixed with this patchset. PING. Can this be added into 6.12 so that SME system is available? Please tell if there's any concern or comment. Thanks Baoquan
Re: [PATCH v2] crash, powerpc: Default to CRASH_DUMP=n on PPC_BOOK3S_32
On 09/17/24 at 12:37pm, Dave Vasilevsky wrote: > Fixes boot failures on 6.9 on PPC_BOOK3S_32 machines using > Open Firmware. On these machines, the kernel refuses to boot > from non-zero PHYSICAL_START, which occurs when CRASH_DUMP is on. > > Since most PPC_BOOK3S_32 machines boot via Open Firmware, it should > default to off for them. Users booting via some other mechanism > can still turn it on explicitly. > > Does not change the default on any other architectures for the > time being. > > Signed-off-by: Dave Vasilevsky > Reported-by: Reimar Döffinger > Closes: https://lists.debian.org/debian-powerpc/2024/07/msg1.html > Fixes: 75bc255a7444 ("crash: clean up kdump related config items") > --- > arch/arm/Kconfig | 3 +++ > arch/arm64/Kconfig | 3 +++ > arch/loongarch/Kconfig | 3 +++ > arch/mips/Kconfig | 3 +++ > arch/powerpc/Kconfig | 4 > arch/riscv/Kconfig | 3 +++ > arch/s390/Kconfig | 3 +++ > arch/sh/Kconfig| 3 +++ > arch/x86/Kconfig | 3 +++ > kernel/Kconfig.kexec | 2 +- > 10 files changed, 29 insertions(+), 1 deletion(-) Thanks for the effort. 
Acked-by: Baoquan He > > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig > index 0ec034933cae..4cc31467298b 100644 > --- a/arch/arm/Kconfig > +++ b/arch/arm/Kconfig > @@ -1598,6 +1598,9 @@ config ATAGS_PROC > config ARCH_SUPPORTS_CRASH_DUMP > def_bool y > > +config ARCH_DEFAULT_CRASH_DUMP > + def_bool y > + > config AUTO_ZRELADDR > bool "Auto calculation of the decompressed kernel image address" if > !ARCH_MULTIPLATFORM > default !(ARCH_FOOTBRIDGE || ARCH_RPC || ARCH_SA1100) > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig > index ed15b876fa74..8c67b76347d3 100644 > --- a/arch/arm64/Kconfig > +++ b/arch/arm64/Kconfig > @@ -1559,6 +1559,9 @@ config ARCH_DEFAULT_KEXEC_IMAGE_VERIFY_SIG > config ARCH_SUPPORTS_CRASH_DUMP > def_bool y > > +config ARCH_DEFAULT_CRASH_DUMP > + def_bool y > + > config ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION > def_bool CRASH_RESERVE > > diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig > index 0e3abf7b0bd3..7ba3baee859e 100644 > --- a/arch/loongarch/Kconfig > +++ b/arch/loongarch/Kconfig > @@ -600,6 +600,9 @@ config ARCH_SUPPORTS_KEXEC > config ARCH_SUPPORTS_CRASH_DUMP > def_bool y > > +config ARCH_DEFAULT_CRASH_DUMP > + def_bool y > + > config ARCH_SELECTS_CRASH_DUMP > def_bool y > depends on CRASH_DUMP > diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig > index 60077e576935..b547f4304d0c 100644 > --- a/arch/mips/Kconfig > +++ b/arch/mips/Kconfig > @@ -2881,6 +2881,9 @@ config ARCH_SUPPORTS_KEXEC > config ARCH_SUPPORTS_CRASH_DUMP > def_bool y > > +config ARCH_DEFAULT_CRASH_DUMP > + def_bool y > + > config PHYSICAL_START > hex "Physical address where the kernel is loaded" > default "0x8400" > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig > index 8a4ee57cd4ef..c04f7bb543cc 100644 > --- a/arch/powerpc/Kconfig > +++ b/arch/powerpc/Kconfig > @@ -682,6 +682,10 @@ config RELOCATABLE_TEST > config ARCH_SUPPORTS_CRASH_DUMP > def_bool PPC64 || PPC_BOOK3S_32 || PPC_85xx || (44x && !SMP) > > +config ARCH_DEFAULT_CRASH_DUMP 
> + bool > + default y if !PPC_BOOK3S_32 > + > config ARCH_SELECTS_CRASH_DUMP > def_bool y > depends on CRASH_DUMP > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig > index 86d1f1cea571..341ef759870a 100644 > --- a/arch/riscv/Kconfig > +++ b/arch/riscv/Kconfig > @@ -882,6 +882,9 @@ config ARCH_SUPPORTS_KEXEC_PURGATORY > config ARCH_SUPPORTS_CRASH_DUMP > def_bool y > > +config ARCH_DEFAULT_CRASH_DUMP > + def_bool y > + > config ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION > def_bool CRASH_RESERVE > > diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig > index c60e699e99f5..fff371b89e41 100644 > --- a/arch/s390/Kconfig > +++ b/arch/s390/Kconfig > @@ -275,6 +275,9 @@ config ARCH_SUPPORTS_CRASH_DUMP > This option also enables s390 zfcpdump. > See also > > +config ARCH_DEFAULT_CRASH_DUMP > + def_bool y > + > menu "Processor type and features" > > config HAVE_MARCH_Z10_FEATURES > diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig > index e9103998cca9..04ff5fb9242e 100644 > --- a/arch/sh/Kconfig > +++ b/arch/sh/Kconfig > @@ -550,6 +550,9 @@ config ARCH_SUPPORTS_KEXEC > config ARCH_SUPPORTS_CRASH_DUMP > def_bool BROKEN_ON_SMP > > +config ARCH_DEFAULT_CRASH_DUMP > + def_bool y >
Re: [PATCH 1/1] kexec_file: fix elfcorehdr digest exclusion when CONFIG_CRASH_HOTPLUG=y
Hi Eric,

On 08/16/24 at 07:54am, Eric W. Biederman wrote:
> Petr Tesarik writes:
>
> > From: Petr Tesarik
> >
> > Fix the condition to exclude the elfcorehdr segment from the SHA digest
> > calculation.
> >
> > The j iterator is an index into the output sha_regions[] array, not into
> > the input image->segment[] array. Once it reaches image->elfcorehdr_index,
> > all subsequent segments are excluded. Besides, if the purgatory segment
> > precedes the elfcorehdr segment, the elfcorehdr may be wrongly included in
> > the calculation.
>
> I would rather make CONFIG_CRASH_HOTPLUG depend on broken.
>
> The hash is supposed to include everything we depend upon so when
> a broken machine corrupts something we can detect that corruption
> and not attempt to take a crash dump.
>
> The elfcorehdr is definitely something that needs to be part of the
> hash.
>
> So please go back to the drawing board and find a way to include the
> program header in the hash even with CONFIG_CRASH_HOTPLUG.

Thanks for checking this and for your advice, and sorry for the late reply.

It was me who suggested that Eric DeVolder not add elfcorehdr to the kdump kernel image hash while reviewing his patch. I need to explain this if people have concerns. When I suggested this, what I considered was:

1) The code change is much simpler. As you can see, Eric DeVolder's patchset later went through many rounds of review before it was finally merged. Below is his final round:
   - [PATCH v28 0/8] crash: Kernel handling of CPU and memory hot un/plug

2) The efficiency is much better than when elfcorehdr is included in the hash of all segments. When cpu/mem hotplug is triggered, we only touch the elfcorehdr area and don't need to access all the loaded segments.

3) The elfcorehdr is very small relative to the kernel image and initrd. E.g. on x86 it is less than 1M, which is tiny relative to the size of the kernel image and initrd.

Surely, adding all loaded segments to the hash would be best. However, given the benefits above, I tend not to add it for the time being. I am open to this: if anyone has concerns about security and is interested in adding it as a kernel project in the future, that is welcome.

Here I'd like to request comments from Sourabh, since he and other IBM developers added the support to ppc too. Unlike a generic ARCH, the IBM developers can be seen as end users, so maybe we can hear how they evaluate the balance between risk and benefit.

Thanks
Baoquan
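The indexing problem described in Petr's quoted patch log can be modeled in user space. This is only a minimal sketch, not the kernel code: the segment count and the positions of the purgatory and elfcorehdr segments are hypothetical, and the two helpers merely mimic a loop that walks the input segments with index i while filling an output array with index j.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical layout for illustration only. */
#define NR_SEGMENTS      4
#define PURGATORY_INDEX  0
#define ELFCOREHDR_INDEX 2

/*
 * Models the condition described as buggy: the output index j is
 * compared against the elfcorehdr segment index.
 * Returns the number of segments selected for hashing.
 */
static size_t select_regions_buggy(bool hashed[NR_SEGMENTS])
{
	size_t i, j = 0;

	for (i = 0; i < NR_SEGMENTS; i++) {
		hashed[i] = false;
		if (i == PURGATORY_INDEX)
			continue;	/* purgatory is skipped */
		if (j == ELFCOREHDR_INDEX)
			continue;	/* wrong: j indexes the output array */
		hashed[i] = true;
		j++;
	}
	return j;
}

/* Same loop, but the input index i is compared instead. */
static size_t select_regions_fixed(bool hashed[NR_SEGMENTS])
{
	size_t i, j = 0;

	for (i = 0; i < NR_SEGMENTS; i++) {
		hashed[i] = false;
		if (i == PURGATORY_INDEX)
			continue;
		if (i == ELFCOREHDR_INDEX)
			continue;	/* skip exactly the elfcorehdr segment */
		hashed[i] = true;
		j++;
	}
	return j;
}
```

With this layout the buggy variant hashes the elfcorehdr segment (the purgatory precedes it, so j lags behind i) and then drops every later segment once j reaches the elfcorehdr index, which are the two symptoms named in the patch log.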
Re: [PATCH RESEND v2] kexec/crash: no crash update when kexec in progress
On 09/12/24 at 01:33pm, Sourabh Jain wrote: > Hello Baoquan, > > On 11/09/24 19:50, Baoquan He wrote: > > On 09/11/24 at 04:51pm, Sourabh Jain wrote: > > > The following errors are observed when kexec is done with SMT=off on > > > powerpc. > > > > > > [ 358.458385] Removing IBM Power 842 compression device > > > [ 374.795734] kexec_core: Starting new kernel > > > [ 374.795748] kexec: Waking offline cpu 1. > > > [ 374.875695] crash hp: kexec_trylock() failed, elfcorehdr may be > > > inaccurate > > > [ 374.935833] kexec: Waking offline cpu 2. > > > [ 375.015664] crash hp: kexec_trylock() failed, elfcorehdr may be > > > inaccurate > > > snip.. > > > [ 375.515823] kexec: Waking offline cpu 6. > > > [ 375.635667] crash hp: kexec_trylock() failed, elfcorehdr may be > > > inaccurate > > > [ 375.695836] kexec: Waking offline cpu 7. > > > > > > To avoid kexec kernel boot failure on PowerPC, all the present CPUs that > > > are offline are brought online during kexec. For more information, refer > > > to commit e8e5c2155b00 ("powerpc/kexec: Fix orphaned offline CPUs across > > > kexec"). Bringing the CPUs online triggers the crash hotplug handler, > > > crash_handle_hotplug_event(), to update the kdump image. Since the > > > system is on the kexec kernel boot path and the kexec lock is held, the > > > crash_handle_hotplug_event() function fails to acquire the same lock to > > > update the kdump image, resulting in the error messages mentioned above. > > > > > > To fix this, return from crash_handle_hotplug_event() without printing > > > the error message if kexec is in progress. > > > > > > The same applies to the crash_check_hotplug_support() function. Return > > > 0 if kexec is in progress because kernel is not in a position to update > > > the kdump image. > > LGTM, thanks. > > > > Acked-by: Baoquan he > > Thank you for the Ack! 
>
> My understanding is that this patch will go upstream via the linux-next
> tree, as it is based on
> https://lore.kernel.org/all/20240902034708.88ec1c4c...@smtp.kernel.org/
> which is already part of the linux-next master branch.
>
> - Sourabh Jain

Then you should mark it as [PATCH linux-next] in the subject. Since this patch is in generic code, it needs Andrew's help to pick it up. Let's wait and see whether Andrew needs a new post with the changed subject.

Thanks
Baoquan
Re: [PATCH V2] Documentation: Improve crash_kexec_post_notifiers description
On 09/11/24 at 05:09pm, Guilherme G. Piccoli wrote: > On 02/09/2024 05:05, Baoquan He wrote: > > On 08/30/24 at 03:21pm, Guilherme G. Piccoli wrote: > >> Be more clear about the downsides, the upsides (yes, there are some!) > >> and about code that unconditionally sets that. > >> > >> Reviewed-by: Stephen Brennan > >> Signed-off-by: Guilherme G. Piccoli > >> > >> --- > >> > >> V2: Some wording improvements from Stephen, thanks! > >> Also added his review tag. > >> > >> V1 link: > >> https://lore.kernel.org/r/20240830140401.458542-1-gpicc...@igalia.com/ > >> > >> > >> Documentation/admin-guide/kernel-parameters.txt | 16 ++-- > >> 1 file changed, 10 insertions(+), 6 deletions(-) > >> > >> diff --git a/Documentation/admin-guide/kernel-parameters.txt > >> b/Documentation/admin-guide/kernel-parameters.txt > >> index efc52ddc6864..351730108c58 100644 > >> --- a/Documentation/admin-guide/kernel-parameters.txt > >> +++ b/Documentation/admin-guide/kernel-parameters.txt > >> @@ -913,12 +913,16 @@ > >>the parameter has no effect. > >> > >>crash_kexec_post_notifiers > >> - Run kdump after running panic-notifiers and dumping > >> - kmsg. This only for the users who doubt kdump always > >> - succeeds in any situation. > >> - Note that this also increases risks of kdump failure, > >> - because some panic notifiers can make the crashed > >> - kernel more unstable. > >> + Only jump to kdump kernel after running the panic > >> + notifiers and dumping kmsg. This option increases the > >> + risks of a kdump failure, since some panic notifiers > >> + can make the crashed kernel more unstable. In the > >> + configurations where kdump may not be reliable, > >> + running the panic notifiers can allow collecting more > >> + data on dmesg, like stack traces from other CPUS or > >> + extra data dumped by panic_print. Notice that some > >> + code enables this option unconditionally, like > >> + Hyper-V, PowerPC (fadump) and AMD SEV. 
> > ~
> > I know Hyper-V enables panic notifiers by default, but I don't remember how
> > PowerPC and AMD SEV behave in this respect. While at it, can you add a
> > few more words about them to the log so that people can learn about it?
> > Thanks.
> >
>
> Hi Baoquan, tnx for the suggestion! You mean mention that in the commit
> message? If so, I can certainly do that - will send a new version soon(tm)
> and include it =)

You are right, thanks for the effort.
Re: [PATCH RESEND v2] kexec/crash: no crash update when kexec in progress
On 09/11/24 at 04:51pm, Sourabh Jain wrote: > The following errors are observed when kexec is done with SMT=off on > powerpc. > > [ 358.458385] Removing IBM Power 842 compression device > [ 374.795734] kexec_core: Starting new kernel > [ 374.795748] kexec: Waking offline cpu 1. > [ 374.875695] crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate > [ 374.935833] kexec: Waking offline cpu 2. > [ 375.015664] crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate > snip.. > [ 375.515823] kexec: Waking offline cpu 6. > [ 375.635667] crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate > [ 375.695836] kexec: Waking offline cpu 7. > > To avoid kexec kernel boot failure on PowerPC, all the present CPUs that > are offline are brought online during kexec. For more information, refer > to commit e8e5c2155b00 ("powerpc/kexec: Fix orphaned offline CPUs across > kexec"). Bringing the CPUs online triggers the crash hotplug handler, > crash_handle_hotplug_event(), to update the kdump image. Since the > system is on the kexec kernel boot path and the kexec lock is held, the > crash_handle_hotplug_event() function fails to acquire the same lock to > update the kdump image, resulting in the error messages mentioned above. > > To fix this, return from crash_handle_hotplug_event() without printing > the error message if kexec is in progress. > > The same applies to the crash_check_hotplug_support() function. Return > 0 if kexec is in progress because kernel is not in a position to update > the kdump image. LGTM, thanks. 
Acked-by: Baoquan he > > Cc: Hari Bathini > Cc: Michael Ellerman > Cc: kexec@lists.infradead.org > Cc: linuxppc-...@lists.ozlabs.org > Cc: linux-ker...@vger.kernel.org > Cc: x...@kernel.org > Reported-by: Sachin P Bappalige > Signed-off-by: Sourabh Jain > --- > Changelog: > > Since v1: > - Keep the kexec_in_progress check within kexec_trylock() - Baoquan He > - Include the reason why PowerPC brings offline CPUs online >during the kexec kernel boot path - Baoquan He > - Rebased on top of #next-20240910 to avoid conflict with the patch below > > https://lore.kernel.org/all/20240812041651.703156-1-sourabhj...@linux.ibm.com/T/#u > > V2 RESEND: > - Update linuxppc-dev mailing list address > --- > kernel/crash_core.c | 6 -- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/kernel/crash_core.c b/kernel/crash_core.c > index c1048893f4b6..078fe5bc5a74 100644 > --- a/kernel/crash_core.c > +++ b/kernel/crash_core.c > @@ -505,7 +505,8 @@ int crash_check_hotplug_support(void) > crash_hotplug_lock(); > /* Obtain lock while reading crash information */ > if (!kexec_trylock()) { > - pr_info("kexec_trylock() failed, kdump image may be > inaccurate\n"); > + if (!kexec_in_progress) > + pr_info("kexec_trylock() failed, kdump image may be > inaccurate\n"); > crash_hotplug_unlock(); > return 0; > } > @@ -547,7 +548,8 @@ static void crash_handle_hotplug_event(unsigned int > hp_action, unsigned int cpu, > crash_hotplug_lock(); > /* Obtain lock while changing crash information */ > if (!kexec_trylock()) { > - pr_info("kexec_trylock() failed, kdump image may be > inaccurate\n"); > + if (!kexec_in_progress) > + pr_info("kexec_trylock() failed, kdump image may be > inaccurate\n"); > crash_hotplug_unlock(); > return; > } > -- > 2.46.0 > ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
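The control flow of the patch above can be sketched in user space as follows. This is a simplified model under stated assumptions, not kernel code: struct kexec_state, model_trylock() and model_hotplug_event() are hypothetical stand-ins for the kexec lock, kexec_trylock() and the hotplug handler, and a counter replaces pr_info().

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kexec lock and kexec_in_progress. */
struct kexec_state {
	bool lock_held;		/* models the kexec lock */
	bool in_progress;	/* models kexec_in_progress */
	unsigned int warnings;	/* times the message would be printed */
};

/* Non-blocking lock attempt: fails if the lock is already held. */
static bool model_trylock(struct kexec_state *s)
{
	if (s->lock_held)
		return false;
	s->lock_held = true;
	return true;
}

/*
 * Models the patched handler: when the lock cannot be taken, the
 * message is only counted if no kexec is in progress.
 * Returns true if the kdump image update would have run.
 */
static bool model_hotplug_event(struct kexec_state *s)
{
	if (!model_trylock(s)) {
		if (!s->in_progress)
			s->warnings++;
		return false;
	}
	/* ... the kdump image update would happen here ... */
	s->lock_held = false;	/* release the lock again */
	return true;
}
```

In this model, waking offline CPUs one by one while the lock is held for the kexec boot calls model_hotplug_event() repeatedly without incrementing the counter, which corresponds to the log lines from the report no longer appearing.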
[PATCH v3 1/2] x86/mm: rename the confusing local variable in early_memremap_is_setup_data()
In early_memremap_is_setup_data(), the parameter 'size' has the same name as a local variable inside the while loop. That confuses people, who sometimes mix them up when reading the code.

Rename the local variable 'size' inside the while loop to 'sd_size'.

Also add a local variable 'sd_size' to memremap_is_setup_data() likewise, to simplify the code. A later patch can also use it.

Signed-off-by: Baoquan He
Acked-by: Tom Lendacky
---
 arch/x86/mm/ioremap.c | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index aa7d279321ea..f1ee8822ddf1 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -640,7 +640,7 @@ static bool memremap_is_setup_data(resource_size_t phys_addr,
 
 	paddr = boot_params.hdr.setup_data;
 	while (paddr) {
-		unsigned int len;
+		unsigned int len, sd_size;
 
 		if (phys_addr == paddr)
 			return true;
@@ -652,6 +652,8 @@ static bool memremap_is_setup_data(resource_size_t phys_addr,
 			return false;
 		}
 
+		sd_size = sizeof(*data);
+
 		paddr_next = data->next;
 		len = data->len;
 
@@ -662,7 +664,9 @@ static bool memremap_is_setup_data(resource_size_t phys_addr,
 
 		if (data->type == SETUP_INDIRECT) {
 			memunmap(data);
-			data = memremap(paddr, sizeof(*data) + len,
+
+			sd_size += len;
+			data = memremap(paddr, sd_size,
 					MEMREMAP_WB | MEMREMAP_DEC);
 			if (!data) {
 				pr_warn("failed to memremap indirect setup_data\n");
@@ -701,7 +705,7 @@ static bool __init early_memremap_is_setup_data(resource_size_t phys_addr,
 
 	paddr = boot_params.hdr.setup_data;
 	while (paddr) {
-		unsigned int len, size;
+		unsigned int len, sd_size;
 
 		if (phys_addr == paddr)
 			return true;
@@ -712,7 +716,7 @@ static bool __init early_memremap_is_setup_data(resource_size_t phys_addr,
 			return false;
 		}
 
-		size = sizeof(*data);
+		sd_size = sizeof(*data);
 
 		paddr_next = data->next;
 		len = data->len;
@@ -723,9 +727,9 @@ static bool __init early_memremap_is_setup_data(resource_size_t phys_addr,
 		}
 
 		if (data->type == SETUP_INDIRECT) {
-			size += len;
+			sd_size += len;
 			early_memunmap(data, sizeof(*data));
-			data = early_memremap_decrypted(paddr, size);
+			data = early_memremap_decrypted(paddr, sd_size);
 			if (!data) {
 				pr_warn("failed to early memremap indirect setup_data\n");
 				return false;
@@ -739,7 +743,7 @@ static bool __init early_memremap_is_setup_data(resource_size_t phys_addr,
 			}
 		}
 
-		early_memunmap(data, size);
+		early_memunmap(data, sd_size);
 
 		if ((phys_addr > paddr) && (phys_addr < (paddr + len)))
 			return true;
-- 
2.41.0
[PATCH v3 0/2] x86/mm/sme: fix the kdump kernel breakage on SME system when CONFIG_IMA_KEXEC=y
Currently, distros like Fedora/RHEL have enabled CONFIG_IMA_KEXEC by default. This makes the kexec/kdump kernel always fail to boot on SME platforms because of a code bug. By debugging, the root cause was found, and the bug is fixed by this patchset.

Changelog:

v2->v3:
===
- Add how the miscalculation is caused to the patch 2 log, according to Tom's suggestion.
- Add Tom's tag.

v1->v2:
===
- Add patch 1 to clean up the confusing local variable naming, because people may mix up the local variable 'size' with the passed-in parameter in early_memremap_is_setup_data(). Suggested by Dave and Tom during v1 patch review.

Baoquan He (2):
  x86/mm: rename the confusing local variable in early_memremap_is_setup_data()
  x86/mm/sme: fix the kdump kernel breakage on SME system when CONFIG_IMA_KEXEC=y

 arch/x86/mm/ioremap.c | 22 +-
 1 file changed, 13 insertions(+), 9 deletions(-)

-- 
2.41.0
[PATCH v3 2/2] x86/mm/sme: fix the kdump kernel breakage on SME system when CONFIG_IMA_KEXEC=y
Recently, it's reported that kdump kernel is broken during bootup on SME system when CONFIG_IMA_KEXEC=y. When debugging, I noticed this can be traced back to commit ("b69a2afd5afc x86/kexec: Carry forward IMA measurement log on kexec"). Just nobody ever tested it on SME system when enabling CONFIG_IMA_KEXEC. -- ima: No TPM chip found, activating TPM-bypass! Loading compiled-in module X.509 certificates Loaded X.509 cert 'Build time autogenerated kernel key: 18ae0bc7e79b64700122bb1d6a904b070fef2656' ima: Allocated hash algorithm: sha256 Oops: general protection fault, probably for non-canonical address 0xcfacfdfe6660003e: [#1] PREEMPT SMP NOPTI CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.11.0-rc2+ #14 Hardware name: Dell Inc. PowerEdge R7425/02MJ3T, BIOS 1.20.0 05/03/2023 RIP: 0010:ima_restore_measurement_list+0xdc/0x420 Code: ff 48 c7 85 10 ff ff ff 00 00 00 00 48 c7 85 18 ff ff ff 00 00 00 00 48 85 f6 0f 84 09 03 00 00 48 83 fa 17 0f 86 ff 02 00 00 <66> 83 3e 01 49 89 f4 0f 85 90 94 7d 00 48 83 7e 10 ff 0f 84 74 94 RSP: 0018:c9053c80 EFLAGS: 00010286 RAX: RBX: c9053d03 RCX: RDX: e48066052d5df359 RSI: cfacfdfe6660003e RDI: cfacfdfe66600056 RBP: c9053d80 R08: R09: 82de1a88 R10: c9053da0 R11: 0003 R12: 01a4 R13: c9053df0 R14: R15: FS: () GS:88804020() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 7f2c744050e8 CR3: 80004110e000 CR4: 003506b0 Call Trace: ? show_trace_log_lvl+0x1b0/0x2f0 ? show_trace_log_lvl+0x1b0/0x2f0 ? ima_load_kexec_buffer+0x6e/0xf0 ? __die_body.cold+0x8/0x12 ? die_addr+0x3c/0x60 ? exc_general_protection+0x178/0x410 ? asm_exc_general_protection+0x26/0x30 ? ima_restore_measurement_list+0xdc/0x420 ? vprintk_emit+0x1f0/0x270 ? ima_load_kexec_buffer+0x6e/0xf0 ima_load_kexec_buffer+0x6e/0xf0 ima_init+0x52/0xb0 ? __pfx_init_ima+0x10/0x10 init_ima+0x26/0xc0 ? __pfx_init_ima+0x10/0x10 do_one_initcall+0x5b/0x300 do_initcalls+0xdf/0x100 ? __pfx_kernel_init+0x10/0x10 kernel_init_freeable+0x147/0x1a0 kernel_init+0x1a/0x140 ret_from_fork+0x34/0x50 ? 
__pfx_kernel_init+0x10/0x10
ret_from_fork_asm+0x1a/0x30
Modules linked in:
---[ end trace ]---
RIP: 0010:ima_restore_measurement_list+0xdc/0x420
Code: ff 48 c7 85 10 ff ff ff 00 00 00 00 48 c7 85 18 ff ff ff 00 00 00 00 48 85 f6 0f 84 09 03 00 00 48 83 fa 17 0f 86 ff 02 00 00 <66> 83 3e 01 49 89 f4 0f 85 90 94 7d 00 48 83 7e 10 ff 0f 84 74 94
RSP: 0018:c9053c80 EFLAGS: 00010286
RAX: RBX: c9053d03 RCX:
RDX: e48066052d5df359 RSI: cfacfdfe6660003e RDI: cfacfdfe66600056
RBP: c9053d80 R08: R09: 82de1a88
R10: c9053da0 R11: 0003 R12: 01a4
R13: c9053df0 R14: R15:
FS: () GS:88804020() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 7f2c744050e8 CR3: 80004110e000 CR4: 003506b0
Kernel panic - not syncing: Fatal exception
Kernel Offset: disabled
Rebooting in 10 seconds..
--

From the debugging output, the stored address and size of the ima_kexec buffer are not decrypted correctly, like:

--
ima: ima_load_kexec_buffer, buffer:0xcfacfdfe6660003e, size:0xe48066052d5df359
--

There are three pieces of setup_data info passed to the kexec/kdump kernel: SETUP_EFI, SETUP_IMA and SETUP_RNG_SEED. However, among them, only the ima_kexec buffer suffered from the incorrect decryption. After debugging, this turns out to be caused by a code bug in early_memremap_is_setup_data(), where the check of the embedded content inside setup_data uses a wrong range calculation. The "len" variable in struct setup_data is the length of the "data" field and does not include the size of the struct, which is the reason for the miscalculation.

In this case, the lengths of the efi data, rng_seed and ima_kexec are 0x70, 0x20 and 0x10, and the size of struct setup_data is 0x10. When checking whether an address is inside the embedded content of setup_data, the starting addresses of the efi data and rng_seed happened to land in the wrongly calculated range, while the starting address of ima_kexec unluckily did not pass the check, and the error occurred.

Fix the code bug so that the kexec/kdump kernel boots up successfully.

Also fix the similar buggy code in memremap_is_setup_data(), which was found during code review.

Fixes: b3c72fc9a78e ("x86/boot: Introduce setup_indirect")
Signed-off-by: Baoquan He
Acked-by: Tom Lendacky
---
 arch/x86/mm/ioremap.c | 4 ++--
 1 file changed, 2 inse
Re: [PATCH v2 2/2] x86/mm/sme: fix the kdump kernel breakage on SME system when CONFIG_IMA_KEXEC=y
On 09/09/24 at 09:53am, Tom Lendacky wrote: > On 8/29/24 05:40, Baoquan He wrote: ..snip.. > > Here fix the code bug to make kexec/kdump kernel boot up successfully. > > > > And also fix the similar buggy code in memremap_is_setup_data() which > > are found out during code reviewing. > > I think you should add something along the lines that the "len" variable > in struct setup_data is the length of the "data" field and does not > include the size of the struct, which is the reason for the miscalculation. Fair enough. I will send v3 to add the reason of miscalculation. > > Otherwise: > > Reviewed-by: Tom Lendacky Thanks a lot for careful checking. ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
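Tom's point about "len" can be checked with a little arithmetic. The sketch below is only an illustration under stated assumptions: the header size 0x10 and the payload lengths 0x70, 0x20 and 0x10 are the values quoted in the patch log, the base address used in any call is arbitrary, the "buggy" helper copies the range check visible in the existing code, and the "fixed" helper is an assumed correction that simply adds the header size (the actual fix is not shown in this thread excerpt).

```c
#include <stdbool.h>
#include <stdint.h>

/* Size of the struct setup_data header, as stated in the patch log. */
#define SETUP_DATA_HDR_SIZE 0x10u

/* The embedded payload starts right after the header. */
static uint64_t payload_start(uint64_t paddr)
{
	return paddr + SETUP_DATA_HDR_SIZE;
}

/* Range check as in the existing code: the range ends at paddr + len. */
static bool in_setup_data_buggy(uint64_t phys_addr, uint64_t paddr,
				uint32_t len)
{
	return phys_addr > paddr && phys_addr < paddr + len;
}

/*
 * Assumed correction: "len" covers only the payload, so the header
 * size has to be added to reach the real end of the entry.
 */
static bool in_setup_data_fixed(uint64_t phys_addr, uint64_t paddr,
				uint32_t len)
{
	return phys_addr > paddr &&
	       phys_addr < paddr + SETUP_DATA_HDR_SIZE + len;
}
```

For a payload of length 0x70 or 0x20, payload_start() still falls below paddr + len, so the buggy check passes by luck. For a payload of length 0x10, payload_start() equals paddr + len, and the strict "<" comparison rejects it, which matches the ima_kexec failure described in the log.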
Re: [PATCH] kexec/crash: no crash update when kexec in progress
On 09/09/24 at 10:35am, Sourabh Jain wrote: > > > On 08/09/24 16:00, Baoquan He wrote: > > On 09/05/24 at 02:07pm, Sourabh Jain wrote: > > > Hello Baoquan, > > > > > > On 05/09/24 08:53, Baoquan He wrote: > > > > On 09/04/24 at 02:55pm, Sourabh Jain wrote: > > > > > Hello Baoquan, > > > > > > > > > > On 30/08/24 16:47, Baoquan He wrote: > > > > > > On 08/20/24 at 12:10pm, Sourabh Jain wrote: > > > > > > > Hello Baoquan, > > > > > > > > > > > ..snip... > > > > > > > 2. A patch to return early from the > > > > > > > `crash_handle_hotplug_event()` function > > > > > > > if `kexec_in_progress` is > > > > > > > set to True. This is essentially my original patch. > > > > > > There's a race gap between the kexec_in_progress checking and the > > > > > > setting it to true which Michael has mentioned. > > > > > The window where kernel is holding kexec_lock to do kexec boot > > > > > but kexec_in_progress is yet not set to True. > > > > > > > > > > If kernel needs to handle crash hotplug event, the function > > > > > crash_handle_hotplug_event() will not get the kexec_lock and > > > > > error out by printing error message about not able to update > > > > > kdump image. > > > > But you wanted to avoid the erroring out if it's being in > > > > kernel_kexec(). Now you are seeing at least one of the noisy > > > > messages, aren't you? > > > Yes, but it is very rare to encounter. > > > > > > My comments on your updated code are inline below. > > > > > > > > I think it should be fine. Given that lock is already taken for > > > > > kexec kernel boot. > > > > > > > > > > Am I missing something major? > > > > > > > > > > > That's why I think > > > > > > maybe checking kexec_in_progress after failing to retrieve > > > > > > __kexec_lock is a little better, not very sure. > > > > > Try for kexec lock before kexec_in_progress check will not solve > > > > > the original problem this patch trying to solve. 
> > > > > > > > > > You proposed the below changes earlier: > > > > > > > > > > - if (!kexec_trylock()) { > > > > > + if (!kexec_trylock() && kexec_in_progress) { > > > > > pr_info("kexec_trylock() failed, elfcorehdr may be > > > > > inaccurate\n"); > > > > > crash_hotplug_unlock(); > > > > Ah, I meant as below, but wrote it mistakenly. > > > > > > > > diff --git a/kernel/crash_core.c b/kernel/crash_core.c > > > > index 63cf89393c6e..e7c7aa761f46 100644 > > > > --- a/kernel/crash_core.c > > > > +++ b/kernel/crash_core.c > > > > @@ -504,7 +504,7 @@ int crash_check_hotplug_support(void) > > > > crash_hotplug_lock(); > > > > /* Obtain lock while reading crash information */ > > > > - if (!kexec_trylock()) { > > > > + if (!kexec_trylock() && !kexec_in_progress) { > > > > pr_info("kexec_trylock() failed, elfcorehdr may be > > > > inaccurate\n"); > > > > crash_hotplug_unlock(); > > > > return 0; > > > > > > > > > > > > > Once the kexec_in_progress is set to True there is no way one can get > > > > > kexec_lock. So kexec_trylock() before kexec_in_progress is not helpful > > > > > for the problem I am trying to solve. > > > > With your patch, you could still get the error message if the race gap > > > > exist. With above change, you won't get it. Please correct me if I am > > > > wrong. > > > The above code will print an error message during the race gap. Here's > > > why: > > > > > > Let’s say the kexec lock is acquired in the kernel_kexec() function, > > > but kexec_in_progress is not yet set to True. In this scenario, the code > > > will print > > > an error message. > > > > > > There is another issue I see with the above code: > > > > > > Consider that the system is on the kexec kernel boot path, and
Re: [PATCH] crash: Default to CRASH_DUMP=n when support for it is unlikely
On 09/08/24 at 03:57pm, Dave Vasilevsky wrote:
> I received a notification from Patchwork that my patch is now in the state
> "Handled Elsewhere".[0] Does that mean someone merged it somewhere? Or that I
> should be using a different mailing list? Or something else?

I guess it's the powerpc developers' patchwork, which automatically grabs this patch to do some testing, because the ppc list is in the CC. I don't think this patch has been picked up by anyone, because it is an old v1 and there are concerns about it.
Re: [PATCH] kexec/crash: no crash update when kexec in progress
On 09/05/24 at 02:07pm, Sourabh Jain wrote:
> Hello Baoquan,
>
> On 05/09/24 08:53, Baoquan He wrote:
> > On 09/04/24 at 02:55pm, Sourabh Jain wrote:
> > > Hello Baoquan,
> > >
> > > On 30/08/24 16:47, Baoquan He wrote:
> > > > On 08/20/24 at 12:10pm, Sourabh Jain wrote:
> > > > > Hello Baoquan,
> > > > >
> > ..snip...
> > > > > 2. A patch to return early from the `crash_handle_hotplug_event()`
> > > > > function if `kexec_in_progress` is set to True. This is essentially
> > > > > my original patch.
> > > > There's a race gap between the kexec_in_progress checking and the
> > > > setting it to true which Michael has mentioned.
> > > The window where kernel is holding kexec_lock to do kexec boot
> > > but kexec_in_progress is yet not set to True.
> > >
> > > If kernel needs to handle crash hotplug event, the function
> > > crash_handle_hotplug_event() will not get the kexec_lock and
> > > error out by printing error message about not able to update
> > > kdump image.
> > But you wanted to avoid the erroring out if it's being in
> > kernel_kexec(). Now you are seeing at least one of the noisy
> > messages, aren't you?
>
> Yes, but it is very rare to encounter.
>
> My comments on your updated code are inline below.
>
> >
> > > I think it should be fine. Given that lock is already taken for
> > > kexec kernel boot.
> > >
> > > Am I missing something major?
> > >
> > > > That's why I think
> > > > maybe checking kexec_in_progress after failing to retrieve
> > > > __kexec_lock is a little better, not very sure.
> > > Try for kexec lock before kexec_in_progress check will not solve
> > > the original problem this patch trying to solve.
> > >
> > > You proposed the below changes earlier:
> > >
> > > -	if (!kexec_trylock()) {
> > > +	if (!kexec_trylock() && kexec_in_progress) {
> > > 		pr_info("kexec_trylock() failed, elfcorehdr may be inaccurate\n");
> > > 		crash_hotplug_unlock();
> > Ah, I meant as below, but wrote it mistakenly.
> >
> > diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> > index 63cf89393c6e..e7c7aa761f46 100644
> > --- a/kernel/crash_core.c
> > +++ b/kernel/crash_core.c
> > @@ -504,7 +504,7 @@ int crash_check_hotplug_support(void)
> >  	crash_hotplug_lock();
> >  	/* Obtain lock while reading crash information */
> > -	if (!kexec_trylock()) {
> > +	if (!kexec_trylock() && !kexec_in_progress) {
> >  		pr_info("kexec_trylock() failed, elfcorehdr may be inaccurate\n");
> >  		crash_hotplug_unlock();
> >  		return 0;
> >
> >
> > >
> > > Once the kexec_in_progress is set to True there is no way one can get
> > > kexec_lock. So kexec_trylock() before kexec_in_progress is not helpful
> > > for the problem I am trying to solve.
> > With your patch, you could still get the error message if the race gap
> > exist. With above change, you won't get it. Please correct me if I am
> > wrong.
>
> The above code will print an error message during the race gap. Here's why:
>
> Let's say the kexec lock is acquired in the kernel_kexec() function,
> but kexec_in_progress is not yet set to True. In this scenario, the code
> will print an error message.
>
> There is another issue I see with the above code:
>
> Consider that the system is on the kexec kernel boot path, and
> kexec_in_progress is set to True. If crash_hotplug_unlock() is called,
> the kernel will not only update the kdump image without acquiring the
> kexec lock, but it will also release the kexec lock in the out label.
> I believe this is incorrect.
>
> Please share your thoughts.

How about this?

diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 63cf89393c6e..8ba7b1da0ded 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -505,7 +505,8 @@ int crash_check_hotplug_support(void)
 	crash_hotplug_lock();
 	/* Obtain lock while reading crash information */
 	if (!kexec_trylock()) {
-		pr_info("kexec_trylock() failed, elfcorehdr may be inaccurate\n");
+		if (!kexec_in_progress)
+			pr_info("kexec_trylock() failed, elfcorehdr may be inaccurate\n");
 		crash_hotplug_unlock();
 		return 0;
 	}
@@ -540,7 +541,8 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu,
 	crash_hotplug_lock();
 	/* Obtain lock while changing crash information */
 	if (!kexec_trylock()) {
-		pr_info("kexec_trylock() failed, elfcorehdr may be inaccurate\n");
+		if (!kexec_in_progress)
+			pr_info("kexec_trylock() failed, elfcorehdr may be inaccurate\n");
 		crash_hotplug_unlock();
 		return;
 	}
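The difference between the two conditions discussed in this thread can be laid out as a small truth table. The sketch below is a hypothetical user-space model, not kernel code: the enum and both function names are made up for illustration, and the two booleans stand for the result of kexec_trylock() and the value of kexec_in_progress.

```c
#include <stdbool.h>

/* Hypothetical labels for what the caller ends up doing. */
enum outcome {
	BAIL_WITH_MSG,		/* give up and print the message */
	BAIL_SILENT,		/* give up without printing */
	PROCEED_LOCKED,		/* continue, kexec lock held */
	PROCEED_UNLOCKED	/* continue without the lock: unsafe */
};

/* Models: if (!kexec_trylock() && !kexec_in_progress) { bail } */
static enum outcome combined_check(bool trylock_ok, bool in_progress)
{
	if (!trylock_ok && !in_progress)
		return BAIL_WITH_MSG;
	return trylock_ok ? PROCEED_LOCKED : PROCEED_UNLOCKED;
}

/* Models: if (!kexec_trylock()) { if (!kexec_in_progress) print; bail } */
static enum outcome nested_check(bool trylock_ok, bool in_progress)
{
	if (!trylock_ok)
		return in_progress ? BAIL_SILENT : BAIL_WITH_MSG;
	return PROCEED_LOCKED;
}
```

The case that separates the two is a failed trylock while a kexec is in progress: the combined condition falls through and continues without the lock, which is the problem Sourabh points out, while the nested form bails out quietly. Both forms still print the message in the short window where the lock is held but the flag is not yet set.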
Re: [PATCH] [RFC] crash: Lock-free crash hotplug support reporting
On 09/07/24 at 10:30am, Sourabh Jain wrote: > Hello Baoquan, > > Do you think this patch would help reduce lock contention when > CPU/Memory resources are removed in bulk from a system? .snip... -- > > include/linux/kexec.h | 11 --- > > kernel/crash_core.c | 27 +-- > > kernel/kexec.c| 5 - > > kernel/kexec_file.c | 7 ++- > > 4 files changed, 23 insertions(+), 27 deletions(-) > > > > diff --git a/include/linux/kexec.h b/include/linux/kexec.h > > index f0e9f8eda7a3..bd755ba6bac4 100644 > > --- a/include/linux/kexec.h > > +++ b/include/linux/kexec.h > > @@ -318,13 +318,6 @@ struct kimage { > > unsigned int preserve_context : 1; > > /* If set, we are using file mode kexec syscall */ > > unsigned int file_mode:1; > > -#ifdef CONFIG_CRASH_HOTPLUG > > - /* If set, it is safe to update kexec segments that are > > -* excluded from SHA calculation. > > -*/ > > - unsigned int hotplug_support:1; > > -#endif > > - > > #ifdef ARCH_HAS_KIMAGE_ARCH > > struct kimage_arch arch; > > #endif > > @@ -370,6 +363,10 @@ struct kimage { > > unsigned long elf_load_addr; > > }; > > +#ifdef CONFIG_CRASH_HOTPLUG > > +extern unsigned int crash_hotplug_support; > > +#endif > > + > > /* kexec interface functions */ > > extern void machine_kexec(struct kimage *image); > > extern int machine_kexec_prepare(struct kimage *image); > > diff --git a/kernel/crash_core.c b/kernel/crash_core.c > > index 63cf89393c6e..3428deba0070 100644 > > --- a/kernel/crash_core.c > > +++ b/kernel/crash_core.c > > @@ -30,6 +30,13 @@ > > #include "kallsyms_internal.h" > > #include "kexec_internal.h" > > +#ifdef CONFIG_CRASH_HOTPLUG > > +/* if set, it is safe to update kexec segments that are > > + * excluded from sha calculation. > > + */ > > +unsigned int crash_hotplug_support; > > +#endif > > + > > /* Per cpu memory for storing cpu states in case of system crash. 
*/
> >   note_buf_t __percpu *crash_notes;
> > @@ -500,23 +507,7 @@ static DEFINE_MUTEX(__crash_hotplug_lock);
> >    */
> >   int crash_check_hotplug_support(void)
> >   {
> > -	int rc = 0;
> > -
> > -	crash_hotplug_lock();
> > -	/* Obtain lock while reading crash information */
> > -	if (!kexec_trylock()) {
> > -		pr_info("kexec_trylock() failed, elfcorehdr may be inaccurate\n");
> > -		crash_hotplug_unlock();
> > -		return 0;
> > -	}
> > -	if (kexec_crash_image) {
> > -		rc = kexec_crash_image->hotplug_support;
> > -	}
> > -	/* Release lock now that update complete */
> > -	kexec_unlock();
> > -	crash_hotplug_unlock();
> > -
> > -	return rc;
> > +	return crash_hotplug_support;

I may not understand this well. Both kexec_load and kexec_file_load set
hotplug_support, while crash_check_hotplug_support() and
crash_handle_hotplug_event() check the flag. How do you guarantee that the
cpu/memory sysfs check won't race with kexec_load and kexec_file_load?

Also, taking crash_hotplug_lock() in crash_check_hotplug_support() looks
unnecessary to me, because it doesn't race with
crash_handle_hotplug_event().

> > } > > /* > > @@ -552,7 +543,7 @@ static void crash_handle_hotplug_event(unsigned int > > hp_action, unsigned int cpu, > > image = kexec_crash_image; > > /* Check that kexec segments update is permitted */ > > - if (!image->hotplug_support) > > + if (!crash_hotplug_support) > > goto out; > > if (hp_action == KEXEC_CRASH_HP_ADD_CPU || > > diff --git a/kernel/kexec.c b/kernel/kexec.c > > index a6b3f96bb50c..d5c6b51eaa8b 100644 > > --- a/kernel/kexec.c > > +++ b/kernel/kexec.c > > @@ -116,6 +116,9 @@ static int do_kexec_load(unsigned long entry, unsigned > > long nr_segments, > > /* Uninstall image */ > > kimage_free(xchg(dest_image, NULL)); > > ret = 0; > > +#ifdef CONFIG_CRASH_HOTPLUG > > + crash_hotplug_support = 0; > > +#endif > > goto out_unlock; > > } > > if (flags & KEXEC_ON_CRASH) { > > @@ -136,7 +139,7 @@ static int do_kexec_load(unsigned long entry, unsigned > > long nr_segments, > > #ifdef CONFIG_CRASH_HOTPLUG > > if ((flags & KEXEC_ON_CRASH) && arch_crash_hotplug_support(image, > > flags)) > > - image->hotplug_support = 1; > > + crash_hotplug_support = 1; > > #endif > > ret = machine_kexec_prepare(image); > > diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c > > index 3d64290d24c9..b326edb90fd7 100644 > > --- a/kernel/kexec_file.c > > +++ b/kernel/kexec_file.c > > @@ -378,7 +378,7 @@ SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, > > initrd_fd, > > #ifdef CONFIG_CRASH_HOTPLUG > > if ((flags & KEXEC_FILE_ON_CRASH) && arch_crash_hotplug_support(image, > > flags)) > > - image->hotplug_support = 1; > > + crash_hotplug_support = 1; > > #endif > > ret = machine_kexec_prepare(image); > > @@ -432,6 +432,11 @@ SYSCALL_DEFINE5(kexec_file_load,
Re: [PATCH] kexec/crash: no crash update when kexec in progress
On 09/04/24 at 02:55pm, Sourabh Jain wrote:
> Hello Baoquan,
>
> On 30/08/24 16:47, Baoquan He wrote:
> > On 08/20/24 at 12:10pm, Sourabh Jain wrote:
> > > Hello Baoquan,
> > >
> > > ..snip...
> > > 2. A patch to return early from the `crash_handle_hotplug_event()` function
> > >    if `kexec_in_progress` is set to True. This is essentially my original patch.
> > There's a race gap between checking kexec_in_progress and setting it
> > to true, which Michael has mentioned.
>
> The window where kernel is holding kexec_lock to do kexec boot
> but kexec_in_progress is yet not set to True.
>
> If kernel needs to handle crash hotplug event, the function
> crash_handle_hotplug_event() will not get the kexec_lock and
> error out by printing error message about not able to update
> kdump image.

But you wanted to avoid erroring out when the kernel is already in
kernel_kexec(). With your patch you would still see at least one of the
noisy messages, wouldn't you?

>
> I think it should be fine. Given that lock is already taken for
> kexec kernel boot.
>
> Am I missing something major?
>
> > That's why I think
> > maybe checking kexec_in_progress after failing to retrieve
> > __kexec_lock is a little better, not very sure.
>
> Try for kexec lock before kexec_in_progress check will not solve
> the original problem this patch trying to solve.
>
> You proposed the below changes earlier:
>
> -	if (!kexec_trylock()) {
> +	if (!kexec_trylock() && kexec_in_progress) {
> 		pr_info("kexec_trylock() failed, elfcorehdr may be inaccurate\n");
> 		crash_hotplug_unlock();

Ah, I meant the change below, but wrote it incorrectly.

diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 63cf89393c6e..e7c7aa761f46 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -504,7 +504,7 @@ int crash_check_hotplug_support(void)
 
 	crash_hotplug_lock();
 	/* Obtain lock while reading crash information */
-	if (!kexec_trylock()) {
+	if (!kexec_trylock() && !kexec_in_progress) {
 		pr_info("kexec_trylock() failed, elfcorehdr may be inaccurate\n");
 		crash_hotplug_unlock();
 		return 0;

>
> Once the kexec_in_progress is set to True there is no way one can get
> kexec_lock. So kexec_trylock() before kexec_in_progress is not helpful
> for the problem I am trying to solve.

With your patch, you could still get the error message if the race gap
exists. With the above change, you won't get it. Please correct me if I am
wrong.
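
For comparison, the conditions discussed in this thread can be modelled in plain userspace C. This is only a sketch under stated assumptions: `struct kexec_state`, `try_lock()` and `enum outcome` are hypothetical stand-ins for `__kexec_lock`, `kexec_trylock()` and `kexec_in_progress`, and none of it is kernel code. `combined_check()` mirrors the one-line condition `!kexec_trylock() && !kexec_in_progress`; `nested_check()` mirrors the alternative of keeping the early return and making only the message conditional on `!kexec_in_progress`.

```c
#include <stdbool.h>

/* Hypothetical model of the state the thread talks about. */
struct kexec_state {
	bool locked;       /* someone else already holds the kexec lock */
	bool in_progress;  /* models kexec_in_progress */
};

enum outcome {
	UPDATE_WITH_LOCK,     /* lock taken, elfcorehdr update proceeds */
	UPDATE_WITHOUT_LOCK,  /* fell through although the lock was not taken */
	BAIL_WITH_MESSAGE,    /* early return, message printed */
	BAIL_SILENTLY         /* early return, no message */
};

/* Models a trylock: succeeds only if nobody holds the lock. */
static bool try_lock(struct kexec_state *s)
{
	if (s->locked)
		return false;
	s->locked = true;
	return true;
}

/* if (!kexec_trylock() && !kexec_in_progress) { pr_info(); return; } */
static enum outcome combined_check(struct kexec_state s)
{
	bool got_lock = try_lock(&s);

	if (!got_lock && !s.in_progress)
		return BAIL_WITH_MESSAGE;
	return got_lock ? UPDATE_WITH_LOCK : UPDATE_WITHOUT_LOCK;
}

/* if (!kexec_trylock()) { if (!kexec_in_progress) pr_info(); return; } */
static enum outcome nested_check(struct kexec_state s)
{
	if (!try_lock(&s))
		return s.in_progress ? BAIL_SILENTLY : BAIL_WITH_MESSAGE;
	return UPDATE_WITH_LOCK;
}
```

In this model the combined condition falls through without holding the lock when the lock is busy and kexec is in progress, while the nested form returns quietly in that case. Both forms still print the message in the window where the lock is held but the flag has not been set yet.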
Re: [PATCH] Add support for soft reserved memory range
On 09/03/24 at 09:47am, Jacek Tomaka wrote: > Hi Boaquan, > > > Wondering what use cases you have encountered and want to use this patch > > to resolve. Could you say more about it? > > Sure, we are using genders to store kernel arguments for each machine > so that they are version controlled. > In order to apply them we boot buildroot kernel, download destination > kernel, read arguments to be passed, > then kexec. > > In general it works fine, but some of our machines are Sapphire Rapids > Max and without kexec understanding soft > reservations we end up with 2GB less total memory, which in general > would not be a big deal but I think it is the > HBM memory that is wasted. Ok, thanks for these details. So you are using kexec_load interface, does kexec_file_load interface work well in this case? > > > On Fri, Aug 30, 2024 at 10:45 PM Baoquan He wrote: > > > > Hi, > > > > On 08/14/24 at 01:33pm, Jacek Tomaka wrote: > > > Essentially catch up with e820 related changes in the kernel. > > > Intel Sapphire Rappids MAX has high bandwidth memory which is > > > precious resource that is better not allocated by the kernel. > > > > Wondering what use cases you have encountered and want to use this patch > > to resolve. Could you say more about it? > > > > > > > > Userspace later can enable soft reserved range using daxctl. 
> > > > > > Signed-off-by: Jacek Tomaka > > > --- > > > include/x86/x86-linux.h | 2 ++ > > > kexec/arch/i386/crashdump-x86.c | 7 +++ > > > kexec/arch/i386/kexec-multiboot-x86.c | 1 + > > > kexec/arch/i386/kexec-x86-common.c| 5 + > > > kexec/arch/i386/x86-linux-setup.c | 3 +++ > > > kexec/firmware_memmap.c | 2 ++ > > > kexec/kexec.h | 1 + > > > 7 files changed, 21 insertions(+) > > > > > > diff --git a/include/x86/x86-linux.h b/include/x86/x86-linux.h > > > index 9646102835..fbde93df94 100644 > > > --- a/include/x86/x86-linux.h > > > +++ b/include/x86/x86-linux.h > > > @@ -23,6 +23,8 @@ struct e820entry { > > > #define E820_NVS 4 > > > #define E820_PMEM 7 > > > #define E820_PRAM 12 > > > +#define E820_SOFT_RESERVED 0xefff > > > + > > > } __attribute__((packed)); > > > #endif > > > > > > diff --git a/kexec/arch/i386/crashdump-x86.c > > > b/kexec/arch/i386/crashdump-x86.c > > > index a01031e570..49108b2032 100644 > > > --- a/kexec/arch/i386/crashdump-x86.c > > > +++ b/kexec/arch/i386/crashdump-x86.c > > > @@ -288,6 +288,10 @@ static int get_crash_memory_ranges(struct > > > memory_range **range, int *ranges, > > > type = RANGE_RESERVED; > > > } else if (memcmp(str, "Reserved\n", 9) == 0) { > > > type = RANGE_RESERVED; > > > + } else if (memcmp(str, "soft reserved\n", 14) == 0 ) { > > > + type = RANGE_SOFT_RESERVED; > > > + } else if (memcmp(str, "Soft Reserved\n", 14) == 0 ) { > > > + type = RANGE_SOFT_RESERVED; > > > } else if (memcmp(str, "GART\n", 5) == 0) { > > > gart_start = start; > > > gart_end = end; > > > @@ -615,6 +619,8 @@ static void cmdline_add_memmap_internal(char > > > *cmdline, unsigned long startk, > > > strcat (str_mmap, "K@"); > > > else if (type == RANGE_RESERVED) > > > strcat (str_mmap, "K$"); > > > + else if (type == RANGE_SOFT_RESERVED) > > > + strcat (str_mmap, "K*"); > > > else if (type == RANGE_ACPI || type == RANGE_ACPI_NVS) > > > strcat (str_mmap, "K#"); > > > else if (type == RANGE_PRAM) > > > @@ -985,6 +991,7 @@ int 
load_crashdump_segments(struct kexec_info *info, > > > char* mod_cmdline, > > > if ( !( mem_range[i].type == RANGE_ACPI > > > || mem_range[i].type == RANGE_ACPI_NVS > > > || mem_range[i].type == RANGE_RESERVED > > > + || mem_range[i].type == RANGE_SOFT_RESERVED > > > || mem_range[i].type == RANGE_PMEM > > > || mem_range[i].type == RANGE_PRAM)) > > > continue; > > > d
Re: [PATCH V2] Documentation: Improve crash_kexec_post_notifiers description
On 08/30/24 at 03:21pm, Guilherme G. Piccoli wrote:
> Be more clear about the downsides, the upsides (yes, there are some!)
> and about code that unconditionally sets that.
>
> Reviewed-by: Stephen Brennan
> Signed-off-by: Guilherme G. Piccoli
>
> ---
>
> V2: Some wording improvements from Stephen, thanks!
> Also added his review tag.
>
> V1 link:
> https://lore.kernel.org/r/20240830140401.458542-1-gpicc...@igalia.com/
>
>
>  Documentation/admin-guide/kernel-parameters.txt | 16 ++--
>  1 file changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index efc52ddc6864..351730108c58 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -913,12 +913,16 @@
>  			the parameter has no effect.
>
>  	crash_kexec_post_notifiers
> -			Run kdump after running panic-notifiers and dumping
> -			kmsg. This only for the users who doubt kdump always
> -			succeeds in any situation.
> -			Note that this also increases risks of kdump failure,
> -			because some panic notifiers can make the crashed
> -			kernel more unstable.
> +			Only jump to kdump kernel after running the panic
> +			notifiers and dumping kmsg. This option increases the
> +			risks of a kdump failure, since some panic notifiers
> +			can make the crashed kernel more unstable. In the
> +			configurations where kdump may not be reliable,
> +			running the panic notifiers can allow collecting more
> +			data on dmesg, like stack traces from other CPUS or
> +			extra data dumped by panic_print. Notice that some
> +			code enables this option unconditionally, like
> +			Hyper-V, PowerPC (fadump) and AMD SEV.
			~

I know Hyper-V enables the panic notifiers by default, but I don't remember
how PowerPC and AMD SEV behave in this respect. While at it, could you add
a few more words about them to the patch log so that people can learn about
it?

Thanks.

>
>  	crashkernel=size[KMG][@offset[KMG]]
>  			[KNL,EARLY] Using kexec, Linux can switch to a 'crash kernel'
> --
> 2.46.0
>
Re: [PATCH] Add support for soft reserved memory range
Hi, On 08/14/24 at 01:33pm, Jacek Tomaka wrote: > Essentially catch up with e820 related changes in the kernel. > Intel Sapphire Rappids MAX has high bandwidth memory which is > precious resource that is better not allocated by the kernel. Wondering what use cases you have encountered and want to use this patch to resolve. Could you say more about it? > > Userspace later can enable soft reserved range using daxctl. > > Signed-off-by: Jacek Tomaka > --- > include/x86/x86-linux.h | 2 ++ > kexec/arch/i386/crashdump-x86.c | 7 +++ > kexec/arch/i386/kexec-multiboot-x86.c | 1 + > kexec/arch/i386/kexec-x86-common.c| 5 + > kexec/arch/i386/x86-linux-setup.c | 3 +++ > kexec/firmware_memmap.c | 2 ++ > kexec/kexec.h | 1 + > 7 files changed, 21 insertions(+) > > diff --git a/include/x86/x86-linux.h b/include/x86/x86-linux.h > index 9646102835..fbde93df94 100644 > --- a/include/x86/x86-linux.h > +++ b/include/x86/x86-linux.h > @@ -23,6 +23,8 @@ struct e820entry { > #define E820_NVS 4 > #define E820_PMEM 7 > #define E820_PRAM 12 > +#define E820_SOFT_RESERVED 0xefff > + > } __attribute__((packed)); > #endif > > diff --git a/kexec/arch/i386/crashdump-x86.c b/kexec/arch/i386/crashdump-x86.c > index a01031e570..49108b2032 100644 > --- a/kexec/arch/i386/crashdump-x86.c > +++ b/kexec/arch/i386/crashdump-x86.c > @@ -288,6 +288,10 @@ static int get_crash_memory_ranges(struct memory_range > **range, int *ranges, > type = RANGE_RESERVED; > } else if (memcmp(str, "Reserved\n", 9) == 0) { > type = RANGE_RESERVED; > + } else if (memcmp(str, "soft reserved\n", 14) == 0 ) { > + type = RANGE_SOFT_RESERVED; > + } else if (memcmp(str, "Soft Reserved\n", 14) == 0 ) { > + type = RANGE_SOFT_RESERVED; > } else if (memcmp(str, "GART\n", 5) == 0) { > gart_start = start; > gart_end = end; > @@ -615,6 +619,8 @@ static void cmdline_add_memmap_internal(char *cmdline, > unsigned long startk, > strcat (str_mmap, "K@"); > else if (type == RANGE_RESERVED) > strcat (str_mmap, "K$"); > + else if (type == 
RANGE_SOFT_RESERVED) > + strcat (str_mmap, "K*"); > else if (type == RANGE_ACPI || type == RANGE_ACPI_NVS) > strcat (str_mmap, "K#"); > else if (type == RANGE_PRAM) > @@ -985,6 +991,7 @@ int load_crashdump_segments(struct kexec_info *info, > char* mod_cmdline, > if ( !( mem_range[i].type == RANGE_ACPI > || mem_range[i].type == RANGE_ACPI_NVS > || mem_range[i].type == RANGE_RESERVED > + || mem_range[i].type == RANGE_SOFT_RESERVED > || mem_range[i].type == RANGE_PMEM > || mem_range[i].type == RANGE_PRAM)) > continue; > diff --git a/kexec/arch/i386/kexec-multiboot-x86.c > b/kexec/arch/i386/kexec-multiboot-x86.c > index 33c885a2fa..49d57cb5ae 100644 > --- a/kexec/arch/i386/kexec-multiboot-x86.c > +++ b/kexec/arch/i386/kexec-multiboot-x86.c > @@ -379,6 +379,7 @@ int multiboot_x86_load(int argc, char **argv, const char > *buf, off_t len, > mmap[i].Type = 4; > break; > case RANGE_RESERVED: > + case RANGE_SOFT_RESERVED: > default: > mmap[i].Type = 2; /* Not RAM (reserved) */ > } > diff --git a/kexec/arch/i386/kexec-x86-common.c > b/kexec/arch/i386/kexec-x86-common.c > index ffc95a9e43..116c4f4fd3 100644 > --- a/kexec/arch/i386/kexec-x86-common.c > +++ b/kexec/arch/i386/kexec-x86-common.c > @@ -99,6 +99,9 @@ static int get_memory_ranges_proc_iomem(struct memory_range > **range, int *ranges > else if (strncasecmp(str, "reserved\n", 9) == 0) { > type = RANGE_RESERVED; > } > + else if (strncasecmp(str, "soft reserved\n", 9) == 0) { > + type = RANGE_SOFT_RESERVED; > + } > else if (memcmp(str, "ACPI Tables\n", 12) == 0) { > type = RANGE_ACPI; > } > @@ -170,6 +173,8 @@ unsigned xen_e820_to_kexec_type(uint32_t type) > return RANGE_PMEM; > case E820_PRAM: > return RANGE_PRAM; > + case E820_SOFT_RESERVED; > + return RANGE_SOFT_RESERVED; > case E820_RESERVED: > default: > return RANGE_RESERVED; > diff --git a/kexec/arch/i386/x86-linux-setup.c > b/kexec/arch/i386/x86-linux-setup.c > index 73251b9339..afc83fe729 100644 > --- a/kexec/arch/i386/x86-linux-setup.c > +++ 
b/kexec/arch/i386/x86-linux-setup.c > @@ -755,6 +755,9 @@ static void add_e820_map_from_mr(struct >
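
One detail in the quoted kexec-x86-common.c hunk may deserve a second look: `strncasecmp(str, "soft reserved\n", 9)` compares only the first 9 characters ("soft rese"), which looks like the length carried over from the "reserved\n" branch. The sketch below shows one way to derive the length from the literal instead. It is an illustration only: the enum and both helper functions are hypothetical stand-ins defined here, not the kexec-tools definitions.

```c
#include <strings.h>  /* strncasecmp() (POSIX) */

/* Hypothetical stand-ins for the kexec-tools range types. */
enum range_type { RANGE_OTHER, RANGE_RESERVED, RANGE_SOFT_RESERVED };

/* sizeof() on a string literal counts the trailing NUL, hence the -1. */
#define LABEL_MATCHES(str, lit) \
	(strncasecmp((str), (lit), sizeof(lit) - 1) == 0)

/* Classify a /proc/iomem style label using the full label length. */
static enum range_type classify_label(const char *str)
{
	if (LABEL_MATCHES(str, "reserved\n"))
		return RANGE_RESERVED;
	if (LABEL_MATCHES(str, "soft reserved\n"))
		return RANGE_SOFT_RESERVED;
	return RANGE_OTHER;
}

/* The same comparison with the hard-coded length 9 from the quoted hunk. */
static int loose_soft_reserved(const char *str)
{
	return strncasecmp(str, "soft reserved\n", 9) == 0;
}
```

With the hard-coded 9, any label that starts with "soft rese" is accepted, while the length derived from the literal requires the whole label including the newline. Separately, the `case E820_SOFT_RESERVED;` line in the quoted xen_e820_to_kexec_type() hunk ends in a semicolon where a case label needs a colon, so as quoted it would not compile.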
Re: [PATCH] kexec/crash: no crash update when kexec in progress
On 08/20/24 at 12:10pm, Sourabh Jain wrote:
> Hello Baoquan,
>
> On 19/08/24 11:45, Baoquan He wrote:
> > On 08/19/24 at 09:45am, Sourabh Jain wrote:
> > > Hello Michael and Baoquan
> > >
> > > On 01/08/24 12:21, Sourabh Jain wrote:
> > > > Hello Michael,
> > > >
> > > > On 01/08/24 08:04, Michael Ellerman wrote:
> > > > > Sourabh Jain writes:
> > > > > > The following errors are observed when kexec is done with SMT=off on
> > > > > > powerpc.
> > > > > >
> > > > > > [ 358.458385] Removing IBM Power 842 compression device
> > > > > > [ 374.795734] kexec_core: Starting new kernel
> > > > > > [ 374.795748] kexec: Waking offline cpu 1.
> > > > > > [ 374.875695] crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate
> > > > > > [ 374.935833] kexec: Waking offline cpu 2.
> > > > > > [ 375.015664] crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate
> > > > > > snip..
> > > > > > [ 375.515823] kexec: Waking offline cpu 6.
> > > > > > [ 375.635667] crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate
> > > > > > [ 375.695836] kexec: Waking offline cpu 7.
> > > > > Are they actually errors though? Do they block the actual kexec from
> > > > > happening? Or are they just warnings in dmesg?
> > > > The kexec kernel boots fine.
> > > >
> > > > This warning appears regardless of whether the kdump kernel is loaded.
> > > >
> > > > However, when the kdump kernel is loaded, we will not be able to update
> > > > the kdump image (FDT).
> > > > I think this should be fine given that kexec is in progress.
> > > >
> > > > Please let me know your opinion.
> > > >
> > > > > Because the fix looks like it could be racy.
> > > > It seems like it is racy, but given that kexec takes the lock first and then
> > > > brings the CPU up, which triggers the kdump image, which always fails to
> > > > update the kdump image because it could not take the same lock.
> > > >
> > > > Note: the kexec lock is not released unless kexec boot fails.
> > > Any comments or suggestions on this fix?
> > Is this a little better?
> >
> > diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> > index 63cf89393c6e..0355ffb712f4 100644
> > --- a/kernel/crash_core.c
> > +++ b/kernel/crash_core.c
> > @@ -504,7 +504,7 @@ int crash_check_hotplug_support(void)
> >  	crash_hotplug_lock();
> >  	/* Obtain lock while reading crash information */
> > -	if (!kexec_trylock()) {
> > +	if (!kexec_trylock() && kexec_in_progress) {
> >  		pr_info("kexec_trylock() failed, elfcorehdr may be inaccurate\n");
> >  		crash_hotplug_unlock();
> >  		return 0;
> > @@ -539,7 +539,7 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu,
> >  	crash_hotplug_lock();
> >  	/* Obtain lock while changing crash information */
> > -	if (!kexec_trylock()) {
> > +	if (!kexec_trylock() && kexec_in_progress) {
> >  		pr_info("kexec_trylock() failed, elfcorehdr may be inaccurate\n");
> >  		crash_hotplug_unlock();
> >  		return;
>
> Ideally, when `kexec_in_progress` is True, there should be no way to acquire
> the kexec lock.
> Therefore, calling `kexec_trylock()` before checking `kexec_in_progress` is
> not helpful.
> The kernel will print the error message "kexec_trylock() failed, elfcorehdr
> may be inaccurate."
> So, with the above changes, the original problem remains unsolved.
>
> However, after closely inspecting the `kernel/kexec_core.c:kernel_kexec()`
> function, I discovered
> an exceptional case where my patch needs an update. The issue arises when
> the system returns
> from the `machine_kexec()` function, which indicates that kexec has failed.
>
> In this scenario, the kexec lock is released, but `kexec_in_progress`
> remains True.
>
> I am unsure why `kexec_in_progress` is NOT set to False when kexec fails.
> Was this by design,
> or was it an oversight because returning from the `machine_kexec()` function
> is highly unlikely?
> > Here is my proposal to address the original problem alon
Re: [PATCH] crash: Default to CRASH_DUMP=n when support for it is unlikely
On 08/23/24 at 08:51am, Dave Vasilevsky wrote:
> Fixes boot failures on 6.9 on PPC_BOOK3S_32 machines using
> Open Firmware. On these machines, the kernel refuses to boot
> from non-zero PHYSICAL_START, which occurs when CRASH_DUMP is on.
>
> Since most PPC_BOOK3S_32 machines boot via Open Firmware, it should
> default to off for them. Users booting via some other mechanism
> can still turn it on explicitly.
>
> Also defaults to CRASH_DUMP=n on sh.

Overall this looks good to me, except for the CRASH_DUMP=n on sh. Could you
add a comment on the reasoning, since you have discussed it with John? Is it
because of the config items below?

arch/sh/Kconfig:
config ARCH_SUPPORTS_CRASH_DUMP
	def_bool BROKEN_ON_SMP
...
config PHYSICAL_START
	hex "Physical address where the kernel is loaded" if (EXPERT || CRASH_DUMP)
	default MEMORY_START
...

>
> Signed-off-by: Dave Vasilevsky
> Reported-by: Reimar Döffinger
> Closes: https://lists.debian.org/debian-powerpc/2024/07/msg1.html
> Fixes: 75bc255a7444 ("crash: clean up kdump related config items")
> ---
>  arch/arm/Kconfig       | 3 +++
>  arch/arm64/Kconfig     | 3 +++
>  arch/loongarch/Kconfig | 3 +++
>  arch/mips/Kconfig      | 3 +++
>  arch/powerpc/Kconfig   | 4
>  arch/riscv/Kconfig     | 3 +++
>  arch/s390/Kconfig      | 3 +++
>  arch/sh/Kconfig        | 3 +++
>  arch/x86/Kconfig       | 3 +++
>  kernel/Kconfig.kexec   | 2 +-
>  10 files changed, 29 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index 54b2bb817a7f..200995052690 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -1597,6 +1597,9 @@ config ATAGS_PROC
>  config ARCH_SUPPORTS_CRASH_DUMP
>  	def_bool y
>
> +config ARCH_DEFAULT_CRASH_DUMP
> +	def_bool y
> +
>  config AUTO_ZRELADDR
>  	bool "Auto calculation of the decompressed kernel image address" if !ARCH_MULTIPLATFORM
>  	default !(ARCH_FOOTBRIDGE || ARCH_RPC || ARCH_SA1100)
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index a2f8ff354ca6..43e08cc8204f 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -1558,6
+1558,9 @@ config ARCH_DEFAULT_KEXEC_IMAGE_VERIFY_SIG > config ARCH_SUPPORTS_CRASH_DUMP > def_bool y > > +config ARCH_DEFAULT_CRASH_DUMP > + def_bool y > + > config ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION > def_bool CRASH_RESERVE > > diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig > index 70f169210b52..ce232ddcd27d 100644 > --- a/arch/loongarch/Kconfig > +++ b/arch/loongarch/Kconfig > @@ -599,6 +599,9 @@ config ARCH_SUPPORTS_KEXEC > config ARCH_SUPPORTS_CRASH_DUMP > def_bool y > > +config ARCH_DEFAULT_CRASH_DUMP > + def_bool y > + > config ARCH_SELECTS_CRASH_DUMP > def_bool y > depends on CRASH_DUMP > diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig > index 60077e576935..b547f4304d0c 100644 > --- a/arch/mips/Kconfig > +++ b/arch/mips/Kconfig > @@ -2881,6 +2881,9 @@ config ARCH_SUPPORTS_KEXEC > config ARCH_SUPPORTS_CRASH_DUMP > def_bool y > > +config ARCH_DEFAULT_CRASH_DUMP > + def_bool y > + > config PHYSICAL_START > hex "Physical address where the kernel is loaded" > default "0x8400" > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig > index d7b09b064a8a..0f3c1f958eac 100644 > --- a/arch/powerpc/Kconfig > +++ b/arch/powerpc/Kconfig > @@ -682,6 +682,10 @@ config RELOCATABLE_TEST > config ARCH_SUPPORTS_CRASH_DUMP > def_bool PPC64 || PPC_BOOK3S_32 || PPC_85xx || (44x && !SMP) > > +config ARCH_DEFAULT_CRASH_DUMP > + bool > + default y if !PPC_BOOK3S_32 > + > config ARCH_SELECTS_CRASH_DUMP > def_bool y > depends on CRASH_DUMP > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig > index 0f3cd7c3a436..eb247b5ee569 100644 > --- a/arch/riscv/Kconfig > +++ b/arch/riscv/Kconfig > @@ -880,6 +880,9 @@ config ARCH_SUPPORTS_KEXEC_PURGATORY > config ARCH_SUPPORTS_CRASH_DUMP > def_bool y > > +config ARCH_DEFAULT_CRASH_DUMP > + def_bool y > + > config ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION > def_bool CRASH_RESERVE > > diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig > index a822f952f64a..05a1fb408471 100644 > --- a/arch/s390/Kconfig > +++ 
b/arch/s390/Kconfig > @@ -275,6 +275,9 @@ config ARCH_SUPPORTS_CRASH_DUMP > This option also enables s390 zfcpdump. > See also > > +config ARCH_DEFAULT_CRASH_DUMP > + def_bool y > + > menu "Processor type and features" > > config HAVE_MARCH_Z10_FEATURES > diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig > index 1aa3c4a0c5b2..b04cfa23378c 100644 > --- a/arch/sh/Kconfig > +++ b/arch/sh/Kconfig > @@ -549,6 +549,9 @@ config ARCH_SUPPORTS_KEXEC > config ARCH_SUPPORTS_CRASH_DUMP > def_bool BROKEN_ON_SMP > > +config ARCH_DEFAULT_CRASH_DUMP > + def_bool n > + > config ARCH_SUPPORTS_KEXEC_JUMP > def_bool y > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > index 007bab9f2a0e..aa4666bb9e9c 1006
Re: [PATCH] crash: Default to CRASH_DUMP=n when support for it is unlikely
On 08/29/24 at 11:37pm, Dave Vasilevsky wrote:
> On 2024-08-29 23:15, Baoquan He wrote:
> >> +config ARCH_DEFAULT_CRASH_DUMP
> >> +	def_bool n
> >
> > If we don't add ARCH_DEFAULT_CRASH_DUMP at all in sh arch, the
> > CRASH_DUMP will be off by default according to the below new definition
> > of CRASH_DUMP?
>
> Yes, that's true. But if we don't add it at all in sh arch, it looks confusing
> in the search feature of menuconfig:
>
> > Symbol: ARCH_DEFAULT_CRASH_DUMP [=ARCH_DEFAULT_CRASH_DUMP]
> > Type  : unknown
>
> So I thought it was better to explicitly set it to 'n'. What do you think?

If so, it is better to add it. Thanks.
Re: [PATCH] crash: Default to CRASH_DUMP=n when support for it is unlikely
Hi Dave,

On 08/23/24 at 08:51am, Dave Vasilevsky wrote:
..snip..
> diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
> index 1aa3c4a0c5b2..b04cfa23378c 100644
> --- a/arch/sh/Kconfig
> +++ b/arch/sh/Kconfig
> @@ -549,6 +549,9 @@ config ARCH_SUPPORTS_KEXEC
>  config ARCH_SUPPORTS_CRASH_DUMP
>  	def_bool BROKEN_ON_SMP
>
> +config ARCH_DEFAULT_CRASH_DUMP
> +	def_bool n

If we don't add ARCH_DEFAULT_CRASH_DUMP at all for the sh arch, will
CRASH_DUMP be off by default according to the new definition of CRASH_DUMP
below?

Thanks
Baoquan

> +
>  config ARCH_SUPPORTS_KEXEC_JUMP
>  	def_bool y
>
..
> diff --git a/kernel/Kconfig.kexec b/kernel/Kconfig.kexec
> index 6c34e63c88ff..4d111f871951 100644
> --- a/kernel/Kconfig.kexec
> +++ b/kernel/Kconfig.kexec
> @@ -97,7 +97,7 @@ config KEXEC_JUMP
>
>  config CRASH_DUMP
>  	bool "kernel crash dumps"
> -	default y
> +	default ARCH_DEFAULT_CRASH_DUMP
>  	depends on ARCH_SUPPORTS_CRASH_DUMP
>  	depends on KEXEC_CORE
>  	select VMCORE_INFO
> --
> 2.34.1
>
[PATCH v2 2/2] x86/mm/sme: fix the kdump kernel breakage on SME system when CONFIG_IMA_KEXEC=y
Recently, it's reported that kdump kernel is broken during bootup on SME system when CONFIG_IMA_KEXEC=y. When debugging, I noticed this can be traced back to commit ("b69a2afd5afc x86/kexec: Carry forward IMA measurement log on kexec"). Just nobody ever tested it on SME system when enabling CONFIG_IMA_KEXEC. -- ima: No TPM chip found, activating TPM-bypass! Loading compiled-in module X.509 certificates Loaded X.509 cert 'Build time autogenerated kernel key: 18ae0bc7e79b64700122bb1d6a904b070fef2656' ima: Allocated hash algorithm: sha256 Oops: general protection fault, probably for non-canonical address 0xcfacfdfe6660003e: [#1] PREEMPT SMP NOPTI CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.11.0-rc2+ #14 Hardware name: Dell Inc. PowerEdge R7425/02MJ3T, BIOS 1.20.0 05/03/2023 RIP: 0010:ima_restore_measurement_list+0xdc/0x420 Code: ff 48 c7 85 10 ff ff ff 00 00 00 00 48 c7 85 18 ff ff ff 00 00 00 00 48 85 f6 0f 84 09 03 00 00 48 83 fa 17 0f 86 ff 02 00 00 <66> 83 3e 01 49 89 f4 0f 85 90 94 7d 00 48 83 7e 10 ff 0f 84 74 94 RSP: 0018:c9053c80 EFLAGS: 00010286 RAX: RBX: c9053d03 RCX: RDX: e48066052d5df359 RSI: cfacfdfe6660003e RDI: cfacfdfe66600056 RBP: c9053d80 R08: R09: 82de1a88 R10: c9053da0 R11: 0003 R12: 01a4 R13: c9053df0 R14: R15: FS: () GS:88804020() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 7f2c744050e8 CR3: 80004110e000 CR4: 003506b0 Call Trace: ? show_trace_log_lvl+0x1b0/0x2f0 ? show_trace_log_lvl+0x1b0/0x2f0 ? ima_load_kexec_buffer+0x6e/0xf0 ? __die_body.cold+0x8/0x12 ? die_addr+0x3c/0x60 ? exc_general_protection+0x178/0x410 ? asm_exc_general_protection+0x26/0x30 ? ima_restore_measurement_list+0xdc/0x420 ? vprintk_emit+0x1f0/0x270 ? ima_load_kexec_buffer+0x6e/0xf0 ima_load_kexec_buffer+0x6e/0xf0 ima_init+0x52/0xb0 ? __pfx_init_ima+0x10/0x10 init_ima+0x26/0xc0 ? __pfx_init_ima+0x10/0x10 do_one_initcall+0x5b/0x300 do_initcalls+0xdf/0x100 ? __pfx_kernel_init+0x10/0x10 kernel_init_freeable+0x147/0x1a0 kernel_init+0x1a/0x140 ret_from_fork+0x34/0x50 ? 
__pfx_kernel_init+0x10/0x10 ret_from_fork_asm+0x1a/0x30 Modules linked in: ---[ end trace ]--- RIP: 0010:ima_restore_measurement_list+0xdc/0x420 Code: ff 48 c7 85 10 ff ff ff 00 00 00 00 48 c7 85 18 ff ff ff 00 00 00 00 48 85 f6 0f 84 09 03 00 00 48 83 fa 17 0f 86 ff 02 00 00 <66> 83 3e 01 49 89 f4 0f 85 90 94 7d 00 48 83 7e 10 ff 0f 84 74 94 RSP: 0018:c9053c80 EFLAGS: 00010286 RAX: RBX: c9053d03 RCX: RDX: e48066052d5df359 RSI: cfacfdfe6660003e RDI: cfacfdfe66600056 RBP: c9053d80 R08: R09: 82de1a88 R10: c9053da0 R11: 0003 R12: 01a4 R13: c9053df0 R14: R15: FS: () GS:88804020() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 7f2c744050e8 CR3: 80004110e000 CR4: 003506b0 Kernel panic - not syncing: Fatal exception Kernel Offset: disabled Rebooting in 10 seconds.. >From debugging printing, the stored addr and size of ima_kexec buffer are not decrypted correctly like: -- ima: ima_load_kexec_buffer, buffer:0xcfacfdfe6660003e, size:0xe48066052d5df359 -- There are three pieces of setup_data info passed to kexec/kdump kernel: SETUP_EFI, SETUP_IMA and SETUP_RNG_SEED. However, among them, only ima_kexec buffer suffered from the incorrect decryption. After debugging, it's because of the code bug in early_memremap_is_setup_data() where checking the embedded content inside setup_data takes wrong range calculation. The length of efi data, rng_seed and ima_kexec are 0x70, 0x20, 0x10, and the length of setup_data is 0x10. When checking if data is inside the embedded conent of setup_data, the starting address of efi data and rng_seed happened to land in the wrong calculated range. While the ima_kexec's starting address unluckily doesn't pass the checking, then error occurred. Here fix the code bug to make kexec/kdump kernel boot up successfully. And also fix the similar buggy code in memremap_is_setup_data() which are found out during code reviewing. 
Fixes: b3c72fc9a78e ("x86/boot: Introduce setup_indirect") Signed-off-by: Baoquan He --- arch/x86/mm/ioremap.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c index f1ee8822ddf1..4cadc7ef1cb4 100644 --- a/arch/x86/mm/ioremap.c +++ b/arch/x86/mm/ioremap.c @@ -657,7 +657,7 @@ static bool memremap_is_setup_data(resource_size_t phys_addr,
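
The arithmetic in the commit message can be replayed with a small userspace sketch. The diff is cut off in this archive, so the corrected bound below is an assumption based on the description (the header size has to be counted in the upper bound), not a copy of the actual patch. `SD_HDR_SIZE`, `in_range_old()`, `in_range_new()` and `payload_start()` are hypothetical names; the "old" check is the `(phys_addr > paddr) && (phys_addr < (paddr + len))` test that appears as context in patch 1/2.

```c
#include <stdbool.h>
#include <stdint.h>

/* Size of the setup_data header, 0x10 according to the commit message. */
#define SD_HDR_SIZE 0x10u

/* Upper bound without the header: payload bytes at the tail are missed. */
static bool in_range_old(uint64_t phys_addr, uint64_t paddr, uint32_t len)
{
	return phys_addr > paddr && phys_addr < paddr + len;
}

/* Assumed correction: the embedded payload ends at header size + len. */
static bool in_range_new(uint64_t phys_addr, uint64_t paddr, uint32_t len)
{
	return phys_addr > paddr && phys_addr < paddr + SD_HDR_SIZE + len;
}

/* The embedded payload starts right after the header. */
static uint64_t payload_start(uint64_t paddr)
{
	return paddr + SD_HDR_SIZE;
}
```

With the lengths from the commit message, the payload start of the EFI data (len 0x70) and of the RNG seed (len 0x20) still satisfies the old check because 0x10 is below 0x70 and 0x20. For the IMA buffer (len 0x10) the payload start equals `paddr + len`, so the old check rejects it, which matches the observation that only the ima_kexec buffer was decrypted incorrectly.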
[PATCH v2 1/2] x86/mm: rename the confusing local variable in early_memremap_is_setup_data()
In early_memremap_is_setup_data(), the parameter 'size' has the same name
as a local variable declared inside the while loop. That confuses people,
who sometimes mix them up when reading the code.

Rename the local variable 'size' inside the while loop to 'sd_size'. Also
add a local variable 'sd_size' to memremap_is_setup_data() to simplify the
code; a later patch can use it as well.

Signed-off-by: Baoquan He
---
 arch/x86/mm/ioremap.c | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index aa7d279321ea..f1ee8822ddf1 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -640,7 +640,7 @@ static bool memremap_is_setup_data(resource_size_t phys_addr,
 	paddr = boot_params.hdr.setup_data;
 	while (paddr) {
-		unsigned int len;
+		unsigned int len, sd_size;
 		if (phys_addr == paddr)
 			return true;
@@ -652,6 +652,8 @@ static bool memremap_is_setup_data(resource_size_t phys_addr,
 			return false;
 		}
+		sd_size = sizeof(*data);
+
 		paddr_next = data->next;
 		len = data->len;
@@ -662,7 +664,9 @@ static bool memremap_is_setup_data(resource_size_t phys_addr,
 		if (data->type == SETUP_INDIRECT) {
 			memunmap(data);
-			data = memremap(paddr, sizeof(*data) + len,
+
+			sd_size += len;
+			data = memremap(paddr, sd_size,
 					MEMREMAP_WB | MEMREMAP_DEC);
 			if (!data) {
 				pr_warn("failed to memremap indirect setup_data\n");
@@ -701,7 +705,7 @@ static bool __init early_memremap_is_setup_data(resource_size_t phys_addr,
 	paddr = boot_params.hdr.setup_data;
 	while (paddr) {
-		unsigned int len, size;
+		unsigned int len, sd_size;
 		if (phys_addr == paddr)
 			return true;
@@ -712,7 +716,7 @@ static bool __init early_memremap_is_setup_data(resource_size_t phys_addr,
 			return false;
 		}
-		size = sizeof(*data);
+		sd_size = sizeof(*data);
 		paddr_next = data->next;
 		len = data->len;
@@ -723,9 +727,9 @@ static bool __init early_memremap_is_setup_data(resource_size_t phys_addr,
 		}
 		if (data->type == SETUP_INDIRECT) {
-			size += len;
+			sd_size += len;
 			early_memunmap(data, sizeof(*data));
-			data = early_memremap_decrypted(paddr, size);
+			data = early_memremap_decrypted(paddr, sd_size);
 			if (!data) {
 				pr_warn("failed to early memremap indirect setup_data\n");
 				return false;
@@ -739,7 +743,7 @@ static bool __init early_memremap_is_setup_data(resource_size_t phys_addr,
 			}
 		}
-		early_memunmap(data, size);
+		early_memunmap(data, sd_size);
 		if ((phys_addr > paddr) && (phys_addr < (paddr + len)))
 			return true;
--
2.41.0
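The naming problem this patch addresses can be reproduced in a few lines of
userspace C. This is only a sketch under stated assumptions: struct node_hdr
and both function names are hypothetical stand-ins, not kernel code. It shows
that a loop-local 'size' hides the parameter of the same name, while a
distinct name such as 'sd_size' keeps both values usable.

```c
#include <stddef.h>

/* Hypothetical stand-in for the fixed header of a setup_data node. */
struct node_hdr {
	unsigned long long next;
	unsigned int type;
	unsigned int len;
};

/*
 * Before the rename: the loop-local 'size' shadows the parameter, so
 * every use inside the loop body refers to the header size.
 */
static size_t inner_size_shadowed(size_t size)
{
	size_t seen = 0;
	int once = 1;

	while (once--) {
		size_t size = sizeof(struct node_hdr);	/* shadows the parameter */

		seen = size;
	}
	return seen;
}

/* After the rename: both values stay visible under distinct names. */
static size_t sum_renamed(size_t size)
{
	size_t total = 0;
	int once = 1;

	while (once--) {
		size_t sd_size = sizeof(struct node_hdr);

		total = size + sd_size;
	}
	return total;
}
```

Building such code with the compiler's -Wshadow warning enabled flags the
first function and stays silent on the second.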
[PATCH v2 0/2] x86/mm/sme: fix the kdump kernel breakage on SME system when CONFIG_IMA_KEXEC=y
In this v2 posting:
- patch 2 fixes the kdump kernel breakage;
- patch 1 is added to clean up the confusing local variable naming, because
  people may mix up the local variable 'size' with the passed-in parameter
  in early_memremap_is_setup_data(). This cleanup was suggested by Dave and
  Tom during the v1 review.

The v1 post can be found here:
===
[PATCH] x86/mm/sme: fix the kdump kernel breakage on SME system when CONFIG_IMA_KEXEC=y
https://lore.kernel.org/all/20240826024457.22423-1-...@redhat.com/T/#u

Baoquan He (2):
  x86/mm: rename the confusing local variable in early_memremap_is_setup_data()
  x86/mm/sme: fix the kdump kernel breakage on SME system when CONFIG_IMA_KEXEC=y

 arch/x86/mm/ioremap.c | 22 +-
 1 file changed, 13 insertions(+), 9 deletions(-)

--
2.41.0
Re: [PATCH] arm64/vmcore: Add pgtable_l5_enabled information in vmcoreinfo
On 08/27/24 at 01:24pm, Will Deacon wrote:
> On Mon, Aug 26, 2024 at 02:52:02PM +0800, Kuan-Ying Lee wrote:
> > Since arm64 supports 5-level page tables, we need to add this
> > information to vmcoreinfo to make debug tools know if 5-level
> > page table is enabled or not.
> >
> > Missing this information will break the debug tool like crash [1].
> >
> > [1] https://github.com/crash-utility/crash
> >
> > Signed-off-by: Kuan-Ying Lee
> > ---
> >  Documentation/admin-guide/kdump/vmcoreinfo.rst | 6 ++
> >  arch/arm64/kernel/vmcore_info.c                | 3 +++
> >  2 files changed, 9 insertions(+)
>
> In which case, wouldn't you also want to know about pgtable_l4_enabled()?

That is a good question. I guess it can be deduced in code; the information
is mostly needed for handling a different PAGE_OFFSET, for translating
virtual addresses to physical addresses, etc. Adding the Crash utility
experts here.
Re: [PATCH] x86/mm/sme: fix the kdump kernel breakage on SME system when CONFIG_IMA_KEXEC=y
On 08/27/24 at 09:00am, Tom Lendacky wrote:
> On 8/27/24 08:52, Tom Lendacky wrote:
> > On 8/26/24 22:19, Baoquan He wrote:
> >> On 08/26/24 at 09:24am, Tom Lendacky wrote:
> >>> On 8/25/24 21:44, Baoquan He wrote:
> >>>> Recently, it's reported that kdump kernel is broken during bootup on
> >>>> SME system when CONFIG_IMA_KEXEC=y. When debugging, I noticed this
> >>>> can be traced back to commit ("b69a2afd5afc x86/kexec: Carry forward
> >>>> IMA measurement log on kexec"). Just nobody ever tested it on SME
> >>>> system when enabling CONFIG_IMA_KEXEC.
> >>>>
> >>>>
> >>>> Here fix the code bug to make kexec/kdump kernel boot up successfully.
> >>>>
> >>>> Fixes: 8f716c9b5feb ("x86/mm: Add support to access boot related data in
> >>>> the clear")
> >>>
> >>> The check that was modified was added by:
> >>> b3c72fc9a78e ("x86/boot: Introduce setup_indirect")
> >>>
> >>> The SETUP_INDIRECT patches seem to be the issue here.
> >>
> >> Hmm, I didn't check it carefully, thanks for adding this info. While
> >> after checking commit b3c72fc9a78e, I feel the adding code was trying to
> >> fix your original early_memremap_is_setup_data(). Even though
> >> SETUP_INDIRECT type of setup_data has been added, the original
> >> early_memremap_is_setup_data() only check the starting address and
> >> the content of struct setup_data, that's obviously wrong.
> >
> > IIRC, when this function was created, the value of "len" in setup_data
> > included the length of "data", so the calculation was correct. Everything
> > was contiguous in a setup_data element.
> >
> >>
> >> arch/x86/include/uapi/asm/setup_data.h:
> >> /* extensible setup data list node */
> >> struct setup_data {
> >> 	__u64 next;
> >> 	__u32 type;
> >> 	__u32 len;
> >> 	__u8 data[];
> >> };
> >>
> >> As you can see, the zero-length array will embed the carried data which
> >> is actually expected and adjacent to its carrier, the struct setup_data.
> >
> > Right, and "len" is the length of that data. So paddr + len goes to the
> > end of the overall setup_data.
>
> Ah, I see what you're saying. "len" doesn't include the size of the
> setup_data structure, only the data. If so, then, yes, adding a sizeof()
> to the calculation in the if statement is correct.

Exactly. That can confuse people sometimes.
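The layout point settled in this exchange, that "len" covers only the
payload and not the header, can be checked with a userspace mirror of the
quoted struct. This is a rough sketch: the struct uses fixed-width types
from <stdint.h> in place of the kernel's __u64/__u32/__u8, and the helpers
payload_start() and node_total_size() are hypothetical names introduced
only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the struct quoted above, with fixed-width types. */
struct setup_data {
	uint64_t next;
	uint32_t type;
	uint32_t len;	/* length of data[] only, header excluded */
	uint8_t data[];
};

/* Physical address of the first payload byte of the node at paddr. */
static uint64_t payload_start(uint64_t paddr)
{
	return paddr + offsetof(struct setup_data, data);
}

/* Bytes occupied by a whole node: fixed header plus 'len' payload bytes. */
static uint64_t node_total_size(uint32_t len)
{
	return sizeof(struct setup_data) + len;
}
```

With this layout the header is 16 bytes on common ABIs, so a node at paddr
with len 0x10 spans paddr to paddr + 0x20, and its payload begins at
paddr + 0x10.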
Re: [PATCH] x86/mm/sme: fix the kdump kernel breakage on SME system when CONFIG_IMA_KEXEC=y
On 08/27/24 at 01:41pm, Dave Young wrote:
> On Tue, 27 Aug 2024 at 13:28, Baoquan He wrote:
> >
> > On 08/26/24 at 09:24am, Tom Lendacky wrote:
> > > On 8/25/24 21:44, Baoquan He wrote:
..
> > > > ---
> > > >  arch/x86/mm/ioremap.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
> > > > index aa7d279321ea..7953c4a1d28d 100644
> > > > --- a/arch/x86/mm/ioremap.c
> > > > +++ b/arch/x86/mm/ioremap.c
> > > > @@ -717,7 +717,7 @@ static bool __init early_memremap_is_setup_data(resource_size_t phys_addr,
> > > >  		paddr_next = data->next;
> > > >  		len = data->len;
> > > >
> > > > -		if ((phys_addr > paddr) && (phys_addr < (paddr + len))) {
> > > > +		if ((phys_addr > paddr) && (phys_addr < (paddr + size + len))) {
> > >
> > > I don't think this is correct. You are adding the requested size to the
> > > length of the setup data element. The length is the true length of the
> > > setup data and should not be increased.
> >
> > I talked to Dave, he reminded me that people could mix the passed in
> > parameter 'size' and the local variable 'size' defined inside the while
> > loop, not sure which 'size' you are referring to.
> >
> Baoquan, you are right, but I think I mistakenly read the code in
> memremap_is_setup_data instead of early_memremap_is_setup_data. You
> can check the memremap_is_setup_data, no "size = sizeof (*data)", so
> these two functions could both need fixes.

Agreed, memremap_is_setup_data() has the same drawback in its code.

>
> Otherwise it would be better to change the function internal variable
> name, it could cause confusion even if the actual result is correct.

OK, I will consider changing it when spinning v2.
Re: [PATCH] crash: Default to CRASH_DUMP=n when support for it is unlikely
On 08/27/24 at 08:37am, John Paul Adrian Glaubitz wrote:
> On Tue, 2024-08-27 at 14:22 +0800, Baoquan He wrote:
> > About why it's enabled by default, as Michael has explained in another
> > thread, distros usually needs to enable it by default because vmcore
> > dumping is a very important feature on servers, even guest instances.
> > Even though kdump codes are enabled to built in, not providing
> > crashkernel= value won't make vmcore dumping take effect, it won't cost
> > system resources in that case.
>
> OK, thanks for the explanation. But as we have found out in the mean time,
> the assumption was wrong to enable it by default for all architectures as
> some architectures cannot boot a crash dump kernel with their default
> bootloader but only through kexec.
>
> Can we have a follow-up patch to disable crash dump kernels where they're
> not needed? I mean, not every platform supported by Linux is obviously a
> x86-based or POWER-based server.

Yes, but isn't Dave's patch a good way to fix that? In Dave's patch,
CRASH_DUMP is no longer enabled by default unconditionally; the default now
relies on ARCH_DEFAULT_CRASH_DUMP, which each arch provides.

 config CRASH_DUMP
 	bool "kernel crash dumps"
-	default y
+	default ARCH_DEFAULT_CRASH_DUMP
Re: [PATCH] crash: Default to CRASH_DUMP=n when support for it is unlikely
On 08/23/24 at 08:16pm, John Paul Adrian Glaubitz wrote:
> Hi Geert,
>
> On Fri, 2024-08-23 at 15:13 +0200, Geert Uytterhoeven wrote:
> > IMHO CRASH_DUMP should just default to n, like most kernel options, as
> > it enables non-trivial extra functionality: the kernel source tree has
> > more than 100 locations that check if CONFIG_CRASH_DUMP is enabled.
>
> I guess we should then revert that part of Baoquan's original patch.
>
> > What is so special about CRASH_DUMP, that it should be enabled by
> > default?
>
> Let's ask Baoquan who made the original change to enable CRASH_DUMP by
> default.

Sorry for the late reply.

It was me who enabled it by default when I cleaned up the messy Kconfig
items related to kexec/kdump. Before the cleanup, CONFIG_CRASH_DUMP only
controlled a very small file containing several functions and macro
definitions, but kernel code treated CRASH_DUMP as the switch for kdump.

As for why it's enabled by default: as Michael has explained in another
thread, distros usually need to enable it by default because vmcore dumping
is a very important feature on servers, and even on guest instances. Even
when the kdump code is built in, vmcore dumping does not take effect unless
a crashkernel= value is provided, so it won't cost system resources in that
case.

Thanks
Baoquan
Re: [PATCH] x86/mm/sme: fix the kdump kernel breakage on SME system when CONFIG_IMA_KEXEC=y
On 08/26/24 at 09:24am, Tom Lendacky wrote: > On 8/25/24 21:44, Baoquan He wrote: > > Recently, it's reported that kdump kernel is broken during bootup on > > SME system when CONFIG_IMA_KEXEC=y. When debugging, I noticed this > > can be traced back to commit ("b69a2afd5afc x86/kexec: Carry forward > > IMA measurement log on kexec"). Just nobody ever tested it on SME > > system when enabling CONFIG_IMA_KEXEC. > > > > -- > > ima: No TPM chip found, activating TPM-bypass! > > Loading compiled-in module X.509 certificates > > Loaded X.509 cert 'Build time autogenerated kernel key: > > 18ae0bc7e79b64700122bb1d6a904b070fef2656' > > ima: Allocated hash algorithm: sha256 > > Oops: general protection fault, probably for non-canonical address > > 0xcfacfdfe6660003e: [#1] PREEMPT SMP NOPTI > > CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.11.0-rc2+ #14 > > Hardware name: Dell Inc. PowerEdge R7425/02MJ3T, BIOS 1.20.0 05/03/2023 > > RIP: 0010:ima_restore_measurement_list+0xdc/0x420 > > Code: ff 48 c7 85 10 ff ff ff 00 00 00 00 48 c7 85 18 ff ff ff 00 00 00 00 > > 48 85 f6 0f 84 09 03 00 00 48 83 fa 17 0f 86 ff 02 00 00 <66> 83 3e 01 49 > > 89 f4 0f 85 90 94 7d 00 48 83 7e 10 ff 0f 84 74 94 > > RSP: 0018:c9053c80 EFLAGS: 00010286 > > RAX: RBX: c9053d03 RCX: > > RDX: e48066052d5df359 RSI: cfacfdfe6660003e RDI: cfacfdfe66600056 > > RBP: c9053d80 R08: R09: 82de1a88 > > R10: c9053da0 R11: 0003 R12: 01a4 > > R13: c9053df0 R14: R15: > > FS: () GS:88804020() > > knlGS: > > CS: 0010 DS: ES: CR0: 80050033 > > CR2: 7f2c744050e8 CR3: 80004110e000 CR4: 003506b0 > > Call Trace: > > > > ? show_trace_log_lvl+0x1b0/0x2f0 > > ? show_trace_log_lvl+0x1b0/0x2f0 > > ? ima_load_kexec_buffer+0x6e/0xf0 > > ? __die_body.cold+0x8/0x12 > > ? die_addr+0x3c/0x60 > > ? exc_general_protection+0x178/0x410 > > ? asm_exc_general_protection+0x26/0x30 > > ? ima_restore_measurement_list+0xdc/0x420 > > ? vprintk_emit+0x1f0/0x270 > > ? 
ima_load_kexec_buffer+0x6e/0xf0 > > ima_load_kexec_buffer+0x6e/0xf0 > > ima_init+0x52/0xb0 > > ? __pfx_init_ima+0x10/0x10 > > init_ima+0x26/0xc0 > > ? __pfx_init_ima+0x10/0x10 > > do_one_initcall+0x5b/0x300 > > do_initcalls+0xdf/0x100 > > ? __pfx_kernel_init+0x10/0x10 > > kernel_init_freeable+0x147/0x1a0 > > kernel_init+0x1a/0x140 > > ret_from_fork+0x34/0x50 > > ? __pfx_kernel_init+0x10/0x10 > > ret_from_fork_asm+0x1a/0x30 > > > > Modules linked in: > > ---[ end trace ]--- > > RIP: 0010:ima_restore_measurement_list+0xdc/0x420 > > Code: ff 48 c7 85 10 ff ff ff 00 00 00 00 48 c7 85 18 ff ff ff 00 00 00 00 > > 48 85 f6 0f 84 09 03 00 00 48 83 fa 17 0f 86 ff 02 00 00 <66> 83 3e 01 49 > > 89 f4 0f 85 90 94 7d 00 48 83 7e 10 ff 0f 84 74 94 > > RSP: 0018:c9053c80 EFLAGS: 00010286 > > RAX: RBX: c9053d03 RCX: > > RDX: e48066052d5df359 RSI: cfacfdfe6660003e RDI: cfacfdfe66600056 > > RBP: c9053d80 R08: R09: 82de1a88 > > R10: c9053da0 R11: 0003 R12: 01a4 > > R13: c9053df0 R14: R15: > > FS: () GS:88804020() > > knlGS: > > CS: 0010 DS: ES: CR0: 80050033 > > CR2: 7f2c744050e8 CR3: 80004110e000 CR4: 003506b0 > > Kernel panic - not syncing: Fatal exception > > Kernel Offset: disabled > > Rebooting in 10 seconds.. > > > > From debugging printing, the stored addr and size of ima_kexec buffer > > are not decrypted correctly like: > > -- > > ima: ima_load_kexec_buffer, buffer:0xcfacfdfe6660003e, > > size:0xe48066052d5df359 > > -- > > > > There are three pieces of setup_data info passed to kexec/kdump kernel: > > SETUP_EFI, SETUP_IMA and SETUP_RNG_SEED. However, among them, only > > ima_kexec buffer suffered from the incorrect decryption. After > > debugging, it's because of the code bug in early_memremap_is_setup_data() > > where checking the embedded content inside setup_data takes wrong range > > calculation. > > &g
Re: [PATCH] x86/mm/sme: fix the kdump kernel breakage on SME system when CONFIG_IMA_KEXEC=y
On 08/27/24 at 09:41am, Dave Young wrote: > On Tue, 27 Aug 2024 at 09:39, Dave Young wrote: > > > > On Mon, 26 Aug 2024 at 22:24, Tom Lendacky wrote: > > > > > > On 8/25/24 21:44, Baoquan He wrote: > > > > Recently, it's reported that kdump kernel is broken during bootup on > > > > SME system when CONFIG_IMA_KEXEC=y. When debugging, I noticed this > > > > can be traced back to commit ("b69a2afd5afc x86/kexec: Carry forward > > > > IMA measurement log on kexec"). Just nobody ever tested it on SME > > > > system when enabling CONFIG_IMA_KEXEC. > > > > > > > > -- > > > > ima: No TPM chip found, activating TPM-bypass! > > > > Loading compiled-in module X.509 certificates > > > > Loaded X.509 cert 'Build time autogenerated kernel key: > > > > 18ae0bc7e79b64700122bb1d6a904b070fef2656' > > > > ima: Allocated hash algorithm: sha256 > > > > Oops: general protection fault, probably for non-canonical address > > > > 0xcfacfdfe6660003e: [#1] PREEMPT SMP NOPTI > > > > CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.11.0-rc2+ #14 > > > > Hardware name: Dell Inc. PowerEdge R7425/02MJ3T, BIOS 1.20.0 05/03/2023 > > > > RIP: 0010:ima_restore_measurement_list+0xdc/0x420 > > > > Code: ff 48 c7 85 10 ff ff ff 00 00 00 00 48 c7 85 18 ff ff ff 00 00 > > > > 00 00 48 85 f6 0f 84 09 03 00 00 48 83 fa 17 0f 86 ff 02 00 00 <66> 83 > > > > 3e 01 49 89 f4 0f 85 90 94 7d 00 48 83 7e 10 ff 0f 84 74 94 > > > > RSP: 0018:c9053c80 EFLAGS: 00010286 > > > > RAX: RBX: c9053d03 RCX: > > > > RDX: e48066052d5df359 RSI: cfacfdfe6660003e RDI: cfacfdfe66600056 > > > > RBP: c9053d80 R08: R09: 82de1a88 > > > > R10: c9053da0 R11: 0003 R12: 01a4 > > > > R13: c9053df0 R14: R15: > > > > FS: () GS:88804020() > > > > knlGS: > > > > CS: 0010 DS: ES: CR0: 80050033 > > > > CR2: 7f2c744050e8 CR3: 80004110e000 CR4: 003506b0 > > > > Call Trace: > > > > > > > > ? show_trace_log_lvl+0x1b0/0x2f0 > > > > ? show_trace_log_lvl+0x1b0/0x2f0 > > > > ? ima_load_kexec_buffer+0x6e/0xf0 > > > > ? __die_body.cold+0x8/0x12 > > > > ? 
die_addr+0x3c/0x60 > > > > ? exc_general_protection+0x178/0x410 > > > > ? asm_exc_general_protection+0x26/0x30 > > > > ? ima_restore_measurement_list+0xdc/0x420 > > > > ? vprintk_emit+0x1f0/0x270 > > > > ? ima_load_kexec_buffer+0x6e/0xf0 > > > > ima_load_kexec_buffer+0x6e/0xf0 > > > > ima_init+0x52/0xb0 > > > > ? __pfx_init_ima+0x10/0x10 > > > > init_ima+0x26/0xc0 > > > > ? __pfx_init_ima+0x10/0x10 > > > > do_one_initcall+0x5b/0x300 > > > > do_initcalls+0xdf/0x100 > > > > ? __pfx_kernel_init+0x10/0x10 > > > > kernel_init_freeable+0x147/0x1a0 > > > > kernel_init+0x1a/0x140 > > > > ret_from_fork+0x34/0x50 > > > > ? __pfx_kernel_init+0x10/0x10 > > > > ret_from_fork_asm+0x1a/0x30 > > > > > > > > Modules linked in: > > > > ---[ end trace ]--- > > > > RIP: 0010:ima_restore_measurement_list+0xdc/0x420 > > > > Code: ff 48 c7 85 10 ff ff ff 00 00 00 00 48 c7 85 18 ff ff ff 00 00 > > > > 00 00 48 85 f6 0f 84 09 03 00 00 48 83 fa 17 0f 86 ff 02 00 00 <66> 83 > > > > 3e 01 49 89 f4 0f 85 90 94 7d 00 48 83 7e 10 ff 0f 84 74 94 > > > > RSP: 0018:c9053c80 EFLAGS: 00010286 > > > > RAX: RBX: c9053d03 RCX: > > > > RDX: e48066052d5df359 RSI: cfacfdfe6660003e RDI: cfacfdfe66600056 > > > > RBP: c9053d80 R08: R09: 82de1a88 > > > > R10: c9053da0 R11: 0003 R12: 01a4 > > > > R13: c9053df0 R14: R15: > > > > FS: () GS:88804020() > > > > knlGS: > > > > CS: 0010 DS: ES: CR0: 80050033 > > > > CR2: 7f2c744050e8 CR3: 80004110e000 CR
[PATCH] x86/mm/sme: fix the kdump kernel breakage on SME system when CONFIG_IMA_KEXEC=y
Recently, it was reported that the kdump kernel is broken during bootup on
SME systems when CONFIG_IMA_KEXEC=y. When debugging, I noticed this can be
traced back to commit ("b69a2afd5afc x86/kexec: Carry forward IMA
measurement log on kexec"). Simply nobody ever tested it on an SME system
with CONFIG_IMA_KEXEC enabled.

--
ima: No TPM chip found, activating TPM-bypass!
Loading compiled-in module X.509 certificates
Loaded X.509 cert 'Build time autogenerated kernel key: 18ae0bc7e79b64700122bb1d6a904b070fef2656'
ima: Allocated hash algorithm: sha256
Oops: general protection fault, probably for non-canonical address 0xcfacfdfe6660003e: [#1] PREEMPT SMP NOPTI
CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.11.0-rc2+ #14
Hardware name: Dell Inc. PowerEdge R7425/02MJ3T, BIOS 1.20.0 05/03/2023
RIP: 0010:ima_restore_measurement_list+0xdc/0x420
Code: ff 48 c7 85 10 ff ff ff 00 00 00 00 48 c7 85 18 ff ff ff 00 00 00 00 48 85 f6 0f 84 09 03 00 00 48 83 fa 17 0f 86 ff 02 00 00 <66> 83 3e 01 49 89 f4 0f 85 90 94 7d 00 48 83 7e 10 ff 0f 84 74 94
RSP: 0018:c9053c80 EFLAGS: 00010286
RAX: RBX: c9053d03 RCX:
RDX: e48066052d5df359 RSI: cfacfdfe6660003e RDI: cfacfdfe66600056
RBP: c9053d80 R08: R09: 82de1a88
R10: c9053da0 R11: 0003 R12: 01a4
R13: c9053df0 R14: R15:
FS: () GS:88804020() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 7f2c744050e8 CR3: 80004110e000 CR4: 003506b0
Call Trace:
? show_trace_log_lvl+0x1b0/0x2f0
? show_trace_log_lvl+0x1b0/0x2f0
? ima_load_kexec_buffer+0x6e/0xf0
? __die_body.cold+0x8/0x12
? die_addr+0x3c/0x60
? exc_general_protection+0x178/0x410
? asm_exc_general_protection+0x26/0x30
? ima_restore_measurement_list+0xdc/0x420
? vprintk_emit+0x1f0/0x270
? ima_load_kexec_buffer+0x6e/0xf0
ima_load_kexec_buffer+0x6e/0xf0
ima_init+0x52/0xb0
? __pfx_init_ima+0x10/0x10
init_ima+0x26/0xc0
? __pfx_init_ima+0x10/0x10
do_one_initcall+0x5b/0x300
do_initcalls+0xdf/0x100
? __pfx_kernel_init+0x10/0x10
kernel_init_freeable+0x147/0x1a0
kernel_init+0x1a/0x140
ret_from_fork+0x34/0x50
? __pfx_kernel_init+0x10/0x10
ret_from_fork_asm+0x1a/0x30
Modules linked in:
---[ end trace ]---
RIP: 0010:ima_restore_measurement_list+0xdc/0x420
Code: ff 48 c7 85 10 ff ff ff 00 00 00 00 48 c7 85 18 ff ff ff 00 00 00 00 48 85 f6 0f 84 09 03 00 00 48 83 fa 17 0f 86 ff 02 00 00 <66> 83 3e 01 49 89 f4 0f 85 90 94 7d 00 48 83 7e 10 ff 0f 84 74 94
RSP: 0018:c9053c80 EFLAGS: 00010286
RAX: RBX: c9053d03 RCX:
RDX: e48066052d5df359 RSI: cfacfdfe6660003e RDI: cfacfdfe66600056
RBP: c9053d80 R08: R09: 82de1a88
R10: c9053da0 R11: 0003 R12: 01a4
R13: c9053df0 R14: R15:
FS: () GS:88804020() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 7f2c744050e8 CR3: 80004110e000 CR4: 003506b0
Kernel panic - not syncing: Fatal exception
Kernel Offset: disabled
Rebooting in 10 seconds..

From the debugging printout, the stored address and size of the ima_kexec
buffer are not decrypted correctly:
--
ima: ima_load_kexec_buffer, buffer:0xcfacfdfe6660003e, size:0xe48066052d5df359
--

Three pieces of setup_data are passed to the kexec/kdump kernel: SETUP_EFI,
SETUP_IMA and SETUP_RNG_SEED. Among them, only the ima_kexec buffer suffers
from the incorrect decryption. Debugging shows this is caused by a bug in
early_memremap_is_setup_data(), where the check for content embedded in
setup_data uses a wrong range calculation.

The lengths of the efi data, rng_seed and ima_kexec are 0x70, 0x20 and 0x10
respectively, and the length of struct setup_data is 0x10. When checking
whether an address is inside the embedded content of setup_data, the
starting addresses of the efi data and rng_seed happen to land inside the
wrongly calculated range, while ima_kexec's starting address does not pass
the check, so the error occurs.

Fix the bug so that the kexec/kdump kernel boots up successfully.

Fixes: 8f716c9b5feb ("x86/mm: Add support to access boot related data in the clear")
Signed-off-by: Baoquan He
---
 arch/x86/mm/ioremap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index aa7d279321ea..7953c4a1d28d 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -717,7 +717,7 @@ static bool __init early_memremap_is_setup_data(resource_size_t phys_addr,
 		paddr_next = data->next;
 		len = data->len;
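The range arithmetic described in the commit message can be replayed in
userspace C. This is a minimal sketch, not the kernel code: the helper
names in_range_old() and in_range_fixed() and the SD_HDR_SIZE macro are made
up for illustration, and it assumes the address being tested is the first
byte of the embedded data, i.e. paddr plus the 0x10-byte header.

```c
#include <stdbool.h>
#include <stdint.h>

#define SD_HDR_SIZE 0x10u	/* header length stated in the commit message */

/* Range test without the header: upper bound is paddr + len. */
static bool in_range_old(uint64_t phys_addr, uint64_t paddr, uint32_t len)
{
	return phys_addr > paddr && phys_addr < paddr + len;
}

/* Range test with the header size added to the upper bound. */
static bool in_range_fixed(uint64_t phys_addr, uint64_t paddr, uint32_t len)
{
	return phys_addr > paddr && phys_addr < paddr + SD_HDR_SIZE + len;
}
```

With paddr = 0x1000 and the tested address 0x1010, the old test accepts
len 0x70 and len 0x20 but rejects len 0x10, which matches the ima_kexec
case; the test with the header added accepts all three.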
Re: [PATCH linux-next v3 05/14] crash: clean up kdump related config items
On 08/22/24 at 08:41pm, Dave Vasilevsky wrote:
> On 2024-08-22 20:04, Baoquan He wrote:
> > If so, below patch possibly can fix it. Can you help check if it's OK?
>
> That removes the possibility of enabling CRASH_DUMP on PPC_BOOK3S_32, even
> when booting via other mechanisms. Maybe it would be best to just make it
> not-default? Please take a look at this patch:
>

This mirrors ARCH_DEFAULT_KEXEC_IMAGE_VERIFY_SIG and the corresponding
KEXEC_IMAGE_VERIFY_SIG nicely. It looks good to me, as long as no one
complains that we are introducing too many knobs. Can you post this
formally so that people can review it?

>
> From d6e5fe3a45f46f1aa01914648c443291d956de9e Mon Sep 17 00:00:00 2001
> From: Dave Vasilevsky
> Date: Thu, 22 Aug 2024 20:13:46 -0400
> Subject: [PATCH] powerpc: Default to CRASH_DUMP=n when Open Firmware boot is
>  likely
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> Open Firmware is unable to boot a kernel where PHYSICAL_START is
> non-zero, which occurs when CRASH_DUMP is on.
>
> On PPC_BOOK3S_32, the most common way of booting is Open Firmware, so
> most users probably don't want CRASH_DUMP. Users booting via some
> other mechanism can turn it on explicitly.
>
> Signed-off-by: Dave Vasilevsky
> Reported-by: Reimar Döffinger
> Fixes: 75bc255a7444
> ---
>  arch/arm/Kconfig | 3 +++
>  arch/arm64/Kconfig | 3 +++
>  arch/loongarch/Kconfig | 3 +++
>  arch/mips/Kconfig | 3 +++
>  arch/powerpc/Kconfig | 4
>  arch/riscv/Kconfig | 3 +++
>  arch/s390/Kconfig | 3 +++
>  arch/sh/Kconfig | 3 +++
>  arch/x86/Kconfig | 3 +++
>  kernel/Kconfig.kexec | 2 +-
>  10 files changed, 29 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index 54b2bb817a7f..200995052690 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -1597,6 +1597,9 @@ config ATAGS_PROC
>  config ARCH_SUPPORTS_CRASH_DUMP
>  	def_bool y
>
> +config ARCH_DEFAULT_CRASH_DUMP
> +	def_bool y
> +
>  config AUTO_ZRELADDR
>  	bool "Auto calculation of the decompressed kernel image address" if !ARCH_MULTIPLATFORM
>  	default !(ARCH_FOOTBRIDGE || ARCH_RPC || ARCH_SA1100)
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index a2f8ff354ca6..43e08cc8204f 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -1558,6 +1558,9 @@ config ARCH_DEFAULT_KEXEC_IMAGE_VERIFY_SIG
>  config ARCH_SUPPORTS_CRASH_DUMP
>  	def_bool y
>
> +config ARCH_DEFAULT_CRASH_DUMP
> +	def_bool y
> +
>  config ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION
>  	def_bool CRASH_RESERVE
>
> diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
> index 70f169210b52..ce232ddcd27d 100644
> --- a/arch/loongarch/Kconfig
> +++ b/arch/loongarch/Kconfig
> @@ -599,6 +599,9 @@ config ARCH_SUPPORTS_KEXEC
>  config ARCH_SUPPORTS_CRASH_DUMP
>  	def_bool y
>
> +config ARCH_DEFAULT_CRASH_DUMP
> +	def_bool y
> +
>  config ARCH_SELECTS_CRASH_DUMP
>  	def_bool y
>  	depends on CRASH_DUMP
> diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
> index 60077e576935..b547f4304d0c 100644
> --- a/arch/mips/Kconfig
> +++ b/arch/mips/Kconfig
> @@ -2881,6 +2881,9 @@ config ARCH_SUPPORTS_KEXEC
>  config ARCH_SUPPORTS_CRASH_DUMP
>  	def_bool y
>
> +config ARCH_DEFAULT_CRASH_DUMP
> +	def_bool y
> +
>  config PHYSICAL_START
>  	hex "Physical address where the kernel is loaded"
>  	default "0x8400"
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index d7b09b064a8a..0f3c1f958eac 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -682,6 +682,10 @@ config RELOCATABLE_TEST
>  config ARCH_SUPPORTS_CRASH_DUMP
>  	def_bool PPC64 || PPC_BOOK3S_32 || PPC_85xx || (44x && !SMP)
>
> +config ARCH_DEFAULT_CRASH_DUMP
> +	bool
> +	default y if !PPC_BOOK3S_32
> +
>  config ARCH_SELECTS_CRASH_DUMP
>  	def_bool y
>  	depends on CRASH_DUMP
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index 0f3cd7c3a436..eb247b5ee569 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -880,6 +880,9 @@ config ARCH_SUPPORTS_KEXEC_PURGATORY
>  config ARCH_SUPPORTS_CRASH_DUMP
>  	def_bool y
>
> +config ARCH_DEFAULT_CRASH_DUMP
> +	def_bool y
> +
>  config ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION
>  	def_bool CRASH_RESERVE
>
> diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
> index a822f952f64a..05a1fb408471 100644
> --- a/arch/s390/Kconfig
> +++ b/arch/s390/Kconfig
> @@ -275,6 +2
Re: [PATCH linux-next v3 05/14] crash: clean up kdump related config items
On 08/22/24 at 11:37am, John Paul Adrian Glaubitz wrote:
> Hi Baoquan,
>
> On Thu, 2024-08-22 at 17:17 +0800, Baoquan He wrote:
> > > The change to enable CONFIG_CRASH_DUMP by default apparently broke the
> > > boot on 32-bit Power Macintosh systems which fail after GRUB with:
> > >
> > > "Error: You can't boot a kdump kernel from OF!"
> > >
> > > We may have to turn this off for 32-bit Power Macintosh systems.
> > >
> > > See this thread on debian-powerpc ML:
> > > https://lists.debian.org/debian-powerpc/2024/07/msg1.html
> >
> > If so, fix need be made.
> >
> > We may need change in ARCH_SUPPORTS_CRASH_DUMP of ppc, can you or anyone
> > post a patch? I don't know how to identify 32-bit Power Macintosh.
> >
> > arch/powerpc/Kconfig:
> > ===
> > config ARCH_SUPPORTS_CRASH_DUMP
> > 	def_bool PPC64 || PPC_BOOK3S_32 || PPC_85xx || (44x && !SMP)
> >
> > config ARCH_SELECTS_CRASH_DUMP
> > 	def_bool y
> > 	depends on CRASH_DUMP
> > 	select RELOCATABLE if PPC64 || 44x || PPC_85xx
> > ..
> > config PHYSICAL_START
> > 	hex "Physical address where the kernel is loaded" if PHYSICAL_START_BOOL
> > 	default "0x0200" if PPC_BOOK3S && CRASH_DUMP && !NONSTATIC_KERNEL
> > 	default "0x"
>
> I think the architecture does support crash dumps, but I think the kernel
> has to be booted from kexec in this case. Booting a kernel with CRASH_DUMP
> enabled won't work from Open Firmware. So, I think CRASH_DUMP should just
> be disabled for PPC_BOOK3S_32 by default and users who want to use it on
> these systems, will have to enable it explicitly.

If so, the patch below can possibly fix it. Can you help check if it's OK?

From dd5318dc5dcd66521b31214f0e5921f258532ef8 Mon Sep 17 00:00:00 2001
From: Baoquan He
Date: Fri, 23 Aug 2024 07:37:38 +0800
Subject: [PATCH] powerpc/crash: do not default to enable CRASH_DUMP for
 PPC_BOOK3S_32 system
Content-type: text/plain

Recently it was reported that a PowerPC macMini system failed to boot up.
This is because CONFIG_CRASH_DUMP=y has been set by default on the system
since kernel 6.9, which makes CONFIG_PHYSICAL_START no longer equal to 0
and causes the normal kernel bootup to fail.

The error report can be found here:
https://lists.debian.org/debian-powerpc/2024/07/msg1.html

The relevant code snippet is copied here for reference:

arch/powerpc/Kconfig:
==
config KERNEL_START
	hex "Virtual address of kernel base" if KERNEL_START_BOOL
	default PAGE_OFFSET if PAGE_OFFSET_BOOL
	default "0xc200" if CRASH_DUMP && !NONSTATIC_KERNEL
	default "0xc000"

So let's stop enabling CRASH_DUMP by default on PPC_BOOK3S_32.

Signed-off-by: Baoquan He
---
 arch/powerpc/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index d7b09b064a8a..dc5ca58be1d6 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -680,7 +680,7 @@ config RELOCATABLE_TEST
 	  relocation code.
 config ARCH_SUPPORTS_CRASH_DUMP
-	def_bool PPC64 || PPC_BOOK3S_32 || PPC_85xx || (44x && !SMP)
+	def_bool PPC64 || PPC_85xx || (44x && !SMP)
 config ARCH_SELECTS_CRASH_DUMP
 	def_bool y
--
2.41.0
Re: [PATCH linux-next v3 05/14] crash: clean up kdump related config items
On 08/22/24 at 09:33am, John Paul Adrian Glaubitz wrote: > Hi Baoquan, > > On Wed, 2024-01-24 at 13:12 +0800, Baoquan He wrote: > > By splitting CRASH_RESERVE and VMCORE_INFO out from CRASH_CORE, cleaning > > up the dependency of FA_DMUMP on CRASH_DUMP, and moving crash codes from > > kexec_core.c to crash_core.c, now we can rearrange CRASH_DUMP to > > depend on KEXEC_CORE, and make CRASH_DUMP select CRASH_RESERVE and > > VMCORE_INFO. > > > > KEXEC_CORE won't select CRASH_RESERVE and VMCORE_INFO any more because > > KEXEC_CORE enables codes which allocate control pages, copy > > kexec/kdump segments, and prepare for switching. These codes are shared > > by both kexec reboot and crash dumping. > > > > Doing this makes codes and the corresponding config items more > > logical (the right item depends on or is selected by the left item). > > > > PROC_KCORE ---> VMCORE_INFO > > > >|--> VMCORE_INFO > > FA_DUMP| > >|--> CRASH_RESERVE > > > > >VMCORE_INFO > >/ > >|>CRASH_RESERVE > > KEXEC --|/| > > |--> KEXEC_CORE--> CRASH_DUMP-->/-|>PROC_VMCORE > > KEXEC_FILE --| \ | > >\>CRASH_HOTPLUG > > > > KEXEC --| > > |--> KEXEC_CORE--> kexec reboot > > KEXEC_FILE --| > > > > Signed-off-by: Baoquan He > > --- > > kernel/Kconfig.kexec | 7 --- > > 1 file changed, 4 insertions(+), 3 deletions(-) > > > > diff --git a/kernel/Kconfig.kexec b/kernel/Kconfig.kexec > > index 8faf27043432..6c34e63c88ff 100644 > > --- a/kernel/Kconfig.kexec > > +++ b/kernel/Kconfig.kexec > > @@ -9,8 +9,6 @@ config VMCORE_INFO > > bool > > > > config KEXEC_CORE > > - select VMCORE_INFO > > - select CRASH_RESERVE > > bool > > > > config KEXEC_ELF > > @@ -99,8 +97,11 @@ config KEXEC_JUMP > > > > config CRASH_DUMP > > bool "kernel crash dumps" > > + default y > > depends on ARCH_SUPPORTS_CRASH_DUMP > > - select KEXEC_CORE > > + depends on KEXEC_CORE > > + select VMCORE_INFO > > + select CRASH_RESERVE > > help > > Generate crash dump after being started by kexec. 
> > This should be normally only set in special crash dump kernels > > The change to enable CONFIG_CRASH_DUMP by default apparently broke the boot > on 32-bit Power Macintosh systems which fail after GRUB with: > > "Error: You can't boot a kdump kernel from OF!" > > We may have to turn this off for 32-bit Power Macintosh systems. > > See this thread on debian-powerpc ML: > https://lists.debian.org/debian-powerpc/2024/07/msg1.html If so, fix need be made. We may need change in ARCH_SUPPORTS_CRASH_DUMP of ppc, can you or anyone post a patch? I don't know how to identify 32-bit Power Macintosh. arch/powerpc/Kconfig: === config ARCH_SUPPORTS_CRASH_DUMP def_bool PPC64 || PPC_BOOK3S_32 || PPC_85xx || (44x && !SMP) config ARCH_SELECTS_CRASH_DUMP def_bool y depends on CRASH_DUMP select RELOCATABLE if PPC64 || 44x || PPC_85xx .. config PHYSICAL_START hex "Physical address where the kernel is loaded" if PHYSICAL_START_BOOL default "0x0200" if PPC_BOOK3S && CRASH_DUMP && !NONSTATIC_KERNEL default "0x" ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v2] Document/kexec: Generalize crash hotplug description
Add Jonathan and Andew. On 08/12/24 at 09:46am, Sourabh Jain wrote: > Commit 79365026f869 ("crash: add a new kexec flag for hotplug support") > generalizes the crash hotplug support to allow architectures to update > multiple kexec segments on CPU/Memory hotplug and not just elfcorehdr. > Therefore, update the relevant kernel documentation to reflect the same. Hi Jonathan and Andew, Could any of you pick this into your tree? Thanks Baoquan > > Cc: Petr Tesarik > Cc: Hari Bathini > Cc: kexec@lists.infradead.org > Cc: linux-ker...@vger.kernel.org > Cc: linuxppc-...@lists.ozlabs.org > Cc: x...@kernel.org > Signed-off-by: Sourabh Jain > --- > > Changelog: > > Since v1: > https://lore.kernel.org/all/20240805050829.297171-1-sourabhj...@linux.ibm.com/ > - Update crash_hotplug sysfs document as suggested by Petr T > - Update an error message in crash_handle_hotplug_event and > crash_check_hotplug_support function. > > --- > .../ABI/testing/sysfs-devices-memory | 6 ++-- > .../ABI/testing/sysfs-devices-system-cpu | 6 ++-- > .../admin-guide/mm/memory-hotplug.rst | 5 +-- > Documentation/core-api/cpu_hotplug.rst| 10 +++--- > kernel/crash_core.c | 33 +++ > 5 files changed, 35 insertions(+), 25 deletions(-) > > diff --git a/Documentation/ABI/testing/sysfs-devices-memory > b/Documentation/ABI/testing/sysfs-devices-memory > index a95e0f17c35a..cec65827e602 100644 > --- a/Documentation/ABI/testing/sysfs-devices-memory > +++ b/Documentation/ABI/testing/sysfs-devices-memory > @@ -115,6 +115,6 @@ What: /sys/devices/system/memory/crash_hotplug > Date:Aug 2023 > Contact: Linux kernel mailing list > Description: > - (RO) indicates whether or not the kernel directly supports > - modifying the crash elfcorehdr for memory hot un/plug and/or > - on/offline changes. > + (RO) indicates whether or not the kernel updates relevant kexec > + segments on memory hot un/plug and/or on/offline events, > avoiding the > + need to reload kdump kernel. 
> diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu > b/Documentation/ABI/testing/sysfs-devices-system-cpu > index 325873385b71..1a31b7c71676 100644 > --- a/Documentation/ABI/testing/sysfs-devices-system-cpu > +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu > @@ -703,9 +703,9 @@ What: /sys/devices/system/cpu/crash_hotplug > Date:Aug 2023 > Contact: Linux kernel mailing list > Description: > - (RO) indicates whether or not the kernel directly supports > - modifying the crash elfcorehdr for CPU hot un/plug and/or > - on/offline changes. > + (RO) indicates whether or not the kernel updates relevant kexec > + segments on memory hot un/plug and/or on/offline events, > avoiding the > + need to reload kdump kernel. > > What:/sys/devices/system/cpu/enabled > Date:Nov 2022 > diff --git a/Documentation/admin-guide/mm/memory-hotplug.rst > b/Documentation/admin-guide/mm/memory-hotplug.rst > index 098f14d83e99..cb2c080f400c 100644 > --- a/Documentation/admin-guide/mm/memory-hotplug.rst > +++ b/Documentation/admin-guide/mm/memory-hotplug.rst > @@ -294,8 +294,9 @@ The following files are currently defined: > ``crash_hotplug`` read-only: when changes to the system memory map > occur due to hot un/plug of memory, this file contains > '1' if the kernel updates the kdump capture kernel memory > -map itself (via elfcorehdr), or '0' if userspace must > update > -the kdump capture kernel memory map. > +map itself (via elfcorehdr and other relevant kexec > +segments), or '0' if userspace must update the kdump > +capture kernel memory map. > > Availability depends on the CONFIG_MEMORY_HOTPLUG kernel > configuration option. > diff --git a/Documentation/core-api/cpu_hotplug.rst > b/Documentation/core-api/cpu_hotplug.rst > index dcb0e379e5e8..a21dbf261be7 100644 > --- a/Documentation/core-api/cpu_hotplug.rst > +++ b/Documentation/core-api/cpu_hotplug.rst > @@ -737,8 +737,9 @@ can process the event further. 
> > When changes to the CPUs in the system occur, the sysfs file > /sys/devices/system/cpu/crash_hotplug contains '1' if the kernel > -updates the kdump capture kernel list of CPUs itself (via elfcorehdr), > -or '0' if userspace must update the kdump capture kernel list of CPUs. > +updates the kdump capture kernel list of CPUs itself (via elfcorehdr and > +other relevant kexec segment), or '0' if userspace must update the kdump > +capture kernel list of CPUs. > > The availability depends on the CONFIG_HOTPLUG_CPU kernel configuration > option. > @@ -
Re: [PATCH] kexec/crash: no crash update when kexec in progress
On 08/19/24 at 09:45am, Sourabh Jain wrote:
> Hello Michael and Boaquan
>
> On 01/08/24 12:21, Sourabh Jain wrote:
> > Hello Michael,
> >
> > On 01/08/24 08:04, Michael Ellerman wrote:
> > > Sourabh Jain writes:
> > > > The following errors are observed when kexec is done with SMT=off on
> > > > powerpc.
> > > >
> > > > [  358.458385] Removing IBM Power 842 compression device
> > > > [  374.795734] kexec_core: Starting new kernel
> > > > [  374.795748] kexec: Waking offline cpu 1.
> > > > [  374.875695] crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate
> > > > [  374.935833] kexec: Waking offline cpu 2.
> > > > [  375.015664] crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate
> > > > snip..
> > > > [  375.515823] kexec: Waking offline cpu 6.
> > > > [  375.635667] crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate
> > > > [  375.695836] kexec: Waking offline cpu 7.
> > > Are they actually errors though? Do they block the actual kexec from
> > > happening? Or are they just warnings in dmesg?
> >
> > The kexec kernel boots fine.
> >
> > This warning appears regardless of whether the kdump kernel is loaded.
> >
> > However, when the kdump kernel is loaded, we will not be able to update
> > the kdump image (FDT).
> > I think this should be fine given that kexec is in progress.
> >
> > Please let me know your opinion.
> >
> > > Because the fix looks like it could be racy.
> >
> > It seems like it is racy, but given that kexec takes the lock first and then
> > brings the CPU up, which triggers the kdump image, which always fails to
> > update the kdump image because it could not take the same lock.
> >
> > Note: the kexec lock is not released unless kexec boot fails.
>
> Any comments or suggestions on this fix?

Is this a little better?

diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 63cf89393c6e..0355ffb712f4 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -504,7 +504,7 @@ int crash_check_hotplug_support(void)
 	crash_hotplug_lock();
 	/* Obtain lock while reading crash information */
-	if (!kexec_trylock()) {
+	if (!kexec_trylock() && kexec_in_progress) {
 		pr_info("kexec_trylock() failed, elfcorehdr may be inaccurate\n");
 		crash_hotplug_unlock();
 		return 0;
@@ -539,7 +539,7 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu,
 	crash_hotplug_lock();
 	/* Obtain lock while changing crash information */
-	if (!kexec_trylock()) {
+	if (!kexec_trylock() && kexec_in_progress) {
 		pr_info("kexec_trylock() failed, elfcorehdr may be inaccurate\n");
 		crash_hotplug_unlock();
 		return;
Re: [PATCH 1/1] kexec_file: fix elfcorehdr digest exclusion when CONFIG_CRASH_HOTPLUG=y
Hi Andrew,

On 08/05/24 at 05:07pm, Petr Tesarik wrote:
> From: Petr Tesarik
>
> Fix the condition to exclude the elfcorehdr segment from the SHA digest
> calculation.
>
> The j iterator is an index into the output sha_regions[] array, not into
> the input image->segment[] array. Once it reaches image->elfcorehdr_index,
> all subsequent segments are excluded. Besides, if the purgatory segment
> precedes the elfcorehdr segment, the elfcorehdr may be wrongly included in
> the calculation.

Ping. This is a good fix, could you pick this one into your tree?

Thanks
Baoquan

>
> Fixes: f7cc804a9fd4 ("kexec: exclude elfcorehdr from the segment digest")
> Cc: sta...@kernel.org
> Signed-off-by: Petr Tesarik
> ---
>  kernel/kexec_file.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
> index 3d64290d24c9..3eedb8c226ad 100644
> --- a/kernel/kexec_file.c
> +++ b/kernel/kexec_file.c
> @@ -752,7 +752,7 @@ static int kexec_calculate_store_digests(struct kimage *image)
>
>  #ifdef CONFIG_CRASH_HOTPLUG
>  		/* Exclude elfcorehdr segment to allow future changes via hotplug */
> -		if (j == image->elfcorehdr_index)
> +		if (i == image->elfcorehdr_index)
>  			continue;
>  #endif
>
> --
> 2.45.2
>
Re: [PATCHv2 3/4] x86/64/kexec: Map original relocate_kernel() in init_transition_pgtable()
Cc Eric and kexec mailing list.

On 08/14/24 at 03:46pm, Kirill A. Shutemov wrote:
> The init_transition_pgtable() function sets up transitional page tables.
> It ensures that the relocate_kernel() function is present in the
> identity mapping at the same location as in the kernel page tables.
> relocate_kernel() switches to the identity mapping, and the function
> must be present at the same location in the virtual address space before
> and after switching page tables.
>
> init_transition_pgtable() maps a copy of relocate_kernel() in
> image->control_code_page at the relocate_kernel() virtual address, but
> the original physical address of relocate_kernel() would also work.
>
> It is safe to use original relocate_kernel() physical address cannot be
> overwritten until swap_pages() is called, and the relocate_kernel()
> virtual address will not be used by then.

I haven't read this code for a long time. I'm wondering whether we still
need to copy relocate_kernel() to image->control_code_page + PAGE_SIZE,
as you said.

>
> Map the original relocate_kernel() at the relocate_kernel() virtual
> address in the identity mapping. It is preparation to replace the
> init_transition_pgtable() implementation with a call to
> kernel_ident_mapping_init().
>
> Note that while relocate_kernel() switches to the identity mapping, it
> does not flush global TLB entries (CR4.PGE is not cleared). This means
> that in most cases, the kernel still runs relocate_kernel() from the
> original physical address before the change.
>
> Signed-off-by: Kirill A. Shutemov
> ---
>  arch/x86/kernel/machine_kexec_64.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
> index 9c9ac606893e..645690e81c2d 100644
> --- a/arch/x86/kernel/machine_kexec_64.c
> +++ b/arch/x86/kernel/machine_kexec_64.c
> @@ -157,7 +157,7 @@ static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
>  	pte_t *pte;
>
>  	vaddr = (unsigned long)relocate_kernel;
> -	paddr = __pa(page_address(image->control_code_page)+PAGE_SIZE);
> +	paddr = __pa(relocate_kernel);
>  	pgd += pgd_index(vaddr);
>  	if (!pgd_present(*pgd)) {
>  		p4d = (p4d_t *)get_zeroed_page(GFP_KERNEL);
> --
> 2.43.0
>
Re: [PATCH v2] Document/kexec: Generalize crash hotplug description
On 08/13/24 at 10:58am, Sourabh Jain wrote:
> Hello Baoquan,
>
> On 13/08/24 10:34, Baoquan He wrote:
> > On 08/12/24 at 09:46am, Sourabh Jain wrote:
> > ..
> > > ---
> > >
> > > Changelog:
> > >
> > > Since v1:
> > > https://lore.kernel.org/all/20240805050829.297171-1-sourabhj...@linux.ibm.com/
> > >  - Update crash_hotplug sysfs document as suggested by Petr T
> > >  - Update an error message in crash_handle_hotplug_event and
> > >    crash_check_hotplug_support function.
> > >
> > > ---
> > ..
> > > diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> > > index 63cf89393c6e..c1048893f4b6 100644
> > > --- a/kernel/crash_core.c
> > > +++ b/kernel/crash_core.c
> > > @@ -505,7 +505,7 @@ int crash_check_hotplug_support(void)
> > >  	crash_hotplug_lock();
> > >  	/* Obtain lock while reading crash information */
> > >  	if (!kexec_trylock()) {
> > > -		pr_info("kexec_trylock() failed, elfcorehdr may be inaccurate\n");
> > > +		pr_info("kexec_trylock() failed, kdump image may be inaccurate\n");
> > Wondering why this need be updated.
>
> In some architectures, additional kexec segments become obsolete during a
> hotplug event,
> so simply calling out the `elfcorehdr may be inaccurate` may not be
> sufficient.
> Therefore, it has been generalized with the kdump image.

OK, I forgot the case in ppc, makes sense to me, thx.

Acked-by: Baoquan He
Re: [PATCH v2] Document/kexec: Generalize crash hotplug description
On 08/12/24 at 09:46am, Sourabh Jain wrote: .. > --- > > Changelog: > > Since v1: > https://lore.kernel.org/all/20240805050829.297171-1-sourabhj...@linux.ibm.com/ > - Update crash_hotplug sysfs document as suggested by Petr T > - Update an error message in crash_handle_hotplug_event and > crash_check_hotplug_support function. > > --- .. > diff --git a/kernel/crash_core.c b/kernel/crash_core.c > index 63cf89393c6e..c1048893f4b6 100644 > --- a/kernel/crash_core.c > +++ b/kernel/crash_core.c > @@ -505,7 +505,7 @@ int crash_check_hotplug_support(void) > crash_hotplug_lock(); > /* Obtain lock while reading crash information */ > if (!kexec_trylock()) { > - pr_info("kexec_trylock() failed, elfcorehdr may be > inaccurate\n"); > + pr_info("kexec_trylock() failed, kdump image may be > inaccurate\n"); Wondering why this need be updated. > crash_hotplug_unlock(); > return 0; > } > @@ -520,18 +520,25 @@ int crash_check_hotplug_support(void) > } > > /* > - * To accurately reflect hot un/plug changes of cpu and memory resources > - * (including onling and offlining of those resources), the elfcorehdr > - * (which is passed to the crash kernel via the elfcorehdr= parameter) > - * must be updated with the new list of CPUs and memories. > + * To accurately reflect hot un/plug changes of CPU and Memory resources > + * (including onling and offlining of those resources), the relevant > + * kexec segments must be updated with latest CPU and Memory resources. > * > - * In order to make changes to elfcorehdr, two conditions are needed: > - * First, the segment containing the elfcorehdr must be large enough > - * to permit a growing number of resources; the elfcorehdr memory size > - * is based on NR_CPUS_DEFAULT and CRASH_MAX_MEMORY_RANGES. > - * Second, purgatory must explicitly exclude the elfcorehdr from the > - * list of segments it checks (since the elfcorehdr changes and thus > - * would require an update to purgatory itself to update the digest). 
> + * Architectures must ensure two things for all segments that need > + * updating during hotplug events: > + * > + * 1. Segments must be large enough to accommodate a growing number of > + *resources. > + * 2. Exclude the segments from SHA verification. > + * > + * For example, on most architectures, the elfcorehdr (which is passed > + * to the crash kernel via the elfcorehdr= parameter) must include the > + * new list of CPUs and memory. To make changes to the elfcorehdr, it > + * should be large enough to permit a growing number of CPU and Memory > + * resources. One can estimate the elfcorehdr memory size based on > + * NR_CPUS_DEFAULT and CRASH_MAX_MEMORY_RANGES. The elfcorehdr is > + * excluded from SHA verification by default if the architecture > + * supports crash hotplug. > */ > static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int > cpu, void *arg) > { > @@ -540,7 +547,7 @@ static void crash_handle_hotplug_event(unsigned int > hp_action, unsigned int cpu, > crash_hotplug_lock(); > /* Obtain lock while changing crash information */ > if (!kexec_trylock()) { > - pr_info("kexec_trylock() failed, elfcorehdr may be > inaccurate\n"); > + pr_info("kexec_trylock() failed, kdump image may be > inaccurate\n"); > crash_hotplug_unlock(); > return; > } > -- > 2.45.2 > ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH -next v2] crash: Fix riscv64 crash memory reserve dead loop
On 08/12/24 at 02:20pm, Jinjie Ruan wrote:
> On RISCV64 Qemu machine with 512MB memory, cmdline "crashkernel=500M,high"
> will cause system stall as below:
>
>    Zone ranges:
>      DMA32    [mem 0x8000-0x9fff]
>      Normal   empty
>    Movable zone start for each node
>    Early memory node ranges
>      node   0: [mem 0x8000-0x8005]
>      node   0: [mem 0x8006-0x9fff]
>    Initmem setup node 0 [mem 0x8000-0x9fff]
>   (stall here)
>
> commit 5d99cadf1568 ("crash: fix x86_32 crash memory reserve dead loop
> bug") fix this on 32-bit architecture. However, the problem is not
> completely solved. If `CRASH_ADDR_LOW_MAX = CRASH_ADDR_HIGH_MAX` on 64-bit
> architecture, for example, when system memory is equal to
> CRASH_ADDR_LOW_MAX on RISCV64, the following infinite loop will also occur:
>
>   -> reserve_crashkernel_generic() and high is true
>      -> alloc at [CRASH_ADDR_LOW_MAX, CRASH_ADDR_HIGH_MAX] fail
>         -> alloc at [0, CRASH_ADDR_LOW_MAX] fail and repeatedly
>            (because CRASH_ADDR_LOW_MAX = CRASH_ADDR_HIGH_MAX).
>
> As Catalin suggested, do not remove the ",high" reservation fallback to
> ",low" logic which will change arm64's kdump behavior, but fix it by
> skipping the above situation similar to commit d2f32f23190b ("crash: fix
> x86_32 crash memory reserve dead loop").
>
> After this patch, it print:
>   cannot allocate crashkernel (size:0x1f40)
>
> Signed-off-by: Jinjie Ruan
> Suggested-by: Catalin Marinas
> ---
> v2:
> - Fix it in another way suggested by Catalin.
> - Add Suggested-by.
> ---
>  kernel/crash_reserve.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)

Acked-by: Baoquan He

>
> diff --git a/kernel/crash_reserve.c b/kernel/crash_reserve.c
> index 5387269114f6..aae4a9e998d1 100644
> --- a/kernel/crash_reserve.c
> +++ b/kernel/crash_reserve.c
> @@ -427,7 +427,8 @@ void __init reserve_crashkernel_generic(char *cmdline,
>  		if (high && search_end == CRASH_ADDR_HIGH_MAX) {
>  			search_end = CRASH_ADDR_LOW_MAX;
>  			search_base = 0;
> -			goto retry;
> +			if (search_end != CRASH_ADDR_HIGH_MAX)
> +				goto retry;
>  		}
>  		pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
>  			crash_size);
> --
> 2.34.1
>
Re: [PATCH -next] crash: Fix riscv64 crash memory reserve dead loop
On 08/08/24 at 03:56pm, Jinjie Ruan wrote: > > > On 2024/8/7 3:34, Catalin Marinas wrote: > > On Tue, Aug 06, 2024 at 08:10:30PM +0100, Catalin Marinas wrote: > >> On Fri, Aug 02, 2024 at 06:11:01PM +0800, Baoquan He wrote: > >>> On 08/02/24 at 05:01pm, Jinjie Ruan wrote: > >>>> On RISCV64 Qemu machine with 512MB memory, cmdline > >>>> "crashkernel=500M,high" > >>>> will cause system stall as below: > >>>> > >>>> Zone ranges: > >>>> DMA32[mem 0x8000-0x9fff] > >>>> Normal empty > >>>> Movable zone start for each node > >>>> Early memory node ranges > >>>> node 0: [mem 0x8000-0x8005] > >>>> node 0: [mem 0x8006-0x9fff] > >>>> Initmem setup node 0 [mem 0x8000-0x9fff] > >>>> (stall here) > >>>> > >>>> commit 5d99cadf1568 ("crash: fix x86_32 crash memory reserve dead loop > >>>> bug") fix this on 32-bit architecture. However, the problem is not > >>>> completely solved. If `CRASH_ADDR_LOW_MAX = CRASH_ADDR_HIGH_MAX` on > >>>> 64-bit > >>>> architecture, for example, when system memory is equal to > >>>> CRASH_ADDR_LOW_MAX on RISCV64, the following infinite loop will also > >>>> occur: > >>> > >>> Interesting, I didn't expect risc-v defining them like these. > >>> > >>> #define CRASH_ADDR_LOW_MAX dma32_phys_limit > >>> #define CRASH_ADDR_HIGH_MAX memblock_end_of_DRAM() > >> > >> arm64 defines the high limit as PHYS_MASK+1, it doesn't need to be > >> dynamic and x86 does something similar (SZ_64T). Not sure why the > >> generic code and riscv define it like this. > >> > >>>> -> reserve_crashkernel_generic() and high is true > >>>> -> alloc at [CRASH_ADDR_LOW_MAX, CRASH_ADDR_HIGH_MAX] fail > >>>>-> alloc at [0, CRASH_ADDR_LOW_MAX] fail and repeatedly > >>>> (because CRASH_ADDR_LOW_MAX = CRASH_ADDR_HIGH_MAX). > >>>> > >>>> Before refactor in commit 9c08a2a139fe ("x86: kdump: use generic > >>>> interface > >>>> to simplify crashkernel reservation code"), x86 do not try to reserve > >>>> crash > >>>> memory at low if it fails to alloc above high 4G. 
However before refator > >>>> in > >>>> commit fdc268232dbba ("arm64: kdump: use generic interface to simplify > >>>> crashkernel reservation"), arm64 try to reserve crash memory at low if it > >>>> fails above high 4G. For 64-bit systems, this attempt is less beneficial > >>>> than the opposite, remove it to fix this bug and align with native x86 > >>>> implementation. > >>> > >>> And I don't like the idea crashkernel=,high failure will fallback to > >>> attempt in low area, so this looks good to me. > >> > >> Well, I kind of liked this behaviour. One can specify ,high as a > >> preference rather than forcing a range. The arm64 land has different > >> platforms with some constrained memory layouts. Such fallback works well > >> as a default command line option shipped with distros without having to > >> guess the SoC memory layout. > > > > I haven't tried but it's possible that this patch also breaks those > > arm64 platforms with all RAM above 4GB when CRASH_ADDR_LOW_MAX is > > memblock_end_of_DRAM(). Here all memory would be low and in the absence > > of no fallback, it fails to allocate. > > > > So, my strong preference would be to re-instate the current behaviour > > and work around the infinite loop in a different way. > > Hi, baoquan, What's your opinion? > > Only this patch should be re-instate or all the 3 dead loop fix patch? I am not sure which way Catalin suggested to take. Hi Catalin, Could you say more words about your preference so that Jinjie can proceed accordingly? Thanks Baoquan ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH] Document/kexec: Generalize crash hotplug description
On 08/05/24 at 10:38am, Sourabh Jain wrote: > Commit 79365026f869 ("crash: add a new kexec flag for hotplug support") > generalizes the crash hotplug support to allow architectures to update > multiple kexec segments on CPU/Memory hotplug and not just elfcorehdr. > Therefore, update the relevant kernel documentation to reflect the same. > > No functional change. > > Cc: Petr Tesarik > Cc: Hari Bathini > Cc: kexec@lists.infradead.org > Cc: linux-ker...@vger.kernel.org > Cc: linuxppc-...@lists.ozlabs.org > Cc: x...@kernel.org > Signed-off-by: Sourabh Jain > --- > > Discussion about the documentation update: > https://lore.kernel.org/all/68d0328d-531a-4a2b-ab26-c97fd8a12...@linux.ibm.com/ > > --- > .../ABI/testing/sysfs-devices-memory | 6 ++-- > .../ABI/testing/sysfs-devices-system-cpu | 6 ++-- > .../admin-guide/mm/memory-hotplug.rst | 5 ++-- > Documentation/core-api/cpu_hotplug.rst| 10 --- > kernel/crash_core.c | 29 --- > 5 files changed, 33 insertions(+), 23 deletions(-) The overall looks good to me, except of concern from Petr. Thanks. > > diff --git a/Documentation/ABI/testing/sysfs-devices-memory > b/Documentation/ABI/testing/sysfs-devices-memory > index a95e0f17c35a..421acc8e2c6b 100644 > --- a/Documentation/ABI/testing/sysfs-devices-memory > +++ b/Documentation/ABI/testing/sysfs-devices-memory > @@ -115,6 +115,6 @@ What: /sys/devices/system/memory/crash_hotplug > Date:Aug 2023 > Contact: Linux kernel mailing list > Description: > - (RO) indicates whether or not the kernel directly supports > - modifying the crash elfcorehdr for memory hot un/plug and/or > - on/offline changes. > + (RO) indicates whether or not the kernel update of kexec > + segments on memory hot un/plug and/or on/offline events, > + avoiding the need to reload kdump kernel. 
> diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu > b/Documentation/ABI/testing/sysfs-devices-system-cpu > index 325873385b71..f4ada1cd2f96 100644 > --- a/Documentation/ABI/testing/sysfs-devices-system-cpu > +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu > @@ -703,9 +703,9 @@ What: /sys/devices/system/cpu/crash_hotplug > Date:Aug 2023 > Contact: Linux kernel mailing list > Description: > - (RO) indicates whether or not the kernel directly supports > - modifying the crash elfcorehdr for CPU hot un/plug and/or > - on/offline changes. > + (RO) indicates whether or not the kernel update of kexec > + segments on CPU hot un/plug and/or on/offline events, > + avoiding the need to reload kdump kernel. > > What:/sys/devices/system/cpu/enabled > Date:Nov 2022 > diff --git a/Documentation/admin-guide/mm/memory-hotplug.rst > b/Documentation/admin-guide/mm/memory-hotplug.rst > index 098f14d83e99..cb2c080f400c 100644 > --- a/Documentation/admin-guide/mm/memory-hotplug.rst > +++ b/Documentation/admin-guide/mm/memory-hotplug.rst > @@ -294,8 +294,9 @@ The following files are currently defined: > ``crash_hotplug`` read-only: when changes to the system memory map > occur due to hot un/plug of memory, this file contains > '1' if the kernel updates the kdump capture kernel memory > -map itself (via elfcorehdr), or '0' if userspace must > update > -the kdump capture kernel memory map. > +map itself (via elfcorehdr and other relevant kexec > +segments), or '0' if userspace must update the kdump > +capture kernel memory map. > > Availability depends on the CONFIG_MEMORY_HOTPLUG kernel > configuration option. > diff --git a/Documentation/core-api/cpu_hotplug.rst > b/Documentation/core-api/cpu_hotplug.rst > index dcb0e379e5e8..a21dbf261be7 100644 > --- a/Documentation/core-api/cpu_hotplug.rst > +++ b/Documentation/core-api/cpu_hotplug.rst > @@ -737,8 +737,9 @@ can process the event further. 
> > When changes to the CPUs in the system occur, the sysfs file > /sys/devices/system/cpu/crash_hotplug contains '1' if the kernel > -updates the kdump capture kernel list of CPUs itself (via elfcorehdr), > -or '0' if userspace must update the kdump capture kernel list of CPUs. > +updates the kdump capture kernel list of CPUs itself (via elfcorehdr and > +other relevant kexec segment), or '0' if userspace must update the kdump > +capture kernel list of CPUs. > > The availability depends on the CONFIG_HOTPLUG_CPU kernel configuration > option. > @@ -750,8 +751,9 @@ file can be used in a udev rule as follows: > SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end" > > For a CPU hot un/plug event, if the architecture su
Re: [PATCH 1/1] kexec_file: fix elfcorehdr digest exclusion when CONFIG_CRASH_HOTPLUG=y
On 08/05/24 at 05:07pm, Petr Tesarik wrote:
> From: Petr Tesarik
>
> Fix the condition to exclude the elfcorehdr segment from the SHA digest
> calculation.
>
> The j iterator is an index into the output sha_regions[] array, not into
> the input image->segment[] array. Once it reaches image->elfcorehdr_index,
> all subsequent segments are excluded. Besides, if the purgatory segment
> precedes the elfcorehdr segment, the elfcorehdr may be wrongly included in
> the calculation.

Indeed, good catch.

Acked-by: Baoquan He

>
> Fixes: f7cc804a9fd4 ("kexec: exclude elfcorehdr from the segment digest")
> Cc: sta...@kernel.org
> Signed-off-by: Petr Tesarik
> ---
>  kernel/kexec_file.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
> index 3d64290d24c9..3eedb8c226ad 100644
> --- a/kernel/kexec_file.c
> +++ b/kernel/kexec_file.c
> @@ -752,7 +752,7 @@ static int kexec_calculate_store_digests(struct kimage *image)
>
>  #ifdef CONFIG_CRASH_HOTPLUG
>  		/* Exclude elfcorehdr segment to allow future changes via hotplug */
> -		if (j == image->elfcorehdr_index)
> +		if (i == image->elfcorehdr_index)
>  			continue;
>  #endif
>
> --
> 2.45.2
>
Re: [PATCH -next] crash: Fix riscv64 crash memory reserve dead loop
On 08/02/24 at 05:01pm, Jinjie Ruan wrote: > On RISCV64 Qemu machine with 512MB memory, cmdline "crashkernel=500M,high" > will cause system stall as below: > >Zone ranges: > DMA32[mem 0x8000-0x9fff] > Normal empty >Movable zone start for each node >Early memory node ranges > node 0: [mem 0x8000-0x8005] > node 0: [mem 0x8006-0x9fff] >Initmem setup node 0 [mem 0x8000-0x9fff] > (stall here) > > commit 5d99cadf1568 ("crash: fix x86_32 crash memory reserve dead loop > bug") fix this on 32-bit architecture. However, the problem is not > completely solved. If `CRASH_ADDR_LOW_MAX = CRASH_ADDR_HIGH_MAX` on 64-bit > architecture, for example, when system memory is equal to > CRASH_ADDR_LOW_MAX on RISCV64, the following infinite loop will also occur: Interesting, I didn't expect risc-v defining them like these. #define CRASH_ADDR_LOW_MAX dma32_phys_limit #define CRASH_ADDR_HIGH_MAX memblock_end_of_DRAM() > > -> reserve_crashkernel_generic() and high is true > -> alloc at [CRASH_ADDR_LOW_MAX, CRASH_ADDR_HIGH_MAX] fail > -> alloc at [0, CRASH_ADDR_LOW_MAX] fail and repeatedly >(because CRASH_ADDR_LOW_MAX = CRASH_ADDR_HIGH_MAX). > > Before refactor in commit 9c08a2a139fe ("x86: kdump: use generic interface > to simplify crashkernel reservation code"), x86 do not try to reserve crash > memory at low if it fails to alloc above high 4G. However before refator in > commit fdc268232dbba ("arm64: kdump: use generic interface to simplify > crashkernel reservation"), arm64 try to reserve crash memory at low if it > fails above high 4G. For 64-bit systems, this attempt is less beneficial > than the opposite, remove it to fix this bug and align with native x86 > implementation. And I don't like the idea crashkernel=,high failure will fallback to attempt in low area, so this looks good to me. 
>
> After this patch, it print:
> cannot allocate crashkernel (size:0x1f40)
>
> Fixes: 39365395046f ("riscv: kdump: use generic interface to simplify
> crashkernel reservation")
> Signed-off-by: Jinjie Ruan
> ---
>  kernel/crash_reserve.c | 9 -
>  1 file changed, 9 deletions(-)

Acked-by: Baoquan He

>
> diff --git a/kernel/crash_reserve.c b/kernel/crash_reserve.c
> index 5387269114f6..69e4b8b7b969 100644
> --- a/kernel/crash_reserve.c
> +++ b/kernel/crash_reserve.c
> @@ -420,15 +420,6 @@ void __init reserve_crashkernel_generic(char *cmdline,
>  			goto retry;
>  		}
>
> -		/*
> -		 * For crashkernel=size[KMG],high, if the first attempt was
> -		 * for high memory, fall back to low memory.
> -		 */
> -		if (high && search_end == CRASH_ADDR_HIGH_MAX) {
> -			search_end = CRASH_ADDR_LOW_MAX;
> -			search_base = 0;
> -			goto retry;
> -		}
>  		pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
>  			crash_size);
>  		return;
> --
> 2.34.1
>
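The retry behaviour discussed in this thread can be modelled in user space. The sketch below is a rough model under stated assumptions, not the kernel code: every allocation is assumed to fail, the two limits are hypothetical constants set equal on purpose, and an attempt cap (DEMO_MAX_TRIES) stands in for "loops forever". The parameter high_to_low_fallback switches the removed fallback block on or off.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the arch limits, deliberately equal. */
#define DEMO_LOW_MAX	0x20000000ULL
#define DEMO_HIGH_MAX	0x20000000ULL
#define DEMO_MAX_TRIES	8

/*
 * Count allocation attempts when every allocation fails.
 * Returns -1 if the cap is hit, i.e. the real loop would never terminate.
 */
int count_reserve_attempts(bool high, bool high_to_low_fallback)
{
	unsigned long long search_base = high ? DEMO_LOW_MAX : 0;
	unsigned long long search_end = high ? DEMO_HIGH_MAX : DEMO_LOW_MAX;
	int tries = 0;

retry:
	if (++tries > DEMO_MAX_TRIES)
		return -1;
	/* The allocation in [search_base, search_end) is assumed to fail. */

	/* Low attempt failed: fall back to high, unless the range is empty. */
	if (!high && search_end == DEMO_LOW_MAX) {
		search_end = DEMO_HIGH_MAX;
		search_base = DEMO_LOW_MAX;
		if (search_base != search_end)
			goto retry;
	}

	/* The block removed by this patch: high attempt falls back to low. */
	if (high_to_low_fallback && high && search_end == DEMO_HIGH_MAX) {
		search_end = DEMO_LOW_MAX;
		search_base = 0;
		goto retry;
	}

	return tries;
}
```

With the fallback enabled and high set, the condition search_end == DEMO_HIGH_MAX stays true after every retry because both limits are equal, so the model hits the cap. With the fallback disabled, a single failed attempt is followed by the warning path.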
Re: kexec failure with Xen 4.19-rc4 and 4.20-dev on linux host
On 07/31/24 at 06:34pm, A Kundu wrote: > I am experiencing issues using kexec to load Xen 4.17(debian's apt version), > Xen 4.19-rc4 (compiled from source) and 4.20-dev (compiled from source) on a > debian host. You should CC this to XEN dev list so that XEN dev knows this and may provide help. Not everyone is interested in and knows XEN. > > System information: > $ uname -a > Linux host 6.9.10-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.9.10-1 (2024-07-19) > x86_64 GNU/Linux > > $ kexec --version # compiled from source tarball with ./configure --with-xen > kexec-tools 2.0.29 > > Steps to reproduce: > > 1. Set variables: > > XEN_HYPERVISOR="/boot/xen.gz" > XEN_CMD="dom0_mem=6G dom0_max_vcpus=6 dom0_vcpus_pin cpufreq=xen" > > 2. Attempt to load Xen 4.19-rc4: > > # kexec -l "$XEN_HYPERVISOR" --command-line="$XEN_CMD" > Could not find a free area of memory of 0x3b6001 bytes... > elf_exec_build_load_relocatable: ELF exec load failed > > 3. Attempt to load Xen 4.20-dev: > > # kexec -l "$XEN_HYPERVISOR" --command-line="$XEN_CMD" > Could not find a free area of memory of 0x3f8001 bytes... > elf_exec_build_load_relocatable: ELF exec load failed > > 4. Attempt to load Xen 4.17 (from debian rrepositories): > # kexec -l /boot/xen-4.17-amd64.gz --command-line="$XEN_CMD" > Could not find a free area of memory of 0x3b4001 bytes... > elf_exec_build_load_relocatable: ELF exec load failed > > If you need any further information to investigate this problem, > please let me know. > > PS: If I used apt's pacakged version (which might be compiled > --without-xen), > it shows, > > # kexec -l "$XEN_HYPERVISOR" --command-line="$XEN_CMD" > Cannot determine the file type of /boot/xen-4.17-amd64.gz > > # kexec -l "$XEN_HYPERVISOR" --command-line="$XEN_CMD" --type=bzImage > Cannot determine the file type of /boot/xen-4.17-amd64.gz > > > Thank you for your attention to this matter. 
>
> A Kundu
Re: [PATCH] kexec/crash: no crash update when kexec in progress
On 07/31/24 at 08:57pm, Sourabh Jain wrote:
> The following errors are observed when kexec is done with SMT=off on
> powerpc.
>
> [ 358.458385] Removing IBM Power 842 compression device
> [ 374.795734] kexec_core: Starting new kernel
> [ 374.795748] kexec: Waking offline cpu 1.
> [ 374.875695] crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate
> [ 374.935833] kexec: Waking offline cpu 2.
> [ 375.015664] crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate
> snip..
> [ 375.515823] kexec: Waking offline cpu 6.
> [ 375.635667] crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate
> [ 375.695836] kexec: Waking offline cpu 7.
>
> During kexec, the offline CPUs are brought online, which triggers the

Is bringing the offline CPUs online during kexec a generic action, or is
it specific to ppc?

> crash hotplug handler `crash_handle_hotplug_event()` to update the kdump
> image. Given that the system is on the kexec path and the kexec lock is
> taken, the `crash_handle_hotplug_event()` function fails to take the
> same lock to update the kdump image, resulting in the above error
> messages.
>
> To fix this, let's return from `crash_handle_hotplug_event()` if kexec
> is in progress.
>
> The same applies to the `crash_check_hotplug_support()` function.
> Return 0 if kexec is in progress.
>
> Cc: Hari Bathini
> Cc: Michael Ellerman
> Cc: kexec@lists.infradead.org
> Cc: linuxppc-...@ozlabs.org
> Cc: linux-ker...@vger.kernel.org
> Cc: x...@kernel.org
> Reported-by: Sachin P Bappalige
> Signed-off-by: Sourabh Jain
> ---
>  kernel/crash_core.c | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index 63cf89393c6e..d37a16d5c3a1 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -502,6 +502,9 @@ int crash_check_hotplug_support(void)
>  {
>  	int rc = 0;
>
> +	if (kexec_in_progress)
> +		return 0;
> +
>  	crash_hotplug_lock();
>  	/* Obtain lock while reading crash information */
>  	if (!kexec_trylock()) {
> @@ -537,6 +540,9 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu,
>  {
>  	struct kimage *image;
>
> +	if (kexec_in_progress)
> +		return;
> +
>  	crash_hotplug_lock();
>  	/* Obtain lock while changing crash information */
>  	if (!kexec_trylock()) {
> --
> 2.45.2
>
Re: [PATCH v5] crash: Fix crash memory reserve exceed system memory bug
On 07/29/24 at 11:24am, Jinjie Ruan wrote:
>
>
> On 2024/7/23 13:17, Baoquan He wrote:
> > On 07/23/24 at 10:07am, Jinjie Ruan wrote:
> >> On x86_32 Qemu machine with 1GB memory, the cmdline "crashkernel=4G" is ok
> >> as below:
> >> crashkernel reserved: 0x2000 - 0x00012000 (4096 MB)
> >>
> >> It's similar on other architectures, such as ARM32 and RISCV32.
> >>
> >> The cause is that the crash_size is parsed and printed with "unsigned long
> >> long" data type which is 8 bytes but allocated used with "phys_addr_t"
> >> which is 4 bytes in memblock_phys_alloc_range().
> >>
> >> Fix it by checking if crash_size is greater than system RAM size and
> >> return error if so.
> >>
> >> After this patch, there is no above confusing reserve success info.
> >>
> >> Signed-off-by: Jinjie Ruan
> >> Suggested-by: Baoquan He
> >> Suggested-by: Mike Rapoport
> >
> >
> > My Suggested-by can be taken off because I suggested to check the parsed
> > value after parse_crashkernel(), Mike's suggestion is better.
>
> Hi, Can the suggested-by be removed when this version is merged, or a
> new version needs to be sent?

You can send a new one and CC Andrew.
Re: [PATCH v4 0/3] crash: Fix x86_32 memory reserve dead loop bug
Hi Andrew,

On 07/19/24 at 05:57pm, Jinjie Ruan wrote:
> Fix two bugs for x86_32 crash memory reserve, and prepare to apply generic
> crashkernel reservation to 32bit system. Then use generic interface to
> simplify crashkernel reservation for ARM32.

This is the final version v4 we agree on.
Re: [PATCH v5] crash: Fix crash memory reserve exceed system memory bug
On 07/23/24 at 10:07am, Jinjie Ruan wrote: > On x86_32 Qemu machine with 1GB memory, the cmdline "crashkernel=4G" is ok > as below: > crashkernel reserved: 0x2000 - 0x00012000 (4096 MB) > > It's similar on other architectures, such as ARM32 and RISCV32. > > The cause is that the crash_size is parsed and printed with "unsigned long > long" data type which is 8 bytes but allocated used with "phys_addr_t" > which is 4 bytes in memblock_phys_alloc_range(). > > Fix it by checking if crash_size is greater than system RAM size and > return error if so. > > After this patch, there is no above confusing reserve success info. > > Signed-off-by: Jinjie Ruan > Suggested-by: Baoquan He > Suggested-by: Mike Rapoport My Suggested-by can be taken off because I suggested to check the parsed value after parse_crashkernel(), Mike's suggestion is better. For this version, Acked-by: Baoquan He > --- > v5: > - Fix it in common parse_crashkernel() instead of per-arch. > - Add suggested-by. > > v4: > - Update the warn info to align with parse_crashkernel_mem(). > - Rebased on the "ARM: Use generic interface to simplify crashkernel > reservation" patch. > - Also fix for riscv32. > - Update the commit message. > > v3: > - Handle the check in reserve_crashkernel() Baoquan suggested. > - Split x86_32 and arm32. > - Add Suggested-by. > - Drop the wrong fix tag. > > v2: > - Also fix for x86_32. > - Update the fix method. > - Peel off the other two patches. > - Update the commit message. 
> ---
>  kernel/crash_reserve.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/kernel/crash_reserve.c b/kernel/crash_reserve.c
> index ad5b3f2c5487..5387269114f6 100644
> --- a/kernel/crash_reserve.c
> +++ b/kernel/crash_reserve.c
> @@ -335,6 +335,9 @@ int __init parse_crashkernel(char *cmdline,
>  	if (!*crash_size)
>  		ret = -EINVAL;
>
> +	if (*crash_size >= system_ram)
> +		ret = -EINVAL;
> +
>  	return ret;
>  }
>
> --
> 2.34.1
>
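The width mismatch behind this fix can be shown with plain integer arithmetic. The following is a user-space sketch, not kernel code: demo_phys_addr_t is a hypothetical 32-bit stand-in for phys_addr_t on the affected configurations, and demo_check_crash_size() only mirrors the two checks visible in the hunk above.

```c
#include <errno.h>
#include <stdint.h>

#define DEMO_MIB (1024ULL * 1024ULL)

/* Hypothetical stand-in for a 4-byte phys_addr_t. */
typedef uint32_t demo_phys_addr_t;

/* What remains of an 8-byte size once it is passed as a 4-byte value. */
demo_phys_addr_t demo_truncated_size(unsigned long long crash_size)
{
	return (demo_phys_addr_t)crash_size;
}

/* Mirrors the checks in the hunk: reject zero and anything >= system RAM. */
int demo_check_crash_size(unsigned long long crash_size,
			  unsigned long long system_ram)
{
	if (!crash_size)
		return -EINVAL;
	if (crash_size >= system_ram)
		return -EINVAL;
	return 0;
}
```

For a request of 4352 MiB (4G+256M) the truncated value is 256 MiB, which matches the confusing case raised during review of v2. Comparing the untruncated value against system RAM rejects such a request before any allocation is attempted.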
Re: [PATCH] kexec: Use atomic_try_cmpxchg_acquire() in kexec_trylock()
On 07/22/24 at 10:53am, Uros Bizjak wrote:
> On Mon, Jul 22, 2024 at 5:09 AM Baoquan He wrote:
> >
> > On 07/19/24 at 12:38pm, Uros Bizjak wrote:
> > > Use atomic_try_cmpxchg_acquire(*ptr, &old, new) instead of
> > > atomic_cmpxchg_acquire(*ptr, old, new) == old in kexec_trylock().
> > > x86 CMPXCHG instruction returns success in ZF flag, so
> > > this change saves a compare after cmpxchg.
> >
> > Seems it can simplify code even on non-x86 arch, should we
> > replace atomic_cmpxchg_acquire() with atomic_try_cmpxchg_acquire()
> > in all similar places?
>
> Yes, the change is beneficial also for non-x86 architectures, please
> see analysis at thread [1]. I've been looking through the kernel
> sources for these places for quite some time, and I believe I have
> changed most of the places. The change is relatively straightforward,
> and immediately results in a better code.
>
> [1] https://lore.kernel.org/lkml/871qwgmqws@mpe.ellerman.id.au/

Good to know, thanks for telling.
Re: [PATCH] kexec: Use atomic_try_cmpxchg_acquire() in kexec_trylock()
On 07/19/24 at 12:38pm, Uros Bizjak wrote:
> Use atomic_try_cmpxchg_acquire(*ptr, &old, new) instead of
> atomic_cmpxchg_acquire(*ptr, old, new) == old in kexec_trylock().
> x86 CMPXCHG instruction returns success in ZF flag, so
> this change saves a compare after cmpxchg.

Seems it can simplify code even on non-x86 arch, should we replace
atomic_cmpxchg_acquire() with atomic_try_cmpxchg_acquire() in all
similar places?

For this one,

Acked-by: Baoquan He

>
> Signed-off-by: Uros Bizjak
> Cc: Eric Biederman
> ---
>  kernel/kexec_internal.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/kexec_internal.h b/kernel/kexec_internal.h
> index 2595defe8c0d..d35d9792402d 100644
> --- a/kernel/kexec_internal.h
> +++ b/kernel/kexec_internal.h
> @@ -23,7 +23,8 @@ int kimage_is_destination_range(struct kimage *image,
>  extern atomic_t __kexec_lock;
>  static inline bool kexec_trylock(void)
>  {
> -	return atomic_cmpxchg_acquire(&__kexec_lock, 0, 1) == 0;
> +	int old = 0;
> +	return atomic_try_cmpxchg_acquire(&__kexec_lock, &old, 1);
>  }
>  static inline void kexec_unlock(void)
>  {
> --
> 2.42.0
>
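The two trylock shapes in this patch have a close user-space analogue in C11 atomics, sketched below. This is an approximation, not the kernel API: demo_lock and all demo_* functions are hypothetical names, and demo_cmpxchg_acquire() only emulates a compare-and-exchange that returns the previously observed value, built on top of atomic_compare_exchange_strong_explicit().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* 0 = unlocked, 1 = locked; hypothetical stand-in for the kexec lock. */
static atomic_int demo_lock;

/* Emulates a cmpxchg that returns the value seen in *v before the call. */
int demo_cmpxchg_acquire(atomic_int *v, int old, int new_val)
{
	/* On failure, 'old' is overwritten with the current value of *v. */
	atomic_compare_exchange_strong_explicit(v, &old, new_val,
						memory_order_acquire,
						memory_order_relaxed);
	return old;
}

/* Old shape: exchange, then compare the returned value with the expected one. */
bool demo_trylock_compare(void)
{
	return demo_cmpxchg_acquire(&demo_lock, 0, 1) == 0;
}

/* New shape: the primitive itself reports success, no extra compare needed. */
bool demo_trylock_try(void)
{
	int old = 0;

	return atomic_compare_exchange_strong_explicit(&demo_lock, &old, 1,
						       memory_order_acquire,
						       memory_order_relaxed);
}

void demo_unlock(void)
{
	atomic_store_explicit(&demo_lock, 0, memory_order_release);
}
```

Both variants succeed only when the lock word is 0 and leave it at 1. The second form avoids the separate comparison in the caller, which is the saving the commit message refers to.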
Re: [PATCH v4 3/3] ARM: Use generic interface to simplify crashkernel reservation
On 07/19/24 at 05:57pm, Jinjie Ruan wrote:
> Currently, x86, arm64, riscv and loongarch has been switched to generic
> crashkernel reservation, which is also ready for 32bit system.
> So with the help of function parse_crashkernel() and generic
> reserve_crashkernel_generic(), arm32 crashkernel reservation can also
> be simplified by steps:
>
> 1) Add a new header file , and define CRASH_ALIGN,
>    CRASH_ADDR_LOW_MAX, CRASH_ADDR_HIGH_MAX in it;
>
> 2) Add arch_reserve_crashkernel() to call parse_crashkernel() and
>    reserve_crashkernel_generic();
>
> 3) Add ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION Kconfig in
>    arch/arm/Kconfig.
>
> The old reserve_crashkernel() can be removed.
>
> Following test cases have been performed as expected on QEMU vexpress-a9
> (1GB system memory):
>
> 1) crashkernel=4G,high                          // invalid
> 2) crashkernel=1G,high                          // invalid
> 3) crashkernel=1G,high crashkernel=0M,low       // invalid
> 4) crashkernel=256M,high                        // invalid
> 5) crashkernel=256M,low                         // invalid
> 6) crashkernel=256M crashkernel=256M,high       // high is ignored, ok
> 7) crashkernel=256M crashkernel=256M,low        // low is ignored, ok
> 8) crashkernel=256M,high crashkernel=256M,low   // invalid
> 9) crashkernel=256M,high crashkernel=4G,low     // invalid
> 10) crashkernel=256M                            // ok
> 11) crashkernel=512M                            // ok
> 12) crashkernel=256M@0x8800                     // ok
> 13) crashkernel=256M@0x7800                     // ok
> 14) crashkernel=512M@0x7800                     // ok
>
> Signed-off-by: Jinjie Ruan
> ---
> v4:
> - Remove the Tested-by as suggested.
> v3:
> - Update the commit message.
> ---
>  arch/arm/Kconfig                     |  3 ++
>  arch/arm/include/asm/crash_reserve.h | 24 +++
>  arch/arm/kernel/setup.c              | 63 ----
>  3 files changed, 36 insertions(+), 54 deletions(-)
>  create mode 100644 arch/arm/include/asm/crash_reserve.h

LGTM,

Acked-by: Baoquan He

By the way, you may need to repost the parsed crashkernel value limitation
checking patch for arm32 and i386.
Re: [PATCH v4 2/3] crash: Fix x86_32 crash memory reserve dead loop
On 07/19/24 at 05:57pm, Jinjie Ruan wrote: > On x86_32 Qemu machine with 1GB memory, the cmdline "crashkernel=512M" will > also cause system stall as below: > > ACPI: Reserving FACP table memory at [mem 0x3ffe18b8-0x3ffe192b] > ACPI: Reserving DSDT table memory at [mem 0x3ffe0040-0x3ffe18b7] > ACPI: Reserving FACS table memory at [mem 0x3ffe-0x3ffe003f] > ACPI: Reserving APIC table memory at [mem 0x3ffe192c-0x3ffe19bb] > ACPI: Reserving HPET table memory at [mem 0x3ffe19bc-0x3ffe19f3] > ACPI: Reserving WAET table memory at [mem 0x3ffe19f4-0x3ffe1a1b] > 143MB HIGHMEM available. > 879MB LOWMEM available. > mapped low ram: 0 - 36ffe000 > low ram: 0 - 36ffe000 > (stall here) > > The reason is that the CRASH_ADDR_LOW_MAX is equal to CRASH_ADDR_HIGH_MAX > on x86_32, the first "low" crash kernel memory reservation for 512M fails, > then it go into the "retry" loop and never came out as below (consider > CRASH_ADDR_LOW_MAX = CRASH_ADDR_HIGH_MAX = 512M): > > -> reserve_crashkernel_generic() and high is false >-> alloc at [0, 0x2000] fail > -> alloc at [0x2000, 0x2000] fail and repeatedly > (because CRASH_ADDR_LOW_MAX = CRASH_ADDR_HIGH_MAX). > > Fix it by skipping meaningless calls of memblock_phys_alloc_range() with > `start = end` > > After this patch, the retry dead loop is avoided and print below info: > cannot allocate crashkernel (size:0x2000) > > And apply generic crashkernel reservation to 32bit system will be ready. ~~~ applying Other than this nit, it looks good to me. Acked-by: Baoquan He > > Fixes: 9c08a2a139fe ("x86: kdump: use generic interface to simplify > crashkernel reservation code") > Signed-off-by: Jinjie Ruan > Suggested-by: Baoquan He > --- > v4: > - Signed-off-by -> Suggested-by as suggested. > - Remove the Tested-by as suggested. > - Update the commit subject > v3: > - Fix it as Baoquan suggested. > - Update the commit message. 
> ---
>  kernel/crash_reserve.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/crash_reserve.c b/kernel/crash_reserve.c
> index c5213f123e19..dacc268429e2 100644
> --- a/kernel/crash_reserve.c
> +++ b/kernel/crash_reserve.c
> @@ -414,7 +414,8 @@ void __init reserve_crashkernel_generic(char *cmdline,
>  			search_end = CRASH_ADDR_HIGH_MAX;
>  			search_base = CRASH_ADDR_LOW_MAX;
>  			crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE;
> -			goto retry;
> +			if (search_base != search_end)
> +				goto retry;
>  		}
>
>  		/*
> --
> 2.34.1
>
Re: [PATCH v3 2/3] crash: Fix x86_32 crash memory reserve dead loop bug at high
On 07/18/24 at 08:10pm, Jinjie Ruan wrote:
>
>
> On 2024/7/18 19:14, Baoquan He wrote:
> > On 07/18/24 at 11:54am, Jinjie Ruan wrote:
> >
> > I don't fully catch the subject, what does the 'dead loop bug at high'
> > mean?
>
> It means alloc at [CRASH_ADDR_LOW_MAX, CRASH_ADDR_HIGH_MAX] repeatedly
> which corresponds to the crashkernel parameter of the "high".

That may mislead people to think it's a crashkernel=,high setting and the
corresponding issue. Maybe "crash: Fix x86_32 crashkernel reservation dead
loop" is good enough.
Re: [PATCH v3 2/3] crash: Fix x86_32 crash memory reserve dead loop bug at high
On 07/18/24 at 11:54am, Jinjie Ruan wrote: I don't fully catch the subject, what does the 'dead loop bug at high' mean? > On x86_32 Qemu machine with 1GB memory, the cmdline "crashkernel=512M" will > also cause system stall as below: > > ACPI: Reserving FACP table memory at [mem 0x3ffe18b8-0x3ffe192b] > ACPI: Reserving DSDT table memory at [mem 0x3ffe0040-0x3ffe18b7] > ACPI: Reserving FACS table memory at [mem 0x3ffe-0x3ffe003f] > ACPI: Reserving APIC table memory at [mem 0x3ffe192c-0x3ffe19bb] > ACPI: Reserving HPET table memory at [mem 0x3ffe19bc-0x3ffe19f3] > ACPI: Reserving WAET table memory at [mem 0x3ffe19f4-0x3ffe1a1b] > 143MB HIGHMEM available. > 879MB LOWMEM available. > mapped low ram: 0 - 36ffe000 > low ram: 0 - 36ffe000 > (stall here) > > The reason is that the CRASH_ADDR_LOW_MAX is equal to CRASH_ADDR_HIGH_MAX > on x86_32, the first "low" crash kernel memory reservation for 512M fails, > then it go into the "retry" loop and never came out as below (consider > CRASH_ADDR_LOW_MAX = CRASH_ADDR_HIGH_MAX = 512M): > > -> reserve_crashkernel_generic() and high is false >-> alloc at [0, 0x2000] fail > -> alloc at [0x2000, 0x2000] fail and repeatedly > (because CRASH_ADDR_LOW_MAX = CRASH_ADDR_HIGH_MAX). > > Fix it by skipping meaningless calls of memblock_phys_alloc_range() with > `start = end` > > After this patch, the retry dead loop is avoided and print below info: > cannot allocate crashkernel (size:0x2000) > > And apply generic crashkernel reservation to 32bit system will be ready. > > Fixes: 9c08a2a139fe ("x86: kdump: use generic interface to simplify > crashkernel reservation code") > Signed-off-by: Jinjie Ruan > Signed-off-by: Baoquan He > Tested-by: Jinjie Ruan Also the tag issues, please update. Other than above concerns, the patch looks good to me. > --- > v3: > - Fix it as Baoquan suggested. > - Update the commit message. 
> ---
>  kernel/crash_reserve.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/crash_reserve.c b/kernel/crash_reserve.c
> index c5213f123e19..dacc268429e2 100644
> --- a/kernel/crash_reserve.c
> +++ b/kernel/crash_reserve.c
> @@ -414,7 +414,8 @@ void __init reserve_crashkernel_generic(char *cmdline,
>  			search_end = CRASH_ADDR_HIGH_MAX;
>  			search_base = CRASH_ADDR_LOW_MAX;
>  			crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE;
> -			goto retry;
> +			if (search_base != search_end)
> +				goto retry;
>  		}
>
>  		/*
> --
> 2.34.1
>
Re: [PATCH v3 1/3] crash: Fix x86_32 crash memory reserve dead loop bug
On 07/18/24 at 11:54am, Jinjie Ruan wrote: > On x86_32 Qemu machine with 1GB memory, the cmdline "crashkernel=1G,high" > will cause system stall as below: > > ACPI: Reserving FACP table memory at [mem 0x3ffe18b8-0x3ffe192b] > ACPI: Reserving DSDT table memory at [mem 0x3ffe0040-0x3ffe18b7] > ACPI: Reserving FACS table memory at [mem 0x3ffe-0x3ffe003f] > ACPI: Reserving APIC table memory at [mem 0x3ffe192c-0x3ffe19bb] > ACPI: Reserving HPET table memory at [mem 0x3ffe19bc-0x3ffe19f3] > ACPI: Reserving WAET table memory at [mem 0x3ffe19f4-0x3ffe1a1b] > 143MB HIGHMEM available. > 879MB LOWMEM available. > mapped low ram: 0 - 36ffe000 > low ram: 0 - 36ffe000 >(stall here) > > The reason is that the CRASH_ADDR_LOW_MAX is equal to CRASH_ADDR_HIGH_MAX > on x86_32, the first high crash kernel memory reservation will fail, then > go into the "retry" loop and never came out as below. > > -> reserve_crashkernel_generic() and high is true > -> alloc at [CRASH_ADDR_LOW_MAX, CRASH_ADDR_HIGH_MAX] fail > -> alloc at [0, CRASH_ADDR_LOW_MAX] fail and repeatedly >(because CRASH_ADDR_LOW_MAX = CRASH_ADDR_HIGH_MAX). > > Fix it by prevent crashkernel=,high from being parsed successfully on 32bit > system with a architecture-defined macro. > > After this patch, the 'crashkernel=,high' for 32bit system can't succeed, > and it has no chance to call reserve_crashkernel_generic(), therefore this > issue on x86_32 is solved. > > Fixes: 9c08a2a139fe ("x86: kdump: use generic interface to simplify > crashkernel reservation code") > Signed-off-by: Jinjie Ruan > Signed-off-by: Baoquan He Just adding my Suggested-by is fine. If multiple people cooperate on one patch, the Co-developed-by tag is needed. As a maintainer, I prefer to have the Suggested-by tag in this case. > Tested-by: Jinjie Ruan You can't add Tested-by tag for your own patch. When you post patch, testing it is your obligation. Other than these tag adding concerns, this patch looks good to me. 
You can post v4 to update and add my: Acked-by: Baoquan He > --- > v3: > - Fix it as Baoquan suggested. > - Update the commit message. > v2: > - Peel off the other two patches. > - Update the commit message and fix tag. > --- > arch/arm64/include/asm/crash_reserve.h | 2 ++ > arch/riscv/include/asm/crash_reserve.h | 2 ++ > arch/x86/include/asm/crash_reserve.h | 1 + > kernel/crash_reserve.c | 2 +- > 4 files changed, 6 insertions(+), 1 deletion(-) > > diff --git a/arch/arm64/include/asm/crash_reserve.h > b/arch/arm64/include/asm/crash_reserve.h > index 4afe027a4e7b..bf362c1a612f 100644 > --- a/arch/arm64/include/asm/crash_reserve.h > +++ b/arch/arm64/include/asm/crash_reserve.h > @@ -7,4 +7,6 @@ > > #define CRASH_ADDR_LOW_MAX arm64_dma_phys_limit > #define CRASH_ADDR_HIGH_MAX (PHYS_MASK + 1) > + > +#define HAVE_ARCH_CRASHKERNEL_RESERVATION_HIGH > #endif > diff --git a/arch/riscv/include/asm/crash_reserve.h > b/arch/riscv/include/asm/crash_reserve.h > index 013962e63587..8d7a8fc1d459 100644 > --- a/arch/riscv/include/asm/crash_reserve.h > +++ b/arch/riscv/include/asm/crash_reserve.h > @@ -7,5 +7,7 @@ > #define CRASH_ADDR_LOW_MAX dma32_phys_limit > #define CRASH_ADDR_HIGH_MAX memblock_end_of_DRAM() > > +#define HAVE_ARCH_CRASHKERNEL_RESERVATION_HIGH > + > extern phys_addr_t memblock_end_of_DRAM(void); > #endif > diff --git a/arch/x86/include/asm/crash_reserve.h > b/arch/x86/include/asm/crash_reserve.h > index 7835b2cdff04..24c2327f9a16 100644 > --- a/arch/x86/include/asm/crash_reserve.h > +++ b/arch/x86/include/asm/crash_reserve.h > @@ -26,6 +26,7 @@ extern unsigned long swiotlb_size_or_default(void); > #else > # define CRASH_ADDR_LOW_MAX SZ_4G > # define CRASH_ADDR_HIGH_MAXSZ_64T > +#define HAVE_ARCH_CRASHKERNEL_RESERVATION_HIGH > #endif > > # define DEFAULT_CRASH_KERNEL_LOW_SIZE crash_low_size_default() > diff --git a/kernel/crash_reserve.c b/kernel/crash_reserve.c > index 5b2722a93a48..c5213f123e19 100644 > --- a/kernel/crash_reserve.c > +++ b/kernel/crash_reserve.c 
> @@ -306,7 +306,7 @@ int __init parse_crashkernel(char *cmdline,
>  	/* crashkernel=X[@offset] */
>  	ret = __parse_crashkernel(cmdline, system_ram, crash_size,
>  				crash_base, NULL);
> -#ifdef CONFIG_ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION
> +#ifdef HAVE_ARCH_CRASHKERNEL_RESERVATION_HIGH
>  	/*
>  	 * If non-NULL 'high' passed in and no normal crashkernel
>  	 * setting detected, try parsing crashkernel=,high|low.
> --
> 2.34.1
>
Re: [PATCH -next] crash: fix x86_32 memory reserve dead loop retry bug at "high"
On 07/17/24 at 12:49pm, Andrew Morton wrote:
> On Wed, 17 Jul 2024 21:38:41 +0800 Baoquan He wrote:
>
> > 1) revert commit 8f9dade5906a in Andrew's tree;

Thanks, Andrew.
Re: [PATCH -next] crash: fix x86_32 memory reserve dead loop retry bug at "high"
On 07/17/24 at 03:09pm, Jinjie Ruan wrote: > Similar to commit 8f9dade5906a ("crash: fix x86_32 memory reserve dead loop > retry bug") and in the symmetry case, on x86_32 Qemu machine with > 1GB memory, the cmdline "crashkernel=512M" will also cause system stall > as below: > > ACPI: Reserving FACP table memory at [mem 0x3ffe18b8-0x3ffe192b] > ACPI: Reserving DSDT table memory at [mem 0x3ffe0040-0x3ffe18b7] > ACPI: Reserving FACS table memory at [mem 0x3ffe-0x3ffe003f] > ACPI: Reserving APIC table memory at [mem 0x3ffe192c-0x3ffe19bb] > ACPI: Reserving HPET table memory at [mem 0x3ffe19bc-0x3ffe19f3] > ACPI: Reserving WAET table memory at [mem 0x3ffe19f4-0x3ffe1a1b] > 143MB HIGHMEM available. > 879MB LOWMEM available. > mapped low ram: 0 - 36ffe000 > low ram: 0 - 36ffe000 > (stall here) > > The reason is that the CRASH_ADDR_LOW_MAX is equal to CRASH_ADDR_HIGH_MAX > on x86_32, the first "low" crash kernel memory reservation for 512M fails, > then it go into the "retry" loop and never came out as below (consider > CRASH_ADDR_LOW_MAX = CRASH_ADDR_HIGH_MAX = 512M): > > -> reserve_crashkernel_generic() and high is false >-> alloc at [0, 0x2000] fail > -> alloc at [0x2000, 0x2000] fail and repeatedly > (because CRASH_ADDR_LOW_MAX = CRASH_ADDR_HIGH_MAX). > > Fix it by also changing the another out check condition, the fixed base > situation has no problem because it warn out if it fail to alloc. 
> > After this patch, it prints: > cannot allocate crashkernel (size:0x2000) > > Fixes: 9c08a2a139fe ("x86: kdump: use generic interface to simplify > crashkernel reservation code") > Signed-off-by: Jinjie Ruan > --- > kernel/crash_reserve.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/kernel/crash_reserve.c b/kernel/crash_reserve.c > index 03e455738e75..36c13cf942f4 100644 > --- a/kernel/crash_reserve.c > +++ b/kernel/crash_reserve.c > @@ -409,7 +409,7 @@ void __init reserve_crashkernel_generic(char *cmdline, >* low memory, fall back to high memory, the minimum required >* low memory will be reserved later. >*/ > - if (!high && search_end == CRASH_ADDR_LOW_MAX) { > + if (!high && !search_base) { Hmm, this may not be good. We can't guarantee that CRASH_ADDR_LOW_MAX must not be 0. I still suggest you testing below draft patch to see if it works well. And we should revert the patch in Andrew's tree since it's not good. Posting like these mess will confuse people and add difficulty when backporting. You haven't responded to my earlier request to test those two draft patches. When you tested below code and it's good, you can post this as a formal patch. So my suggestion to the whole work is: 1) revert commit 8f9dade5906a in Andrew's tree; 2) post two patches I suggested to prevert crashkernel=,high for 32bit system, and fix the issue you found; 3) post patchset to make arm32 use generic crashkernel reservation. diff --git a/kernel/crash_reserve.c b/kernel/crash_reserve.c index 5b2722a93a48..ac087ba442cd 100644 --- a/kernel/crash_reserve.c +++ b/kernel/crash_reserve.c @@ -414,7 +414,8 @@ void __init reserve_crashkernel_generic(char *cmdline, search_end = CRASH_ADDR_HIGH_MAX; search_base = CRASH_ADDR_LOW_MAX; crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE; - goto retry; + if (search_base != search_end) + goto retry; } /* ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v2] crash: Fix x86_32 and arm32 memory reserve bug
On 07/16/24 at 08:46pm, Jinjie Ruan wrote: > > > On 2024/7/16 14:22, Baoquan He wrote: > > On 07/16/24 at 11:44am, Jinjie Ruan wrote: > >> > >> > >> On 2024/7/15 22:48, Baoquan He wrote: > >>> On 07/13/24 at 09:48am, Jinjie Ruan wrote: > >>>> On x86_32 Qemu machine with 1GB memory, the cmdline "crashkernel=4G" is > >>>> ok > >>>> as below: > >>>> crashkernel reserved: 0x2000 - 0x00012000 (4096 MB) > >>>> > >>>> And on Qemu vexpress-a9 with 1GB memory, the crash kernel > >>>> "crashkernel=4G" > >>>> is also ok as below: > >>>> Reserving 4096MB of memory at 2432MB for crashkernel (System RAM: > >>>> 1024MB) > >>>> > >>>> The cause is that the crash_size is parsed and printed with "unsigned > >>>> long > >>>> long" data type which is 8 bytes but allocated used with "phys_addr_t" > >>>> which is 4 bytes in memblock_phys_alloc_range(). > >>>> > >>>> Fix it by limiting the "crash_size" to phys_addr_t and bypass the invalid > >>>> input size. > >>> > >>> I am not sure if this is a good idea. Shouldn't we handle this in > >>> arch_reserve_crashkernel() to check the system RAM size? > >>> > >>> With this patch, if you specify crashkernel=4352M (namely 4G+256M) in > >>> kernel cmdline, then you will reserve 256M crashkernel in system, don't > >>> you think that is confusing? > >> > >> You are right! > >> > >> In the case you mentioned, it can still allocate 256M successfully, but > >> the log shows 4352M successfully allocated, which is not taken into > >> account by this patch. > >> > >> And handle this in arch_reserve_crashkernel() is a good idea, which will > >> bypass all these corner case, I'll do it next version. > >> > >>> > >>> By the way, I am considering changing code to apply generic crashkernel > >>> reservation to 32bit system. Maybe below draft code can prevent > >>> crashkernel=,high from being parsed successfully on 32bit system. > >>> > >>> What do you think? 
> >> > >> I agree with you, I've thought about passing in a parameter in the > >> generic interface whether high is supported or not to implement it, > >> which is so incompatible. An architecture-defined macro to filter out > >> parsing of "high" fundamentally avoid using the generic interface to > >> allocate memory in "high" for the architecture that does not support > >> "high". The below code do prevent "crashkernel=,high" from being parsed > >> successfully on 32bit system. > >> > >> But if it is to support 32 bit system to use generic crash memory > >> reserve interface, reserve_crashkernel_generic() needs more modification > >> , as it may try to allocate memory at high. > > > > You are right. Below change may be able to fix that. > > > > And I have been thinking if one case need be taken off in which the > > first attempt was for high memory, then fall back to low memory. Surely, > > this is not related to the 32bit crashkernel reservation. > > It seems that ARM64 has the possibility before the refactoring. However, > x86 supports only the "low" -> "high" retry but not the "high" -> "low" > retry before the refactoring. In my opinion, "low" -> "high" retry is > more usefull, but I'm not sure if we should get rid of the other way. Thanks for the valuable input, I will think more about this. > > > > > By the way, do you figure out these issues from code reading and qemu > > testing, or there's need for deploying kdump on 32bit system, e.g i386 > > or arm32? Just curious. > > I found these problems during testing on QEMU when trying to support > this generic crash memory retention on ARM32, and I further found that > x86_32 also has the same problem by code reading and uncommon > configuration test. I see, thanks for the explanation. Could you help test the earlier patch and below draft patch, to see if it fixes the issues you saw up to now? If they works, the generic crashkernel reservation can be taken on 32bit system like arm32 based on them. 
> > > > > diff --git a/kernel/crash_reserve.c b/kernel/crash_reserve.c > > index 5b2722a93a48..ac087ba442cd 100644 > > --- a/kernel/crash_reserve.c > > +++ b/kernel/crash_reserve.c > > @@ -414,7 +414,8 @@ void __init reserve_crashkernel_generic(char *cmdline, > > search_end = CRASH_ADDR_HIGH_MAX; > > search_base = CRASH_ADDR_LOW_MAX; > > crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE; > > - goto retry; > > + if (search_base != search_end) > > + goto retry; > > } > > > > /* > > > > > ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v2] crash: Fix x86_32 and arm32 memory reserve bug
On 07/16/24 at 11:44am, Jinjie Ruan wrote: > > > On 2024/7/15 22:48, Baoquan He wrote: > > On 07/13/24 at 09:48am, Jinjie Ruan wrote: > >> On x86_32 Qemu machine with 1GB memory, the cmdline "crashkernel=4G" is ok > >> as below: > >>crashkernel reserved: 0x2000 - 0x00012000 (4096 MB) > >> > >> And on Qemu vexpress-a9 with 1GB memory, the crash kernel "crashkernel=4G" > >> is also ok as below: > >>Reserving 4096MB of memory at 2432MB for crashkernel (System RAM: > >> 1024MB) > >> > >> The cause is that the crash_size is parsed and printed with "unsigned long > >> long" data type which is 8 bytes but allocated used with "phys_addr_t" > >> which is 4 bytes in memblock_phys_alloc_range(). > >> > >> Fix it by limiting the "crash_size" to phys_addr_t and bypass the invalid > >> input size. > > > > I am not sure if this is a good idea. Shouldn't we handle this in > > arch_reserve_crashkernel() to check the system RAM size? > > > > With this patch, if you specify crashkernel=4352M (namely 4G+256M) in > > kernel cmdline, then you will reserve 256M crashkernel in system, don't > > you think that is confusing? > > You are right! > > In the case you mentioned, it can still allocate 256M successfully, but > the log shows 4352M successfully allocated, which is not taken into > account by this patch. > > And handle this in arch_reserve_crashkernel() is a good idea, which will > bypass all these corner case, I'll do it next version. > > > > > By the way, I am considering changing code to apply generic crashkernel > > reservation to 32bit system. Maybe below draft code can prevent > > crashkernel=,high from being parsed successfully on 32bit system. > > > > What do you think? > > I agree with you, I've thought about passing in a parameter in the > generic interface whether high is supported or not to implement it, > which is so incompatible. 
An architecture-defined macro to filter out > parsing of "high" fundamentally avoid using the generic interface to > allocate memory in "high" for the architecture that does not support > "high". The below code do prevent "crashkernel=,high" from being parsed > successfully on 32bit system. > > But if it is to support 32 bit system to use generic crash memory > reserve interface, reserve_crashkernel_generic() needs more modification > , as it may try to allocate memory at high. You are right. Below change may be able to fix that. And I have been thinking if one case need be taken off in which the first attempt was for high memory, then fall back to low memory. Surely, this is not related to the 32bit crashkernel reservation. By the way, do you figure out these issues from code reading and qemu testing, or there's need for deploying kdump on 32bit system, e.g i386 or arm32? Just curious. diff --git a/kernel/crash_reserve.c b/kernel/crash_reserve.c index 5b2722a93a48..ac087ba442cd 100644 --- a/kernel/crash_reserve.c +++ b/kernel/crash_reserve.c @@ -414,7 +414,8 @@ void __init reserve_crashkernel_generic(char *cmdline, search_end = CRASH_ADDR_HIGH_MAX; search_base = CRASH_ADDR_LOW_MAX; crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE; - goto retry; + if (search_base != search_end) + goto retry; } /* ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v2] crash: Fix x86_32 and arm32 memory reserve bug
On 07/13/24 at 09:48am, Jinjie Ruan wrote: > On x86_32 Qemu machine with 1GB memory, the cmdline "crashkernel=4G" is ok > as below: > crashkernel reserved: 0x2000 - 0x00012000 (4096 MB) > > And on Qemu vexpress-a9 with 1GB memory, the crash kernel "crashkernel=4G" > is also ok as below: > Reserving 4096MB of memory at 2432MB for crashkernel (System RAM: > 1024MB) > > The cause is that the crash_size is parsed and printed with "unsigned long > long" data type which is 8 bytes but allocated used with "phys_addr_t" > which is 4 bytes in memblock_phys_alloc_range(). > > Fix it by limiting the "crash_size" to phys_addr_t and bypass the invalid > input size. I am not sure if this is a good idea. Shouldn't we handle this in arch_reserve_crashkernel() to check the system RAM size? With this patch, if you specify crashkernel=4352M (namely 4G+256M) in kernel cmdline, then you will reserve 256M crashkernel in system, don't you think that is confusing? By the way, I am considering changing code to apply generic crashkernel reservation to 32bit system. Maybe below draft code can prevent crashkernel=,high from being parsed successfully on 32bit system. What do you think? 
diff --git a/arch/arm64/include/asm/crash_reserve.h b/arch/arm64/include/asm/crash_reserve.h index 4afe027a4e7b..bf362c1a612f 100644 --- a/arch/arm64/include/asm/crash_reserve.h +++ b/arch/arm64/include/asm/crash_reserve.h @@ -7,4 +7,6 @@ #define CRASH_ADDR_LOW_MAX arm64_dma_phys_limit #define CRASH_ADDR_HIGH_MAX (PHYS_MASK + 1) + +#define HAVE_ARCH_CRASHKERNEL_RESERVATION_HIGH #endif diff --git a/arch/riscv/include/asm/crash_reserve.h b/arch/riscv/include/asm/crash_reserve.h index 013962e63587..8d7a8fc1d459 100644 --- a/arch/riscv/include/asm/crash_reserve.h +++ b/arch/riscv/include/asm/crash_reserve.h @@ -7,5 +7,7 @@ #define CRASH_ADDR_LOW_MAX dma32_phys_limit #define CRASH_ADDR_HIGH_MAXmemblock_end_of_DRAM() +#define HAVE_ARCH_CRASHKERNEL_RESERVATION_HIGH + extern phys_addr_t memblock_end_of_DRAM(void); #endif diff --git a/arch/x86/include/asm/crash_reserve.h b/arch/x86/include/asm/crash_reserve.h index 7835b2cdff04..24c2327f9a16 100644 --- a/arch/x86/include/asm/crash_reserve.h +++ b/arch/x86/include/asm/crash_reserve.h @@ -26,6 +26,7 @@ extern unsigned long swiotlb_size_or_default(void); #else # define CRASH_ADDR_LOW_MAX SZ_4G # define CRASH_ADDR_HIGH_MAXSZ_64T +#define HAVE_ARCH_CRASHKERNEL_RESERVATION_HIGH #endif # define DEFAULT_CRASH_KERNEL_LOW_SIZE crash_low_size_default() diff --git a/kernel/crash_reserve.c b/kernel/crash_reserve.c index 5b2722a93a48..c5213f123e19 100644 --- a/kernel/crash_reserve.c +++ b/kernel/crash_reserve.c @@ -306,7 +306,7 @@ int __init parse_crashkernel(char *cmdline, /* crashkernel=X[@offset] */ ret = __parse_crashkernel(cmdline, system_ram, crash_size, crash_base, NULL); -#ifdef CONFIG_ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION +#ifdef HAVE_ARCH_CRASHKERNEL_RESERVATION_HIGH /* * If non-NULL 'high' passed in and no normal crashkernel * setting detected, try parsing crashkernel=,high|low. ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
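The truncation discussed above can be reproduced with plain integer arithmetic. What follows is only a minimal sketch, assuming a 32-bit physical address type; `phys_addr32_t`, `SZ_1M`, `size_after_truncation()` and `size_overflows_phys_addr()` are hypothetical names for illustration and are not taken from the kernel tree.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 4-byte phys_addr_t, as on the 32-bit
 * configurations discussed in this thread. */
typedef uint32_t phys_addr32_t;

#define SZ_1M (1ULL << 20)

/* The size that is left once an 8-byte value is implicitly narrowed
 * to the 4-byte type (conversion is modulo 2^32). */
static inline unsigned long long size_after_truncation(unsigned long long crash_size)
{
	return (phys_addr32_t)crash_size;
}

/* Returns 1 if crash_size cannot be represented in the 4-byte type,
 * i.e. the request should be rejected instead of silently shrunk. */
static inline int size_overflows_phys_addr(unsigned long long crash_size)
{
	return crash_size > UINT32_MAX;
}
```

With these helpers, `size_after_truncation(4352 * SZ_1M)` evaluates to `256 * SZ_1M`, which is the confusing "ask for 4352M, get 256M" case raised in the review, while `size_overflows_phys_addr()` flags the same input as invalid.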
Re: [PATCH] sysfs/cpu: Make crash_hotplug attribute world-readable
On 07/11/24 at 12:34pm, Petr Tesarik wrote:
> From: Petr Tesarik
>
> There is no reason to restrict access to this attribute, as it merely
> reports whether crash elfcorehdr is automatically updated on CPU hot
> plug/unplug and/or online/offline events.
>
> Note that since commit 79365026f8694 ("crash: add a new kexec flag for
> hotplug support"), this maps to the same flag which is world-accessible
> through /sys/devices/system/memory/crash_hotplug.
>
> Signed-off-by: Petr Tesarik
> ---
>  drivers/base/cpu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
> index c61ecb0c2ae2..73d69791d0d3 100644
> --- a/drivers/base/cpu.c
> +++ b/drivers/base/cpu.c
> @@ -308,7 +308,7 @@ static ssize_t crash_hotplug_show(struct device *dev,
>  {
>  	return sysfs_emit(buf, "%d\n", crash_check_hotplug_support());
>  }
> -static DEVICE_ATTR_ADMIN_RO(crash_hotplug);
> +static DEVICE_ATTR_RO(crash_hotplug);

Agreed. I guess this was copied from the code for crash_notes and
crash_notes_size. Those attributes, however, live one level down, in
/sys/devices/system/cpu/cpuX/.

Acked-by: Baoquan He
Re: [PATCH 0/3] ARM: Use generic interface to simplify crashkernel reservation
On 07/10/24 at 09:52am, Jinjie Ruan wrote: > > > On 2024/7/9 22:06, Baoquan He wrote: > > On 07/09/24 at 07:06pm, Jinjie Ruan wrote: > >> > >> > >> On 2024/7/9 18:39, Baoquan He wrote: > >>> On 07/09/24 at 05:50pm, Jinjie Ruan wrote: > >>>> > >>>> > >>>> On 2024/7/9 17:29, Baoquan He wrote: > >>>>> On 07/08/24 at 09:33pm, Jinjie Ruan wrote: > >>>>>> Currently, x86, arm64, riscv and loongarch has been switched to generic > >>>>>> crashkernel reservation. Also use generic interface to simplify > >>>>>> crashkernel > >>>>>> reservation for arm32, and fix two bugs by the way. > >>>>> > >>>>> I am not sure if this is a good idea. I added the generic reservation > >>>>> itnerfaces for ARCH which support crashkernel=,high|low and normal > >>>>> crashkernel reservation, with this, the code can be simplified a lot. > >>>>> However, arm32 doesn't support crashkernel=,high, I am not sure if it's > >>>>> worth taking the change, most importantly, if it will cause > >>>>> misunderstanding or misoperation. > >>>> > >>>> Yes, arm32 doesn't support crashkernel=,high. > >>>> > >>>> However, a little enhancement to the generic code (please see the first > >>>> patch), the generic reservation interfaces can also be applicable to > >>>> architectures that do not support "high" such as arm32, and it can also > >>>> simplify the code (please see the third patch). > >>> > >>> Yeah, I can see the code is simplified. When you specified > >>> 'crashkernel=xM,high', do you think what should be warn out? Because > >>> it's an unsupported syntax on arm32, we should do something to print out > >>> appropriate message. > >> > >> Yes, you are right! In this patch it will print "crashkernel high memory > >> reservation failed." message and out for arm32 if you specify > > > > That message may mislead people to believe crashkernel=,high is > > supported but reservation is failed, then a bug need be filed for this? > > We may expect a message telling this syntax is not supported on this > > ARCH. 
>
> "CRASH_ADDR_LOW_MAX >= CRASH_ADDR_HIGH_MAX" indicate that the arm32 does
> not support "crashkernel=,high", I wonder if this is generic for similar

Imagine you are a test engineer or a distro user: how would you know that
"CRASH_ADDR_LOW_MAX >= CRASH_ADDR_HIGH_MAX" holds when you test
'crashkernel=,high' and see the failure message?

> architecture. If so, the first patch can print such as
> "crashkernel=,high is not supported on this ARCH" message.

Please consider comprehensively whether this is doable. You can paste
draft code here to prove it.

> >
> >> 'crashkernel=xM,high' because "CRASH_ADDR_LOW_MAX" and
> >> "CRASH_ADDR_HIGH_MAX" is identical for arm32. And it should also warn
> >> out for other similar architecture.
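The check proposed in this exchange can be sketched in isolation as follows. This is only an illustration under stated assumptions: `struct crash_limits`, `crashkernel_high_supported()` and `crashkernel_high_diagnostic()` are hypothetical names, the limits are passed in as plain numbers rather than taken from the real CRASH_ADDR_LOW_MAX and CRASH_ADDR_HIGH_MAX macros, and a kernel version would print the message instead of returning it. Whether comparing the two limits is the right criterion was still open in the thread; the draft posted elsewhere in it uses an architecture-defined macro instead.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container for the two per-architecture search limits. */
struct crash_limits {
	unsigned long long low_max;   /* plays the role of CRASH_ADDR_LOW_MAX */
	unsigned long long high_max;  /* plays the role of CRASH_ADDR_HIGH_MAX */
};

/* Assumption from the thread: if the low limit is not below the high
 * limit, there is no separate "high" region to reserve from. */
static inline bool crashkernel_high_supported(const struct crash_limits *lim)
{
	return lim->low_max < lim->high_max;
}

/* Pick the message a user should see for crashkernel=X,high.
 * Returns NULL when there is nothing to report. */
static inline const char *crashkernel_high_diagnostic(const struct crash_limits *lim,
						       bool reserved_ok)
{
	if (!crashkernel_high_supported(lim))
		return "crashkernel=,high is not supported on this ARCH";
	if (!reserved_ok)
		return "crashkernel high memory reservation failed.";
	return NULL;
}
```

The point of separating the two messages is the one made in the reply: a user who sees "reservation failed" may file a bug, whereas "not supported on this ARCH" tells them the syntax itself is the problem.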
Re: [PATCH 0/3] ARM: Use generic interface to simplify crashkernel reservation
On 07/09/24 at 07:06pm, Jinjie Ruan wrote:
>
>
> On 2024/7/9 18:39, Baoquan He wrote:
> > On 07/09/24 at 05:50pm, Jinjie Ruan wrote:
> >>
> >>
> >> On 2024/7/9 17:29, Baoquan He wrote:
> >>> On 07/08/24 at 09:33pm, Jinjie Ruan wrote:
> >>>> Currently, x86, arm64, riscv and loongarch has been switched to generic
> >>>> crashkernel reservation. Also use generic interface to simplify
> >>>> crashkernel reservation for arm32, and fix two bugs by the way.
> >>>
> >>> I am not sure if this is a good idea. I added the generic reservation
> >>> interfaces for ARCH which support crashkernel=,high|low and normal
> >>> crashkernel reservation, with this, the code can be simplified a lot.
> >>> However, arm32 doesn't support crashkernel=,high, I am not sure if it's
> >>> worth taking the change, most importantly, if it will cause
> >>> misunderstanding or misoperation.
> >>
> >> Yes, arm32 doesn't support crashkernel=,high.
> >>
> >> However, a little enhancement to the generic code (please see the first
> >> patch), the generic reservation interfaces can also be applicable to
> >> architectures that do not support "high" such as arm32, and it can also
> >> simplify the code (please see the third patch).
> >
> > Yeah, I can see the code is simplified. When you specified
> > 'crashkernel=xM,high', do you think what should be warn out? Because
> > it's an unsupported syntax on arm32, we should do something to print out
> > appropriate message.
>
> Yes, you are right! In this patch it will print "crashkernel high memory
> reservation failed." message and out for arm32 if you specify

That message may mislead people into believing that crashkernel=,high is
supported but the reservation failed. Would a bug then need to be filed
for this? We would expect a message saying that this syntax is not
supported on this ARCH.

> 'crashkernel=xM,high' because "CRASH_ADDR_LOW_MAX" and
> "CRASH_ADDR_HIGH_MAX" is identical for arm32. And it should also warn
> out for other similar architecture.
Re: [PATCH 0/3] ARM: Use generic interface to simplify crashkernel reservation
On 07/09/24 at 05:50pm, Jinjie Ruan wrote:
>
>
> On 2024/7/9 17:29, Baoquan He wrote:
> > On 07/08/24 at 09:33pm, Jinjie Ruan wrote:
> >> Currently, x86, arm64, riscv and loongarch has been switched to generic
> >> crashkernel reservation. Also use generic interface to simplify crashkernel
> >> reservation for arm32, and fix two bugs by the way.
> >
> > I am not sure if this is a good idea. I added the generic reservation
> > interfaces for ARCH which support crashkernel=,high|low and normal
> > crashkernel reservation, with this, the code can be simplified a lot.
> > However, arm32 doesn't support crashkernel=,high, I am not sure if it's
> > worth taking the change, most importantly, if it will cause
> > misunderstanding or misoperation.
>
> Yes, arm32 doesn't support crashkernel=,high.
>
> However, a little enhancement to the generic code (please see the first
> patch), the generic reservation interfaces can also be applicable to
> architectures that do not support "high" such as arm32, and it can also
> simplify the code (please see the third patch).

Yeah, I can see the code is simplified. When 'crashkernel=xM,high' is
specified, what do you think should be printed as a warning? Since it's
an unsupported syntax on arm32, we should print an appropriate message.
Re: [PATCH 0/3] ARM: Use generic interface to simplify crashkernel reservation
On 07/08/24 at 09:33pm, Jinjie Ruan wrote:
> Currently, x86, arm64, riscv and loongarch has been switched to generic
> crashkernel reservation. Also use generic interface to simplify crashkernel
> reservation for arm32, and fix two bugs by the way.

I am not sure this is a good idea. I added the generic reservation
interfaces for architectures that support crashkernel=,high|low in
addition to the normal crashkernel reservation; for those, the code can
be simplified a lot. However, arm32 doesn't support crashkernel=,high, so
I am not sure it's worth taking the change and, most importantly, whether
it would cause misunderstanding or misuse.

Thanks
Baoquan
Re: [PATCH v3 1/3] kexec_load: Use new kexec flag for hotplug support
On 07/07/24 at 08:54pm, Sourabh Jain wrote: > Kernel commit 79365026f869 (crash: add a new kexec flag for hotplug > support) has introduced a new kexec flag to generalize hotplug support. > The newly introduced kexec flags for hotplug allow architectures to > exclude all the required kexec segments from SHA calculation so that > the kernel can update them on hotplug events. This was not possible > earlier with the KEXEC_UPDATE_ELFCOREHDR kexec flags since it was added > only for the elfcorehdr segment. > > To enable architectures to control the list of kexec segments to exclude > when hotplug support is enabled, add a new architecture-specific > function named arch_do_exclude_segment. During the SHA calculation, this > function gets called to let the architecture decide whether a specific > kexec segment should be considered for SHA calculation or not. > > Note: To avoid breaking backward compatibility, the new kexec flag > KEXEC_CRASH_HOTPLUG_SUPPORT is not used for x86 for now. > > Cc: Aditya Gupta > Cc: Baoquan He > Cc: Coiby Xu > Cc: Hari Bathini > Cc: Mahesh Salgaonkar > Cc: Simon Horman > Signed-off-by: Sourabh Jain > --- > kexec/arch/arm/kexec-arm.c | 5 > kexec/arch/arm64/kexec-arm64.c | 5 > kexec/arch/cris/kexec-cris.c | 4 +++ > kexec/arch/hppa/kexec-hppa.c | 5 > kexec/arch/i386/kexec-x86.c| 8 ++ > kexec/arch/ia64/kexec-ia64.c | 4 +++ > kexec/arch/loongarch/kexec-loongarch.c | 5 > kexec/arch/m68k/kexec-m68k.c | 5 > kexec/arch/mips/kexec-mips.c | 4 +++ > kexec/arch/ppc/kexec-ppc.c | 4 +++ > kexec/arch/ppc64/kexec-ppc64.c | 5 > kexec/arch/s390/kexec-s390.c | 5 > kexec/arch/sh/kexec-sh.c | 5 > kexec/arch/x86_64/kexec-x86_64.c | 8 ++ > kexec/kexec-syscall.h | 1 + > kexec/kexec.c | 40 ++ > kexec/kexec.h | 2 ++ > 17 files changed, 109 insertions(+), 6 deletions(-) LGTM, Acked-by: Baoquan He ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
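The role of arch_do_exclude_segment() described in the commit message can be illustrated with a toy digest loop. This is a sketch under loud assumptions, not the kexec-tools implementation: `struct segment` and `struct load_info` are simplified stand-ins for the real kexec structures, `digest_segments()` is a hypothetical helper, and a 64-bit FNV-1a hash is swapped in for the SHA calculation so that the example stays self-contained.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical stand-ins for the kexec-tools structures. */
struct segment {
	const void *buf;     /* segment contents */
	size_t bufsz;        /* number of bytes in buf */
	unsigned long mem;   /* load address of the segment */
};

struct load_info {
	unsigned long elfcorehdr;  /* load address of the elfcorehdr segment */
};

typedef int (*exclude_fn)(const struct segment *seg, const struct load_info *info);

/* Follows the idea of the i386 hunk in this series: only the segment
 * holding the elfcorehdr is left out of the digest. */
static int exclude_elfcorehdr(const struct segment *seg, const struct load_info *info)
{
	return info->elfcorehdr == seg->mem;
}

/* Digest all segments except those the architecture hook excludes.
 * FNV-1a is used here purely as a placeholder for SHA-256. */
static uint64_t digest_segments(const struct segment *segs, size_t nr,
				const struct load_info *info, exclude_fn exclude)
{
	uint64_t h = 0xcbf29ce484222325ULL;  /* FNV-1a 64-bit offset basis */

	for (size_t i = 0; i < nr; i++) {
		const unsigned char *p = segs[i].buf;

		if (exclude && exclude(&segs[i], info))
			continue;  /* the kernel may rewrite this segment later */

		for (size_t j = 0; j < segs[i].bufsz; j++) {
			h ^= p[j];
			h *= 0x100000001b3ULL;  /* FNV-1a 64-bit prime */
		}
	}
	return h;
}
```

Because the excluded segment never enters the digest, the kernel can update it on a hotplug event without invalidating the integrity check over the remaining segments, which is the property the new flag relies on.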
Re: [PATCH v3 1/3] kexec_load: Use new kexec flag for hotplug support
On 07/08/24 at 01:25pm, Sourabh Jain wrote: > Hello Baoquan, > > On 08/07/24 07:09, Baoquan He wrote: > > Hi Sourabh, > > > > On 07/07/24 at 08:54pm, Sourabh Jain wrote: > > > Kernel commit 79365026f869 (crash: add a new kexec flag for hotplug > > > support) has introduced a new kexec flag to generalize hotplug support. > > > The newly introduced kexec flags for hotplug allow architectures to > > > exclude all the required kexec segments from SHA calculation so that > > > the kernel can update them on hotplug events. This was not possible > > > earlier with the KEXEC_UPDATE_ELFCOREHDR kexec flags since it was added > > > only for the elfcorehdr segment. > > > > > > To enable architectures to control the list of kexec segments to exclude > > > when hotplug support is enabled, add a new architecture-specific > > > function named arch_do_exclude_segment. During the SHA calculation, this > > > function gets called to let the architecture decide whether a specific > > > kexec segment should be considered for SHA calculation or not. > > > > > > Note: To avoid breaking backward compatibility, the new kexec flag > > > KEXEC_CRASH_HOTPLUG_SUPPORT is not used for x86 for now. > > For x86, both KEXEC_UPDATE_ELFCOREHDR and KEXEC_CRASH_HOTPLUG_SUPPORT > > should be OK for kexec_file_load. > > Do we even need these flags for kexec_file_load at all? > My understanding is that these flags are only needed for the kexec_load > system call. Oh, sorry, my bad, I must have mixed this with KEXEC_FILE_DEBUG I earlier added when I checked this patchset. I think everything is like what you said. > > > > Your change will make a difference > > between kexec_load and kexec_file_load. > > I am confused by the above statement. > > Given that we don't even send any of the above flags for kexec_file_load, I > am not > sure how these changes make a difference between the two system calls. > > > But I agree with you on the > > backward cmpatibility with KEXEC_CRASH_HOTPLUG_SUPPORT flag. 
> > > > Anyway, if it's in a hurry to catch up with Simon's new release, this is > > fine, we can change it later. > > It would be great if we could consider this patch series for the next > release, but not at > the cost of breaking any backward compatibility for x86. If you think these > changes are > breaking anything for any kernel version, I would prefer to update my patch > series. > > > Otherwise, we may be better to remove the > > difference, namely, not making x86 only be able to accept > > KEXEC_UPDATE_ELFCOREHDR flag on kexec_load. My personal opinion > > On x86, passing the KEXEC_CRASH_HOTPLUG_SUPPORT kexec bit to kernel versions > 6.5 to 6.9 > with the kexec_load system call will fail with -EINVAL. However, from kernel > 6.10 onward, > both KEXEC_UPDATE_ELFCOREHDR and KEXEC_CRASH_HOTPLUG_SUPPORT kexec bits are > acceptable for x86. > > My proposal is to use KEXEC_UPDATE_ELFCOREHDR on x86 for some time (maybe a > couple of kernel releases), > and eventually switch to KEXEC_CRASH_HOTPLUG_SUPPORT for x86 as well. > > This proposal of shifting to the KEXEC_CRASH_HOTPLUG_SUPPORT kexec bit for > x86 is also mentioned in the > comment for the get_hotplug_kexec_flag function. > > Please let me know your opinion. It sounds like a good plan, thanks for the effort. ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
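The transition plan agreed on above can be summarized in a few lines of C. This is a rough sketch, not the get_hotplug_kexec_flag() function from the series: `hotplug_flag_for()` and `x86_accepts_crash_hotplug_support()` are hypothetical helpers, and the two flag values are written out here as assumptions that should be checked against the kernel's uapi header before being relied on.

```c
#include <stdbool.h>

/* Assumed flag values; verify against include/uapi/linux/kexec.h. */
#define KEXEC_UPDATE_ELFCOREHDR      0x00000004UL
#define KEXEC_CRASH_HOTPLUG_SUPPORT  0x00000008UL

/* Per the thread: on x86, kexec_load rejects KEXEC_CRASH_HOTPLUG_SUPPORT
 * with -EINVAL on kernels 6.5 to 6.9 and accepts it from 6.10 onward.
 * Treating everything older than 6.10 the same way is an assumption of
 * this sketch. */
static inline bool x86_accepts_crash_hotplug_support(int major, int minor)
{
	return major > 6 || (major == 6 && minor >= 10);
}

/* The conservative choice proposed in the thread: keep the old flag on
 * x86 for a few releases, use the new flag everywhere else. */
static inline unsigned long hotplug_flag_for(bool is_x86)
{
	return is_x86 ? KEXEC_UPDATE_ELFCOREHDR : KEXEC_CRASH_HOTPLUG_SUPPORT;
}
```

Choosing the flag by architecture rather than by probing the running kernel keeps the tool's behaviour predictable; the version helper is included only to make the compatibility window from the discussion explicit.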
Re: [PATCH v3 1/3] kexec_load: Use new kexec flag for hotplug support
Hi Sourabh,

On 07/07/24 at 08:54pm, Sourabh Jain wrote:
> Kernel commit 79365026f869 (crash: add a new kexec flag for hotplug
> support) has introduced a new kexec flag to generalize hotplug support.
> The newly introduced kexec flags for hotplug allow architectures to
> exclude all the required kexec segments from SHA calculation so that
> the kernel can update them on hotplug events. This was not possible
> earlier with the KEXEC_UPDATE_ELFCOREHDR kexec flags since it was added
> only for the elfcorehdr segment.
>
> To enable architectures to control the list of kexec segments to exclude
> when hotplug support is enabled, add a new architecture-specific
> function named arch_do_exclude_segment. During the SHA calculation, this
> function gets called to let the architecture decide whether a specific
> kexec segment should be considered for SHA calculation or not.
>
> Note: To avoid breaking backward compatibility, the new kexec flag
> KEXEC_CRASH_HOTPLUG_SUPPORT is not used for x86 for now.

For x86, both KEXEC_UPDATE_ELFCOREHDR and KEXEC_CRASH_HOTPLUG_SUPPORT
should be OK for kexec_file_load. Your change will introduce a difference
between kexec_load and kexec_file_load. But I agree with you on the
backward compatibility concern with the KEXEC_CRASH_HOTPLUG_SUPPORT flag.

Anyway, if you are in a hurry to catch Simon's new release, this is fine;
we can change it later. Otherwise, it may be better to remove the
difference, namely, not to restrict x86 to accepting only the
KEXEC_UPDATE_ELFCOREHDR flag on kexec_load. Just my personal opinion.
Re: [RESEND PATCH] crash: Remove duplicate included header
Hi,

On 07/03/24 at 11:35am, Thorsten Blum wrote:
> Remove duplicate included header file linux/kexec.h
>
> Acked-by: Baoquan He
> Signed-off-by: Thorsten Blum
> ---
>  kernel/crash_reserve.c | 1 -
>  1 file changed, 1 deletion(-)

Thanks for catching this. However, it has already been fixed by the patch
below in the next tree:

commit 5eb1911a8c63f0e10a5f746f52fcc3c9bbfbc710
Author: Wenchao Hao
Date:   Thu Jun 6 17:14:27 2024 +0800

    crash: remove header files which are included more than once

>
> diff --git a/kernel/crash_reserve.c b/kernel/crash_reserve.c
> index 5b2722a93a48..d3b4cd12bdd1 100644
> --- a/kernel/crash_reserve.c
> +++ b/kernel/crash_reserve.c
> @@ -13,7 +13,6 @@
>  #include
>  #include
>  #include
> -#include
>  #include
>
>  #include
> --
> 2.45.2
Re: [PATCH v3 3/3] doc/hotplug: update man and --help
On 07/02/24 at 10:00am, Sourabh Jain wrote: > Update the man page and --help option to make the description of the > --hotplug option easier to understand. > > Cc: Aditya Gupta > Cc: Baoquan He > Cc: Coiby Xu > Cc: Mahesh Salgaonkar > Cc: Simon Horman > Acked-by: Hari Bathini > Signed-off-by: Sourabh Jain > --- > Changelog: > > From v2 -> v3 > - Updated --hotplug option description > > --- > kexec/kexec.8 | 8 > kexec/kexec.c | 4 +++- > 2 files changed, 7 insertions(+), 5 deletions(-) Acked-by: Baoquan He > > diff --git a/kexec/kexec.8 b/kexec/kexec.8 > index 9e995fe..793e876 100644 > --- a/kexec/kexec.8 > +++ b/kexec/kexec.8 > @@ -140,10 +140,10 @@ Open a help file for > .BR kexec . > .TP > .B \-\-hotplug > -Setup for kernel modification of the elfcorehdr. This option performs > -the steps needed to support kernel updates to the elfcorehdr in the > -presence of hot un/plug and/or on/offline events. This option only > -useful for KEXEC_LOAD syscall. > +Setup kexec segments such that kernel can safely update them on CPU/Memory > +hot add/remove events. If this option is enabled, kernel does in-kernel > +update of kexec segments on CPU/Memory hot add/remove events, thus avoiding > +the need to reload kdump kernel. > .TP > .B \-i\ (\-\-no-checks) > Fast reboot, no memory integrity checks. 
> diff --git a/kexec/kexec.c b/kexec/kexec.c > index 034cea6..9b7c34c 100644 > --- a/kexec/kexec.c > +++ b/kexec/kexec.c > @@ -1093,7 +1093,9 @@ void usage(void) > " back to the compatibility syscall when > file based\n" > " syscall is not supported or the kernel > did not\n" > " understand the image (default)\n" > -" --hotplugSetup for kernel modification of > elfcorehdr.\n" > +" --hotplugDo in-kernel update of kexec segments on > CPU/Memory\n" > +" hot add/remove events, avoiding the need > to reload\n" > +" kdump kernel on online/offline events.\n" > " -d, --debug Enable debugging to help spot a > failure.\n" > " -S, --status Return 1 if the type (by default crash) > is loaded,\n" > " 0 if not.\n" > -- > 2.45.1 > ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v3 1/3] kexec_load: Use new kexec flag for hotplug support
On 07/02/24 at 10:00am, Sourabh Jain wrote:
..
> diff --git a/kexec/arch/i386/kexec-x86.c b/kexec/arch/i386/kexec-x86.c
> index 444cb69..b4947a0 100644
> --- a/kexec/arch/i386/kexec-x86.c
> +++ b/kexec/arch/i386/kexec-x86.c
> @@ -208,3 +208,11 @@ void arch_update_purgatory(struct kexec_info *info)
>  	elf_rel_set_symbol(&info->rhdr, "panic_kernel",
>  		&panic_kernel, sizeof(panic_kernel));
>  }
> +
> +int arch_do_exclude_segment(struct kexec_segment *seg_ptr, struct kexec_info *info)
> +{
> +	if (info->elfcorehdr == (unsigned long) seg_ptr->mem)
> +		return 1;
> +
> +	return 0;
> +}

I know a similar question was asked on an earlier version, but I may not
have understood the answer. I am still raising it here: why does x86_64
return 0 directly, while i386 handles different cases?

> diff --git a/kexec/arch/ia64/kexec-ia64.c b/kexec/arch/ia64/kexec-ia64.c
> index 418d997..8d9c1f3 100644
> --- a/kexec/arch/ia64/kexec-ia64.c
> +++ b/kexec/arch/ia64/kexec-ia64.c
> @@ -245,3 +245,7 @@ void arch_update_purgatory(struct kexec_info *UNUSED(info))
>  {
>  }
>
> +int arch_do_exclude_segment(struct kexec_segment *UNUSED(seg_ptr), struct kexec_info *UNUSED(info))
                                                            ~~~~~~~
A tiny nit about the parameter naming: since the existing code takes
'struct kexec_segment *segment', can we use the same name here? Not a
strong opinion.
= kexec-tools/kexec/kexec.c: static int valid_memory_segment(struct kexec_info *info, struct kexec_segment *segment) { unsigned long sstart, send; sstart = (unsigned long)segment->mem; send = sstart + segment->memsz - 1; return valid_memory_range(info, sstart, send); } > +{ > + return 0; > +} > diff --git a/kexec/arch/loongarch/kexec-loongarch.c > b/kexec/arch/loongarch/kexec-loongarch.c > index ac75030..ee7b9f1 100644 > --- a/kexec/arch/loongarch/kexec-loongarch.c > +++ b/kexec/arch/loongarch/kexec-loongarch.c > @@ -381,3 +381,8 @@ unsigned long add_buffer(struct kexec_info *info, const > void *buf, > return add_buffer_phys_virt(info, buf, bufsz, memsz, buf_align, > buf_min, buf_max, buf_end, 1); > } > + > +int arch_do_exclude_segment(struct kexec_segment *UNUSED(seg_ptr), struct > kexec_info *UNUSED(info)) > +{ > + return 0; > +} > diff --git a/kexec/arch/m68k/kexec-m68k.c b/kexec/arch/m68k/kexec-m68k.c > index cb54927..0c7dbaf 100644 > --- a/kexec/arch/m68k/kexec-m68k.c > +++ b/kexec/arch/m68k/kexec-m68k.c > @@ -108,3 +108,8 @@ void add_segment(struct kexec_info *info, const void > *buf, size_t bufsz, > { > add_segment_phys_virt(info, buf, bufsz, base, memsz, 1); > } > + > +int arch_do_exclude_segment(struct kexec_segment *UNUSED(seg_ptr), struct > kexec_info *UNUSED(info)) > +{ > + return 0; > +} > diff --git a/kexec/arch/mips/kexec-mips.c b/kexec/arch/mips/kexec-mips.c > index d8cbea8..94224ee 100644 > --- a/kexec/arch/mips/kexec-mips.c > +++ b/kexec/arch/mips/kexec-mips.c > @@ -189,3 +189,7 @@ unsigned long add_buffer(struct kexec_info *info, const > void *buf, > buf_min, buf_max, buf_end, 1); > } > > +int arch_do_exclude_segment(const void *UNUSED(seg_ptr), struct kexec_info > *UNUSED(info)) > +{ > + return 0; > +} > diff --git a/kexec/arch/ppc/kexec-ppc.c b/kexec/arch/ppc/kexec-ppc.c > index 03bec36..c8af870 100644 > --- a/kexec/arch/ppc/kexec-ppc.c > +++ b/kexec/arch/ppc/kexec-ppc.c > @@ -966,3 +966,7 @@ void arch_update_purgatory(struct kexec_info > 
*UNUSED(info)) > { > } > > +int arch_do_exclude_segment(struct kexec_segment *UNUSED(seg_ptr), struct > kexec_info *UNUSED(info)) > +{ > + return 0; > +} > diff --git a/kexec/arch/ppc64/kexec-ppc64.c b/kexec/arch/ppc64/kexec-ppc64.c > index bd5274c..fb27b6b 100644 > --- a/kexec/arch/ppc64/kexec-ppc64.c > +++ b/kexec/arch/ppc64/kexec-ppc64.c > @@ -967,3 +967,8 @@ int arch_compat_trampoline(struct kexec_info > *UNUSED(info)) > void arch_update_purgatory(struct kexec_info *UNUSED(info)) > { > } > + > +int arch_do_exclude_segment(struct kexec_segment *UNUSED(seg_ptr), struct > kexec_info *UNUSED(info)) > +{ > + return 0; > +} > diff --git a/kexec/arch/s390/kexec-s390.c b/kexec/arch/s390/kexec-s390.c > index 33ba6b9..0561ee7 100644 > --- a/kexec/arch/s390/kexec-s390.c > +++ b/kexec/arch/s390/kexec-s390.c > @@ -267,3 +267,8 @@ int get_crash_kernel_load_range(uint64_t *start, uint64_t > *end) > { > return parse_iomem_single("Crash kernel\n", start, end); > } > + > +int arch_do_exclude_segment(struct kexec_segment *UNUSED(seg_ptr), struct > kexec_info *UNUSED(info)) > +{ > + return 0; > +} > diff --git a/kexec/arch/sh/kexec-sh.c b/kexec/arch/sh/kexec-sh.c > index ce341c8..f84c40c 100644 > --- a/kexec/arch/sh/kexec-sh.c > +++ b/kexec/arch/sh/kexec-sh.c > @@ -257,3 +257,8 @@ unsigned long add_buffer(struct kexec_info *info, const > void *buf, >
Re: [PATCH v2 1/3] kexec_load: Use new kexec flag for hotplug support
On 06/26/24 at 10:49am, Sourabh Jain wrote:
> Hello Baoquan,
>
> On 26/06/24 09:20, Baoquan He wrote:
> > On 06/25/24 at 01:51pm, Sourabh Jain wrote:
> > > Any review/comments on this patch series.
> > I try to have a look, while there's conflict when applying to the latest
> > kexec-tools.
>
> This patch series apply cleanly on master branch of below repos.
>
> https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/
> https://github.com/horms/kexec-tools
>
> May I know which upstream repo and branch you are applying this patch
> series?

Oh, sorry. I still had a local hotplug-related patch left over from a
much earlier discussion; I thought it was a rebasing conflict.
Re: [PATCH v2 1/3] kexec_load: Use new kexec flag for hotplug support
On 06/25/24 at 01:51pm, Sourabh Jain wrote: > Any review/comments on this patch series. I try to have a look, while there's conflict when applying to the latest kexec-tools. > > On 14/06/24 00:37, Sourabh Jain wrote: > > Kernel commit 79365026f869 (crash: add a new kexec flag for hotplug > > support) has introduced a new kexec flag to generalize hotplug support. > > The newly introduced kexec flags for hotplug allow architectures to > > exclude all the required kexec segments from SHA calculation so that > > the kernel can update them on hotplug events. This was not possible > > earlier with the KEXEC_UPDATE_ELFCOREHDR kexec flags since it was added > > only for the elfcorehdr segment. > > > > To enable architectures to control the list of kexec segments to exclude > > when hotplug support is enabled, add a new architecture-specific > > function named arch_do_exclude_segment. During the SHA calculation, this > > function gets called to let the architecture decide whether a specific > > kexec segment should be considered for SHA calculation or not. > > > > Given that the KEXEC_UPDATE_ELFCOREHDR is no longer required and was > > colliding with the KEXEC_LIVE_UPDATE update flag, it is removed. > > > > Cc: Aditya Gupta > > Cc: Baoquan He > > Cc: Coiby Xu > > Cc: Mahesh Salgaonkar > > Cc: Simon Horman > > Acked-by: Hari Bathini > > Signed-off-by: Sourabh Jain > > --- > > > > * No changes in v2. 
> > > > --- > > kexec/arch/arm/kexec-arm.c | 5 + > > kexec/arch/arm64/kexec-arm64.c | 4 > > kexec/arch/cris/kexec-cris.c | 4 > > kexec/arch/hppa/kexec-hppa.c | 5 + > > kexec/arch/i386/kexec-x86.c| 8 > > kexec/arch/ia64/kexec-ia64.c | 4 > > kexec/arch/loongarch/kexec-loongarch.c | 5 + > > kexec/arch/m68k/kexec-m68k.c | 5 + > > kexec/arch/mips/kexec-mips.c | 4 > > kexec/arch/ppc/kexec-ppc.c | 4 > > kexec/arch/ppc64/kexec-ppc64.c | 5 + > > kexec/arch/s390/kexec-s390.c | 5 + > > kexec/arch/sh/kexec-sh.c | 5 + > > kexec/arch/x86_64/kexec-x86_64.c | 5 + > > kexec/kexec-syscall.h | 2 +- > > kexec/kexec.c | 14 -- > > kexec/kexec.h | 2 ++ > > 17 files changed, 79 insertions(+), 7 deletions(-) > > > > diff --git a/kexec/arch/arm/kexec-arm.c b/kexec/arch/arm/kexec-arm.c > > index 49f35b1..34531f9 100644 > > --- a/kexec/arch/arm/kexec-arm.c > > +++ b/kexec/arch/arm/kexec-arm.c > > @@ -148,3 +148,8 @@ int have_sysfs_fdt(void) > > { > > return !access(SYSFS_FDT, F_OK); > > } > > + > > +int arch_do_exclude_segment(struct kexec_segment *UNUSED(seg_ptr), struct > > kexec_info *UNUSED(info)) > > +{ > > + return 0; > > +} > > diff --git a/kexec/arch/arm64/kexec-arm64.c b/kexec/arch/arm64/kexec-arm64.c > > index 4a67b0d..9d052b0 100644 > > --- a/kexec/arch/arm64/kexec-arm64.c > > +++ b/kexec/arch/arm64/kexec-arm64.c > > @@ -1363,3 +1363,7 @@ void arch_reuse_initrd(void) > > void arch_update_purgatory(struct kexec_info *UNUSED(info)) > > { > > } > > +int arch_do_exclude_segment(struct kexec_segment *UNUSED(seg_ptr), struct > > kexec_info *UNUSED(info)) > > +{ > > + return 0; > > +} > > diff --git a/kexec/arch/cris/kexec-cris.c b/kexec/arch/cris/kexec-cris.c > > index 3b69709..7f09121 100644 > > --- a/kexec/arch/cris/kexec-cris.c > > +++ b/kexec/arch/cris/kexec-cris.c > > @@ -109,3 +109,7 @@ unsigned long add_buffer(struct kexec_info *info, const > > void *buf, > > buf_min, buf_max, buf_end, 1); > > } > > +int arch_do_exclude_segment(struct kexec_segment *UNUSED(seg_ptr), struct > > 
kexec_info *UNUSED(info)) > > +{ > > + return 0; > > +} > > diff --git a/kexec/arch/hppa/kexec-hppa.c b/kexec/arch/hppa/kexec-hppa.c > > index 77c9739..a64dc3d 100644 > > --- a/kexec/arch/hppa/kexec-hppa.c > > +++ b/kexec/arch/hppa/kexec-hppa.c > > @@ -146,3 +146,8 @@ unsigned long virt_to_phys(unsigned long addr) > > { > > return addr - phys_offset; > > } > > + > > +int arch_do_exclude_segment(struct kexec_segment *UNUSED(seg_ptr), struct > > kexec_info *UNUSED(info)) > > +{ > > + return 0; > > +} > > diff --git a/kexec/arch/i386
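Editorial sketch of the mechanism described in the quoted commit message: while computing the digest, each segment is offered to an architecture hook, and segments the hook excludes are skipped so that later in-place updates do not invalidate the digest. This is a minimal user-space illustration, not kexec-tools code. Every demo_* name is hypothetical, the updated_on_hotplug flag is invented for the example, and a toy checksum stands in for SHA-256.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a kexec segment; not the real definition. */
struct demo_segment {
	const unsigned char *buf;
	size_t bufsz;
	int updated_on_hotplug;	/* illustrative flag only */
};

/* Plays the role of arch_do_exclude_segment(): nonzero means "skip". */
int demo_do_exclude_segment(const struct demo_segment *seg)
{
	return seg->updated_on_hotplug;
}

/*
 * Toy checksum standing in for the SHA-256 digest. Segments the hook
 * excludes contribute nothing, so their contents may change later
 * without changing the result. *hashed reports how many were included.
 */
uint32_t demo_digest(const struct demo_segment *segs, size_t nr,
		     size_t *hashed)
{
	uint32_t sum = 0;
	size_t i, j;

	*hashed = 0;
	for (i = 0; i < nr; i++) {
		if (demo_do_exclude_segment(&segs[i]))
			continue;
		for (j = 0; j < segs[i].bufsz; j++)
			sum = sum * 31u + segs[i].buf[j];
		(*hashed)++;
	}
	return sum;
}
```

In the quoted patch the per-architecture stubs all return 0, which corresponds to a hook that never excludes anything.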
Re: [PATCH v4 2/7] crash_dump: make dm crypt keys persist for the kdump kernel
On 06/07/24 at 08:27pm, Coiby Xu wrote: > On Tue, Jun 04, 2024 at 04:51:03PM +0800, Baoquan He wrote: > > Hi Coiby, > > Hi Baoquan, > > > > > On 05/23/24 at 01:04pm, Coiby Xu wrote: > > > A sysfs /sys/kernel/crash_dm_crypt_keys is provided for user space to make > > > the dm crypt keys persist for the kdump kernel. User space can send the > > > following commands, > > > - "init KEY_NUM" > > > Initialize needed structures > > > - "record KEY_DESC" > > > Record a key description. The key must be a logon key. > > > > > > User space can also read this API to learn about current state. > > > > From the subject, can I think the luks keys will persist forever? or > > only for a while? > > Yes, you are right. The keys need to stay in kdump reserved memory. Hmm, there are two different concepts we may need to differentiate. From the security keys' point of view, the keys need to be stored for a while so that kdump loading can fetch them; that's done through sysfs. From kdump's point of view, the keys need to be stored forever, until the kdump kernel uses them. I can't tell which one you are referring to from the subject, especially since you stress the newly added sysfs /sys/kernel/crash_dm_crypt_keys. > > > If need and can only keep it for a while, can you > > mention it and tell why and how it will be used. Because you add a lot > > of codes, but only simply mention the sysfs, that doesn't make sense. > > Thanks for raising the concern! I've added > Documentation/ABI/testing/crash_dm_crypt_keys and copy some text in the > cover letter to this patch in v5. 
> > > > > > > > > Signed-off-by: Coiby Xu > > > --- > > > include/linux/crash_core.h | 5 +- > > > kernel/Kconfig.kexec | 8 +++ > > > kernel/Makefile | 1 + > > > kernel/crash_dump_dm_crypt.c | 113 +++ > > > kernel/ksysfs.c | 22 +++ > > > 5 files changed, 148 insertions(+), 1 deletion(-) > > > create mode 100644 kernel/crash_dump_dm_crypt.c > > > > > > diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h > > > index 44305336314e..6bff1c24efa3 100644 > > > --- a/include/linux/crash_core.h > > > +++ b/include/linux/crash_core.h > > > @@ -34,7 +34,10 @@ static inline void arch_kexec_protect_crashkres(void) > > > { } > > > static inline void arch_kexec_unprotect_crashkres(void) { } > > > #endif > [...] > > > +static int init(const char *buf) > > A more interesting name with more description? > > Thanks for the suggestion! I've added some comments for this function > in v5. But I can't come up with a better name after looking at current > kernel code. You are welcome to suggest any better name:) Usually init() is for the whole driver module. Your init() here only receives the total number of keys passed in and allocates the key_header, so how can you simply name it init()? If you called it init_keys_header(), I would think it's much more meaningful. > > > > +static int process_cmd(const char *buf, size_t count) > > > > If nobody use the count, why do you add it? > > Good catch! Yes, this is no need to use count in v4. But v5 now needs it to > avoid > buffer overflow. OK, did you add a code comment explaining what 'count' stands for? The name 'process_cmd()' is also ambiguous. We should avoid this kind of name, e.g. process_cmd, do_things, handle_stuff. Can you choose a more specific name?
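Editorial sketch of the two points raised above: a parser whose name says what it parses, and a 'count' parameter that actually bounds every access to the buffer. This is a user-space illustration only, not the code from v4 or v5. The demo_* names and the reject-rather-than-truncate policy are assumptions; only the "init KEY_NUM" and "record KEY_DESC" command strings come from the quoted patch description.

```c
#include <stddef.h>
#include <string.h>

enum demo_cmd {
	DEMO_CMD_INVALID,
	DEMO_CMD_INIT_KEYS_HEADER,	/* "init KEY_NUM" */
	DEMO_CMD_RECORD_KEY_DESC,	/* "record KEY_DESC" */
};

/* True if buf (count valid bytes) starts with prefix and has more after it. */
int demo_has_prefix(const char *buf, size_t count, const char *prefix)
{
	size_t n = strlen(prefix);

	return count > n && memcmp(buf, prefix, n) == 0;
}

/*
 * Parse one command written to the keys interface. 'count' is the number
 * of valid bytes in buf; buf is not assumed to be NUL-terminated. The
 * argument is copied into arg (capacity arg_sz) as a C string.
 */
enum demo_cmd demo_parse_keys_cmd(const char *buf, size_t count,
				  char *arg, size_t arg_sz)
{
	enum demo_cmd cmd;
	size_t skip;

	/* Drop one trailing newline, as "echo" would append. */
	if (count > 0 && buf[count - 1] == '\n')
		count--;

	if (demo_has_prefix(buf, count, "init ")) {
		cmd = DEMO_CMD_INIT_KEYS_HEADER;
		skip = strlen("init ");
	} else if (demo_has_prefix(buf, count, "record ")) {
		cmd = DEMO_CMD_RECORD_KEY_DESC;
		skip = strlen("record ");
	} else {
		return DEMO_CMD_INVALID;
	}

	/* Reject arguments that would not fit instead of truncating them. */
	if (count - skip >= arg_sz)
		return DEMO_CMD_INVALID;

	memcpy(arg, buf + skip, count - skip);
	arg[count - skip] = '\0';
	return cmd;
}
```

Because every read is limited by count, the parser never looks past the bytes user space actually wrote, which is presumably the buffer-overflow concern mentioned for v5.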
Re: [PATCH v4 2/7] crash_dump: make dm crypt keys persist for the kdump kernel
On 06/07/24 at 08:27pm, Coiby Xu wrote: > On Wed, Jun 05, 2024 at 04:22:12PM +0800, Baoquan He wrote: > > On 05/23/24 at 01:04pm, Coiby Xu wrote: > > . > > > diff --git a/kernel/crash_dump_dm_crypt.c b/kernel/crash_dump_dm_crypt.c > > > new file mode 100644 > > > index ..78809189084a > > > --- /dev/null > > > +++ b/kernel/crash_dump_dm_crypt.c > > > @@ -0,0 +1,113 @@ > > > +// SPDX-License-Identifier: GPL-2.0-only > [...] > > > + > > > +static unsigned int key_count; > > > +static size_t keys_header_size; > > > > These two global variables seems not so necessary. Please see comment at > > below. > > Thanks for the comment! But I think it's better to keep these two static > variables for reasons as will be explained later. > > > > > > + > > > +struct dm_crypt_key { > > > + unsigned int key_size; > > > + char key_desc[KEY_DESC_LEN]; > > > + u8 data[KEY_SIZE_MAX]; > > > +}; > > > + > > > +static struct keys_header { > > > + unsigned int key_count; > > > > This is the max number a system have from init(); > > You can add one field member to record how many key slots have been > > used. > > > + struct dm_crypt_key keys[] __counted_by(key_count); > > > +} *keys_header; > > > > Maybe we can rearrange the keys_header like below, the name may not be > > very appropriate though. > > > > static struct keys_header { > > unsigned int max_key_slots; > > unsigned int used_key_slots; > > struct dm_crypt_key keys[] __counted_by(key_count); > > } *keys_header; > > Thanks for the suggestion! Since 1) KEY_NUM_MAX already defines the > maximum number of dm crypt keys 2) we only need to let the kdump kernel > now how many keys are saved, so I simply use total_keys instead of > key_count in struct keys_header in v5, > > static struct keys_header { > unsigned int total_keys; > struct dm_crypt_key keys[] __counted_by(total_keys); > } *keys_header; > > Hopefully this renaming will improve code readability. 
If you add key_count into keys_header, then the kdump kernel will know how many keys are really saved and need to be retrieved. What is the concern that makes you keep key_count outside as a global variable? > > > > > > > > > > > + > > > +static size_t get_keys_header_size(struct keys_header *keys_header, > > > +size_t key_count) > > > +{ > > > + return struct_size(keys_header, keys, key_count); > > > +} > > > > I personally don't think get_keys_header_size is so necessary. If we > > have to keep it, may be we can remove the global variable > > keys_header_size, we can call get_keys_header_size() and use local > > variable to record the value instead. > > Thanks for the suggestion! But the kdump kernel also need to call > get_keys_header_size in later patches. If so, you can remove keys_header_size now and define a local variable to hold the newly calculated value; keys_header_size does not seem necessary. By the way, you don't need to rush to post v5. When people review patches, agreement needs to be reached through discussion first; then the next version can be posted.
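Editorial sketch of the suggestion above: keep the key count inside the header and compute the allocation size into a local variable wherever it is needed, rather than caching it in a file-scope global. This is a user-space analogue, not the patch's code. The demo_* names and the two size constants are assumptions, and returning 0 on overflow is a choice made for this example (the kernel's struct_size() helper saturates instead).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_KEY_DESC_LEN 40	/* assumed value, not from the patch */
#define DEMO_KEY_SIZE_MAX 256	/* assumed value, not from the patch */

struct demo_dm_crypt_key {
	unsigned int key_size;
	char key_desc[DEMO_KEY_DESC_LEN];
	uint8_t data[DEMO_KEY_SIZE_MAX];
};

struct demo_keys_header {
	unsigned int total_keys;
	struct demo_dm_crypt_key keys[];	/* flexible array member */
};

/*
 * Size of a header holding total_keys entries: header plus
 * total_keys * sizeof(entry). Returns 0 if the size would overflow.
 */
size_t demo_keys_header_size(size_t total_keys)
{
	size_t max = (SIZE_MAX - sizeof(struct demo_keys_header)) /
		     sizeof(struct demo_dm_crypt_key);

	if (total_keys > max)
		return 0;
	return sizeof(struct demo_keys_header) +
	       total_keys * sizeof(struct demo_dm_crypt_key);
}

/* Allocate a zeroed header; the size lives in a local, not a global. */
struct demo_keys_header *demo_alloc_keys_header(unsigned int total_keys)
{
	size_t size = demo_keys_header_size(total_keys);
	struct demo_keys_header *hdr;

	if (!size)
		return NULL;
	hdr = calloc(1, size);
	if (hdr)
		hdr->total_keys = total_keys;
	return hdr;
}
```

A reader of the header (the kdump kernel, in the thread's scenario) can then recompute the same size from total_keys alone, so no separately stored size is needed.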
Re: [PATCH v4 0/7] Support kdump with LUKS encryption by reusing LUKS volume keys
Hi Coiby, On 05/23/24 at 01:04pm, Coiby Xu wrote: > LUKS is the standard for Linux disk encryption. Many users choose LUKS > and in some use cases like Confidential VM it's mandated. With kdump > enabled, when the 1st kernel crashes, the system could boot into the > kdump/crash kernel and dump the memory image i.e. /proc/vmcore to a > specified target. Currently, when dumping vmcore to a LUKS > encrypted device, there are two problems, I am done with this round of review. The overall approach looks good to me, though there are places to improve or fix. I have commented on everything I am concerned about; please check. Thanks for the effort. By the way, have you got confirmation of the solution from the encryption/keys developers, either internally at Red Hat or upstream? From my understanding it looks good, but it may need their confirmation or approval in some way. Thanks Baoquan
Re: [PATCH v4 7/7] x86/crash: make the page that stores the dm crypt keys inaccessible
On 05/23/24 at 01:04pm, Coiby Xu wrote: > This adds an addition layer of protection for the saved copy of dm > crypt key. Trying to access the saved copy will cause page fault. > > Suggested-by: Pingfan Liu > Signed-off-by: Coiby Xu > --- > arch/x86/kernel/machine_kexec_64.c | 21 + > 1 file changed, 21 insertions(+) > > diff --git a/arch/x86/kernel/machine_kexec_64.c > b/arch/x86/kernel/machine_kexec_64.c > index b180d8e497c3..fc0a80f4254e 100644 > --- a/arch/x86/kernel/machine_kexec_64.c > +++ b/arch/x86/kernel/machine_kexec_64.c > @@ -545,13 +545,34 @@ static void kexec_mark_crashkres(bool protect) > kexec_mark_range(control, crashk_res.end, protect); > } > > +static void kexec_mark_dm_crypt_keys(bool protect) > +{ > + unsigned long start_paddr, end_paddr; > + unsigned int nr_pages; > + > + if (kexec_crash_image->dm_crypt_keys_addr) { > + start_paddr = kexec_crash_image->dm_crypt_keys_addr; > + end_paddr = start_paddr + kexec_crash_image->dm_crypt_keys_sz - > 1; > + nr_pages = (PAGE_ALIGN(end_paddr) - > PAGE_ALIGN_DOWN(start_paddr))/PAGE_SIZE; > + if (protect) > + set_memory_np((unsigned long)phys_to_virt(start_paddr), > nr_pages); > + else > + __set_memory_prot( > + (unsigned long)phys_to_virt(start_paddr), > + nr_pages, > + __pgprot(_PAGE_PRESENT | _PAGE_NX | _PAGE_RW)); > + } > +} > + > void arch_kexec_protect_crashkres(void) > { > kexec_mark_crashkres(true); > + kexec_mark_dm_crypt_keys(true); Doesn't the crashkernel region already cover the region storing the crypt keys? Do we need to mark it again specifically? Not sure if I'm missing anything. > } > > void arch_kexec_unprotect_crashkres(void) > { > + kexec_mark_dm_crypt_keys(false); > kexec_mark_crashkres(false); > } > #endif > -- > 2.45.0 >
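Editorial worked example of the page-count arithmetic used in the quoted kexec_mark_dm_crypt_keys(): how many pages a physical byte range touches. This is a user-space sketch under stated assumptions, not the patch's code. The demo_* helpers are hypothetical, 4 KiB pages are assumed, and the end of the range is treated as exclusive (start + size), whereas the quoted patch rounds the inclusive end (start + size - 1).

```c
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096ULL	/* assumed 4 KiB pages */

/* Round addr down to the start of its page. */
uint64_t demo_page_align_down(uint64_t addr)
{
	return addr & ~(DEMO_PAGE_SIZE - 1);
}

/* Round addr up to the next page boundary (unchanged if already aligned). */
uint64_t demo_page_align_up(uint64_t addr)
{
	return (addr + DEMO_PAGE_SIZE - 1) & ~(DEMO_PAGE_SIZE - 1);
}

/* Number of pages touched by the byte range [start, start + size). */
uint64_t demo_nr_pages(uint64_t start, uint64_t size)
{
	if (size == 0)
		return 0;
	return (demo_page_align_up(start + size) -
		demo_page_align_down(start)) / DEMO_PAGE_SIZE;
}
```

Rounding the exclusive end means a range whose last byte sits exactly on a page boundary still counts that final page; with an inclusive end that is already aligned, rounding up leaves it unchanged and the last page would not be counted. Whether that case can occur for the keys buffer depends on how it is allocated, which this excerpt does not show.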
Re: [PATCH v4 6/7] x86/crash: pass dm crypt keys to kdump kernel
On 05/23/24 at 01:04pm, Coiby Xu wrote: > 1st kernel will build up the kernel command parameter dmcryptkeys as > similar to elfcorehdr to pass the memory address of the stored info of > dm crypt key to kdump kernel. > > Signed-off-by: Coiby Xu > --- > arch/x86/kernel/crash.c | 15 ++- > arch/x86/kernel/kexec-bzimage64.c | 7 +++ > 2 files changed, 21 insertions(+), 1 deletion(-) > > diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c > index f06501445cd9..74b3844ae53c 100644 > --- a/arch/x86/kernel/crash.c > +++ b/arch/x86/kernel/crash.c > @@ -266,6 +266,7 @@ static int memmap_exclude_ranges(struct kimage *image, > struct crash_mem *cmem, >unsigned long long mend) > { > unsigned long start, end; > + int r; r is only used to hold the return value? Then you can call it ret, as much kernel code does. > > cmem->ranges[0].start = mstart; > cmem->ranges[0].end = mend; > @@ -274,7 +275,19 @@ static int memmap_exclude_ranges(struct kimage *image, > struct crash_mem *cmem, > /* Exclude elf header region */ > start = image->elf_load_addr; > end = start + image->elf_headers_sz - 1; > - return crash_exclude_mem_range(cmem, start, end); > + r = crash_exclude_mem_range(cmem, start, end); > + > + if (r) > + return r; > + > + /* Exclude dm crypt keys region */ > + if (image->dm_crypt_keys_addr) { > + start = image->dm_crypt_keys_addr; > + end = start + image->dm_crypt_keys_sz - 1; > + return crash_exclude_mem_range(cmem, start, end); > + } You need to adjust the array length of cmem->ranges[]. I believe this will overflow the array, because the keys' location is random and will mostly be in the middle of the crashkernel region. > + > + return r; > } > > /* Prepare memory map for crash dump kernel */
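Editorial sketch of why a second exclusion needs a longer ranges[] array: cutting a hole out of the middle of a range turns one range into two, so each such exclusion consumes one more slot. This is a simplified user-space illustration, not the kernel's crash_exclude_mem_range(). The demo_* names, the fixed capacity of 4, and the assumption that the excluded span overlaps at most one existing range are all choices made for this example.

```c
#include <stddef.h>

#define DEMO_MAX_RANGES 4	/* capacity of this sketch only */

struct demo_range {
	unsigned long long start, end;	/* inclusive bounds */
};

struct demo_mem {
	size_t max_nr_ranges;	/* must not exceed DEMO_MAX_RANGES */
	size_t nr_ranges;
	struct demo_range ranges[DEMO_MAX_RANGES];
};

/*
 * Remove [mstart, mend] from mem. Returns 0 on success, or -1 when the
 * exclusion would split a range but no free slot is left.
 */
int demo_exclude_range(struct demo_mem *mem, unsigned long long mstart,
		       unsigned long long mend)
{
	size_t i, j;

	for (i = 0; i < mem->nr_ranges; i++) {
		struct demo_range *r = &mem->ranges[i];

		if (mstart > r->end || mend < r->start)
			continue;		/* no overlap with this range */

		if (mstart <= r->start && mend >= r->end) {
			/* Whole range removed: shift the rest down. */
			for (j = i; j + 1 < mem->nr_ranges; j++)
				mem->ranges[j] = mem->ranges[j + 1];
			mem->nr_ranges--;
			return 0;
		}
		if (mstart <= r->start) {	/* trims the head */
			r->start = mend + 1;
			return 0;
		}
		if (mend >= r->end) {		/* trims the tail */
			r->end = mstart - 1;
			return 0;
		}

		/* Hole in the middle: one range becomes two. */
		if (mem->nr_ranges == mem->max_nr_ranges)
			return -1;
		for (j = mem->nr_ranges; j > i + 1; j--)
			mem->ranges[j] = mem->ranges[j - 1];
		mem->ranges[i + 1].start = mend + 1;
		mem->ranges[i + 1].end = r->end;
		r->end = mstart - 1;
		mem->nr_ranges++;
		return 0;
	}
	return 0;
}
```

Starting from one range, excluding the ELF header region from the middle yields two ranges, and excluding a keys region from the middle of what remains yields three, so an array sized for a single exclusion is one slot short.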
Re: [PATCH v4 5/7] crash_dump: retrieve dm crypt keys in kdump kernel
On 05/23/24 at 01:04pm, Coiby Xu wrote: .. > +ssize_t __weak dm_crypt_keys_read(char *buf, size_t count, u64 *ppos) > +{ > + struct kvec kvec = { .iov_base = buf, .iov_len = count }; > + struct iov_iter iter; > + > + iov_iter_kvec(&iter, READ, &kvec, 1, count); > + return read_from_oldmem(&iter, count, ppos, false); Do we need to create an x86-specific version to cope with confidential computing, e.g. SME/TDX? > +} > +
Re: [PATCH] crash: Remove header files which are included more than once
On 06/06/24 at 05:14pm, Wenchao Hao wrote: > Following warning is reported, so remove these duplicated header > including: > > ./kernel/crash_reserve.c: linux/kexec.h is included more than once. > > This is just a clean code, no logic changed. > > Signed-off-by: Wenchao Hao > --- > kernel/crash_reserve.c | 1 - > 1 file changed, 1 deletion(-) I remember someone posted a patch to clean this up. Anyway, Acked-by: Baoquan He > > diff --git a/kernel/crash_reserve.c b/kernel/crash_reserve.c > index 5b2722a93a48..d3b4cd12bdd1 100644 > --- a/kernel/crash_reserve.c > +++ b/kernel/crash_reserve.c > @@ -13,7 +13,6 @@ > #include > #include > #include > -#include > #include > > #include > -- > 2.38.1 >