Re: [PATCH -next] crash: fix x86_32 memory reserve dead loop retry bug at "high"
On Wed, 17 Jul 2024 21:38:41 +0800 Baoquan He wrote: > 1) revert commit 8f9dade5906a in Andrew's tree; Thanks, dropped. ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v2] crash: use macro to add crashk_res into iomem early for specific arch
On Mon, 25 Mar 2024 09:50:50 +0800 Baoquan He wrote: > There are regression reports[1][2] that crashkernel region on x86_64 can't > be added into iomem tree sometime. This causes the later failure of kdump > loading. So I think a cc:stable is needed. > This happened after commit 4a693ce65b18 ("kdump: defer the insertion of > crashkernel resources") was merged. > > Even though, these reported issues are proved to be related to other > component, they are just exposed after above commmit applied, I still > would like to keep crashk_res and crashk_low_res being added into iomem > early as before because the early adding has been always there on x86_64 > and working very well. For safety of kdump, Let's change it back. I'll use 4a693ce65b18 as the Fixes: target, since there is no "Exposed-by-non-buggy-patch:" tag. To tell the -stable tree maintainers how far back in history this should be backported. ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH RESEND] crash_core: export vmemmap when CONFIG_SPARSEMEM_VMEMMAP is enabled
On Mon, 26 Feb 2024 10:04:33 +0800 Huang Shijie wrote: > In memory_model.h, if CONFIG_SPARSEMEM_VMEMMAP is configed, > kernel will use vmemmap to do the __pfn_to_page/page_to_pfn, > and kernel will not use the "classic sparse" to do the > __pfn_to_page/page_to_pfn. > > So export the vmemmap when CONFIG_SPARSEMEM_VMEMMAP is configed. > This makes the user applications (crash, etc) get faster > pfn_to_page/page_to_pfn operations too. This is significantly out of sync with development trees. Please take a look at linux-next and redo the patch if still needed? ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v2 00/14] Split crash out from kexec and clean up related config items
On Thu, 22 Feb 2024 10:47:29 +0530 Hari Bathini wrote: > > > On 22/02/24 2:27 am, Andrew Morton wrote: > > On Wed, 21 Feb 2024 11:15:00 +0530 Hari Bathini > > wrote: > > > >> On 04/02/24 8:56 am, Baoquan He wrote: > >>>>> Hope Hari and Pingfan can help have a look, see if > >>>>> it's doable. Now, I make it either have both kexec and crash enabled, or > >>>>> disable both of them altogether. > >>>> > >>>> Sure. I will take a closer look... > >>> Thanks a lot. Please feel free to post patches to make that, or I can do > >>> it with your support or suggestion. > >> > >> Tested your changes and on top of these changes, came up with the below > >> changes to get it working for powerpc: > >> > >> > >> https://lore.kernel.org/all/20240213113150.1148276-1-hbath...@linux.ibm.com/ > > > > So can we take it that you're OK with Baoquan's series as-is? > > Hi Andrew, > > If you mean > > v3 (https://lore.kernel.org/all/20240124051254.67105-1-...@redhat.com/) > + > follow-up from Baoquan > (https://lore.kernel.org/all/Zb8D1ASrgX0qVm9z@MiWiFi-R3L-srv/) > > Yes. > Can I add your Acked-by: and/or Tested-by: to the patches in this series? ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v2 00/14] Split crash out from kexec and clean up related config items
On Wed, 21 Feb 2024 11:15:00 +0530 Hari Bathini wrote: > On 04/02/24 8:56 am, Baoquan He wrote: > >>> Hope Hari and Pingfan can help have a look, see if > >>> it's doable. Now, I make it either have both kexec and crash enabled, or > >>> disable both of them altogether. > >> > >> Sure. I will take a closer look... > > Thanks a lot. Please feel free to post patches to make that, or I can do > > it with your support or suggestion. > > Tested your changes and on top of these changes, came up with the below > changes to get it working for powerpc: > > > https://lore.kernel.org/all/20240213113150.1148276-1-hbath...@linux.ibm.com/ So can we take it that you're OK with Baoquan's series as-is? Baoquan, do you believe the patches in mm-unstable are ready for moving into mm-stable in preparation for an upstream merge? ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v2 01/14] kexec: split crashkernel reservation code out from crash_core.c
On Wed, 21 Feb 2024 22:59:47 +0530 Sourabh Jain wrote: > > config ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION > > - def_bool CRASH_CORE > > + def_bool CRASH_RESEERVE > > %s/CRASH_RESEERVE/CRASH_RESERVE? Yes, thanks, this has been addressed in a followon fixup patch in the mm.git tree. ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH] kdump: Defer the insertion of crashkernel resources
On Fri, 29 Dec 2023 16:02:13 +0800 Huacai Chen wrote: > In /proc/iomem, sub-regions should be inserted after their parent, > otherwise the insertion of parent resource fails. But after generic > crashkernel reservation applied, in both RISC-V and ARM64 (LoongArch > will also use generic reservation later on), crashkernel resources are > inserted before their parent, which causes the parent disappear in > /proc/iomem. So we defer the insertion of crashkernel resources to an > early_initcall(). > > ... > > --- a/kernel/crash_core.c > +++ b/kernel/crash_core.c > @@ -377,7 +377,6 @@ static int __init reserve_crashkernel_low(unsigned long > long low_size) > > crashk_low_res.start = low_base; > crashk_low_res.end = low_base + low_size - 1; > - insert_resource(_resource, _low_res); > #endif > return 0; > } > @@ -459,8 +458,19 @@ void __init reserve_crashkernel_generic(char *cmdline, > > crashk_res.start = crash_base; > crashk_res.end = crash_base + crash_size - 1; > - insert_resource(_resource, _res); > } > + > +static __init int insert_crashkernel_resources(void) > +{ > + if (crashk_res.start < crashk_res.end) > + insert_resource(_resource, _res); > + > + if (crashk_low_res.start < crashk_low_res.end) > + insert_resource(_resource, _low_res); > + > + return 0; > +} > +early_initcall(insert_crashkernel_resources); > #endif > > int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map, I'm thinking Fixes: 0ab97169aa0 ("crash_core: add generic function to do reservation"). Also, is this a regression? Were earlier kernels OK? ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH] crash_core: optimize crash_exclude_mem_range()
On Wed, 20 Dec 2023 00:34:18 +0800 Yuntao Wang wrote: > Because memory ranges in mem->ranges are stored in ascending order, when we > detect `p_end < start`, we can break the for loop early, as the subsequent > memory ranges must also be outside the range we are looking for. > > Signed-off-by: Yuntao Wang > --- > Hi Andrew, > > Patch "[PATCH 2/2] crash_core: fix out-of-bounds access check in > crash_exclude_mem_range()" can be ignored, use this patch instead. > Some reviewer input on this would be helpful please? > --- a/kernel/crash_core.c > +++ b/kernel/crash_core.c > @@ -575,9 +575,12 @@ int crash_exclude_mem_range(struct crash_mem *mem, > p_start = mstart; > p_end = mend; > > - if (p_start > end || p_end < start) > + if (p_start > end) > continue; > > + if (p_end < start) > + break; > + > /* Truncate any area outside of range */ > if (p_start < start) > p_start = start; > -- > 2.43.0 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH 3/3] crash_core: fix and simplify the logic of crash_exclude_mem_range()
On Sat, 16 Dec 2023 11:31:04 +0800 Baoquan He wrote: > > > Imagine we have a crashkernel region 256M reserved under 4G, say [2G, > > > 2G+256M]. > > > Then after excluding the 256M from a region, it should stop. But now, > > > this patch > > > will make it continue scanning. Not sure if it's all in my mind. > > > > Hi Baoquan, > > > > Thank you for such a detailed reply. Now I finally understand why the code > > is > > written this way. > > > > However, if we can guarantee its correctness, wouldn't it be better to use > > the > > generic region removing logic? At least it is more concise and clear, and > > other > > people reading this code for the first time wouldn't get confused like me. > > > > As for your concern about the while loop, I think it wouldn't affect > > performance > > much because the total number of loops is small. > > Well, see below kexec-tools commit, you wouldn't say that. And when you > understand the code, you will feel a little uncomfortable about the > sustaining useless scanning. At least, we should stop scanning after > needed exluding is done. > > Or, we may need add a generic region removing function so that it > can be shared, e.g e820 memory region removing, memblock region removing. > Otherwise, I can't see why a specific region excluding need a generic > region removing function. So where do we now stand on this patchset? Thanks. ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH] kexec_core: fix the assignment to kimage->control_page
On Thu, 21 Dec 2023 12:23:08 +0800 Yuntao Wang wrote: > image->control_page represents the starting address for allocating the next > control page, while hole_end represents the address of the last valid byte > of the currently allocated control page. > > Therefore, after successfully allocating a control page, image->control_page > should be updated to `hole_end + 1`, rather than hole_end. Thanks. Again, please include a description of the userspace-visible effects of the bug. ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v4 5/7] kexec_file, ricv: print out debugging message if required
On Wed, 20 Dec 2023 12:22:29 +0800 Baoquan He wrote: > Could you help fix the typo in subject? > > [PATCH v4 5/7] kexec_file, ricv: print out debugging message if required >~~~ s/ricv/riscv/ I made that change. ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH] x86/kexec: use pr_err() instead of kexec_dprintk() when an error occurs
On Tue, 19 Dec 2023 15:29:01 +0800 Yuntao Wang wrote: > When an error is detected, use pr_err() instead of kexec_dprintk() to > output log message. > > In addition, remove the unnecessary return from set_page_address(). The above describes what the code does, which is already quite clear from looking at the code. Please write changelogs and code comments which describe *why* the code does something, rather than *what* it does. > > > --- a/arch/x86/kernel/kexec-bzimage64.c > +++ b/arch/x86/kernel/kexec-bzimage64.c > @@ -429,7 +429,7 @@ static void *bzImage64_load(struct kimage *image, char > *kernel, >* command line. Make sure it does not overflow >*/ > if (cmdline_len + MAX_ELFCOREHDR_STR_LEN > header->cmdline_size) { > - kexec_dprintk("Appending elfcorehdr= to command line > exceeds maximum allowed length\n"); > + pr_err("Appending elfcorehdr= to command line exceeds > maximum allowed length\n"); ie, what are the reasons for this change? > return ERR_PTR(-EINVAL); > } > > diff --git a/mm/highmem.c b/mm/highmem.c > index e19269093a93..bd48ba445dd4 100644 > --- a/mm/highmem.c > +++ b/mm/highmem.c > @@ -799,8 +799,6 @@ void set_page_address(struct page *page, void *virtual) > } > spin_unlock_irqrestore(>lock, flags); > } > - > - return; > } > > void __init page_address_init(void) > -- > 2.43.0 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v5 3/3] x86/kexec: use pr_err() instead of pr_debug() when an error occurs
On Sun, 17 Dec 2023 11:35:28 +0800 Yuntao Wang wrote: > When an error is detected, use pr_err() instead of pr_debug() to output > log message. > > In addition, remove the unnecessary return from set_page_address(). > > ... > > --- a/arch/x86/kernel/kexec-bzimage64.c > +++ b/arch/x86/kernel/kexec-bzimage64.c > @@ -424,7 +424,7 @@ static void *bzImage64_load(struct kimage *image, char > *kernel, >* command line. Make sure it does not overflow >*/ > if (cmdline_len + MAX_ELFCOREHDR_STR_LEN > header->cmdline_size) { > - pr_debug("Appending elfcorehdr= to command line exceeds > maximum allowed length\n"); > + pr_err("Appending elfcorehdr= to command line exceeds > maximum allowed length\n"); > return ERR_PTR(-EINVAL); > } https://lkml.kernel.org/r/20231213055747.61826-4-...@redhat.com has already changed this to call kexec_dprintk(). I'll skip this patch. ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH 2/2] crash_core: fix out-of-bounds access check in crash_exclude_mem_range()
On Mon, 18 Dec 2023 16:19:15 +0800 Yuntao Wang wrote: > mem->nr_ranges represents the current number of elements stored in > the mem->ranges array, and mem->max_nr_ranges represents the maximum number > of elements that the mem->ranges array can hold. Therefore, the correct > array out-of-bounds check should be mem->nr_ranges >= mem->max_nr_ranges. > This does not apply after your own "crash_core: fix and simplify the logic of crash_exclude_mem_range()". What should be done? ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH] crash_core: Fix the check for whether crashkernel is from high memory
On Sat, 9 Dec 2023 22:14:38 +0800 Yuntao Wang wrote: > If crash_base is equal to CRASH_ADDR_LOW_MAX, it also indicates that > the crashkernel memory is allocated from high memory. However, the > current check only considers the case where crash_base is greater than > CRASH_ADDR_LOW_MAX. Fix it. > > This patch also includes some minor cleanups. Can we please include a description of the runtime effects of this change? ie, what happens now and under what circumstances, and how does the fix alter these things? ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH 1/2] kexec: fix KEXEC_FILE dependencies
On Thu, 2 Nov 2023 16:03:18 +0800 Baoquan He wrote: > > > CONFIG_KEXEC_FILE, but still get purgatory code built in which is > > > totally useless. > > > > > > Not sure if I think too much over this. > > > > I see your point here, and I would suggest changing the > > CONFIG_ARCH_SUPPORTS_KEXEC_PURGATORY symbol to just indicate > > the availability of the purgatory code for the arch, rather > > than actually controlling the code itself. I already mentioned > > this for s390, but riscv would need the same thing on top. > > > > I think the change below should address your concern. > > Since no new comment, do you mind spinning v2 to wrap all these up? This patchset remains in mm-hotfixes-unstable from the previous -rc cycle. Eric, do you have any comments? Arnd, do you plan on a v2? If not, should I merge v1? If so, should I now add cc:stable? ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH] kexec: drop dependency on ARCH_SUPPORTS_KEXEC from CRASH_DUMP
On Wed, 29 Nov 2023 22:34:13 + Ignat Korchagin wrote: > On Wed, Nov 29, 2023 at 10:23 PM Andrew Morton > wrote: > > > > On Wed, 29 Nov 2023 22:04:09 + Ignat Korchagin > > wrote: > > > > > Fixes: 91506f7e5d21 ("arm64/kexec: refactor for kernel/Kconfig.kexec") > > > Cc: sta...@vger.kernel.org # 6.6+: f8ff234: kernel/Kconfig.kexec: drop > > > select of KEXEC for CRASH_DUMP > > > Cc: sta...@vger.kernel.org # 6.6+ > > > > I doubt if anyone knows what the two above lines mean. What are your > > recommendations for the merging of this patch? > > Hmm... I was just following > https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html#option-1 > and basically wanted to make sure that this patch gets backported > together with commit f8ff234: kernel/Kconfig.kexec: drop select of > KEXEC for CRASH_DUMP (they should go together) I see, thanks. I don't think I've ever received a patch which did this! ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH] kexec: drop dependency on ARCH_SUPPORTS_KEXEC from CRASH_DUMP
On Wed, 29 Nov 2023 22:04:09 + Ignat Korchagin wrote: > Fixes: 91506f7e5d21 ("arm64/kexec: refactor for kernel/Kconfig.kexec") > Cc: sta...@vger.kernel.org # 6.6+: f8ff234: kernel/Kconfig.kexec: drop select > of KEXEC for CRASH_DUMP > Cc: sta...@vger.kernel.org # 6.6+ I doubt if anyone knows what the two above lines mean. What are your recommendations for the merging of this patch? > Signed-off-by: Ignat Korchagin ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH 2/3] drivers/base/cpu: crash data showing should depends on KEXEC_CORE
On Thu, 23 Nov 2023 19:15:43 +0800 Baoquan He wrote: > > > CONFIG_KEXEC is used to enable kexec_load interface, the > > > crash_notes/crash_notes_size/crash_hotplug showing depends on > > > CONFIG_KEXEC is incorrect. It should depend on KEXEC_CORE instead. > > > > > > Fix it now. > > > > Can we add Fixes/CC stable, so it gets eventually backported into 6.6? > > Makes sense. Will add it in v2 since I need respin to add the missing > stuff in patch 1. Thanks. Please avoid mixing cc:stable patches and this-merge-window fixes in the same series as next-merge-window material. Because I'll just have to separate them out anyway, causing what-I-merged to unnecessarily differ from what-you-sent. ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH 1/2] resource: add walk_system_ram_res_rev()
On Tue, 14 Nov 2023 17:16:57 +0800 Baoquan He wrote: > This function, being a variant of walk_system_ram_res() introduced in > commit 8c86e70acead ("resource: provide new functions to walk through > resources"), walks through a list of all the resources of System RAM > in reversed order, i.e., from higher to lower. > > It will be used in kexec_file code to load kernel, initrd etc when > preparing kexec reboot. > > ... > > +/* > + * This function, being a variant of walk_system_ram_res(), calls the @func > + * callback against all memory ranges of type System RAM which are marked as > + * IORESOURCE_SYSTEM_RAM and IORESOUCE_BUSY in reversed order, i.e., from > + * higher to lower. > + */ > +int walk_system_ram_res_rev(u64 start, u64 end, void *arg, > + int (*func)(struct resource *, void *)) > +{ > + struct resource res, *rams; > + int rams_size = 16, i; > + unsigned long flags; > + int ret = -1; > + > + /* create a list */ > + rams = kvcalloc(rams_size, sizeof(struct resource), GFP_KERNEL); > + if (!rams) > + return ret; > + > + flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY; > + i = 0; > + while ((start < end) && > + (!find_next_iomem_res(start, end, flags, IORES_DESC_NONE, > ))) { > + if (i >= rams_size) { > + /* re-alloc */ > + struct resource *rams_new; > + int rams_new_size; > + > + rams_new_size = rams_size + 16; > + rams_new = kvcalloc(rams_new_size, sizeof(struct > resource), > + GFP_KERNEL); kvrealloc()? > + if (!rams_new) > + goto out; > + > + memcpy(rams_new, rams, > + sizeof(struct resource) * rams_size); > + kvfree(rams); > + rams = rams_new; > + rams_size = rams_new_size; > + } > + > + rams[i].start = res.start; > + rams[i++].end = res.end; > + > + start = res.end + 1; > + } > + > + /* go reverse */ > + for (i--; i >= 0; i--) { > + ret = (*func)([i], arg); > + if (ret) > + break; > + } > + > +out: > + kvfree(rams); > + return ret; > +} > + > /* > * This function calls the @func callback against all memory ranges, which > * are ranges marked as IORESOURCE_MEM and IORESOUCE_BUSY. > -- > 2.41.0 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v3] Crash: add lock to serialize crash hotplug handling
On Tue, 26 Sep 2023 20:09:05 +0800 Baoquan He wrote: > Eric reported that handling corresponding crash hotplug event can be > failed easily when many memory hotplug event are notified in a short > period. They failed because failing to take __kexec_lock. I'm assuming that this failure is sufficiently likely so as to justify a -stable backport of the fix. Please let me know if this is incorrect. ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH] kexec: change locking mechanism to a mutex
On Thu, 21 Sep 2023 17:59:38 -0400 Eric DeVolder wrote: > Scaled up testing has revealed that the kexec_trylock() > implementation leads to failures within the crash hotplug > infrastructure due to the inability to acquire the lock, > specifically the message: > > crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate > > When hotplug events occur, the crash hotplug infrastructure first > attempts to obtain the lock via the kexec_trylock(). However, the > implementation either acquires the lock, or fails and returns; there > is no waiting on the lock. Here is the comment/explanation from > kernel/kexec_internal.h:kexec_trylock(): > > * Whatever is used to serialize accesses to the kexec_crash_image needs to be > * NMI safe, as __crash_kexec() can happen during nmi_panic(), so here we use > a > * "simple" atomic variable that is acquired with a cmpxchg(). > > While this in theory can happen for either CPU or memory hoptlug, > this problem is most prone to occur for memory hotplug. > > When memory is hot plugged, the memory is converted into smaller > 128MiB memblocks (typically). As each memblock is processed, a > kernel thread and a udev event thread are created. The udev thread > tries for the lock via the reading of the sysfs node > /sys/devices/system/memory/crash_hotplug node, and the kernel > worker thread tries for the lock upon entering the crash hotplug > infrastructure. > > These threads then compete for the kexec lock. > > For example, a 1GiB DIMM is converted into 8 memblocks, each > spawning two threads for a total of 16 threads that create a small > "swarm" all trying to acquire the lock. The larger the DIMM, the > more the memblocks and the larger the swarm. > > At the root of the problem is the atomic lock behind kexec_trylock(); > it works well for low lock traffic; ie loading/unloading a capture > kernel, things that happen basically once. But with the introduction > of crash hotplug, the traffic through the lock increases significantly, > and more importantly in bursts occurring at roughly the same time. Thus > there is a need to wait on the lock. > > A possible workaround is to simply retry the lock, say up to N times. > There is, of course, the problem of determining a value of N that works for > all implementations, and for all the other call sites of kexec_trylock(). > Not ideal. > > The design decision to use the atomic lock is described in the comment > from kexec_internal.h, cited above. However, examining the code of > __crash_kexec(): > > if (kexec_trylock()) { > if (kexec_crash_image) { > ... > } > kexec_unlock(); > } > > reveals that the use of kexec_trylock() here is actually a "best effort" > due to the atomic lock. This atomic lock, prior to crash hotplug, > would almost always be assured (another kexec syscall could hold the lock > and prevent this, but that is about it). > > So at the point where the capture kernel would be invoked, if the lock > is not obtained, then kdump doesn't occur. > > It is possible to instead use a mutex with proper waiting, and utilize > mutex_trylock() as the "best effort" in __crash_kexec(). The use of a > mutex then avoids all the lock acquisition problems that were revealed > by the crash hotplug activity. > > Convert the atomic lock to a mutex. > > ... > > --- a/kernel/kexec_core.c > +++ b/kernel/kexec_core.c > @@ -47,7 +47,7 @@ > #include > #include "kexec_internal.h" > > -atomic_t __kexec_lock = ATOMIC_INIT(0); > +DEFINE_MUTEX(__kexec_lock); > > /* Flag to indicate we are going to kexec a new kernel */ > bool kexec_in_progress = false; > @@ -1057,7 +1057,7 @@ void __noclone __crash_kexec(struct pt_regs *regs) >* of memory the xchg(_crash_image) would be >* sufficient. But since I reuse the memory... >*/ > - if (kexec_trylock()) { > + if (mutex_trylock(&__kexec_lock)) { > if (kexec_crash_image) { > struct pt_regs fixed_regs; What's happening here? If someone else held the lock we silently fail to run the kexec? Shouldn't we at least alert the user to what just happened? ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH] kexec: change locking mechanism to a mutex
On Thu, 21 Sep 2023 17:59:38 -0400 Eric DeVolder wrote: > Scaled up testing has revealed that the kexec_trylock() > implementation leads to failures within the crash hotplug > infrastructure due to the inability to acquire the lock, > specifically the message: > > ... > > Convert the atomic lock to a mutex. > Do you think this problem is serious enough to warrant a backport into -stable kernels? ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v28 0/8] crash: Kernel handling of CPU and memory hot un/plug
On Mon, 14 Aug 2023 17:44:38 -0400 Eric DeVolder wrote: > This series is dependent upon "refactor Kconfig to consolidate > KEXEC and CRASH options". > https://lore.kernel.org/lkml/20230712161545.87870-1-eric.devol...@oracle.com/ > > Once the kdump service is loaded, if changes to CPUs or memory occur, > either by hot un/plug or off/onlining, the crash elfcorehdr must also > be updated. Thanks, I updated branch mm-nonmm-unstable to this version. ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v2 0/2] kexec: Remove unnecessary arch hook
On Thu, 23 Mar 2023 15:49:20 +0800 Baoquan He wrote: > Hi, > > On 03/07/23 at 04:44pm, Bjorn Helgaas wrote: > > From: Bjorn Helgaas > > > > There are no arch-specific things in arch_kexec_kernel_image_load(), so > > remove it and just use the generic version. > > > > v1 is at: > > https://lore.kernel.org/all/20221215182339.129803-1-helg...@kernel.org/ > > > > This v2 is trivially rebased to v6.3-rc1 and the commit log expanded > > slightly. > > This is an obvious and good cleanup patchset, who should I ping to ask > for accepting? It's touching kexec generic code, while the hook > only exists on x86 ARCH. I grabbed them ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v4] notifiers: Add tracepoints to the notifiers infrastructure
On Tue, 14 Mar 2023 18:08:37 -0300 "Guilherme G. Piccoli" wrote: > On 14/03/2023 17:50, Andrew Morton wrote: > > On Tue, 14 Mar 2023 17:00:58 -0300 "Guilherme G. Piccoli" > > wrote: > > > >> include/trace/events/notifiers.h | 69 > >> kernel/notifier.c| 6 +++ > > > > Perhaps the filenames should match, which means "notifier.h". > > Hi Andrew, thanks! > > Do you want me to re-submit? I see some emails of the patch getting > added to "mm-nonmm-unstable" (and also a checkpatch fixes you added on > top of that). Lemme know how should I proceed. I changed it, thanks. ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v4] notifiers: Add tracepoints to the notifiers infrastructure
On Tue, 14 Mar 2023 17:00:58 -0300 "Guilherme G. Piccoli" wrote: > include/trace/events/notifiers.h | 69 > kernel/notifier.c| 6 +++ Perhaps the filenames should match, which means "notifier.h". ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v4] panic: Fixes the panic_print NMI backtrace setting
On Fri, 10 Feb 2023 17:35:10 -0300 "Guilherme G. Piccoli" wrote: > Commit 8d470a45d1a6 ("panic: add option to dump all CPUs backtraces in > panic_print") > introduced a setting for the "panic_print" kernel parameter to allow > users to request a NMI backtrace on panic. Problem is that the panic_print > handling happens after the secondary CPUs are already disabled, hence > this option ended-up being kind of a no-op - kernel skips the NMI trace > in idling CPUs, which is the case of offline CPUs. > > Fix it by checking the NMI backtrace bit in the panic_print prior to > the CPU disabling function. > > ... > > Notice that while at it, I got rid of the "crash_kexec_post_notifiers" > local copy in panic(). This was introduced by commit b26e27ddfd2a > ("kexec: use core_param for crash_kexec_post_notifiers boot option"), > but it is not clear from comments or commit message why this local copy > is required. > > My understanding is that it's a mechanism to prevent some concurrency, > in case some other CPU modify this variable while panic() is running. > I find it very unlikely, hence I removed it - but if people consider > this copy needed, I can respin this patch and keep it, even providing a > comment about that, in order to be explict about its need. Only two sites change crash_kexec_post_notifiers, in arch/powerpc/kernel/fadump.c and drivers/hv/hv_common.c. Yes, it's very unlikely that this will be altered while panic() is running and the consequences will be slight anyway. But formally, we shouldn't do this, especially in a -stable backportable patch. So please, let's have the minimal bugfix for now and we can look at removing that local at a later time? ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH AUTOSEL 6.1 13/22] proc/vmcore: fix potential memory leak in vmcore_init()
On Sat, 17 Dec 2022 10:27:14 -0500 Sasha Levin wrote: > From: Jianglei Nie > > [ Upstream commit 12b9d301ff73122aebd78548fa4c04ca69ed78fe ] > > Patch series "Some minor cleanup patches resent". > > The first three patches trivial clean up patches. > > And for the patch "kexec: replace crash_mem_range with range", I got a > ibm-p9wr ppc64le system to test, it works well. > > This patch (of 4): > > elfcorehdr_alloc() allocates a memory chunk for elfcorehdr_addr with > kzalloc(). If is_vmcore_usable() returns false, elfcorehdr_addr is a > predefined value. If parse_crash_elf_headers() gets some error and > returns a negetive value, the elfcorehdr_addr should be released with > elfcorehdr_free(). This is exceedingly minor - a single memory leak per boot, under very rare circumstances. With every patch I merge I consider -stable. Often I'll discuss the desirability of a backport with the author and with reviewers. Every single patch. And then some damn script comes along and overrides that quite careful decision. argh. Can we please do something like if (akpm && !cc:stable) dont_backport() And even go further - if your script thinks it might be something we should backport and if it didn't have cc:stable then contact the author, reviewers and committers and ask them to reconsider before we go and backport it. This approach will have the advantage of training people to consider the backport more consistently. I'd (still) like to have a new patch tag like Not-For-Stable: or cc:not-stable or something to tell your scripts "yes, we thought about it and we decided no". ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH] vmcoreinfo: Warn if we exceed vmcoreinfo data size
On Thu, 27 Oct 2022 13:50:08 -0700 Stephen Brennan wrote: > Though vmcoreinfo is intended to be small, at just one page, useful > information is still added to it, so we risk running out of space. > Currently there is no runtime check to see whether the vmcoreinfo buffer > has been exhausted. Add a warning for this case. > > Currently, my static checking tool[1] indicates that a good upper bound > for vmcoreinfo size is currently 3415 bytes, but the best time to add > warnings is before the risk becomes too high. > > ... > > --- a/kernel/crash_core.c > +++ b/kernel/crash_core.c > @@ -383,6 +383,9 @@ void vmcoreinfo_append_str(const char *fmt, ...) > memcpy(_data[vmcoreinfo_size], buf, r); > > vmcoreinfo_size += r; > + > + WARN_ONCE(vmcoreinfo_size == VMCOREINFO_BYTES, > + "vmcoreinfo data exceeds allocated size, truncating"); > } Seems that vmcoreinfo_append_str() will truncate (ie: corrupt) the final entry when limiting the overall data size to VMCOREINFO_BYTES. And that final entry will be missing any terminating \n or \0. Is all this desirable, or should we be checking for (and warning about) sufficient space _before_ appending this string? ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v4 0/2] kexec, panic: Making crash_kexec() NMI safe
On Mon, 03 Oct 2022 14:20:51 +0100 Valentin Schneider wrote: > > I'll stash it away for consideration after -rc1. > > I've seen them in linux-next for a while, am I right in assuming they'll be > part of a 6.1 PR? yup. ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH] vmcoreinfo: add kallsyms_num_syms symbol
On Thu, 25 Aug 2022 10:27:04 -0700 Stephen Brennan wrote: > I just wanted to check on the status of this. I got your email about it > being added to mm-hotfixes-unstable, and I do still see it in the quilt > patches set: > > https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/vmcoreinfo-add-kallsyms_num_syms-symbol.patch > > I see the patch in mm-everything-2022-08-15-02-47 but it seems to have > dropped out in mm-everything-2022-08-17-20-59 and later tags. As a > result, it no longer shows up in linux-next. > > Please excuse me if I'm ignorant of part of your process here, I know > you handle a huge volume of patches! Thanks for noticing this - I appear to have messed up a rebase and lost a batch of hotfixes :( I readded mm-hugetlb-avoid-corrupting-page-mapping-in-hugetlb_mcopy_atomic_pte.patch shmem-update-folio-if-shmem_replace_page-updates-the-page.patch writeback-avoid-use-after-free-after-removing-device.patch mailmap-update-guilherme-g-piccolis-email-addresses.patch vmcoreinfo-add-kallsyms_num_syms-symbol.patch mm-re-allow-pinning-of-zero-pfns-again.patch to the hotfixes queue. ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v4 0/2] kexec, panic: Making crash_kexec() NMI safe
On Tue, 12 Jul 2022 12:13:03 +0100 Valentin Schneider wrote: > On 12/07/22 10:47, Baoquan He wrote: > > Hi, > > > > On 06/30/22 at 11:32pm, Valentin Schneider wrote: > >> Hi folks, > >> > >> Here's ~v3~ v4 where we now completely get rid of kexec_mutex. > >> > >> o Patch 1 makes sure all kexec_mutex acquisitions are trylocks. This > >> prevents > >> having to add any while(atomic_cmpxchg()) loops which I'd really hate to > >> see > >> here. If that can't be done then I think we're better off with the > >> combined > >> mutex+atomic var approach. > >> o Patch 2 does the mutex -> atomic var switch. > > > > This series looks good, has it been taken into any tree? > > > > I don't think so, briefly poked around git and haven't seen it anywhere. I'll stash it away for consideration after -rc1. ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH 0/2] Expose kallsyms data in vmcoreinfo note
On Mon, 13 Jun 2022 14:59:44 -0700 Stephen Brennan wrote: > >> Related discussion around the BTF side of this: > >> https://lore.kernel.org/bpf/586a6288-704a-f7a7-b256-e18a67592...@oracle.com/T/#u > >> > >> Some work-in-progress branches using this feature: > >> https://github.com/brenns10/dwarves/tree/remove_percpu_restriction_1 > >> https://github.com/brenns10/drgn/tree/kallsyms_plus_btf > > > > What's the story on using gdb with this? > > There is no story with GDB as of yet. I was already familiar with the > code of drgn when I started down this path, so that's what I used. Drgn > happens to have a very extensible type system which made it quite simple > to do. I'd love to see support for doing this with GDB, and might look > into the feasibility of it, but it's not on my roadmap right now. Naive question - could some standalone tool take this kallsyms-based info, combine it with a core image and create a minimally-dwarfified file which any debugger can munch on? ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH 0/2] Expose kallsyms data in vmcoreinfo note
On Mon, 16 May 2022 17:05:06 -0700 Stephen Brennan wrote: > The kernel can be configured to contain a lot of introspection or > debugging information built-in, such as ORC for unwinding stack traces, > BTF for type information, and of course kallsyms. Debuggers could use > this information to navigate a core dump or live system, but they need > to be able to find it. > > This patch series adds the necessary symbols into vmcoreinfo, which > would allow a debugger to find and interpret the kallsyms table. Using > the kallsyms data, the debugger can then lookup any symbol, allowing it > to find ORC, BTF, or any other useful data. > > This would allow a live kernel, or core dump, to be debugged without > any DWARF debuginfo. This is useful for many cases: the debuginfo may > not have been generated, or you may not want to deploy the large files > everywhere you need them. Am trying to understand the value of all of this. Can you explain further why carrying the dwarf info is problematic? How problematic are these large files? > I've demonstrated a proof of concept for this at LSF/MM+BPF during a > lighting talk. Using a work-in-progress branch of the drgn debugger, and > an extended set of BTF generated by a patched version of dwarves, I've > been able to open a core dump without any DWARF info and do basic tasks > such as enumerating slab caches, block devices, tasks, and doing > backtraces. I hope this series can be a first step toward a new > possibility of "DWARFless debugging". > > Related discussion around the BTF side of this: > https://lore.kernel.org/bpf/586a6288-704a-f7a7-b256-e18a67592...@oracle.com/T/#u > > Some work-in-progress branches using this feature: > https://github.com/brenns10/dwarves/tree/remove_percpu_restriction_1 > https://github.com/brenns10/drgn/tree/kallsyms_plus_btf What's the story on using gdb with this? ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH] kexec_file: Drop weak attribute from arch_kexec_apply_relocations[_add]
On Fri, 20 May 2022 14:25:05 -0500 "Eric W. Biederman" wrote: > > I am not strongly against taking off __weak, just wondering if there's > > chance to fix it in recordmcount, and the cost comparing with kernel fix; > > except of this issue, any other weakness of __weak. Noticed Andrew has > > picked this patch, as a witness of this moment, raise a tiny concern. > > I just don't see what else we can realistically do. I think converting all of the kexec __weaks to use the ifdef approach makes sense, if only because kexec is now using two different styles. But for now, I'll send Naveen's v2 patch in to Linus to get us out of trouble. I'm thinking that we should add cc:stable to that patch as well, to reduce the amount of problems which people experience when using newer binutils on older kernels? ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v2] kexec_file: Drop weak attribute from arch_kexec_apply_relocations[_add]
On Thu, 19 May 2022 23:17:48 +0800 kernel test robot wrote: > Hi "Naveen, > > I love your patch! Yet something to improve: > > [auto build test ERROR on f993aed406eaf968ba3867a76bb46c95336a33d0] > > url: > https://github.com/intel-lab-lkp/linux/commits/Naveen-N-Rao/kexec_file-Drop-weak-attribute-from-arch_kexec_apply_relocations-_add/20220519-171432 > base: f993aed406eaf968ba3867a76bb46c95336a33d0 > config: s390-allmodconfig > (https://download.01.org/0day-ci/archive/20220519/202205192320.coxevcfr-...@intel.com/config) > compiler: s390-linux-gcc (GCC) 11.3.0 > reproduce (this is a W=1 build): > wget > https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O > ~/bin/make.cross > chmod +x ~/bin/make.cross > # > https://github.com/intel-lab-lkp/linux/commit/67171688c71cb5b05f26e0dfc45eec8d8d1428ff > git remote add linux-review https://github.com/intel-lab-lkp/linux > git fetch --no-tags linux-review > Naveen-N-Rao/kexec_file-Drop-weak-attribute-from-arch_kexec_apply_relocations-_add/20220519-171432 > git checkout 67171688c71cb5b05f26e0dfc45eec8d8d1428ff > # save the config file > mkdir build_dir && cp config build_dir/.config > COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross W=1 > O=build_dir ARCH=s390 SHELL=/bin/bash arch/s390/ > > If you fix the issue, kindly add following tag as appropriate > Reported-by: kernel test robot > > All errors (new ones prefixed by >>): > >In file included from arch/s390/kernel/machine_kexec_reloc.c:3: > >> arch/s390/include/asm/kexec.h:89:38: error: unknown type name 'Elf_Shdr'; > >> did you mean 'elf_shdr'? > 89 | Elf_Shdr *section, > | ^~~~ > | elf_shdr >arch/s390/include/asm/kexec.h:90:44: error: unknown type name 'Elf_Shdr' > 90 | const Elf_Shdr *relsec, > |^~~~ >arch/s390/include/asm/kexec.h:91:44: error: unknown type name 'Elf_Shdr' > 91 | const Elf_Shdr *symtab); > |^~~~ Thanks, I did this: --- a/arch/s390/include/asm/kexec.h~kexec_file-drop-weak-attribute-from-arch_kexec_apply_relocations-fix +++ a/arch/s390/include/asm/kexec.h @@ -9,6 +9,8 @@ #ifndef _S390_KEXEC_H #define _S390_KEXEC_H +#include + #include #include #include _ ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH] kexec_file: Drop weak attribute from arch_kexec_apply_relocations[_add]
On Wed, 18 May 2022 23:48:28 +0530 "Naveen N. Rao" wrote: > Since commit d1bcae833b32f1 ("ELF: Don't generate unused section > symbols") [1], binutils (v2.36+) started dropping section symbols that > it thought were unused. This isn't an issue in general, but with > kexec_file.c, gcc is placing kexec_arch_apply_relocations[_add] into a > separate .text.unlikely section and the section symbol ".text.unlikely" > is being dropped. Due to this, recordmcount is unable to find a non-weak > symbol in .text.unlikely to generate a relocation record against. > > Address this by dropping the weak attribute from these functions: > - arch_kexec_apply_relocations() is not overridden by any architecture > today, so just drop the weak attribute. > - arch_kexec_apply_relocations_add() is only overridden by x86 and s390. > Retain the function prototype for those and move the weak > implementation into the header as a static inline for other > architectures. > > ... > Sigh. This patch demonstrates why I like __weak :< > --- a/include/linux/kexec.h > +++ b/include/linux/kexec.h > @@ -229,6 +225,30 @@ extern int crash_exclude_mem_range(struct crash_mem *mem, > unsigned long long mend); > extern int crash_prepare_elf64_headers(struct crash_mem *mem, int kernel_map, > void **addr, unsigned long *sz); > + > +#if defined(CONFIG_X86_64) || defined(CONFIG_S390) Let's avoid listing the architectures here? Better to add select ARCH_HAVE_ARCH_KEXEC_APPLY_RELOCATIONS_ADD to arch//Kconfig? Please cc me on any additional work on this. ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v2 0/2] x86/kexec: fix memory leak of elf header buffer
On Thu, 31 Mar 2022 19:32:17 +0800 Baoquan He wrote: > Hi Andrew, > > On 02/23/22 at 07:32pm, Baoquan He wrote: > > The memory leak is reported by kmemleak detector, has been existing > > for very long time. It casue much memory loss on large machine > > with huge memory hotplug which will trigger kdump kernel reloading > > many times, with kexec_file_load interface. > > Could you merge these two patches? Or should I ping x86 maintainers to > take them? Ah, sorry, I tend to fall asleep if there's "x86" in the subject. Poking a sleeping Andrew is always the right thing to do. Shall look at them. ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v5 0/3] Convert vmcore to use an iov_iter
> On Sat, 2 Apr 2022 12:30:05 +0800 Baoquan He wrote: You were on the patch delivery path, so these patches should have had your signoff. Documentation/process/submitting-patches.rst explains. > Copy the description of v3 cover letter from Willy: > === > For some reason several people have been sending bad patches to fix > compiler warnings in vmcore recently. Here's how it should be done. > Compile-tested only on x86. As noted in the first patch, s390 should > take this conversion a bit further, but I'm not inclined to do that > work myself. We should tell the S390 maintainers this! Can you please fix the signoff issue, add the S390 team to Cc and resend? ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v4 0/4] Convert vmcore to use an iov_iter
On Thu, 31 Mar 2022 15:36:39 +0100 Matthew Wilcox wrote: > On Thu, Mar 31, 2022 at 07:25:33PM +0800, Baoquan He wrote: > > Hi Andrew, > > > > On 03/18/22 at 05:37pm, Baoquan He wrote: > > > Copy the description of v3 cover letter from Willy: > > > > Could you pick this series into your tree? I reviewed the patches 1~3 > > and tested the whole patchset, no issue found. > > ... I'd fold patch 4 into patch 1, I think so too, please. The addition then removal of a read_from_oldmem() implementation is a bit odd. > but yes, Andrew, please take these patches. And against current -linus please. There have been some changes since then (rcu stuff). ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v2] proc/vmcore: fix possible deadlock on concurrent mmap and read
On Wed, 19 Jan 2022 20:34:17 +0100 David Hildenbrand wrote: > Lockdep noticed that there is chance for a deadlock if we have > concurrent mmap, concurrent read, and the addition/removal of a > callback. > > As nicely explained by Boqun: > > " > Lockdep warned about the above sequences because rw_semaphore is a fair > read-write lock, and the following can cause a deadlock: > > TASK 1 TASK 2 TASK 3 > == == == > down_write(mmap_lock); > down_read(vmcore_cb_rwsem) > down_write(vmcore_cb_rwsem); // > blocked > down_read(vmcore_cb_rwsem); // cannot get the lock because of the > fairness > down_read(mmap_lock); // blocked I'm wondering about cc:stable. It's hard to believe that this is likely to be observed in real life. But the ongoing reports of lockdep splats will be irritating. ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v3 0/5] Avoid requesting page from DMA zone when no managed pages
On Mon, 13 Dec 2021 20:27:07 +0800 Baoquan He wrote: > Background information can be checked in cover letter of v2 RESEND POST > as below: > https://lore.kernel.org/all/20211207030750.30824-1-...@redhat.com/T/#u Please include all relevant info right here, in the [0/n]. For a number of reasons, one of which is that the text is more likely to be up to date as the patchset evolves. It's unusual that this patchset has two non-urgent patches and the final three patches are cc:stable. It makes one worry that patches 3-5 might have dependencies on 1-2. Also, I'd expect to merge the three -stable patches during 5.16-rcX which means I have to reorder things, redo changelogs, update links and blah blah. So can I ask that you redo all of this as two patch series? A 3-patch series which is targeted at -stable, followed by a separate two-patch series which is targeted at 5.17-rc1. Each series with its own fully prepared [0/n] cover. Thanks. ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH 1/2] kdump: vmcore: move copy_to() from vmcore.c to uaccess.h
On Fri, 10 Dec 2021 21:36:00 +0800 Tiezhu Yang wrote: > In arch/*/kernel/crash_dump*.c, there exist similar code about > copy_oldmem_page(), move copy_to() from vmcore.c to uaccess.h, > and then we can use copy_to() to simplify the related code. > > ... > > --- a/fs/proc/vmcore.c > +++ b/fs/proc/vmcore.c > @@ -238,20 +238,6 @@ copy_oldmem_page_encrypted(unsigned long pfn, char *buf, > size_t csize, > return copy_oldmem_page(pfn, buf, csize, offset, userbuf); > } > > -/* > - * Copy to either kernel or user space > - */ > -static int copy_to(void *target, void *src, size_t size, int userbuf) > -{ > - if (userbuf) { > - if (copy_to_user((char __user *) target, src, size)) > - return -EFAULT; > - } else { > - memcpy(target, src, size); > - } > - return 0; > -} > - > #ifdef CONFIG_PROC_VMCORE_DEVICE_DUMP > static int vmcoredd_copy_dumps(void *dst, u64 start, size_t size, int > userbuf) > { > diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h > index ac03940..4a6c3e4 100644 > --- a/include/linux/uaccess.h > +++ b/include/linux/uaccess.h > @@ -201,6 +201,20 @@ copy_to_user(void __user *to, const void *from, unsigned > long n) > return n; > } > > +/* > + * Copy to either kernel or user space > + */ > +static inline int copy_to(void *target, void *src, size_t size, int userbuf) > +{ > + if (userbuf) { > + if (copy_to_user((char __user *) target, src, size)) > + return -EFAULT; > + } else { > + memcpy(target, src, size); > + } > + return 0; > +} > + Ordinarily I'd say "this is too large to be inlined". But the function has only a single callsite per architecture so inlining it won't cause bloat at present. But hopefully copy_to() will get additional callers in the future, in which case it shouldn't be inlined. So I'm thinking it would be best to start out with this as a regular non-inlined function, in lib/usercopy.c. Also, copy_to() is a very poor name for a globally-visible helper function. Better would be copy_to_user_or_kernel(), although that's perhaps a bit long. And the `userbuf' arg should have type bool, yes? ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH RESEND v2 0/5] Avoid requesting page from DMA zone when no managed pages
On Mon, 6 Dec 2021 22:03:59 -0600 John Donnelly wrote: > On 12/6/21 9:16 PM, Baoquan He wrote: > > Sorry, forgot adding x86 and x86/mm maintainers > > Hi, > >These commits need applied to Linux-5.15.0 (LTS) too since it has the > original regression : > > 1d659236fb43 ("dma-pool: scale the default DMA coherent pool > size with memory capacity") > > Maybe add "Fixes" to the other commits ? And cc:stable, please. "Fixes:" doesn't always mean "should be backported". ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v2] proc/vmcore: fix clearing user buffer by properly using clear_user()
On Fri, 12 Nov 2021 10:27:50 +0100 David Hildenbrand wrote: > To clear a user buffer we cannot simply use memset, we have to use > clear_user(). With a virtio-mem device that registers a vmcore_cb and has > some logically unplugged memory inside an added Linux memory block, I can > easily trigger a BUG by copying the vmcore via "cp": > > ... > > Some x86-64 CPUs have a CPU feature called "Supervisor Mode Access > Prevention (SMAP)", which is used to detect wrong access from the kernel to > user buffers like this: SMAP triggers a permissions violation on wrong > access. In the x86-64 variant of clear_user(), SMAP is properly > handled via clac()+stac(). > > To fix, properly use clear_user() when we're dealing with a user buffer. > I added cc:stable, OK? ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v3 8/9] mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA
On Tue, 8 Jun 2021 12:13:15 +0300 Mike Rapoport wrote: > From: Mike Rapoport > > After removal of DISCINTIGMEM the NEED_MULTIPLE_NODES and NUMA > configuration options are equivalent. > > Drop CONFIG_NEED_MULTIPLE_NODES and use CONFIG_NUMA instead. > > Done with > > $ sed -i 's/CONFIG_NEED_MULTIPLE_NODES/CONFIG_NUMA/' \ > $(git grep -wl CONFIG_NEED_MULTIPLE_NODES) > $ sed -i 's/NEED_MULTIPLE_NODES/NUMA/' \ > $(git grep -wl NEED_MULTIPLE_NODES) > > with manual tweaks afterwards. > > ... > > --- a/include/linux/mmzone.h > +++ b/include/linux/mmzone.h > @@ -987,7 +987,7 @@ extern int movable_zone; > #ifdef CONFIG_HIGHMEM > static inline int zone_movable_is_highmem(void) > { > -#ifdef CONFIG_NEED_MULTIPLE_NODES > +#ifdef CONFIG_NUMA > return movable_zone == ZONE_HIGHMEM; > #else > return (ZONE_MOVABLE - 1) == ZONE_HIGHMEM; I dropped this hunk - your "mm/mmzone.h: simplify is_highmem_idx()" removed zone_movable_is_highmem(). ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH] crash_core, vmcoreinfo: Append 'SECTION_SIZE_BITS' to vmcoreinfo
On Tue, 8 Jun 2021 22:24:32 +0800 Baoquan He wrote: > On 06/08/21 at 06:33am, Pingfan Liu wrote: > > As mentioned in kernel commit 1d50e5d0c505 ("crash_core, vmcoreinfo: > > Append 'MAX_PHYSMEM_BITS' to vmcoreinfo"), SECTION_SIZE_BITS in the > > formula: > > #define SECTIONS_SHIFT(MAX_PHYSMEM_BITS - SECTION_SIZE_BITS) > > > > Besides SECTIONS_SHIFT, SECTION_SIZE_BITS is also used to calculate > > PAGES_PER_SECTION in makedumpfile just like kernel. > > > > Unfortunately, this arch-dependent macro SECTION_SIZE_BITS changes, e.g. > > recently in kernel commit f0b13ee23241 ("arm64/sparsemem: reduce > > SECTION_SIZE_BITS"). But user space wants a stable interface to get this > > info. Such info is impossible to be deduced from a crashdump vmcore. > > Hence append SECTION_SIZE_BITS to vmcoreinfo. > > ... > > Add the discussion of the original thread in kexec ML for reference: > http://lists.infradead.org/pipermail/kexec/2021-June/022676.html I added a Link: for this. > This looks good to me. > > Acked-by: Baoquan He I'm thinking we should backport this at least to Fixes:f0b13ee23241. But perhaps it's simpler to just backport it as far as possible, so I added a bare cc:stable with no Fixes:. Thoughts? ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH V2 4/6] mm: rename the global section array to mem_sections
On Wed, 2 Jun 2021 01:11:07 + HAGIO KAZUHITO(萩尾 一仁) wrote: > -Original Message- > > On Tue, 1 Jun 2021 10:40:09 +0200 David Hildenbrand > > wrote: > > > > > > Thanks, i explained the reason during my last reply. > > > > Andrew has already picked this patch to -mm tree. > > > > > > Just because it's in Andrews tree doesn't mean it will end up upstream. ;) > > > > > > Anyhow, no really strong opinion, it's simply unnecessary code churn > > > that makes bisecting harder without real value IMHO. > > > > I think it's a good change - mem_sections refers to multiple instances > > of a mem_section. Churn is a pain, but that's the price we pay for more > > readable code. And for having screwed it up originally ;) > > >From a makedumpfile/crash-utility viewpoint, I don't deny kernel improvement > and probably the change will not be hard for them to support, but I'd like > you to remember that the tool users will need to update them for the change. > > The situation where we need to update the tools for new kernels is usual, but > there are not many cases that they cannot even start session, and this change > will cause it. Personally I wonder the change is worth forcing users to > update > them. Didn't know that. I guess I'll drop it then. We could do an assembly-level alias I assume.. ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH V2 4/6] mm: rename the global section array to mem_sections
On Tue, 1 Jun 2021 10:40:09 +0200 David Hildenbrand wrote: > > Thanks, i explained the reason during my last reply. > > Andrew has already picked this patch to -mm tree. > > Just because it's in Andrews tree doesn't mean it will end up upstream. ;) > > Anyhow, no really strong opinion, it's simply unnecessary code churn > that makes bisecting harder without real value IMHO. I think it's a good change - mem_sections refers to multiple instances of a mem_section. Churn is a pain, but that's the price we pay for more readable code. And for having screwed it up originally ;) ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v1 1/1] kernel.h: Split out panic and oops helpers
On Wed, 7 Apr 2021 11:46:37 +0300 Andy Shevchenko wrote: > On Wed, Apr 7, 2021 at 11:17 AM Kees Cook wrote: > > > > On Tue, Apr 06, 2021 at 04:31:58PM +0300, Andy Shevchenko wrote: > > > kernel.h is being used as a dump for all kinds of stuff for a long time. > > > Here is the attempt to start cleaning it up by splitting out panic and > > > oops helpers. > > > > > > At the same time convert users in header and lib folder to use new header. > > > Though for time being include new header back to kernel.h to avoid twisted > > > indirected includes for existing users. > > > > > > Signed-off-by: Andy Shevchenko > > > > I like it! Do you have a multi-arch CI to do allmodconfig builds to > > double-check this? > > Unfortunately no, I rely on plenty of bots that are harvesting mailing lists. > > But I will appreciate it if somebody can run this through various build tests. > um, did you try x86_64 allmodconfig? I'm up to kernelh-split-out-panic-and-oops-helpers-fix-fix-fix-fix-fix-fix-fix.patch and counting. From: Andrew Morton Subject: kernelh-split-out-panic-and-oops-helpers-fix more files need panic_notifier.h Cc: Andy Shevchenko Signed-off-by: Andrew Morton --- arch/x86/xen/enlighten.c|1 + drivers/video/fbdev/hyperv_fb.c |1 + 2 files changed, 2 insertions(+) --- a/arch/x86/xen/enlighten.c~kernelh-split-out-panic-and-oops-helpers-fix +++ a/arch/x86/xen/enlighten.c @@ -6,6 +6,7 @@ #include #include #include +#include #include #include --- a/drivers/video/fbdev/hyperv_fb.c~kernelh-split-out-panic-and-oops-helpers-fix +++ a/drivers/video/fbdev/hyperv_fb.c @@ -52,6 +52,7 @@ #include #include #include +#include #include #include _ From: Andrew Morton Subject: kernelh-split-out-panic-and-oops-helpers-fix-fix arch/x86/purgatory/purgatory.c needs kernel.h Cc: Andy Shevchenko Signed-off-by: Andrew Morton --- arch/x86/purgatory/purgatory.c |1 + 1 file changed, 1 insertion(+) --- a/arch/x86/purgatory/purgatory.c~kernelh-split-out-panic-and-oops-helpers-fix-fix +++ a/arch/x86/purgatory/purgatory.c @@ -8,6 +8,7 @@ * Vivek Goyal */ +#include #include #include #include _ From: Andrew Morton Subject: kernelh-split-out-panic-and-oops-helpers-fix-fix-fix drivers/clk/analogbits/wrpll-cln28hpc.c needs minmax.h, math.h and limits.h Cc: Andy Shevchenko Signed-off-by: Andrew Morton --- drivers/clk/analogbits/wrpll-cln28hpc.c |4 1 file changed, 4 insertions(+) --- a/drivers/clk/analogbits/wrpll-cln28hpc.c~kernelh-split-out-panic-and-oops-helpers-fix-fix-fix +++ a/drivers/clk/analogbits/wrpll-cln28hpc.c @@ -25,6 +25,10 @@ #include #include #include +#include +#include +#include + #include /* MIN_INPUT_FREQ: minimum input clock frequency, in Hz (Fref_min) */ _ From: Andrew Morton Subject: kernelh-split-out-panic-and-oops-helpers-fix-fix-fix-fix drivers/misc/pvpanic/pvpanic.c needs panic_notifier.h Cc: Andy Shevchenko Signed-off-by: Andrew Morton --- drivers/misc/pvpanic/pvpanic.c |1 + 1 file changed, 1 insertion(+) --- a/drivers/misc/pvpanic/pvpanic.c~kernelh-split-out-panic-and-oops-helpers-fix-fix-fix-fix +++ a/drivers/misc/pvpanic/pvpanic.c @@ -13,6 +13,7 @@ #include #include #include +#include #include #include #include _ From: Andrew Morton Subject: kernelh-split-out-panic-and-oops-helpers-fix-fix-fix-fix-fix fix drivers/misc/pvpanic/pvpanic.c and drivers/net/ipa/ipa_smp2p.c Cc: Andy Shevchenko Signed-off-by: Andrew Morton --- drivers/net/ipa/ipa_smp2p.c |1 + 1 file changed, 1 insertion(+) --- a/drivers/net/ipa/ipa_smp2p.c~kernelh-split-out-panic-and-oops-helpers-fix-fix-fix-fix-fix +++ a/drivers/net/ipa/ipa_smp2p.c @@ -8,6 +8,7 @@ #include #include #include +#include #include #include _ From: Andrew Morton Subject: kernelh-split-out-panic-and-oops-helpers-fix-fix-fix-fix-fix-fix fix drivers/power/reset/ltc2952-poweroff.c and drivers/misc/bcm-vk/bcm_vk_dev.c Cc: Andy Shevchenko Signed-off-by: Andrew Morton --- drivers/misc/bcm-vk/bcm_vk_dev.c |1 + drivers/power/reset/ltc2952-poweroff.c |1 + 2 files changed, 2 insertions(+) --- a/drivers/power/reset/ltc2952-poweroff.c~kernelh-split-out-panic-and-oops-helpers-fix-fix-fix-fix-fix-fix +++ a/drivers/power/reset/ltc2952-poweroff.c @@ -52,6 +52,7 @@ #include #include #include +#include #include #include #include --- a/drivers/misc/bcm-vk/bcm_vk_dev.c~kernelh-split-out-panic-and-oops-helpers-fix-fix-fix-fix-fix-fix +++ a/drivers/misc/bcm-vk/bcm_vk_dev.c @@ -9,6 +9,7 @@ #include #include #include +#include #include #include #include _ From: Andrew Morton Subject: kernelh-split-out-panic-and-oops-helpers-fix-fix-fix-fix-fix-fix-fix fix drivers/leds/trigger/ledtrig-panic.c and drivers/firmware/google/gsmi.c Cc: Andy Sh
Re: [EXTERNAL] Re: [PATCH] kexec: Add kexec reboot string
On Thu, 11 Mar 2021 18:14:19 + Joe LeVeque wrote: > Is this all your looking for? If not, please let me know. > > > Signed-off-by: Joe LeVeque Yes, thanks. ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH] kexec: Add kexec reboot string
On Thu, 4 Mar 2021 13:46:26 +0100 Paul Menzel wrote: > From: Joe LeVeque > > The purpose is to notify the kernel module for fast reboot. > > Upstream a patch from the SONiC network operating system [1]. > > [1]: https://github.com/Azure/sonic-linux-kernel/pull/46 > > Signed-off-by: Paul Menzel We should have Joe's signed-off-by: for this. Joe, can you please send it? ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH] Only allow to set crash_kexec_post_notifiers on boot time
On Fri, 18 Sep 2020 11:25:46 +0800 Dave Young wrote: > crash_kexec_post_notifiers enables running various panic notifier > before kdump kernel booting. This increases risks of kdump failure. > It is well documented in kernel-parameters.txt. We do not suggest > people to enable it together with kdump unless he/she is really sure. > This is also not suggested to be enabled by default when users are > not aware in distributions. > > But unfortunately it is enabled by default in systemd, see below > discussions in a systemd report, we can not convince systemd to change > it: > https://github.com/systemd/systemd/issues/16661 > > Actually we have got reports about kdump kernel hangs in both s390x > and powerpcle cases caused by the systemd change, also some x86 cases > could also be caused by the same (although that is in Hyper-V code > instead of systemd, that need to be addressed separately). > > Thus to avoid the auto enablement here just disable the param writable > permission in sysfs. > Well. I don't think this is at all a desirable way of resolving a disagreement with the systemd developers At the above github address I'm seeing "ryncsn added a commit to ryncsn/systemd that referenced this issue 9 days ago", "pstore: don't enable crash_kexec_post_notifiers by default". So didn't that address the issue? ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v2][RFC] kdump: append kernel build-id string to VMCOREINFO
On Wed, 10 Jun 2020 21:27:52 -0700 Vijay Balakrishna wrote: > Make kernel GNU build-id available in VMCOREINFO. Having > build-id in VMCOREINFO facilitates presenting appropriate kernel > namelist image with debug information file to kernel crash dump > analysis tools. Currently VMCOREINFO lacks uniquely identifiable > key for crash analysis automation. > > Regarding if this patch is necessary or matching of linux_banner > and OSRELEASE in VMCOREINFO employed by crash(8) meets the > need -- IMO, build-id approach more foolproof, in most instances it > is a cryptographic hash generated using internal code/ELF bits unlike > kernel version string upon which linux_banner is based that is > external to the code. I feel each is intended for a different purpose. > Also OSRELEASE is not suitable when two different kernel builds > from same version with different features enabled. > > Currently for most linux (and non-linux) systems build-id can be > extracted using standard methods for file types such as user mode crash > dumps, shared libraries, loadable kernel modules etc., This is an > exception for linux kernel dump. Having build-id in VMCOREINFO brings > some uniformity for automation tools. > > ... > > --- a/kernel/crash_core.c > +++ b/kernel/crash_core.c > @@ -11,6 +11,8 @@ > #include > #include > > +#include > + > /* vmcoreinfo stuff */ > unsigned char *vmcoreinfo_data; > size_t vmcoreinfo_size; > @@ -376,6 +378,53 @@ phys_addr_t __weak paddr_vmcoreinfo_note(void) > } > EXPORT_SYMBOL(paddr_vmcoreinfo_note); > > +#define NOTES_SIZE (&__stop_notes - &__start_notes) > +#define BUILD_ID_MAX SHA1_DIGEST_SIZE > +#define NT_GNU_BUILD_ID 3 > + > +struct elf_note_section { > + struct elf_note n_hdr; > + u8 n_data[]; > +}; > + > +/* > + * Add build ID from .notes section as generated by the GNU ld(1) > + * or LLVM lld(1) --build-id option. > + */ > +static void add_build_id_vmcoreinfo(void) > +{ > + char build_id[BUILD_ID_MAX * 2 + 1]; > + int n_remain = NOTES_SIZE; > + > + while (n_remain >= sizeof(struct elf_note)) { > + const struct elf_note_section *note_sec = > + &__start_notes + NOTES_SIZE - n_remain; > + const u32 n_namesz = note_sec->n_hdr.n_namesz; > + > + if (note_sec->n_hdr.n_type == NT_GNU_BUILD_ID && > + n_namesz != 0 && > + !strcmp((char *)_sec->n_data[0], "GNU")) { Is it guaranteed that n_data[] is null-terminated? > + if (note_sec->n_hdr.n_descsz <= BUILD_ID_MAX) { > + const u32 n_descsz = note_sec->n_hdr.n_descsz; > + const u8 *s = _sec->n_data[n_namesz]; > + > + s = PTR_ALIGN(s, 4); > + bin2hex(build_id, s, n_descsz); > + build_id[2 * n_descsz] = '\0'; > + VMCOREINFO_BUILD_ID(build_id); > + return; > + } > + pr_warn("Build ID is too large to include in > vmcoreinfo: %u > %u\n", > + note_sec->n_hdr.n_descsz, > + BUILD_ID_MAX); > + return; > + } > + n_remain -= sizeof(struct elf_note) + > + ALIGN(note_sec->n_hdr.n_namesz, 4) + > + ALIGN(note_sec->n_hdr.n_descsz, 4); > + } > +} > + > static int __init crash_save_vmcoreinfo_init(void) > { > vmcoreinfo_data = (unsigned char *)get_zeroed_page(GFP_KERNEL); ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v2] kexec: Do not verify the signature without the lockdown or mandatory signature
On Tue, 2 Jun 2020 12:59:52 +0800 Lianbo Jiang wrote: > Signature verification is an important security feature, to protect > system from being attacked with a kernel of unknown origin. Kexec > rebooting is a way to replace the running kernel, hence need be > secured carefully. I'm finding this changelog quite hard to understand, > In the current code of handling signature verification of kexec kernel, > the logic is very twisted. It mixes signature verification, IMA signature > appraising and kexec lockdown. > > If there is no KEXEC_SIG_FORCE, kexec kernel image doesn't have one of > signature, the supported crypto, and key, we don't think this is wrong, I think this is saying that in the absence of KEXEC_SIG_FORCE and if the signature/crypto/key are all incorrect, the kexec still succeeds, but it should not. > Unless kexec lockdown is executed. IMA is considered as another kind of > signature appraising method. > > If kexec kernel image has signature/crypto/key, it has to go through the > signature verification and pass. Otherwise it's seen as verification > failure, and won't be loaded. I don't know if this is describing the current situation or the post-patch situation. > Seems kexec kernel image with an unqualified signature is even worse than > those w/o signature at all, this sounds very unreasonable. E.g. If people > get a unsigned kernel to load, or a kernel signed with expired key, which > one is more dangerous? > > So, here, let's simplify the logic to improve code readability. If the > KEXEC_SIG_FORCE enabled or kexec lockdown enabled, signature verification > is mandated. Otherwise, we lift the bar for any kernel image. I think the whole thing needs a rewrite. Start out by fully describing the current situation. THen describe what is wrong with it, and why. Then describe the proposed change. Or something along these lines. The changelog should also make clear the end-user impact of the patch. In sufficient detail for others to decide which kernel version(s) should be patched. Your recommendations will also be valuable - which kernel version(s) do you think should be patched, and why? ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH 2/3] mm/memory_hotplug: Allow arch override of non boot memory resource names
On Thu, 26 Mar 2020 18:07:29 + James Morse wrote: > Memory added to the system by hotplug has a 'System RAM' resource created > for it. This is exposed to user-space via /proc/iomem. > > This poses problems for kexec on arm64. If kexec decides to place the > kernel in one of these newly onlined regions, the new kernel will find > itself booting from a region not described as memory in the firmware > tables. > > Arm64 doesn't have a structure like the e820 memory map that can be > re-written when memory is brought online. Instead arm64 uses the UEFI > memory map, or the memory node from the DT, sometimes both. We never > rewrite these. > > Allow an architecture to specify a different name for these hotplug > regions. > > ... > > --- a/mm/memory_hotplug.c > +++ b/mm/memory_hotplug.c > @@ -42,6 +42,10 @@ > #include "internal.h" > #include "shuffle.h" > > +#ifndef MEMORY_HOTPLUG_RES_NAME > +#define MEMORY_HOTPLUG_RES_NAME "System RAM" > +#endif > + > /* > * online_page_callback contains pointer to current page onlining function. > * Initially it is generic_online_page(). If it is required it could be > @@ -103,7 +107,7 @@ static struct resource *register_memory_resource(u64 > start, u64 size) > { > struct resource *res; > unsigned long flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY; > - char *resource_name = "System RAM"; > + char *resource_name = MEMORY_HOTPLUG_RES_NAME; > > if (start + size > max_mem_size) > return ERR_PTR(-E2BIG); I suppose we should do this as well: --- a/mm/memory_hotplug.c~mm-memory_hotplug-allow-arch-override-of-non-boot-memory-resource-names-fix +++ a/mm/memory_hotplug.c @@ -129,7 +129,8 @@ static struct resource *register_memory_ resource_name, flags); if (!res) { - pr_debug("Unable to reserve System RAM region: %016llx->%016llx\n", + pr_debug("Unable to reserve " MEMORY_HOTPLUG_RES_NAME + " region: %016llx->%016llx\n", start, start + size); return ERR_PTR(-EEXIST); } It assumes that MEMORY_HOTPLUG_RES_NAME will be a literal string, which is the case in [3/3]. ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v4 1/4] device-dax: Don't leak kernel memory to user space after unloading kmem
On Fri, 8 May 2020 10:42:14 +0200 David Hildenbrand wrote: > Assume we have kmem configured and loaded: > [root@localhost ~]# cat /proc/iomem > ... > 14000-33fff : Persistent Memory$ > 14000-1481f : namespace0.0 > 15000-33fff : dax0.0 > 15000-33fff : System RAM > > Assume we try to unload kmem. This force-unloading will work, even if > memory cannot get removed from the system. > [root@localhost ~]# rmmod kmem > [ 86.380228] removing memory fails, because memory > [0x00015000-0x000157ff] is onlined > ... > [ 86.431225] kmem dax0.0: DAX region [mem 0x15000-0x33fff] cannot > be hotremoved until the next reboot > > Now, we can reconfigure the namespace: > [root@localhost ~]# ndctl create-namespace --force --reconfig=namespace0.0 > --mode=devdax > [ 131.409351] nd_pmem namespace0.0: could not reserve region [mem > 0x14000-0x33fff]dax > [ 131.410147] nd_pmem: probe of namespace0.0 failed with error > -16namespace0.0 --mode=devdax > ... > > This fails as expected due to the busy memory resource, and the memory > cannot be used. However, the dax0.0 device is removed, and along its name. > > The name of the memory resource now points at freed memory (name of the > device). > [root@localhost ~]# cat /proc/iomem > ... > 14000-33fff : Persistent Memory > 14000-1481f : namespace0.0 > 15000-33fff : �_�^7_��/_��wR��WQ���^��� ... > 15000-33fff : System RAM > > We have to make sure to duplicate the string. While at it, remove the > superfluous setting of the name and fixup a stale comment. > > Fixes: 9f960da72b25 ("device-dax: "Hotremove" persistent memory that is used > like normal RAM") > Cc: sta...@vger.kernel.org # v5.3 hm. Is this really -stable material? These are all privileged operations, I expect? Assuming "yes", I've queued this separately, staged for 5.7-rcX. I'll redo patches 2-4 as a three-patch series for 5.8-rc1. > Cc: Dan Williams > Cc: Vishal Verma > Cc: Dave Jiang > Cc: Pavel Tatashin Reviewers, please ;) ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
On Mon, 13 Apr 2020 08:15:23 -0500 ebied...@xmission.com (Eric W. Biederman) wrote: > > For 3), people can still use kexec_load and develop/fix for it, if no > > kexec_file_load supported. But 32-bit arm should be a different one, > > more like i386, we will leave it as is, and fix anything which could > > break it. But people really expects to improve or add feature to it? E.g > > in this patchset, the mem hotplug issue James raised, I assume James is > > focusing on arm64, x86_64, but not 32-bit arm. As DavidH commented in > > another reply, people even don't agree to continue supporting memory > > hotplug on 32-bit system. We ever took effort to fix a memory hotplug > > bug on i386 with a patch, but people would rather set it as BROKEN. > > For memory hotplug just reload. Userspace already gets good events. > > We should not expect anything except a panic kernel to be loaded over a > memory hotplug event. The kexec on panic code should actually be loaded > in a location that we don't reliquish if asked for it. Is that a nack for James's patchset? ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
It's unclear (to me) what is the status of this patchset. But it does appear that an new version can be expected? ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCHv2] mm/sparse: reset section's mem_map when fully deactivated
On Mon, 20 Jan 2020 08:29:39 +0100 Michal Hocko wrote: > On Mon 20-01-20 10:33:14, Pingfan Liu wrote: > > After commit ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug"), > > when a mem section is fully deactivated, section_mem_map still records the > > section's start pfn, which is not used any more and will be reassigned > > during re-added. > > > > In analogy with alloc/free pattern, it is better to clear all fields of > > section_mem_map. > > > > Beside this, it breaks the user space tool "makedumpfile" [1], which makes > > assumption that a hot-removed section has mem_map as NULL, instead of > > checking directly against SECTION_MARKED_PRESENT bit. (makedumpfile will be > > better to change the assumption, and need a patch) > > > > The bug can be reproduced on IBM POWERVM by "drmgr -c mem -r -q 5" , > > trigger a crash, and save vmcore by makedumpfile > > While makedumpfile lives very closely to the kernel and occasional > breakage is to be expected I still believe that Fixes: ba72b4c8cf60 > is due. But not a cc:stable? ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v2 0/8] mm/kdump: allow to exclude pages that are logically offline
On Wed, 27 Feb 2019 13:32:14 +0800 Dave Young wrote: > This series have been in -next for some days, could we get this in > mainline? It's been in -next for two months? > Andrew, do you have plan about them, maybe next release? They're all reviewed except for "xen/balloon: mark inflated pages PG_offline". (https://ozlabs.org/~akpm/mmotm/broken-out/xen-balloon-mark-inflated-pages-pg_offline.patch). Yes, I plan on sending these to Linus during the merge window for 5.1 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v7 4/4] kexec_file: Load kernel at top of system RAM if required
On Thu, 19 Jul 2018 23:17:53 +0800 Baoquan He wrote: > Hi Andrew, > > On 07/18/18 at 03:33pm, Andrew Morton wrote: > > On Wed, 18 Jul 2018 10:49:44 +0800 Baoquan He wrote: > > > > > For kexec_file loading, if kexec_buf.top_down is 'true', the memory which > > > is used to load kernel/initrd/purgatory is supposed to be allocated from > > > top to down. This is what we have been doing all along in the old kexec > > > loading interface and the kexec loading is still default setting in some > > > distributions. However, the current kexec_file loading interface doesn't > > > do like this. The function arch_kexec_walk_mem() it calls ignores checking > > > kexec_buf.top_down, but calls walk_system_ram_res() directly to go through > > > all resources of System RAM from bottom to up, to try to find memory > > > region > > > which can contain the specific kexec buffer, then call > > > locate_mem_hole_callback() > > > to allocate memory in that found memory region from top to down. This > > > brings > > > confusion especially when KASLR is widely supported , users have to make > > > clear > > > why kexec/kdump kernel loading position is different between these two > > > interfaces in order to exclude unnecessary noises. Hence these two > > > interfaces > > > need be unified on behaviour. > > > > As far as I can tell, the above is the whole reason for the patchset, > > yes? To avoid confusing users. > > > In fact, it's not just trying to avoid confusing users. Kexec loading > and kexec_file loading are just do the same thing in essence. Just we > need do kernel image verification on uefi system, have to port kexec > loading code to kernel. > > Kexec has been a formal feature in our distro, and customers owning > those kind of very large machine can make use of this feature to speed > up the reboot process. On uefi machine, the kexec_file loading will > search place to put kernel under 4G from top to down. As we know, the > 1st 4G space is DMA32 ZONE, dma, pci mmcfg, bios etc all try to consume > it. It may have possibility to not be able to find a usable space for > kernel/initrd. From the top down of the whole memory space, we don't > have this worry. > > And at the first post, I just posted below with AKASHI's > walk_system_ram_res_rev() version. Later you suggested to use > list_head to link child sibling of resource, see what the code change > looks like. > http://lkml.kernel.org/r/20180322033722.9279-1-...@redhat.com > > Then I posted v2 > http://lkml.kernel.org/r/20180408024724.16812-1-...@redhat.com > Rob Herring mentioned that other components which has this tree struct > have planned to do the same thing, replacing the singly linked list with > list_head to link resource child sibling. Just quote Rob's words as > below. I think this could be another reason. > > ~ From Rob > The DT struct device_node also has the same tree structure with > parent, child, sibling pointers and converting to list_head had been > on the todo list for a while. ACPI also has some tree walking > functions (drivers/acpi/acpica/pstree.c). Perhaps there should be a > common tree struct and helpers defined either on top of list_head or a > ~ > new struct if that saves some size. Please let's get all this into the changelogs? > > > > Is that sufficient? Can we instead simplify their lives by providing > > better documentation or informative printks or better Kconfig text, > > etc? > > > > And who *are* the people who are performing this configuration? Random > > system administrators? Linux distro engineers? If the latter then > > they presumably aren't easily confused! > > Kexec was invented for kernel developer to speed up their kernel > rebooting. Now high end sever admin, kernel developer and QE are also > keen to use it to reboot large box for faster feature testing, bug > debugging. Kernel dev could know this well, about kernel loading > position, admin or QE might not be aware of it very well. > > > > > In other words, I'm trying to understand how much benefit this patchset > > will provide to our users as a whole. > > Understood. The list_head replacing patch truly involes too many code > changes, it's risky. I am willing to try any idea from reviewers, won't > persuit they have to be accepted finally. If don't have a try, we don't > know what it looks like, and what impact it may have. I am fine to take > AKASHI's simple version of walk_system_ram_res_rev() to lower risk, even > though it could be a little bit low efficient. The larger patch produces a better result. We can handle it ;) ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v7 4/4] kexec_file: Load kernel at top of system RAM if required
On Wed, 18 Jul 2018 10:49:44 +0800 Baoquan He wrote: > For kexec_file loading, if kexec_buf.top_down is 'true', the memory which > is used to load kernel/initrd/purgatory is supposed to be allocated from > top to down. This is what we have been doing all along in the old kexec > loading interface and the kexec loading is still default setting in some > distributions. However, the current kexec_file loading interface doesn't > do like this. The function arch_kexec_walk_mem() it calls ignores checking > kexec_buf.top_down, but calls walk_system_ram_res() directly to go through > all resources of System RAM from bottom to up, to try to find memory region > which can contain the specific kexec buffer, then call > locate_mem_hole_callback() > to allocate memory in that found memory region from top to down. This brings > confusion especially when KASLR is widely supported , users have to make clear > why kexec/kdump kernel loading position is different between these two > interfaces in order to exclude unnecessary noises. Hence these two interfaces > need be unified on behaviour. As far as I can tell, the above is the whole reason for the patchset, yes? To avoid confusing users. Is that sufficient? Can we instead simplify their lives by providing better documentation or informative printks or better Kconfig text, etc? And who *are* the people who are performing this configuration? Random system administrators? Linux distro engineers? If the latter then they presumably aren't easily confused! In other words, I'm trying to understand how much benefit this patchset will provide to our users as a whole. ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH] kdump: add default crashkernel reserve kernel config options
On Mon, 21 May 2018 10:53:37 +0800 Dave Youngwrote: > This is a rework of the crashkernel=auto patches back to 2009 although > I'm not sure if below is the last version of the old effort: > https://lkml.org/lkml/2009/8/12/61 > https://lwn.net/Articles/345344/ > > I changed the original design, instead of adding the auto reserve logic > in code, in this patch just introduce two kernel config options for > the default crashkernel value in MB and the threshold of system memory > in MB so that only reserve default when system memory is equal or > above the threshold. > > With the kernel configs distributions can easily change the default > values so that people do not need to manually set kernel cmdline > for common use cases and one can still overwrite the default value > with manual setup or disable it by using crashkernel=0 > > Signed-off-by: Dave Young > --- > Another difference is with original design the crashkernel size scales > with system memory, according to test, large machine may need more > memory in kdump kernel because of several factors: > 1. cpu numbers, because of the percpu memory allocated for cpus. >(kdump can use nr_cpus=1 to workaround this, but some > arches do not support nr_cpus=X for example powerpc) > 2. IO devices, large system can have a lot of io devices, although we >can try to only add those device drivers we needed, it is still a >problem because of some built-in drivers, some stacked logical devices >eg. device mapper devices, acpi etc. Even if only considering the >meta data for driver model it will still be a big number eg. sysfs >files etc. > 3. The minimum memory requirement for some device drivers are big, even >if some of them have implemented low meory profile. It is usual to see >10M memory use for a storage driver. > 4. user space initramfs size growing. Busybox is not usable if we need >to add udev support and some complicate storage support. Use dracut >with systemd, especially networking stuff need more memory. > > So probably add another kernel config option to scale the memory size > eg. CRASHKERNEL_DEFAULT_SCALE_RATIO is also good to have, in RHEL we > use base_value + system_mem >> (2^14) for x86. I'm still hesatating > how to describe and add this option. Any suggestions will be appreciated. > > ... > > --- linux-x86.orig/arch/Kconfig > +++ linux-x86/arch/Kconfig > @@ -10,6 +10,22 @@ config KEXEC_CORE > select CRASH_CORE > bool > > +config CRASHKERNEL_DEFAULT_THRESHOLD_MB > + int "System memory size threshold for kdump memory default reserving" > + depends on CRASH_CORE > + default 0 > + help > + CRASHKERNEL_DEFAULT_MB is used as default crashkernel value if > + the system memory size is equal or bigger than the threshold. "the threshold" is rather vague. Can it be clarified? In fact I'm really struggling to understand the logic here > +config CRASHKERNEL_DEFAULT_MB > + int "Default crashkernel memory size reserved for kdump" > + depends on CRASH_CORE > + default 0 > + help > + This is used as the default kdump reserved memory size in MB. > + crashkernel=X kernel cmdline can overwrite this value. > + > config HAVE_IMA_KEXEC > bool > > @@ -143,6 +144,24 @@ static int __init parse_crashkernel_simp > return 0; > } > > +static int __init get_crashkernel_default(unsigned long long system_ram, > + unsigned long long *size) > +{ > + unsigned long long sz = CONFIG_CRASHKERNEL_DEFAULT_MB; > + unsigned long long thres = CONFIG_CRASHKERNEL_DEFAULT_THRESHOLD_MB; > + > + thres *= SZ_1M; > + sz *= SZ_1M; > + > + if (sz >= system_ram || system_ram < thres) { > + pr_debug("crashkernel default size can not be used.\n"); > + return -EINVAL; In other words, if (system_ram <= CONFIG_CRASHKERNEL_DEFAULT_MB || system_ram < CONFIG_CRASHKERNEL_DEFAULT_THRESHOLD_MB) fail; yes? How come? What's happening here? Perhaps a (good) explanatory comment is needed. And clearer Kconfig text. All confused :( ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH 1/2] resource: add walk_system_ram_res_rev()
On Fri, 23 Mar 2018 11:10:13 +0800 Baoquan He <b...@redhat.com> wrote: > On 03/22/18 at 07:06pm, Andrew Morton wrote: > > On Fri, 23 Mar 2018 08:58:45 +0800 Baoquan He <b...@redhat.com> wrote: > > > > > > erk, this is pretty nasty. Isn't there a better way :( > > > > > > Yes, this is not efficient. > > > > > > In struct resource{}, ->sibling list is a singly linked list. I ever > > > thought about changing it to doubly linked list, yet not very sure if > > > it will have effect since struct resource is a core data structure. > > > > Switching to a list_head sounds OK. The only issue really is memory > > consumption and surely we don't have tens of thousands of struct > > resources floating about(?). Or if we do have a lot, the machine is > > presumably huge (hope?). > > Yes. It doubles the memory consumption. > > AFAIK, the biggest number of resrouces I heard of possibly is mentioned > in this user space kexec_tools commit. In this commit, Xunlei told on > SGI system with 64TB RAM, the array which we have been using to store > "System RAM"|"Reserved"|"ACPI **" regions is not big enough. In that > case, we need extra 8Byte*2048=16KB at most. With my understanding, this > increase is system wide, since each resource instance only needs its own > list_head member, right? Yes. That sounds perfectly acceptable. It would be interesting to see what this approach looks like, if you have time to toss something together? ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH 1/2] resource: add walk_system_ram_res_rev()
On Fri, 23 Mar 2018 08:58:45 +0800 Baoquan Hewrote: > > erk, this is pretty nasty. Isn't there a better way :( > > Yes, this is not efficient. > > In struct resource{}, ->sibling list is a singly linked list. I ever > thought about changing it to doubly linked list, yet not very sure if > it will have effect since struct resource is a core data structure. Switching to a list_head sounds OK. The only issue really is memory consumption and surely we don't have tens of thousands of struct resources floating about(?). Or if we do have a lot, the machine is presumably huge (hope?). > AKASHI's method is more acceptable, and currently only kexec has this > requirement. What method is that? ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH 0/2] Kexec_file: Load kernel at top of system ram
On Thu, 22 Mar 2018 11:37:20 +0800 Baoquan Hewrote: > The current kexec_file ignores kexec_buf.top_down value when call > arch_kexec_walk_mem() to allocate memory for loading kernel/initrd > stuffs. This is not supposed to be what kexec_buf.top_down is used > for. OK, but why is this a problem? When fixing a bug, please fully describe the user-visible effects of that bug. That helps others understand the importance of the fix, whether it should be backported into various kernels, etc. ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH 1/2] resource: add walk_system_ram_res_rev()
On Thu, 22 Mar 2018 11:37:21 +0800 Baoquan Hewrote: > From: AKASHI Takahiro > > This function, being a variant of walk_system_ram_res() introduced in > commit 8c86e70acead ("resource: provide new functions to walk through > resources"), walks through a list of all the resources of System RAM > in reversed order, i.e., from higher to lower. > > It will be used in kexec_file code. > > ... > > --- a/kernel/resource.c > +++ b/kernel/resource.c > @@ -23,6 +23,8 @@ > #include > #include > #include > +#include > +#include > #include > > > @@ -470,6 +472,67 @@ int walk_system_ram_res(u64 start, u64 end, void *arg, > } > > /* > + * This function, being a variant of walk_system_ram_res(), calls the @func > + * callback against all memory ranges of type System RAM which are marked as > + * IORESOURCE_SYSTEM_RAM and IORESOUCE_BUSY in reversed order, i.e., from > + * higher to lower. > + */ This should document the return value, as should walk_system_ram_res(). Why does it return -1 on error rather than an errno (ENOMEM)? > +int walk_system_ram_res_rev(u64 start, u64 end, void *arg, > + int (*func)(struct resource *, void *)) > +{ > + struct resource res, *rams; > + int rams_size = 16, i; > + int ret = -1; > + > + /* create a list */ > + rams = vmalloc(sizeof(struct resource) * rams_size); > + if (!rams) > + return ret; > + > + res.start = start; > + res.end = end; > + res.flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY; > + i = 0; > + while ((res.start < res.end) && > + (!find_next_iomem_res(, IORES_DESC_NONE, true))) { > + if (i >= rams_size) { > + /* re-alloc */ > + struct resource *rams_new; > + int rams_new_size; > + > + rams_new_size = rams_size + 16; > + rams_new = vmalloc(sizeof(struct resource) > + * rams_new_size); > + if (!rams_new) > + goto out; > + > + memcpy(rams_new, rams, > + sizeof(struct resource) * rams_size); > + vfree(rams); > + rams = rams_new; > + rams_size = rams_new_size; > + } > + > + rams[i].start = res.start; > + rams[i++].end = res.end; > + > + res.start = res.end + 1; > + res.end = end; > + } > + > + /* go reverse */ > + for (i--; i >= 0; i--) { > + ret = (*func)([i], arg); > + if (ret) > + break; > + } erk, this is pretty nasty. Isn't there a better way :( > +out: > + vfree(rams); > + return ret; > +} ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v2 00/11] kexec_file: Clean up purgatory load
On Wed, 21 Mar 2018 12:27:40 +0100 Philipp Rudowrote: > here is the updated patch set including Dave's comments. There were some syntactic clashes with the "kexec_file, x86, powerpc: refactoring for other architecutres" series. Please check (in linux-next) that I got it all correct and that both patch series are functioning correctly. I expect to send all this into Stephen for -next tomorrow. ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATHC v2 0/9] ima: carry the measurement list across kexec
On Tue, 30 Aug 2016 18:40:02 -0400 Mimi Zoharwrote: > The TPM PCRs are only reset on a hard reboot. In order to validate a > TPM's quote after a soft reboot (eg. kexec -e), the IMA measurement list > of the running kernel must be saved and then restored on the subsequent > boot, possibly of a different architecture. > > The existing securityfs binary_runtime_measurements file conveniently > provides a serialized format of the IMA measurement list. This patch > set serializes the measurement list in this format and restores it. > > Up to now, the binary_runtime_measurements was defined as architecture > native format. The assumption being that userspace could and would > handle any architecture conversions. With the ability of carrying the > measurement list across kexec, possibly from one architecture to a > different one, the per boot architecture information is lost and with it > the ability of recalculating the template digest hash. To resolve this > problem, without breaking the existing ABI, this patch set introduces > the boot command line option "ima_canonical_fmt", which is arbitrarily > defined as little endian. > > The need for this boot command line option will be limited to the > existing version 1 format of the binary_runtime_measurements. > Subsequent formats will be defined as canonical format (eg. TPM 2.0 > support for larger digests). > > This patch set pre-req's Thiago Bauermann's "kexec_file: Add buffer > hand-over for the next kernel" patch set. > > These patches can also be found in the next-kexec-restore branch of: > git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity.git I'll merge these into -mm to get some linux-next exposure. I don't know what your upstream merge planes will be? ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v3 0/5] kexec_file: Add buffer hand-over for the next kernel
On Thu, 25 Aug 2016 15:18:26 -0300 Thiago Jung Bauermannwrote: > Hello, > > This patch series implements a mechanism which allows the kernel to pass > on a buffer to the kernel that will be kexec'd. This buffer is passed > as a segment which is added to the kimage when it is being prepared > by kexec_file_load. > > How the second kernel is informed of this buffer is architecture-specific. > On powerpc, this is done via the device tree, by checking > the properties /chosen/linux,kexec-handover-buffer-start and > /chosen/linux,kexec-handover-buffer-end, which is analogous to how the > kernel finds the initrd. > > This is needed because the Integrity Measurement Architecture subsystem > needs to preserve its measurement list accross the kexec reboot. The > following patch series for the IMA subsystem uses this feature for that > purpose: > > https://lists.infradead.org/pipermail/kexec/2016-August/016745.html > > This is so that IMA can implement trusted boot support on the OpenPower > platform, because on such systems an intermediary Linux instance running > as part of the firmware is used to boot the target operating system via > kexec. Using this mechanism, IMA on this intermediary instance can > hand over to the target OS the measurements of the components that were > used to boot it. > > Because there could be additional measurement events between the > kexec_file_load call and the actual reboot, IMA needs a way to update the > buffer with those additional events before rebooting. One can minimize > the interval between the kexec_file_load and the reboot syscalls, but as > small as it can be, there is always the possibility that the measurement > list will be out of date at the time of reboot. > > To address this issue, this patch series also introduces > kexec_update_segment, which allows a reboot notifier to change the > contents of the image segment during the reboot process. > > The last patch is not intended to be merged, it just demonstrates how > this feature can be used. > > This series applies on top of v6 of the "kexec_file_load implementation > for PowerPC" patch series (which applies on top of v4.8-rc1): > > https://lists.infradead.org/pipermail/kexec/2016-August/016960.html I grabbed these two patch series. I also merged the "IMA: Demonstration code for kexec buffer passing." demonstration patch just to get things a bit of testing. I assume that once the "ima: carry the measurement list across kexec" series has stabilised, I should drop the demo patch and also grab those? If so, pelase start cc'ing me. ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v2 4/6] kexec_file: Add mechanism to update kexec segments.
On Sat, 13 Aug 2016 00:18:23 -0300 Thiago Jung Bauermannwrote: > kexec_update_segment allows a given segment in kexec_image to have > its contents updated. This is useful if the current kernel wants to > send information to the next kernel that is up-to-date at the time of > reboot. > > ... > > @@ -721,6 +721,105 @@ static struct page *kimage_alloc_page(struct kimage > *image, > return page; > } > > +/** > + * kexec_update_segment - update the contents of a kimage segment > + * @buffer: New contents of the segment. > + * @bufsz: @buffer size. > + * @load_addr: Segment's physical address in the next kernel. > + * @memsz: Segment size. > + * > + * This function assumes kexec_mutex is held. > + * > + * Return: 0 on success, negative errno on error. > + */ > +int kexec_update_segment(const char *buffer, unsigned long bufsz, > + unsigned long load_addr, unsigned long memsz) > +{ > + int i; > + unsigned long entry; > + unsigned long *ptr = NULL; > + void *dest = NULL; > + > + if (kexec_image == NULL) { > + pr_err("Can't update segment: no kexec image loaded.\n"); > + return -EINVAL; > + } > + > + /* > + * kexec_add_buffer rounds up segment sizes to PAGE_SIZE, so > + * we have to do it here as well. > + */ > + memsz = ALIGN(memsz, PAGE_SIZE); > + > + for (i = 0; i < kexec_image->nr_segments; i++) > + /* We only support updating whole segments. */ > + if (load_addr == kexec_image->segment[i].mem && > + memsz == kexec_image->segment[i].memsz) { > + if (kexec_image->segment[i].do_checksum) { > + pr_err("Trying to update non-modifiable > segment.\n"); > + return -EINVAL; > + } > + > + break; > + } > + if (i == kexec_image->nr_segments) { > + pr_err("Couldn't find segment to update: 0x%lx, size 0x%lx\n", > +load_addr, memsz); > + return -EINVAL; > + } > + > + for (entry = kexec_image->head; !(entry & IND_DONE) && memsz; > + entry = *ptr++) { > + void *addr = (void *) (entry & PAGE_MASK); > + > + switch (entry & IND_FLAGS) { > + case IND_DESTINATION: > + dest = addr; > + break; > + case IND_INDIRECTION: > + ptr = __va(addr); > + break; > + case IND_SOURCE: > + /* Shouldn't happen, but verify just to be safe. */ > + if (dest == NULL) { > + pr_err("Invalid kexec entries list."); > + return -EINVAL; > + } > + > + if (dest == (void *) load_addr) { > + struct page *page; > + char *ptr; > + size_t uchunk, mchunk; > + > + page = kmap_to_page(addr); > + > + ptr = kmap(page); kmap_atomic() could be used here, and it is appreciably faster. > + ptr += load_addr & ~PAGE_MASK; > + mchunk = min_t(size_t, memsz, > +PAGE_SIZE - (load_addr & > ~PAGE_MASK)); > + uchunk = min(bufsz, mchunk); > + memcpy(ptr, buffer, uchunk); > + > + kunmap(page); > + > + bufsz -= uchunk; > + load_addr += mchunk; > + buffer += mchunk; > + memsz -= mchunk; > + } > + dest += PAGE_SIZE; > + } > + > + /* Shouldn't happen, but verify just to be safe. */ > + if (ptr == NULL) { > + pr_err("Invalid kexec entries list."); > + return -EINVAL; > + } > + } > + > + return 0; > +} > + ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v2] kexec: add resriction on the kexec_load
On Fri, 22 Jul 2016 13:36:22 +0800 zhongjiang <zhongji...@huawei.com> wrote: > From: zhong jiang <zhongji...@huawei.com> > > I hit the following question when run trinity in my system. The > kernel is 3.4 version. but the mainline have same question to be > solved. The root cause is the segment size is too large, it can > expand the most of the area or the whole memory, therefore, it > may waste an amount of time to abtain a useable page. and other > cases will block until the test case quit. at the some time, > OOM will come up. > > Call Trace: > [] __alloc_pages_nodemask+0x14c/0x8f0 > [] ? trace_hardirqs_on_thunk+0x3a/0x3c > [] ? trace_hardirqs_on_thunk+0x3a/0x3c > [] ? trace_hardirqs_on_thunk+0x3a/0x3c > [] ? trace_hardirqs_on_thunk+0x3a/0x3c > [] ? trace_hardirqs_on_thunk+0x3a/0x3c > [] alloc_pages_current+0xaf/0x120 > [] kimage_alloc_pages+0x10/0x60 > [] kimage_alloc_control_pages+0x5d/0x270 > [] machine_kexec_prepare+0xe5/0x6c0 > [] ? kimage_free_page_list+0x52/0x70 > [] sys_kexec_load+0x141/0x600 > [] ? vfs_write+0x100/0x180 > [] system_call_fastpath+0x16/0x1b > > The patch just add condition on sanity_check_segment_list to > restriction the segment size. > > ... > > --- a/kernel/kexec_core.c > +++ b/kernel/kexec_core.c > @@ -148,6 +148,7 @@ static struct page *kimage_alloc_page(struct kimage > *image, > int sanity_check_segment_list(struct kimage *image) > { > int result, i; > + unsigned long total_segments = 0; > unsigned long nr_segments = image->nr_segments; > > /* > @@ -209,6 +210,21 @@ int sanity_check_segment_list(struct kimage *image) > return result; > } > > + /* Verity all segment size donnot exceed the specified size. > + * if segment size from user space is too large, a large > + * amount of time will be wasted when allocating page. so, > + * softlockup may be come up. > + */ > + for (i = 0; i < nr_segments; i++) { > + if (image->segment[i].memsz > (totalram_pages / 2)) > + return result; > + > + total_segments += image->segment[i].memsz; > + } > + > + if (total_segments > (totalram_pages / 2)) > + return result; > + > /* >* Verify we have good destination addresses. Normally >* the caller is responsible for making certain we don't This needed a few adjustments for pending changes in linux-next's sanity_check_segment_list(). Mainly s/return result/return -EINVAL/. I also tweaked the patch changelog. Please check. From: zhong jiang <zhongji...@huawei.com> Subject: kexec: add restriction on kexec_load() segment sizes I hit the following issue when run trinity in my system. The kernel is 3.4 version, but mainline has the same issue. The root cause is that the segment size is too large so the kerenl spends too long trying to allocate a page. Other cases will block until the test case quits. Also, OOM conditions will occur. Call Trace: [] __alloc_pages_nodemask+0x14c/0x8f0 [] ? trace_hardirqs_on_thunk+0x3a/0x3c [] ? trace_hardirqs_on_thunk+0x3a/0x3c [] ? trace_hardirqs_on_thunk+0x3a/0x3c [] ? trace_hardirqs_on_thunk+0x3a/0x3c [] ? trace_hardirqs_on_thunk+0x3a/0x3c [] alloc_pages_current+0xaf/0x120 [] kimage_alloc_pages+0x10/0x60 [] kimage_alloc_control_pages+0x5d/0x270 [] machine_kexec_prepare+0xe5/0x6c0 [] ? kimage_free_page_list+0x52/0x70 [] sys_kexec_load+0x141/0x600 [] ? vfs_write+0x100/0x180 [] system_call_fastpath+0x16/0x1b The patch chnages sanity_check_segment_list() to verify that no segment is larger than half of memory. Link: http://lkml.kernel.org/r/1469165782-13193-1-git-send-email-zhongji...@huawei.com Signed-off-by: zhong jiang <zhongji...@huawei.com> Cc: Eric Biederman <ebied...@xmission.com> Cc: Vivek Goyal <vgo...@redhat.com> Cc: Dave Young <dyo...@redhat.com> Signed-off-by: Andrew Morton <a...@linux-foundation.org> --- kernel/kexec_core.c | 16 1 file changed, 16 insertions(+) diff -puN kernel/kexec_core.c~kexec-add-resriction-on-the-kexec_load kernel/kexec_core.c --- a/kernel/kexec_core.c~kexec-add-resriction-on-the-kexec_load +++ a/kernel/kexec_core.c @@ -154,6 +154,7 @@ static struct page *kimage_alloc_page(st int sanity_check_segment_list(struct kimage *image) { int i; + unsigned long total_segments = 0; unsigned long nr_segments = image->nr_segments; /* @@ -214,6 +215,21 @@ int sanity_check_segment_list(struct kim return -EINVAL; } + /* Verity all segment size donnot exceed the specified size. +* if segment size from user space is too large, a large +* amount of time will be wasted
Re: [PATCH 1/2] kexec: update VMCOREINFO for compound_order/dtor
On Tue, 1 Mar 2016 06:14:32 + Atsushi Kumagaiwrote: > makedumpfile refers page.lru.next to get the order of compound pages > for page filtering. However, now the order is stored in page.compound_order, > hence VMCOREINFO should be updated to export the offset of > page.compound_order. > > The fact is, page.compound_order was introduced already in kernel 4.0, > but the offset of it was the same as page.lru.next until kernel 4.3, > so this was not actual problem. > > The above can be said also for page.lru.prev and page.compound_dtor, > it's necessary to detect hugetlbfs pages. Further, the content was > changed from direct address to the ID which means dtor. It's unclear which kernels need the patch and why. I *think* that the patch is needed in 4.3.x, 4.4.x, 4.5.x and 4.6 in order to make makedumpfile work correctly. Is that right? And it appears that [patch 2/2] is needed in 4.0+? However in both cases I am uncertain - what are the end-user visible effects of these regressions? Why can bugs remain in place for so long without having been observed? Please make all these things clear when perparing changelogs for bugfixes: which kernel versions need fixing and why (ie: what are the end-user visible effects of the bug). ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH 1/2] kexec: Introduce a protection mechanism for the crashkernel reserved memory
On Wed, 23 Dec 2015 19:12:25 +0800 Xunlei Pang <xlp...@redhat.com> wrote: > For the cases that some kernel (module) path stamps the crash > reserved memory(already mapped by the kernel) where has been > loaded the second kernel data, the kdump kernel will probably > fail to boot when panic happens (or even not happens) leaving > the culprit at large, this is unacceptable. > > The patch introduces a mechanism for detecting such cases: > 1) After each crash kexec loading, it simply marks the reserved > memory regions readonly since we no longer access it after that. > When someone stamps the region, the first kernel will panic and > trigger the kdump. The weak arch_kexec_protect_crashkres() is > introduced to do the actual protection. > > 2) To allow multiple loading, once 1) was done we also need to > remark the reserved memory to readwrite each time a system call > related to kdump is made. The weak arch_kexec_unprotect_crashkres() > is introduced to do the actual protection. > > The architecture can make its specific implementation by overriding > arch_kexec_protect_crashkres() and arch_kexec_unprotect_crashkres(). I'd like to see a bit more review activity on these patches from the other kexec developers, please. I'll include both the patches below. Also I don't have any record of reviews of these two: http://ozlabs.org/~akpm/mmotm/broken-out/kexec-make-a-pair-of-map-unmap-reserved-pages-in-error-path.patch http://ozlabs.org/~akpm/mmotm/broken-out/kexec-do-a-cleanup-for-function-kexec_load.patch Thanks. From: Xunlei Pang <xlp...@redhat.com> Subject: kexec: introduce a protection mechanism for the crashkernel reserved memory For the cases that some kernel (module) path stamps the crash reserved memory(already mapped by the kernel) where has been loaded the second kernel data, the kdump kernel will probably fail to boot when panic happens (or even not happens) leaving the culprit at large, this is unacceptable. The patch introduces a mechanism for detecting such cases: 1) After each crash kexec loading, it simply marks the reserved memory regions readonly since we no longer access it after that. When someone stamps the region, the first kernel will panic and trigger the kdump. The weak arch_kexec_protect_crashkres() is introduced to do the actual protection. 2) To allow multiple loading, once 1) was done we also need to remark the reserved memory to readwrite each time a system call related to kdump is made. The weak arch_kexec_unprotect_crashkres() is introduced to do the actual protection. The architecture can make its specific implementation by overriding arch_kexec_protect_crashkres() and arch_kexec_unprotect_crashkres(). Signed-off-by: Xunlei Pang <xlp...@redhat.com> Cc: Eric Biederman <ebied...@xmission.com> Cc: Dave Young <dyo...@redhat.com> Cc: Minfei Huang <mhu...@redhat.com> Cc: Vivek Goyal <vgo...@redhat.com> Signed-off-by: Andrew Morton <a...@linux-foundation.org> --- include/linux/kexec.h |2 ++ kernel/kexec.c|9 - kernel/kexec_core.c |6 ++ kernel/kexec_file.c |8 +++- 4 files changed, 23 insertions(+), 2 deletions(-) diff -puN include/linux/kexec.h~kexec-introduce-a-protection-mechanism-for-the-crashkernel-reserved-memory include/linux/kexec.h --- a/include/linux/kexec.h~kexec-introduce-a-protection-mechanism-for-the-crashkernel-reserved-memory +++ a/include/linux/kexec.h @@ -317,6 +317,8 @@ int __weak arch_kexec_apply_relocations_ Elf_Shdr *sechdrs, unsigned int relsec); int __weak arch_kexec_apply_relocations(const Elf_Ehdr *ehdr, Elf_Shdr *sechdrs, unsigned int relsec); +void arch_kexec_protect_crashkres(void); +void arch_kexec_unprotect_crashkres(void); #else /* !CONFIG_KEXEC_CORE */ struct pt_regs; diff -puN kernel/kexec.c~kexec-introduce-a-protection-mechanism-for-the-crashkernel-reserved-memory kernel/kexec.c --- a/kernel/kexec.c~kexec-introduce-a-protection-mechanism-for-the-crashkernel-reserved-memory +++ a/kernel/kexec.c @@ -167,8 +167,12 @@ SYSCALL_DEFINE4(kexec_load, unsigned lon return -EBUSY; dest_image = _image; - if (flags & KEXEC_ON_CRASH) + if (flags & KEXEC_ON_CRASH) { dest_image = _crash_image; + if (kexec_crash_image) + arch_kexec_unprotect_crashkres(); + } + if (nr_segments > 0) { unsigned long i; @@ -211,6 +215,9 @@ SYSCALL_DEFINE4(kexec_load, unsigned lon image = xchg(dest_image, image); out: + if ((flags & KEXEC_ON_CRASH) && kexec_crash_image) + arch_kexec_protect_crashkres(); + mutex_unlock(_mutex); kimage_free(image); diff -puN kernel/kexec_core.c~kexec-introduce-a-protection-m
Re: [PATCH V2] proc-vmcore: wrong data type casting fix
On Fri, 11 Mar 2016 16:42:48 +0800 Dave Youngwrote: > On i686 PAE enabled machine the contiguous physical area could be large > and it can cause trimming down variables in below calculation in > read_vmcore() and mmap_vmcore(): > > tsz = min_t(size_t, m->offset + m->size - *fpos, buflen); > > Then the real size passed down is not correct any more. > Suppose m->offset + m->size - *fpos being truncated to 0, buflen >0 then > we will get tsz = 0. It is of course not an expected result. I don't really understand this. vmcore.offset if loff_t which is 64-bit vmcore.size is long long *fpos is loff_t so the expression should all be done with 64-bit arithmetic anyway. Maybe buflen (size_t) has the wrong type, but the result of the other expression should be in-range by the time we come to doing the comparison. > During our tests there are two problems caused by it: > 1) read_vmcore will refuse to continue so makedumpfile fails. > 2) mmap_vmcore will trigger BUG_ON() in remap_pfn_range(). > > Use unsigned long long in min_t instead so that the variables are not > truncated. > > Signed-off-by: Baoquan He > Signed-off-by: Dave Young I think we'll need a cc:stable here. > --- linux-x86.orig/fs/proc/vmcore.c > +++ linux-x86/fs/proc/vmcore.c > @@ -231,7 +231,9 @@ static ssize_t __read_vmcore(char *buffe > > list_for_each_entry(m, _list, list) { > if (*fpos < m->offset + m->size) { > - tsz = min_t(size_t, m->offset + m->size - *fpos, > buflen); > + tsz = (size_t)min_t(unsigned long long, > + m->offset + m->size - *fpos, > + buflen); This is rather a mess. Can we please try to fix this bug by choosing appropriate types rather than all the typecasting? > start = m->paddr + *fpos - m->offset; > tmp = read_from_oldmem(buffer, tsz, , userbuf); > if (tmp < 0) > @@ -461,7 +463,8 @@ static int mmap_vmcore(struct file *file > if (start < m->offset + m->size) { > u64 paddr = 0; > > - tsz = min_t(size_t, m->offset + m->size - start, size); > + tsz = (size_t)min_t(unsigned long long, > + m->offset + m->size - start, size); > paddr = m->paddr + start - m->offset; > if (vmcore_remap_oldmem_pfn(vma, vma->vm_start + len, > paddr >> PAGE_SHIFT, tsz, ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH V2 2/2] kexec: Do a cleanup for function kexec_load
ntry, nr_segments, segments, flags); if ((flags & KEXEC_ON_CRASH) && kexec_crash_image) arch_kexec_protect_crashkres(); mutex_unlock(_mutex); return result; } From: Minfei Huang <mnfhu...@gmail.com> Subject: kexec: do a cleanup for function kexec_load There are a lof of work to be done in function kexec_load, not only for allocating structs and loading initram, but also for some misc. To make it more clear, wrap a new function do_kexec_load which is used to allocate structs and load initram. And the pre-work will be done in kexec_load. Signed-off-by: Minfei Huang <mnfhu...@gmail.com> Cc: Vivek Goyal <vgo...@redhat.com> Cc: "Eric W. Biederman" <ebied...@xmission.com> Signed-off-by: Andrew Morton <a...@linux-foundation.org> --- kernel/kexec.c | 125 +-- 1 file changed, 69 insertions(+), 56 deletions(-) diff -puN kernel/kexec.c~kexec-do-a-cleanup-for-function-kexec_load kernel/kexec.c --- a/kernel/kexec.c~kexec-do-a-cleanup-for-function-kexec_load +++ a/kernel/kexec.c @@ -103,6 +103,74 @@ out_free_image: return ret; } +static int do_kexec_load(unsigned long entry, unsigned long nr_segments, + struct kexec_segment __user *segments, unsigned long flags) +{ + struct kimage **dest_image, *image; + unsigned long i; + int ret; + + if (flags & KEXEC_ON_CRASH) { + dest_image = _crash_image; + if (kexec_crash_image) + arch_kexec_unprotect_crashkres(); + } else { + dest_image = _image; + } + + if (nr_segments == 0) { + /* Uninstall image */ + kimage_free(xchg(dest_image, NULL)); + return 0; + } + if (flags & KEXEC_ON_CRASH) { + /* +* Loading another kernel to switch to if this one +* crashes. Free any current crash dump kernel before +* we corrupt it. +*/ + kimage_free(xchg(_crash_image, NULL)); + } + + ret = kimage_alloc_init(, entry, nr_segments, segments, flags); + if (ret) + return ret; + + if (flags & KEXEC_ON_CRASH) + crash_map_reserved_pages(); + + if (flags & KEXEC_PRESERVE_CONTEXT) + image->preserve_context = 1; + + ret = machine_kexec_prepare(image); + if (ret) + goto out; + + for (i = 0; i < nr_segments; i++) { + ret = kimage_load_segment(image, >segment[i]); + if (ret) + goto out; + } + + kimage_terminate(image); + + /* Install the new kernel and uninstall the old */ + image = xchg(dest_image, image); + +out: + if ((flags & KEXEC_ON_CRASH) && kexec_crash_image) + arch_kexec_protect_crashkres(); + + /* +* Once the reserved memory is mapped, we should unmap this memory +* before returning +*/ + if (flags & KEXEC_ON_CRASH) + crash_unmap_reserved_pages(); + kimage_free(image); + return ret; +} + /* * Exec Kernel system call: for obvious reasons only root may call it. * @@ -127,7 +195,6 @@ out_free_image: SYSCALL_DEFINE4(kexec_load, unsigned long, entry, unsigned long, nr_segments, struct kexec_segment __user *, segments, unsigned long, flags) { - struct kimage **dest_image, *image; int result; /* We only trust the superuser with rebooting the system. */ @@ -152,9 +219,6 @@ SYSCALL_DEFINE4(kexec_load, unsigned lon if (nr_segments > KEXEC_SEGMENT_MAX) return -EINVAL; - image = NULL; - result = 0; - /* Because we write directly to the reserved memory * region when loading crash kernels we need a mutex here to * prevent multiple crash kernels from attempting to load @@ -166,63 +230,12 @@ SYSCALL_DEFINE4(kexec_load, unsigned lon if (!mutex_trylock(_mutex)) return -EBUSY; - dest_image = _image; - if (flags & KEXEC_ON_CRASH) { - dest_image = _crash_image; - if (kexec_crash_image) - arch_kexec_unprotect_crashkres(); - } - - if (nr_segments > 0) { - unsigned long i; - - if (flags & KEXEC_ON_CRASH) { - /* -* Loading another kernel to switch to if this one -* crashes. Free any current crash dump kernel before -* we corrupt it. -*/ - - kimage_free(xchg(_crash_image, NULL)); - result = kimage_alloc_init(, entry, nr_segments, - segments, flags); -
Re: [PATCH V2 1/2] kexec: Make a pair of map/unmap reserved pages in error path
On Tue, 1 Mar 2016 16:02:28 +0800 Minfei Huangwrote: > For some arch, kexec shall map the reserved pages, then use them, when > we try to start the kdump service. Which architectures are these, by the way? > kexec may return directly, without unmaping the reserved pages, if it > fails during starting service. To fix it, we make a pair of map/unmap > reserved pages both in generic path and error path. I'm having trouble understanding the urgency of this patch. Do you think it is needed in 4.5? -stable? If so, why? Thanks. ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH] kexec: Make a pair of map/unmap reserved pages in error path
On Thu, 25 Feb 2016 22:02:40 +0800 Minfei Huangwrote: > From: Minfei Huang > > For some arch, kexec shall map the reserved pages, then use them, when > we try to start the kdump service. > > kexec may return directly, without unmaping the reserved pages, if it > fails during starting service. To fix it, we make a pair of map/unmap > reserved pages both in generic path and error path. This patch both refactors the code AND fixes the bug. It is a decent-looking refactoring, but mixing the two together makes it *much* harder to review the bugfix. These two steps should be separated please, with the bugfix patch coming first. > --- a/kernel/kexec.c > +++ b/kernel/kexec.c > @@ -103,6 +103,68 @@ out_free_image: > return ret; > } > > +static int do_kexec_load(unsigned long entry, unsigned long nr_segments, > + struct kexec_segment __user *segments, unsigned long flags) > +{ > + struct kimage **dest_image, *image; > + unsigned long i; > + int ret; > + > + if (flags & KEXEC_ON_CRASH) > + dest_image = _crash_image; > + else > + dest_image = _image; > + > + if (nr_segments == 0) { > + /* Uninstall image */ > + kimage_free(xchg(dest_image, NULL)); > + return 0; > + } > + if (flags & KEXEC_ON_CRASH) { > + /* > + * Loading another kernel to switch to if this one > + * crashes. Free any current crash dump kernel before > + * we corrupt it. > + */ > + kimage_free(xchg(_crash_image, NULL)); > + } > + > + ret = kimage_alloc_init(, entry, nr_segments, segments, flags); > + if (ret) > + return ret; This is a bug, isn't it? Missed kimage_free(). > + if (flags & KEXEC_ON_CRASH) > + crash_map_reserved_pages(); > + > + if (flags & KEXEC_PRESERVE_CONTEXT) > + image->preserve_context = 1; > + > + ret = machine_kexec_prepare(image); > + if (ret) > + goto out; > + > + for (i = 0; i < nr_segments; i++) { > + ret = kimage_load_segment(image, >segment[i]); > + if (ret) > + goto out; > + } > + > + kimage_terminate(image); > + > + /* Install the new kernel and uninstall the old */ > + image = xchg(dest_image, image); > + > +out: > + /* > + * Once the reserved memory is mapped, we should unmap this memory > + * before returning > + */ > + if (flags & KEXEC_ON_CRASH) > + crash_unmap_reserved_pages(); > + kimage_free(image); > + return ret; > +} > + > > ... > ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH] kexec: unmap reserved pages for each error-return way
On Thu, 28 Jan 2016 11:57:22 +0300 Dmitry Safonovwrote: > On 01/28/2016 09:29 AM, Minfei Huang wrote: > > On 01/27/16 at 02:48pm, Dmitry Safonov wrote: > >> For allocation of kimage failure or kexec_prepare or load segments > >> errors there is no need to keep crashkernel memory mapped. > >> It will affect only s390 as map/unmap hook defined only for it. > >> As on unmap s390 also changes os_info structure let's check return code > >> and add info only on success. > > Hi, Dmitry. > > > > Previously, I sent a patch to fix this issue. You can refer it in > > following link. > > > > http://lists.infradead.org/pipermail/kexec/2015-July/013960.html > Oh, scratch my patch - I'm fine with yours, wanted to do the similar thing > because it has dazzled me while I was debugging around. There were a bunch of patches tossed around in that thread but I'm not sure that anything actually got applied? Perhaps some resending is needed. ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH] kexec: unmap reserved pages for each error-return way
On Wed, 27 Jan 2016 14:48:31 +0300 Dmitry Safonovwrote: > For allocation of kimage failure or kexec_prepare or load segments > errors there is no need to keep crashkernel memory mapped. > It will affect only s390 as map/unmap hook defined only for it. > As on unmap s390 also changes os_info structure let's check return code > and add info only on success. > This conflicts (both mechanically and somewhat conceptually) with Xunlei Pang's "kexec: Introduce a protection mechanism for the crashkernel reserved memory" and "kexec: provide arch_kexec_protect(unprotect)_crashkres()". http://ozlabs.org/~akpm/mmots/broken-out/kexec-introduce-a-protection-mechanism-for-the-crashkernel-reserved-memory.patch http://ozlabs.org/~akpm/mmots/broken-out/kexec-introduce-a-protection-mechanism-for-the-crashkernel-reserved-memory-v4.patch and http://ozlabs.org/~akpm/mmots/broken-out/kexec-provide-arch_kexec_protectunprotect_crashkres.patch http://ozlabs.org/~akpm/mmots/broken-out/kexec-provide-arch_kexec_protectunprotect_crashkres-v4.patch so can we please sort all that out? ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH] kexec: Move some memembers and definitions within the scope of CONFIG_KEXEC_FILE
On Tue, 22 Dec 2015 19:40:39 +0800 Xunlei Pangwrote: > > Following functions will be used only in kexec_file. Please wrap them in > > CONFIG_KEXEC_FILE. > > > > int __weak arch_kexec_kernel_image_probe(struct kimage *image, void *buf, > > unsigned long buf_len); > > void * __weak arch_kexec_kernel_image_load(struct kimage *image); > > int __weak arch_kimage_file_post_load_cleanup(struct kimage *image); > > int __weak arch_kexec_kernel_verify_sig(struct kimage *image, void *buf, > > unsigned long buf_len); > > int __weak arch_kexec_apply_relocations_add(const Elf_Ehdr *ehdr, > > Elf_Shdr *sechdrs, unsigned int relsec); > > int __weak arch_kexec_apply_relocations(const Elf_Ehdr *ehdr, Elf_Shdr > > *sechdrs, > > unsigned int relsec); > > Thanks for the comment. > > I noticed this as well, but seems for the function declarations we don't need > do this, > since they don't consume the actual space. > > For example, in the include/linux/timekeeping.h > /* > * RTC specific > */ > extern bool timekeeping_rtc_skipsuspend(void); > extern bool timekeeping_rtc_skipresume(void); > > extern void timekeeping_inject_sleeptime64(struct timespec64 *delta); > > also not embraced by the corresponding macros. Yes. If we add the ifdefs then a programming error will be detected at compile time. If we don't add the ifdefs then that error will be detected at link time. So the ifdefs provide a quite small advantage, while making the code harder to read and harder to maintain. I believe that "no ifdefs" is the better side of this tradeoff. ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v3] kexec: fix out of the ELF headers buffer issue in syscall kexec_file_load()
On Tue, 29 Sep 2015 20:58:57 +0800 "Lee, Chun-Yi"wrote: > This patch modified the code in fill_up_crash_elf_data by using > walk_system_ram_res instead of walk_system_ram_range to count the max > number of crash memory ranges. That's because the walk_system_ram_range > filters out small memory regions that are resided in the same page, but > walk_system_ram_res does not. > > The oringial issue is page fault error that sometimes happened on big machines > when preparing ELF headers: > > [ 305.291522] BUG: unable to handle kernel paging request at c90613fc9000 > [ 305.299621] IP: [] > prepare_elf64_ram_headers_callback+0x165/0x260 > [ 305.308300] PGD e32067 PUD 6dcbec54067 PMD 9dc9bdeb067 PTE 0 > [ 305.315393] Oops: 0002 [#1] SMP > [...snip] > [ 305.420953] task: 8e1c01ced600 ti: 8e1c03ec2000 task.ti: > 8e1c03ec2000 > [ 305.429292] RIP: 0010:[] [] > prepare_elf64_ra > m_headers_callback+0x165/0x260 > [...snip] > > After tracing prepare_elf64_headers and prepare_elf64_ram_headers_callback, > the code uses walk_system_ram_res to fill-in crash memory regions information > to program header, so it counts those small memory regions that are resided in > a page area. But, when kernel was using walk_system_ram_range in > fill_up_crash_elf_data to count the number of crash memory regions, it filters > out small regions. I printed those small memory regions, for example: > > kexec: Get nr_ram ranges. vaddr=0x880077592258 paddr=0x77592258, sz=0xdc0 > > Base on the code in walk_system_ram_range, this memory region will be filtered > out: > > pfn = (0x77592258 + 0x1000 - 1) >> 12 = 0x77593 > end_pfn = (0x77592258 + 0xfc0 -1 + 1) >> 12 = 0x77593 > end_pfn - pfn = 0x77593 - 0x77593 = 0 <=== if (end_pfn > pfn) is FALSE > > So, the max_nr_ranges that's counted by kernel doesn't include small memory > regions. That causes the page fault issue happened in later code path for > preparing EFL headers. > > This issus is not easy to reproduce on small machines that don't have too > many CPUs because the allocated page aligned ELF buffer has more free space > to cover those small memory regions' PT_LOAD headers. > fyi, I added a cc:stable to my copy of this patch. ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [REPOST PATCH] kexec: Remove the unnecessary conditional judgement to simplify the code logic
On Thu, 13 Aug 2015 09:55:25 +0900 Simon Horman ho...@verge.net.au wrote: On Tue, Jul 28, 2015 at 12:46:42PM +0800, Minfei Huang wrote: Transforming PFN(Page Frame Number) to struct page is never failure, so we can simplify the code logic to do the image-control_page assignment directly in the loop, and remove the unnecessary conditional judgement. Signed-off-by: Minfei Huang mnfhu...@gmail.com Acked-by: Dave Young dyo...@redhat.com Acked-by: Vivek Goyal vgo...@redhat.com Andrew, could you consider picking this up. It seems to have been sufficiently reviewed, acked, etc... I grabbed this one of July 28. http://ozlabs.org/~akpm/mmotm/broken-out/kexec-remove-the-unnecessary-conditional-judgement-to-simplify-the-code-logic.patch ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [Patch v2] align crash_notes allocation to make it be inside one physical page
On Mon, 3 Aug 2015 20:50:43 +0800 Baoquan He b...@redhat.com wrote: People reported that crash_notes in /proc/vmcore were corrupted and this cause crash kdump failure. With code debugging and log we got the root cause. This is because percpu variable crash_notes are allocated in 2 vmalloc pages. Currently percpu is based on vmalloc by default. Vmalloc can't guarantee 2 continuous vmalloc pages are also on 2 continuous physical pages. So when 1st kernel exports the starting address and size of crash_notes through sysfs like below: /sys/devices/system/cpu/cpux/crash_notes /sys/devices/system/cpu/cpux/crash_notes_size kdump kernel use them to get the content of crash_notes. However the 2nd part may not be in the next neighbouring physical page as we expected if crash_notes are allocated accross 2 vmalloc pages. That's why nhdr_ptr-n_namesz or nhdr_ptr-n_descsz could be very huge in update_note_header_size_elf64() and cause note header merging failure or some warnings. In this patch change to call __alloc_percpu() to passed in the align value by rounding crash_notes_size up to the nearest power of two. This make sure the crash_notes is allocated inside one physical page since sizeof(note_buf_t) in all ARCHS is smaller than PAGE_SIZE. Meanwhile add a WARN_ON in case it grows to be bigger than PAGE_SIZE in the future. ... --- a/kernel/kexec.c +++ b/kernel/kexec.c @@ -1620,7 +1620,16 @@ void crash_save_cpu(struct pt_regs *regs, int cpu) static int __init crash_notes_memory_init(void) { /* Allocate memory for saving cpu registers. */ - crash_notes = alloc_percpu(note_buf_t); + size_t size, align; + int order; + + size = sizeof(note_buf_t); + order = get_count_order(size); + align = min_t(size_t, (1order), PAGE_SIZE); + + WARN_ON(size PAGE_SIZE); + + crash_notes = __alloc_percpu(size, align); A code comment would be helpful - the reason for this code's existence is otherwise utterly unobvious. I think it can be done this way: align = min(roundup_pow_of_two(sizeof(note_buf_t)), PAGE_SIZE); I never noticed get_count_order() before. afaict it does the same as order_base_2(), except get_count_order() generates better code and has a ridiculous name. And I think the WARN_ON can be replaced with a BUILD_BUG_ON(sizeofPAGE_SIZE)? That would avoid adding runtime overhead. ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v2] kdump, vmcoreinfo: report actual value of phys_base
(cc trimmed a bit) On Thu, 13 Nov 2014 11:30:11 +0900 (JST) HATAYAMA Daisuke d.hatay...@jp.fujitsu.com wrote: Currently, VMCOREINFO note information reports the virtual address of phys_base that is assigned to symbol phys_base. But this doesn't make sense because to refer to phys_base, it's necessary to get the value of phys_base itself we are now about to refer to. Folks, could we please get a bit of reviewing, acking or nacking for this one? From: HATAYAMA Daisuke d.hatay...@jp.fujitsu.com Subject: kdump, vmcoreinfo: report actual value of phys_base Currently, VMCOREINFO note information reports the virtual address of phys_base that is assigned to symbol phys_base. But this doesn't make sense because to refer to phys_base, it's necessary to get the value of phys_base itself we are now about to refer to. Userland tools related to kdump such as makedumpfile and crash utility so far have made some efforts to calculate phys_base on crash dump formats generated by mechanisms running outside Linux kernel, such as virtual machine hypervisor such as qemu dump, which ordinary users use via virsh dump, or ones implemented on vendor specific firmware. That is, find a kernel data whose virtual and physical addresses are available via its note information and calculate phys_base from it. However, such data structure is not the one prepared for phys_base purpose. There's no guarantee that other crash dump mechanisms include such information that can be used to calculate phys_base similarly. To get VMCOREINFO in vmcore, it's easy to use strings and grep commands like this; VMCOREINFO consists of simple string: $ strings vmcore-3.10.0-121.el7.x86_64 | grep -E .*VMCOREINFO.* -A 100 VMCOREINFO OSRELEASE=3.10.0-121.el7.x86_64 PAGESIZE=4096 ... This is also useful to get value of phys_base in kdump 2nd kernel contained in vmcore using the above-mentioned external crash dump mechanism; kdump 2nd kernel is an inherently relocated kernel. This commit doesn't remove VMCOREINFO_SYMBOL(phys_base) line because makedumpfile refers to it and if removing it, old versions makedumpfile doesn't work well. Signed-off-by: HATAYAMA Daisuke d.hatay...@jp.fujitsu.com Cc: Eric W. Biederman ebied...@xmission.com Cc: Vivek Goyal vgo...@redhat.com Cc: Atsushi Kumagai kumagai-atsu...@mxc.nes.nec.co.jp Cc: Dave Anderson ander...@redhat.com Signed-off-by: Andrew Morton a...@linux-foundation.org --- arch/x86/kernel/machine_kexec_64.c |1 + include/linux/kexec.h |2 ++ 2 files changed, 3 insertions(+) diff -puN arch/x86/kernel/machine_kexec_64.c~kdump-vmcoreinfo-report-actual-value-of-phys_base arch/x86/kernel/machine_kexec_64.c --- a/arch/x86/kernel/machine_kexec_64.c~kdump-vmcoreinfo-report-actual-value-of-phys_base +++ a/arch/x86/kernel/machine_kexec_64.c @@ -334,6 +334,7 @@ void arch_crash_save_vmcoreinfo(void) #endif vmcoreinfo_append_str(KERNELOFFSET=%lx\n, (unsigned long)_text - __START_KERNEL); + VMCOREINFO_PHYS_BASE(phys_base); } /* arch-dependent functionality related to kexec file-based syscall */ diff -puN include/linux/kexec.h~kdump-vmcoreinfo-report-actual-value-of-phys_base include/linux/kexec.h --- a/include/linux/kexec.h~kdump-vmcoreinfo-report-actual-value-of-phys_base +++ a/include/linux/kexec.h @@ -258,6 +258,8 @@ unsigned long paddr_vmcoreinfo_note(void vmcoreinfo_append_str(NUMBER(%s)=%ld\n, #name, (long)name) #define VMCOREINFO_CONFIG(name) \ vmcoreinfo_append_str(CONFIG_%s=y\n, #name) +#define VMCOREINFO_PHYS_BASE(value) \ + vmcoreinfo_append_str(PHYS_BASE=%lx\n, (unsigned long)value) extern struct kimage *kexec_image; extern struct kimage *kexec_crash_image; _ ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH V3] kernel, add bug_on_warn
On Thu, 23 Oct 2014 08:53:14 -0400 Prarit Bhargava pra...@redhat.com wrote: There have been several times where I have had to rebuild a kernel to cause a panic when hitting a WARN() in the code in order to get a crash dump from a system. Sometimes this is easy to do, other times (such as in the case of a remote admin) it is not trivial to send new images to the user. A much easier method would be a switch to change the WARN() over to a BUG(). This makes debugging easier in that I can now test the actual image the WARN() was seen on and I do not have to engage in remote debugging. This patch adds a bug_on_warn kernel parameter, which calls BUG() in the warn_slowpath_common() path. The function will still print out the location of the warning. The changelog doesn't mention /proc/sys/kernel/bug_on_warn? Why do we need both the sysctl and the kernel parameter? Only to trigger BUG for warnings which occur prior to initscripts. Is there a legitimate case for this? Is kdump even usable at this time? --- a/include/asm-generic/bug.h +++ b/include/asm-generic/bug.h @@ -75,10 +75,18 @@ extern void warn_slowpath_null(const char *file, const int line); #define __WARN_printf_taint(taint, arg...) \ warn_slowpath_fmt_taint(__FILE__, __LINE__, taint, arg) #else -#define __WARN() __WARN_TAINT(TAINT_WARN) +#define check_bug_on_warn() \ + do {\ + if (bug_on_warn)\ + BUG(); \ + } while (0) #define check_bug_on_warn() BUG_ON(bug_on_warn) would suffice? +#define __WARN() \ + do { __WARN_TAINT(TAINT_WARN); check_bug_on_warn(); } while (0) + #define __WARN_printf(arg...)do { printk(arg); __WARN(); } while (0) #define __WARN_printf_taint(taint, arg...) \ - do { printk(arg); __WARN_TAINT(taint); } while (0) + do { printk(arg); __WARN_TAINT(taint); check_bug_on_warn(); } while (0) #endif What's this code here for anyway? The changes to warn_slowpath_common() aren't sufficient? #ifndef WARN_ON diff --git a/include/linux/kernel.h b/include/linux/kernel.h index 40728cf..4094a60 100644 --- a/include/linux/kernel.h +++ b/include/linux/kernel.h @@ -422,6 +422,7 @@ extern int panic_on_oops; extern int panic_on_unrecovered_nmi; extern int panic_on_io_nmi; extern int sysctl_panic_on_stackoverflow; +extern int bug_on_warn; /* * Only to be used by arch init code. If the user over-wrote the default * CONFIG_PANIC_TIMEOUT, honor it. diff --git a/include/uapi/linux/sysctl.h b/include/uapi/linux/sysctl.h index 43aaba1..2ba0a58 100644 --- a/include/uapi/linux/sysctl.h +++ b/include/uapi/linux/sysctl.h @@ -153,6 +153,7 @@ enum KERN_MAX_LOCK_DEPTH=74, /* int: rtmutex's maximum lock depth */ KERN_NMI_WATCHDOG=75, /* int: enable/disable nmi watchdog */ KERN_PANIC_ON_NMI=76, /* int: whether we will panic on an unrecovered */ + KERN_BUG_ON_WARN=77, /* int: call BUG() in WARN() functions */ }; diff --git a/kernel/panic.c b/kernel/panic.c index d09dc5c..a6d2e2f 100644 --- a/kernel/panic.c +++ b/kernel/panic.c @@ -33,6 +33,7 @@ static int pause_on_oops; static int pause_on_oops_flag; static DEFINE_SPINLOCK(pause_on_oops_lock); static bool crash_kexec_post_notifiers; +int bug_on_warn; I suppose this should be __read_mostly. Assuming __read_mostly is useful :( ... ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH 1/1] kexec: Verify the signature of signed PE bzImage
On Thu, 24 Jul 2014 10:16:42 -0400 Vivek Goyal vgo...@redhat.com wrote: I am wondering how this final kexec patch should be routed. Issue here is that this patch depends on David Howells's PKCS7 and PEFILE patches. Which are now sitting in security tree. (security/next). http://git.kernel.org/cgit/linux/kernel/git/jmorris/linux-security.git/log/?h=next Not sure how to handle this dependency issue. I have all these patches staged after linux-next so it fitted straight in. ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH 15/15] kexec: Support kexec/kdump on EFI systems
On Tue, 1 Jul 2014 20:46:05 +0100 Matt Fleming m...@console-pimps.org wrote: +int get_efi_runtime_map_size(void) +{ + return nr_efi_runtime_map * efi_memdesc_size; +} + +int get_efi_runtime_map_desc_size(void) +{ + return efi_memdesc_size; +} + +int efi_runtime_map_copy(void *buf, size_t bufsz) +{ + size_t sz = get_efi_runtime_map_size(); + + if (sz bufsz) + sz = bufsz; + + memcpy(buf, efi_runtime_map, sz); + return 0; +} Could we prefix these with efi_, e.g. efi_get_runtime_map_size() ? This? From: Andrew Morton a...@linux-foundation.org Subject: kexec-support-kexec-kdump-on-efi-systems-fix s/get_efi/efi_get/g, per Matt Cc: Vivek Goyal vgo...@redhat.com Cc: Matt Fleming m...@console-pimps.org Signed-off-by: Andrew Morton a...@linux-foundation.org --- arch/x86/kernel/kexec-bzimage64.c |4 ++-- drivers/firmware/efi/runtime-map.c |6 +++--- include/linux/efi.h|8 3 files changed, 9 insertions(+), 9 deletions(-) diff -puN arch/x86/kernel/kexec-bzimage64.c~kexec-support-kexec-kdump-on-efi-systems-fix arch/x86/kernel/kexec-bzimage64.c --- a/arch/x86/kernel/kexec-bzimage64.c~kexec-support-kexec-kdump-on-efi-systems-fix +++ a/arch/x86/kernel/kexec-bzimage64.c @@ -181,7 +181,7 @@ setup_efi_state(struct boot_params *para ei-efi_systab_hi = current_ei-efi_systab_hi; ei-efi_memdesc_version = current_ei-efi_memdesc_version; - ei-efi_memdesc_size = get_efi_runtime_map_desc_size(); + ei-efi_memdesc_size = efi_get_runtime_map_desc_size(); setup_efi_info_memmap(params, params_load_addr, efi_map_offset, efi_map_sz); @@ -397,7 +397,7 @@ void *bzImage64_load(struct kimage *imag * have to create separate segment for each. Keeps things * little bit simple */ - efi_map_sz = get_efi_runtime_map_size(); + efi_map_sz = efi_get_runtime_map_size(); efi_map_sz = ALIGN(efi_map_sz, 16); params_cmdline_sz = sizeof(struct boot_params) + cmdline_len + MAX_ELFCOREHDR_STR_LEN; diff -puN drivers/firmware/efi/runtime-map.c~kexec-support-kexec-kdump-on-efi-systems-fix drivers/firmware/efi/runtime-map.c --- a/drivers/firmware/efi/runtime-map.c~kexec-support-kexec-kdump-on-efi-systems-fix +++ a/drivers/firmware/efi/runtime-map.c @@ -138,19 +138,19 @@ add_sysfs_runtime_map_entry(struct kobje return entry; } -int get_efi_runtime_map_size(void) +int efi_get_runtime_map_size(void) { return nr_efi_runtime_map * efi_memdesc_size; } -int get_efi_runtime_map_desc_size(void) +int efi_get_runtime_map_desc_size(void) { return efi_memdesc_size; } int efi_runtime_map_copy(void *buf, size_t bufsz) { - size_t sz = get_efi_runtime_map_size(); + size_t sz = efi_get_runtime_map_size(); if (sz bufsz) sz = bufsz; diff -puN include/linux/efi.h~kexec-support-kexec-kdump-on-efi-systems-fix include/linux/efi.h --- a/include/linux/efi.h~kexec-support-kexec-kdump-on-efi-systems-fix +++ a/include/linux/efi.h @@ -1151,8 +1151,8 @@ int efivars_sysfs_init(void); #ifdef CONFIG_EFI_RUNTIME_MAP int efi_runtime_map_init(struct kobject *); void efi_runtime_map_setup(void *, int, u32); -int get_efi_runtime_map_size(void); -int get_efi_runtime_map_desc_size(void); +int efi_get_runtime_map_size(void); +int efi_get_runtime_map_desc_size(void); int efi_runtime_map_copy(void *buf, size_t bufsz); #else static inline int efi_runtime_map_init(struct kobject *kobj) @@ -1163,12 +1163,12 @@ static inline int efi_runtime_map_init(s static inline void efi_runtime_map_setup(void *map, int nr_entries, u32 desc_size) {} -static inline int get_efi_runtime_map_size(void) +static inline int efi_get_runtime_map_size(void) { return 0; } -static inline int get_efi_runtime_map_desc_size(void) +static inline int efi_get_runtime_map_desc_size(void) { return 0; } _ ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH 00/15][V4] kexec: A new system call to allow in kernel loading
On Thu, 26 Jun 2014 16:33:29 -0400 Vivek Goyal vgo...@redhat.com wrote: This patch series does not do kernel signature verification yet. I plan to post another patch series for that. Now distributions are already signing PE/COFF bzImage with PKCS7 signature I plan to parse and verify those signatures. Primary goal of this patchset is to prepare groundwork so that kernel image can be signed and signatures be verified during kexec load. This should help with two things. - It should allow kexec/kdump on secureboot enabled machines. - In general it can help even without secureboot. By being able to verify kernel image signature in kexec, it should help with avoiding module signing restrictions. Matthew Garret showed how to boot into a custom kernel, modify first kernel's memory and then jump back to old kernel and bypass any policy one wants to. I hope these patches can be queued up for 3.17. Even without signature verification support, they provide new syscall functionality. But I wil leave it to maintainers to decide if they want signature verification support also be ready to merge before they merge this patchset. Well, this is an absolute ton of new code, much of it pretty complex. And I believe the entire point of this work is to enable image signature checking, but that hasn't been implemented yet? In which case I'm thinking it would be unwise to merge these parts into mainline - if signature checking doesn't work or fails review or if you get hit by a bus then we'd be left with a large lump of rather useless code? In which case I'm inclined to put this series into -next and keep it there pending completion of the signature checking part. ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH 09/15] kexec: Implementation of new syscall kexec_file_load
On Thu, 26 Jun 2014 16:33:38 -0400 Vivek Goyal vgo...@redhat.com wrote: Previous patch provided the interface definition and this patch prvides implementation of new syscall. Previously segment list was prepared in user space. Now user space just passes kernel fd, initrd fd and command line and kernel will create a segment list internally. This patch contains generic part of the code. Actual segment preparation and loading is done by arch and image specific loader. Which comes in next patch. ... --- a/kernel/kexec.c +++ b/kernel/kexec.c @@ -6,6 +6,8 @@ * Version 2. See the file COPYING for more details. */ +#define pr_fmt(fmt) kexec: fmt + #include linux/capability.h #include linux/mm.h #include linux/file.h @@ -326,6 +328,215 @@ out_free_image: return ret; } +static int copy_file_from_fd(int fd, void **buf, unsigned long *buf_len) +{ + struct fd f = fdget(fd); + int ret = 0; unneeded initialisation. + struct kstat stat; + loff_t pos; + ssize_t bytes = 0; + + if (!f.file) + return -EBADF; + + ret = vfs_getattr(f.file-f_path, stat); + if (ret) + goto out; + + if (stat.size INT_MAX) { + ret = -EFBIG; + goto out; + } + + /* Don't hand 0 to vmalloc, it whines. */ + if (stat.size == 0) { + ret = -EINVAL; + goto out; + } + + *buf = vmalloc(stat.size); + if (!*buf) { + ret = -ENOMEM; + goto out; + } + + pos = 0; + while (pos stat.size) { + bytes = kernel_read(f.file, pos, (char *)(*buf) + pos, + stat.size - pos); + if (bytes 0) { + vfree(*buf); + ret = bytes; + goto out; + } + + if (bytes == 0) + break; Here we can get a short read: (pos stat.size). Seems to me that it is risky to return this result to the caller as if all is well. + pos += bytes; + } + + *buf_len = pos; +out: + fdput(f); + return ret; +} ... ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH] kdump: Fix exported size of vmcoreinfo note
On Tue, 14 Jan 2014 14:33:11 -0500 Vivek Goyal vgo...@redhat.com wrote: Right now we seem to be exporting the max data size contained inside vmcoreinfo note. But this does not include the size of meta data around vmcore info data. Like name of the note and starting and ending elf_note. I think user space expects total size and that size is put in PT_NOTE elf header. Things seem to be fine so far because we are not using vmcoreinfo note to the maximum capacity. But as it starts filling up, to capacity, at some point of time, problem will be visible. urgh. This is what we get for adding undocumented interfaces. Looking through the fd59d231f81cb0287 changelog and the various email threads it points to I can find no mention of what vmcoreinfo is *supposed* to contain. It was just kinda silently tossed in there. So as a remedial step, could we please get this and any associated interfaces written down in a way which people can very belatedly review? Phrases like I think user space expects and Things seem to be fine so far don't inspire a ton of confidence. What are the chances of userspace breakage here? Would it be safer/saner to leave vmcoreinfo alone and add a new vmcoreinfo2 with the altered behaviour? --- linux-2.6.orig/kernel/ksysfs.c2014-01-14 14:09:42.107767503 -0500 +++ linux-2.6/kernel/ksysfs.c 2014-01-14 14:15:24.385510314 -0500 @@ -126,7 +126,7 @@ static ssize_t vmcoreinfo_show(struct ko { return sprintf(buf, %lx %x\n, paddr_vmcoreinfo_note(), -(unsigned int)vmcoreinfo_max_size); +(unsigned int)sizeof(vmcoreinfo_note)); } KERNEL_ATTR_RO(vmcoreinfo); ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v5 00/14] kexec kernel efi runtime support
On Mon, 9 Dec 2013 17:42:13 +0800 Dave Young dyo...@redhat.com wrote: Here is the V5 patchset for supporting kexec kernel efi runtime. It's unclear (to me) who's supposed to merge this lot. It's a kexec patch which actually hits on efi and x86. I grabbed them for some -next testing but won't be offended if someone else merges them ;) ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH 0/7] Remove unused /dev/oldmem interface
On Tue, 28 May 2013 08:11:34 -0700 H. Peter Anvin h...@zytor.com wrote: On 05/28/2013 07:37 AM, Vivek Goyal wrote: Eric, Should we schedule the removal of this interface after 1-2 releases and give a warning once if anybody opens /dev/oldmem and tell them to use /proc/vmcore instead? I am kind of inclined towards warning approarch. If there is any xyz /dev/oldmem user in the wild out there, he/she atleast gets a chance to migrate to /proc/vmcore. This has been discussed ad nauseam at the Kernel Summit. It doesn't work. Who said that and what was their evidence? Giving users (especially the more technical ones) several months warning is eminently sensible. Although it doesn't appear to be necessary in this case. ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH 2/7] Documentation/devices.txt: Remove /dev/oldmem description
On Mon, 27 May 2013 09:27:42 +0800 Zhang Yanfei zhangyan...@cn.fujitsu.com wrote: ___ 2013___05___27___ 09:27, HATAYAMA Daisuke __: (2013/05/26 9:58), Zhang Yanfei wrote: ___ 2013___05___26___ 07:20, Eric W. Biederman __: Zhang Yanfei zhangyanfei@gmail.com writes: From: Zhang Yanfei zhangyan...@cn.fujitsu.com Won't we want to keep this reservation around to so that this number doesn't get reused, and cause people confusion when upgrading/downgrading kernels? Ah, yes. I will just keep this and add a note to make people know that it is removed in the next version. It looks enough writing obsolete according to the other parts of the same file. I've sent the v2 version and actually did what you said. I can't find any v2 version of this patchset. I did this: --- a/Documentation/devices.txt~dev-oldmem-remove-the-interface-fix +++ a/Documentation/devices.txt @@ -102,6 +102,7 @@ Your cooperation is appreciated. export the buffered printk records. 12 = /dev/oldmem Used by crashdump kernels to access the memory of the kernel that crashed. + (obsolete) 1 block RAM disk 0 = /dev/ram0 First RAM disk _ ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v8 2/9] vmcore: allocate buffer for ELF headers on page-size alignment
On Thu, 23 May 2013 14:25:07 +0900 HATAYAMA Daisuke d.hatay...@jp.fujitsu.com wrote: Allocate ELF headers on page-size boundary using __get_free_pages() instead of kmalloc(). Later patch will merge PT_NOTE entries into a single unique one and decrease the buffer size actually used. Keep original buffer size in variable elfcorebuf_sz_orig to kfree the buffer later and actually used buffer size with rounded up to page-size boundary in variable elfcorebuf_sz separately. The size of part of the ELF buffer exported from /proc/vmcore is elfcorebuf_sz. The merged, removed PT_NOTE entries, i.e. the range [elfcorebuf_sz, elfcorebuf_sz_orig], is filled with 0. Use size of the ELF headers as an initial offset value in set_vmcore_list_offsets_elf{64,32} and process_ptload_program_headers_elf{64,32} in order to indicate that the offset includes the holes towards the page boundary. As a result, both set_vmcore_list_offsets_elf{64,32} have the same definition. Merge them as set_vmcore_list_offsets. ... @@ -526,30 +505,35 @@ static int __init parse_crash_elf64_headers(void) } /* Read in all elf headers. */ - elfcorebuf_sz = sizeof(Elf64_Ehdr) + ehdr.e_phnum * sizeof(Elf64_Phdr); - elfcorebuf = kmalloc(elfcorebuf_sz, GFP_KERNEL); + elfcorebuf_sz_orig = sizeof(Elf64_Ehdr) + ehdr.e_phnum * sizeof(Elf64_Phdr); + elfcorebuf_sz = elfcorebuf_sz_orig; + elfcorebuf = (void *) __get_free_pages(GFP_KERNEL | __GFP_ZERO, +get_order(elfcorebuf_sz_orig)); if (!elfcorebuf) return -ENOMEM; addr = elfcorehdr_addr; - rc = read_from_oldmem(elfcorebuf, elfcorebuf_sz, addr, 0); + rc = read_from_oldmem(elfcorebuf, elfcorebuf_sz_orig, addr, 0); if (rc 0) { - kfree(elfcorebuf); + free_pages((unsigned long)elfcorebuf, +get_order(elfcorebuf_sz_orig)); return rc; } /* Merge all PT_NOTE headers into one. */ rc = merge_note_headers_elf64(elfcorebuf, elfcorebuf_sz, vmcore_list); if (rc) { - kfree(elfcorebuf); + free_pages((unsigned long)elfcorebuf, +get_order(elfcorebuf_sz_orig)); return rc; } rc = process_ptload_program_headers_elf64(elfcorebuf, elfcorebuf_sz, vmcore_list); if (rc) { - kfree(elfcorebuf); + free_pages((unsigned long)elfcorebuf, +get_order(elfcorebuf_sz_orig)); return rc; } - set_vmcore_list_offsets_elf64(elfcorebuf, vmcore_list); + set_vmcore_list_offsets(elfcorebuf_sz, vmcore_list); return 0; } @@ -581,30 +565,35 @@ static int __init parse_crash_elf32_headers(void) } /* Read in all elf headers. */ - elfcorebuf_sz = sizeof(Elf32_Ehdr) + ehdr.e_phnum * sizeof(Elf32_Phdr); - elfcorebuf = kmalloc(elfcorebuf_sz, GFP_KERNEL); + elfcorebuf_sz_orig = sizeof(Elf32_Ehdr) + ehdr.e_phnum * sizeof(Elf32_Phdr); + elfcorebuf_sz = elfcorebuf_sz_orig; + elfcorebuf = (void *) __get_free_pages(GFP_KERNEL | __GFP_ZERO, +get_order(elfcorebuf_sz_orig)); if (!elfcorebuf) return -ENOMEM; addr = elfcorehdr_addr; - rc = read_from_oldmem(elfcorebuf, elfcorebuf_sz, addr, 0); + rc = read_from_oldmem(elfcorebuf, elfcorebuf_sz_orig, addr, 0); if (rc 0) { - kfree(elfcorebuf); + free_pages((unsigned long)elfcorebuf, +get_order(elfcorebuf_sz_orig)); return rc; } /* Merge all PT_NOTE headers into one. */ rc = merge_note_headers_elf32(elfcorebuf, elfcorebuf_sz, vmcore_list); if (rc) { - kfree(elfcorebuf); + free_pages((unsigned long)elfcorebuf, +get_order(elfcorebuf_sz_orig)); return rc; } rc = process_ptload_program_headers_elf32(elfcorebuf, elfcorebuf_sz, vmcore_list); if (rc) { - kfree(elfcorebuf); + free_pages((unsigned long)elfcorebuf, +get_order(elfcorebuf_sz_orig)); return rc; } - set_vmcore_list_offsets_elf32(elfcorebuf, vmcore_list); + set_vmcore_list_offsets(elfcorebuf_sz, vmcore_list); return 0; } @@ -629,14 +618,14 @@ static int __init parse_crash_elf_headers(void) return rc; /* Determine vmcore size. */ - vmcore_size = get_vmcore_size_elf64(elfcorebuf); + vmcore_size = get_vmcore_size_elf64(elfcorebuf, elfcorebuf_sz); } else if (e_ident[EI_CLASS] == ELFCLASS32) { rc = parse_crash_elf32_headers();
Re: [PATCH v8 3/9] vmcore: treat memory chunks referenced by PT_LOAD program header entries in page-size boundary in vmcore_list
On Thu, 23 May 2013 14:25:13 +0900 HATAYAMA Daisuke d.hatay...@jp.fujitsu.com wrote: Treat memory chunks referenced by PT_LOAD program header entries in page-size boundary in vmcore_list. Formally, for each range [start, end], we set up the corresponding vmcore object in vmcore_list to [rounddown(start, PAGE_SIZE), roundup(end, PAGE_SIZE)]. This change affects layout of /proc/vmcore. Well, changing a userspace interface is generally unacceptable because it can break existing userspace code. If you think the risk is acceptable then please do explain why. In great detail! The gaps generated by the rearrangement are newly made visible to applications as holes. Concretely, they are two ranges [rounddown(start, PAGE_SIZE), start] and [end, roundup(end, PAGE_SIZE)]. Suppose variable m points at a vmcore object in vmcore_list, and variable phdr points at the program header of PT_LOAD type the variable m corresponds to. Then, pictorially: m-offset+---+ | hole | phdr-p_offset = +---+ m-offset + (paddr - start) | |\ | kernel memory | phdr-p_memsz | |/ +---+ | hole | m-offset + m-size +---+ where m-offset and m-offset + m-size are always page-size aligned. ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH]Add kmsg_dump() to kexec path
Guys, can we please get some review attention to this? From: Seiji Aguchi seiji.agu...@hds.com Subject: Add kmsg_dump() to kexec path Problem === From our support service experience, we always need to detect root cause of OS panic. And customers in enterprise area never forgive us if we can't detect the root cause of panic due to lack of materials for investigation. Kdump is a powerful troubleshooting feature, but it may accesses to multiple hardware, like HBA, FC-cable, to get to dump disk. This means kdump is not robust against hardware failure. Solution Logging kernel message to persistent device is an effective way to get materials for investigation in case of kdump failure. So this patch adds kmsg_dump() to a kexec path. Also, it adds KMSG_DUMP_KEXEC to pstore_cannot_block_path() so that it can avoid deadlocking in kexec path. Please see the detail of pstore_cannot_block_path(). https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/fs/pstore/platform.c?id=9f244e9cfd70c7c0f82d3c92ce772ab2a92d9f64 Actually, there are some objections about kmsg_dump(KMSG_DUMP_KEXEC) and EFI below. But I still think adding kmsg_dump() to a kexec path is useful. - http://marc.info/?l=linux-kernelm=130698519720887w=2 (1) kdump already saves kernel messages inside /proc/vmcore - https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/kernel/kexec.c?id=a3dd3323058d281abd584b15ad4c5b65064d7a61 It is correct, but the content of /proc/vmcore is stored a dump disk as well. So, if kdump fails due to hardware failures, the kernel messages will be lost. (2) EFI firmware is buggy - http://marc.info/?l=linux-kernelm=130698519720887w=2 I haven't seen actual firmware bugs which may cause kdump failure. So I don't think we need to care about it too much. However, just to be safe, I introduced pstore_cannot_block_path() avoid deadlocking to pstore. Also, this patch doesn't affect almost all users because kmsg_dump() is kicked only when specifying both pstore.backend and printk.always_kmsg_dump parameters. Even if a buggy firmware causes a kdump failure and someone blames kdump, we can ask them to reproduce the kdump failure by removing the parameters. In addition, I checked current coding of platform drivers. There is no obvious problem as follows. - mtdoops/ramoops They are designed to be kicked in panic and oops cases only. So, they never run in a kexec path. - erst/efi/early_printk_mrst/nvram driver for powerpc I don't see any bugs which may causes kdump failure because deadlocking/dynamic memory allocation don't happen in their write callbacks. Signed-off-by: Seiji Aguchi seiji.agu...@hds.com Signed-off-by: Andrew Morton a...@linux-foundation.org --- fs/pstore/platform.c |4 include/linux/kmsg_dump.h |1 + kernel/kexec.c|2 ++ 3 files changed, 7 insertions(+) diff -puN fs/pstore/platform.c~add-kmsg_dump-to-kexec-path fs/pstore/platform.c --- a/fs/pstore/platform.c~add-kmsg_dump-to-kexec-path +++ a/fs/pstore/platform.c @@ -91,6 +91,8 @@ static const char *get_reason_str(enum k return Halt; case KMSG_DUMP_POWEROFF: return Poweroff; + case KMSG_DUMP_KEXEC: + return Kexec; default: return Unknown; } @@ -110,6 +112,8 @@ bool pstore_cannot_block_path(enum kmsg_ case KMSG_DUMP_PANIC: /* Emergency restart shouldn't be blocked by spin lock. */ case KMSG_DUMP_EMERG: + /* In kexec path, pstore shouldn't be blocked to avoid kexec failure. */ + case KMSG_DUMP_KEXEC: return true; default: return false; diff -puN include/linux/kmsg_dump.h~add-kmsg_dump-to-kexec-path include/linux/kmsg_dump.h --- a/include/linux/kmsg_dump.h~add-kmsg_dump-to-kexec-path +++ a/include/linux/kmsg_dump.h @@ -28,6 +28,7 @@ enum kmsg_dump_reason { KMSG_DUMP_RESTART, KMSG_DUMP_HALT, KMSG_DUMP_POWEROFF, + KMSG_DUMP_KEXEC, }; /** diff -puN kernel/kexec.c~add-kmsg_dump-to-kexec-path kernel/kexec.c --- a/kernel/kexec.c~add-kmsg_dump-to-kexec-path +++ a/kernel/kexec.c @@ -32,6 +32,7 @@ #include linux/vmalloc.h #include linux/swap.h #include linux/syscore_ops.h +#include linux/kmsg_dump.h #include asm/page.h #include asm/uaccess.h @@ -1089,6 +1090,7 @@ void crash_kexec(struct pt_regs *regs) crash_setup_regs(fixed_regs, regs); crash_save_vmcoreinfo(); + kmsg_dump(KMSG_DUMP_KEXEC); machine_crash_shutdown(fixed_regs); machine_kexec(kexec_crash_image); } _ ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH]Add kmsg_dump() to kexec path
On Tue, 21 May 2013 15:47:41 -0700 gre...@linuxfoundation.org gre...@linuxfoundation.org wrote: On Tue, May 21, 2013 at 03:41:37PM -0700, Andrew Morton wrote: Guys, can we please get some review attention to this? I thought it was reviewed, and rejected, already. Did you miss those emails? I had memories of the same thing, but can't find any record in lkml archives. Either we're thinking of a different patch or this one has appeared under multiple titles. ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec