Re: [PATCH 2/4] printk: store instead of processing cont parts
On Mon, 2020-07-20 at 11:30 -0700, Linus Torvalds wrote: > On Sun, Jul 19, 2020 at 6:51 PM Sergey Senozhatsky > wrote: > > Do I get it right, what you are saying is - when we process a PR_CONT > > message the cont buffer should already contain previous non-LOG_NEWLINE > > and non-PR_CONT message, otherwise it's a bug? > > No. > > I'm saying that the code that does PR_CONT should have done *some* > printing before, otherwise it's at the very least questionable. > > IOW, you can't just randomly start printing with PR_CONT, without > having established _some_ context for it. I believe there are at least a few cases that _only_ use pr_cont to emit complete lines. For example: SEQ_printf in kernel/sched/debug.c ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH][next] printk: ringbuffer: support dataless records
On (20/07/20 16:07), John Ogness wrote: > > +/* Determine if a logical position refers to a data-less block. */ > +#define LPOS_DATALESS(lpos) ((lpos) & 1UL) > + [..] > @@ -1402,7 +1396,9 @@ static int prb_read(struct printk_ringbuffer *rb, u64 > seq, > /* Copy text data. If it fails, this is a data-less record. */ > if (!copy_data(&rb->text_data_ring, &desc.text_blk_lpos, > desc.info.text_len, > r->text_buf, r->text_buf_size, line_count)) { > - return -ENOENT; > + /* Report an error if there should have been data. */ > + if (desc.info.text_len != 0) > + return -ENOENT; > } If this is a dataless record then should copy_data() return error? Otherwise, looks good to me Acked-by: Sergey Senozhatsky -ss ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v3 00/12] ima: Fix rule parsing bugs and extend KEXEC_CMDLINE rule support
[Cc'ing Sasha] On Thu, 2020-07-09 at 01:18 -0500, Tyler Hicks wrote: > I envision patches 1-7 going to stable. The series is ordered in a way > that has all the fixes up front, followed by cleanups, followed by the > feature patch. The breakdown of patches looks like so: > > Memory leak fixes: 1-3 > Parser strictness fixes: 4-7 > Code cleanups made possible by the fixes: 8-11 > Extend KEXEC_CMDLINE rule support: 12 I agree they should be backported, but they don't apply cleanly before linux-5.6. The changes aren't that major. Some patch hunks apply cleanly, but won't compile, while others patch hunks need to be dropped based on when the feature was upstreamed. For these reasons, I'm not Cc'ing stable. Feature upstreamed: - LSM policy update: linux 5.3 - key command line: linux 5.3 - blacklist: linux 5.5 - keyrings: linux 5.6 For Linux 5.3: - Dependency on backporting commit 483ec26eed42 ("ima: ima/lsm policy rule loading logic bug fixes") to apply " ima: Free the entire rule if it fails to parse". Mimi ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH 2/4] printk: store instead of processing cont parts
On Sun, Jul 19, 2020 at 6:51 PM Sergey Senozhatsky wrote: > > Do I get it right, what you are saying is - when we process a PR_CONT > message the cont buffer should already contain previous non-LOG_NEWLINE > and non-PR_CONT message, otherwise it's a bug? No. I'm saying that the code that does PR_CONT should have done *some* printing before, otherwise it's at the very least questionable. IOW, you can't just randomly start printing with PR_CONT, without having established _some_ context for it. But that context could be a previous newline you created (the PR_CONT will be a no-op). That's certainly useful for printing a header and then after that printing possible other complex data that may or may not have line breaks in it. So your example looks fine. The context starts out with pr_warn("which would create a new lock dependency:\n"); and after that you can use KERN_CONT / pr_cont() as much as you want, since you've established a context for what you're printing. And then it ends with 'pr_cont("\n")'. So anything that was interrupted by this, and uses KERN_CONT / pr_cont() will have no ambiguous issues. The code you pointed at both started and ended a line. That said, we have traditionally used not just "current process", but also "last irq-level" as the context information, so I do think it would be good to continue to do that. At that point, "an interrupt printed something in the middle" isn't even an issue any more, because it's clear that the context has changed. Linus ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH 1/4] printk: ringbuffer: support dataless records
On 2020-07-18, John Ogness wrote: > In order to support storage of continuous lines, dataless records must > be allowed. For example, these are generated with the legal calls: > > pr_info(""); > pr_cont("\n"); > > Currently dataless records are denoted by INVALID_LPOS in order to > recognize failed prb_reserve() calls. Change the code to use two > different identifiers (FAILED_LPOS and NO_LPOS) to distinguish > between failed prb_reserve() records and successful dataless records. This patch has been re-posted [0] as a regression fix for the first series that is already in linux-next. Only the commit message has been changed to reflect the regression fix rather than preparing for continuous line support. Assuming that patch is accepted, this one should be dropped. John Ogness [0] https://lkml.kernel.org/r/20200720140111.19935-1-john.ogn...@linutronix.de ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH][next] printk: ringbuffer: support dataless records
On Mon, Jul 20, 2020 at 04:07PM +0206, John Ogness wrote: > With commit ("printk: use the lockless ringbuffer"), printk() > started silently dropping messages without text because such > records are not supported by the new printk ringbuffer. > > Add support for such records. > > Currently dataless records are denoted by INVALID_LPOS in order > to recognize failed prb_reserve() calls. Change the ringbuffer > to instead use two different identifiers (FAILED_LPOS and > NO_LPOS) to distinguish between failed prb_reserve() records and > successful dataless records, respectively. > > Fixes: ("printk: use the lockless ringbuffer") > Fixes: https://lkml.kernel.org/r/20200718121053.ga691...@elver.google.com > Signed-off-by: John Ogness > --- > based on next-20200720 > > kernel/printk/printk_ringbuffer.c | 58 ++- > kernel/printk/printk_ringbuffer.h | 15 > 2 files changed, 35 insertions(+), 38 deletions(-) Thanks! Ran a couple tests and sanitizer report blank lines are back where they're expected. Tested-by: Marco Elver ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH][next] printk: ringbuffer: support dataless records
With commit ("printk: use the lockless ringbuffer"), printk() started silently dropping messages without text because such records are not supported by the new printk ringbuffer. Add support for such records. Currently dataless records are denoted by INVALID_LPOS in order to recognize failed prb_reserve() calls. Change the ringbuffer to instead use two different identifiers (FAILED_LPOS and NO_LPOS) to distinguish between failed prb_reserve() records and successful dataless records, respectively. Fixes: ("printk: use the lockless ringbuffer") Fixes: https://lkml.kernel.org/r/20200718121053.ga691...@elver.google.com Signed-off-by: John Ogness --- based on next-20200720 kernel/printk/printk_ringbuffer.c | 58 ++- kernel/printk/printk_ringbuffer.h | 15 2 files changed, 35 insertions(+), 38 deletions(-) diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c index 7355ca99e852..54b0a6324dbf 100644 --- a/kernel/printk/printk_ringbuffer.c +++ b/kernel/printk/printk_ringbuffer.c @@ -264,6 +264,9 @@ /* Determine how many times the data array has wrapped. */ #define DATA_WRAPS(data_ring, lpos)((lpos) >> (data_ring)->size_bits) +/* Determine if a logical position refers to a data-less block. */ +#define LPOS_DATALESS(lpos)((lpos) & 1UL) + /* Get the logical position at index 0 of the current wrap. */ #define DATA_THIS_WRAP_START_LPOS(data_ring, lpos) \ ((lpos) & ~DATA_SIZE_MASK(data_ring)) @@ -320,21 +323,13 @@ static unsigned int to_blk_size(unsigned int size) * block does not exceed the maximum possible size that could fit within the * ringbuffer. This function provides that basic size check so that the * assumption is safe. - * - * Writers are also not allowed to write 0-sized (data-less) records. Such - * records are used only internally by the ringbuffer. */ static bool data_check_size(struct prb_data_ring *data_ring, unsigned int size) { struct prb_data_block *db = NULL; - /* -* Writers are not allowed to write data-less records. Such records -* are used only internally by the ringbuffer to denote records where -* their data failed to allocate or have been lost. -*/ if (size == 0) - return false; + return true; /* * Ensure the alignment padded size could possibly fit in the data @@ -568,8 +563,8 @@ static bool data_push_tail(struct printk_ringbuffer *rb, unsigned long tail_lpos; unsigned long next_lpos; - /* If @lpos is not valid, there is nothing to do. */ - if (lpos == INVALID_LPOS) + /* If @lpos is from a data-less block, there is nothing to do. */ + if (LPOS_DATALESS(lpos)) return true; /* @@ -962,8 +957,8 @@ static char *data_alloc(struct printk_ringbuffer *rb, if (size == 0) { /* Specify a data-less block. */ - blk_lpos->begin = INVALID_LPOS; - blk_lpos->next = INVALID_LPOS; + blk_lpos->begin = NO_LPOS; + blk_lpos->next = NO_LPOS; return NULL; } @@ -976,8 +971,8 @@ static char *data_alloc(struct printk_ringbuffer *rb, if (!data_push_tail(rb, data_ring, next_lpos - DATA_SIZE(data_ring))) { /* Failed to allocate, specify a data-less block. */ - blk_lpos->begin = INVALID_LPOS; - blk_lpos->next = INVALID_LPOS; + blk_lpos->begin = FAILED_LPOS; + blk_lpos->next = FAILED_LPOS; return NULL; } @@ -1025,6 +1020,10 @@ static char *data_alloc(struct printk_ringbuffer *rb, static unsigned int space_used(struct prb_data_ring *data_ring, struct prb_data_blk_lpos *blk_lpos) { + /* Data-less blocks take no space. */ + if (LPOS_DATALESS(blk_lpos->begin)) + return 0; + if (DATA_WRAPS(data_ring, blk_lpos->begin) == DATA_WRAPS(data_ring, blk_lpos->next)) { /* Data block does not wrap. */ return (DATA_INDEX(data_ring, blk_lpos->next) - @@ -1080,11 +1079,8 @@ bool prb_reserve(struct prb_reserved_entry *e, struct printk_ringbuffer *rb, if (!data_check_size(&rb->text_data_ring, r->text_buf_size)) goto fail; - /* Records are allowed to not have dictionaries. */ - if (r->dict_buf_size) { - if (!data_check_size(&rb->dict_data_ring, r->dict_buf_size)) - goto fail; - } + if (!data_check_size(&rb->dict_data_ring, r->dict_buf_size)) + goto fail; /* * Descriptors in the reserved state act as blockers to all further @@ -1212,10 +1208,8 @@ static char *get_data(struct
Re: [PATCH v4 03/12] powerpc/kexec_file: add helper functions for getting memory ranges
On 20/07/20 6:21 pm, Hari Bathini wrote: > In kexec case, the kernel to be loaded uses the same memory layout as > the running kernel. So, passing on the DT of the running kernel would > be good enough. > > But in case of kdump, different memory ranges are needed to manage > loading the kdump kernel, booting into it and exporting the elfcore > of the crashing kernel. The ranges are exclude memory ranges, usable > memory ranges, reserved memory ranges and crash memory ranges. > > Exclude memory ranges specify the list of memory ranges to avoid while > loading kdump segments. Usable memory ranges list the memory ranges > that could be used for booting kdump kernel. Reserved memory ranges > list the memory regions for the loading kernel's reserve map. Crash > memory ranges list the memory ranges to be exported as the crashing > kernel's elfcore. > > Add helper functions for setting up the above mentioned memory ranges. > This helpers facilitate in understanding the subsequent changes better > and make it easy to setup the different memory ranges listed above, as > and when appropriate. > > Signed-off-by: Hari Bathini > Tested-by: Pingfan Liu > --- > > v3 -> v4: > * Unchanged. Added Reviewed-by tag from Thiago. > > v2 -> v3: > * Unchanged. Added Acked-by & Tested-by tags from Dave & Pingfan. > > v1 -> v2: > * Introduced arch_kexec_locate_mem_hole() for override and dropped > weak arch_kexec_add_buffer(). > * Dropped __weak identifier for arch overridable functions. > * Fixed the missing declaration for arch_kimage_file_post_load_cleanup() > reported by lkp. lkp report for reference: > - https://lore.kernel.org/patchwork/patch/1264418/ Sorry, copy-paste error. The patch version changelog is as follows: v3 -> v4: * Updated sort_memory_ranges() function to reuse sort() from lib/sort.c and addressed other review comments from Thiago. v2 -> v3: * Unchanged. Added Tested-by tag from Pingfan. v1 -> v2: * Added an option to merge ranges while sorting to minimize reallocations for memory ranges list. * Dropped within_crashkernel option for add_opal_mem_range() & add_rtas_mem_range() as it is not really needed. Thanks Hari ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH v4 12/12] ppc64/kexec_file: fix kexec load failure with lack of memory hole
The kexec purgatory has to run in real mode. Only the first memory block maybe accessible in real mode. And, unlike the case with panic kernel, no memory is set aside for regular kexec load. Another thing to note is, the memory for crashkernel is reserved at an offset of 128MB. So, when crashkernel memory is reserved, the memory ranges to load kexec segments shrink further as the generic code only looks for memblock free memory ranges and in all likelihood only a tiny bit of memory from 0 to 128MB would be available to load kexec segments. With kdump being used by default in general, kexec file load is likely to fail almost always. This can be fixed by changing the memory hole lookup logic for regular kexec to use the same method as kdump. This would mean that most kexec segments will overlap with crashkernel memory region. That should still be ok as the pages, whose destination address isn't available while loading, are placed in an intermediate location till a flush to the actual destination address happens during kexec boot sequence. Signed-off-by: Hari Bathini Tested-by: Pingfan Liu Reviewed-by: Thiago Jung Bauermann --- v3 -> v4: * Unchanged. Added Reviewed-by tag from Thiago. v2 -> v3: * Unchanged. Added Tested-by tag from Pingfan. v1 -> v2: * New patch to fix locating memory hole for kexec_file_load (kexec -s -l) when memory is reserved for crashkernel. arch/powerpc/kexec/file_load_64.c | 33 ++--- 1 file changed, 14 insertions(+), 19 deletions(-) diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c index 47642d5..694f305 100644 --- a/arch/powerpc/kexec/file_load_64.c +++ b/arch/powerpc/kexec/file_load_64.c @@ -1374,13 +1374,6 @@ int arch_kexec_locate_mem_hole(struct kexec_buf *kbuf) u64 buf_min, buf_max; int ret; - /* -* Use the generic kexec_locate_mem_hole for regular -* kexec_file_load syscall -*/ - if (kbuf->image->type != KEXEC_TYPE_CRASH) - return kexec_locate_mem_hole(kbuf); - /* Look up the exclude ranges list while locating the memory hole */ emem = &(kbuf->image->arch.exclude_ranges); if (!(*emem) || ((*emem)->nr_ranges == 0)) { @@ -1388,11 +1381,15 @@ int arch_kexec_locate_mem_hole(struct kexec_buf *kbuf) return kexec_locate_mem_hole(kbuf); } + buf_min = kbuf->buf_min; + buf_max = kbuf->buf_max; /* Segments for kdump kernel should be within crashkernel region */ - buf_min = (kbuf->buf_min < crashk_res.start ? - crashk_res.start : kbuf->buf_min); - buf_max = (kbuf->buf_max > crashk_res.end ? - crashk_res.end : kbuf->buf_max); + if (kbuf->image->type == KEXEC_TYPE_CRASH) { + buf_min = (buf_min < crashk_res.start ? + crashk_res.start : buf_min); + buf_max = (buf_max > crashk_res.end ? + crashk_res.end : buf_max); + } if (buf_min > buf_max) { pr_err("Invalid buffer min and/or max values\n"); @@ -1522,15 +1519,13 @@ int arch_kexec_apply_relocations_add(struct purgatory_info *pi, int arch_kexec_kernel_image_probe(struct kimage *image, void *buf, unsigned long buf_len) { - if (image->type == KEXEC_TYPE_CRASH) { - int ret; + int ret; - /* Get exclude memory ranges needed for setting up kdump segments */ - ret = get_exclude_memory_ranges(&(image->arch.exclude_ranges)); - if (ret) { - pr_err("Failed to setup exclude memory ranges for buffer lookup\n"); - return ret; - } + /* Get exclude memory ranges needed for setting up kexec segments */ + ret = get_exclude_memory_ranges(&(image->arch.exclude_ranges)); + if (ret) { + pr_err("Failed to setup exclude memory ranges for buffer lookup\n"); + return ret; } return kexec_image_probe_default(image, buf, buf_len); ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH v4 11/12] ppc64/kexec_file: add appropriate regions for memory reserve map
While initrd, elfcorehdr and backup regions are already added to the reserve map, there are a few missing regions that need to be added to the memory reserve map. Add them here. And now that all the changes to load panic kernel are in place, claim likewise. Signed-off-by: Hari Bathini Tested-by: Pingfan Liu Reviewed-by: Thiago Jung Bauermann --- v3 -> v4: * Fixed a spellcheck and added Reviewed-by tag from Thiago. v2 -> v3: * Unchanged. Added Tested-by tag from Pingfan. v1 -> v2: * Updated add_rtas_mem_range() & add_opal_mem_range() callsites based on the new prototype for these functions. arch/powerpc/kexec/file_load_64.c | 58 ++--- 1 file changed, 53 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c index 6840ddc..47642d5 100644 --- a/arch/powerpc/kexec/file_load_64.c +++ b/arch/powerpc/kexec/file_load_64.c @@ -203,6 +203,34 @@ static int get_crash_memory_ranges(struct crash_mem **mem_ranges) } /** + * get_reserved_memory_ranges - Get reserve memory ranges. This list includes + * memory regions that should be added to the + * memory reserve map to ensure the region is + * protected from any mischief. + * @mem_ranges: Range list to add the memory ranges to. + * + * Returns 0 on success, negative errno on error. + */ +static int get_reserved_memory_ranges(struct crash_mem **mem_ranges) +{ + int ret; + + ret = add_rtas_mem_range(mem_ranges); + if (ret) + goto out; + + ret = add_tce_mem_ranges(mem_ranges); + if (ret) + goto out; + + ret = add_reserved_ranges(mem_ranges); +out: + if (ret) + pr_err("Failed to setup reserved memory ranges\n"); + return ret; +} + +/** * __locate_mem_hole_top_down - Looks top down for a large enough memory hole * in the memory regions between buf_min & buf_max * for the buffer. If found, sets kbuf->mem. @@ -1259,8 +1287,8 @@ int setup_new_fdt_ppc64(const struct kimage *image, void *fdt, unsigned long initrd_load_addr, unsigned long initrd_len, const char *cmdline) { - struct crash_mem *umem = NULL; - int ret; + struct crash_mem *umem = NULL, *rmem = NULL; + int i, nr_ranges, ret; ret = setup_new_fdt(image, fdt, initrd_load_addr, initrd_len, cmdline); if (ret) @@ -1303,7 +1331,27 @@ int setup_new_fdt_ppc64(const struct kimage *image, void *fdt, } } + /* Update memory reserve map */ + ret = get_reserved_memory_ranges(&rmem); + if (ret) + goto out; + + nr_ranges = rmem ? rmem->nr_ranges : 0; + for (i = 0; i < nr_ranges; i++) { + u64 base, size; + + base = rmem->ranges[i].start; + size = rmem->ranges[i].end - base + 1; + ret = fdt_add_mem_rsv(fdt, base, size); + if (ret) { + pr_err("Error updating memory reserve map: %s\n", + fdt_strerror(ret)); + goto out; + } + } + out: + kfree(rmem); kfree(umem); return ret; } @@ -1479,10 +1527,10 @@ int arch_kexec_kernel_image_probe(struct kimage *image, void *buf, /* Get exclude memory ranges needed for setting up kdump segments */ ret = get_exclude_memory_ranges(&(image->arch.exclude_ranges)); - if (ret) + if (ret) { pr_err("Failed to setup exclude memory ranges for buffer lookup\n"); - /* Return this until all changes for panic kernel are in */ - return -EOPNOTSUPP; + return ret; + } } return kexec_image_probe_default(image, buf, buf_len); ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH v4 08/12] ppc64/kexec_file: setup the stack for purgatory
To avoid any weird errors, the purgatory should run with its own stack. Set one up by adding the stack buffer to .data section of the purgatory. Also, setup opal base & entry values in r8 & r9 registers to help early OPAL debugging. Signed-off-by: Hari Bathini Tested-by: Pingfan Liu Reviewed-by: Thiago Jung Bauermann --- v3 -> v4: * Fixed stack_buf to be quadword aligned in accordance with ABI. * Added missing of_node_put() in setup_purgatory_ppc64(). * Added Reviewed-by tag from Thiago. v2 -> v3: * Unchanged. Added Tested-by tag from Pingfan. v1 -> v2: * Setting up opal base & entry values in r8 & r9 for early OPAL debug. arch/powerpc/include/asm/kexec.h |4 arch/powerpc/kexec/file_load_64.c | 30 ++ arch/powerpc/purgatory/trampoline_64.S | 32 3 files changed, 66 insertions(+) diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h index 835dc92..00988da 100644 --- a/arch/powerpc/include/asm/kexec.h +++ b/arch/powerpc/include/asm/kexec.h @@ -45,6 +45,10 @@ #define KEXEC_ARCH KEXEC_ARCH_PPC #endif +#ifdef CONFIG_KEXEC_FILE +#define KEXEC_PURGATORY_STACK_SIZE 16384 /* 16KB stack size */ +#endif + #define KEXEC_STATE_NONE 0 #define KEXEC_STATE_IRQS_OFF 1 #define KEXEC_STATE_REAL_MODE 2 diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c index 20e638d..7f1f31c 100644 --- a/arch/powerpc/kexec/file_load_64.c +++ b/arch/powerpc/kexec/file_load_64.c @@ -946,6 +946,8 @@ int setup_purgatory_ppc64(struct kimage *image, const void *slave_code, const void *fdt, unsigned long kernel_load_addr, unsigned long fdt_load_addr) { + struct device_node *dn = NULL; + void *stack_buf; uint64_t val; int ret; @@ -969,13 +971,41 @@ int setup_purgatory_ppc64(struct kimage *image, const void *slave_code, goto out; } + /* Setup the stack top */ + stack_buf = kexec_purgatory_get_symbol_addr(image, "stack_buf"); + if (!stack_buf) + goto out; + + val = (u64)stack_buf + KEXEC_PURGATORY_STACK_SIZE; + ret = kexec_purgatory_get_set_symbol(image, "stack", &val, sizeof(val), +false); + if (ret) + goto out; + /* Setup the TOC pointer */ val = get_toc_ptr(&(image->purgatory_info)); ret = kexec_purgatory_get_set_symbol(image, "my_toc", &val, sizeof(val), false); + if (ret) + goto out; + + /* Setup OPAL base & entry values */ + dn = of_find_node_by_path("/ibm,opal"); + if (dn) { + of_property_read_u64(dn, "opal-base-address", &val); + ret = kexec_purgatory_get_set_symbol(image, "opal_base", &val, +sizeof(val), false); + if (ret) + goto out; + + of_property_read_u64(dn, "opal-entry-address", &val); + ret = kexec_purgatory_get_set_symbol(image, "opal_entry", &val, +sizeof(val), false); + } out: if (ret) pr_err("Failed to setup purgatory symbols"); + of_node_put(dn); return ret; } diff --git a/arch/powerpc/purgatory/trampoline_64.S b/arch/powerpc/purgatory/trampoline_64.S index b375843..1615dfc 100644 --- a/arch/powerpc/purgatory/trampoline_64.S +++ b/arch/powerpc/purgatory/trampoline_64.S @@ -9,6 +9,7 @@ * Copyright (C) 2013, Anton Blanchard, IBM Corporation */ +#include #include .machine ppc64 @@ -53,6 +54,8 @@ master: ld %r2,(my_toc - 0b)(%r18) /* setup toc */ + ld %r1,(stack - 0b)(%r18) /* setup stack */ + /* load device-tree address */ ld %r3, (dt_offset - 0b)(%r18) mr %r16,%r3/* save dt address in reg16 */ @@ -63,6 +66,10 @@ master: li %r4,28 STWX_BE %r17,%r3,%r4/* Store my cpu as __be32 at byte 28 */ 1: + /* Load opal base and entry values in r8 & r9 respectively */ + ld %r8,(opal_base - 0b)(%r18) + ld %r9,(opal_entry - 0b)(%r18) + /* load the kernel address */ ld %r4,(kernel - 0b)(%r18) @@ -110,6 +117,24 @@ my_toc: .8byte 0x0 .size my_toc, . - my_toc + .balign 8 + .globl stack +stack: + .8byte 0x0 + .size stack, . - stack + + .balign 8 + .globl opal_base +opal_base: + .8byte 0x0 + .size opal_base, . - opal_base + + .balign 8 + .globl opal_entry +opal_entry: + .8byte 0x0 + .size opal_entry, . - opal_entry + .data .balign 8 .globl purgatory_sha256_digest @@ -122,3 +147,10 @@ purgatory_sha256_digest: purgatory_sha_regions:
[PATCH v4 10/12] ppc64/kexec_file: prepare elfcore header for crashing kernel
Prepare elf headers for the crashing kernel's core file using crash_prepare_elf64_headers() and pass on this info to kdump kernel by updating its command line with elfcorehdr parameter. Also, add elfcorehdr location to reserve map to avoid it from being stomped on while booting. Signed-off-by: Hari Bathini Tested-by: Pingfan Liu --- v3 -> v4: * Added a FIXME tag to indicate issue in adding opal/rtas regions to core image. * Folded prepare_elf_headers() function into load_elfcorehdr_segment(). v2 -> v3: * Unchanged. Added Tested-by tag from Pingfan. v1 -> v2: * Tried merging adjacent memory ranges on hitting maximum ranges limit to reduce reallocations for memory ranges and also, minimize PT_LOAD segments for elfcore. * Updated add_rtas_mem_range() & add_opal_mem_range() callsites based on the new prototype for these functions. arch/powerpc/include/asm/kexec.h |6 + arch/powerpc/kexec/elf_64.c | 12 +++ arch/powerpc/kexec/file_load.c| 49 +++ arch/powerpc/kexec/file_load_64.c | 165 + 4 files changed, 232 insertions(+) diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h index c069f76..6f6317f 100644 --- a/arch/powerpc/include/asm/kexec.h +++ b/arch/powerpc/include/asm/kexec.h @@ -112,12 +112,18 @@ struct kimage_arch { unsigned long backup_start; void *backup_buf; + unsigned long elfcorehdr_addr; + unsigned long elf_headers_sz; + void *elf_headers; + #ifdef CONFIG_IMA_KEXEC phys_addr_t ima_buffer_addr; size_t ima_buffer_size; #endif }; +char *setup_kdump_cmdline(struct kimage *image, char *cmdline, + unsigned long cmdline_len); int setup_purgatory(struct kimage *image, const void *slave_code, const void *fdt, unsigned long kernel_load_addr, unsigned long fdt_load_addr); diff --git a/arch/powerpc/kexec/elf_64.c b/arch/powerpc/kexec/elf_64.c index 0ecd88f..be38f72 100644 --- a/arch/powerpc/kexec/elf_64.c +++ b/arch/powerpc/kexec/elf_64.c @@ -35,6 +35,7 @@ static void *elf64_load(struct kimage *image, char *kernel_buf, void *fdt; const void *slave_code; struct elfhdr ehdr; + char *modified_cmdline = NULL; struct kexec_elf_info elf_info; struct kexec_buf kbuf = { .image = image, .buf_min = 0, .buf_max = ppc64_rma_size }; @@ -75,6 +76,16 @@ static void *elf64_load(struct kimage *image, char *kernel_buf, pr_err("Failed to load kdump kernel segments\n"); goto out; } + + /* Setup cmdline for kdump kernel case */ + modified_cmdline = setup_kdump_cmdline(image, cmdline, + cmdline_len); + if (!modified_cmdline) { + pr_err("Setting up cmdline for kdump kernel failed\n"); + ret = -EINVAL; + goto out; + } + cmdline = modified_cmdline; } if (initrd != NULL) { @@ -131,6 +142,7 @@ static void *elf64_load(struct kimage *image, char *kernel_buf, pr_err("Error setting up the purgatory.\n"); out: + kfree(modified_cmdline); kexec_free_elf_info(&elf_info); /* Make kimage_file_post_load_cleanup free the fdt buffer for us. */ diff --git a/arch/powerpc/kexec/file_load.c b/arch/powerpc/kexec/file_load.c index 38439ab..d52c097 100644 --- a/arch/powerpc/kexec/file_load.c +++ b/arch/powerpc/kexec/file_load.c @@ -18,11 +18,46 @@ #include #include #include +#include #include #define SLAVE_CODE_SIZE256 /* First 0x100 bytes */ /** + * setup_kdump_cmdline - Prepend "elfcorehdr= " to command line + * of kdump kernel for exporting the core. + * @image: Kexec image + * @cmdline: Command line parameters to update. + * @cmdline_len: Length of the cmdline parameters. + * + * kdump segment must be setup before calling this function. + * + * Returns new cmdline buffer for kdump kernel on success, NULL otherwise. + */ +char *setup_kdump_cmdline(struct kimage *image, char *cmdline, + unsigned long cmdline_len) +{ + int elfcorehdr_strlen; + char *cmdline_ptr; + + cmdline_ptr = kzalloc(COMMAND_LINE_SIZE, GFP_KERNEL); + if (!cmdline_ptr) + return NULL; + + elfcorehdr_strlen = sprintf(cmdline_ptr, "elfcorehdr=0x%lx ", + image->arch.elfcorehdr_addr); + + if (elfcorehdr_strlen + cmdline_len > COMMAND_LINE_SIZE) { + pr_err("Appending elfcorehdr= exceeds cmdline size\n"); + kfree(cmdline_ptr); + return NULL; + } + + memcpy(cmdline_ptr + elfcorehdr_strlen, cmdline, cmdline_len); + return c
[PATCH v4 09/12] ppc64/kexec_file: setup backup region for kdump kernel
Though kdump kernel boots from loaded address, the first 64K bytes of it is copied down to real 0. So, setup a backup region to copy the first 64K bytes of crashed kernel, in purgatory, before booting into kdump kernel. Also, update reserve map with backup region and crashed kernel's memory to avoid kdump kernel from accidentially using that memory. Reported-by: kernel test robot [lkp: In v1, purgatory() declaration was missing] Signed-off-by: Hari Bathini --- v3 -> v4: * Moved fdt_add_mem_rsv() for backup region under kdump flag, on Thiago's suggestion, as it is only relevant for kdump. v2 -> v3: * Dropped check for backup_start in trampoline_64.S as purgatory() takes care of it anyway. v1 -> v2: * Check if backup region is available before branching out. This is to keep `kexec -l -s` flow as before as much as possible. This would eventually change with more testing and addition of sha256 digest verification support. * Fixed missing prototype for purgatory() as reported by lkp. lkp report for reference: - https://lore.kernel.org/patchwork/patch/1264423/ arch/powerpc/include/asm/crashdump-ppc64.h | 10 +++ arch/powerpc/include/asm/kexec.h |7 ++ arch/powerpc/include/asm/purgatory.h | 11 +++ arch/powerpc/kexec/elf_64.c|9 +++ arch/powerpc/kexec/file_load_64.c | 95 +++- arch/powerpc/purgatory/Makefile| 28 arch/powerpc/purgatory/purgatory_64.c | 36 +++ arch/powerpc/purgatory/trampoline_64.S | 24 ++- 8 files changed, 210 insertions(+), 10 deletions(-) create mode 100644 arch/powerpc/include/asm/crashdump-ppc64.h create mode 100644 arch/powerpc/include/asm/purgatory.h create mode 100644 arch/powerpc/purgatory/purgatory_64.c diff --git a/arch/powerpc/include/asm/crashdump-ppc64.h b/arch/powerpc/include/asm/crashdump-ppc64.h new file mode 100644 index 000..7ba99ae --- /dev/null +++ b/arch/powerpc/include/asm/crashdump-ppc64.h @@ -0,0 +1,10 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +#ifndef _ASM_POWERPC_CRASHDUMP_PPC64_H +#define _ASM_POWERPC_CRASHDUMP_PPC64_H + +/* Backup region - first 64K bytes of System RAM. */ +#define BACKUP_SRC_START 0 +#define BACKUP_SRC_END 0x +#define BACKUP_SRC_SIZE(BACKUP_SRC_END - BACKUP_SRC_START + 1) + +#endif /* __ASM_POWERPC_CRASHDUMP_PPC64_H */ diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h index 00988da..c069f76 100644 --- a/arch/powerpc/include/asm/kexec.h +++ b/arch/powerpc/include/asm/kexec.h @@ -109,6 +109,9 @@ extern const struct kexec_file_ops kexec_elf64_ops; struct kimage_arch { struct crash_mem *exclude_ranges; + unsigned long backup_start; + void *backup_buf; + #ifdef CONFIG_IMA_KEXEC phys_addr_t ima_buffer_addr; size_t ima_buffer_size; @@ -124,6 +127,10 @@ int setup_new_fdt(const struct kimage *image, void *fdt, int delete_fdt_mem_rsv(void *fdt, unsigned long start, unsigned long size); #ifdef CONFIG_PPC64 +struct kexec_buf; + +int load_crashdump_segments_ppc64(struct kimage *image, + struct kexec_buf *kbuf); int setup_purgatory_ppc64(struct kimage *image, const void *slave_code, const void *fdt, unsigned long kernel_load_addr, unsigned long fdt_load_addr); diff --git a/arch/powerpc/include/asm/purgatory.h b/arch/powerpc/include/asm/purgatory.h new file mode 100644 index 000..076d150 --- /dev/null +++ b/arch/powerpc/include/asm/purgatory.h @@ -0,0 +1,11 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +#ifndef _ASM_POWERPC_PURGATORY_H +#define _ASM_POWERPC_PURGATORY_H + +#ifndef __ASSEMBLY__ +#include + +void purgatory(void); +#endif /* __ASSEMBLY__ */ + +#endif /* _ASM_POWERPC_PURGATORY_H */ diff --git a/arch/powerpc/kexec/elf_64.c b/arch/powerpc/kexec/elf_64.c index 64c15a5..0ecd88f 100644 --- a/arch/powerpc/kexec/elf_64.c +++ b/arch/powerpc/kexec/elf_64.c @@ -68,6 +68,15 @@ static void *elf64_load(struct kimage *image, char *kernel_buf, pr_debug("Loaded purgatory at 0x%lx\n", pbuf.mem); + /* Setup additional segments needed for panic kernel */ + if (image->type == KEXEC_TYPE_CRASH) { + ret = load_crashdump_segments_ppc64(image, &kbuf); + if (ret) { + pr_err("Failed to load kdump kernel segments\n"); + goto out; + } + } + if (initrd != NULL) { kbuf.buffer = initrd; kbuf.bufsz = kbuf.memsz = initrd_len; diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c index 7f1f31c..41d748c 100644 --- a/arch/powerpc/kexec/file_load_64.c +++ b/arch/powerpc/kexec/file_load_64.c @@ -20,9 +20,11 @@ #include #include #include +#include #include #include #include +#include struct umem_info { uint64_t *buf; /*
[PATCH v4 07/12] ppc64/kexec_file: add support to relocate purgatory
Right now purgatory implementation is only minimal. But if purgatory code is to be enhanced to copy memory to the backup region and verify sha256 digest, relocations may have to be applied to the purgatory. So, add support to relocate purgatory in kexec_file_load system call by setting up TOC pointer and applying RELA relocations as needed. Reported-by: kernel test robot [lkp: In v1, 'struct mem_sym' was declared in parameter list] Signed-off-by: Hari Bathini --- * Michael, can you share your opinion on the below: - https://lore.kernel.org/patchwork/patch/1272027/ - My intention in cover note. v3 -> v4: * Updated error log message in get_toc_section() function. v2 -> v3: * Fixed get_toc_section() to return the section info that had relocations applied, to calculate the correct toc pointer. * Fixed how relocation value is converted to relative while applying R_PPC64_REL64 & R_PPC64_REL32 relocations. v1 -> v2: * Fixed wrong use of 'struct mem_sym' in local_entry_offset() as reported by lkp. lkp report for reference: - https://lore.kernel.org/patchwork/patch/1264421/ arch/powerpc/kexec/file_load_64.c | 337 arch/powerpc/purgatory/trampoline_64.S |7 + 2 files changed, 344 insertions(+) diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c index 71c1ba7..20e638d 100644 --- a/arch/powerpc/kexec/file_load_64.c +++ b/arch/powerpc/kexec/file_load_64.c @@ -20,6 +20,7 @@ #include #include #include +#include #include #include @@ -692,6 +693,244 @@ static int update_usable_mem_fdt(void *fdt, struct crash_mem *usable_mem) } /** + * get_toc_section - Look for ".toc" symbol and return the corresponding section + * in the purgatory. + * @pi: Purgatory Info. + * + * Returns TOC section on success, NULL otherwise. + */ +static const Elf_Shdr *get_toc_section(const struct purgatory_info *pi) +{ + const Elf_Shdr *sechdrs; + const char *secstrings; + int i; + + if (!pi->ehdr) { + pr_err("Purgatory's elf info not found!\n"); + return NULL; + } + + sechdrs = (void *)pi->ehdr + pi->ehdr->e_shoff; + secstrings = (void *)pi->ehdr + sechdrs[pi->ehdr->e_shstrndx].sh_offset; + + for (i = 0; i < pi->ehdr->e_shnum; i++) { + if ((sechdrs[i].sh_size != 0) && + (strcmp(secstrings + sechdrs[i].sh_name, ".toc") == 0)) { + /* Return the relocated ".toc" section */ + return &(pi->sechdrs[i]); + } + } + + return NULL; +} + +/** + * get_toc_ptr - Get the TOC pointer (r2) of purgatory. + * @pi: Purgatory Info. + * + * Returns r2 on success, 0 otherwise. + */ +static unsigned long get_toc_ptr(const struct purgatory_info *pi) +{ + unsigned long toc_ptr = 0; + const Elf_Shdr *sechdr; + + sechdr = get_toc_section(pi); + if (!sechdr) + pr_err("Could not get the TOC section!\n"); + else + toc_ptr = sechdr->sh_addr + 0x8000; /* 0x8000 into TOC */ + + pr_debug("TOC pointer (r2) is 0x%lx\n", toc_ptr); + return toc_ptr; +} + +/* Helper functions to apply relocations */ +static int do_relative_toc(unsigned long val, uint16_t *loc, + unsigned long mask, int complain_signed) +{ + if (complain_signed && (val + 0x8000 > 0x)) { + pr_err("TOC16 relocation overflows (%lu)\n", val); + return -ENOEXEC; + } + + if ((~mask & 0x) & val) { + pr_err("Bad TOC16 relocation (%lu)\n", val); + return -ENOEXEC; + } + + *loc = (*loc & ~mask) | (val & mask); + return 0; +} +#ifdef PPC64_ELF_ABI_v2 +/* PowerPC64 specific values for the Elf64_Sym st_other field. */ +#define STO_PPC64_LOCAL_BIT5 +#define STO_PPC64_LOCAL_MASK (7 << STO_PPC64_LOCAL_BIT) +#define PPC64_LOCAL_ENTRY_OFFSET(other) \ + (((1 << (((other) & STO_PPC64_LOCAL_MASK) >> STO_PPC64_LOCAL_BIT)) \ +>> 2) << 2) + +static unsigned int local_entry_offset(const Elf64_Sym *sym) +{ + /* If this symbol has a local entry point, use it. */ + return PPC64_LOCAL_ENTRY_OFFSET(sym->st_other); +} +#else +static unsigned int local_entry_offset(const Elf64_Sym *sym) +{ + return 0; +} +#endif + +/** + * __kexec_do_relocs - Apply relocations based on relocation type. + * @my_r2: TOC pointer. + * @sym: Symbol to relocate. + * @r_type:Relocation type. + * @loc: Location to modify. + * @val: Relocated symbol value. + * @addr: Final location after relocation. + * + * Returns 0 on success, negative errno on error. + */ +static int __kexec_do_relocs(unsigned long my_r2, const Elf_Sym *sym, +int r_type, void *loc, unsigned long v
[PATCH v4 04/12] ppc64/kexec_file: avoid stomping memory used by special regions
crashkernel region could have an overlap with special memory regions like opal, rtas, tce-table & such. These regions are referred to as exclude memory ranges. Setup this ranges during image probe in order to avoid them while finding the buffer for different kdump segments. Override arch_kexec_locate_mem_hole() to locate a memory hole taking these ranges into account. Signed-off-by: Hari Bathini --- v3 -> v4: * Dropped KDUMP_BUF_MIN & KDUMP_BUF_MAX macros and fixed off-by-one error in arch_locate_mem_hole() helper routines. v2 -> v3: * If there are no exclude ranges, the right thing to do is fallbacking back to default kexec_locate_mem_hole() implementation instead of returning 0. Fixed that. v1 -> v2: * Did arch_kexec_locate_mem_hole() override to handle special regions. * Ensured holes in the memory are accounted for while locating mem hole. * Updated add_rtas_mem_range() & add_opal_mem_range() callsites based on the new prototype for these functions. arch/powerpc/include/asm/kexec.h |7 + arch/powerpc/kexec/elf_64.c |8 + arch/powerpc/kexec/file_load_64.c | 337 + 3 files changed, 348 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h index ac8fd48..835dc92 100644 --- a/arch/powerpc/include/asm/kexec.h +++ b/arch/powerpc/include/asm/kexec.h @@ -100,14 +100,16 @@ void relocate_new_kernel(unsigned long indirection_page, unsigned long reboot_co #ifdef CONFIG_KEXEC_FILE extern const struct kexec_file_ops kexec_elf64_ops; -#ifdef CONFIG_IMA_KEXEC #define ARCH_HAS_KIMAGE_ARCH struct kimage_arch { + struct crash_mem *exclude_ranges; + +#ifdef CONFIG_IMA_KEXEC phys_addr_t ima_buffer_addr; size_t ima_buffer_size; -}; #endif +}; int setup_purgatory(struct kimage *image, const void *slave_code, const void *fdt, unsigned long kernel_load_addr, @@ -125,6 +127,7 @@ int setup_new_fdt_ppc64(const struct kimage *image, void *fdt, unsigned long initrd_load_addr, unsigned long initrd_len, const char *cmdline); #endif /* CONFIG_PPC64 */ + #endif /* CONFIG_KEXEC_FILE */ #else /* !CONFIG_KEXEC_CORE */ diff --git a/arch/powerpc/kexec/elf_64.c b/arch/powerpc/kexec/elf_64.c index 23ad04c..64c15a5 100644 --- a/arch/powerpc/kexec/elf_64.c +++ b/arch/powerpc/kexec/elf_64.c @@ -46,6 +46,14 @@ static void *elf64_load(struct kimage *image, char *kernel_buf, if (ret) goto out; + if (image->type == KEXEC_TYPE_CRASH) { + /* min & max buffer values for kdump case */ + kbuf.buf_min = pbuf.buf_min = crashk_res.start; + kbuf.buf_max = pbuf.buf_max = + ((crashk_res.end < ppc64_rma_size) ? +crashk_res.end : (ppc64_rma_size - 1)); + } + ret = kexec_elf_load(image, &ehdr, &elf_info, &kbuf, &kernel_load_addr); if (ret) goto out; diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c index 41fe8b6..2df6f42 100644 --- a/arch/powerpc/kexec/file_load_64.c +++ b/arch/powerpc/kexec/file_load_64.c @@ -17,6 +17,8 @@ #include #include #include +#include +#include const struct kexec_file_ops * const kexec_file_loaders[] = { &kexec_elf64_ops, @@ -24,6 +26,254 @@ const struct kexec_file_ops * const kexec_file_loaders[] = { }; /** + * get_exclude_memory_ranges - Get exclude memory ranges. This list includes + * regions like opal/rtas, tce-table, initrd, + * kernel, htab which should be avoided while + * setting up kexec load segments. + * @mem_ranges:Range list to add the memory ranges to. + * + * Returns 0 on success, negative errno on error. + */ +static int get_exclude_memory_ranges(struct crash_mem **mem_ranges) +{ + int ret; + + ret = add_tce_mem_ranges(mem_ranges); + if (ret) + goto out; + + ret = add_initrd_mem_range(mem_ranges); + if (ret) + goto out; + + ret = add_htab_mem_range(mem_ranges); + if (ret) + goto out; + + ret = add_kernel_mem_range(mem_ranges); + if (ret) + goto out; + + ret = add_rtas_mem_range(mem_ranges); + if (ret) + goto out; + + ret = add_opal_mem_range(mem_ranges); + if (ret) + goto out; + + ret = add_reserved_ranges(mem_ranges); + if (ret) + goto out; + + /* exclude memory ranges should be sorted for easy lookup */ + sort_memory_ranges(*mem_ranges, true); +out: + if (ret) + pr_err("Failed to setup exclude memory ranges\n"); + return ret; +} + +/** + * __locate_mem_hole_top_down - Looks top down for a large enough memory hole + *
[PATCH v4 06/12] ppc64/kexec_file: restrict memory usage of kdump kernel
Kdump kernel, used for capturing the kernel core image, is supposed to use only specific memory regions to avoid corrupting the image to be captured. The regions are crashkernel range - the memory reserved explicitly for kdump kernel, memory used for the tce-table, the OPAL region and RTAS region as applicable. Restrict kdump kernel memory to use only these regions by setting up usable-memory DT property. Also, tell the kdump kernel to run at the loaded address by setting the magic word at 0x5c. Signed-off-by: Hari Bathini Tested-by: Pingfan Liu --- v3 -> v4: * Updated get_node_path() to be an iterative function instead of a recursive one. * Added comment explaining why low memory is added to kdump kernel's usable memory ranges though it doesn't fall in crashkernel region. * For correctness, added fdt_add_mem_rsv() for the low memory being added to kdump kernel's usable memory ranges. * Fixed prop pointer update in add_usable_mem_property() and changed duple to tuple as suggested by Thiago. v2 -> v3: * Unchanged. Added Tested-by tag from Pingfan. v1 -> v2: * Fixed off-by-one error while setting up usable-memory properties. * Updated add_rtas_mem_range() & add_opal_mem_range() callsites based on the new prototype for these functions. arch/powerpc/kexec/file_load_64.c | 472 + 1 file changed, 471 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c index 2df6f42..71c1ba7 100644 --- a/arch/powerpc/kexec/file_load_64.c +++ b/arch/powerpc/kexec/file_load_64.c @@ -17,9 +17,21 @@ #include #include #include +#include #include +#include +#include #include +struct umem_info { + uint64_t *buf; /* data buffer for usable-memory property */ + uint32_t idx; /* current index */ + uint32_t size; /* size allocated for the data buffer */ + + /* usable memory ranges to look up */ + const struct crash_mem *umrngs; +}; + const struct kexec_file_ops * const kexec_file_loaders[] = { &kexec_elf64_ops, NULL @@ -75,6 +87,42 @@ static int get_exclude_memory_ranges(struct crash_mem **mem_ranges) } /** + * get_usable_memory_ranges - Get usable memory ranges. This list includes + *regions like crashkernel, opal/rtas & tce-table, + *that kdump kernel could use. + * @mem_ranges: Range list to add the memory ranges to. + * + * Returns 0 on success, negative errno on error. + */ +static int get_usable_memory_ranges(struct crash_mem **mem_ranges) +{ + int ret; + + /* +* prom code doesn't take kindly to missing low memory. So, add +* [0, crashk_res.end] instead of [crashk_res.start, crashk_res.end] +* to keep it happy. +*/ + ret = add_mem_range(mem_ranges, 0, crashk_res.end + 1); + if (ret) + goto out; + + ret = add_rtas_mem_range(mem_ranges); + if (ret) + goto out; + + ret = add_opal_mem_range(mem_ranges); + if (ret) + goto out; + + ret = add_tce_mem_ranges(mem_ranges); +out: + if (ret) + pr_err("Failed to setup usable memory ranges\n"); + return ret; +} + +/** * __locate_mem_hole_top_down - Looks top down for a large enough memory hole * in the memory regions between buf_min & buf_max * for the buffer. If found, sets kbuf->mem. @@ -274,6 +322,376 @@ static int locate_mem_hole_bottom_up_ppc64(struct kexec_buf *kbuf, } /** + * check_realloc_usable_mem - Reallocate buffer if it can't accommodate entries + * @um_info: Usable memory buffer and ranges info. + * @cnt: No. of entries to accommodate. + * + * Frees up the old buffer if memory reallocation fails. + * + * Returns buffer on success, NULL on error. + */ +static uint64_t *check_realloc_usable_mem(struct umem_info *um_info, int cnt) +{ + void *tbuf; + + if (um_info->size >= + ((um_info->idx + cnt) * sizeof(*(um_info->buf + return um_info->buf; + + um_info->size += MEM_RANGE_CHUNK_SZ; + tbuf = krealloc(um_info->buf, um_info->size, GFP_KERNEL); + if (!tbuf) { + um_info->size -= MEM_RANGE_CHUNK_SZ; + return NULL; + } + + memset(tbuf + um_info->idx, 0, MEM_RANGE_CHUNK_SZ); + return tbuf; +} + +/** + * add_usable_mem - Add the usable memory ranges within the given memory range + * to the buffer + * @um_info:Usable memory buffer and ranges info. + * @base: Base address of memory range to look for. + * @end:End address of memory range to look for. + * @cnt:No. of usable memory ranges added to buffer. + * + * Returns 0 on success, negative errno on error. + */ +static int add_usable_mem(struct umem_info *um_info, uint64_t base, +
[PATCH v4 02/12] powerpc/kexec_file: mark PPC64 specific code
Some of the kexec_file_load code isn't PPC64 specific. Move PPC64 specific code from kexec/file_load.c to kexec/file_load_64.c. Also, rename purgatory/trampoline.S to purgatory/trampoline_64.S in the same spirit. No functional changes. Signed-off-by: Hari Bathini Tested-by: Pingfan Liu Reviewed-by: Laurent Dufour Reviewed-by: Thiago Jung Bauermann --- v3 -> v4: * Moved common code back to set_new_fdt() from setup_new_fdt_ppc64() function. Added Reviewed-by tags from Laurent & Thiago. v2 -> v3: * Unchanged. Added Tested-by tag from Pingfan. v1 -> v2: * No changes. arch/powerpc/include/asm/kexec.h |9 ++ arch/powerpc/kexec/Makefile|2 - arch/powerpc/kexec/elf_64.c|7 +- arch/powerpc/kexec/file_load.c | 19 + arch/powerpc/kexec/file_load_64.c | 87 arch/powerpc/purgatory/Makefile|4 + arch/powerpc/purgatory/trampoline.S| 117 arch/powerpc/purgatory/trampoline_64.S | 117 8 files changed, 222 insertions(+), 140 deletions(-) create mode 100644 arch/powerpc/kexec/file_load_64.c delete mode 100644 arch/powerpc/purgatory/trampoline.S create mode 100644 arch/powerpc/purgatory/trampoline_64.S diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h index c684768..ac8fd48 100644 --- a/arch/powerpc/include/asm/kexec.h +++ b/arch/powerpc/include/asm/kexec.h @@ -116,6 +116,15 @@ int setup_new_fdt(const struct kimage *image, void *fdt, unsigned long initrd_load_addr, unsigned long initrd_len, const char *cmdline); int delete_fdt_mem_rsv(void *fdt, unsigned long start, unsigned long size); + +#ifdef CONFIG_PPC64 +int setup_purgatory_ppc64(struct kimage *image, const void *slave_code, + const void *fdt, unsigned long kernel_load_addr, + unsigned long fdt_load_addr); +int setup_new_fdt_ppc64(const struct kimage *image, void *fdt, + unsigned long initrd_load_addr, + unsigned long initrd_len, const char *cmdline); +#endif /* CONFIG_PPC64 */ #endif /* CONFIG_KEXEC_FILE */ #else /* !CONFIG_KEXEC_CORE */ diff --git a/arch/powerpc/kexec/Makefile b/arch/powerpc/kexec/Makefile index 86380c6..67c3553 100644 --- a/arch/powerpc/kexec/Makefile +++ b/arch/powerpc/kexec/Makefile @@ -7,7 +7,7 @@ obj-y += core.o crash.o core_$(BITS).o obj-$(CONFIG_PPC32)+= relocate_32.o -obj-$(CONFIG_KEXEC_FILE) += file_load.o elf_$(BITS).o +obj-$(CONFIG_KEXEC_FILE) += file_load.o file_load_$(BITS).o elf_$(BITS).o ifdef CONFIG_HAVE_IMA_KEXEC ifdef CONFIG_IMA diff --git a/arch/powerpc/kexec/elf_64.c b/arch/powerpc/kexec/elf_64.c index 3072fd6..23ad04c 100644 --- a/arch/powerpc/kexec/elf_64.c +++ b/arch/powerpc/kexec/elf_64.c @@ -88,7 +88,8 @@ static void *elf64_load(struct kimage *image, char *kernel_buf, goto out; } - ret = setup_new_fdt(image, fdt, initrd_load_addr, initrd_len, cmdline); + ret = setup_new_fdt_ppc64(image, fdt, initrd_load_addr, + initrd_len, cmdline); if (ret) goto out; @@ -107,8 +108,8 @@ static void *elf64_load(struct kimage *image, char *kernel_buf, pr_debug("Loaded device tree at 0x%lx\n", fdt_load_addr); slave_code = elf_info.buffer + elf_info.proghdrs[0].p_offset; - ret = setup_purgatory(image, slave_code, fdt, kernel_load_addr, - fdt_load_addr); + ret = setup_purgatory_ppc64(image, slave_code, fdt, kernel_load_addr, + fdt_load_addr); if (ret) pr_err("Error setting up the purgatory.\n"); diff --git a/arch/powerpc/kexec/file_load.c b/arch/powerpc/kexec/file_load.c index 143c917..38439ab 100644 --- a/arch/powerpc/kexec/file_load.c +++ b/arch/powerpc/kexec/file_load.c @@ -1,6 +1,6 @@ // SPDX-License-Identifier: GPL-2.0-only /* - * ppc64 code to implement the kexec_file_load syscall + * powerpc code to implement the kexec_file_load syscall * * Copyright (C) 2004 Adam Litke (a...@us.ibm.com) * Copyright (C) 2004 IBM Corp. @@ -20,22 +20,7 @@ #include #include -#define SLAVE_CODE_SIZE256 - -const struct kexec_file_ops * const kexec_file_loaders[] = { - &kexec_elf64_ops, - NULL -}; - -int arch_kexec_kernel_image_probe(struct kimage *image, void *buf, - unsigned long buf_len) -{ - /* We don't support crash kernels yet. */ - if (image->type == KEXEC_TYPE_CRASH) - return -EOPNOTSUPP; - - return kexec_image_probe_default(image, buf, buf_len); -} +#define SLAVE_CODE_SIZE256 /* First 0x100 bytes */ /** * setup_purgatory - initialize the purgatory's global variables diff --git a/arch/powerpc/kexec/file_load_
[PATCH v4 05/12] powerpc/drmem: make lmb walk a bit more flexible
Currently, numa & prom are the users of drmem lmb walk code. Loading kdump with kexec_file also needs to walk the drmem LMBs to setup the usable memory ranges for kdump kernel. But there are couple of issues in using the code as is. One, walk_drmem_lmb() code is built into the .init section currently, while kexec_file needs it later. Two, there is no scope to pass data to the callback function for processing and/ or erroring out on certain conditions. Fix that by, moving drmem LMB walk code out of .init section, adding scope to pass data to the callback function and bailing out when an error is encountered in the callback function. Signed-off-by: Hari Bathini Tested-by: Pingfan Liu Reviewed-by: Thiago Jung Bauermann --- v3 -> v4: * Unchanged. Added Reviewed-by tag from Thiago. v2 -> v3: * Unchanged. Added Tested-by tag from Pingfan. v1 -> v2: * No changes. arch/powerpc/include/asm/drmem.h |9 ++-- arch/powerpc/kernel/prom.c | 13 +++--- arch/powerpc/mm/drmem.c | 87 +- arch/powerpc/mm/numa.c | 13 +++--- 4 files changed, 78 insertions(+), 44 deletions(-) diff --git a/arch/powerpc/include/asm/drmem.h b/arch/powerpc/include/asm/drmem.h index 414d209..17ccc64 100644 --- a/arch/powerpc/include/asm/drmem.h +++ b/arch/powerpc/include/asm/drmem.h @@ -90,13 +90,14 @@ static inline bool drmem_lmb_reserved(struct drmem_lmb *lmb) } u64 drmem_lmb_memory_max(void); -void __init walk_drmem_lmbs(struct device_node *dn, - void (*func)(struct drmem_lmb *, const __be32 **)); +int walk_drmem_lmbs(struct device_node *dn, void *data, + int (*func)(struct drmem_lmb *, const __be32 **, void *)); int drmem_update_dt(void); #ifdef CONFIG_PPC_PSERIES -void __init walk_drmem_lmbs_early(unsigned long node, - void (*func)(struct drmem_lmb *, const __be32 **)); +int __init +walk_drmem_lmbs_early(unsigned long node, void *data, + int (*func)(struct drmem_lmb *, const __be32 **, void *)); #endif static inline void invalidate_lmb_associativity_index(struct drmem_lmb *lmb) diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c index 9cc49f2..7df78de 100644 --- a/arch/powerpc/kernel/prom.c +++ b/arch/powerpc/kernel/prom.c @@ -468,8 +468,9 @@ static bool validate_mem_limit(u64 base, u64 *size) * This contains a list of memory blocks along with NUMA affinity * information. */ -static void __init early_init_drmem_lmb(struct drmem_lmb *lmb, - const __be32 **usm) +static int __init early_init_drmem_lmb(struct drmem_lmb *lmb, + const __be32 **usm, + void *data) { u64 base, size; int is_kexec_kdump = 0, rngs; @@ -484,7 +485,7 @@ static void __init early_init_drmem_lmb(struct drmem_lmb *lmb, */ if ((lmb->flags & DRCONF_MEM_RESERVED) || !(lmb->flags & DRCONF_MEM_ASSIGNED)) - return; + return 0; if (*usm) is_kexec_kdump = 1; @@ -499,7 +500,7 @@ static void __init early_init_drmem_lmb(struct drmem_lmb *lmb, */ rngs = dt_mem_next_cell(dt_root_size_cells, usm); if (!rngs) /* there are no (base, size) duple */ - return; + return 0; } do { @@ -524,6 +525,8 @@ static void __init early_init_drmem_lmb(struct drmem_lmb *lmb, if (lmb->flags & DRCONF_MEM_HOTREMOVABLE) memblock_mark_hotplug(base, size); } while (--rngs); + + return 0; } #endif /* CONFIG_PPC_PSERIES */ @@ -534,7 +537,7 @@ static int __init early_init_dt_scan_memory_ppc(unsigned long node, #ifdef CONFIG_PPC_PSERIES if (depth == 1 && strcmp(uname, "ibm,dynamic-reconfiguration-memory") == 0) { - walk_drmem_lmbs_early(node, early_init_drmem_lmb); + walk_drmem_lmbs_early(node, NULL, early_init_drmem_lmb); return 0; } #endif diff --git a/arch/powerpc/mm/drmem.c b/arch/powerpc/mm/drmem.c index 59327ce..b2eeea3 100644 --- a/arch/powerpc/mm/drmem.c +++ b/arch/powerpc/mm/drmem.c @@ -14,6 +14,8 @@ #include #include +static int n_root_addr_cells, n_root_size_cells; + static struct drmem_lmb_info __drmem_info; struct drmem_lmb_info *drmem_info = &__drmem_info; @@ -189,12 +191,13 @@ int drmem_update_dt(void) return rc; } -static void __init read_drconf_v1_cell(struct drmem_lmb *lmb, +static void read_drconf_v1_cell(struct drmem_lmb *lmb, const __be32 **prop) { const __be32 *p = *prop; - lmb->base_addr = dt_mem_next_cell(dt_root_addr_cells, &p); + lmb->base_addr = of_read_number(p, n_root_addr_cells); + p += n_root_addr_cells; lmb->drc_index = of_read_number(p++, 1);
[PATCH v4 00/12] ppc64: enable kdump support for kexec_file_load syscall
This patch series enables kdump support for kexec_file_load system call (kexec -s -p) on PPC64. The changes are inspired from kexec-tools code but heavily modified for kernel consumption. The first patch adds a weak arch_kexec_locate_mem_hole() function to override locate memory hole logic suiting arch needs. There are some special regions in ppc64 which should be avoided while loading buffer & there are multiple callers to kexec_add_buffer making it complicated to maintain range sanity and using generic lookup at the same time. The second patch marks ppc64 specific code within arch/powerpc/kexec and arch/powerpc/purgatory to make the subsequent code changes easy to understand. The next patch adds helper function to setup different memory ranges needed for loading kdump kernel, booting into it and exporting the crashing kernel's elfcore. The fourth patch overrides arch_kexec_locate_mem_hole() function to locate memory hole for kdump segments by accounting for the special memory regions, referred to as excluded memory ranges, and sets kbuf->mem when a suitable memory region is found. The fifth patch moves walk_drmem_lmbs() out of .init section with a few changes to reuse it for setting up kdump kernel's usable memory ranges. The next patch uses walk_drmem_lmbs() to look up the LMBs and set linux,drconf-usable-memory & linux,usable-memory properties in order to restrict kdump kernel's memory usage. The seventh patch adds relocation support for the purgatory. Patch 8 helps setup the stack for the purgatory. The next patch setups up backup region as a segment while loading kdump kernel and teaches purgatory to copy it from source to destination. Patch 10 builds the elfcore header for the running kernel & passes the info to kdump kernel via "elfcorehdr=" parameter to export as /proc/vmcore file. The next patch sets up the memory reserve map for the kexec kernel and also claims kdump support for kdump as all the necessary changes are added. The last patch fixes a lookup issue for `kexec -l -s` case when memory is reserved for crashkernel. There is scope to improve purgatory to print messages, verify sha256, move code common across archs - like arch_kexec_apply_relocations_add and sha256 digest verification, build purgatory as position independent code & other Makefile improvements in purgatory which can be dealt with in a separate patch series as a follow-up. Tested the changes successfully on P8, P9 lpars, couple of OpenPOWER boxes, one with secureboot enabled and a simulator. v3 -> v4: * Updated get_node_path() to be an iterative function instead of a recursive one. * Added comment explaining why low memory is added to kdump kernel's usable memory ranges though it doesn't fall in crashkernel region. * Fixed stack_buf to be quadword aligned in accordance with ABI. * Added missing of_node_put() in setup_purgatory_ppc64(). * Added a FIXME tag to indicate issue in adding opal/rtas regions to core image. v2 -> v3: * Fixed TOC pointer calculation for purgatory by using section info that has relocations applied. * Fixed arch_kexec_locate_mem_hole() function to fallback to generic kexec_locate_mem_hole() lookup if exclude ranges list is empty. * Dropped check for backup_start in trampoline_64.S as purgatory() function takes care of it anyway. v1 -> v2: * Introduced arch_kexec_locate_mem_hole() for override and dropped weak arch_kexec_add_buffer(). * Addressed warnings reported by lkp. * Added patch to address kexec load issue when memory is reserved for crashkernel. * Used the appropriate license header for the new files added. * Added an option to merge ranges to minimize reallocations while adding memory ranges. * Dropped within_crashkernel parameter for add_opal_mem_range() & add_rtas_mem_range() functions as it is not really needed. --- Hari Bathini (12): kexec_file: allow archs to handle special regions while locating memory hole powerpc/kexec_file: mark PPC64 specific code powerpc/kexec_file: add helper functions for getting memory ranges ppc64/kexec_file: avoid stomping memory used by special regions powerpc/drmem: make lmb walk a bit more flexible ppc64/kexec_file: restrict memory usage of kdump kernel ppc64/kexec_file: add support to relocate purgatory ppc64/kexec_file: setup the stack for purgatory ppc64/kexec_file: setup backup region for kdump kernel ppc64/kexec_file: prepare elfcore header for crashing kernel ppc64/kexec_file: add appropriate regions for memory reserve map ppc64/kexec_file: fix kexec load failure with lack of memory hole arch/powerpc/include/asm/crashdump-ppc64.h | 10 arch/powerpc/include/asm/drmem.h |9 arch/powerpc/include/asm/kexec.h | 33 + arch/powerpc/include/asm/kexec_ranges.h| 25 arch/powerpc/include/asm/purgatory.h | 11 arch/powerpc/kernel/prom.c | 13 arch/powerpc/kexec/Makefile|2
[PATCH v4 03/12] powerpc/kexec_file: add helper functions for getting memory ranges
In kexec case, the kernel to be loaded uses the same memory layout as the running kernel. So, passing on the DT of the running kernel would be good enough. But in case of kdump, different memory ranges are needed to manage loading the kdump kernel, booting into it and exporting the elfcore of the crashing kernel. The ranges are exclude memory ranges, usable memory ranges, reserved memory ranges and crash memory ranges. Exclude memory ranges specify the list of memory ranges to avoid while loading kdump segments. Usable memory ranges list the memory ranges that could be used for booting kdump kernel. Reserved memory ranges list the memory regions for the loading kernel's reserve map. Crash memory ranges list the memory ranges to be exported as the crashing kernel's elfcore. Add helper functions for setting up the above mentioned memory ranges. This helpers facilitate in understanding the subsequent changes better and make it easy to setup the different memory ranges listed above, as and when appropriate. Signed-off-by: Hari Bathini Tested-by: Pingfan Liu --- v3 -> v4: * Unchanged. Added Reviewed-by tag from Thiago. v2 -> v3: * Unchanged. Added Acked-by & Tested-by tags from Dave & Pingfan. v1 -> v2: * Introduced arch_kexec_locate_mem_hole() for override and dropped weak arch_kexec_add_buffer(). * Dropped __weak identifier for arch overridable functions. * Fixed the missing declaration for arch_kimage_file_post_load_cleanup() reported by lkp. lkp report for reference: - https://lore.kernel.org/patchwork/patch/1264418/ arch/powerpc/include/asm/kexec_ranges.h | 25 ++ arch/powerpc/kexec/Makefile |2 arch/powerpc/kexec/ranges.c | 410 +++ 3 files changed, 436 insertions(+), 1 deletion(-) create mode 100644 arch/powerpc/include/asm/kexec_ranges.h create mode 100644 arch/powerpc/kexec/ranges.c diff --git a/arch/powerpc/include/asm/kexec_ranges.h b/arch/powerpc/include/asm/kexec_ranges.h new file mode 100644 index 000..78f3111 --- /dev/null +++ b/arch/powerpc/include/asm/kexec_ranges.h @@ -0,0 +1,25 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +#ifndef _ASM_POWERPC_KEXEC_RANGES_H +#define _ASM_POWERPC_KEXEC_RANGES_H + +#define MEM_RANGE_CHUNK_SZ 2048/* Memory ranges size chunk */ + +struct crash_mem *realloc_mem_ranges(struct crash_mem **mem_ranges); +int add_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size); +int add_tce_mem_ranges(struct crash_mem **mem_ranges); +int add_initrd_mem_range(struct crash_mem **mem_ranges); +#ifdef CONFIG_PPC_BOOK3S_64 +int add_htab_mem_range(struct crash_mem **mem_ranges); +#else +static inline int add_htab_mem_range(struct crash_mem **mem_ranges) +{ + return 0; +} +#endif +int add_kernel_mem_range(struct crash_mem **mem_ranges); +int add_rtas_mem_range(struct crash_mem **mem_ranges); +int add_opal_mem_range(struct crash_mem **mem_ranges); +int add_reserved_ranges(struct crash_mem **mem_ranges); +void sort_memory_ranges(struct crash_mem *mrngs, bool merge); + +#endif /* _ASM_POWERPC_KEXEC_RANGES_H */ diff --git a/arch/powerpc/kexec/Makefile b/arch/powerpc/kexec/Makefile index 67c3553..4aff684 100644 --- a/arch/powerpc/kexec/Makefile +++ b/arch/powerpc/kexec/Makefile @@ -7,7 +7,7 @@ obj-y += core.o crash.o core_$(BITS).o obj-$(CONFIG_PPC32)+= relocate_32.o -obj-$(CONFIG_KEXEC_FILE) += file_load.o file_load_$(BITS).o elf_$(BITS).o +obj-$(CONFIG_KEXEC_FILE) += file_load.o ranges.o file_load_$(BITS).o elf_$(BITS).o ifdef CONFIG_HAVE_IMA_KEXEC ifdef CONFIG_IMA diff --git a/arch/powerpc/kexec/ranges.c b/arch/powerpc/kexec/ranges.c new file mode 100644 index 000..713ce54 --- /dev/null +++ b/arch/powerpc/kexec/ranges.c @@ -0,0 +1,410 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * powerpc code to implement the kexec_file_load syscall + * + * Copyright (C) 2004 Adam Litke (a...@us.ibm.com) + * Copyright (C) 2004 IBM Corp. + * Copyright (C) 2004,2005 Milton D Miller II, IBM Corporation + * Copyright (C) 2005 R Sharada (shar...@in.ibm.com) + * Copyright (C) 2006 Mohan Kumar M (mo...@in.ibm.com) + * Copyright (C) 2020 IBM Corporation + * + * Based on kexec-tools' kexec-ppc64.c, fs2dt.c. + * Heavily modified for the kernel by + * Hari Bathini . + */ + +#undef DEBUG +#define pr_fmt(fmt) "kexec ranges: " fmt + +#include +#include +#include +#include +#include +#include + +/** + * get_max_nr_ranges - Get the max no. of ranges crash_mem structure + * could hold, given the size allocated for it. + * @size: Allocation size of crash_mem structure. + * + * Returns the maximum no. of ranges. + */ +static inline unsigned int get_max_nr_ranges(size_t size) +{ + return ((size - sizeof(struct crash_mem)) / + sizeof(struct crash_mem_range)); +} + +/** + * get_mem_rngs_size - Get the allocated size of mrngs based on + * max_nr_ranges and
[PATCH v4 01/12] kexec_file: allow archs to handle special regions while locating memory hole
Some architectures may have special memory regions, within the given memory range, which can't be used for the buffer in a kexec segment. Implement weak arch_kexec_locate_mem_hole() definition which arch code may override, to take care of special regions, while trying to locate a memory hole. Also, add the missing declarations for arch overridable functions and and drop the __weak descriptors in the declarations to avoid non-weak definitions from becoming weak. Reported-by: kernel test robot [lkp: In v1, arch_kimage_file_post_load_cleanup() declaration was missing] Signed-off-by: Hari Bathini Tested-by: Pingfan Liu Acked-by: Dave Young Reviewed-by: Thiago Jung Bauermann --- v3 -> v4: * Unchanged. Added Reviewed-by tag from Thiago. v2 -> v3: * Unchanged. Added Acked-by & Tested-by tags from Dave & Pingfan. v1 -> v2: * Introduced arch_kexec_locate_mem_hole() for override and dropped weak arch_kexec_add_buffer(). * Dropped __weak identifier for arch overridable functions. * Fixed the missing declaration for arch_kimage_file_post_load_cleanup() reported by lkp. lkp report for reference: - https://lore.kernel.org/patchwork/patch/1264418/ include/linux/kexec.h | 29 ++--- kernel/kexec_file.c | 16 ++-- 2 files changed, 32 insertions(+), 13 deletions(-) diff --git a/include/linux/kexec.h b/include/linux/kexec.h index ea67910..9e93bef 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -183,17 +183,24 @@ int kexec_purgatory_get_set_symbol(struct kimage *image, const char *name, bool get_value); void *kexec_purgatory_get_symbol_addr(struct kimage *image, const char *name); -int __weak arch_kexec_kernel_image_probe(struct kimage *image, void *buf, -unsigned long buf_len); -void * __weak arch_kexec_kernel_image_load(struct kimage *image); -int __weak arch_kexec_apply_relocations_add(struct purgatory_info *pi, - Elf_Shdr *section, - const Elf_Shdr *relsec, - const Elf_Shdr *symtab); -int __weak arch_kexec_apply_relocations(struct purgatory_info *pi, - Elf_Shdr *section, - const Elf_Shdr *relsec, - const Elf_Shdr *symtab); +/* Architectures may override the below functions */ +int arch_kexec_kernel_image_probe(struct kimage *image, void *buf, + unsigned long buf_len); +void *arch_kexec_kernel_image_load(struct kimage *image); +int arch_kexec_apply_relocations_add(struct purgatory_info *pi, +Elf_Shdr *section, +const Elf_Shdr *relsec, +const Elf_Shdr *symtab); +int arch_kexec_apply_relocations(struct purgatory_info *pi, +Elf_Shdr *section, +const Elf_Shdr *relsec, +const Elf_Shdr *symtab); +int arch_kimage_file_post_load_cleanup(struct kimage *image); +#ifdef CONFIG_KEXEC_SIG +int arch_kexec_kernel_verify_sig(struct kimage *image, void *buf, +unsigned long buf_len); +#endif +int arch_kexec_locate_mem_hole(struct kexec_buf *kbuf); extern int kexec_add_buffer(struct kexec_buf *kbuf); int kexec_locate_mem_hole(struct kexec_buf *kbuf); diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c index 09cc78d..e89912d 100644 --- a/kernel/kexec_file.c +++ b/kernel/kexec_file.c @@ -636,6 +636,19 @@ int kexec_locate_mem_hole(struct kexec_buf *kbuf) } /** + * arch_kexec_locate_mem_hole - Find free memory to place the segments. + * @kbuf: Parameters for the memory search. + * + * On success, kbuf->mem will have the start address of the memory region found. + * + * Return: 0 on success, negative errno on error. + */ +int __weak arch_kexec_locate_mem_hole(struct kexec_buf *kbuf) +{ + return kexec_locate_mem_hole(kbuf); +} + +/** * kexec_add_buffer - place a buffer in a kexec segment * @kbuf: Buffer contents and memory parameters. * @@ -647,7 +660,6 @@ int kexec_locate_mem_hole(struct kexec_buf *kbuf) */ int kexec_add_buffer(struct kexec_buf *kbuf) { - struct kexec_segment *ksegment; int ret; @@ -675,7 +687,7 @@ int kexec_add_buffer(struct kexec_buf *kbuf) kbuf->buf_align = max(kbuf->buf_align, PAGE_SIZE); /* Walk the RAM ranges and allocate a suitable range for the buffer */ - ret = kexec_locate_mem_hole(kbuf); + ret = arch_kexec_locate_mem_hole(kbuf); if (ret) return ret; ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v5 4/4] printk: use the lockless ringbuffer
On Mon, 20 Jul 2020 at 12:20, John Ogness wrote: > > On 2020-07-18, Marco Elver wrote: > > It seems this causes a regression observed at least with newline-only > > printks. > > [...] > > -- >8 -- > > > > --- a/init/main.c > > +++ b/init/main.c > > @@ -1039,6 +1039,10 @@ asmlinkage __visible void __init start_kernel(void) > > sfi_init_late(); > > kcsan_init(); > > > > + pr_info("EXPECT BLANK LINE --vv\n"); > > + pr_info("\n"); > > + pr_info("EXPECT BLANK LINE --^^\n"); > > + > > /* Do the rest non-__init'ed, we're now alive */ > > arch_call_rest_init(); > > Thanks for the example. This is an unintentional regression in the > series. I will submit a patch to fix this. > > Note that this regression does not exist when the followup series [0] > (reimplementing LOG_CONT) is applied. All the more reason that the 1st > series should be fixed before pushing the 2nd series to linux-next. Great, thank you for clarifying! :-) -- Marco > John Ogness > > [0] https://lkml.kernel.org/r/20200717234818.8622-1-john.ogn...@linutronix.de ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v5 4/4] printk: use the lockless ringbuffer
On 2020-07-18, Marco Elver wrote: > It seems this causes a regression observed at least with newline-only > printks. > [...] > -- >8 -- > > --- a/init/main.c > +++ b/init/main.c > @@ -1039,6 +1039,10 @@ asmlinkage __visible void __init start_kernel(void) > sfi_init_late(); > kcsan_init(); > > + pr_info("EXPECT BLANK LINE --vv\n"); > + pr_info("\n"); > + pr_info("EXPECT BLANK LINE --^^\n"); > + > /* Do the rest non-__init'ed, we're now alive */ > arch_call_rest_init(); Thanks for the example. This is an unintentional regression in the series. I will submit a patch to fix this. Note that this regression does not exist when the followup series [0] (reimplementing LOG_CONT) is applied. All the more reason that the 1st series should be fixed before pushing the 2nd series to linux-next. John Ogness [0] https://lkml.kernel.org/r/20200717234818.8622-1-john.ogn...@linutronix.de ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v5 4/4] printk: use the lockless ringbuffer
On Mon, Jul 20, 2020 at 11:41 AM Marco Elver wrote: > > On Mon, 20 Jul 2020 at 10:41, Sergey Senozhatsky > wrote: > > > > On (20/07/20 08:43), Marco Elver wrote: > > > On Sun, Jul 19, 2020 at 12:43PM +0900, Sergey Senozhatsky wrote: > > > > > > As I said, a number of debugging tools use them to format reports to be > > > more readable (visually separate title and report body, and separate > > > parts of the report). Also, such reports are often parsed by CI systems, > > > and by changing the reports, these CI systems may break. But those are > > > just the usecases I'm acutely aware of > > > > Can you give example of such CI systems? // that's a real question // > > None of ours should break; I agree the CI system is brittle if it > relies on newlines. Parsed and displayed reports are changing, however > -- what irks me is now all the reports sent to the LKML look ugly. > > Some random KASAN reports (just compare formatting): > next (ugly): > https://lore.kernel.org/lkml/c87b7305aadb6...@google.com/ > mainline (normal): > https://lore.kernel.org/lkml/f4ef6a05aa92e...@google.com/ > > The same problem exists with lockdep reports, KCSAN reports, ... If > newline-printks to insert blank lines are now banned, what are we to > do? Send dozens of patches to switch everyone to printk(" \n")? Or > some better suggestion? I cannot yet see how that is an improvement. > (And if the behaviour is not reverted, please document the new > behaviour.) > > That also doesn't yet address the ~400 other newline-printk users, and > somebody needs to do the due diligence to understand if it's just a > flush, or an intentional blank line. Empty lines improve readability of long crash reports significantly. New lines in sanitizer reports originated in Go race reports 7 years ago and then spread to user-space ASAN/MSAN/TSAN b/c that was an improvement and then were specifically added to kernel sanitizers. This is even more important now that we have up to 5 stacks in KASAN reports. Please keep them. Also having lots of printk("\n") sprinkled in kernel code and turning them into no-op separately does not look like the right solution. These printk("\n") are confusing and add clutter. A better solution would be to remove these printk("\n") from the code. But this also naturally allows selective removal. Say, keeping for sanitizers and some other cases, but removing some that are not useful. ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v5 4/4] printk: use the lockless ringbuffer
On Mon, 20 Jul 2020 at 10:41, Sergey Senozhatsky wrote: > > On (20/07/20 08:43), Marco Elver wrote: > > On Sun, Jul 19, 2020 at 12:43PM +0900, Sergey Senozhatsky wrote: > > > > As I said, a number of debugging tools use them to format reports to be > > more readable (visually separate title and report body, and separate > > parts of the report). Also, such reports are often parsed by CI systems, > > and by changing the reports, these CI systems may break. But those are > > just the usecases I'm acutely aware of > > Can you give example of such CI systems? // that's a real question // None of ours should break; I agree the CI system is brittle if it relies on newlines. Parsed and displayed reports are changing, however -- what irks me is now all the reports sent to the LKML look ugly. Some random KASAN reports (just compare formatting): next (ugly): https://lore.kernel.org/lkml/c87b7305aadb6...@google.com/ mainline (normal): https://lore.kernel.org/lkml/f4ef6a05aa92e...@google.com/ The same problem exists with lockdep reports, KCSAN reports, ... If newline-printks to insert blank lines are now banned, what are we to do? Send dozens of patches to switch everyone to printk(" \n")? Or some better suggestion? I cannot yet see how that is an improvement. (And if the behaviour is not reverted, please document the new behaviour.) That also doesn't yet address the ~400 other newline-printk users, and somebody needs to do the due diligence to understand if it's just a flush, or an intentional blank line. Thanks, -- Marco ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v5 4/4] printk: use the lockless ringbuffer
On (20/07/20 08:43), Marco Elver wrote: > On Sun, Jul 19, 2020 at 12:43PM +0900, Sergey Senozhatsky wrote: > > As I said, a number of debugging tools use them to format reports to be > more readable (visually separate title and report body, and separate > parts of the report). Also, such reports are often parsed by CI systems, > and by changing the reports, these CI systems may break. But those are > just the usecases I'm acutely aware of Can you give example of such CI systems? // that's a real question // -ss ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v5 4/4] printk: use the lockless ringbuffer
On (20/07/20 08:43), Marco Elver wrote: [..] > please see a full list of newline-print users below. [..] > $> git grep -En '\<(printk|pr_err|pr_warn|pr_info)\>\("\\n"\)' > arch/alpha/kernel/core_wildfire.c:650:printk("\n"); > arch/alpha/kernel/core_wildfire.c:658:printk("\n"); > arch/alpha/kernel/traps.c:120:printk("\n"); [..] In many cases printk("\n") is not for "print a blank line", but rather for "flush pr_cont buffer". -ss ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
RE: [PATCH 07/13] fs/kernel_read_file: Switch buffer size arg to size_t
From: Kees Cook > Sent: 17 July 2020 18:43 > In preparation for further refactoring of kernel_read_file*(), rename > the "max_size" argument to the more accurate "buf_size", and correct > its type to size_t. Add kerndoc to explain the specifics of how the > arguments will be used. Note that with buf_size now size_t, it can no > longer be negative (and was never called with a negative value). Adjust > callers to use it as a "maximum size" when *buf is NULL. > > Signed-off-by: Kees Cook > --- > fs/kernel_read_file.c| 34 +++- > include/linux/kernel_read_file.h | 8 > security/integrity/digsig.c | 2 +- > security/integrity/ima/ima_fs.c | 2 +- > 4 files changed, 31 insertions(+), 15 deletions(-) > > diff --git a/fs/kernel_read_file.c b/fs/kernel_read_file.c > index dc28a8def597..e21a76001fff 100644 > --- a/fs/kernel_read_file.c > +++ b/fs/kernel_read_file.c > @@ -5,15 +5,31 @@ > #include > #include > > +/** > + * kernel_read_file() - read file contents into a kernel buffer > + * > + * @file file to read from > + * @buf pointer to a "void *" buffer for reading into (if > + * *@buf is NULL, a buffer will be allocated, and > + * @buf_size will be ignored) > + * @buf_size size of buf, if already allocated. If @buf not > + * allocated, this is the largest size to allocate. > + * @id the kernel_read_file_id identifying the type of > + * file contents being read (for LSMs to examine) > + * > + * Returns number of bytes read (no single read will be bigger > + * than INT_MAX), or negative on error. > + * > + */ That seems to be self-inconsistent. If '*buf' is NULL is both says that buf_size is ignored and is treated as a limit. To make life easier, zero should probably be treated as no-limit. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales) ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v5 4/4] printk: use the lockless ringbuffer
On Mon, Jul 20, 2020 at 9:45 AM Marco Elver wrote: > On Sun, Jul 19, 2020 at 12:43PM +0900, Sergey Senozhatsky wrote: > > On (20/07/18 14:10), Marco Elver wrote: > > > > > > It seems this causes a regression observed at least with newline-only > > > printks. I noticed this during -next testing because various debugging > > > tools (K*SAN, lockdep, etc.) use e.g. pr_{err,warn,info}("\n") to format > > > reports. > > > > > > Without wanting to wait for a report from one of these debugging tools, > > > a simple reproducer is below. Without this patch, the expected newline > > > is printed. > > > > Empty/blank lines carry no valuable payload, could you please explain > > why do you consider this to be a regression? > > Empty/blank lines are visually valuable. > > Did I miss a discussion somewhere that this change is acceptable? > Unfortunately, I can't find it mentioned in the commit message, and > therefore assumed it's a regression. > > As I said, a number of debugging tools use them to format reports to be > more readable (visually separate title and report body, and separate > parts of the report). While I can find it useful in some cases, though messages can be interleaved, ... > Also, such reports are often parsed by CI systems, > and by changing the reports, these CI systems may break. But those are > just the usecases I'm acutely aware of -- please see a full list of > newline-print users below. ...but this is a weak argument. If your CI relies on message rather on the ABI, you earn the breakage. Go and fix your CI to do sane things instead. > Breaking the observable behaviour of a widely used interface such as > printk doesn't seem right. Where the newline-print is inappropriate, > wouldn't removing that newline-print be more appropriate (instead of > forcing this behaviour on everyone)? -- With Best Regards, Andy Shevchenko ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec