Re: [PATCH 2/4] printk: store instead of processing cont parts

2020-07-20 Thread Joe Perches
On Mon, 2020-07-20 at 11:30 -0700, Linus Torvalds wrote:
> On Sun, Jul 19, 2020 at 6:51 PM Sergey Senozhatsky wrote:
> > Do I get it right, what you are saying is - when we process a PR_CONT
> > message the cont buffer should already contain previous non-LOG_NEWLINE
> > and non-PR_CONT message, otherwise it's a bug?
> 
> No.
> 
> I'm saying that the code that does PR_CONT should have done *some*
> printing before, otherwise it's at the very least questionable.
> 
> IOW, you can't just randomly start printing with PR_CONT, without
> having established _some_ context for it.

I believe there are at least a few cases that _only_ use pr_cont to
emit complete lines.

For example: SEQ_printf in kernel/sched/debug.c



___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH][next] printk: ringbuffer: support dataless records

2020-07-20 Thread Sergey Senozhatsky
On (20/07/20 16:07), John Ogness wrote:
>  
> +/* Determine if a logical position refers to a data-less block. */
> +#define LPOS_DATALESS(lpos)  ((lpos) & 1UL)
> +

[..]

> @@ -1402,7 +1396,9 @@ static int prb_read(struct printk_ringbuffer *rb, u64 seq,
>   /* Copy text data. If it fails, this is a data-less record. */
>   if (!copy_data(&rb->text_data_ring, &desc.text_blk_lpos, desc.info.text_len,
>  r->text_buf, r->text_buf_size, line_count)) {
> - return -ENOENT;
> + /* Report an error if there should have been data. */
> + if (desc.info.text_len != 0)
> + return -ENOENT;
>   }

If this is a dataless record then should copy_data() return error?

Otherwise, looks good to me
Acked-by: Sergey Senozhatsky 

-ss



Re: [PATCH v3 00/12] ima: Fix rule parsing bugs and extend KEXEC_CMDLINE rule support

2020-07-20 Thread Mimi Zohar
[Cc'ing Sasha]

On Thu, 2020-07-09 at 01:18 -0500, Tyler Hicks wrote:

> I envision patches 1-7 going to stable. The series is ordered in a way
> that has all the fixes up front, followed by cleanups, followed by the
> feature patch. The breakdown of patches looks like so:
> 
>  Memory leak fixes: 1-3
>  Parser strictness fixes: 4-7
>  Code cleanups made possible by the fixes: 8-11
>  Extend KEXEC_CMDLINE rule support: 12

I agree they should be backported, but they don't apply cleanly before
linux-5.6.  The changes aren't that major.  Some patch hunks apply
cleanly but won't compile, while other patch hunks need to be
dropped based on when the feature was upstreamed.  For these reasons,
I'm not Cc'ing stable.

Feature upstreamed:
- LSM policy update: linux 5.3
- key command line: linux 5.3
- blacklist: linux 5.5
- keyrings: linux 5.6

For Linux 5.3:
- Dependency on backporting commit 483ec26eed42 ("ima: ima/lsm policy
rule loading logic bug fixes") to apply "ima: Free the entire rule if
it fails to parse".

Mimi



Re: [PATCH 2/4] printk: store instead of processing cont parts

2020-07-20 Thread Linus Torvalds
On Sun, Jul 19, 2020 at 6:51 PM Sergey Senozhatsky wrote:
>
> Do I get it right, what you are saying is - when we process a PR_CONT
> message the cont buffer should already contain previous non-LOG_NEWLINE
> and non-PR_CONT message, otherwise it's a bug?

No.

I'm saying that the code that does PR_CONT should have done *some*
printing before, otherwise it's at the very least questionable.

IOW, you can't just randomly start printing with PR_CONT, without
having established _some_ context for it.

But that context could be a previous newline you created (the PR_CONT
will be a no-op). That's certainly useful for printing a header and
then after that printing possible other complex data that may or may
not have line breaks in it.

So your example looks fine. The context starts out with

pr_warn("which would create a new lock dependency:\n");

and after that you can use KERN_CONT / pr_cont() as much as you want,
since you've established a context for what you're printing.

And then it ends with 'pr_cont("\n")'.

So anything that was interrupted by this, and uses KERN_CONT /
pr_cont() will have no ambiguous issues. The code you pointed at both
started and ended a line.

That said, we have traditionally used not just "current process", but
also "last irq-level" as the context information, so I do think it
would be good to continue to do that.

At that point, "an interrupt printed something in the middle" isn't
even an issue any more, because it's clear that the context has
changed.

 Linus
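
A minimal userspace sketch of the rule described above, not the kernel's
implementation: a continuation is appended only if the same context still
owns an open line; otherwise it starts a new record. The names toy_log and
toy_emit, the integer context id, and the fixed buffer sizes are all
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_RECORDS 16
#define MAX_TEXT    128

/* Toy log: hypothetical model, not the printk ringbuffer. */
struct toy_log {
	char records[MAX_RECORDS][MAX_TEXT];
	size_t nr_records;
	bool line_open;	/* last text did not end in '\n' */
	int owner;	/* context (task/irq level) that printed last */
};

/*
 * Emit @text from context @ctx. A continuation (@is_cont) is appended to
 * the previous record only if that record is still open and was written
 * by the same context. Returns the record index used, or -1 if full.
 */
static int toy_emit(struct toy_log *log, int ctx, bool is_cont,
		    const char *text)
{
	size_t len;
	bool append;
	char *rec;

	assert(log && text);
	len = strlen(text);
	append = is_cont && log->line_open && log->owner == ctx;

	if (!append) {
		if (log->nr_records == MAX_RECORDS)
			return -1;
		log->records[log->nr_records][0] = '\0';
		log->nr_records++;
	}

	rec = log->records[log->nr_records - 1];
	strncat(rec, text, MAX_TEXT - strlen(rec) - 1);

	/* The line stays open until some text ends with a newline. */
	log->line_open = (len == 0 || text[len - 1] != '\n');
	log->owner = ctx;
	return (int)log->nr_records - 1;
}
```

In this model a header ending in "\n" followed by continuations from the
same context yields one record for the header and one for the continued
line, while a print from another context in the middle simply forces the
next continuation into a fresh record.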



Re: [PATCH 1/4] printk: ringbuffer: support dataless records

2020-07-20 Thread John Ogness
On 2020-07-18, John Ogness  wrote:
> In order to support storage of continuous lines, dataless records must
> be allowed. For example, these are generated with the legal calls:
>
> pr_info("");
> pr_cont("\n");
>
> Currently dataless records are denoted by INVALID_LPOS in order to
> recognize failed prb_reserve() calls. Change the code to use two
> different identifiers (FAILED_LPOS and NO_LPOS) to distinguish
> between failed prb_reserve() records and successful dataless records.

This patch has been re-posted [0] as a regression fix for the first
series that is already in linux-next. Only the commit message has been
changed to reflect the regression fix rather than preparing for
continuous line support.

Assuming that patch is accepted, this one should be dropped.

John Ogness

[0] https://lkml.kernel.org/r/20200720140111.19935-1-john.ogn...@linutronix.de
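
A hedged sketch of how two tagged logical positions could be told apart.
LPOS_DATALESS() is taken from the patch; the concrete values used here for
FAILED_LPOS and NO_LPOS, the enum, and classify_lpos() are assumptions for
illustration only, as is the premise that real data blocks always begin at
an even logical position.

```c
#include <assert.h>

/* From the patch: a set low bit marks a data-less block. */
#define LPOS_DATALESS(lpos)	((lpos) & 1UL)

/* Assumed values for illustration; both have the low bit set. */
#define FAILED_LPOS	0x1UL
#define NO_LPOS		0x3UL

/* Hypothetical classification of a block's begin position. */
enum blk_kind {
	BLK_DATA,	/* regular block with text */
	BLK_EMPTY,	/* successful record without data */
	BLK_FAILED,	/* data allocation failed or data was lost */
};

static enum blk_kind classify_lpos(unsigned long begin)
{
	if (!LPOS_DATALESS(begin))
		return BLK_DATA;
	if (begin == NO_LPOS)
		return BLK_EMPTY;
	/* Only two tagged values exist in this sketch. */
	assert(begin == FAILED_LPOS);
	return BLK_FAILED;
}
```

With such a split, a reader can treat BLK_EMPTY as a valid empty line and
report an error only for BLK_FAILED.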



Re: [PATCH][next] printk: ringbuffer: support dataless records

2020-07-20 Thread Marco Elver
On Mon, Jul 20, 2020 at 04:07PM +0206, John Ogness wrote:
> With commit ("printk: use the lockless ringbuffer"), printk()
> started silently dropping messages without text because such
> records are not supported by the new printk ringbuffer.
> 
> Add support for such records.
> 
> Currently dataless records are denoted by INVALID_LPOS in order
> to recognize failed prb_reserve() calls. Change the ringbuffer
> to instead use two different identifiers (FAILED_LPOS and
> NO_LPOS) to distinguish between failed prb_reserve() records and
> successful dataless records, respectively.
> 
> Fixes: ("printk: use the lockless ringbuffer")
> Fixes: https://lkml.kernel.org/r/20200718121053.ga691...@elver.google.com
> Signed-off-by: John Ogness 
> ---
>  based on next-20200720
> 
>  kernel/printk/printk_ringbuffer.c | 58 ++-
>  kernel/printk/printk_ringbuffer.h | 15 
>  2 files changed, 35 insertions(+), 38 deletions(-)

Thanks! Ran a couple of tests, and the blank lines in sanitizer reports
are back where they're expected.

Tested-by: Marco Elver 



[PATCH][next] printk: ringbuffer: support dataless records

2020-07-20 Thread John Ogness
With commit ("printk: use the lockless ringbuffer"), printk()
started silently dropping messages without text because such
records are not supported by the new printk ringbuffer.

Add support for such records.

Currently dataless records are denoted by INVALID_LPOS in order
to recognize failed prb_reserve() calls. Change the ringbuffer
to instead use two different identifiers (FAILED_LPOS and
NO_LPOS) to distinguish between failed prb_reserve() records and
successful dataless records, respectively.

Fixes: ("printk: use the lockless ringbuffer")
Fixes: https://lkml.kernel.org/r/20200718121053.ga691...@elver.google.com
Signed-off-by: John Ogness 
---
 based on next-20200720

 kernel/printk/printk_ringbuffer.c | 58 ++-
 kernel/printk/printk_ringbuffer.h | 15 
 2 files changed, 35 insertions(+), 38 deletions(-)

diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
index 7355ca99e852..54b0a6324dbf 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -264,6 +264,9 @@
 /* Determine how many times the data array has wrapped. */
 #define DATA_WRAPS(data_ring, lpos)((lpos) >> (data_ring)->size_bits)
 
+/* Determine if a logical position refers to a data-less block. */
+#define LPOS_DATALESS(lpos)((lpos) & 1UL)
+
 /* Get the logical position at index 0 of the current wrap. */
 #define DATA_THIS_WRAP_START_LPOS(data_ring, lpos) \
 ((lpos) & ~DATA_SIZE_MASK(data_ring))
@@ -320,21 +323,13 @@ static unsigned int to_blk_size(unsigned int size)
  * block does not exceed the maximum possible size that could fit within the
  * ringbuffer. This function provides that basic size check so that the
  * assumption is safe.
- *
- * Writers are also not allowed to write 0-sized (data-less) records. Such
- * records are used only internally by the ringbuffer.
  */
 static bool data_check_size(struct prb_data_ring *data_ring, unsigned int size)
 {
struct prb_data_block *db = NULL;
 
-   /*
-* Writers are not allowed to write data-less records. Such records
-* are used only internally by the ringbuffer to denote records where
-* their data failed to allocate or have been lost.
-*/
if (size == 0)
-   return false;
+   return true;
 
/*
 * Ensure the alignment padded size could possibly fit in the data
@@ -568,8 +563,8 @@ static bool data_push_tail(struct printk_ringbuffer *rb,
unsigned long tail_lpos;
unsigned long next_lpos;
 
-   /* If @lpos is not valid, there is nothing to do. */
-   if (lpos == INVALID_LPOS)
+   /* If @lpos is from a data-less block, there is nothing to do. */
+   if (LPOS_DATALESS(lpos))
return true;
 
/*
@@ -962,8 +957,8 @@ static char *data_alloc(struct printk_ringbuffer *rb,
 
if (size == 0) {
/* Specify a data-less block. */
-   blk_lpos->begin = INVALID_LPOS;
-   blk_lpos->next = INVALID_LPOS;
+   blk_lpos->begin = NO_LPOS;
+   blk_lpos->next = NO_LPOS;
return NULL;
}
 
@@ -976,8 +971,8 @@ static char *data_alloc(struct printk_ringbuffer *rb,
 
if (!data_push_tail(rb, data_ring, next_lpos - DATA_SIZE(data_ring))) {
/* Failed to allocate, specify a data-less block. */
-   blk_lpos->begin = INVALID_LPOS;
-   blk_lpos->next = INVALID_LPOS;
+   blk_lpos->begin = FAILED_LPOS;
+   blk_lpos->next = FAILED_LPOS;
return NULL;
}
 
@@ -1025,6 +1020,10 @@ static char *data_alloc(struct printk_ringbuffer *rb,
 static unsigned int space_used(struct prb_data_ring *data_ring,
   struct prb_data_blk_lpos *blk_lpos)
 {
+   /* Data-less blocks take no space. */
+   if (LPOS_DATALESS(blk_lpos->begin))
+   return 0;
+
if (DATA_WRAPS(data_ring, blk_lpos->begin) == DATA_WRAPS(data_ring, blk_lpos->next)) {
/* Data block does not wrap. */
return (DATA_INDEX(data_ring, blk_lpos->next) -
@@ -1080,11 +1079,8 @@ bool prb_reserve(struct prb_reserved_entry *e, struct printk_ringbuffer *rb,
if (!data_check_size(&rb->text_data_ring, r->text_buf_size))
goto fail;
 
-   /* Records are allowed to not have dictionaries. */
-   if (r->dict_buf_size) {
-   if (!data_check_size(&rb->dict_data_ring, r->dict_buf_size))
-   goto fail;
-   }
+   if (!data_check_size(&rb->dict_data_ring, r->dict_buf_size))
+   goto fail;
 
/*
 * Descriptors in the reserved state act as blockers to all further
@@ -1212,10 +1208,8 @@ static char *get_data(struct 

Re: [PATCH v4 03/12] powerpc/kexec_file: add helper functions for getting memory ranges

2020-07-20 Thread Hari Bathini



On 20/07/20 6:21 pm, Hari Bathini wrote:
> In kexec case, the kernel to be loaded uses the same memory layout as
> the running kernel. So, passing on the DT of the running kernel would
> be good enough.
> 
> But in case of kdump, different memory ranges are needed to manage
> loading the kdump kernel, booting into it and exporting the elfcore
> of the crashing kernel. The ranges are exclude memory ranges, usable
> memory ranges, reserved memory ranges and crash memory ranges.
> 
> Exclude memory ranges specify the list of memory ranges to avoid while
> loading kdump segments. Usable memory ranges list the memory ranges
> that could be used for booting kdump kernel. Reserved memory ranges
> list the memory regions for the loading kernel's reserve map. Crash
> memory ranges list the memory ranges to be exported as the crashing
> kernel's elfcore.
> 
> Add helper functions for setting up the above mentioned memory ranges.
> These helpers make the subsequent changes easier to understand and
> make it easy to set up the different memory ranges listed above, as
> and when appropriate.
> 
> Signed-off-by: Hari Bathini 
> Tested-by: Pingfan Liu 
> ---

> 
> v3 -> v4:
> * Unchanged. Added Reviewed-by tag from Thiago.
> 
> v2 -> v3:
> * Unchanged. Added Acked-by & Tested-by tags from Dave & Pingfan.
> 
> v1 -> v2:
> * Introduced arch_kexec_locate_mem_hole() for override and dropped
>   weak arch_kexec_add_buffer().
> * Dropped __weak identifier for arch overridable functions.
> * Fixed the missing declaration for arch_kimage_file_post_load_cleanup()
>   reported by lkp. lkp report for reference:
> - https://lore.kernel.org/patchwork/patch/1264418/

Sorry, copy-paste error. The patch version changelog is as follows:

v3 -> v4:
* Updated sort_memory_ranges() function to reuse sort() from lib/sort.c
  and addressed other review comments from Thiago.

v2 -> v3:
* Unchanged. Added Tested-by tag from Pingfan.

v1 -> v2:
* Added an option to merge ranges while sorting to minimize reallocations
  for memory ranges list.
* Dropped within_crashkernel option for add_opal_mem_range() &
  add_rtas_mem_range() as it is not really needed.
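
For illustration, sorting a range list and merging neighbours in one pass
could look roughly like the userspace sketch below. It uses qsort() in place
of the kernel's sort(); struct toy_range and sort_and_merge() are
hypothetical names, and ranges are assumed to have inclusive end addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical range type; @end is inclusive. */
struct toy_range {
	uint64_t start;
	uint64_t end;
};

static int cmp_range(const void *a, const void *b)
{
	const struct toy_range *x = a, *y = b;

	if (x->start < y->start)
		return -1;
	if (x->start > y->start)
		return 1;
	return 0;
}

/*
 * Sort @r by start address, then merge ranges that overlap or touch.
 * Returns the number of ranges left at the front of @r.
 */
static size_t sort_and_merge(struct toy_range *r, size_t n)
{
	size_t i, out = 0;

	assert(r != NULL || n == 0);
	if (n == 0)
		return 0;

	qsort(r, n, sizeof(*r), cmp_range);

	for (i = 1; i < n; i++) {
		/* Guard against overflow of end + 1 before comparing. */
		if (r[out].end == UINT64_MAX || r[i].start <= r[out].end + 1) {
			if (r[i].end > r[out].end)
				r[out].end = r[i].end;
		} else {
			r[++out] = r[i];
		}
	}
	return out + 1;
}
```

Merging while sorting keeps the list short, which is what reduces the
number of reallocations when a list is close to its capacity.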


Thanks
Hari



[PATCH v4 12/12] ppc64/kexec_file: fix kexec load failure with lack of memory hole

2020-07-20 Thread Hari Bathini
The kexec purgatory has to run in real mode. Only the first memory
block may be accessible in real mode. And, unlike the case with panic
kernel, no memory is set aside for regular kexec load. Another thing
to note is, the memory for crashkernel is reserved at an offset of
128MB. So, when crashkernel memory is reserved, the memory ranges to
load kexec segments shrink further as the generic code only looks for
memblock free memory ranges and in all likelihood only a tiny bit of
memory from 0 to 128MB would be available to load kexec segments.

With kdump being used by default in general, kexec file load is likely
to fail almost always. This can be fixed by changing the memory hole
lookup logic for regular kexec to use the same method as kdump. This
would mean that most kexec segments will overlap with crashkernel
memory region. That should still be ok as the pages, whose destination
address isn't available while loading, are placed in an intermediate
location till a flush to the actual destination address happens during
kexec boot sequence.

Signed-off-by: Hari Bathini 
Tested-by: Pingfan Liu 
Reviewed-by: Thiago Jung Bauermann 
---

v3 -> v4:
* Unchanged. Added Reviewed-by tag from Thiago.

v2 -> v3:
* Unchanged. Added Tested-by tag from Pingfan.

v1 -> v2:
* New patch to fix locating memory hole for kexec_file_load (kexec -s -l)
  when memory is reserved for crashkernel.
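
The window selection this patch changes can be sketched in userspace as
follows. This is only an illustration under stated assumptions: struct
toy_res and pick_window() are hypothetical, and the crashkernel region is
passed in explicitly instead of being read from crashk_res.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a resource with an inclusive end. */
struct toy_res {
	uint64_t start;
	uint64_t end;
};

/*
 * Narrow [*buf_min, *buf_max] to the crashkernel region only for a
 * kdump image; a regular kexec image keeps its window unchanged.
 * Returns false if the resulting window is empty.
 */
static bool pick_window(uint64_t *buf_min, uint64_t *buf_max,
			bool is_crash, const struct toy_res *crashk)
{
	assert(buf_min && buf_max && crashk);

	if (is_crash) {
		if (*buf_min < crashk->start)
			*buf_min = crashk->start;
		if (*buf_max > crashk->end)
			*buf_max = crashk->end;
	}
	return *buf_min <= *buf_max;
}
```

Both image types then go through the same exclude-range lookup; only the
clamping step differs.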


 arch/powerpc/kexec/file_load_64.c |   33 ++---
 1 file changed, 14 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c
index 47642d5..694f305 100644
--- a/arch/powerpc/kexec/file_load_64.c
+++ b/arch/powerpc/kexec/file_load_64.c
@@ -1374,13 +1374,6 @@ int arch_kexec_locate_mem_hole(struct kexec_buf *kbuf)
u64 buf_min, buf_max;
int ret;
 
-   /*
-* Use the generic kexec_locate_mem_hole for regular
-* kexec_file_load syscall
-*/
-   if (kbuf->image->type != KEXEC_TYPE_CRASH)
-   return kexec_locate_mem_hole(kbuf);
-
/* Look up the exclude ranges list while locating the memory hole */
emem = &(kbuf->image->arch.exclude_ranges);
if (!(*emem) || ((*emem)->nr_ranges == 0)) {
@@ -1388,11 +1381,15 @@ int arch_kexec_locate_mem_hole(struct kexec_buf *kbuf)
return kexec_locate_mem_hole(kbuf);
}
 
+   buf_min = kbuf->buf_min;
+   buf_max = kbuf->buf_max;
/* Segments for kdump kernel should be within crashkernel region */
-   buf_min = (kbuf->buf_min < crashk_res.start ?
-  crashk_res.start : kbuf->buf_min);
-   buf_max = (kbuf->buf_max > crashk_res.end ?
-  crashk_res.end : kbuf->buf_max);
+   if (kbuf->image->type == KEXEC_TYPE_CRASH) {
+   buf_min = (buf_min < crashk_res.start ?
+  crashk_res.start : buf_min);
+   buf_max = (buf_max > crashk_res.end ?
+  crashk_res.end : buf_max);
+   }
 
if (buf_min > buf_max) {
pr_err("Invalid buffer min and/or max values\n");
@@ -1522,15 +1519,13 @@ int arch_kexec_apply_relocations_add(struct purgatory_info *pi,
 int arch_kexec_kernel_image_probe(struct kimage *image, void *buf,
  unsigned long buf_len)
 {
-   if (image->type == KEXEC_TYPE_CRASH) {
-   int ret;
+   int ret;
 
-   /* Get exclude memory ranges needed for setting up kdump segments */
-   ret = get_exclude_memory_ranges(&(image->arch.exclude_ranges));
-   if (ret) {
-   pr_err("Failed to setup exclude memory ranges for buffer lookup\n");
-   return ret;
-   }
+   /* Get exclude memory ranges needed for setting up kexec segments */
+   ret = get_exclude_memory_ranges(&(image->arch.exclude_ranges));
+   if (ret) {
+   pr_err("Failed to setup exclude memory ranges for buffer lookup\n");
+   return ret;
}
 
return kexec_image_probe_default(image, buf, buf_len);




[PATCH v4 11/12] ppc64/kexec_file: add appropriate regions for memory reserve map

2020-07-20 Thread Hari Bathini
While initrd, elfcorehdr and backup regions are already added to the
reserve map, there are a few missing regions that need to be added to
the memory reserve map. Add them here. And now that all the changes
to load the panic kernel are in place, stop returning -EOPNOTSUPP for it.

Signed-off-by: Hari Bathini 
Tested-by: Pingfan Liu 
Reviewed-by: Thiago Jung Bauermann 
---

v3 -> v4:
* Fixed a spellcheck and added Reviewed-by tag from Thiago.

v2 -> v3:
* Unchanged. Added Tested-by tag from Pingfan.

v1 -> v2:
* Updated add_rtas_mem_range() & add_opal_mem_range() callsites based on
  the new prototype for these functions.
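
As a small illustration of the reserve map update, each range with an
inclusive end address has to be turned into a (base, size) pair, so the
size is end - start + 1. The sketch below assumes hypothetical types
toy_range and toy_rsv and a helper ranges_to_rsv(); in the patch itself
each pair is passed to fdt_add_mem_rsv() instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical range with an inclusive end address. */
struct toy_range {
	uint64_t start;
	uint64_t end;
};

/* Hypothetical reserve map entry. */
struct toy_rsv {
	uint64_t base;
	uint64_t size;
};

/*
 * Convert up to @cap inclusive ranges into (base, size) entries.
 * Returns the number of entries written to @out.
 */
static size_t ranges_to_rsv(const struct toy_range *r, size_t n,
			    struct toy_rsv *out, size_t cap)
{
	size_t i;

	assert(r && out);
	for (i = 0; i < n && i < cap; i++) {
		out[i].base = r[i].start;
		/* End is inclusive, hence the + 1. */
		out[i].size = r[i].end - r[i].start + 1;
	}
	return i;
}
```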


 arch/powerpc/kexec/file_load_64.c |   58 ++---
 1 file changed, 53 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c
index 6840ddc..47642d5 100644
--- a/arch/powerpc/kexec/file_load_64.c
+++ b/arch/powerpc/kexec/file_load_64.c
@@ -203,6 +203,34 @@ static int get_crash_memory_ranges(struct crash_mem **mem_ranges)
 }
 
 /**
+ * get_reserved_memory_ranges - Get reserve memory ranges. This list includes
+ *  memory regions that should be added to the
+ *  memory reserve map to ensure the region is
+ *  protected from any mischief.
+ * @mem_ranges: Range list to add the memory ranges to.
+ *
+ * Returns 0 on success, negative errno on error.
+ */
+static int get_reserved_memory_ranges(struct crash_mem **mem_ranges)
+{
+   int ret;
+
+   ret = add_rtas_mem_range(mem_ranges);
+   if (ret)
+   goto out;
+
+   ret = add_tce_mem_ranges(mem_ranges);
+   if (ret)
+   goto out;
+
+   ret = add_reserved_ranges(mem_ranges);
+out:
+   if (ret)
+   pr_err("Failed to setup reserved memory ranges\n");
+   return ret;
+}
+
+/**
  * __locate_mem_hole_top_down - Looks top down for a large enough memory hole
  *  in the memory regions between buf_min & buf_max
  *  for the buffer. If found, sets kbuf->mem.
@@ -1259,8 +1287,8 @@ int setup_new_fdt_ppc64(const struct kimage *image, void *fdt,
unsigned long initrd_load_addr,
unsigned long initrd_len, const char *cmdline)
 {
-   struct crash_mem *umem = NULL;
-   int ret;
+   struct crash_mem *umem = NULL, *rmem = NULL;
+   int i, nr_ranges, ret;
 
ret = setup_new_fdt(image, fdt, initrd_load_addr, initrd_len, cmdline);
if (ret)
@@ -1303,7 +1331,27 @@ int setup_new_fdt_ppc64(const struct kimage *image, void *fdt,
}
}
 
+   /* Update memory reserve map */
+   ret = get_reserved_memory_ranges(&rmem);
+   if (ret)
+   goto out;
+
+   nr_ranges = rmem ? rmem->nr_ranges : 0;
+   for (i = 0; i < nr_ranges; i++) {
+   u64 base, size;
+
+   base = rmem->ranges[i].start;
+   size = rmem->ranges[i].end - base + 1;
+   ret = fdt_add_mem_rsv(fdt, base, size);
+   if (ret) {
+   pr_err("Error updating memory reserve map: %s\n",
+  fdt_strerror(ret));
+   goto out;
+   }
+   }
+
 out:
+   kfree(rmem);
kfree(umem);
return ret;
 }
@@ -1479,10 +1527,10 @@ int arch_kexec_kernel_image_probe(struct kimage *image, void *buf,
 
/* Get exclude memory ranges needed for setting up kdump segments */
ret = get_exclude_memory_ranges(&(image->arch.exclude_ranges));
-   if (ret)
+   if (ret) {
pr_err("Failed to setup exclude memory ranges for buffer lookup\n");
-   /* Return this until all changes for panic kernel are in */
-   return -EOPNOTSUPP;
+   return ret;
+   }
}
 
return kexec_image_probe_default(image, buf, buf_len);




[PATCH v4 08/12] ppc64/kexec_file: setup the stack for purgatory

2020-07-20 Thread Hari Bathini
To avoid any weird errors, the purgatory should run with its own
stack. Set one up by adding the stack buffer to .data section of
the purgatory. Also, set up OPAL base & entry values in r8 & r9
registers to help early OPAL debugging.

Signed-off-by: Hari Bathini 
Tested-by: Pingfan Liu 
Reviewed-by: Thiago Jung Bauermann 
---

v3 -> v4:
* Fixed stack_buf to be quadword aligned in accordance with ABI.
* Added missing of_node_put() in setup_purgatory_ppc64().
* Added Reviewed-by tag from Thiago.

v2 -> v3:
* Unchanged. Added Tested-by tag from Pingfan.

v1 -> v2:
* Setting up opal base & entry values in r8 & r9 for early OPAL debug.
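
A rough userspace sketch of the stack setup: since the stack grows downward
on powerpc, the value loaded into r1 is the address just past the end of the
buffer, and a 16-byte (quadword) aligned buffer with a size that is a
multiple of 16 gives an aligned stack top. The macro, buffer, and helper
names below are hypothetical stand-ins, not the purgatory's own symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same size as in the patch: 16KB. */
#define TOY_PURGATORY_STACK_SIZE 16384

/* Quadword-aligned stack buffer (hypothetical stand-in for stack_buf). */
static _Alignas(16) unsigned char toy_stack_buf[TOY_PURGATORY_STACK_SIZE];

/*
 * The initial stack pointer is the address just past the buffer,
 * because the stack grows toward lower addresses.
 */
static uintptr_t toy_stack_top(const void *buf, size_t size)
{
	/* An aligned base plus an aligned size keeps the top aligned. */
	assert(((uintptr_t)buf & 15) == 0 && (size & 15) == 0);
	return (uintptr_t)buf + size;
}
```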


 arch/powerpc/include/asm/kexec.h   |4 
 arch/powerpc/kexec/file_load_64.c  |   30 ++
 arch/powerpc/purgatory/trampoline_64.S |   32 
 3 files changed, 66 insertions(+)

diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index 835dc92..00988da 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -45,6 +45,10 @@
 #define KEXEC_ARCH KEXEC_ARCH_PPC
 #endif
 
+#ifdef CONFIG_KEXEC_FILE
+#define KEXEC_PURGATORY_STACK_SIZE 16384   /* 16KB stack size */
+#endif
+
 #define KEXEC_STATE_NONE 0
 #define KEXEC_STATE_IRQS_OFF 1
 #define KEXEC_STATE_REAL_MODE 2
diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c
index 20e638d..7f1f31c 100644
--- a/arch/powerpc/kexec/file_load_64.c
+++ b/arch/powerpc/kexec/file_load_64.c
@@ -946,6 +946,8 @@ int setup_purgatory_ppc64(struct kimage *image, const void *slave_code,
  const void *fdt, unsigned long kernel_load_addr,
  unsigned long fdt_load_addr)
 {
+   struct device_node *dn = NULL;
+   void *stack_buf;
uint64_t val;
int ret;
 
@@ -969,13 +971,41 @@ int setup_purgatory_ppc64(struct kimage *image, const void *slave_code,
goto out;
}
 
+   /* Setup the stack top */
+   stack_buf = kexec_purgatory_get_symbol_addr(image, "stack_buf");
+   if (!stack_buf)
+   goto out;
+
+   val = (u64)stack_buf + KEXEC_PURGATORY_STACK_SIZE;
+   ret = kexec_purgatory_get_set_symbol(image, "stack", &val, sizeof(val),
+false);
+   if (ret)
+   goto out;
+
/* Setup the TOC pointer */
val = get_toc_ptr(&(image->purgatory_info));
ret = kexec_purgatory_get_set_symbol(image, "my_toc", &val, sizeof(val),
 false);
+   if (ret)
+   goto out;
+
+   /* Setup OPAL base & entry values */
+   dn = of_find_node_by_path("/ibm,opal");
+   if (dn) {
+   of_property_read_u64(dn, "opal-base-address", &val);
+   ret = kexec_purgatory_get_set_symbol(image, "opal_base", &val,
+sizeof(val), false);
+   if (ret)
+   goto out;
+
+   of_property_read_u64(dn, "opal-entry-address", &val);
+   ret = kexec_purgatory_get_set_symbol(image, "opal_entry", &val,
+sizeof(val), false);
+   }
 out:
if (ret)
pr_err("Failed to setup purgatory symbols");
+   of_node_put(dn);
return ret;
 }
 
diff --git a/arch/powerpc/purgatory/trampoline_64.S b/arch/powerpc/purgatory/trampoline_64.S
index b375843..1615dfc 100644
--- a/arch/powerpc/purgatory/trampoline_64.S
+++ b/arch/powerpc/purgatory/trampoline_64.S
@@ -9,6 +9,7 @@
  * Copyright (C) 2013, Anton Blanchard, IBM Corporation
  */
 
+#include 
 #include 
 
.machine ppc64
@@ -53,6 +54,8 @@ master:
 
ld  %r2,(my_toc - 0b)(%r18) /* setup toc */
 
+   ld  %r1,(stack - 0b)(%r18)  /* setup stack */
+
/* load device-tree address */
ld  %r3, (dt_offset - 0b)(%r18)
mr  %r16,%r3/* save dt address in reg16 */
@@ -63,6 +66,10 @@ master:
li  %r4,28
STWX_BE %r17,%r3,%r4/* Store my cpu as __be32 at byte 28 */
 1:
+   /* Load opal base and entry values in r8 & r9 respectively */
+   ld  %r8,(opal_base - 0b)(%r18)
+   ld  %r9,(opal_entry - 0b)(%r18)
+
/* load the kernel address */
ld  %r4,(kernel - 0b)(%r18)
 
@@ -110,6 +117,24 @@ my_toc:
.8byte  0x0
.size my_toc, . - my_toc
 
+   .balign 8
+   .globl stack
+stack:
+   .8byte  0x0
+   .size stack, . - stack
+
+   .balign 8
+   .globl opal_base
+opal_base:
+   .8byte  0x0
+   .size opal_base, . - opal_base
+
+   .balign 8
+   .globl opal_entry
+opal_entry:
+   .8byte  0x0
+   .size opal_entry, . - opal_entry
+
.data
.balign 8
 .globl purgatory_sha256_digest
@@ -122,3 +147,10 @@ purgatory_sha256_digest:
 purgatory_sha_regions:
 

[PATCH v4 10/12] ppc64/kexec_file: prepare elfcore header for crashing kernel

2020-07-20 Thread Hari Bathini
Prepare elf headers for the crashing kernel's core file using
crash_prepare_elf64_headers() and pass on this info to kdump
kernel by updating its command line with elfcorehdr parameter.
Also, add elfcorehdr location to reserve map to prevent it from
being stomped on while booting.

Signed-off-by: Hari Bathini 
Tested-by: Pingfan Liu 
---

v3 -> v4:
* Added a FIXME tag to indicate issue in adding opal/rtas regions to
  core image.
* Folded prepare_elf_headers() function into load_elfcorehdr_segment().

v2 -> v3:
* Unchanged. Added Tested-by tag from Pingfan.

v1 -> v2:
* Tried merging adjacent memory ranges on hitting maximum ranges limit
  to reduce reallocations for memory ranges and also, minimize PT_LOAD
  segments for elfcore.
* Updated add_rtas_mem_range() & add_opal_mem_range() callsites based on
  the new prototype for these functions.
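
A hedged userspace sketch of prepending the elfcorehdr parameter with a
bounds check. It is not the patch's setup_kdump_cmdline(): it uses
malloc() and snprintf() instead of kzalloc() and sprintf(), takes a
NUL-terminated command line, and TOY_COMMAND_LINE_SIZE and
prepend_elfcorehdr() are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical limit standing in for COMMAND_LINE_SIZE. */
#define TOY_COMMAND_LINE_SIZE 2048

/*
 * Build "elfcorehdr=0x<addr> <cmdline>" in a new buffer.
 * Returns NULL if allocation fails or the result would not fit;
 * the caller frees the returned buffer.
 */
static char *prepend_elfcorehdr(unsigned long addr, const char *cmdline)
{
	char *out;
	int n;

	assert(cmdline != NULL);
	out = malloc(TOY_COMMAND_LINE_SIZE);
	if (!out)
		return NULL;

	/* snprintf() reports the length it needed, so truncation is visible. */
	n = snprintf(out, TOY_COMMAND_LINE_SIZE, "elfcorehdr=0x%lx %s",
		     addr, cmdline);
	if (n < 0 || n >= TOY_COMMAND_LINE_SIZE) {
		free(out);
		return NULL;
	}
	return out;
}
```

Checking the formatted length before accepting the result avoids handing a
silently truncated command line to the kdump kernel.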


 arch/powerpc/include/asm/kexec.h  |6 +
 arch/powerpc/kexec/elf_64.c   |   12 +++
 arch/powerpc/kexec/file_load.c|   49 +++
 arch/powerpc/kexec/file_load_64.c |  165 +
 4 files changed, 232 insertions(+)

diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index c069f76..6f6317f 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -112,12 +112,18 @@ struct kimage_arch {
unsigned long backup_start;
void *backup_buf;
 
+   unsigned long elfcorehdr_addr;
+   unsigned long elf_headers_sz;
+   void *elf_headers;
+
 #ifdef CONFIG_IMA_KEXEC
phys_addr_t ima_buffer_addr;
size_t ima_buffer_size;
 #endif
 };
 
+char *setup_kdump_cmdline(struct kimage *image, char *cmdline,
+ unsigned long cmdline_len);
 int setup_purgatory(struct kimage *image, const void *slave_code,
const void *fdt, unsigned long kernel_load_addr,
unsigned long fdt_load_addr);
diff --git a/arch/powerpc/kexec/elf_64.c b/arch/powerpc/kexec/elf_64.c
index 0ecd88f..be38f72 100644
--- a/arch/powerpc/kexec/elf_64.c
+++ b/arch/powerpc/kexec/elf_64.c
@@ -35,6 +35,7 @@ static void *elf64_load(struct kimage *image, char *kernel_buf,
void *fdt;
const void *slave_code;
struct elfhdr ehdr;
+   char *modified_cmdline = NULL;
struct kexec_elf_info elf_info;
struct kexec_buf kbuf = { .image = image, .buf_min = 0,
  .buf_max = ppc64_rma_size };
@@ -75,6 +76,16 @@ static void *elf64_load(struct kimage *image, char *kernel_buf,
pr_err("Failed to load kdump kernel segments\n");
goto out;
}
+
+   /* Setup cmdline for kdump kernel case */
+   modified_cmdline = setup_kdump_cmdline(image, cmdline,
+  cmdline_len);
+   if (!modified_cmdline) {
+   pr_err("Setting up cmdline for kdump kernel failed\n");
+   ret = -EINVAL;
+   goto out;
+   }
+   cmdline = modified_cmdline;
}
 
if (initrd != NULL) {
@@ -131,6 +142,7 @@ static void *elf64_load(struct kimage *image, char *kernel_buf,
pr_err("Error setting up the purgatory.\n");
 
 out:
+   kfree(modified_cmdline);
kexec_free_elf_info(&elf_info);
 
/* Make kimage_file_post_load_cleanup free the fdt buffer for us. */
diff --git a/arch/powerpc/kexec/file_load.c b/arch/powerpc/kexec/file_load.c
index 38439ab..d52c097 100644
--- a/arch/powerpc/kexec/file_load.c
+++ b/arch/powerpc/kexec/file_load.c
@@ -18,11 +18,46 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #define SLAVE_CODE_SIZE256 /* First 0x100 bytes */
 
 /**
+ * setup_kdump_cmdline - Prepend "elfcorehdr= " to command line
+ *   of kdump kernel for exporting the core.
+ * @image:   Kexec image
+ * @cmdline: Command line parameters to update.
+ * @cmdline_len: Length of the cmdline parameters.
+ *
+ * kdump segment must be setup before calling this function.
+ *
+ * Returns new cmdline buffer for kdump kernel on success, NULL otherwise.
+ */
+char *setup_kdump_cmdline(struct kimage *image, char *cmdline,
+ unsigned long cmdline_len)
+{
+   int elfcorehdr_strlen;
+   char *cmdline_ptr;
+
+   cmdline_ptr = kzalloc(COMMAND_LINE_SIZE, GFP_KERNEL);
+   if (!cmdline_ptr)
+   return NULL;
+
+   elfcorehdr_strlen = sprintf(cmdline_ptr, "elfcorehdr=0x%lx ",
+   image->arch.elfcorehdr_addr);
+
+   if (elfcorehdr_strlen + cmdline_len > COMMAND_LINE_SIZE) {
+   pr_err("Appending elfcorehdr= exceeds cmdline size\n");
+   kfree(cmdline_ptr);
+   return NULL;
+   }
+
+   memcpy(cmdline_ptr + elfcorehdr_strlen, cmdline, cmdline_len);
+   return c

[PATCH v4 09/12] ppc64/kexec_file: setup backup region for kdump kernel

2020-07-20 Thread Hari Bathini
Though the kdump kernel boots from its loaded address, its first 64K
bytes are copied down to real address 0. So, set up a backup region to
copy the first 64K bytes of the crashed kernel, in purgatory, before
booting into the kdump kernel. Also, update the reserve map with the
backup region and the crashed kernel's memory to prevent the kdump
kernel from accidentally using that memory.

Reported-by: kernel test robot 
[lkp: In v1, purgatory() declaration was missing]
Signed-off-by: Hari Bathini 
---

v3 -> v4:
* Moved fdt_add_mem_rsv() for backup region under kdump flag, on Thiago's
  suggestion, as it is only relevant for kdump.

v2 -> v3:
* Dropped check for backup_start in trampoline_64.S as purgatory() takes
  care of it anyway.

v1 -> v2:
* Check if backup region is available before branching out. This is
  to keep `kexec -l -s` flow as before as much as possible. This would
  eventually change with more testing and addition of sha256 digest
  verification support.
* Fixed missing prototype for purgatory() as reported by lkp.
  lkp report for reference:
- https://lore.kernel.org/patchwork/patch/1264423/


 arch/powerpc/include/asm/crashdump-ppc64.h |   10 +++
 arch/powerpc/include/asm/kexec.h   |7 ++
 arch/powerpc/include/asm/purgatory.h   |   11 +++
 arch/powerpc/kexec/elf_64.c|9 +++
 arch/powerpc/kexec/file_load_64.c  |   95 +++-
 arch/powerpc/purgatory/Makefile|   28 
 arch/powerpc/purgatory/purgatory_64.c  |   36 +++
 arch/powerpc/purgatory/trampoline_64.S |   24 ++-
 8 files changed, 210 insertions(+), 10 deletions(-)
 create mode 100644 arch/powerpc/include/asm/crashdump-ppc64.h
 create mode 100644 arch/powerpc/include/asm/purgatory.h
 create mode 100644 arch/powerpc/purgatory/purgatory_64.c

diff --git a/arch/powerpc/include/asm/crashdump-ppc64.h b/arch/powerpc/include/asm/crashdump-ppc64.h
new file mode 100644
index 000..7ba99ae
--- /dev/null
+++ b/arch/powerpc/include/asm/crashdump-ppc64.h
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef _ASM_POWERPC_CRASHDUMP_PPC64_H
+#define _ASM_POWERPC_CRASHDUMP_PPC64_H
+
+/* Backup region - first 64K bytes of System RAM. */
+#define BACKUP_SRC_START   0
+#define BACKUP_SRC_END 0x
+#define BACKUP_SRC_SIZE(BACKUP_SRC_END - BACKUP_SRC_START + 1)
+
+#endif /* __ASM_POWERPC_CRASHDUMP_PPC64_H */
diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index 00988da..c069f76 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -109,6 +109,9 @@ extern const struct kexec_file_ops kexec_elf64_ops;
 struct kimage_arch {
struct crash_mem *exclude_ranges;
 
+   unsigned long backup_start;
+   void *backup_buf;
+
 #ifdef CONFIG_IMA_KEXEC
phys_addr_t ima_buffer_addr;
size_t ima_buffer_size;
@@ -124,6 +127,10 @@ int setup_new_fdt(const struct kimage *image, void *fdt,
 int delete_fdt_mem_rsv(void *fdt, unsigned long start, unsigned long size);
 
 #ifdef CONFIG_PPC64
+struct kexec_buf;
+
+int load_crashdump_segments_ppc64(struct kimage *image,
+ struct kexec_buf *kbuf);
 int setup_purgatory_ppc64(struct kimage *image, const void *slave_code,
  const void *fdt, unsigned long kernel_load_addr,
  unsigned long fdt_load_addr);
diff --git a/arch/powerpc/include/asm/purgatory.h 
b/arch/powerpc/include/asm/purgatory.h
new file mode 100644
index 000..076d150
--- /dev/null
+++ b/arch/powerpc/include/asm/purgatory.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef _ASM_POWERPC_PURGATORY_H
+#define _ASM_POWERPC_PURGATORY_H
+
+#ifndef __ASSEMBLY__
+#include 
+
+void purgatory(void);
+#endif /* __ASSEMBLY__ */
+
+#endif /* _ASM_POWERPC_PURGATORY_H */
diff --git a/arch/powerpc/kexec/elf_64.c b/arch/powerpc/kexec/elf_64.c
index 64c15a5..0ecd88f 100644
--- a/arch/powerpc/kexec/elf_64.c
+++ b/arch/powerpc/kexec/elf_64.c
@@ -68,6 +68,15 @@ static void *elf64_load(struct kimage *image, char 
*kernel_buf,
 
pr_debug("Loaded purgatory at 0x%lx\n", pbuf.mem);
 
+   /* Setup additional segments needed for panic kernel */
+   if (image->type == KEXEC_TYPE_CRASH) {
+   ret = load_crashdump_segments_ppc64(image, &kbuf);
+   if (ret) {
+   pr_err("Failed to load kdump kernel segments\n");
+   goto out;
+   }
+   }
+
if (initrd != NULL) {
kbuf.buffer = initrd;
kbuf.bufsz = kbuf.memsz = initrd_len;
diff --git a/arch/powerpc/kexec/file_load_64.c 
b/arch/powerpc/kexec/file_load_64.c
index 7f1f31c..41d748c 100644
--- a/arch/powerpc/kexec/file_load_64.c
+++ b/arch/powerpc/kexec/file_load_64.c
@@ -20,9 +20,11 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
+#include 
 
 struct umem_info {
uint64_t *buf; /* 

[PATCH v4 07/12] ppc64/kexec_file: add support to relocate purgatory

2020-07-20 Thread Hari Bathini
Right now, the purgatory implementation is only minimal. But if purgatory
code is to be enhanced to copy memory to the backup region and verify
sha256 digest, relocations may have to be applied to the purgatory.
So, add support to relocate purgatory in kexec_file_load system call
by setting up TOC pointer and applying RELA relocations as needed.
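
As a rough sketch of the two pieces mentioned above (not the code in this
patch): the TOC pointer is taken 0x8000 bytes into the .toc section, as
get_toc_ptr() does, and a PC-relative relocation stores the symbol value
plus addend minus the address being patched. The sketch_* names are
hypothetical, and the range check on the 32-bit case is an assumption
about how overflow would reasonably be handled.

```c
#include <stdint.h>

/* r2 points 0x8000 bytes into the TOC section. */
static uint64_t sketch_toc_ptr(uint64_t toc_sh_addr)
{
    return toc_sh_addr + 0x8000;
}

/*
 * PC-relative relocation value: symbol value plus addend, minus the
 * address of the location being patched (S + A - P).
 */
static int64_t sketch_rel_value(uint64_t sym_value, int64_t addend,
                                uint64_t place)
{
    return (int64_t)(sym_value + (uint64_t)addend - place);
}

/* Returns 0 and stores the 32-bit result, or -1 if it does not fit. */
static int sketch_apply_rel32(uint32_t *loc, uint64_t sym_value,
                              int64_t addend, uint64_t place)
{
    int64_t val = sketch_rel_value(sym_value, addend, place);

    if (val < INT32_MIN || val > INT32_MAX)
        return -1;

    *loc = (uint32_t)(int32_t)val;
    return 0;
}
```

The 64-bit variant would store the value from sketch_rel_value() directly,
with no range check needed.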

Reported-by: kernel test robot 
[lkp: In v1, 'struct mem_sym' was declared in parameter list]
Signed-off-by: Hari Bathini 
---

* Michael, can you share your opinion on the below:
- https://lore.kernel.org/patchwork/patch/1272027/
- My intention in cover note.

v3 -> v4:
* Updated error log message in get_toc_section() function.

v2 -> v3:
* Fixed get_toc_section() to return the section info that had relocations
  applied, to calculate the correct toc pointer.
* Fixed how relocation value is converted to relative while applying
  R_PPC64_REL64 & R_PPC64_REL32 relocations.

v1 -> v2:
* Fixed wrong use of 'struct mem_sym' in local_entry_offset() as
  reported by lkp. lkp report for reference:
- https://lore.kernel.org/patchwork/patch/1264421/


 arch/powerpc/kexec/file_load_64.c  |  337 
 arch/powerpc/purgatory/trampoline_64.S |7 +
 2 files changed, 344 insertions(+)

diff --git a/arch/powerpc/kexec/file_load_64.c 
b/arch/powerpc/kexec/file_load_64.c
index 71c1ba7..20e638d 100644
--- a/arch/powerpc/kexec/file_load_64.c
+++ b/arch/powerpc/kexec/file_load_64.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -692,6 +693,244 @@ static int update_usable_mem_fdt(void *fdt, struct 
crash_mem *usable_mem)
 }
 
 /**
+ * get_toc_section - Look for ".toc" symbol and return the corresponding 
section
+ *   in the purgatory.
+ * @pi:  Purgatory Info.
+ *
+ * Returns TOC section on success, NULL otherwise.
+ */
+static const Elf_Shdr *get_toc_section(const struct purgatory_info *pi)
+{
+   const Elf_Shdr *sechdrs;
+   const char *secstrings;
+   int i;
+
+   if (!pi->ehdr) {
+   pr_err("Purgatory's elf info not found!\n");
+   return NULL;
+   }
+
+   sechdrs = (void *)pi->ehdr + pi->ehdr->e_shoff;
+   secstrings = (void *)pi->ehdr + sechdrs[pi->ehdr->e_shstrndx].sh_offset;
+
+   for (i = 0; i < pi->ehdr->e_shnum; i++) {
+   if ((sechdrs[i].sh_size != 0) &&
+   (strcmp(secstrings + sechdrs[i].sh_name, ".toc") == 0)) {
+   /* Return the relocated ".toc" section */
+   return &(pi->sechdrs[i]);
+   }
+   }
+
+   return NULL;
+}
+
+/**
+ * get_toc_ptr - Get the TOC pointer (r2) of purgatory.
+ * @pi:  Purgatory Info.
+ *
+ * Returns r2 on success, 0 otherwise.
+ */
+static unsigned long get_toc_ptr(const struct purgatory_info *pi)
+{
+   unsigned long toc_ptr = 0;
+   const Elf_Shdr *sechdr;
+
+   sechdr = get_toc_section(pi);
+   if (!sechdr)
+   pr_err("Could not get the TOC section!\n");
+   else
+   toc_ptr = sechdr->sh_addr + 0x8000; /* 0x8000 into TOC */
+
+   pr_debug("TOC pointer (r2) is 0x%lx\n", toc_ptr);
+   return toc_ptr;
+}
+
+/* Helper functions to apply relocations */
+static int do_relative_toc(unsigned long val, uint16_t *loc,
+  unsigned long mask, int complain_signed)
+{
+   if (complain_signed && (val + 0x8000 > 0x)) {
+   pr_err("TOC16 relocation overflows (%lu)\n", val);
+   return -ENOEXEC;
+   }
+
+   if ((~mask & 0x) & val) {
+   pr_err("Bad TOC16 relocation (%lu)\n", val);
+   return -ENOEXEC;
+   }
+
+   *loc = (*loc & ~mask) | (val & mask);
+   return 0;
+}
+#ifdef PPC64_ELF_ABI_v2
+/* PowerPC64 specific values for the Elf64_Sym st_other field.  */
+#define STO_PPC64_LOCAL_BIT5
+#define STO_PPC64_LOCAL_MASK   (7 << STO_PPC64_LOCAL_BIT)
+#define PPC64_LOCAL_ENTRY_OFFSET(other)
\
+   (((1 << (((other) & STO_PPC64_LOCAL_MASK) >> STO_PPC64_LOCAL_BIT)) \
+>> 2) << 2)
+
+static unsigned int local_entry_offset(const Elf64_Sym *sym)
+{
+   /* If this symbol has a local entry point, use it. */
+   return PPC64_LOCAL_ENTRY_OFFSET(sym->st_other);
+}
+#else
+static unsigned int local_entry_offset(const Elf64_Sym *sym)
+{
+   return 0;
+}
+#endif
+
+/**
+ * __kexec_do_relocs - Apply relocations based on relocation type.
+ * @my_r2: TOC pointer.
+ * @sym:   Symbol to relocate.
+ * @r_type:Relocation type.
+ * @loc:   Location to modify.
+ * @val:   Relocated symbol value.
+ * @addr:  Final location after relocation.
+ *
+ * Returns 0 on success, negative errno on error.
+ */
+static int __kexec_do_relocs(unsigned long my_r2, const Elf_Sym *sym,
+int r_type, void *loc, unsigned long v

[PATCH v4 04/12] ppc64/kexec_file: avoid stomping memory used by special regions

2020-07-20 Thread Hari Bathini
The crashkernel region could overlap with special memory regions such as
opal, rtas and tce-table. These regions are referred to as exclude memory
ranges. Set up these ranges during image probe in order to avoid them
while finding the buffer for different kdump segments. Override
arch_kexec_locate_mem_hole() to locate a memory hole taking these ranges
into account.
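
The idea can be sketched as below. This is a simplified user-space model,
not the helper added by this patch: it assumes the exclude ranges are
sorted and non-overlapping, ignores alignment and holes in system memory,
and uses hypothetical names (struct sketch_excl,
sketch_locate_hole_top_down()).

```c
#include <stdint.h>

struct sketch_excl {
    uint64_t start;
    uint64_t end;   /* inclusive */
};

/*
 * Look top-down in [buf_min, buf_max] for 'size' bytes that overlap none
 * of the exclude ranges. Returns 0 and sets *mem on success, -1 if no
 * hole is large enough.
 */
static int sketch_locate_hole_top_down(const struct sketch_excl *excl, int nr,
                                       uint64_t buf_min, uint64_t buf_max,
                                       uint64_t size, uint64_t *mem)
{
    uint64_t hi = buf_max;  /* highest address still available */
    int i;

    if (size == 0 || buf_max < buf_min)
        return -1;

    for (i = nr - 1; i >= 0; i--) {
        if (excl[i].start > hi)
            continue;       /* range lies wholly above the window */
        if (excl[i].end < buf_min)
            break;          /* remaining ranges are below the window */

        /* Is the gap between this exclude range and 'hi' big enough? */
        if (excl[i].end < hi && hi - excl[i].end >= size) {
            *mem = hi - size + 1;
            return 0;
        }

        if (excl[i].start <= buf_min)
            return -1;      /* excluded all the way down to buf_min */
        hi = excl[i].start - 1;
    }

    /* Whatever is left between buf_min and 'hi' is free. */
    if (hi - buf_min >= size - 1) {
        *mem = hi - size + 1;
        return 0;
    }
    return -1;
}
```

A bottom-up lookup would walk the same list in the other direction.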

Signed-off-by: Hari Bathini 
---

v3 -> v4:
* Dropped KDUMP_BUF_MIN & KDUMP_BUF_MAX macros and fixed off-by-one error
  in arch_locate_mem_hole() helper routines.

v2 -> v3:
* If there are no exclude ranges, the right thing to do is to fall back
  to the default kexec_locate_mem_hole() implementation instead of
  returning 0. Fixed that.

v1 -> v2:
* Did arch_kexec_locate_mem_hole() override to handle special regions.
* Ensured holes in the memory are accounted for while locating mem hole.
* Updated add_rtas_mem_range() & add_opal_mem_range() callsites based on
  the new prototype for these functions.


 arch/powerpc/include/asm/kexec.h  |7 +
 arch/powerpc/kexec/elf_64.c   |8 +
 arch/powerpc/kexec/file_load_64.c |  337 +
 3 files changed, 348 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index ac8fd48..835dc92 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -100,14 +100,16 @@ void relocate_new_kernel(unsigned long indirection_page, 
unsigned long reboot_co
 #ifdef CONFIG_KEXEC_FILE
 extern const struct kexec_file_ops kexec_elf64_ops;
 
-#ifdef CONFIG_IMA_KEXEC
 #define ARCH_HAS_KIMAGE_ARCH
 
 struct kimage_arch {
+   struct crash_mem *exclude_ranges;
+
+#ifdef CONFIG_IMA_KEXEC
phys_addr_t ima_buffer_addr;
size_t ima_buffer_size;
-};
 #endif
+};
 
 int setup_purgatory(struct kimage *image, const void *slave_code,
const void *fdt, unsigned long kernel_load_addr,
@@ -125,6 +127,7 @@ int setup_new_fdt_ppc64(const struct kimage *image, void 
*fdt,
unsigned long initrd_load_addr,
unsigned long initrd_len, const char *cmdline);
 #endif /* CONFIG_PPC64 */
+
 #endif /* CONFIG_KEXEC_FILE */
 
 #else /* !CONFIG_KEXEC_CORE */
diff --git a/arch/powerpc/kexec/elf_64.c b/arch/powerpc/kexec/elf_64.c
index 23ad04c..64c15a5 100644
--- a/arch/powerpc/kexec/elf_64.c
+++ b/arch/powerpc/kexec/elf_64.c
@@ -46,6 +46,14 @@ static void *elf64_load(struct kimage *image, char 
*kernel_buf,
if (ret)
goto out;
 
+   if (image->type == KEXEC_TYPE_CRASH) {
+   /* min & max buffer values for kdump case */
+   kbuf.buf_min = pbuf.buf_min = crashk_res.start;
+   kbuf.buf_max = pbuf.buf_max =
+   ((crashk_res.end < ppc64_rma_size) ?
+crashk_res.end : (ppc64_rma_size - 1));
+   }
+
ret = kexec_elf_load(image, &ehdr, &elf_info, &kbuf, &kernel_load_addr);
if (ret)
goto out;
diff --git a/arch/powerpc/kexec/file_load_64.c 
b/arch/powerpc/kexec/file_load_64.c
index 41fe8b6..2df6f42 100644
--- a/arch/powerpc/kexec/file_load_64.c
+++ b/arch/powerpc/kexec/file_load_64.c
@@ -17,6 +17,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 const struct kexec_file_ops * const kexec_file_loaders[] = {
&kexec_elf64_ops,
@@ -24,6 +26,254 @@ const struct kexec_file_ops * const kexec_file_loaders[] = {
 };
 
 /**
+ * get_exclude_memory_ranges - Get exclude memory ranges. This list includes
+ * regions like opal/rtas, tce-table, initrd,
+ * kernel, htab which should be avoided while
+ * setting up kexec load segments.
+ * @mem_ranges:Range list to add the memory ranges to.
+ *
+ * Returns 0 on success, negative errno on error.
+ */
+static int get_exclude_memory_ranges(struct crash_mem **mem_ranges)
+{
+   int ret;
+
+   ret = add_tce_mem_ranges(mem_ranges);
+   if (ret)
+   goto out;
+
+   ret = add_initrd_mem_range(mem_ranges);
+   if (ret)
+   goto out;
+
+   ret = add_htab_mem_range(mem_ranges);
+   if (ret)
+   goto out;
+
+   ret = add_kernel_mem_range(mem_ranges);
+   if (ret)
+   goto out;
+
+   ret = add_rtas_mem_range(mem_ranges);
+   if (ret)
+   goto out;
+
+   ret = add_opal_mem_range(mem_ranges);
+   if (ret)
+   goto out;
+
+   ret = add_reserved_ranges(mem_ranges);
+   if (ret)
+   goto out;
+
+   /* exclude memory ranges should be sorted for easy lookup */
+   sort_memory_ranges(*mem_ranges, true);
+out:
+   if (ret)
+   pr_err("Failed to setup exclude memory ranges\n");
+   return ret;
+}
+
+/**
+ * __locate_mem_hole_top_down - Looks top down for a large enough memory hole
+ *

[PATCH v4 06/12] ppc64/kexec_file: restrict memory usage of kdump kernel

2020-07-20 Thread Hari Bathini
Kdump kernel, used for capturing the kernel core image, is supposed
to use only specific memory regions to avoid corrupting the image to
be captured. The regions are crashkernel range - the memory reserved
explicitly for kdump kernel, memory used for the tce-table, the OPAL
region and RTAS region as applicable. Restrict kdump kernel memory
to use only these regions by setting up usable-memory DT property.
Also, tell the kdump kernel to run at the loaded address by setting
the magic word at 0x5c.
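
To illustrate the restriction, the sketch below intersects one memory
range (for example an LMB) with the list of usable ranges and emits
(base, size) tuples, which is the kind of data a usable-memory property
carries. It is a user-space model with hypothetical names
(struct sketch_umem, sketch_add_usable()), not the add_usable_mem() code
in this patch.

```c
#include <stdint.h>

struct sketch_umem {
    uint64_t start;
    uint64_t end;   /* inclusive */
};

/*
 * Write a (base, size) tuple into 'buf' for every part of [base, end]
 * that falls inside a usable range. Returns the number of tuples
 * written, or -1 if 'buf' cannot hold them all.
 */
static int sketch_add_usable(const struct sketch_umem *um, int nr,
                             uint64_t base, uint64_t end,
                             uint64_t *buf, int max_tuples)
{
    int i, cnt = 0;

    for (i = 0; i < nr; i++) {
        uint64_t lo = um[i].start > base ? um[i].start : base;
        uint64_t hi = um[i].end < end ? um[i].end : end;

        if (lo > hi)
            continue;       /* no overlap with this usable range */
        if (cnt == max_tuples)
            return -1;

        buf[2 * cnt] = lo;
        buf[2 * cnt + 1] = hi - lo + 1;
        cnt++;
    }
    return cnt;
}
```

A range with no overlap yields zero tuples, so the kdump kernel would not
touch that memory at all.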

Signed-off-by: Hari Bathini 
Tested-by: Pingfan Liu 
---

v3 -> v4:
* Updated get_node_path() to be an iterative function instead of a
  recursive one.
* Added comment explaining why low memory is added to kdump kernel's
  usable memory ranges though it doesn't fall in crashkernel region.
* For correctness, added fdt_add_mem_rsv() for the low memory being
  added to kdump kernel's usable memory ranges.
* Fixed prop pointer update in add_usable_mem_property() and changed
  duple to tuple as suggested by Thiago.

v2 -> v3:
* Unchanged. Added Tested-by tag from Pingfan.

v1 -> v2:
* Fixed off-by-one error while setting up usable-memory properties.
* Updated add_rtas_mem_range() & add_opal_mem_range() callsites based on
  the new prototype for these functions.


 arch/powerpc/kexec/file_load_64.c |  472 +
 1 file changed, 471 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kexec/file_load_64.c 
b/arch/powerpc/kexec/file_load_64.c
index 2df6f42..71c1ba7 100644
--- a/arch/powerpc/kexec/file_load_64.c
+++ b/arch/powerpc/kexec/file_load_64.c
@@ -17,9 +17,21 @@
 #include 
 #include 
 #include 
+#include 
 #include 
+#include 
+#include 
 #include 
 
+struct umem_info {
+   uint64_t *buf; /* data buffer for usable-memory property */
+   uint32_t idx;  /* current index */
+   uint32_t size; /* size allocated for the data buffer */
+
+   /* usable memory ranges to look up */
+   const struct crash_mem *umrngs;
+};
+
 const struct kexec_file_ops * const kexec_file_loaders[] = {
&kexec_elf64_ops,
NULL
@@ -75,6 +87,42 @@ static int get_exclude_memory_ranges(struct crash_mem 
**mem_ranges)
 }
 
 /**
+ * get_usable_memory_ranges - Get usable memory ranges. This list includes
+ *regions like crashkernel, opal/rtas & tce-table,
+ *that kdump kernel could use.
+ * @mem_ranges:   Range list to add the memory ranges to.
+ *
+ * Returns 0 on success, negative errno on error.
+ */
+static int get_usable_memory_ranges(struct crash_mem **mem_ranges)
+{
+   int ret;
+
+   /*
+* prom code doesn't take kindly to missing low memory. So, add
+* [0, crashk_res.end] instead of [crashk_res.start, crashk_res.end]
+* to keep it happy.
+*/
+   ret = add_mem_range(mem_ranges, 0, crashk_res.end + 1);
+   if (ret)
+   goto out;
+
+   ret = add_rtas_mem_range(mem_ranges);
+   if (ret)
+   goto out;
+
+   ret = add_opal_mem_range(mem_ranges);
+   if (ret)
+   goto out;
+
+   ret = add_tce_mem_ranges(mem_ranges);
+out:
+   if (ret)
+   pr_err("Failed to setup usable memory ranges\n");
+   return ret;
+}
+
+/**
  * __locate_mem_hole_top_down - Looks top down for a large enough memory hole
  *  in the memory regions between buf_min & buf_max
  *  for the buffer. If found, sets kbuf->mem.
@@ -274,6 +322,376 @@ static int locate_mem_hole_bottom_up_ppc64(struct 
kexec_buf *kbuf,
 }
 
 /**
+ * check_realloc_usable_mem - Reallocate buffer if it can't accommodate entries
+ * @um_info:  Usable memory buffer and ranges info.
+ * @cnt:  No. of entries to accommodate.
+ *
+ * Frees up the old buffer if memory reallocation fails.
+ *
+ * Returns buffer on success, NULL on error.
+ */
+static uint64_t *check_realloc_usable_mem(struct umem_info *um_info, int cnt)
+{
+   void *tbuf;
+
+   if (um_info->size >=
+   ((um_info->idx + cnt) * sizeof(*(um_info->buf))))
+   return um_info->buf;
+
+   um_info->size += MEM_RANGE_CHUNK_SZ;
+   tbuf = krealloc(um_info->buf, um_info->size, GFP_KERNEL);
+   if (!tbuf) {
+   um_info->size -= MEM_RANGE_CHUNK_SZ;
+   return NULL;
+   }
+
+   memset(tbuf + um_info->idx, 0, MEM_RANGE_CHUNK_SZ);
+   return tbuf;
+}
+
+/**
+ * add_usable_mem - Add the usable memory ranges within the given memory range
+ *  to the buffer
+ * @um_info:Usable memory buffer and ranges info.
+ * @base:   Base address of memory range to look for.
+ * @end:End address of memory range to look for.
+ * @cnt:No. of usable memory ranges added to buffer.
+ *
+ * Returns 0 on success, negative errno on error.
+ */
+static int add_usable_mem(struct umem_info *um_info, uint64_t base,
+  

[PATCH v4 02/12] powerpc/kexec_file: mark PPC64 specific code

2020-07-20 Thread Hari Bathini
Some of the kexec_file_load code isn't PPC64 specific. Move PPC64
specific code from kexec/file_load.c to kexec/file_load_64.c. Also,
rename purgatory/trampoline.S to purgatory/trampoline_64.S in the
same spirit. No functional changes.

Signed-off-by: Hari Bathini 
Tested-by: Pingfan Liu 
Reviewed-by: Laurent Dufour 
Reviewed-by: Thiago Jung Bauermann 
---

v3 -> v4:
* Moved common code back to setup_new_fdt() from setup_new_fdt_ppc64()
  function. Added Reviewed-by tags from Laurent & Thiago.

v2 -> v3:
* Unchanged. Added Tested-by tag from Pingfan.

v1 -> v2:
* No changes.


 arch/powerpc/include/asm/kexec.h   |9 ++
 arch/powerpc/kexec/Makefile|2 -
 arch/powerpc/kexec/elf_64.c|7 +-
 arch/powerpc/kexec/file_load.c |   19 +
 arch/powerpc/kexec/file_load_64.c  |   87 
 arch/powerpc/purgatory/Makefile|4 +
 arch/powerpc/purgatory/trampoline.S|  117 
 arch/powerpc/purgatory/trampoline_64.S |  117 
 8 files changed, 222 insertions(+), 140 deletions(-)
 create mode 100644 arch/powerpc/kexec/file_load_64.c
 delete mode 100644 arch/powerpc/purgatory/trampoline.S
 create mode 100644 arch/powerpc/purgatory/trampoline_64.S

diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index c684768..ac8fd48 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -116,6 +116,15 @@ int setup_new_fdt(const struct kimage *image, void *fdt,
  unsigned long initrd_load_addr, unsigned long initrd_len,
  const char *cmdline);
 int delete_fdt_mem_rsv(void *fdt, unsigned long start, unsigned long size);
+
+#ifdef CONFIG_PPC64
+int setup_purgatory_ppc64(struct kimage *image, const void *slave_code,
+ const void *fdt, unsigned long kernel_load_addr,
+ unsigned long fdt_load_addr);
+int setup_new_fdt_ppc64(const struct kimage *image, void *fdt,
+   unsigned long initrd_load_addr,
+   unsigned long initrd_len, const char *cmdline);
+#endif /* CONFIG_PPC64 */
 #endif /* CONFIG_KEXEC_FILE */
 
 #else /* !CONFIG_KEXEC_CORE */
diff --git a/arch/powerpc/kexec/Makefile b/arch/powerpc/kexec/Makefile
index 86380c6..67c3553 100644
--- a/arch/powerpc/kexec/Makefile
+++ b/arch/powerpc/kexec/Makefile
@@ -7,7 +7,7 @@ obj-y   += core.o crash.o core_$(BITS).o
 
 obj-$(CONFIG_PPC32)+= relocate_32.o
 
-obj-$(CONFIG_KEXEC_FILE)   += file_load.o elf_$(BITS).o
+obj-$(CONFIG_KEXEC_FILE)   += file_load.o file_load_$(BITS).o elf_$(BITS).o
 
 ifdef CONFIG_HAVE_IMA_KEXEC
 ifdef CONFIG_IMA
diff --git a/arch/powerpc/kexec/elf_64.c b/arch/powerpc/kexec/elf_64.c
index 3072fd6..23ad04c 100644
--- a/arch/powerpc/kexec/elf_64.c
+++ b/arch/powerpc/kexec/elf_64.c
@@ -88,7 +88,8 @@ static void *elf64_load(struct kimage *image, char 
*kernel_buf,
goto out;
}
 
-   ret = setup_new_fdt(image, fdt, initrd_load_addr, initrd_len, cmdline);
+   ret = setup_new_fdt_ppc64(image, fdt, initrd_load_addr,
+ initrd_len, cmdline);
if (ret)
goto out;
 
@@ -107,8 +108,8 @@ static void *elf64_load(struct kimage *image, char 
*kernel_buf,
pr_debug("Loaded device tree at 0x%lx\n", fdt_load_addr);
 
slave_code = elf_info.buffer + elf_info.proghdrs[0].p_offset;
-   ret = setup_purgatory(image, slave_code, fdt, kernel_load_addr,
- fdt_load_addr);
+   ret = setup_purgatory_ppc64(image, slave_code, fdt, kernel_load_addr,
+   fdt_load_addr);
if (ret)
pr_err("Error setting up the purgatory.\n");
 
diff --git a/arch/powerpc/kexec/file_load.c b/arch/powerpc/kexec/file_load.c
index 143c917..38439ab 100644
--- a/arch/powerpc/kexec/file_load.c
+++ b/arch/powerpc/kexec/file_load.c
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0-only
 /*
- * ppc64 code to implement the kexec_file_load syscall
+ * powerpc code to implement the kexec_file_load syscall
  *
  * Copyright (C) 2004  Adam Litke (a...@us.ibm.com)
  * Copyright (C) 2004  IBM Corp.
@@ -20,22 +20,7 @@
 #include 
 #include 
 
-#define SLAVE_CODE_SIZE256
-
-const struct kexec_file_ops * const kexec_file_loaders[] = {
-   &kexec_elf64_ops,
-   NULL
-};
-
-int arch_kexec_kernel_image_probe(struct kimage *image, void *buf,
- unsigned long buf_len)
-{
-   /* We don't support crash kernels yet. */
-   if (image->type == KEXEC_TYPE_CRASH)
-   return -EOPNOTSUPP;
-
-   return kexec_image_probe_default(image, buf, buf_len);
-}
+#define SLAVE_CODE_SIZE256 /* First 0x100 bytes */
 
 /**
  * setup_purgatory - initialize the purgatory's global variables
diff --git a/arch/powerpc/kexec/file_load_

[PATCH v4 05/12] powerpc/drmem: make lmb walk a bit more flexible

2020-07-20 Thread Hari Bathini
Currently, numa & prom are the users of drmem lmb walk code. Loading
kdump with kexec_file also needs to walk the drmem LMBs to set up the
usable memory ranges for the kdump kernel. But there are a couple of
issues in using the code as is. One, walk_drmem_lmbs() code is built into
the .init section currently, while kexec_file needs it later. Two, there
is no scope to pass data to the callback function for processing and/or
erroring out on certain conditions.

Fix that by moving the drmem LMB walk code out of the .init section, adding
scope to pass data to the callback function and bailing out when
an error is encountered in the callback function.
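
The shape of the new interface can be sketched in plain C as below. This
is a stand-alone model with hypothetical names (struct sketch_lmb,
sketch_walk_lmbs(), sketch_count_cb()), not the drmem code itself: the
walker takes an opaque data pointer, hands it to the callback, and stops
at the first non-zero return value.

```c
#include <stdint.h>

struct sketch_lmb {
    uint64_t base_addr;
    uint32_t flags;
};

/* Call 'func' on each LMB; stop and return its value on the first error. */
static int sketch_walk_lmbs(const struct sketch_lmb *lmbs, int nr, void *data,
                            int (*func)(const struct sketch_lmb *, void *))
{
    int i, ret = 0;

    for (i = 0; i < nr; i++) {
        ret = func(&lmbs[i], data);
        if (ret)
            break;
    }
    return ret;
}

/* Example callback: count LMBs, fail on a base that is not 4K aligned. */
static int sketch_count_cb(const struct sketch_lmb *lmb, void *data)
{
    int *count = data;

    if (lmb->base_addr & 0xfff)
        return -1;

    (*count)++;
    return 0;
}
```

Callers that need no context, like the existing numa and prom users, can
simply pass NULL for the data pointer and return 0 from the callback.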

Signed-off-by: Hari Bathini 
Tested-by: Pingfan Liu 
Reviewed-by: Thiago Jung Bauermann 
---

v3 -> v4:
* Unchanged. Added Reviewed-by tag from Thiago.

v2 -> v3:
* Unchanged. Added Tested-by tag from Pingfan.

v1 -> v2:
* No changes.


 arch/powerpc/include/asm/drmem.h |9 ++--
 arch/powerpc/kernel/prom.c   |   13 +++---
 arch/powerpc/mm/drmem.c  |   87 +-
 arch/powerpc/mm/numa.c   |   13 +++---
 4 files changed, 78 insertions(+), 44 deletions(-)

diff --git a/arch/powerpc/include/asm/drmem.h b/arch/powerpc/include/asm/drmem.h
index 414d209..17ccc64 100644
--- a/arch/powerpc/include/asm/drmem.h
+++ b/arch/powerpc/include/asm/drmem.h
@@ -90,13 +90,14 @@ static inline bool drmem_lmb_reserved(struct drmem_lmb *lmb)
 }
 
 u64 drmem_lmb_memory_max(void);
-void __init walk_drmem_lmbs(struct device_node *dn,
-   void (*func)(struct drmem_lmb *, const __be32 **));
+int walk_drmem_lmbs(struct device_node *dn, void *data,
+   int (*func)(struct drmem_lmb *, const __be32 **, void *));
 int drmem_update_dt(void);
 
 #ifdef CONFIG_PPC_PSERIES
-void __init walk_drmem_lmbs_early(unsigned long node,
-   void (*func)(struct drmem_lmb *, const __be32 **));
+int __init
+walk_drmem_lmbs_early(unsigned long node, void *data,
+ int (*func)(struct drmem_lmb *, const __be32 **, void *));
 #endif
 
 static inline void invalidate_lmb_associativity_index(struct drmem_lmb *lmb)
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 9cc49f2..7df78de 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -468,8 +468,9 @@ static bool validate_mem_limit(u64 base, u64 *size)
  * This contains a list of memory blocks along with NUMA affinity
  * information.
  */
-static void __init early_init_drmem_lmb(struct drmem_lmb *lmb,
-   const __be32 **usm)
+static int  __init early_init_drmem_lmb(struct drmem_lmb *lmb,
+   const __be32 **usm,
+   void *data)
 {
u64 base, size;
int is_kexec_kdump = 0, rngs;
@@ -484,7 +485,7 @@ static void __init early_init_drmem_lmb(struct drmem_lmb 
*lmb,
 */
if ((lmb->flags & DRCONF_MEM_RESERVED) ||
!(lmb->flags & DRCONF_MEM_ASSIGNED))
-   return;
+   return 0;
 
if (*usm)
is_kexec_kdump = 1;
@@ -499,7 +500,7 @@ static void __init early_init_drmem_lmb(struct drmem_lmb 
*lmb,
 */
rngs = dt_mem_next_cell(dt_root_size_cells, usm);
if (!rngs) /* there are no (base, size) duple */
-   return;
+   return 0;
}
 
do {
@@ -524,6 +525,8 @@ static void __init early_init_drmem_lmb(struct drmem_lmb 
*lmb,
if (lmb->flags & DRCONF_MEM_HOTREMOVABLE)
memblock_mark_hotplug(base, size);
} while (--rngs);
+
+   return 0;
 }
 #endif /* CONFIG_PPC_PSERIES */
 
@@ -534,7 +537,7 @@ static int __init early_init_dt_scan_memory_ppc(unsigned 
long node,
 #ifdef CONFIG_PPC_PSERIES
if (depth == 1 &&
strcmp(uname, "ibm,dynamic-reconfiguration-memory") == 0) {
-   walk_drmem_lmbs_early(node, early_init_drmem_lmb);
+   walk_drmem_lmbs_early(node, NULL, early_init_drmem_lmb);
return 0;
}
 #endif
diff --git a/arch/powerpc/mm/drmem.c b/arch/powerpc/mm/drmem.c
index 59327ce..b2eeea3 100644
--- a/arch/powerpc/mm/drmem.c
+++ b/arch/powerpc/mm/drmem.c
@@ -14,6 +14,8 @@
 #include 
 #include 
 
+static int n_root_addr_cells, n_root_size_cells;
+
 static struct drmem_lmb_info __drmem_info;
 struct drmem_lmb_info *drmem_info = &__drmem_info;
 
@@ -189,12 +191,13 @@ int drmem_update_dt(void)
return rc;
 }
 
-static void __init read_drconf_v1_cell(struct drmem_lmb *lmb,
+static void read_drconf_v1_cell(struct drmem_lmb *lmb,
   const __be32 **prop)
 {
const __be32 *p = *prop;
 
-   lmb->base_addr = dt_mem_next_cell(dt_root_addr_cells, &p);
+   lmb->base_addr = of_read_number(p, n_root_addr_cells);
+   p += n_root_addr_cells;
lmb->drc_index = of_read_number(p++, 1);
 

[PATCH v4 00/12] ppc64: enable kdump support for kexec_file_load syscall

2020-07-20 Thread Hari Bathini
This patch series enables kdump support for kexec_file_load system
call (kexec -s -p) on PPC64. The changes are inspired from kexec-tools
code but heavily modified for kernel consumption.

The first patch adds a weak arch_kexec_locate_mem_hole() function so
that archs can override the locate memory hole logic to suit their needs.
There are some special regions in ppc64 which should be avoided while
loading a buffer & there are multiple callers to kexec_add_buffer, making
it complicated to maintain range sanity while using the generic lookup.

The second patch marks ppc64 specific code within arch/powerpc/kexec
and arch/powerpc/purgatory to make the subsequent code changes easy
to understand.

The next patch adds helper function to setup different memory ranges
needed for loading kdump kernel, booting into it and exporting the
crashing kernel's elfcore.

The fourth patch overrides arch_kexec_locate_mem_hole() function to
locate memory hole for kdump segments by accounting for the special
memory regions, referred to as excluded memory ranges, and sets
kbuf->mem when a suitable memory region is found.

The fifth patch moves walk_drmem_lmbs() out of .init section with
a few changes to reuse it for setting up kdump kernel's usable memory
ranges. The next patch uses walk_drmem_lmbs() to look up the LMBs
and set linux,drconf-usable-memory & linux,usable-memory properties
in order to restrict kdump kernel's memory usage.

The seventh patch adds relocation support for the purgatory. Patch 8
helps set up the stack for the purgatory. The next patch sets up the
backup region as a segment while loading the kdump kernel and teaches
purgatory to copy it from source to destination.

Patch 10 builds the elfcore header for the running kernel & passes
the info to kdump kernel via "elfcorehdr=" parameter to export as
/proc/vmcore file. The next patch sets up the memory reserve map
for the kexec kernel and also claims kdump support, as all the
necessary changes are added.

The last patch fixes a lookup issue for `kexec -l -s` case when
memory is reserved for crashkernel.

There is scope to improve purgatory to print messages, verify sha256,
move code common across archs - like arch_kexec_apply_relocations_add
and sha256 digest verification, build purgatory as position independent
code & other Makefile improvements in purgatory which can be dealt with
in a separate patch series as a follow-up.

Tested the changes successfully on P8, P9 lpars, a couple of OpenPOWER
boxes, one with secureboot enabled and a simulator.

v3 -> v4:
* Updated get_node_path() to be an iterative function instead of a
  recursive one.
* Added comment explaining why low memory is added to kdump kernel's
  usable memory ranges though it doesn't fall in crashkernel region.
* Fixed stack_buf to be quadword aligned in accordance with ABI.
* Added missing of_node_put() in setup_purgatory_ppc64().
* Added a FIXME tag to indicate issue in adding opal/rtas regions to
  core image.

v2 -> v3:
* Fixed TOC pointer calculation for purgatory by using section info
  that has relocations applied.
* Fixed arch_kexec_locate_mem_hole() function to fallback to generic
  kexec_locate_mem_hole() lookup if exclude ranges list is empty.
* Dropped check for backup_start in trampoline_64.S as purgatory()
  function takes care of it anyway.

v1 -> v2:
* Introduced arch_kexec_locate_mem_hole() for override and dropped
  weak arch_kexec_add_buffer().
* Addressed warnings reported by lkp.
* Added patch to address kexec load issue when memory is reserved
  for crashkernel.
* Used the appropriate license header for the new files added.
* Added an option to merge ranges to minimize reallocations while
  adding memory ranges.
* Dropped within_crashkernel parameter for add_opal_mem_range() &
  add_rtas_mem_range() functions as it is not really needed.

---

Hari Bathini (12):
  kexec_file: allow archs to handle special regions while locating memory 
hole
  powerpc/kexec_file: mark PPC64 specific code
  powerpc/kexec_file: add helper functions for getting memory ranges
  ppc64/kexec_file: avoid stomping memory used by special regions
  powerpc/drmem: make lmb walk a bit more flexible
  ppc64/kexec_file: restrict memory usage of kdump kernel
  ppc64/kexec_file: add support to relocate purgatory
  ppc64/kexec_file: setup the stack for purgatory
  ppc64/kexec_file: setup backup region for kdump kernel
  ppc64/kexec_file: prepare elfcore header for crashing kernel
  ppc64/kexec_file: add appropriate regions for memory reserve map
  ppc64/kexec_file: fix kexec load failure with lack of memory hole


 arch/powerpc/include/asm/crashdump-ppc64.h |   10 
 arch/powerpc/include/asm/drmem.h   |9 
 arch/powerpc/include/asm/kexec.h   |   33 +
 arch/powerpc/include/asm/kexec_ranges.h|   25 
 arch/powerpc/include/asm/purgatory.h   |   11 
 arch/powerpc/kernel/prom.c |   13 
 arch/powerpc/kexec/Makefile|2 
 

[PATCH v4 03/12] powerpc/kexec_file: add helper functions for getting memory ranges

2020-07-20 Thread Hari Bathini
In kexec case, the kernel to be loaded uses the same memory layout as
the running kernel. So, passing on the DT of the running kernel would
be good enough.

But in case of kdump, different memory ranges are needed to manage
loading the kdump kernel, booting into it and exporting the elfcore
of the crashing kernel. The ranges are exclude memory ranges, usable
memory ranges, reserved memory ranges and crash memory ranges.

Exclude memory ranges specify the list of memory ranges to avoid while
loading kdump segments. Usable memory ranges list the memory ranges
that could be used for booting kdump kernel. Reserved memory ranges
list the memory regions for the loading kernel's reserve map. Crash
memory ranges list the memory ranges to be exported as the crashing
kernel's elfcore.

Add helper functions for setting up the above mentioned memory ranges.
These helpers make the subsequent changes easier to understand and make
it easy to set up the different memory ranges listed above, as and when
appropriate.
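
One of these helpers, sort_memory_ranges(), takes a merge flag. A rough
user-space sketch of sorting a range list and merging overlapping or
adjacent entries is given below. It uses hypothetical names
(struct sketch_mrange, sketch_sort_merge()) and a plain array rather than
struct crash_mem, so treat it as an illustration of the idea only.

```c
#include <stdint.h>
#include <stdlib.h>

struct sketch_mrange {
    uint64_t start;
    uint64_t end;   /* inclusive */
};

static int sketch_cmp_start(const void *a, const void *b)
{
    const struct sketch_mrange *x = a, *y = b;

    if (x->start < y->start)
        return -1;
    return x->start > y->start;
}

/*
 * Sort ranges by start address, then merge entries that overlap or touch.
 * Returns the number of ranges left in the array.
 */
static int sketch_sort_merge(struct sketch_mrange *r, int nr)
{
    int i, idx = 0;

    if (nr <= 0)
        return 0;

    qsort(r, nr, sizeof(*r), sketch_cmp_start);

    for (i = 1; i < nr; i++) {
        /* Overlapping, or starting right after the current range ends. */
        if (r[i].start <= r[idx].end || r[i].start - r[idx].end == 1) {
            if (r[i].end > r[idx].end)
                r[idx].end = r[i].end;
        } else {
            r[++idx] = r[i];
        }
    }
    return idx + 1;
}
```

Merging keeps the list short, which helps minimize reallocations while
adding memory ranges.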

Signed-off-by: Hari Bathini 
Tested-by: Pingfan Liu 
---

v3 -> v4:
* Unchanged. Added Reviewed-by tag from Thiago.

v2 -> v3:
* Unchanged. Added Acked-by & Tested-by tags from Dave & Pingfan.

v1 -> v2:
* Introduced arch_kexec_locate_mem_hole() for override and dropped
  weak arch_kexec_add_buffer().
* Dropped __weak identifier for arch overridable functions.
* Fixed the missing declaration for arch_kimage_file_post_load_cleanup()
  reported by lkp. lkp report for reference:
- https://lore.kernel.org/patchwork/patch/1264418/


 arch/powerpc/include/asm/kexec_ranges.h |   25 ++
 arch/powerpc/kexec/Makefile |2 
 arch/powerpc/kexec/ranges.c |  410 +++
 3 files changed, 436 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/include/asm/kexec_ranges.h
 create mode 100644 arch/powerpc/kexec/ranges.c

diff --git a/arch/powerpc/include/asm/kexec_ranges.h b/arch/powerpc/include/asm/kexec_ranges.h
new file mode 100644
index 000..78f3111
--- /dev/null
+++ b/arch/powerpc/include/asm/kexec_ranges.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef _ASM_POWERPC_KEXEC_RANGES_H
+#define _ASM_POWERPC_KEXEC_RANGES_H
+
+#define MEM_RANGE_CHUNK_SZ 2048 /* Memory ranges size chunk */
+
+struct crash_mem *realloc_mem_ranges(struct crash_mem **mem_ranges);
+int add_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size);
+int add_tce_mem_ranges(struct crash_mem **mem_ranges);
+int add_initrd_mem_range(struct crash_mem **mem_ranges);
+#ifdef CONFIG_PPC_BOOK3S_64
+int add_htab_mem_range(struct crash_mem **mem_ranges);
+#else
+static inline int add_htab_mem_range(struct crash_mem **mem_ranges)
+{
+   return 0;
+}
+#endif
+int add_kernel_mem_range(struct crash_mem **mem_ranges);
+int add_rtas_mem_range(struct crash_mem **mem_ranges);
+int add_opal_mem_range(struct crash_mem **mem_ranges);
+int add_reserved_ranges(struct crash_mem **mem_ranges);
+void sort_memory_ranges(struct crash_mem *mrngs, bool merge);
+
+#endif /* _ASM_POWERPC_KEXEC_RANGES_H */
diff --git a/arch/powerpc/kexec/Makefile b/arch/powerpc/kexec/Makefile
index 67c3553..4aff684 100644
--- a/arch/powerpc/kexec/Makefile
+++ b/arch/powerpc/kexec/Makefile
@@ -7,7 +7,7 @@ obj-y   += core.o crash.o core_$(BITS).o
 
 obj-$(CONFIG_PPC32)   += relocate_32.o
 
-obj-$(CONFIG_KEXEC_FILE)   += file_load.o file_load_$(BITS).o elf_$(BITS).o
+obj-$(CONFIG_KEXEC_FILE)   += file_load.o ranges.o file_load_$(BITS).o elf_$(BITS).o
 
 ifdef CONFIG_HAVE_IMA_KEXEC
 ifdef CONFIG_IMA
diff --git a/arch/powerpc/kexec/ranges.c b/arch/powerpc/kexec/ranges.c
new file mode 100644
index 000..713ce54
--- /dev/null
+++ b/arch/powerpc/kexec/ranges.c
@@ -0,0 +1,410 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * powerpc code to implement the kexec_file_load syscall
+ *
+ * Copyright (C) 2004  Adam Litke (a...@us.ibm.com)
+ * Copyright (C) 2004  IBM Corp.
+ * Copyright (C) 2004,2005  Milton D Miller II, IBM Corporation
+ * Copyright (C) 2005  R Sharada (shar...@in.ibm.com)
+ * Copyright (C) 2006  Mohan Kumar M (mo...@in.ibm.com)
+ * Copyright (C) 2020  IBM Corporation
+ *
+ * Based on kexec-tools' kexec-ppc64.c, fs2dt.c.
+ * Heavily modified for the kernel by
+ * Hari Bathini .
+ */
+
+#undef DEBUG
+#define pr_fmt(fmt) "kexec ranges: " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/**
+ * get_max_nr_ranges - Get the max no. of ranges crash_mem structure
+ * could hold, given the size allocated for it.
+ * @size:  Allocation size of crash_mem structure.
+ *
+ * Returns the maximum no. of ranges.
+ */
+static inline unsigned int get_max_nr_ranges(size_t size)
+{
+   return ((size - sizeof(struct crash_mem)) /
+   sizeof(struct crash_mem_range));
+}
+
+/**
+ * get_mem_rngs_size - Get the allocated size of mrngs based on
+ * max_nr_ranges and

[PATCH v4 01/12] kexec_file: allow archs to handle special regions while locating memory hole

2020-07-20 Thread Hari Bathini
Some architectures may have special memory regions, within the given
memory range, which can't be used for the buffer in a kexec segment.
Implement weak arch_kexec_locate_mem_hole() definition which arch code
may override, to take care of special regions, while trying to locate
a memory hole.

Also, add the missing declarations for arch overridable functions and
drop the __weak descriptors in the declarations to keep non-weak
definitions from becoming weak.
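
As a rough illustration of the pattern described above, the sketch below
declares the hook without any weak annotation and marks only the default
definition as weak. All names here (struct buf_params,
generic_locate_hole(), arch_locate_hole()) are hypothetical stand-ins, and
the GCC/Clang attribute is spelled out where the kernel would use its
__weak macro.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for the search parameters. */
struct buf_params {
	unsigned long mem;	/* filled in with the chosen start address */
	unsigned long memsz;	/* size of the hole being requested */
};

/*
 * Plain declaration, as it would appear in a shared header. Keeping the
 * weak attribute out of the declaration means an arch file that includes
 * the header and defines the function ends up with a strong symbol.
 */
int arch_locate_hole(struct buf_params *p);

/* Generic search; this toy version pretends a hole was found at 1 MiB. */
int generic_locate_hole(struct buf_params *p)
{
	assert(p != NULL);
	p->mem = 0x100000UL;
	return 0;
}

/*
 * Default implementation. Only this definition is weak, so a strong
 * definition of arch_locate_hole() in another object file replaces it
 * at link time.
 */
__attribute__((weak)) int arch_locate_hole(struct buf_params *p)
{
	return generic_locate_hole(p);
}
```

With no override linked in, a call to arch_locate_hole() simply falls
through to the generic search.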

Reported-by: kernel test robot 
[lkp: In v1, arch_kimage_file_post_load_cleanup() declaration was missing]
Signed-off-by: Hari Bathini 
Tested-by: Pingfan Liu 
Acked-by: Dave Young 
Reviewed-by: Thiago Jung Bauermann 
---

v3 -> v4:
* Unchanged. Added Reviewed-by tag from Thiago.

v2 -> v3:
* Unchanged. Added Acked-by & Tested-by tags from Dave & Pingfan.

v1 -> v2:
* Introduced arch_kexec_locate_mem_hole() for override and dropped
  weak arch_kexec_add_buffer().
* Dropped __weak identifier for arch overridable functions.
* Fixed the missing declaration for arch_kimage_file_post_load_cleanup()
  reported by lkp. lkp report for reference:
- https://lore.kernel.org/patchwork/patch/1264418/


 include/linux/kexec.h |   29 ++---
 kernel/kexec_file.c   |   16 ++--
 2 files changed, 32 insertions(+), 13 deletions(-)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index ea67910..9e93bef 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -183,17 +183,24 @@ int kexec_purgatory_get_set_symbol(struct kimage *image, const char *name,
   bool get_value);
 void *kexec_purgatory_get_symbol_addr(struct kimage *image, const char *name);
 
-int __weak arch_kexec_kernel_image_probe(struct kimage *image, void *buf,
-unsigned long buf_len);
-void * __weak arch_kexec_kernel_image_load(struct kimage *image);
-int __weak arch_kexec_apply_relocations_add(struct purgatory_info *pi,
-   Elf_Shdr *section,
-   const Elf_Shdr *relsec,
-   const Elf_Shdr *symtab);
-int __weak arch_kexec_apply_relocations(struct purgatory_info *pi,
-   Elf_Shdr *section,
-   const Elf_Shdr *relsec,
-   const Elf_Shdr *symtab);
+/* Architectures may override the below functions */
+int arch_kexec_kernel_image_probe(struct kimage *image, void *buf,
+ unsigned long buf_len);
+void *arch_kexec_kernel_image_load(struct kimage *image);
+int arch_kexec_apply_relocations_add(struct purgatory_info *pi,
+Elf_Shdr *section,
+const Elf_Shdr *relsec,
+const Elf_Shdr *symtab);
+int arch_kexec_apply_relocations(struct purgatory_info *pi,
+Elf_Shdr *section,
+const Elf_Shdr *relsec,
+const Elf_Shdr *symtab);
+int arch_kimage_file_post_load_cleanup(struct kimage *image);
+#ifdef CONFIG_KEXEC_SIG
+int arch_kexec_kernel_verify_sig(struct kimage *image, void *buf,
+unsigned long buf_len);
+#endif
+int arch_kexec_locate_mem_hole(struct kexec_buf *kbuf);
 
 extern int kexec_add_buffer(struct kexec_buf *kbuf);
 int kexec_locate_mem_hole(struct kexec_buf *kbuf);
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 09cc78d..e89912d 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -636,6 +636,19 @@ int kexec_locate_mem_hole(struct kexec_buf *kbuf)
 }
 
 /**
+ * arch_kexec_locate_mem_hole - Find free memory to place the segments.
+ * @kbuf:   Parameters for the memory search.
+ *
+ * On success, kbuf->mem will have the start address of the memory region found.
+ *
+ * Return: 0 on success, negative errno on error.
+ */
+int __weak arch_kexec_locate_mem_hole(struct kexec_buf *kbuf)
+{
+   return kexec_locate_mem_hole(kbuf);
+}
+
+/**
  * kexec_add_buffer - place a buffer in a kexec segment
  * @kbuf:  Buffer contents and memory parameters.
  *
@@ -647,7 +660,6 @@ int kexec_locate_mem_hole(struct kexec_buf *kbuf)
  */
 int kexec_add_buffer(struct kexec_buf *kbuf)
 {
-
struct kexec_segment *ksegment;
int ret;
 
@@ -675,7 +687,7 @@ int kexec_add_buffer(struct kexec_buf *kbuf)
kbuf->buf_align = max(kbuf->buf_align, PAGE_SIZE);
 
/* Walk the RAM ranges and allocate a suitable range for the buffer */
-   ret = kexec_locate_mem_hole(kbuf);
+   ret = arch_kexec_locate_mem_hole(kbuf);
if (ret)
return ret;
 




Re: [PATCH v5 4/4] printk: use the lockless ringbuffer

2020-07-20 Thread Marco Elver
On Mon, 20 Jul 2020 at 12:20, John Ogness  wrote:
>
> On 2020-07-18, Marco Elver  wrote:
> > It seems this causes a regression observed at least with newline-only
> > printks.
> > [...]
> > -- >8 --
> >
> > --- a/init/main.c
> > +++ b/init/main.c
> > @@ -1039,6 +1039,10 @@ asmlinkage __visible void __init start_kernel(void)
> >   sfi_init_late();
> >   kcsan_init();
> >
> > + pr_info("EXPECT BLANK LINE --vv\n");
> > + pr_info("\n");
> > + pr_info("EXPECT BLANK LINE --^^\n");
> > +
> >   /* Do the rest non-__init'ed, we're now alive */
> >   arch_call_rest_init();
>
> Thanks for the example. This is an unintentional regression in the
> series. I will submit a patch to fix this.
>
> Note that this regression does not exist when the followup series [0]
> (reimplementing LOG_CONT) is applied. All the more reason that the 1st
> series should be fixed before pushing the 2nd series to linux-next.

Great, thank you for clarifying! :-)

-- Marco

> John Ogness
>
> [0] https://lkml.kernel.org/r/20200717234818.8622-1-john.ogn...@linutronix.de



Re: [PATCH v5 4/4] printk: use the lockless ringbuffer

2020-07-20 Thread John Ogness
On 2020-07-18, Marco Elver  wrote:
> It seems this causes a regression observed at least with newline-only
> printks.
> [...]
> -- >8 --
>
> --- a/init/main.c
> +++ b/init/main.c
> @@ -1039,6 +1039,10 @@ asmlinkage __visible void __init start_kernel(void)
>   sfi_init_late();
>   kcsan_init();
>  
> + pr_info("EXPECT BLANK LINE --vv\n");
> + pr_info("\n");
> + pr_info("EXPECT BLANK LINE --^^\n");
> +
>   /* Do the rest non-__init'ed, we're now alive */
>   arch_call_rest_init();

Thanks for the example. This is an unintentional regression in the
series. I will submit a patch to fix this.

Note that this regression does not exist when the followup series [0]
(reimplementing LOG_CONT) is applied. All the more reason that the 1st
series should be fixed before pushing the 2nd series to linux-next.

John Ogness

[0] https://lkml.kernel.org/r/20200717234818.8622-1-john.ogn...@linutronix.de



Re: [PATCH v5 4/4] printk: use the lockless ringbuffer

2020-07-20 Thread Dmitry Vyukov
On Mon, Jul 20, 2020 at 11:41 AM Marco Elver  wrote:
>
> On Mon, 20 Jul 2020 at 10:41, Sergey Senozhatsky
>  wrote:
> >
> > On (20/07/20 08:43), Marco Elver wrote:
> > > On Sun, Jul 19, 2020 at 12:43PM +0900, Sergey Senozhatsky wrote:
> > >
> > > As I said, a number of debugging tools use them to format reports to be
> > > more readable (visually separate title and report body, and separate
> > > parts of the report). Also, such reports are often parsed by CI systems,
> > > and by changing the reports, these CI systems may break. But those are
> > > just the usecases I'm acutely aware of
> >
> > Can you give example of such CI systems? // that's a real question //
>
> None of ours should break; I agree the CI system is brittle if it
> relies on newlines. Parsed and displayed reports are changing, however
> -- what irks me is now all the reports sent to the LKML look ugly.
>
> Some random KASAN reports (just compare formatting):
> next (ugly): 
> https://lore.kernel.org/lkml/c87b7305aadb6...@google.com/
> mainline (normal):
> https://lore.kernel.org/lkml/f4ef6a05aa92e...@google.com/
>
> The same problem exists with lockdep reports, KCSAN reports, ... If
> newline-printks to insert blank lines are now banned, what are we to
> do? Send dozens of patches to switch everyone to printk(" \n")? Or
> some better suggestion? I cannot yet see how that is an improvement.
> (And if the behaviour is not reverted, please document the new
> behaviour.)
>
> That also doesn't yet address the ~400 other newline-printk users, and
> somebody needs to do the due diligence to understand if it's just a
> flush, or an intentional blank line.


Empty lines improve readability of long crash reports significantly.
New lines in sanitizer reports originated in Go race reports 7 years
ago and then spread to user-space ASAN/MSAN/TSAN b/c that was an
improvement and then were specifically added to kernel sanitizers.
This is even more important now that we have up to 5 stacks in KASAN
reports.
Please keep them.

Also having lots of printk("\n") sprinkled in kernel code and turning
them into no-op separately does not look like the right solution.
These printk("\n") are confusing and add clutter. A better solution
would be to remove these printk("\n") from the code. But this also
naturally allows selective removal. Say, keeping for sanitizers and
some other cases, but removing some that are not useful.



Re: [PATCH v5 4/4] printk: use the lockless ringbuffer

2020-07-20 Thread Marco Elver
On Mon, 20 Jul 2020 at 10:41, Sergey Senozhatsky
 wrote:
>
> On (20/07/20 08:43), Marco Elver wrote:
> > On Sun, Jul 19, 2020 at 12:43PM +0900, Sergey Senozhatsky wrote:
> >
> > As I said, a number of debugging tools use them to format reports to be
> > more readable (visually separate title and report body, and separate
> > parts of the report). Also, such reports are often parsed by CI systems,
> > and by changing the reports, these CI systems may break. But those are
> > just the usecases I'm acutely aware of
>
> Can you give example of such CI systems? // that's a real question //

None of ours should break; I agree the CI system is brittle if it
relies on newlines. Parsed and displayed reports are changing, however
-- what irks me is now all the reports sent to the LKML look ugly.

Some random KASAN reports (just compare formatting):
next (ugly): 
https://lore.kernel.org/lkml/c87b7305aadb6...@google.com/
mainline (normal):
https://lore.kernel.org/lkml/f4ef6a05aa92e...@google.com/

The same problem exists with lockdep reports, KCSAN reports, ... If
newline-printks to insert blank lines are now banned, what are we to
do? Send dozens of patches to switch everyone to printk(" \n")? Or
some better suggestion? I cannot yet see how that is an improvement.
(And if the behaviour is not reverted, please document the new
behaviour.)

That also doesn't yet address the ~400 other newline-printk users, and
somebody needs to do the due diligence to understand if it's just a
flush, or an intentional blank line.

Thanks,
-- Marco



Re: [PATCH v5 4/4] printk: use the lockless ringbuffer

2020-07-20 Thread Sergey Senozhatsky
On (20/07/20 08:43), Marco Elver wrote:
> On Sun, Jul 19, 2020 at 12:43PM +0900, Sergey Senozhatsky wrote:
> 
> As I said, a number of debugging tools use them to format reports to be
> more readable (visually separate title and report body, and separate
> parts of the report). Also, such reports are often parsed by CI systems,
> and by changing the reports, these CI systems may break. But those are
> just the usecases I'm acutely aware of

Can you give example of such CI systems? // that's a real question //

-ss



Re: [PATCH v5 4/4] printk: use the lockless ringbuffer

2020-07-20 Thread Sergey Senozhatsky
On (20/07/20 08:43), Marco Elver wrote:
[..]
> please see a full list of newline-print users below.
[..]
> $> git grep -En '\<(printk|pr_err|pr_warn|pr_info)\>\("\\n"\)'
> arch/alpha/kernel/core_wildfire.c:650:printk("\n");
> arch/alpha/kernel/core_wildfire.c:658:printk("\n");
> arch/alpha/kernel/traps.c:120:printk("\n");
[..]

In many cases printk("\n") is not for "print a blank line", but
rather for "flush pr_cont buffer".
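
To make the distinction concrete, here is a toy user-space model (not the
kernel's implementation) in which a bare "\n" either terminates pending
continuation text or, when nothing is pending, produces a blank line only
if the keep_blank flag is set. struct cont_buf, toy_print() and keep_blank
are hypothetical names made up for this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define CONT_MAX 128

/* Toy model of a continuation buffer; not the kernel's data structure. */
struct cont_buf {
	char text[CONT_MAX];
	size_t len;		/* bytes of pending, unterminated text */
	unsigned int lines;	/* complete lines emitted so far */
	unsigned int blanks;	/* of those, how many were empty */
};

/*
 * Feed one message to the model. Text without a trailing '\n' is held
 * back; a trailing '\n' terminates the line. A bare "\n" therefore
 * flushes pending text or, with nothing pending, emits a blank line
 * only when keep_blank is non-zero.
 */
static void toy_print(struct cont_buf *c, const char *msg, int keep_blank)
{
	size_t n;
	int newline;

	assert(c != NULL && msg != NULL);
	n = strlen(msg);
	newline = n > 0 && msg[n - 1] == '\n';
	if (newline)
		n--;
	if (n > CONT_MAX - 1 - c->len)
		n = CONT_MAX - 1 - c->len;	/* truncate, never overflow */
	memcpy(c->text + c->len, msg, n);
	c->len += n;
	c->text[c->len] = '\0';

	if (!newline)
		return;
	if (c->len == 0 && !keep_blank)
		return;			/* blank line silently dropped */
	if (c->len == 0)
		c->blanks++;
	puts(c->text);
	c->lines++;
	c->len = 0;
	c->text[0] = '\0';
}
```

In this model the flush case behaves the same either way; only the
"nothing pending" case differs, which is the case the debugging tools in
this thread depend on.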

-ss



RE: [PATCH 07/13] fs/kernel_read_file: Switch buffer size arg to size_t

2020-07-20 Thread David Laight
From: Kees Cook
> Sent: 17 July 2020 18:43
> In preparation for further refactoring of kernel_read_file*(), rename
> the "max_size" argument to the more accurate "buf_size", and correct
> its type to size_t. Add kerndoc to explain the specifics of how the
> arguments will be used. Note that with buf_size now size_t, it can no
> longer be negative (and was never called with a negative value). Adjust
> callers to use it as a "maximum size" when *buf is NULL.
> 
> Signed-off-by: Kees Cook 
> ---
>  fs/kernel_read_file.c| 34 +++-
>  include/linux/kernel_read_file.h |  8 
>  security/integrity/digsig.c  |  2 +-
>  security/integrity/ima/ima_fs.c  |  2 +-
>  4 files changed, 31 insertions(+), 15 deletions(-)
> 
> diff --git a/fs/kernel_read_file.c b/fs/kernel_read_file.c
> index dc28a8def597..e21a76001fff 100644
> --- a/fs/kernel_read_file.c
> +++ b/fs/kernel_read_file.c
> @@ -5,15 +5,31 @@
>  #include 
>  #include 
> 
> +/**
> + * kernel_read_file() - read file contents into a kernel buffer
> + *
> + * @file file to read from
> + * @buf  pointer to a "void *" buffer for reading into (if
> + *   *@buf is NULL, a buffer will be allocated, and
> + *   @buf_size will be ignored)
> + * @buf_size size of buf, if already allocated. If @buf not
> + *   allocated, this is the largest size to allocate.
> + * @id   the kernel_read_file_id identifying the type of
> + *   file contents being read (for LSMs to examine)
> + *
> + * Returns number of bytes read (no single read will be bigger
> + * than INT_MAX), or negative on error.
> + *
> + */

That seems to be self-inconsistent.
If '*buf' is NULL it says both that buf_size is ignored and that it
is treated as a limit.
To make life easier, zero should probably be treated as no-limit.
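
Something along these lines is what I have in mind. It is only a sketch of
the suggested semantics, not code from the patch; effective_limit() and
check_size() are made-up names, and the INT_MAX cap is taken from the
"no single read will be bigger than INT_MAX" note in the comment above.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical helper: translate a caller-supplied buf_size into the
 * largest read that will be attempted. Zero means "no explicit limit";
 * INT_MAX still caps the result.
 */
static size_t effective_limit(size_t buf_size)
{
	if (buf_size == 0 || buf_size > (size_t)INT_MAX)
		return (size_t)INT_MAX;
	return buf_size;
}

/* Returns 0 if a file of file_size bytes fits, -1 if it is too big. */
static int check_size(size_t file_size, size_t buf_size)
{
	return file_size <= effective_limit(buf_size) ? 0 : -1;
}
```

Callers that do not care about a limit can then pass 0 instead of having
to pick an arbitrary large value.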

David





Re: [PATCH v5 4/4] printk: use the lockless ringbuffer

2020-07-20 Thread Andy Shevchenko
On Mon, Jul 20, 2020 at 9:45 AM Marco Elver  wrote:
> On Sun, Jul 19, 2020 at 12:43PM +0900, Sergey Senozhatsky wrote:
> > On (20/07/18 14:10), Marco Elver wrote:
> > >
> > > It seems this causes a regression observed at least with newline-only
> > > printks. I noticed this during -next testing because various debugging
> > > tools (K*SAN, lockdep, etc.) use e.g. pr_{err,warn,info}("\n") to format
> > > reports.
> > >
> > > Without wanting to wait for a report from one of these debugging tools,
> > > a simple reproducer is below. Without this patch, the expected newline
> > > is printed.
> >
> > Empty/blank lines carry no valuable payload, could you please explain
> > why do you consider this to be a regression?
>
> Empty/blank lines are visually valuable.
>
> Did I miss a discussion somewhere that this change is acceptable?
> Unfortunately, I can't find it mentioned in the commit message, and
> therefore assumed it's a regression.
>
> As I said, a number of debugging tools use them to format reports to be
> more readable (visually separate title and report body, and separate
> parts of the report).

While I can find it useful in some cases, though messages can be
interleaved, ...

> Also, such reports are often parsed by CI systems,
> and by changing the reports, these CI systems may break. But those are
> just the usecases I'm acutely aware of -- please see a full list of
> newline-print users below.

...but this is a weak argument. If your CI relies on messages rather than on
the ABI, you earn the breakage.
Go and fix your CI to do sane things instead.

> Breaking the observable behaviour of a widely used interface such as
> printk doesn't seem right. Where the newline-print is inappropriate,
> wouldn't removing that newline-print be more appropriate (instead of
> forcing this behaviour on everyone)?

-- 
With Best Regards,
Andy Shevchenko
