Re: [PATCH v3 1/1] fs: move kernel_read_file* to its own include file

2020-06-30 Thread Scott Branden

Hi Al (Viro),

Are you able to take this patch into your tree, or should someone else pick it up?

On 2020-06-24 12:55 a.m., Christoph Hellwig wrote:

Looks good,

Reviewed-by: Christoph Hellwig 



___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v3 2/3] printk: add lockless ringbuffer

2020-06-30 Thread Paul E. McKenney
On Thu, Jun 18, 2020 at 04:55:18PM +0206, John Ogness wrote:
> Introduce a multi-reader multi-writer lockless ringbuffer for storing
> the kernel log messages. Readers and writers may use their API from
> any context (including scheduler and NMI). This ringbuffer will make
> it possible to decouple printk() callers from any context, locking,
> or console constraints. It also makes it possible for readers to have
> full access to the ringbuffer contents at any time and context (for
> example from any panic situation).
> 
> The printk_ringbuffer is made up of 3 internal ringbuffers:
> 
> desc_ring:
> A ring of descriptors. A descriptor contains all record meta data
> (sequence number, timestamp, loglevel, etc.) as well as internal state
> information about the record and logical positions specifying where in
> the other ringbuffers the text and dictionary strings are located.
> 
> text_data_ring:
> A ring of data blocks. A data block consists of an unsigned long
> integer (ID) that maps to a desc_ring index followed by the text
> string of the record.
> 
> dict_data_ring:
> A ring of data blocks. A data block consists of an unsigned long
> integer (ID) that maps to a desc_ring index followed by the dictionary
> string of the record.
> 
> The internal state information of a descriptor is the key element to
> allow readers and writers to locklessly synchronize access to the data.
> 
> Co-developed-by: Petr Mladek 
> Signed-off-by: John Ogness 

The orderings match the comments, although a number could (later!)
be weakened to the easier-to-read smp_load_acquire() and/or
smp_store_release().  So, from a memory-ordering perspective:

Reviewed-by: Paul E. McKenney 
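
For readers unfamiliar with the acquire/release pairing mentioned above, here is a minimal userspace sketch using C11 atomics as a stand-in for the kernel's smp_store_release() and smp_load_acquire(). It is not taken from the patch: struct demo_record, demo_publish() and demo_try_read() are hypothetical names, and the single "committed" flag is an assumption made only to show the ordering pattern.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical record: plain payload published through an atomic flag. */
struct demo_record {
	int payload;            /* written before the flag is set */
	atomic_bool committed;  /* publication flag */
};

/* Writer: the release store orders the payload write before the flag. */
static void demo_publish(struct demo_record *r, int value)
{
	r->payload = value;
	atomic_store_explicit(&r->committed, true, memory_order_release);
}

/*
 * Reader: the acquire load pairs with the release store above, so a
 * reader that observes committed == true also observes the payload.
 */
static bool demo_try_read(struct demo_record *r, int *out)
{
	if (!atomic_load_explicit(&r->committed, memory_order_acquire))
		return false;
	*out = r->payload;
	return true;
}
```

A reader that calls demo_try_read() before demo_publish() has run gets false and leaves *out untouched; after publication it gets true and the published value.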

> ---
>  kernel/printk/Makefile            |    1 +
>  kernel/printk/printk_ringbuffer.c | 1674 +
>  kernel/printk/printk_ringbuffer.h |  352 ++
>  3 files changed, 2027 insertions(+)
>  create mode 100644 kernel/printk/printk_ringbuffer.c
>  create mode 100644 kernel/printk/printk_ringbuffer.h
> 
> diff --git a/kernel/printk/Makefile b/kernel/printk/Makefile
> index 4d052fc6bcde..eee3dc9b60a9 100644
> --- a/kernel/printk/Makefile
> +++ b/kernel/printk/Makefile
> @@ -2,3 +2,4 @@
>  obj-y= printk.o
>  obj-$(CONFIG_PRINTK) += printk_safe.o
>  obj-$(CONFIG_A11Y_BRAILLE_CONSOLE)   += braille.o
> +obj-$(CONFIG_PRINTK) += printk_ringbuffer.o
> diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
> new file mode 100644
> index ..75d056436cc5
> --- /dev/null
> +++ b/kernel/printk/printk_ringbuffer.c
> @@ -0,0 +1,1674 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include "printk_ringbuffer.h"
> +
> +/**
> + * DOC: printk_ringbuffer overview
> + *
> + * Data Structure
> + * --------------
> + * The printk_ringbuffer is made up of 3 internal ringbuffers:
> + *
> + *   desc_ring
> + * A ring of descriptors. A descriptor contains all record meta data
> + * (sequence number, timestamp, loglevel, etc.) as well as internal state
> + * information about the record and logical positions specifying where in
> + * the other ringbuffers the text and dictionary strings are located.
> + *
> + *   text_data_ring
> + * A ring of data blocks. A data block consists of an unsigned long
> + * integer (ID) that maps to a desc_ring index followed by the text
> + * string of the record.
> + *
> + *   dict_data_ring
> + * A ring of data blocks. A data block consists of an unsigned long
> + * integer (ID) that maps to a desc_ring index followed by the dictionary
> + * string of the record.
> + *
> + * The internal state information of a descriptor is the key element to allow
> + * readers and writers to locklessly synchronize access to the data.
> + *
> + * Implementation
> + * --------------
> + *
> + * Descriptor Ring
> + * ~~~~~~~~~~~~~~~
> + * The descriptor ring is an array of descriptors. A descriptor contains all
> + * the meta data of a printk record as well as blk_lpos structs pointing to
> + * associated text and dictionary data blocks (see "Data Rings" below). Each
> + * descriptor is assigned an ID that maps directly to index values of the
> + * descriptor array and has a state. The ID and the state are bitwise combined
> + * into a single descriptor field named @state_var, allowing ID and state to
> + * be synchronously and atomically updated.
> + *
> + * Descriptors have three states:
> + *
> + *   reserved
> + * A writer is modifying the record.
> + *
> + *   committed
> + * The record and all its data are complete and available for reading.
> + *
> + *   reusable
> + * The record exists, but its text and/or dictionary data may no longer
> + * be available.
> + *
> + * Querying the @state_var of a record requires providing the ID of the
> + * descriptor to query. This can yield a possible fourth (pseudo) state:
> + *
> + *   miss
> + * The descriptor being 
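
The quoted excerpt ends here. As an illustration of the @state_var idea described above (ID and state packed into one word so both change in a single atomic update), here is a minimal userspace sketch. The bit layout (two top bits as state flags, the rest as ID) and all demo_* and SV_* names are assumptions for illustration, not the definitions from the patch.

```c
#include <limits.h>
#include <stdatomic.h>

/* Assumed layout: the two top bits hold the state, the rest the ID. */
#define SV_BITS        (sizeof(unsigned long) * CHAR_BIT)
#define SV_COMMITTED   (1UL << (SV_BITS - 1))
#define SV_REUSABLE    (1UL << (SV_BITS - 2))
#define SV_FLAGS_MASK  (SV_COMMITTED | SV_REUSABLE)
#define SV_ID_MASK     (~SV_FLAGS_MASK)

enum demo_state { DEMO_MISS, DEMO_RESERVED, DEMO_COMMITTED, DEMO_REUSABLE };

/* Query the state for a given ID; a different stored ID yields "miss". */
static enum demo_state demo_query(atomic_ulong *state_var, unsigned long id)
{
	unsigned long sv = atomic_load(state_var);

	if ((sv & SV_ID_MASK) != (id & SV_ID_MASK))
		return DEMO_MISS;
	if (sv & SV_REUSABLE)
		return DEMO_REUSABLE;
	if (sv & SV_COMMITTED)
		return DEMO_COMMITTED;
	return DEMO_RESERVED;
}

/*
 * Move a descriptor from reserved to committed with one compare-and-swap.
 * The swap fails if the word no longer holds exactly "this ID, reserved".
 * Returns nonzero on success.
 */
static int demo_commit(atomic_ulong *state_var, unsigned long id)
{
	unsigned long expected = id & SV_ID_MASK;        /* reserved: no flags */
	unsigned long desired = expected | SV_COMMITTED;

	return atomic_compare_exchange_strong(state_var, &expected, desired);
}
```

Because the ID lives in the same word as the flags, a stale writer holding an old ID cannot commit a descriptor that has since been recycled: its compare-and-swap simply fails.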

Re: [PATCH 04/11] ppc64/kexec_file: avoid stomping memory used by special regions

2020-06-30 Thread piliu



On 06/30/2020 02:10 PM, Hari Bathini wrote:
> 
> 
> On 30/06/20 9:00 am, piliu wrote:
>>
>>
>> On 06/29/2020 01:55 PM, Hari Bathini wrote:
>>>
>>>
>>> On 28/06/20 7:44 am, piliu wrote:
>>>> Hi Hari,
>>>
>>> Hi Pingfan,
>>>

>>>> After a quick read through this series, I have a few questions/comments on
>>>> this patch for the time being. Please see the comments inline.
>>>>
>>>> On 06/27/2020 03:05 AM, Hari Bathini wrote:
>>>>> crashkernel region could have an overlap with special memory regions
>>>>> like opal, rtas, tce-table & such. These regions are referred to as
>>>>> exclude memory ranges. Set up these ranges during image probe in order
>>>>> to avoid them while finding the buffer for different kdump segments.
>>>
>>> [...]
>>>
>>>>> + /*
>>>>> +  * Use the locate_mem_hole logic in kexec_add_buffer() for regular
>>>>> +  * kexec_file_load syscall
>>>>> +  */
>>>>> + if (kbuf->image->type != KEXEC_TYPE_CRASH)
>>>>> +         return 0;
>>>> Can the ranges overlap [crashk_res.start, crashk_res.end]?  Otherwise
>>>> there is no requirement for @exclude_ranges.
>>>
>>> The ranges like rtas, opal are loaded by f/w. They almost always overlap with
>>> crashkernel region. So, @exclude_ranges is required to support kdump.
>> f/w passes rtas/opal as service, then must f/w mark these ranges as
>> fdt_reserved_mem in order to make kernel aware not to use these ranges?
> 
> It does. Actually, reserve_map + reserved-ranges are reserved as soon as
> memblock allocator is ready but not before crashkernel reservation.
> Check early_reserve_mem() call in kernel/prom.c
> 
>> Otherwise kernel memory allocation besides kdump can also overwrite
>> these ranges.
>>
>> Hmm, revisiting reserve_crashkernel(). It seems not to take any reserved
>> memory into consideration except kernel text. Could it work based on memblock
>> allocator?
> 
> So, kdump could possibly overwrite these regions which is why an exclude
> range list is needed. Same thing was done in kexec-tools as well.
OK, got it.

Thanks,
Pingfan
> 
> Thanks
> Hari
> 




Re: [PATCH 04/11] ppc64/kexec_file: avoid stomping memory used by special regions

2020-06-30 Thread Hari Bathini



On 30/06/20 9:00 am, piliu wrote:
> 
> 
> On 06/29/2020 01:55 PM, Hari Bathini wrote:
>>
>>
>> On 28/06/20 7:44 am, piliu wrote:
>>> Hi Hari,
>>
>> Hi Pingfan,
>>
>>>
>>> After a quick read through this series, I have a few questions/comments on
>>> this patch for the time being. Please see the comments inline.
>>>
>>> On 06/27/2020 03:05 AM, Hari Bathini wrote:
>>>> crashkernel region could have an overlap with special memory regions
>>>> like opal, rtas, tce-table & such. These regions are referred to as
>>>> exclude memory ranges. Set up these ranges during image probe in order
>>>> to avoid them while finding the buffer for different kdump segments.
>>
>> [...]
>>
>>>> + /*
>>>> +  * Use the locate_mem_hole logic in kexec_add_buffer() for regular
>>>> +  * kexec_file_load syscall
>>>> +  */
>>>> + if (kbuf->image->type != KEXEC_TYPE_CRASH)
>>>> +         return 0;
>>> Can the ranges overlap [crashk_res.start, crashk_res.end]?  Otherwise
>>> there is no requirement for @exclude_ranges.
>>
>> The ranges like rtas, opal are loaded by f/w. They almost always overlap with
>> crashkernel region. So, @exclude_ranges is required to support kdump.
> f/w passes rtas/opal as service, then must f/w mark these ranges as
> fdt_reserved_mem in order to make kernel aware not to use these ranges?

It does. Actually, reserve_map + reserved-ranges are reserved as soon as
memblock allocator is ready but not before crashkernel reservation.
Check early_reserve_mem() call in kernel/prom.c

> Otherwise kernel memory allocation besides kdump can also overwrite
> these ranges.
>
> Hmm, revisiting reserve_crashkernel(). It seems not to take any reserved
> memory into consideration except kernel text. Could it work based on memblock
> allocator?

So, kdump could possibly overwrite these regions which is why an exclude
range list is needed. Same thing was done in kexec-tools as well.

Thanks
Hari
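
To make the role of the exclude range list concrete, here is a minimal userspace sketch of finding a hole inside a window (think of the crashkernel region) while skipping excluded ranges. It is not the code from this series: struct demo_range and demo_locate_hole() are hypothetical, the search is bottom-up, and the exclude list is assumed to be sorted by start address and non-overlapping.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical range type; both bounds are inclusive. */
struct demo_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Find the lowest address in [lo, hi] where @size bytes fit without
 * touching any range in @excl (sorted by start, non-overlapping).
 * Returns 0 and stores the address in @out, or -1 if nothing fits.
 */
static int demo_locate_hole(uint64_t lo, uint64_t hi, uint64_t size,
			    const struct demo_range *excl, size_t n,
			    uint64_t *out)
{
	uint64_t cur = lo;	/* lowest address not yet ruled out */
	size_t i;

	if (size == 0 || lo > hi)
		return -1;

	for (i = 0; i < n; i++) {
		if (excl[i].end < cur)
			continue;	/* entirely below the cursor */
		if (excl[i].start > hi)
			break;		/* entirely above the window */

		/* Does the gap between the cursor and this range fit? */
		if (excl[i].start > cur && excl[i].start - cur >= size) {
			*out = cur;
			return 0;
		}
		if (excl[i].end >= hi)
			return -1;	/* rest of the window is excluded */
		cur = excl[i].end + 1;
	}

	/* Tail of the window after the last relevant excluded range. */
	if (cur <= hi && hi - cur >= size - 1) {
		*out = cur;
		return 0;
	}
	return -1;
}
```

With a window of [0x1000, 0x4fff] and excluded ranges [0x1000, 0x1fff] and [0x3000, 0x3fff], a request for 0x1000 bytes lands at 0x2000, between the two excluded ranges.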
