Re: [PATCH v20 3/5] arm64: kdump: reimplement crashkernel=X

2022-02-13 Thread Leizhen (ThunderTown)



On 2022/2/14 11:52, Baoquan He wrote:
> On 01/24/22 at 04:47pm, Zhen Lei wrote:
> ..
>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>> index 6c653a2c7cff052..a5d43feac0d7d96 100644
>> --- a/arch/arm64/mm/init.c
>> +++ b/arch/arm64/mm/init.c
>> @@ -71,6 +71,30 @@ phys_addr_t arm64_dma_phys_limit __ro_after_init;
>>  #define CRASH_ADDR_LOW_MAX  arm64_dma_phys_limit
>>  #define CRASH_ADDR_HIGH_MAX MEMBLOCK_ALLOC_ACCESSIBLE
>>  
>> +static int __init reserve_crashkernel_low(unsigned long long low_size)
>> +{
>> +unsigned long long low_base;
>> +
>> +/* passed with crashkernel=0,low ? */
>> +if (!low_size)
>> +return 0;
>> +
>> +low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, CRASH_ADDR_LOW_MAX);
>> +if (!low_base) {
>> +pr_err("cannot allocate crashkernel low memory (size:0x%llx).\n", low_size);
>> +return -ENOMEM;
>> +}
>> +
>> +pr_info("crashkernel low memory reserved: 0x%llx - 0x%llx (%lld MB)\n",
>> +low_base, low_base + low_size, low_size >> 20);
>> +
>> +crashk_low_res.start = low_base;
>> +crashk_low_res.end   = low_base + low_size - 1;
>> +insert_resource(&iomem_resource, &crashk_low_res);
>> +
>> +return 0;
>> +}
>> +
>>  /*
>>   * reserve_crashkernel() - reserves memory for crash kernel
> 
> Another concern of mine is the crashkernel=,low handling. In this patch, the
> code related to low memory is obscure. I wonder if we should make it
> explicit with slightly redundant but very clear code flows. I say this
> because, while the code must be very clear to you and the reviewers, it may
> be harder for later readers or anyone interested to understand.
> 
> 1) crashkernel=X,high
> 2) crashkernel=X,high crashkernel=Y,low
> 3) crashkernel=X,high crashkernel=0,low
> 4) crashkernel=X,high crashkernel='messy code',low
> 5) crashkernel=X //fall back to high memory, low memory is required then.
> 
> It could be me thinking about it too much. I made changes to your patch
> with some tuning, not sure if it's OK with you. Otherwise, this patchset

I think it's good.

> works very well for all the above test cases; it's ripe to be merged for
> wider testing.

I will test it tomorrow. I've prepared a few more use cases than yours.

1) crashkernel=4G                                           //high=4G, low=256M
2) crashkernel=4G crashkernel=512M,high crashkernel=512M,low    //high=4G, low=256M, high and low are ignored
3) crashkernel=4G crashkernel=512M,high                     //high=4G, low=256M, high is ignored
4) crashkernel=4G crashkernel=512M,low                      //high=4G, low=256M, low is ignored
5) crashkernel=4G@0xe000                                    //high=0G, low=0M, cannot allocate, failed
6) crashkernel=512M                                         //high=0G, low=512M
7) crashkernel=128M                                         //high=0G, low=128M
8) crashkernel=512M@0xde00      //512M@3552M                //high=0G, low=512M
9) crashkernel=4G,high                                      //high=4G, low=256M
a) crashkernel=4G,high crashkernel=512M,low                 //high=4G, low=512M
b) crashkernel=512M,high crashkernel=128M,low               //high=512M, low=128M
c) crashkernel=512M,low                                     //high=0G, low=0M, invalid
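
As a rough mental model of the precedence these use cases exercise, here is a toy sketch (assumed behaviour, not kernel code; the function name and the mode strings are made up for illustration):

```c
#include <stdbool.h>

/* Toy model of the option precedence exercised above: a plain
 * crashkernel=X[@offset] wins and makes ,high/,low no-ops; ,high alone
 * implies a default low reservation; ,low alone is invalid. */
static const char *crashkernel_mode(bool plain, bool high, bool low)
{
    if (plain)
        return "plain";                  /* cases 1)-8): plain form wins */
    if (high)
        return low ? "high+low"          /* cases a), b) */
                   : "high+default_low"; /* case 9) */
    if (low)
        return "invalid";                /* case c): ,low alone rejected */
    return "none";
}
```

A model like this only captures which option combination is honoured, not the allocation outcome (e.g. case 5's fixed-base failure).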


> 
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index a5d43feac0d7..671862c56d7d 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -94,7 +94,8 @@ static int __init reserve_crashkernel_low(unsigned long long low_size)
>  
>   return 0;
>  }
> -
> +/* Words explaining why it's 256M */
> +#define DEFAULT_CRASH_KERNEL_LOW_SIZE SZ_256M
>  /*
>   * reserve_crashkernel() - reserves memory for crash kernel
>   *
> @@ -105,10 +106,10 @@ static int __init reserve_crashkernel_low(unsigned long long low_size)
>  static void __init reserve_crashkernel(void)
>  {
>   unsigned long long crash_base, crash_size;
> - unsigned long long crash_low_size = SZ_256M;
> + unsigned long long crash_low_size;
>   unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
>   int ret;
> - bool fixed_base;
> + bool fixed_base, high;
>   char *cmdline = boot_command_line;
>  
>   /* crashkernel=X[@offset] */
> @@ -126,7 +127,10 @@ static void __init reserve_crashkernel(void)
>   ret = parse_crashkernel_low(cmdline, 0, _size, _base);
>   if (!ret)
>   crash_low_size = low_size;
> + else
> + crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE;
>  
> + high = true;
>   crash_max = CRASH_ADDR_HIGH_MAX;
>   }
>  
> @@ -134,7 +138,7 @@ static void __init reserve_crashkernel(void)
>   crash_size = PAGE_ALIGN(crash_size);
>  
>   /* User specifies base address explicitly. */
> - if (crash_base)
> + if (fixed_base)
>   crash_max = 

Re: [PATCH v20 3/5] arm64: kdump: reimplement crashkernel=X

2022-02-13 Thread Baoquan He
On 02/14/22 at 02:44pm, Leizhen (ThunderTown) wrote:
> 
> 
> On 2022/2/11 18:51, Baoquan He wrote:
> > On 02/11/22 at 06:41pm, Leizhen (ThunderTown) wrote:
> >>
> >>
> >> On 2022/2/11 18:30, Baoquan He wrote:
> >>> On 01/24/22 at 04:47pm, Zhen Lei wrote:
>  From: Chen Zhou 
> >>> ..
>  diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>  index 6c653a2c7cff052..a5d43feac0d7d96 100644
>  --- a/arch/arm64/mm/init.c
>  +++ b/arch/arm64/mm/init.c
>  @@ -71,6 +71,30 @@ phys_addr_t arm64_dma_phys_limit __ro_after_init;
>   #define CRASH_ADDR_LOW_MAX  arm64_dma_phys_limit
>   #define CRASH_ADDR_HIGH_MAX MEMBLOCK_ALLOC_ACCESSIBLE
>   
>  +static int __init reserve_crashkernel_low(unsigned long long low_size)
>  +{
>  +unsigned long long low_base;
>  +
>  +/* passed with crashkernel=0,low ? */
>  +if (!low_size)
>  +return 0;
>  +
>  +low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, CRASH_ADDR_LOW_MAX);
>  +if (!low_base) {
>  +pr_err("cannot allocate crashkernel low memory (size:0x%llx).\n", low_size);
>  +return -ENOMEM;
>  +}
>  +
>  +pr_info("crashkernel low memory reserved: 0x%llx - 0x%llx (%lld MB)\n",
>  +low_base, low_base + low_size, low_size >> 20);
>  +
>  +crashk_low_res.start = low_base;
>  +crashk_low_res.end   = low_base + low_size - 1;
>  +insert_resource(&iomem_resource, &crashk_low_res);
>  +
>  +return 0;
>  +}
>  +
>   /*
>    * reserve_crashkernel() - reserves memory for crash kernel
>    *
>  @@ -81,29 +105,62 @@ phys_addr_t arm64_dma_phys_limit __ro_after_init;
>   static void __init reserve_crashkernel(void)
>   {
>   unsigned long long crash_base, crash_size;
>  +unsigned long long crash_low_size = SZ_256M;
>   unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
>   int ret;
>  +bool fixed_base;
>  +char *cmdline = boot_command_line;
>   
>  -ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
>  +/* crashkernel=X[@offset] */
>  +ret = parse_crashkernel(cmdline, memblock_phys_mem_size(),
>   &crash_size, &crash_base);
>  -/* no crashkernel= or invalid value specified */
>  -if (ret || !crash_size)
>  -return;
>  +if (ret || !crash_size) {
>  +unsigned long long low_size;
>   
>  +/* crashkernel=X,high */
>  +ret = parse_crashkernel_high(cmdline, 0, &crash_size, &crash_base);
>  +if (ret || !crash_size)
>  +return;
>  +
>  +/* crashkernel=X,low */
>  +ret = parse_crashkernel_low(cmdline, 0, &low_size, &crash_base);
>  +if (!ret)
>  +crash_low_size = low_size;
> >>>
> >>> Here, the error case is not checked and handled. But it still gets
> >>> the expected result, which is the default SZ_256M. Is this designed on
> >>> purpose?
> >>
> >> Yes, we can specify only "crashkernel=X,high".
> >>
> >> This is mentioned in Documentation/admin-guide/kernel-parameters.txt
> >>
> >> crashkernel=size[KMG],low
> >> [KNL, X86-64] range under 4G. When crashkernel=X,high
> >> is passed, kernel could allocate physical memory region
> >> above 4G, that cause second kernel crash on system
> >> that require some amount of low memory, e.g. swiotlb
> >> requires at least 64M+32K low memory, also enough extra
> >> low memory is needed to make sure DMA buffers for 32-bit
> >> devices won't run out. Kernel would try to allocate at   <-
> >> least 256M below 4G automatically.                       <-
> > 
> > Yeah, that is expected because specifying no crashkernel=,low is a valid
> > usage. The 'ret' is 0 in that case. If I gave the below string, it works
> > too: "crashkernel=256M,high crashkernel=aaabbadfadfd,low"
> 
> Yes, so maybe we should change the error code in __parse_crashkernel()
> from "-EINVAL" to "-ENOENT" when the specified option does not exist.

Good point. I also thought of this; it could be a next-step cleanup. The x86
code needs this too. With crashkernel='messy code',high, it will fail to
reserve. For consistency, we should fail crashkernel='messy code',low
too.

> 
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index 256cf6db573cd09..395f4fac1773f28 100644
> --- a/kernel/crash_core.c
> +++ 

Re: [PATCH v20 3/5] arm64: kdump: reimplement crashkernel=X

2022-02-13 Thread Leizhen (ThunderTown)



On 2022/2/11 18:51, Baoquan He wrote:
> On 02/11/22 at 06:41pm, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2022/2/11 18:30, Baoquan He wrote:
>>> On 01/24/22 at 04:47pm, Zhen Lei wrote:
 From: Chen Zhou 
>>> ..
 diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
 index 6c653a2c7cff052..a5d43feac0d7d96 100644
 --- a/arch/arm64/mm/init.c
 +++ b/arch/arm64/mm/init.c
 @@ -71,6 +71,30 @@ phys_addr_t arm64_dma_phys_limit __ro_after_init;
  #define CRASH_ADDR_LOW_MAXarm64_dma_phys_limit
  #define CRASH_ADDR_HIGH_MAX   MEMBLOCK_ALLOC_ACCESSIBLE
  
 +static int __init reserve_crashkernel_low(unsigned long long low_size)
 +{
 +  unsigned long long low_base;
 +
 +  /* passed with crashkernel=0,low ? */
 +  if (!low_size)
 +  return 0;
 +
 +  low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, CRASH_ADDR_LOW_MAX);
 +  if (!low_base) {
 +  pr_err("cannot allocate crashkernel low memory (size:0x%llx).\n", low_size);
 +  return -ENOMEM;
 +  }
 +
 +  pr_info("crashkernel low memory reserved: 0x%llx - 0x%llx (%lld MB)\n",
 +  low_base, low_base + low_size, low_size >> 20);
 +
 +  crashk_low_res.start = low_base;
 +  crashk_low_res.end   = low_base + low_size - 1;
 +  insert_resource(&iomem_resource, &crashk_low_res);
 +
 +  return 0;
 +}
 +
  /*
   * reserve_crashkernel() - reserves memory for crash kernel
   *
 @@ -81,29 +105,62 @@ phys_addr_t arm64_dma_phys_limit __ro_after_init;
  static void __init reserve_crashkernel(void)
  {
unsigned long long crash_base, crash_size;
 +  unsigned long long crash_low_size = SZ_256M;
unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
int ret;
 +  bool fixed_base;
 +  char *cmdline = boot_command_line;
  
 -  ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
 +  /* crashkernel=X[@offset] */
 +  ret = parse_crashkernel(cmdline, memblock_phys_mem_size(),
  &crash_size, &crash_base);
 -  /* no crashkernel= or invalid value specified */
 -  if (ret || !crash_size)
 -  return;
 +  if (ret || !crash_size) {
 +  unsigned long long low_size;
  
 +  /* crashkernel=X,high */
 +  ret = parse_crashkernel_high(cmdline, 0, &crash_size, &crash_base);
 +  if (ret || !crash_size)
 +  return;
 +
 +  /* crashkernel=X,low */
 +  ret = parse_crashkernel_low(cmdline, 0, &low_size, &crash_base);
 +  if (!ret)
 +  crash_low_size = low_size;
>>>
>>> Here, the error case is not checked and handled. But it still gets
>>> the expected result, which is the default SZ_256M. Is this designed on
>>> purpose?
>>
>> Yes, we can specify only "crashkernel=X,high".
>>
>> This is mentioned in Documentation/admin-guide/kernel-parameters.txt
>>
>> crashkernel=size[KMG],low
>> [KNL, X86-64] range under 4G. When crashkernel=X,high
>> is passed, kernel could allocate physical memory region
>> above 4G, that cause second kernel crash on system
>> that require some amount of low memory, e.g. swiotlb
>> requires at least 64M+32K low memory, also enough extra
>> low memory is needed to make sure DMA buffers for 32-bit
>> devices won't run out. Kernel would try to allocate at   <-
>> least 256M below 4G automatically.                       <-
> 
> Yeah, that is expected because specifying no crashkernel=,low is a valid
> usage. The 'ret' is 0 in that case. If I gave the below string, it works
> too: "crashkernel=256M,high crashkernel=aaabbadfadfd,low"

Yes, so maybe we should change the error code in __parse_crashkernel()
from "-EINVAL" to "-ENOENT" when the specified option does not exist.

diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 256cf6db573cd09..395f4fac1773f28 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -243,9 +243,8 @@ static int __init __parse_crashkernel(char *cmdline,
*crash_base = 0;

ck_cmdline = get_last_crashkernel(cmdline, name, suffix);
-
if (!ck_cmdline)
-   return -EINVAL;
+   return -ENOENT;

ck_cmdline += strlen(name);
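
To make the -ENOENT vs. -EINVAL distinction concrete, here is a minimal userspace sketch of such a parser (hypothetical code, not the kernel implementation; parse_ck_low and its exact semantics are made up for illustration):

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: scan a command line for the last
 * "crashkernel=<size>,low" option, distinguishing "option absent"
 * (-ENOENT) from "option present but malformed" (-EINVAL), as
 * proposed for __parse_crashkernel() above. */
static int parse_ck_low(const char *cmdline, unsigned long long *size)
{
    const char *p = cmdline;
    int ret = -ENOENT;

    while ((p = strstr(p, "crashkernel="))) {
        const char *val = p + strlen("crashkernel=");
        const char *comma = strchr(val, ',');
        char *endp;
        unsigned long long v = strtoull(val, &endp, 0);

        p = val;                         /* resume scan after this match */
        if (endp != val) {               /* apply a K/M/G suffix, if any */
            switch (*endp) {
            case 'K': v <<= 10; endp++; break;
            case 'M': v <<= 20; endp++; break;
            case 'G': v <<= 30; endp++; break;
            }
        }
        if (endp != val && !strncmp(endp, ",low", 4)) {
            *size = v;
            ret = 0;                     /* last well-formed option wins */
        } else if (comma && !strncmp(comma, ",low", 4)) {
            ret = -EINVAL;               /* ",low" present, size is junk */
        }
    }
    return ret;
}
```

With this split, "crashkernel=256M,high" alone yields -ENOENT for the ,low lookup (fall back to the default), while "crashkernel=aaabbadfadfd,low" yields -EINVAL and can fail the reservation.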


> 
>>
>>>
 +
 +  crash_max = CRASH_ADDR_HIGH_MAX;
 +  }
 +
 +  fixed_base = !!crash_base;
crash_size = PAGE_ALIGN(crash_size);
  
/* User specifies base address explicitly. */
if (crash_base)
crash_max = crash_base + crash_size;
  
 +retry:
crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
   crash_base, 

Re: [PATCH v20 2/5] arm64: kdump: introduce some macros for crash kernel reservation

2022-02-13 Thread Leizhen (ThunderTown)



On 2022/2/11 18:39, Baoquan He wrote:
> On 01/24/22 at 04:47pm, Zhen Lei wrote:
>> From: Chen Zhou 
>>
>> Introduce macro CRASH_ALIGN for alignment, macro CRASH_ADDR_LOW_MAX
>> for upper bound of low crash memory, macro CRASH_ADDR_HIGH_MAX for
>> upper bound of high crash memory, use macros instead.
>>
>> Signed-off-by: Chen Zhou 
>> Signed-off-by: Zhen Lei 
>> Tested-by: John Donnelly 
>> Tested-by: Dave Kleikamp 
>> ---
>>  arch/arm64/mm/init.c | 11 ---
>>  1 file changed, 8 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>> index 90f276d46b93bc6..6c653a2c7cff052 100644
>> --- a/arch/arm64/mm/init.c
>> +++ b/arch/arm64/mm/init.c
>> @@ -65,6 +65,12 @@ EXPORT_SYMBOL(memstart_addr);
>>  phys_addr_t arm64_dma_phys_limit __ro_after_init;
>>  
>>  #ifdef CONFIG_KEXEC_CORE
>> +/* Current arm64 boot protocol requires 2MB alignment */
>> +#define CRASH_ALIGN SZ_2M
>> +
>> +#define CRASH_ADDR_LOW_MAX  arm64_dma_phys_limit
>> +#define CRASH_ADDR_HIGH_MAX MEMBLOCK_ALLOC_ACCESSIBLE
> 
> MEMBLOCK_ALLOC_ACCESSIBLE is obviously an alloc flag for the memblock
> allocator; I don't think it's appropriate to make HIGH_MAX take its value.

Right, thanks.

> You can make it memblock.current_limit, or not define it at all and use
> MEMBLOCK_ALLOC_ACCESSIBLE directly in memblock_phys_alloc_range() with
> a code comment.

This patch is not strictly required at present. These macros are added to
eliminate differences so the code can be shared with x86.

> 
> 
>> +
>>  /*
>>   * reserve_crashkernel() - reserves memory for crash kernel
>>   *
>> @@ -75,7 +81,7 @@ phys_addr_t arm64_dma_phys_limit __ro_after_init;
>>  static void __init reserve_crashkernel(void)
>>  {
>>  unsigned long long crash_base, crash_size;
>> -unsigned long long crash_max = arm64_dma_phys_limit;
>> +unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
>>  int ret;
>>  
>>  ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
>> @@ -90,8 +96,7 @@ static void __init reserve_crashkernel(void)
>>  if (crash_base)
>>  crash_max = crash_base + crash_size;
>>  
>> -/* Current arm64 boot protocol requires 2MB alignment */
>> -crash_base = memblock_phys_alloc_range(crash_size, SZ_2M,
>> +crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
>> crash_base, crash_max);
>>  if (!crash_base) {
>>  pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
>> -- 
>> 2.25.1
>>
> 
> .
> 

-- 
Regards,
  Zhen Lei

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v20 3/5] arm64: kdump: reimplement crashkernel=X

2022-02-13 Thread Baoquan He
On 01/24/22 at 04:47pm, Zhen Lei wrote:
..
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 6c653a2c7cff052..a5d43feac0d7d96 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -71,6 +71,30 @@ phys_addr_t arm64_dma_phys_limit __ro_after_init;
>  #define CRASH_ADDR_LOW_MAX   arm64_dma_phys_limit
>  #define CRASH_ADDR_HIGH_MAX  MEMBLOCK_ALLOC_ACCESSIBLE
>  
> +static int __init reserve_crashkernel_low(unsigned long long low_size)
> +{
> + unsigned long long low_base;
> +
> + /* passed with crashkernel=0,low ? */
> + if (!low_size)
> + return 0;
> +
> + low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, CRASH_ADDR_LOW_MAX);
> + if (!low_base) {
> + pr_err("cannot allocate crashkernel low memory (size:0x%llx).\n", low_size);
> + return -ENOMEM;
> + }
> +
> + pr_info("crashkernel low memory reserved: 0x%llx - 0x%llx (%lld MB)\n",
> + low_base, low_base + low_size, low_size >> 20);
> +
> + crashk_low_res.start = low_base;
> + crashk_low_res.end   = low_base + low_size - 1;
> + insert_resource(&iomem_resource, &crashk_low_res);
> +
> + return 0;
> +}
> +
>  /*
>   * reserve_crashkernel() - reserves memory for crash kernel

Another concern of mine is the crashkernel=,low handling. In this patch, the
code related to low memory is obscure. I wonder if we should make it
explicit with slightly redundant but very clear code flows. I say this
because, while the code must be very clear to you and the reviewers, it may
be harder for later readers or anyone interested to understand.

1) crashkernel=X,high
2) crashkernel=X,high crashkernel=Y,low
3) crashkernel=X,high crashkernel=0,low
4) crashkernel=X,high crashkernel='messy code',low
5) crashkernel=X //fall back to high memory, low memory is required then.

It could be me thinking about it too much. I made changes to your patch
with some tuning, not sure if it's OK with you. Otherwise, this patchset
works very well for all the above test cases; it's ripe to be merged for
wider testing.

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index a5d43feac0d7..671862c56d7d 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -94,7 +94,8 @@ static int __init reserve_crashkernel_low(unsigned long long low_size)
 
return 0;
 }
-
+/* Words explaining why it's 256M */
+#define DEFAULT_CRASH_KERNEL_LOW_SIZE SZ_256M
 /*
  * reserve_crashkernel() - reserves memory for crash kernel
  *
@@ -105,10 +106,10 @@ static int __init reserve_crashkernel_low(unsigned long long low_size)
 static void __init reserve_crashkernel(void)
 {
unsigned long long crash_base, crash_size;
-   unsigned long long crash_low_size = SZ_256M;
+   unsigned long long crash_low_size;
unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
int ret;
-   bool fixed_base;
+   bool fixed_base, high;
char *cmdline = boot_command_line;
 
/* crashkernel=X[@offset] */
@@ -126,7 +127,10 @@ static void __init reserve_crashkernel(void)
ret = parse_crashkernel_low(cmdline, 0, _size, _base);
if (!ret)
crash_low_size = low_size;
+   else
+   crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE;
 
+   high = true;
crash_max = CRASH_ADDR_HIGH_MAX;
}
 
@@ -134,7 +138,7 @@ static void __init reserve_crashkernel(void)
crash_size = PAGE_ALIGN(crash_size);
 
/* User specifies base address explicitly. */
-   if (crash_base)
+   if (fixed_base)
crash_max = crash_base + crash_size;
 
 retry:
@@ -156,7 +160,10 @@ static void __init reserve_crashkernel(void)
return;
}
 
-   if (crash_base >= SZ_4G && reserve_crashkernel_low(crash_low_size)) {
+   if (crash_base >= SZ_4G && !high) 
+   crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE;
+
+   if (reserve_crashkernel_low(crash_low_size)) {
memblock_phys_free(crash_base, crash_size);
return;
}
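
The tuned flow above (try low, retry high, then top up with a default low chunk when the plain crashkernel=X reservation lands above 4G) can be modeled with a toy sketch. This is assumed behaviour for illustration only, it covers just the plain-X fallback path, and alloc_below() merely stands in for memblock_phys_alloc_range():

```c
#include <stdbool.h>

#define SZ_4G        (4ULL << 30)
#define DEFAULT_LOW  (256ULL << 20)

struct reservation {
    unsigned long long base;     /* where the main region landed */
    unsigned long long low_size; /* extra low reservation, 0 if none */
};

/* Stand-in for memblock_phys_alloc_range(): report whether a region of
 * the given size fits under the limit, pretending top-down allocation. */
static bool alloc_below(unsigned long long size, unsigned long long limit,
                        unsigned long long *base)
{
    if (size > limit)
        return false;
    *base = limit - size;
    return true;
}

/* Toy model of the plain crashkernel=X fallback: try below the low
 * limit first, retry below the high limit, and add a default low chunk
 * when the main region ends up above 4G. */
static struct reservation reserve(unsigned long long size,
                                  unsigned long long low_limit,
                                  unsigned long long high_limit)
{
    struct reservation r = { 0, 0 };

    if (!alloc_below(size, low_limit, &r.base) &&
        !alloc_below(size, high_limit, &r.base))
        return r;                /* allocation failed entirely */
    if (r.base >= SZ_4G)
        r.low_size = DEFAULT_LOW;
    return r;
}
```

In the real patch the explicit ,high case carries its own crash_low_size from parsing, which is why the diff guards the default with `!high`.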

>   *
> @@ -81,29 +105,62 @@ phys_addr_t arm64_dma_phys_limit __ro_after_init;
>  static void __init reserve_crashkernel(void)
>  {
>   unsigned long long crash_base, crash_size;
> + unsigned long long crash_low_size = SZ_256M;
>   unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
>   int ret;
> + bool fixed_base;
> + char *cmdline = boot_command_line;
>  
> - ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
> + /* crashkernel=X[@offset] */
> + ret = parse_crashkernel(cmdline, memblock_phys_mem_size(),
>   &crash_size, &crash_base);
> - /* no crashkernel= or invalid value specified */
> - if (ret || !crash_size)
> - return;
> + if (ret || !crash_size) {
> + unsigned long long low_size;
>  
> + /* crashkernel=X,high */
> + ret = 

Re: [PATCH v5 2/6] powerpc/kexec_file: Add KEXEC_SIG support.

2022-02-13 Thread Mimi Zohar
Hi Michal,

On Tue, 2022-01-11 at 12:37 +0100, Michal Suchanek wrote:
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index dea74d7717c0..1cde9b6c5987 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -560,6 +560,22 @@ config KEXEC_FILE
>  config ARCH_HAS_KEXEC_PURGATORY
> def_bool KEXEC_FILE
>  
> +config KEXEC_SIG
> +   bool "Verify kernel signature during kexec_file_load() syscall"
> +   depends on KEXEC_FILE && MODULE_SIG_FORMAT
> +   help
> + This option makes kernel signature verification mandatory for
> + the kexec_file_load() syscall.

When KEXEC_SIG is enabled on other architectures, IMA does not define a
kexec 'appraise' policy rule.  Refer to the policy rules in
security/ima/ima_efi.c.  Similarly the kexec 'appraise' policy rule in
arch/powerpc/kernel/ima_policy.c should not be defined.

-- 
thanks,

Mimi




Re: [PATCHv2] kexec: disable cpu hotplug until the rebooting cpu is stable

2022-02-13 Thread Pingfan Liu
Gentle ping. Maintainers, could you share your opinions?


Thanks

On Thu, Jan 27, 2022 at 5:02 PM Pingfan Liu  wrote:
>
> The following identical code piece appears in both
> migrate_to_reboot_cpu() and smp_shutdown_nonboot_cpus():
>
> if (!cpu_online(primary_cpu))
> primary_cpu = cpumask_first(cpu_online_mask);
>
> This is due to a breakage like the following:
>migrate_to_reboot_cpu();
>cpu_hotplug_enable();
>   --> comes a cpu_down(this_cpu) on other cpu
>machine_shutdown();
>
> Although the kexec-reboot task can get through a cpu_down() on its cpu,
> this code looks a little confusing.
>
> Make things straightforward by keeping cpu hotplug disabled until
> smp_shutdown_nonboot_cpus() holds cpu_add_remove_lock. This way, the
> breakage is squashed out and the rebooting cpu stays unchanged.
>
> Note: this patch only affects the kexec-reboot on arches, which rely on
> cpu hotplug mechanism.
>
> Signed-off-by: Pingfan Liu 
> Cc: Eric Biederman 
> Cc: Peter Zijlstra 
> Cc: Thomas Gleixner 
> Cc: Valentin Schneider 
> Cc: Vincent Donnefort 
> Cc: Ingo Molnar 
> Cc: Mark Rutland 
> Cc: YueHaibing 
> Cc: Baokun Li 
> Cc: Randy Dunlap 
> Cc: Valentin Schneider 
> Cc: kexec@lists.infradead.org
> To: linux-ker...@vger.kernel.org
> ---
> v1 -> v2:
>  improve commit log
>
>  kernel/cpu.c| 16 ++--
>  kernel/kexec_core.c | 10 --
>  2 files changed, 14 insertions(+), 12 deletions(-)
>
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 9c92147f0812..87bdf21de950 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -1240,20 +1240,24 @@ int remove_cpu(unsigned int cpu)
>  }
>  EXPORT_SYMBOL_GPL(remove_cpu);
>
> +/* primary_cpu keeps unchanged after migrate_to_reboot_cpu() */
>  void smp_shutdown_nonboot_cpus(unsigned int primary_cpu)
>  {
> unsigned int cpu;
> int error;
>
> +   /*
> +* Block other cpu hotplug events, so primary_cpu is always online if
> +* it is not touched by us
> +*/
> cpu_maps_update_begin();
> -
> /*
> -* Make certain the cpu I'm about to reboot on is online.
> -*
> -* This is inline to what migrate_to_reboot_cpu() already do.
> +* migrate_to_reboot_cpu() disables CPU hotplug assuming that
> +* no further code needs to use CPU hotplug (which is true in
> +* the reboot case). However, the kexec path depends on using
> +* CPU hotplug again; so re-enable it here.
>  */
> -   if (!cpu_online(primary_cpu))
> -   primary_cpu = cpumask_first(cpu_online_mask);
> +   __cpu_hotplug_enable();
>
> for_each_online_cpu(cpu) {
> if (cpu == primary_cpu)
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index 68480f731192..db4fa6b174e3 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -1168,14 +1168,12 @@ int kernel_kexec(void)
> kexec_in_progress = true;
> kernel_restart_prepare("kexec reboot");
> migrate_to_reboot_cpu();
> -
> /*
> -* migrate_to_reboot_cpu() disables CPU hotplug assuming that
> -* no further code needs to use CPU hotplug (which is true in
> -* the reboot case). However, the kexec path depends on using
> -* CPU hotplug again; so re-enable it here.
> +* migrate_to_reboot_cpu() disables CPU hotplug. If an arch
> +* relies on the cpu teardown to achieve reboot, it needs to
> +* re-enable CPU hotplug there.
>  */
> -   cpu_hotplug_enable();
> +
> pr_notice("Starting new kernel\n");
> machine_shutdown();
> }
> --
> 2.31.1
>



Re: [PATCH v5 0/6] KEXEC_SIG with appended signature

2022-02-13 Thread Mimi Zohar
[Cc'ing  Nageswara R Sastry]

Hi Michal,

On Tue, 2022-01-11 at 12:37 +0100, Michal Suchanek wrote:
> Hello,
> 
> This is a refresh of the KEXEC_SIG series.
> 
> This adds KEXEC_SIG support on powerpc and deduplicates the code dealing
> with appended signatures in the kernel.

tools/testing/selftests/kexec/test_kexec_file_load.sh probably needs to
be updated to reflect the new Kconfig support.

FYI, commit 65e38e32a959 ("selftests/kexec: Enable secureboot tests for
PowerPC") recently was upstreamed.

-- 
thanks,

Mimi




Re: [PATCH v5 0/6] KEXEC_SIG with appended signature

2022-02-13 Thread Mimi Zohar
Hi Michal,

On Tue, 2022-01-11 at 12:37 +0100, Michal Suchanek wrote:
> Hello,
> 
> This is a refresh of the KEXEC_SIG series.

> This adds KEXEC_SIG support on powerpc and deduplicates the code dealing
> with appended signatures in the kernel.
> 
> powerpc supports IMA_KEXEC but that's an exception rather than the norm.
> On the other hand, KEXEC_SIG is portable across platforms.

This Kconfig carries the IMA measurement list across kexec.  This has
nothing to do with appended signatures.

config IMA_KEXEC
bool "Enable carrying the IMA measurement list across a soft
boot"
depends on IMA && TCG_TPM && HAVE_IMA_KEXEC

In addition to powerpc, arm64 sets HAVE_IMA_KEXEC.

Even prior to the kexec appended signature support, like all other
files, the kexec kernel image signature could be stored in
security.ima.

> 
> For distributions to have uniform security features across platforms one
> option should be used on all platforms.

The kexec kernel image measurement will not be included in the BIOS
event log.  Even if the measurement is included in the IMA measurement
list, without the IMA_KEXEC Kconfig the measurement list will not be
carried across kexec.  For those not interested in "trusted boot" or
those who do not need it for compliance, the simplification should be
fine.

-- 
thanks,

Mimi




Re: [PATCH v5 2/6] powerpc/kexec_file: Add KEXEC_SIG support.

2022-02-13 Thread Mimi Zohar
Hi Michal,

On Wed, 2022-02-09 at 13:01 +0100, Michal Suchánek wrote:
> > > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> > > index dea74d7717c0..1cde9b6c5987 100644
> > > --- a/arch/powerpc/Kconfig
> > > +++ b/arch/powerpc/Kconfig
> > > @@ -560,6 +560,22 @@ config KEXEC_FILE
> > >   config ARCH_HAS_KEXEC_PURGATORY
> > > def_bool KEXEC_FILE
> > > +config KEXEC_SIG
> > > +   bool "Verify kernel signature during kexec_file_load() syscall"
> > > +   depends on KEXEC_FILE && MODULE_SIG_FORMAT
> > > +   help
> > > + This option makes kernel signature verification mandatory for
> > > + the kexec_file_load() syscall.
> > > +
> > > + In addition to that option, you need to enable signature
> > > + verification for the corresponding kernel image type being
> > > + loaded in order for this to work.
> > > +
> > > + Note: on powerpc IMA_ARCH_POLICY also implements kexec'ed kernel
> > > + verification. In addition IMA adds kernel hashes to the measurement
> > > + list, extends IMA PCR in the TPM, and implements kernel image
> > > + blacklist by hash.
> > 
> > So, what is the takeaway for the user? Is IMA_ARCH_POLICY preferred? What is
> > the disadvantage, and are two implementations(?) needed then? More overhead?
> 
> IMA_KEXEC does more than KEXEC_SIG. The overhead is probably not big
> unless you are trying to really minimize the kernel code size.
> 
> Arguably the simpler implementation has less potential for bugs, too,
> both in code and in the user configuration required to enable the feature.
> 
> Interestingly IMA_ARCH_POLICY depends on KEXEC_SIG rather than
> IMA_KEXEC. Just mind-boggling.

FYI, a soft boot doesn't clear the TPM PCRs.  To be able to verify the
IMA measurement list after a kexec against a TPM quote, requires
carrying the measurement list across kexec.

The "IMA_KEXEC" config enables carrying the IMA measurement list across
kexec.  It has nothing to do with verifying the appended signature. 
That is based on kernel module appended signature code.

> 
> The main problem with IMA_KEXEC from my point of view is that it is not
> portable. To record the measurements, TPM support is required, which is not
> available on all platforms.

Measuring the kexec kernel image and extending the TPM with the
measurement is required for trusted boot.  Boot loaders extend the
TPM's BIOS measurements. Similarly, IMA does not require a TPM, but if
one is available the kexec kernel image measurement is extended into
the IMA measurement list.  Otherwise, IMA goes into "TPM by-pass" mode.

> It does not support PE, so it cannot be used on platforms
> that use the PE kernel signature format.

True.  However, a kernel image with an appended signature may be
kexec'ed, regardless of the architecture.  Because some boot loaders
don't support appended signatures, from my point of view does not make
IMA kexec support not portable.

-- 
thanks,

Mimi

