from:"Dave Young"

Re: [PATCH v1 2/2] firmware: dmi_scan: Pass dmi_entry_point to kexec'ed kernel

2016-12-17 Thread Dave Young

Ccing efi people.

On 12/16/16 at 02:33pm, Jean Delvare wrote:
> On Fri, 16 Dec 2016 14:18:58 +0200, Andy Shevchenko wrote:
> > On Fri, 2016-12-16 at 10:32 +0800, Dave Young wrote:
> > > On 12/15/16 at 12:28pm, Jean Delvare wrote:
> > > > I am no kexec expert but this confuses me. Shouldn't the second
> > > > kernel have access to the EFI systab as the first kernel does? It
> > > > includes many more pointers than just ACPI and DMI tables, and it
> > > > would seem inconvenient to have to pass all these addresses
> > > > individually explicitly.
> > > 
> > > Yes, in modern linux kernel, kexec has the support for EFI, I think it
> > > should work naturally at least in x86_64.
> > 
> > Thanks for this good news!
> > 
> > Unfortunately Intel Galileo is 32-bit platform.
> 
> If it was done for X86_64 then maybe it can be generalized to X86?

For X86_64 we have a new way of mapping EFI runtime memory, while the
i386 code still uses the old ioremap way. It is impossible to use the
same approach as X86_64 there since the virtual address space is limited.

But maybe for 32-bit the kexec kernel can run in physical mode. I'm not
sure, so I would suggest Andy first test the kexec 2nd kernel with
efi=noruntime.

Thanks
Dave

> 
> -- 
> Jean Delvare
> SUSE L3 Support

Re: [PATCH v1 2/2] firmware: dmi_scan: Pass dmi_entry_point to kexec'ed kernel

2016-12-17 Thread Dave Young

On 12/16/16 at 02:18pm, Andy Shevchenko wrote:
> On Fri, 2016-12-16 at 10:32 +0800, Dave Young wrote:
> > On 12/15/16 at 12:28pm, Jean Delvare wrote:
> > > Hi Andy,
> > > 
> > > On Fri,  2 Dec 2016 21:54:16 +0200, Andy Shevchenko wrote:
> > > > Until now kexec'ed kernel has no clue where to look for DMI entry
> > > > point.
> > > > 
> > > > Pass it via kernel command line parameter in the same way as it's
> > > > done for ACPI RSDP.
> > > 
> > > I am no kexec expert but this confuses me. Shouldn't the second
> > > kernel have access to the EFI systab as the first kernel does? It
> > > includes many more pointers than just ACPI and DMI tables, and it
> > > would seem inconvenient to have to pass all these addresses
> > > individually explicitly.
> > 
> > Yes, in modern linux kernel, kexec has the support for EFI, I think it
> > should work naturally at least in x86_64.
> 
> Thanks for this good news!
> 
> Unfortunately Intel Galileo is 32-bit platform.

Maybe you can try the efi=noruntime kernel parameter in the kexec/kdump
kernel and see whether it works.

> 
> -- 
> Andy Shevchenko <andriy.shevche...@linux.intel.com>
> Intel Finland Oy

Re: [PATCH v1 2/2] firmware: dmi_scan: Pass dmi_entry_point to kexec'ed kernel

2016-12-15 Thread Dave Young

On 12/15/16 at 12:28pm, Jean Delvare wrote:
> Hi Andy,
> 
> On Fri,  2 Dec 2016 21:54:16 +0200, Andy Shevchenko wrote:
> > Until now kexec'ed kernel has no clue where to look for DMI entry point.
> > 
> > Pass it via kernel command line parameter in the same way as it's done
> > for ACPI RSDP.
> 
> I am no kexec expert but this confuses me. Shouldn't the second kernel
> have access to the EFI systab as the first kernel does? It includes
> many more pointers than just ACPI and DMI tables, and it would seem
> inconvenient to have to pass all these addresses individually
> explicitly.

Yes, in modern Linux kernels kexec supports EFI, so I think it should
work naturally, at least on x86_64.

Is there any test log from the latest mainline kernel for this?

> 
> Adding Eric to Cc for his opinion.
> 
> > 
> > Signed-off-by: Andy Shevchenko 
> > ---
> >  Documentation/admin-guide/kernel-parameters.txt |  5 +
> >  drivers/firmware/dmi_scan.c | 14 ++
> >  2 files changed, 19 insertions(+)
> > 
> > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > index be2d6d0..94f219f 100644
> > --- a/Documentation/admin-guide/kernel-parameters.txt
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -843,6 +843,11 @@
> > The filter can be disabled or changed to another
> > driver later using sysfs.
> >  
> > +   dmi_entry_point=[DMI,EFI,KEXEC]
> > +   Pass the DMI entry point to the kernel, mostly used
> > +   on machines running EFI runtime service to boot the
> > +   second kernel for kdump.
> > +
> > drm_kms_helper.edid_firmware=[<connector>:]<file>[,[<connector>:]<file>]
> > Broken monitors, graphic adapters, KVMs and EDIDless
> > panels may send no or incorrect EDID data sets.
> > diff --git a/drivers/firmware/dmi_scan.c b/drivers/firmware/dmi_scan.c
> > index b88def6..215843f 100644
> > --- a/drivers/firmware/dmi_scan.c
> > +++ b/drivers/firmware/dmi_scan.c
> > @@ -595,8 +595,22 @@ static int __init dmi_smbios3_present(const u8 *buf)
> > return 1;
> >  }
> >  
> > +#ifdef CONFIG_KEXEC
> > +static unsigned long dmi_entry_point;
> > +static int __init setup_dmi_entry_point(char *arg)
> > +{
> > +   return kstrtoul(arg, 16, &dmi_entry_point);
> > +}
> > +early_param("dmi_entry_point", setup_dmi_entry_point);
> > +#endif
> > +
> >  static resource_size_t __init dmi_get_entry_point(void)
> >  {
> > +#ifdef CONFIG_KEXEC
> > +   if (dmi_entry_point)
> > +   return dmi_entry_point;
> > +#endif
> > +
> > if (efi_enabled(EFI_CONFIG_TABLES)) {
> > /*
> >  * According to the DMTF SMBIOS reference spec v3.0.0, it is
> 
> 
> -- 
> Jean Delvare
> SUSE L3 Support
> 
> ___
> kexec mailing list
> ke...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec
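
For readers following the quoted hunk, here is a rough userspace sketch of
what it does, under some assumptions: strtoul() stands in for kstrtoul()
(it is more permissive, for example it also accepts leading whitespace and
a sign), and the "fallback" argument is a hypothetical stand-in for the
EFI/legacy lookup that the real dmi_get_entry_point() performs. It is not
the kernel code.

```c
#include <errno.h>
#include <stdlib.h>

/* 0 means "no override given on the command line". */
static unsigned long dmi_entry_point;

/* Userspace approximation of kstrtoul(arg, 16, &dmi_entry_point). */
static int setup_dmi_entry_point(const char *arg)
{
	char *end;
	unsigned long val;

	if (!arg || !*arg)
		return -EINVAL;

	errno = 0;
	val = strtoul(arg, &end, 16);
	if (errno == ERANGE)
		return -ERANGE;
	if (end == arg || *end != '\0')
		return -EINVAL;	/* no digits, or trailing garbage */

	dmi_entry_point = val;
	return 0;
}

/*
 * Same shape as the patched dmi_get_entry_point(): a non-zero override
 * wins, otherwise fall through to the normal lookup (here: 'fallback',
 * a hypothetical placeholder).
 */
static unsigned long dmi_get_entry_point(unsigned long fallback)
{
	if (dmi_entry_point)
		return dmi_entry_point;
	return fallback;
}
```

With this shape a value such as dmi_entry_point=0xf0000 overrides the
lookup, and an unparsable value leaves the previous state untouched.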

Re: [PATCH v2 2/5] ia64: reuse append_elf_note() and final_note() functions

2016-11-30 Thread Dave Young

Hi Hari

Personally I like V1 more, but splitting patch 2 makes it easier for
ia64 people to review.  I did basic x86 testing and it runs OK.

On 11/25/16 at 05:24pm, Hari Bathini wrote:
> Get rid of multiple definitions of append_elf_note() & final_note()
> functions. Reuse these functions compiled under CONFIG_CRASH_CORE.
> 
> Signed-off-by: Hari Bathini 
> ---
>  arch/ia64/kernel/crash.c   |   22 ----------------------
>  include/linux/crash_core.h |    4 ++++
>  kernel/crash_core.c        |    6 +++---
>  kernel/kexec_core.c        |   28 ----------------------------
>  4 files changed, 7 insertions(+), 53 deletions(-)
> 
> diff --git a/arch/ia64/kernel/crash.c b/arch/ia64/kernel/crash.c
> index 2955f35..75859a0 100644
> --- a/arch/ia64/kernel/crash.c
> +++ b/arch/ia64/kernel/crash.c
> @@ -27,28 +27,6 @@ static int kdump_freeze_monarch;
>  static int kdump_on_init = 1;
>  static int kdump_on_fatal_mca = 1;
>  
> -static inline Elf64_Word
> -*append_elf_note(Elf64_Word *buf, char *name, unsigned type, void *data,
> - size_t data_len)
> -{
> - struct elf_note *note = (struct elf_note *)buf;
> - note->n_namesz = strlen(name) + 1;
> - note->n_descsz = data_len;
> - note->n_type   = type;
> - buf += (sizeof(*note) + 3)/4;
> - memcpy(buf, name, note->n_namesz);
> - buf += (note->n_namesz + 3)/4;
> - memcpy(buf, data, data_len);
> - buf += (data_len + 3)/4;
> - return buf;
> -}
> -
> -static void
> -final_note(void *buf)
> -{
> - memset(buf, 0, sizeof(struct elf_note));
> -}
> -

The IA64 version above looks better than the functions in kexec_core.c:
it uses the Elf64_Word type and has a simpler final_note function.

Care to update crash_core.c to use it instead?

Otherwise I'm fine with the changes.
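
To make the suggestion concrete, here is a minimal userspace sketch of the
ia64-style helpers, not the kernel code itself. Assumptions: struct
elf_note is redeclared locally, uint32_t stands in for Elf64_Word/u32, and
the caller passes a zeroed buffer that is large enough (there is no bounds
checking, as in the original).

```c
#include <stdint.h>
#include <string.h>

/* Local stand-in for the kernel's struct elf_note (three 32-bit words). */
struct elf_note {
	uint32_t n_namesz;	/* name length, including the trailing NUL */
	uint32_t n_descsz;	/* payload length in bytes */
	uint32_t n_type;	/* note type */
};

/*
 * Write one note at buf and return the first free word after it.
 * The header is filled in place through a pointer, as in the ia64
 * version, instead of via a temporary struct plus memcpy().
 */
static uint32_t *append_elf_note(uint32_t *buf, const char *name,
				 unsigned int type, const void *data,
				 size_t data_len)
{
	struct elf_note *note = (struct elf_note *)buf;

	note->n_namesz = strlen(name) + 1;
	note->n_descsz = data_len;
	note->n_type   = type;
	buf += (sizeof(*note) + 3) / 4;		/* skip the header */
	memcpy(buf, name, note->n_namesz);
	buf += (note->n_namesz + 3) / 4;	/* name, rounded up to 4 bytes */
	memcpy(buf, data, data_len);
	buf += (data_len + 3) / 4;		/* payload, rounded up to 4 bytes */
	return buf;
}

/* An all-zero header terminates the note list. */
static void final_note(void *buf)
{
	memset(buf, 0, sizeof(struct elf_note));
}
```

A caller would chain append_elf_note() calls and finish with final_note()
on the returned pointer.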

>  extern void ia64_dump_cpu_regs(void *);
>  
>  static DEFINE_PER_CPU(struct elf_prstatus, elf_prstatus);
> diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
> index 9a4f4b0..2ae20b1 100644
> --- a/include/linux/crash_core.h
> +++ b/include/linux/crash_core.h
> @@ -61,6 +61,10 @@ extern u32 vmcoreinfo_note[VMCOREINFO_NOTE_SIZE/4];
>  extern size_t vmcoreinfo_size;
>  extern size_t vmcoreinfo_max_size;
>  
> +u32 *append_elf_note(u32 *buf, char *name, unsigned int type,
> +  void *data, size_t data_len);
> +void final_note(u32 *buf);
> +
>  int __init parse_crashkernel(char *cmdline, unsigned long long system_ram,
>   unsigned long long *crash_size, unsigned long long *crash_base);
>  int parse_crashkernel_high(char *cmdline, unsigned long long system_ram,
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index 60a98fc..9223976 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -291,8 +291,8 @@ int __init parse_crashkernel_low(char *cmdline,
>   "crashkernel=", suffix_tbl[SUFFIX_LOW]);
>  }
>  
> -static u32 *append_elf_note(u32 *buf, char *name, unsigned int type,
> - void *data, size_t data_len)
> +u32 *append_elf_note(u32 *buf, char *name, unsigned int type,
> +  void *data, size_t data_len)
>  {
>   struct elf_note note;
>  
> @@ -309,7 +309,7 @@ static u32 *append_elf_note(u32 *buf, char *name, unsigned int type,
>   return buf;
>  }
>  
> -static void final_note(u32 *buf)
> +void final_note(u32 *buf)
>  {
>   struct elf_note note;
>  
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index 3aa21f3..596cb32 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -988,34 +988,6 @@ int crash_shrink_memory(unsigned long new_size)
>   return ret;
>  }
>  
> -static u32 *append_elf_note(u32 *buf, char *name, unsigned type, void *data,
> - size_t data_len)
> -{
> - struct elf_note note;
> -
> - note.n_namesz = strlen(name) + 1;
> - note.n_descsz = data_len;
> - note.n_type   = type;
> - memcpy(buf, &note, sizeof(note));
> - buf += (sizeof(note) + 3)/4;
> - memcpy(buf, name, note.n_namesz);
> - buf += (note.n_namesz + 3)/4;
> - memcpy(buf, data, note.n_descsz);
> - buf += (note.n_descsz + 3)/4;
> -
> - return buf;
> -}
> -
> -static void final_note(u32 *buf)
> -{
> - struct elf_note note;
> -
> - note.n_namesz = 0;
> - note.n_descsz = 0;
> - note.n_type   = 0;
> - memcpy(buf, &note, sizeof(note));
> -}
> -
>  void crash_save_cpu(struct pt_regs *regs, int cpu)
>  {
>   struct elf_prstatus prstatus;
> 
> 
> ___
> kexec mailing list
> ke...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec

Thanks
Dave

Re: [PATCHv4 07/10] kexec: Switch to __pa_symbol

2016-11-30 Thread Dave Young

On 11/30/16 at 09:13pm, Eric W. Biederman wrote:
> Dave Young <dyo...@redhat.com> writes:
> 
> > Hi, Laura
> > On 11/29/16 at 10:55am, Laura Abbott wrote:
> >> 
> >> __pa_symbol is the correct api to get the physical address of kernel
> >> symbols. Switch to it to allow for better debug checking.
> >> 
> >
> > I assume __pa_symbol is faster than __pa, but it still need some testing
> > on all arches which support kexec.
> >
> > But seems long long ago there is a commit e3ebadd95cb in the commit log
> > I see below from:
> > "we should deprecate __pa_symbol(), and preferably __pa() too - and
> >  just use "virt_to_phys()" instead, which is is more readable and has
> >  nicer semantics."
> >
> > But maybe in modern code __pa_symbol is prefered I may miss background.
> > virt_to_phys still sounds more readable now for me though.
> 
> There has been a lot of history with the various definitions.
> __pa_symbol used to be x86 specific.
> 
> Now what we have is that __pa_symbol is just __pa(RELOC_HIDE(x));
> 
> Now arguably that whole reloc hide thing should happen by architectures
> having a non-inline version of __pa as was done in the commit you
> mention.  But at this point there appears to be nothing wrong with
> changing a __pa to a __pa_symbol it might make things a tad more
> reliable depending on the implementation of __pa.

Then it is safe and reasonable, thanks for the clarification. 

> 
> Acked-by: "Eric W. Biederman" <ebied...@xmission.com>
> 
> 
> Eric
> 
> >> Signed-off-by: Laura Abbott <labb...@redhat.com>
> >> ---
> >> Found during review of the kernel. Untested.
> >> ---
> >>  kernel/kexec_core.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >> 
> >> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> >> index 5616755..e1b625e 100644
> >> --- a/kernel/kexec_core.c
> >> +++ b/kernel/kexec_core.c
> >> @@ -1397,7 +1397,7 @@ void __weak arch_crash_save_vmcoreinfo(void)
> >>  
> >>  phys_addr_t __weak paddr_vmcoreinfo_note(void)
> >>  {
> >> -  return __pa((unsigned long)(char *)&vmcoreinfo_note);
> >> +  return __pa_symbol((unsigned long)(char *)&vmcoreinfo_note);
> >>  }
> >>  
> >>  static int __init crash_save_vmcoreinfo_init(void)
> >> -- 
> >> 2.7.4
> >> 
> >> 
> >> ___
> >> kexec mailing list
> >> ke...@lists.infradead.org
> >> http://lists.infradead.org/mailman/listinfo/kexec
> >
> > Thanks
> > Dave
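
To illustrate Eric's point that the generic __pa_symbol is just
__pa(RELOC_HIDE(x)), here is a small userspace sketch, assuming GCC or
Clang. RELOC_HIDE follows the GCC variant in the kernel headers as far as
I remember it; PAGE_OFFSET, the linear __pa() and demo_note are made up
for illustration and do not correspond to any real architecture.

```c
#include <stdint.h>

/* Hypothetical direct-mapping base; the real value is per-architecture. */
#define PAGE_OFFSET	0x1000UL

/* Toy linear mapping: physical = virtual - PAGE_OFFSET. */
#define __pa(x)		((unsigned long)(x) - PAGE_OFFSET)

/*
 * Pass the value through an empty asm statement so the compiler can no
 * longer tell which object it was derived from, then apply the offset.
 */
#define RELOC_HIDE(ptr, off)					\
({								\
	unsigned long __ptr;					\
	__asm__ ("" : "=r"(__ptr) : "0"(ptr));			\
	(__typeof__(ptr)) (__ptr + (off));			\
})

/* Generic form: same arithmetic as __pa(), with the symbol laundered. */
#define __pa_symbol(x)	__pa(RELOC_HIDE((unsigned long)(x), 0))

/* Hypothetical stand-in for vmcoreinfo_note. */
static uint32_t demo_note[4];

static unsigned long paddr_demo_note(void)
{
	return __pa_symbol((unsigned long)(char *)&demo_note);
}
```

In this toy model both macros give the same number; the difference is only
in what the compiler is allowed to assume about the address.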

Re: [PATCHv4 07/10] kexec: Switch to __pa_symbol

2016-11-30 Thread Dave Young

Hi, Laura
On 11/29/16 at 10:55am, Laura Abbott wrote:
> 
> __pa_symbol is the correct api to get the physical address of kernel
> symbols. Switch to it to allow for better debug checking.
> 

I assume __pa_symbol is faster than __pa, but it still needs some testing
on all arches which support kexec.

But long ago there was commit e3ebadd95cb, whose commit log says:
"we should deprecate __pa_symbol(), and preferably __pa() too - and
 just use "virt_to_phys()" instead, which is is more readable and has
 nicer semantics."

But maybe __pa_symbol is preferred in modern code and I am missing some
background. virt_to_phys still sounds more readable to me, though.

> Signed-off-by: Laura Abbott 
> ---
> Found during review of the kernel. Untested.
> ---
>  kernel/kexec_core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index 5616755..e1b625e 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -1397,7 +1397,7 @@ void __weak arch_crash_save_vmcoreinfo(void)
>  
>  phys_addr_t __weak paddr_vmcoreinfo_note(void)
>  {
> - return __pa((unsigned long)(char *)&vmcoreinfo_note);
> + return __pa_symbol((unsigned long)(char *)&vmcoreinfo_note);
>  }
>  
>  static int __init crash_save_vmcoreinfo_init(void)
> -- 
> 2.7.4
> 
> 
> ___
> kexec mailing list
> ke...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec

Thanks
Dave

Re: [PATCH v10 04/10] kexec_file: Add support for purgatory built as PIE.

2016-11-23 Thread Dave Young

On 11/21/16 at 09:49pm, Thiago Jung Bauermann wrote:
> Hello Dave,
> 
> Thanks for your review.
> 
> On Sunday, 20 November 2016, 10:45:46 BRST, Dave Young wrote:
> > On 11/10/16 at 01:27am, Thiago Jung Bauermann wrote:
> > > powerpc's purgatory.ro has 12 relocation types when built as
> > > a relocatable object. To implement support for them requires
> > > arch_kexec_apply_relocations_add to duplicate a lot of code with
> > > module_64.c:apply_relocate_add.
> > > 
> > > When built as a Position Independent Executable there are only 4
> > > relocation types in purgatory.ro, so it becomes practical for the powerpc
> > > implementation of kexec_file to have its own relocation implementation.
> > > 
> > > Also, the purgatory is an executable and not an intermediary output from
> > > the compiler so it makes sense conceptually that it is easier to build
> > > it as a PIE than as a partially linked object.
> > > 
> > > Apart from the greatly reduced number of relocations, there are two
> > > differences between a relocatable object and a PIE:
> > > 
> > > 1. __kexec_load_purgatory needs to use the program headers rather than the
> > >    section headers to figure out how to load the binary.
> > > 
> > > 2. Symbol values are absolute addresses instead of relative to the
> > >    start of the section.
> > > 
> > > This patch adds the support needed in generic code for the differences
> > > above and allows powerpc to load and relocate a position independent
> > > purgatory.
> > 
> > [snip]
> > 
> > The kexec-tools machine_apply_elf_rel is pretty simple for ppc64, it is
> > not that complex. So could you look into simplify your kexec_file
> > implementation?
> 
> I can try, but there is one fundamental issue here: powerpc
> position-dependent code relies more on relocations than x86
> position-dependent code does, so there's a limit to how simple it can be
> made without switching to position-independent code. And it will always
> be more involved than it is on x86.
> 
> BTW, building x86's purgatory as PIE results in it not having any
> relocation at all, so it's an advantage even in that architecture.
> Unfortunately, the machine locks up during reboot and I didn't have time
> to try to figure out what's going on.
> 
> > kernel/kexec_file.c kexec_apply_relocations only do limited things
> > and some of the logic is in arch/x86, so move general code out of arch
> > code, then I guess the arch code will be simpler
> 
> I agree that is a good idea. Is the patch below what you had in mind?
> 
> > and then we probably do not need this PIE stuff anymore.
> 
> If you are ok with the patch below I can post a new version of the series 
> based on it and we can see if Michael Ellerman thinks it is enough.
> 
> > BTW, __kexec_really_load_purgatory looks worse than
> > ___kexec_load_purgatory ;)
> 
> Really? I find the special handling of bss makes the section-based loader a 
> bit more confusing.
> 
> -- 
> Thiago Jung Bauermann
> IBM Linux Technology Center
> 
> 
> Subject: [PATCH] kexec_file: Move generic relocation code from arch/x86 to
>  kernel/kexec_file.c
> 
> The check for undefined symbols stays in arch-specific code because
> powerpc needs to allow TOC symbols to be processed even though they're
> undefined.
> 
> There is no functional change.
> 
> Suggested-by: Dave Young <dyo...@redhat.com>
> Signed-off-by: Thiago Jung Bauermann <bauer...@linux.vnet.ibm.com>
> ---
>  arch/x86/kernel/machine_kexec_64.c | 160 +++--
>  include/linux/kexec.h              |   9 ++-
>  kernel/kexec_file.c                | 120 +++-
>  3 files changed, 154 insertions(+), 135 deletions(-)
> 
> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
> index 8c1f218926d7..f4860c408ece 100644
> --- a/arch/x86/kernel/machine_kexec_64.c
> +++ b/arch/x86/kernel/machine_kexec_64.c
> @@ -401,143 +401,45 @@ int arch_kexec_kernel_verify_sig(struct kimage *image, void *kernel,
>  }
>  #endif
>  
> -/*
> - * Apply purgatory relocations.
> - *
> - * ehdr: Pointer to elf headers
> - * sechdrs: Pointer to section headers.
> - * relsec: section index of SHT_RELA section.
> - *
> - * TODO: Some of the code belongs to generic code. Move that in kexec.c.
> - */
> -int arch_kexec_apply_relocations_add(const Elf64_Ehdr *ehdr,
> -  Elf64_Shdr *sechdrs, unsigned int relse

Re: [PATCH v10 04/10] kexec_file: Add support for purgatory built as PIE.

2016-11-22 Thread Dave Young

On 11/22/16 at 11:44am, Thiago Jung Bauermann wrote:
> Am Dienstag, 22. November 2016, 17:01:10 BRST schrieb Michael Ellerman:
> > Thiago Jung Bauermann <bauer...@linux.vnet.ibm.com> writes:
> > > Am Sonntag, 20. November 2016, 10:45:46 BRST schrieb Dave Young:
> > >> On 11/10/16 at 01:27am, Thiago Jung Bauermann wrote:
> > >> > powerpc's purgatory.ro has 12 relocation types when built as
> > >> > a relocatable object. To implement support for them requires
> > >> > arch_kexec_apply_relocations_add to duplicate a lot of code with
> > >> > module_64.c:apply_relocate_add.
> > >> > 
> > >> > When built as a Position Independent Executable there are only 4
> > >> > relocation types in purgatory.ro, so it becomes practical for the
> > >> > powerpc implementation of kexec_file to have its own relocation
> > >> > implementation.
> > >> > 
> > >> > Also, the purgatory is an executable and not an intermediary output
> > >> > from the compiler so it makes sense conceptually that it is easier to
> > >> > build it as a PIE than as a partially linked object.
> > >> > 
> > >> > Apart from the greatly reduced number of relocations, there are two
> > >> > differences between a relocatable object and a PIE:
> > >> > 
> > >> > 1. __kexec_load_purgatory needs to use the program headers rather than
> > >> >    the section headers to figure out how to load the binary.
> > >> > 2. Symbol values are absolute addresses instead of relative to the
> > >> >    start of the section.
> > >> > 
> > >> > This patch adds the support needed in generic code for the differences
> > >> > above and allows powerpc to load and relocate a position independent
> > >> > purgatory.
> > >> 
> > >> [snip]
> > >> 
> > >> The kexec-tools machine_apply_elf_rel is pretty simple for ppc64, it is
> > >> not that complex. So could you look into simplify your kexec_file
> > >> implementation?
> > > 
> > > I can try, but there is one fundamental issue here: powerpc
> > > position-dependent code relies more on relocations than x86
> > > position-dependent code does, so there's a limit to how simple it can be
> > > made without switching to position-independent code. And it will always
> > > be more involved than it is on x86.
> > 
> > I think we need to go back to the drawing board on this one.
> > 
> > My hope was that building purgatory as PIE would reduce the amount of
> > complexity, but instead it's just added more. Sorry for sending you in
> > that direction.
> 
> It added complexity because in my series powerpc was using a PIE purgatory
> but x86 kept using a partially-linked object (because of the problem I
> mentioned I had when trying out a PIE x86 purgatory), so generic code needed
> two purgatory loaders.
> 
> I'll see if I can make the PIE x86 purgatory to work so that generic code can 
> have only one loader implementation. Then it will indeed be simpler.

Do we really need the PIE purgatory? After moving the generic code out of
x86 there will not be much benefit, no? Anyway, the first step should be
making the purgatory code more generic so that it is easier for other
arches to support kexec_file in the future.

> 
> 
> Am Dienstag, 22. November 2016, 14:16:22 BRST schrieb Dave Young:
> > Hi Michael
> > 
> > On 11/22/16 at 05:01pm, Michael Ellerman wrote:
> > > In general I dislike the level of complexity of the kexec-tools
> > > purgatory, and in particular I'm not comfortable with things like:
> > > 
> > > diff --git a/arch/powerpc/purgatory/sha256.c b/arch/powerpc/purgatory/sha256.c
> > > new file mode 100644
> > > index ..6abee1877d56
> > > --- /dev/null
> > > +++ b/arch/powerpc/purgatory/sha256.c
> > > @@ -0,0 +1,6 @@
> > > +#include "../boot/string.h"
> > > +
> > > +/* Avoid including x86's boot/string.h in sha256.c. */
> > > +#define BOOT_STRING_H
> > > +
> > > +#include "../../x86/purgatory/sha256.c"
> > 
> > Agreed, include x86 code in powerpc looks bad
> > 
> > > I think the best way to get this over the line would be to take the
> > > kexec-lite purgatory implementation and use that to begin with. I know
> > > it doesn't have all the features of the kexec-tools version, but it
> > > should work, and we can look at adding the extra features later.
> > 
> > Instead of adding other implementation, moving the purgatory sha256 code
> > out of x86 sounds better so that we can reuse them cleanly..
> 
> Do you have a suggestion of where that code can live so that it can be shared 
> between purgatories for different arches?

Maybe it would be better for it to live in lib/purgatory/

> 
> Do we need a purgatory with generic and arch-specific code like in
> kexec-tools?

Yes, if more arches are going to add kexec_file support, this will be
necessary.

> 
> -- 
> Thiago Jung Bauermann
> IBM Linux Technology Center
> 

Thanks
Dave

Re: [PATCH v10 04/10] kexec_file: Add support for purgatory built as PIE.

2016-11-21 Thread Dave Young

Hi Michael,

On 11/22/16 at 05:01pm, Michael Ellerman wrote:
> Thiago Jung Bauermann <bauer...@linux.vnet.ibm.com> writes:
> > Am Sonntag, 20. November 2016, 10:45:46 BRST schrieb Dave Young:
> >> On 11/10/16 at 01:27am, Thiago Jung Bauermann wrote:
> >> > powerpc's purgatory.ro has 12 relocation types when built as
> >> > a relocatable object. To implement support for them requires
> >> > arch_kexec_apply_relocations_add to duplicate a lot of code with
> >> > module_64.c:apply_relocate_add.
> >> > 
> >> > When built as a Position Independent Executable there are only 4
> >> > relocation types in purgatory.ro, so it becomes practical for the powerpc
> >> > implementation of kexec_file to have its own relocation implementation.
> >> > 
> >> > Also, the purgatory is an executable and not an intermediary output from
> >> > the compiler so it makes sense conceptually that it is easier to build
> >> > it as a PIE than as a partially linked object.
> >> > 
> >> > Apart from the greatly reduced number of relocations, there are two
> >> > differences between a relocatable object and a PIE:
> >> > 
> >> > 1. __kexec_load_purgatory needs to use the program headers rather than
> >> >    the section headers to figure out how to load the binary.
> >> > 2. Symbol values are absolute addresses instead of relative to the
> >> >    start of the section.
> >> > 
> >> > This patch adds the support needed in generic code for the differences
> >> > above and allows powerpc to load and relocate a position independent
> >> > purgatory.
> >> 
> >> [snip]
> >> 
> >> The kexec-tools machine_apply_elf_rel is pretty simple for ppc64, it is
> >> not that complex. So could you look into simplify your kexec_file
> >> implementation?
> >
> > I can try, but there is one fundamental issue here: powerpc
> > position-dependent code relies more on relocations than x86
> > position-dependent code does, so there's a limit to how simple it can be
> > made without switching to position-independent code. And it will always
> > be more involved than it is on x86.
> 
> I think we need to go back to the drawing board on this one.
> 
> My hope was that building purgatory as PIE would reduce the amount of
> complexity, but instead it's just added more. Sorry for sending you in
> that direction.
> 
> 
> In general I dislike the level of complexity of the kexec-tools
> purgatory, and in particular I'm not comfortable with things like:
> 
> diff --git a/arch/powerpc/purgatory/sha256.c b/arch/powerpc/purgatory/sha256.c
> new file mode 100644
> index ..6abee1877d56
> --- /dev/null
> +++ b/arch/powerpc/purgatory/sha256.c
> @@ -0,0 +1,6 @@
> +#include "../boot/string.h"
> +
> +/* Avoid including x86's boot/string.h in sha256.c. */
> +#define BOOT_STRING_H
> +
> +#include "../../x86/purgatory/sha256.c"
> 

Agreed, including x86 code in powerpc looks bad.

> 
> I think the best way to get this over the line would be to take the
> kexec-lite purgatory implementation and use that to begin with. I know
> it doesn't have all the features of the kexec-tools version, but it
> should work, and we can look at adding the extra features later.

Instead of adding another implementation, moving the purgatory sha256 code
out of x86 sounds better, so that we can reuse it cleanly.

> 
> I'll try and get that working tonight.
> 
> cheers

Thanks
Dave

Re: [PATCH v10 04/10] kexec_file: Add support for purgatory built as PIE.

2016-11-21 Thread Dave Young

On 11/21/16 at 09:49pm, Thiago Jung Bauermann wrote:
> Hello Dave,
> 
> Thanks for your review.
> 
> Am Sonntag, 20. November 2016, 10:45:46 BRST schrieb Dave Young:
> > On 11/10/16 at 01:27am, Thiago Jung Bauermann wrote:
> > > powerpc's purgatory.ro has 12 relocation types when built as
> > > a relocatable object. To implement support for them requires
> > > arch_kexec_apply_relocations_add to duplicate a lot of code with
> > > module_64.c:apply_relocate_add.
> > > 
> > > When built as a Position Independent Executable there are only 4
> > > relocation types in purgatory.ro, so it becomes practical for the powerpc
> > > implementation of kexec_file to have its own relocation implementation.
> > > 
> > > Also, the purgatory is an executable and not an intermediary output from
> > > the compiler so it makes sense conceptually that it is easier to build
> > > it as a PIE than as a partially linked object.
> > > 
> > > Apart from the greatly reduced number of relocations, there are two
> > > differences between a relocatable object and a PIE:
> > > 
> > > 1. __kexec_load_purgatory needs to use the program headers rather than
> > >    the section headers to figure out how to load the binary.
> > > 2. Symbol values are absolute addresses instead of relative to the
> > >    start of the section.
> > > 
> > > This patch adds the support needed in generic code for the differences
> > > above and allows powerpc to load and relocate a position independent
> > > purgatory.
> > 
> > [snip]
> > 
> > The kexec-tools machine_apply_elf_rel is pretty simple for ppc64, it is
> > not that complex. So could you look into simplify your kexec_file
> > implementation?
> 
> I can try, but there is one fundamental issue here: powerpc
> position-dependent code relies more on relocations than x86
> position-dependent code does, so there's a limit to how simple it can be
> made without switching to position-independent code. And it will always be
> more involved than it is on x86.
> 
> BTW, building x86's purgatory as PIE results in it not having any relocation 
> at all, so it's an advantage even in that architecture. Unfortunately, the 
> machine locks up during reboot and I didn't have time to try to figure out 
> what's going on.
> 
> > kernel/kexec_file.c kexec_apply_relocations only do limited things
> > and some of the logic is in arch/x86, so move general code out of arch
> > code, then I guess the arch code will be simpler
> 
> I agree that is a good idea. Is the patch below what you had in mind?
> 
> > and then we probably do not need this PIE stuff anymore.
> 
> If you are ok with the patch below I can post a new version of the series 
> based on it and we can see if Michael Ellerman thinks it is enough.
> 

I will review it and test it. Thanks. I believe this will benefit other
arches if they want kexec_file support in the future.

> > BTW, __kexec_really_load_purgatory looks worse than
> > ___kexec_load_purgatory ;)
> 
> Really? I find the special handling of bss makes the section-based loader a 
> bit more confusing.

I'm not sure I understand the point above about "*bss*". Personally I like
___kexec_load_purgatory more. But anyway, if we move the arch code into
generic code then it will not be necessary anymore.
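
To make the bss point concrete, here is a minimal userspace sketch of the
two loading styles as I understand them. It is not the kernel code: the
sk_ names and the simplified structs are made up for illustration and only
mirror the relevant ELF section and program header fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for ELF64 header fields (not the real <elf.h> types). */
#define SK_SHT_PROGBITS 1u
#define SK_SHT_NOBITS   8u
#define SK_SHF_ALLOC    0x2u
#define SK_PT_LOAD      1u

struct sk_shdr {
	uint32_t type;    /* like sh_type */
	uint64_t flags;   /* like sh_flags */
	uint64_t offset;  /* like sh_offset: position in the file */
	uint64_t size;    /* like sh_size */
	uint64_t dest;    /* offset in the destination buffer, chosen by the loader */
};

struct sk_phdr {
	uint32_t type;    /* like p_type */
	uint64_t offset;  /* like p_offset */
	uint64_t filesz;  /* like p_filesz */
	uint64_t memsz;   /* like p_memsz, assumed >= filesz */
	uint64_t dest;    /* offset in the destination buffer */
};

/* Section-based loading: bss (NOBITS) has no bytes in the file, so it needs its own branch. */
void sk_load_sections(uint8_t *dst, const uint8_t *file,
		      const struct sk_shdr *sh, size_t n)
{
	assert(dst && file && sh);
	for (size_t i = 0; i < n; i++) {
		if (!(sh[i].flags & SK_SHF_ALLOC))
			continue;	/* not part of the memory image */
		if (sh[i].type == SK_SHT_NOBITS)
			memset(dst + sh[i].dest, 0, sh[i].size);
		else
			memcpy(dst + sh[i].dest, file + sh[i].offset, sh[i].size);
	}
}

/* Segment-based loading: bss is simply the tail where memsz exceeds filesz. */
void sk_load_segments(uint8_t *dst, const uint8_t *file,
		      const struct sk_phdr *ph, size_t n)
{
	assert(dst && file && ph);
	for (size_t i = 0; i < n; i++) {
		if (ph[i].type != SK_PT_LOAD)
			continue;
		memcpy(dst + ph[i].dest, file + ph[i].offset, ph[i].filesz);
		memset(dst + ph[i].dest + ph[i].filesz, 0,
		       ph[i].memsz - ph[i].filesz);
	}
}
```

In the section-based variant the NOBITS case is an explicit branch, while in
the segment-based variant the zero fill falls out of memsz - filesz. I guess
that is the difference being referred to, but this is only my reading.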

> 
> -- 
> Thiago Jung Bauermann
> IBM Linux Technology Center
> 
> 
> Subject: [PATCH] kexec_file: Move generic relocation code from arch/x86 to
>  kernel/kexec_file.c
> 
> The check for undefined symbols stays in arch-specific code because
> powerpc needs to allow TOC symbols to be processed even though they're
> undefined.
> 
> There is no functional change.
> 
> Suggested-by: Dave Young <dyo...@redhat.com>
> Signed-off-by: Thiago Jung Bauermann <bauer...@linux.vnet.ibm.com>
> ---
>  arch/x86/kernel/machine_kexec_64.c | 160 +++--
>  include/linux/kexec.h              |   9 ++-
>  kernel/kexec_file.c                | 120 +++-
>  3 files changed, 154 insertions(+), 135 deletions(-)
> 
> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
> index 8c1f218926d7..f4860c408ece 100644
> --- a/arch/x86/kernel/machine_kexec_64.c
> +++ b/arch/x86/kernel/machine_kexec_64.c
> @@ -401,143 +401,45 @@ int arch_kexec_kernel_verify_sig(struct kimage *image, void *kernel,
>  }
>  #endif
>  
> -/*
> - * Apply purgatory relocations.
> - *
> - * ehdr: Pointer to elf headers
> - * sechdr

Re: [PATCH v10 04/10] kexec_file: Add support for purgatory built as PIE.

2016-11-19 Thread Dave Young

On 11/10/16 at 01:27am, Thiago Jung Bauermann wrote:
> powerpc's purgatory.ro has 12 relocation types when built as
> a relocatable object. To implement support for them requires
> arch_kexec_apply_relocations_add to duplicate a lot of code with
> module_64.c:apply_relocate_add.
> 
> When built as a Position Independent Executable there are only 4
> relocation types in purgatory.ro, so it becomes practical for the powerpc
> implementation of kexec_file to have its own relocation implementation.
> 
> Also, the purgatory is an executable and not an intermediary output from
> the compiler so it makes sense conceptually that it is easier to build
> it as a PIE than as a partially linked object.
> 
> Apart from the greatly reduced number of relocations, there are two
> differences between a relocatable object and a PIE:
> 
> 1. __kexec_load_purgatory needs to use the program headers rather than the
>    section headers to figure out how to load the binary.
> 2. Symbol values are absolute addresses instead of relative to the
>    start of the section.
> 
> This patch adds the support needed in generic code for the differences
> above and allows powerpc to load and relocate a position independent
> purgatory.
> 
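
For reference, difference 2 above can be illustrated with a small userspace
sketch. This is not code from the patch: the sk_ names, the simplified
struct and the load_bias parameter are hypothetical and only mirror the ELF
st_value and st_shndx symbol fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum sk_image_kind {
	SK_RELOCATABLE,	/* partially linked object, like ET_REL */
	SK_PIE,		/* position independent executable, like ET_DYN */
};

struct sk_sym {
	uint64_t value;	/* like st_value */
	uint16_t shndx;	/* like st_shndx: index of the defining section */
};

/*
 * Final address of a symbol once the image is placed in memory.
 * sec_addr[i] is where section i ended up (relocatable object only);
 * load_bias is run-time address minus link-time address (PIE only).
 */
uint64_t sk_symbol_address(enum sk_image_kind kind, const struct sk_sym *sym,
			   const uint64_t *sec_addr, size_t nsec,
			   uint64_t load_bias)
{
	assert(sym);
	if (kind == SK_RELOCATABLE) {
		/* st_value is an offset from the start of the defining section */
		assert(sec_addr && sym->shndx < nsec);
		return sec_addr[sym->shndx] + sym->value;
	}
	/* st_value is already a link-time address; only the bias is added */
	return load_bias + sym->value;
}
```

With a relocatable object the loader has to know where each section ended up
before it can resolve a symbol, while with a PIE a single bias is enough.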

[snip]

The kexec-tools machine_apply_elf_rel for ppc64 is pretty simple. So could
you look into simplifying your kexec_file implementation?

kexec_apply_relocations in kernel/kexec_file.c only does limited things
and some of the logic is in arch/x86. If the general code is moved out of
the arch code, I guess the arch code will be simpler and then we probably
will not need this PIE stuff anymore.
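
Roughly what I mean by the split, as a userspace sketch and not kernel code.
All sk_ names, the simplified relocation struct and the two relocation types
are made up for illustration, and symbol lookup is reduced to a table of
already-resolved values:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for an Elf64_Rela entry with type and symbol split out. */
struct sk_rela {
	uint64_t offset;	/* where in the section to patch (like r_offset) */
	uint32_t type;		/* arch-specific relocation type */
	uint32_t sym;		/* index into the symbol value table */
	int64_t  addend;	/* like r_addend */
};

/* Arch hook: patch one location. Returns 0 on success, -1 for an unknown type. */
typedef int (*sk_arch_reloc_fn)(uint8_t *loc, uint64_t loc_addr,
				uint32_t type, uint64_t value);

/*
 * Generic part: walk the table, bounds-check, compute symbol + addend and
 * hand each entry to the arch hook. Nothing here knows relocation types.
 */
int sk_apply_relocations(uint8_t *sec, size_t sec_size, uint64_t sec_addr,
			 const struct sk_rela *rela, size_t nrela,
			 const uint64_t *symval, size_t nsym,
			 sk_arch_reloc_fn arch_reloc)
{
	assert(sec && rela && symval && arch_reloc);
	for (size_t i = 0; i < nrela; i++) {
		if (rela[i].sym >= nsym)
			return -1;
		/* sketch: assume every patch writes at most 8 bytes */
		if (rela[i].offset > sec_size || sec_size - rela[i].offset < 8)
			return -1;
		uint64_t value = symval[rela[i].sym] + (uint64_t)rela[i].addend;
		if (arch_reloc(sec + rela[i].offset, sec_addr + rela[i].offset,
			       rela[i].type, value))
			return -1;
	}
	return 0;
}

/* Example arch hook with two made-up relocation types. */
#define SK_R_ABS64 1u	/* store value */
#define SK_R_PC32  2u	/* store value minus address of the patched location */

int sk_demo_arch_reloc(uint8_t *loc, uint64_t loc_addr,
		       uint32_t type, uint64_t value)
{
	switch (type) {
	case SK_R_ABS64:
		memcpy(loc, &value, sizeof(value));
		return 0;
	case SK_R_PC32: {
		int32_t rel = (int32_t)(value - loc_addr);
		memcpy(loc, &rel, sizeof(rel));
		return 0;
	}
	default:
		return -1;
	}
}
```

The point is only that the loop and the checks can be shared, and each arch
keeps the switch over its own relocation types.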

BTW, __kexec_really_load_purgatory looks worse than
___kexec_load_purgatory ;)

Thanks
Dave

Re: [PATCH] kexec: Export memory sections virtual addresses to vmcoreinfo

2016-10-31 Thread Dave Young

On 10/06/16 at 04:46pm, Baoquan He wrote:
> KASLR memory randomization can randomize the base of the physical memory
> mapping (PAGE_OFFSET), vmalloc (VMALLOC_START) and vmemmap
> (VMEMMAP_START). These need be exported to VMCOREINFO so that user space
> utility, mainly makedumpfile can use them to identify the base of each
> memory section. Here using VMCOREINFO_NUMBER we can reuse the existing
> struct number_table in makedumpfile to import data easily.
> 
> Since they are related to x86_64 only, put them into
> arch_crash_save_vmcoreinfo. And move the exportion of KERNEL_IMAGE_SIZE
> together since it's also for x86_64 only.
> 
> Signed-off-by: Baoquan He 
> ---
>  arch/x86/kernel/machine_kexec_64.c | 4 
>  kernel/kexec_core.c| 3 ---
>  2 files changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kernel/machine_kexec_64.c 
> b/arch/x86/kernel/machine_kexec_64.c
> index 5a294e4..e150dd7 100644
> --- a/arch/x86/kernel/machine_kexec_64.c
> +++ b/arch/x86/kernel/machine_kexec_64.c
> @@ -337,6 +337,10 @@ void arch_crash_save_vmcoreinfo(void)
>  #endif
>   vmcoreinfo_append_str("KERNELOFFSET=%lx\n",
> kaslr_offset());
> + VMCOREINFO_NUMBER(KERNEL_IMAGE_SIZE);
> + VMCOREINFO_NUMBER(PAGE_OFFSET);
> + VMCOREINFO_NUMBER(VMALLOC_START);
> + VMCOREINFO_NUMBER(VMEMMAP_START);

Pratyush has posted the makedumpfile patches below, which avoid the need
for these VMCOREINFO entries:
http://lists.infradead.org/pipermail/kexec/2016-October/017540.html

But the commit below is already in mainline and also introduces these
VMCOREINFO numbers. Can you send a patch to revert them?

commit 0549a3c02efb350776bc869685a361045efd3a29
Author: Thomas Garnier
Date:   Tue Oct 11 13:55:08 2016 -0700

    kdump, vmcoreinfo: report memory sections virtual addresses
    [snip]

>  }
>  
>  /* arch-dependent functionality related to kexec file-based syscall */
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index 5616755..8ad3a29e 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -1467,9 +1467,6 @@ static int __init crash_save_vmcoreinfo_init(void)
>  #endif
>   VMCOREINFO_NUMBER(PG_head_mask);
>   VMCOREINFO_NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE);
> -#ifdef CONFIG_X86
> - VMCOREINFO_NUMBER(KERNEL_IMAGE_SIZE);
> -#endif

Moving KERNEL_IMAGE_SIZE to x86 should be a standalone patch.
I remember Dave Anderson said he uses it in the crash utility, so I have
Cc'ed him.

>  #ifdef CONFIG_HUGETLB_PAGE
>   VMCOREINFO_NUMBER(HUGETLB_PAGE_DTOR);
>  #endif
> -- 
> 2.5.5
> 
> 
> ___
> kexec mailing list
> ke...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec

Thanks
Dave
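
For context on what VMCOREINFO_NUMBER produces, here is a small userspace
sketch in C. As far as I recall, the kernel macro stringifies its argument
and appends a line of the form "NUMBER(name)=value" with the value printed
as a signed long; treat that format as an assumption to be checked against
include/linux/kexec.h. FORMAT_NUMBER, parse_number_line and DEMO_IMAGE_SIZE
are hypothetical names used only for this illustration, not kernel or
makedumpfile APIs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical constant standing in for something like KERNEL_IMAGE_SIZE. */
#define DEMO_IMAGE_SIZE (512L * 1024 * 1024)

/*
 * Userspace stand-in for the kernel macro: '#name' turns the identifier into
 * a string without expanding it, and the value is printed as a signed long.
 */
#define FORMAT_NUMBER(buf, size, name) \
    snprintf((buf), (size), "NUMBER(%s)=%ld\n", #name, (long)(name))

/*
 * Parse one "NUMBER(name)=value" line, roughly what a dump tool has to do.
 * 'name' must have room for 64 bytes. Returns 0 on success, -1 otherwise.
 */
static int parse_number_line(const char *line, char name[64], long *value)
{
    assert(line != NULL && name != NULL && value != NULL);
    return sscanf(line, "NUMBER(%63[^)])=%ld", name, value) == 2 ? 0 : -1;
}
```

Because the value goes through a signed long, an address in the upper half
of the address space would appear as a negative decimal number, which the
consumer has to cast back to an unsigned type.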

Re: [PATCH] kexec: Export memory sections virtual addresses to vmcoreinfo

2016-10-13 Thread Dave Young

On 10/13/16 at 04:53pm, Baoquan He wrote:
> Hi Pratyush,
> 
> On 10/12/16 at 02:39pm, Pratyush Anand wrote:
> > 
> > 
> > On Wednesday 12 October 2016 05:56 AM, Baoquan He wrote:
> > > > PAGE_OFFSET can be get via vaddr - paddr from elf pt_loads so only
> > > > > VMALLOC_BASE and VMEMMAP_BASE is necessary..
> > > Well, yes, I was wrong. I wrongly thought of kernel text virtual address
> > > when I wrote the reply
> > 
> > So, if you can get PAGE_OFFSET then, probably you do not need to know
> > anything else.
> > 
> > I think, we can simplify makedumpfile code, where we do not need to depend
> > on VMALLOC_START or VMEMMAP_START etc.
> > 
> > "If we know PAGE_OFFSET, we can read from swapper space. If we can read from
> > swapper space, then we can know PA of any kernel VA, whether it is VMALLOC,
> > or vmemmap or module or kernel text area."
> 
> Check makedumpfile code and re-think about this, it's really like you
> said, we can convert VA to PA by swapper_pg_dir or init_level4_pgt. But the
> reason why we have to involve VMALLOC_START and VMEMMAP_START is that in
> x86_64 direct mapping and kernel text mapping are all linear mapping.
> Linear mapping can let us do a very efficient translation from VA to
> PA. Especially for page filtering, we need get PA of mm related data.
> All of them need convert by swapper_pg_dir or init_level4_pgt, that's
> inefficient, imagine the current system usually own many Tera bytes of
> physical memory.
> 

Atsushi, what do you think about the above concern? Ideally we should do
it in userspace instead of adding more symbols. Maybe a test on a
large-memory machine is necessary.

> So here though we can pick up crash memory regions from elf program
> header of vmcore and calculate the PAGE_OFFSET, we still need
> VMALLOC_START and VMEMMAP_START.
> 
> Thanks
> Baoquan
> > 
> > 
> > In fact, I have cleanup patches for ARM64 [1], which take above approach and
> > get rid of need of VMALLOC_START or VMEMMAP_START etc. I will be sending
> > them upstream soon.
> > 
> > Probably, x86 can take the similar approach.
> > 
> > ~Pratyush
> > 
> > [1] 
> > https://github.com/pratyushanand/makedumpfile/blob/arm64_devel/arch/arm64.c#L228
> > 
> 
> ___
> kexec mailing list
> ke...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec

Thanks
Dave
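
The efficiency concern quoted above can be made concrete with a short C
sketch. It assumes x86_64 4-level paging with 4 KiB pages and 512 entries
per table; linear_va_to_pa, split_va and struct walk_indices are
hypothetical names, not makedumpfile code. A linear-mapped address needs a
single subtraction, while a page table walk needs one table index per level,
and each level costs a separate read from the dump.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT      12      /* 4 KiB pages (assumed) */
#define PTRS_PER_TABLE  512     /* 9 index bits per level (assumed) */

/* Table indices a 4-level walk would use for one virtual address. */
struct walk_indices {
    unsigned int pgd, pud, pmd, pte;
    uint64_t offset;            /* byte offset inside the 4 KiB page */
};

/* Linear (direct) mapping: translation is a single subtraction. */
static uint64_t linear_va_to_pa(uint64_t va, uint64_t page_offset)
{
    assert(va >= page_offset);
    return va - page_offset;
}

/* Split a virtual address into the per-level indices of a 4-level walk. */
static struct walk_indices split_va(uint64_t va)
{
    struct walk_indices w;

    w.pgd = (unsigned int)((va >> 39) & (PTRS_PER_TABLE - 1));
    w.pud = (unsigned int)((va >> 30) & (PTRS_PER_TABLE - 1));
    w.pmd = (unsigned int)((va >> 21) & (PTRS_PER_TABLE - 1));
    w.pte = (unsigned int)((va >> PAGE_SHIFT) & (PTRS_PER_TABLE - 1));
    w.offset = va & ((1ULL << PAGE_SHIFT) - 1);
    return w;
}
```

Whether the extra reads matter in practice is exactly the open question in
the thread; the sketch only shows where the additional work comes from.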

Re: [PATCH] kexec: Export memory sections virtual addresses to vmcoreinfo

2016-10-11 Thread Dave Young

On 10/11/16 at 04:19pm, Dave Young wrote:
> On 10/11/16 at 03:41pm, Baoquan He wrote:
> > Hi Eric,
> > 
> > Thanks a lot for your reviewing! Sorry for late reply.
> > 
> > On 10/06/16 at 03:07pm, Eric W. Biederman wrote:
> > > Baoquan He <b...@redhat.com> writes:
> > > 
> > > > KASLR memory randomization can randomize the base of the physical memory
> > > > mapping (PAGE_OFFSET), vmalloc (VMALLOC_START) and vmemmap
> > > > (VMEMMAP_START). These need be exported to VMCOREINFO so that user space
> > > > utility, mainly makedumpfile can use them to identify the base of each
> > > > memory section. Here using VMCOREINFO_NUMBER we can reuse the existing
> > > > struct number_table in makedumpfile to import data easily.
> > > >
> > > > Since they are related to x86_64 only, put them into
> > > > arch_crash_save_vmcoreinfo. And move the exportion of KERNEL_IMAGE_SIZE
> > > > together since it's also for x86_64 only.
> > > 
> > > *Scratches my head*  I would have thought this information would have
> > > better fit in the ELF header.  Where it actually has a field for virtual
> > > address.  It also has a field for physical address, and a third field
> > > for offset in the file (which is where the kdump finds these things in
> > > memory aftewards).
> > > 
> > > Why do we need need more magic vmcoreinfo to handle this?
> > 
> > Previously in x86_64, values of PAGE_OFFSET, VMALLOC and VMEMMAP are
> > fixed, makedumpfile also hard codes them.
> > 
> > In kexec-tools, we try to get page_offset_base from /proc/kallsyms or
> > search it from /proc/kcore elf header with the help of virtual address
> > of symbol _stext. Then we save it into p_vaddr of kernel text program
> > segment. In kdump kernel, we may assume kernel text has the biggest
> > starting virtual address and search it from vmcore elf header. But I
> > can't think of a way to get the starting virtual address of vmalloc and
> > vmemmap which are necessary for makedumpfile analysis.
> > 
> > So it's necessary to add them into VMCOREINFO to let makedumpfile know.
> 
> PAGE_OFFSET can be get via vaddr - paddr from elf pt_loads so only
> VMALLOC_BASE and VMEMMAP_BASE is necessary..

Besides these, since kernel modules are randomized as well, I wonder
whether they need special handling. Does it work?

> 
> Thanks
> Dave
> 
> ___
> kexec mailing list
> ke...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec

Re: [PATCH] kexec: Export memory sections virtual addresses to vmcoreinfo

2016-10-11 Thread Dave Young

On 10/11/16 at 03:41pm, Baoquan He wrote:
> Hi Eric,
> 
> Thanks a lot for your reviewing! Sorry for late reply.
> 
> On 10/06/16 at 03:07pm, Eric W. Biederman wrote:
> > Baoquan He  writes:
> > 
> > > KASLR memory randomization can randomize the base of the physical memory
> > > mapping (PAGE_OFFSET), vmalloc (VMALLOC_START) and vmemmap
> > > (VMEMMAP_START). These need be exported to VMCOREINFO so that user space
> > > utility, mainly makedumpfile can use them to identify the base of each
> > > memory section. Here using VMCOREINFO_NUMBER we can reuse the existing
> > > struct number_table in makedumpfile to import data easily.
> > >
> > > Since they are related to x86_64 only, put them into
> > > arch_crash_save_vmcoreinfo. And move the exportion of KERNEL_IMAGE_SIZE
> > > together since it's also for x86_64 only.
> > 
> > *Scratches my head*  I would have thought this information would have
> > better fit in the ELF header.  Where it actually has a field for virtual
> > address.  It also has a field for physical address, and a third field
> > for offset in the file (which is where the kdump finds these things in
> > memory aftewards).
> > 
> > Why do we need need more magic vmcoreinfo to handle this?
> 
> Previously in x86_64, values of PAGE_OFFSET, VMALLOC and VMEMMAP are
> fixed, makedumpfile also hard codes them.
> 
> In kexec-tools, we try to get page_offset_base from /proc/kallsyms or
> search it from /proc/kcore elf header with the help of virtual address
> of symbol _stext. Then we save it into p_vaddr of kernel text program
> segment. In kdump kernel, we may assume kernel text has the biggest
> starting virtual address and search it from vmcore elf header. But I
> can't think of a way to get the starting virtual address of vmalloc and
> vmemmap which are necessary for makedumpfile analysis.
> 
> So it's necessary to add them into VMCOREINFO to let makedumpfile know.

PAGE_OFFSET can be obtained as vaddr - paddr from the ELF PT_LOAD
segments, so only VMALLOC_BASE and VMEMMAP_BASE are necessary.

Thanks
Dave
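
A minimal C sketch of the vaddr - paddr idea follows. It relies on the
assumption stated in the quoted text, namely that the kernel text segment is
the PT_LOAD with the highest starting virtual address, and treats any other
PT_LOAD as part of the direct mapping. guess_page_offset is a hypothetical
name; only Elf64_Phdr and PT_LOAD come from <elf.h>.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: derive the direct-mapping base from vmcore program
 * headers. Returns 0 and stores the result on success, -1 if there is no
 * PT_LOAD other than the one assumed to be kernel text.
 */
static int guess_page_offset(const Elf64_Phdr *phdrs, size_t count,
                             uint64_t *page_offset)
{
    size_t i, text = count;

    assert(phdrs != NULL && page_offset != NULL);

    /* Assumption: the PT_LOAD with the highest p_vaddr is kernel text. */
    for (i = 0; i < count; i++) {
        if (phdrs[i].p_type != PT_LOAD)
            continue;
        if (text == count || phdrs[i].p_vaddr > phdrs[text].p_vaddr)
            text = i;
    }

    /* Any other PT_LOAD is taken as linear-mapped: base = vaddr - paddr. */
    for (i = 0; i < count; i++) {
        if (phdrs[i].p_type != PT_LOAD || i == text)
            continue;
        *page_offset = phdrs[i].p_vaddr - phdrs[i].p_paddr;
        return 0;
    }
    return -1;
}
```

This covers only the direct mapping; it says nothing about where vmalloc or
vmemmap start, which is why those two bases are still under discussion.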

Re: Change CONFIG_DEVKMEM default value to n

2016-10-09 Thread Dave Young

On 10/10/16 at 07:12am, Greg Kroah-Hartman wrote:
> On Mon, Oct 10, 2016 at 10:50:50AM +0800, Dave Young wrote:
> > On 10/10/16 at 10:44am, Dave Young wrote:
> > > On 10/07/16 at 05:57am, Greg Kroah-Hartman wrote:
> > > > On Fri, Oct 07, 2016 at 10:04:11AM +0800, Dave Young wrote:
> > > > > Kconfig comment suggests setting it as "n" if in doubt thus move the
> > > > > default value to 'n'.
> > > > > 
> > > > > Signed-off-by: Dave Young <dyo...@redhat.com>
> > > > > Suggested-by: Kees Cook <keesc...@chromium.org>
> > > > > ---
> > > > >  drivers/char/Kconfig |2 +-
> > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > 
> > > > > --- linux-x86.orig/drivers/char/Kconfig
> > > > > +++ linux-x86/drivers/char/Kconfig
> > > > > @@ -17,7 +17,7 @@ config DEVMEM
> > > > >  
> > > > >  config DEVKMEM
> > > > >   bool "/dev/kmem virtual device support"
> > > > > - default y
> > > > > + default n
> > > > 
> > > > If you remove the "default" line, it defaults to 'n'.
> > > 
> > > I personally perfer a "default n", but I can update it..
> > 
> > Greg, here is an update with dropping the default line:
> 
> 
> 
> Can you resend it in a format I can apply it in?

Done, thank you!

> 
> thanks,
> 
> greg k-h

[PATCH v2] Move CONFIG_DEVKMEM default to n

2016-10-09 Thread Dave Young

The Kconfig help text suggests saying "n" if in doubt, so move the
default value to 'n'.

Signed-off-by: Dave Young <dyo...@redhat.com>
Suggested-by: Kees Cook <keesc...@chromium.org>
---
Greg: dropping the "default" line sets the default to n
 drivers/char/Kconfig |1 -
 1 file changed, 1 deletion(-)

--- linux-x86.orig/drivers/char/Kconfig
+++ linux-x86/drivers/char/Kconfig
@@ -17,7 +17,6 @@ config DEVMEM
 
 config DEVKMEM
 	bool "/dev/kmem virtual device support"
-	default y
 	help
 	  Say Y here if you want to support the /dev/kmem device. The
 	  /dev/kmem device is rarely used, but can be used for certain

Re: Change CONFIG_DEVKMEM default value to n

2016-10-09 Thread Dave Young

On 10/10/16 at 10:44am, Dave Young wrote:
> On 10/07/16 at 05:57am, Greg Kroah-Hartman wrote:
> > On Fri, Oct 07, 2016 at 10:04:11AM +0800, Dave Young wrote:
> > > Kconfig comment suggests setting it as "n" if in doubt thus move the
> > > default value to 'n'.
> > > 
> > > Signed-off-by: Dave Young <dyo...@redhat.com>
> > > Suggested-by: Kees Cook <keesc...@chromium.org>
> > > ---
> > >  drivers/char/Kconfig |2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > --- linux-x86.orig/drivers/char/Kconfig
> > > +++ linux-x86/drivers/char/Kconfig
> > > @@ -17,7 +17,7 @@ config DEVMEM
> > >  
> > >  config DEVKMEM
> > >   bool "/dev/kmem virtual device support"
> > > - default y
> > > + default n
> > 
> > If you remove the "default" line, it defaults to 'n'.
> 
> I personally perfer a "default n", but I can update it..

Greg, here is an update that drops the default line:

Move CONFIG_DEVKMEM default to n

The Kconfig help text suggests saying "n" if in doubt, so move the
default value to 'n'.

Signed-off-by: Dave Young <dyo...@redhat.com>
Suggested-by: Kees Cook <keesc...@chromium.org>
---
 drivers/char/Kconfig |1 -
 1 file changed, 1 deletion(-)

--- linux-x86.orig/drivers/char/Kconfig
+++ linux-x86/drivers/char/Kconfig
@@ -17,7 +17,6 @@ config DEVMEM
 
 config DEVKMEM
 	bool "/dev/kmem virtual device support"
-	default y
 	help
 	  Say Y here if you want to support the /dev/kmem device. The
 	  /dev/kmem device is rarely used, but can be used for certain

Re: Change CONFIG_DEVKMEM default value to n

2016-10-09 Thread Dave Young

On 10/07/16 at 05:57am, Greg Kroah-Hartman wrote:
> On Fri, Oct 07, 2016 at 10:04:11AM +0800, Dave Young wrote:
> > Kconfig comment suggests setting it as "n" if in doubt thus move the
> > default value to 'n'.
> > 
> > Signed-off-by: Dave Young <dyo...@redhat.com>
> > Suggested-by: Kees Cook <keesc...@chromium.org>
> > ---
> >  drivers/char/Kconfig |2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > --- linux-x86.orig/drivers/char/Kconfig
> > +++ linux-x86/drivers/char/Kconfig
> > @@ -17,7 +17,7 @@ config DEVMEM
> >  
> >  config DEVKMEM
> > bool "/dev/kmem virtual device support"
> > -   default y
> > +   default n
> 
> If you remove the "default" line, it defaults to 'n'.

I personally prefer an explicit "default n", but I can update it.

> 
> And is it really "safe" to default this to n now?

There is an old article here:
https://lwn.net/Articles/147901/

AFAIK distributions like Fedora and Debian have had it disabled for a long
time. Anyone who really needs it can still enable it in their own config
file.

> 
> thanks,
> 
> greg k-h

Thanks
Dave

loop mount: kernel BUG at lib/percpu-refcount.c:231

2016-10-06 Thread Dave Young

Hi,

The bug below happened to me while loop-mounting a file image after
stopping a KVM guest. It has only happened once so far.

[ 4761.031686] ------------[ cut here ]------------
[ 4761.075984] kernel BUG at lib/percpu-refcount.c:231!
[ 4761.120184] invalid opcode:  [#1] SMP
[ 4761.164307] Modules linked in: loop(+) macvtap macvlan tun ccm rfcomm fuse 
snd_hda_codec_hdmi cmac bnep vfat fat kvm_intel kvm irqbypass arc4 i915 
rtsx_pci_sdmmc intel_gtt drm_kms_helper iwlmvm syscopyarea sysfillrect 
sysimgblt fb_sys_fops mac80211 drm snd_hda_codec_realtek snd_hda_codec_generic 
snd_hda_intel snd_hda_codec btusb snd_hwdep iwlwifi snd_hda_core input_leds 
btrtl snd_seq pcspkr serio_raw btbcm snd_seq_device i2c_i801 btintel cfg80211 
bluetooth snd_pcm i2c_smbus rtsx_pci mfd_core e1000e ptp pps_core snd_timer 
thinkpad_acpi wmi snd soundcore rfkill video nfsd auth_rpcgss nfs_acl lockd 
grace sunrpc
[ 4761.323045] CPU: 1 PID: 25890 Comm: modprobe Not tainted 4.8.0+ #168
[ 4761.377791] Hardware name: LENOVO 20ARS1BJ02/20ARS1BJ02, BIOS GJET86WW (2.36 
) 12/04/2015
[ 4761.433704] task: 986fd1b7d780 task.stack: a85842528000
[ 4761.490120] RIP: 0010:[]  [] 
__percpu_ref_switch_to_percpu+0xf8/0x100
[ 4761.548138] RSP: 0018:a8584252bb38  EFLAGS: 00010246
[ 4761.604673] RAX:  RBX: 986fbdca3200 RCX: 
[ 4761.662416] RDX: 00983288 RSI: 0001 RDI: 986fbdca3958
[ 4761.720473] RBP: a8584252bb80 R08: 0008 R09: 0008
[ 4761.779270] R10:  R11:  R12: 
[ 4761.837603] R13: 9870fa22c800 R14: 9870fa22c80c R15: 986fbdca3200
[ 4761.895870] FS:  7fc286eb4640() GS:98711f24() 
knlGS:
[ 4761.954596] CS:  0010 DS:  ES:  CR0: 80050033
[ 4762.012978] CR2: 555c3a20ee78 CR3: 000212988000 CR4: 001406e0
[ 4762.072454] Stack:
[ 4762.131283]  9870f2f37800 9870c8e46000 9870fa22c880 
a8584252bbb8
[ 4762.190776]  ae2a147c ba169577 986fbdca3200 
9870fa22c870
[ 4762.251149]  9870fa22c800 a8584252bb90 ae2b3294 
a8584252bbc8
[ 4762.311657] Call Trace:
[ 4762.371157]  [] ? kobject_uevent_env+0xfc/0x3b0
[ 4762.431483]  [] percpu_ref_switch_to_percpu+0x14/0x20
[ 4762.492093]  [] blk_register_queue+0xbe/0x120
[ 4762.552727]  [] device_add_disk+0x1c4/0x470
[ 4762.614155]  [] loop_add+0x1d9/0x260 [loop]
[ 4762.674042]  [] loop_init+0x119/0x16c [loop]
[ 4762.733949]  [] ? 0xc02ff000
[ 4762.793563]  [] do_one_initcall+0x4b/0x180
[ 4762.853068]  [] ? free_vmap_area_noflush+0x43/0xb0
[ 4762.913665]  [] do_init_module+0x55/0x1c4
[ 4762.973400]  [] load_module+0x1fc4/0x23e0
[ 4763.033545]  [] ? __symbol_put+0x60/0x60
[ 4763.094281]  [] SYSC_init_module+0x138/0x150
[ 4763.154985]  [] SyS_init_module+0x9/0x10
[ 4763.215577]  [] entry_SYSCALL_64_fastpath+0x1e/0xad
[ 4763.277044] Code: 00 48 c7 c7 20 c7 a8 ae 48 63 d2 e8 63 ef ff ff 3b 05 81 
a9 7d 00 89 c2 7c cd 48 8b 43 08 48 83 e0 fe 48 89 43 08 e9 3c ff ff ff <0f> 0b 
e8 81 b6 d9 ff 90 55 48 89 e5 41 54 4c 8d 67 d8 53 48 89 
[ 4763.342964] RIP  [] 
__percpu_ref_switch_to_percpu+0xf8/0x100
[ 4763.407151]  RSP 

Thanks
Dave

Change CONFIG_DEVKMEM default value to n

2016-10-06 Thread Dave Young

The Kconfig help text suggests setting it to "n" if in doubt, so change the
default value to 'n'.

Signed-off-by: Dave Young <dyo...@redhat.com>
Suggested-by: Kees Cook <keesc...@chromium.org>
---
 drivers/char/Kconfig |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- linux-x86.orig/drivers/char/Kconfig
+++ linux-x86/drivers/char/Kconfig
@@ -17,7 +17,7 @@ config DEVMEM
 
 config DEVKMEM
    bool "/dev/kmem virtual device support"
-   default y
+   default n
    help
      Say Y here if you want to support the /dev/kmem device. The
      /dev/kmem device is rarely used, but can be used for certain
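
To illustrate what the new default means for a build that does not set the
option, here is a minimal userspace sketch (not kernel code) of how a bool
Kconfig symbol usually gates a feature at compile time. The devlist table
and the device_available() helper are hypothetical names made up for this
example; it only assumes that CONFIG_DEVKMEM is defined as a macro when the
option is set to y and is left undefined otherwise.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical device table for illustration only: an entry exists
 * only if the matching CONFIG_ macro was defined at build time.
 */
static const char *const devlist[] = {
#ifdef CONFIG_DEVMEM
	"mem",
#endif
#ifdef CONFIG_DEVKMEM
	"kmem",
#endif
	"null",
	"zero",
};

/* Return true if a device with the given name was compiled in. */
static bool device_available(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(devlist) / sizeof(devlist[0]); i++)
		if (strcmp(devlist[i], name) == 0)
			return true;
	return false;
}
```

Built without -DCONFIG_DEVKMEM, device_available("kmem") returns false;
adding -DCONFIG_DEVKMEM on the compiler command line brings the entry back,
which mirrors a user setting the option to y in their own config.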

Re: [PATCH] Let CONFIG_STRICT_DEVMEM depends on CONFIG_DEVMEM

2016-10-06 Thread Dave Young

On 10/06/16 at 02:39pm, Kees Cook wrote:
> On Wed, Oct 5, 2016 at 10:12 PM, Dave Young <dyo...@redhat.com> wrote:
> > With CONFIG_DEVMEM not set, CONFIG_STRICT_DEVMEM will be useless
> > even if it is set =y, thus let's update the dependency in Kconfig.
> >
> > Signed-off-by: Dave Young <dyo...@redhat.com>
> 
> Acked-by: Kees Cook <keesc...@chromium.org>
> 
> > ---
> >  lib/Kconfig.debug |2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > --- linux-x86.orig/lib/Kconfig.debug
> > +++ linux-x86/lib/Kconfig.debug
> > @@ -1980,7 +1980,7 @@ config ARCH_HAS_DEVMEM_IS_ALLOWED
> >
> >  config STRICT_DEVMEM
> >     bool "Filter access to /dev/mem"
> > -   depends on MMU
> > +   depends on MMU && DEVMEM
> >     depends on ARCH_HAS_DEVMEM_IS_ALLOWED
> >     default y if TILE || PPC
> >     ---help---
> 
> While we're at it, can we make DEVKMEM default=n? The help text even
> suggests making it "n".

That's fine with me, I will send another patch for that.

Thanks
Dave

> 
> -Kees
> 
> -- 
> Kees Cook
> Nexus Security

[PATCH] Let CONFIG_STRICT_DEVMEM depends on CONFIG_DEVMEM

2016-10-05 Thread Dave Young

With CONFIG_DEVMEM not set, CONFIG_STRICT_DEVMEM is useless even if it
is set to y, so update the dependency in Kconfig.

Signed-off-by: Dave Young <dyo...@redhat.com>
---
 lib/Kconfig.debug |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- linux-x86.orig/lib/Kconfig.debug
+++ linux-x86/lib/Kconfig.debug
@@ -1980,7 +1980,7 @@ config ARCH_HAS_DEVMEM_IS_ALLOWED
 
 config STRICT_DEVMEM
    bool "Filter access to /dev/mem"
-   depends on MMU
+   depends on MMU && DEVMEM
    depends on ARCH_HAS_DEVMEM_IS_ALLOWED
    default y if TILE || PPC
    ---help---
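
As a rough illustration of the dependency change, the following userspace
sketch models when STRICT_DEVMEM can end up enabled before and after the
patch. It is a simplification written for this example, not how Kconfig is
implemented: the kconfig_state struct and both helper functions are
hypothetical, and the symbols are treated as plain booleans.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the symbols involved, treated as booleans. */
struct kconfig_state {
	bool mmu;
	bool devmem;
	bool arch_has_devmem_is_allowed;
	bool strict_devmem_requested;	/* what the user asked for */
};

/* Before the patch: only MMU and the arch symbol are required. */
static bool strict_devmem_before(const struct kconfig_state *s)
{
	return s->strict_devmem_requested && s->mmu &&
	       s->arch_has_devmem_is_allowed;
}

/* After the patch: DEVMEM must be enabled as well. */
static bool strict_devmem_after(const struct kconfig_state *s)
{
	return s->strict_devmem_requested && s->mmu && s->devmem &&
	       s->arch_has_devmem_is_allowed;
}
```

With devmem set to false and everything else true, strict_devmem_before()
still reports the option as enabled even though there is no /dev/mem to
filter, while strict_devmem_after() reports it as disabled.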

Re: [V4 PATCH 1/2] x86/panic: Replace smp_send_stop() with kdump friendly version in panic path

2016-09-21 Thread 'Dave Young'

Hi, 河合英宏

Thanks for the patch log update, it looks good to me.

Acked-by: Dave Young <dyo...@redhat.com>

On 09/20/16 at 11:22am, 河合英宏 / KAWAI，HIDEHIRO wrote:
> Here is the revised commit description reflecting Dave's
> comment.  Cc list was copied from -mm version.
> 
> From: Hidehiro Kawai <hidehiro.kawai...@hitachi.com>
> Subject: x86/panic: replace smp_send_stop() with kdump friendly version in 
> panic path
> 
> This patch fixes a problem reported by Daniel Walker
> (https://lkml.org/lkml/2015/6/24/44).
> 
> When kernel panics with crash_kexec_post_notifiers kernel parameter
> enabled, other CPUs are stopped by smp_send_stop() instead of
> machine_crash_shutdown() in __crash_kexec() path.
> 
>   panic()
>     if crash_kexec_post_notifiers == 1
>       smp_send_stop()
>       atomic_notifier_call_chain()
>       kmsg_dump()
>     __crash_kexec()
>       machine_crash_shutdown()
> 
> Unlike smp_send_stop(), machine_crash_shutdown() stops other CPUs and
> does extra work for kdump.  So, if smp_send_stop() stops the other
> CPUs in advance, this extra work won't be done.  For x86, the kdump
> routines then fail to save the other CPUs' registers and to disable
> virtualization extensions.
> 
> To fix this problem, call a new kdump friendly function,
> crash_smp_send_stop(), instead of smp_send_stop() when
> crash_kexec_post_notifiers is enabled.  crash_smp_send_stop() is a
> weak function, and it just calls smp_send_stop().  Architecture
> code should override it so that kdump can work appropriately.
> This patch only provides an x86-specific version.
> 
> For Xen's PV kernel, just keep the current behavior.
> As for Dom0, it doesn't use crash_kexec routines, and it relies on
> panic notifier chain.  At the end of the chain, a hypercall is
> issued which requests the hypervisor to execute kdump.  This means that,
> regardless of the crash_kexec_post_notifiers setting, smp_send_stop() is used.
> For PV HVM, it would work similarly to baremetal kernels with extra
> cleanups for hypervisor.  It doesn't need additional care.
> 
> Changes in V4:
> - Keep using smp_send_stop() if crash_kexec_post_notifiers is not set
> - Rename panic_smp_send_stop to crash_smp_send_stop
> - Don't change the behavior for Xen's PV kernel
> 
> Changes in V3:
> - Revise comments, description, and symbol names
> 
> Changes in V2:
> - Replace smp_send_stop() call with crash_kexec version which
>   saves cpu states and cleans up VMX/SVM
> - Drop a fix for Problem 1 at this moment
> 
> Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" option)
> Link: 
> http://lkml.kernel.org/r/20160810080948.11028.15344.st...@sysi4-13.yrl.intra.hitachi.co.jp
> Signed-off-by: Hidehiro Kawai <hidehiro.kawai...@hitachi.com>
> Reported-by: Daniel Walker <dwal...@fifo99.com>
> Cc: Dave Young <dyo...@redhat.com>
> Cc: Baoquan He <b...@redhat.com>
> Cc: Vivek Goyal <vgo...@redhat.com>
> Cc: Eric Biederman <ebied...@xmission.com>
> Cc: Masami Hiramatsu <mhira...@kernel.org>
> Cc: Daniel Walker <dwal...@fifo99.com>
> Cc: Xunlei Pang <xp...@redhat.com>
> Cc: Thomas Gleixner <t...@linutronix.de>
> Cc: Ingo Molnar <mi...@redhat.com>
> Cc: "H. Peter Anvin" <h...@zytor.com>
> Cc: Borislav Petkov <b...@suse.de>
> Cc: David Vrabel <david.vra...@citrix.com>
> Cc: Toshi Kani <toshi.k...@hpe.com>
> Cc: Ralf Baechle <r...@linux-mips.org>
> Cc: David Daney <david.da...@cavium.com>
> Cc: Aaro Koskinen <aaro.koski...@iki.fi>
> Cc: "Steven J. Hill" <steven.h...@cavium.com>
> Cc: Corey Minyard <cminy...@mvista.com>
> Signed-off-by: Andrew Morton <a...@linux-foundation.org>
> 

[snip]

Thanks
Dave
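
For readers less familiar with weak functions, the override mechanism the
commit description relies on can be sketched in userspace C roughly as
follows. This is only an illustration under assumptions: the bodies are
stand-ins, plain_stops and panic_stop_cpus() are hypothetical names, and
__attribute__((weak)) is the GCC/Clang spelling rather than the kernel's
own annotation.

```c
#include <stdbool.h>

static int plain_stops;	/* how often the generic stop routine ran */

/* Stand-in for the generic routine that only halts the other CPUs. */
static void smp_send_stop(void)
{
	plain_stops++;
}

/*
 * Weak default: simply falls back to smp_send_stop().  A strong
 * definition with the same name in another translation unit (for
 * example an arch-specific one that also saves registers) would be
 * chosen by the linker instead of this one.
 */
__attribute__((weak)) void crash_smp_send_stop(void)
{
	smp_send_stop();
}

/* Simplified version of the CPU-stopping step in the panic path. */
static void panic_stop_cpus(bool crash_kexec_post_notifiers)
{
	if (!crash_kexec_post_notifiers)
		smp_send_stop();
	else
		crash_smp_send_stop();
}
```

As long as no strong crash_smp_send_stop() is linked in, both branches end
up in smp_send_stop(), which matches the description of the weak default
in the quoted commit message.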

Re: [PATHC v2 5/9] ima: on soft reboot, save the measurement list

2016-08-31 Thread Dave Young

Hi, Mimi

On 08/30/16 at 06:40pm, Mimi Zohar wrote:
> From: Thiago Jung Bauermann 
> 
> This patch uses the kexec buffer passing mechanism to pass the
> serialized IMA binary_runtime_measurements to the next kernel.
> 
> Changelog v2:
> - Fix build issue by defining a stub ima_add_kexec_buffer and stub
>   struct kimage when CONFIG_IMA=n and CONFIG_IMA_KEXEC=n. (Fenguang Wu)
> - removed kexec_add_handover_buffer() checksum argument.
> - added skip_checksum member to kexec_buf
> - only register reboot notifier once
> 
> Changelog v1:
> - updated to call IMA functions  (Mimi)
> - move code from ima_template.c to ima_kexec.c (Mimi)
> 
> Signed-off-by: Thiago Jung Bauermann 
> Signed-off-by: Mimi Zohar 
> ---
>  include/linux/ima.h| 12 ++
>  kernel/kexec_file.c|  4 ++
>  security/integrity/ima/ima_kexec.c | 88 
> ++
>  3 files changed, 104 insertions(+)
> 
> diff --git a/include/linux/ima.h b/include/linux/ima.h
> index 0eb7c2e..7f6952f 100644
> --- a/include/linux/ima.h
> +++ b/include/linux/ima.h
> @@ -11,6 +11,7 @@
>  #define _LINUX_IMA_H
>  
>  #include 
> +#include 
>  struct linux_binprm;
>  
>  #ifdef CONFIG_IMA
> @@ -23,6 +24,10 @@ extern int ima_post_read_file(struct file *file, void 
> *buf, loff_t size,
> enum kernel_read_file_id id);
>  extern void ima_post_path_mknod(struct dentry *dentry);
>  
> +#ifdef CONFIG_IMA_KEXEC
> +extern void ima_add_kexec_buffer(struct kimage *image);
> +#endif
> +
>  #else
>  static inline int ima_bprm_check(struct linux_binprm *bprm)
>  {
> @@ -62,6 +67,13 @@ static inline void ima_post_path_mknod(struct dentry 
> *dentry)
>  
>  #endif /* CONFIG_IMA */
>  
> +#ifndef CONFIG_IMA_KEXEC
> +struct kimage;
> +
> +static inline void ima_add_kexec_buffer(struct kimage *image)
> +{}
> +#endif
> +
>  #ifdef CONFIG_IMA_APPRAISE
>  extern void ima_inode_post_setattr(struct dentry *dentry);
>  extern int ima_inode_setxattr(struct dentry *dentry, const char *xattr_name,
> diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
> index 0e90d14..9585861 100644
> --- a/kernel/kexec_file.c
> +++ b/kernel/kexec_file.c
> @@ -19,6 +19,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -200,6 +201,9 @@ kimage_file_prepare_segments(struct kimage *image, int 
> kernel_fd, int initrd_fd,
>   return ret;
>   image->kernel_buf_len = size;
>  
> + /* IMA needs to pass the measurement list to the next kernel. */
> + ima_add_kexec_buffer(image);
> +
>   /* Call arch image probe handlers */
>   ret = arch_kexec_kernel_image_probe(image, image->kernel_buf,
>   image->kernel_buf_len);
> diff --git a/security/integrity/ima/ima_kexec.c 
> b/security/integrity/ima/ima_kexec.c
> index e77ca9d..0e4d0db 100644
> --- a/security/integrity/ima/ima_kexec.c
> +++ b/security/integrity/ima/ima_kexec.c
> @@ -23,6 +23,11 @@
>  
>  #include "ima.h"
>  
> +#ifdef CONFIG_IMA_KEXEC
> +/* Physical address of the measurement buffer in the next kernel. */
> +static unsigned long kexec_buffer_load_addr;
> +static size_t kexec_segment_size;
> +
>  static int ima_dump_measurement_list(unsigned long *buffer_size, void 
> **buffer,
>unsigned long segment_size)
>  {
> @@ -75,6 +80,89 @@ out:
>  }
>  
>  /*
> + * Called during kexec execute so that IMA can save the measurement list.
> + */
> +static int ima_update_kexec_buffer(struct notifier_block *self,
> +unsigned long action, void *data)
> +{
> + void *kexec_buffer = NULL;
> + size_t kexec_buffer_size;
> + int ret;
> +
> + if (!kexec_in_progress)
> + return NOTIFY_OK;
> +
> + kexec_buffer_size = ima_get_binary_runtime_size();
> + if (kexec_buffer_size >
> + (kexec_segment_size - sizeof(struct ima_kexec_hdr))) {
> + pr_err("Binary measurement list grew too large.\n");
> + goto out;
> + }
> +
> + ima_dump_measurement_list(&kexec_buffer_size, &kexec_buffer,
> +   kexec_segment_size);
> + if (!kexec_buffer) {
> + pr_err("Not enough memory for the kexec measurement buffer.\n");
> + goto out;
> + }
> + ret = kexec_update_segment(kexec_buffer, kexec_buffer_size,
> +kexec_buffer_load_addr, kexec_segment_size);
> + if (ret)
> + pr_err("Error updating kexec buffer: %d\n", ret);
> +out:
> + return NOTIFY_OK;
> +}
> +
> +struct notifier_block update_buffer_nb = {
> + .notifier_call = ima_update_kexec_buffer,
> +};
> +
> +/*
> + * Called during kexec_file_load so that IMA can add a segment to the kexec
> + * image for the measurement list for the next kernel.
> + */
> +void ima_add_kexec_buffer(struct kimage *image)
> +{
> + static int registered = 0;
> + struct kexec_buf kbuf = { .image = image, .buf_align =

Re: [PATCH V3 2/2] rtc/rtc-cmos: Initialize software counters before irq is registered

2016-08-30 Thread Dave Young

Hi, Pratyush,

I'm not sure who the maintainer is to review and take the patches.
In the MAINTAINERS file, x86 hpet is orphaned. rtc-cmos may go to the rtc
maintainer, Alessandro Zummo.

Ccing Andrew, maybe he can also take the patches for the orphaned component.

On 08/30/16 at 03:24pm, Pratyush Anand wrote:
> Hi Dave,
> 
> On 30/08/2016:04:22:30 PM, Dave Young wrote:
> > Hi, Pratyush
> > 
> > On 08/16/16 at 08:55am, Pratyush Anand wrote:
> > > > We have observed on a few x86 machines with rtc-cmos device that
> > > hpet_rtc_interrupt() is called just after irq registration and before
> > > cmos_do_probe() could call hpet_rtc_timer_init().
> > > 
> > > So, neither hpet_default_delta nor hpet_t1_cmp is initialized by the time
> > > interrupt is raised in the given situation, and this results in NMI
> > > watchdog LOCKUP.
> > > 
> > > It has only been observed sporadically on kdump secondary kernels.
> > > 
> > > See the call trace:
> > > ---<-snip->---
> > >27.913194] Kernel panic - not syncing: Watchdog detected hard LOCKUP on
> > > cpu 0
> > > [   27.915371] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
> > > 3.10.0-342.el7.x86_64 #1
> > > [   27.917503] Hardware name: HP ProLiant DL160 Gen8, BIOS J03 02/10/2014
> > > [   27.919455]  8186a728 59c82488 880034e05af0
> > > 81637bd4
> > > [   27.921870]  880034e05b70 8163144a 0010
> > > 880034e05b80
> > > [   27.924257]  880034e05b20 59c82488 
> > > 
> > > [   27.926599] Call Trace:
> > > [   27.927352][] dump_stack+0x19/0x1b
> > > [   27.929080]  [] panic+0xd8/0x1e7
> > > [   27.930588]  [] ? restart_watchdog_hrtimer+0x50/0x50
> > > [   27.932502]  [] watchdog_overflow_callback+0xc2/0xd0
> > > [   27.934427]  [] __perf_event_overflow+0xa1/0x250
> > > [   27.936232]  [] perf_event_overflow+0x14/0x20
> > > [   27.937957]  [] intel_pmu_handle_irq+0x1e8/0x470
> > > [   27.939799]  [] perf_event_nmi_handler+0x2b/0x50
> > > [   27.941649]  [] nmi_handle.isra.0+0x69/0xb0
> > > [   27.943348]  [] do_nmi+0x169/0x340
> > > [   27.944802]  [] end_repeat_nmi+0x1e/0x2e
> > > [   27.946424]  [] ? hpet_rtc_interrupt+0x85/0x380
> > > [   27.948197]  [] ? hpet_rtc_interrupt+0x85/0x380
> > > [   27.949992]  [] ? hpet_rtc_interrupt+0x85/0x380
> > > [   27.951816]  <>[] ?
> > > run_timer_softirq+0x43/0x340
> > > [   27.954114]  [] handle_irq_event_percpu+0x3e/0x1e0
> > > [   27.955962]  [] handle_irq_event+0x3d/0x60
> > > [   27.957635]  [] handle_edge_irq+0x77/0x130
> > > [   27.959332]  [] handle_irq+0xbf/0x150
> > > [   27.960949]  [] do_IRQ+0x4f/0xf0
> > > [   27.962434]  [] common_interrupt+0x6d/0x6d
> > > [   27.964101][] ?
> > > _raw_spin_unlock_irqrestore+0x1b/0x40
> > > [   27.966308]  [] __setup_irq+0x2a7/0x570
> > > [   28.067859]  [] ? hpet_cpuhp_notify+0x140/0x140
> > > [   28.069709]  [] request_threaded_irq+0xcc/0x170
> > > [   28.071585]  [] cmos_do_probe+0x1e6/0x450
> > > [   28.073240]  [] ? cmos_do_probe+0x450/0x450
> > > [   28.074911]  [] cmos_pnp_probe+0xbb/0xc0
> > > [   28.076533]  [] pnp_device_probe+0x65/0xd0
> > > [   28.078198]  [] driver_probe_device+0x87/0x390
> > > [   28.079971]  [] __driver_attach+0x93/0xa0
> > > [   28.081660]  [] ? __device_attach+0x40/0x40
> > > [   28.083662]  [] bus_for_each_dev+0x73/0xc0
> > > [   28.085370]  [] driver_attach+0x1e/0x20
> > > [   28.086974]  [] bus_add_driver+0x200/0x2d0
> > > [   28.088634]  [] ? rtc_sysfs_init+0xe/0xe
> > > [   28.090349]  [] driver_register+0x64/0xf0
> > > [   28.091989]  [] pnp_register_driver+0x20/0x30
> > > [   28.093707]  [] cmos_init+0x11/0x71
> > > ---<-snip->---
> > > 
> > > The previous patch split hpet_rtc_timer_init into
> > > hpet_rtc_timer_counter_init() and hpet_rtc_timer_enable().
> > > 
> > > Therefore, this patch moved hpet_rtc_timer_counter_init() before IRQ
> > > registration, so that we can gracefully handle such spurious interrupts.
> > > 
> > > > We were able to reproduce the problem within at most 15 trials of kdump
> > > secondary kernel boot on an hp-dl160gen8 machine without this patch.
> > > However, more than 35 trials went fine after applying this patch.
> > > 
> > > Signed-off-by: Pratyush Anand <pan...@redhat.com>
> > > [dzi

Re: [PATCH V3 2/2] rtc/rtc-cmos: Initialize software counters before irq is registered

2016-08-30 Thread Dave Young

On 08/30/16 at 04:38pm, Dave Young wrote:
> On 08/30/16 at 04:22pm, Dave Young wrote:
> > Hi, Pratyush
> > 
> > On 08/16/16 at 08:55am, Pratyush Anand wrote:
> > > We have observed on few x86 machines with rtc-cmos device that
> > > hpet_rtc_interrupt() is called just after irq registration and before
> > > cmos_do_probe() could call hpet_rtc_timer_init().
> > > 
> > > So, neither hpet_default_delta nor hpet_t1_cmp is initialized by the time
> > > interrupt is raised in the given situation, and this results in NMI
> > > watchdog LOCKUP.
> > > 
> > > It has only been observed sporadically on kdump secondary kernels.
> > > 
> > > See the call trace:
> > > ---<-snip->---
> > > > [   27.913194] Kernel panic - not syncing: Watchdog detected hard LOCKUP on
> > > cpu 0
> > > [   27.915371] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
> > > 3.10.0-342.el7.x86_64 #1
> > > [   27.917503] Hardware name: HP ProLiant DL160 Gen8, BIOS J03 02/10/2014
> > > [   27.919455]  8186a728 59c82488 880034e05af0
> > > 81637bd4
> > > [   27.921870]  880034e05b70 8163144a 0010
> > > 880034e05b80
> > > [   27.924257]  880034e05b20 59c82488 
> > > 
> > > [   27.926599] Call Trace:
> > > [   27.927352][] dump_stack+0x19/0x1b
> > > [   27.929080]  [] panic+0xd8/0x1e7
> > > [   27.930588]  [] ? restart_watchdog_hrtimer+0x50/0x50
> > > [   27.932502]  [] watchdog_overflow_callback+0xc2/0xd0
> > > [   27.934427]  [] __perf_event_overflow+0xa1/0x250
> > > [   27.936232]  [] perf_event_overflow+0x14/0x20
> > > [   27.937957]  [] intel_pmu_handle_irq+0x1e8/0x470
> > > [   27.939799]  [] perf_event_nmi_handler+0x2b/0x50
> > > [   27.941649]  [] nmi_handle.isra.0+0x69/0xb0
> > > [   27.943348]  [] do_nmi+0x169/0x340
> > > [   27.944802]  [] end_repeat_nmi+0x1e/0x2e
> > > [   27.946424]  [] ? hpet_rtc_interrupt+0x85/0x380
> > > [   27.948197]  [] ? hpet_rtc_interrupt+0x85/0x380
> > > [   27.949992]  [] ? hpet_rtc_interrupt+0x85/0x380
> > > [   27.951816]  <>[] ?
> > > run_timer_softirq+0x43/0x340
> > > [   27.954114]  [] handle_irq_event_percpu+0x3e/0x1e0
> > > [   27.955962]  [] handle_irq_event+0x3d/0x60
> > > [   27.957635]  [] handle_edge_irq+0x77/0x130
> > > [   27.959332]  [] handle_irq+0xbf/0x150
> > > [   27.960949]  [] do_IRQ+0x4f/0xf0
> > > [   27.962434]  [] common_interrupt+0x6d/0x6d
> > > [   27.964101][] ?
> > > _raw_spin_unlock_irqrestore+0x1b/0x40
> > > [   27.966308]  [] __setup_irq+0x2a7/0x570
> > > [   28.067859]  [] ? hpet_cpuhp_notify+0x140/0x140
> > > [   28.069709]  [] request_threaded_irq+0xcc/0x170
> > > [   28.071585]  [] cmos_do_probe+0x1e6/0x450
> > > [   28.073240]  [] ? cmos_do_probe+0x450/0x450
> > > [   28.074911]  [] cmos_pnp_probe+0xbb/0xc0
> > > [   28.076533]  [] pnp_device_probe+0x65/0xd0
> > > [   28.078198]  [] driver_probe_device+0x87/0x390
> > > [   28.079971]  [] __driver_attach+0x93/0xa0
> > > [   28.081660]  [] ? __device_attach+0x40/0x40
> > > [   28.083662]  [] bus_for_each_dev+0x73/0xc0
> > > [   28.085370]  [] driver_attach+0x1e/0x20
> > > [   28.086974]  [] bus_add_driver+0x200/0x2d0
> > > [   28.088634]  [] ? rtc_sysfs_init+0xe/0xe
> > > [   28.090349]  [] driver_register+0x64/0xf0
> > > [   28.091989]  [] pnp_register_driver+0x20/0x30
> > > [   28.093707]  [] cmos_init+0x11/0x71
> > > ---<-snip->---
> > > 
> > > The previous patch split hpet_rtc_timer_init into
> > > hpet_rtc_timer_counter_init() and hpet_rtc_timer_enable().
> > > 
> > > Therefore, this patch moved hpet_rtc_timer_counter_init() before IRQ
> > > registration, so that we can gracefully handle such spurious interrupts.
> > > 
> > > We were able to reproduce the problem in maximum 15 trials of kdump
> > > secondary kernel boot on an hp-dl160gen8 machine without this patch.
> > > However, more than 35 trials went fine after applying this patch.
> > > 
> > > Signed-off-by: Pratyush Anand <pan...@redhat.com>
> > > [dzic...@redhat.com: edited the patch's summary]
> > > Signed-off-by: Don Zickus <dzic...@redhat.com>
> > > ---
> > > >  drivers/rtc/rtc-cmos.c | 13 ++++++++++++-
> > >  1 file changed, 12 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/rtc/rtc-cmos.c b/drivers/rtc/rtc-cmos.c
> > > index 43745cac0141..089d987f2638 100644
> > > --- a/drivers/rtc/rtc-cmos.c
> > > +++ b/drivers/rtc/rtc-cmos.c
> > > @@ -129,6 +129,16 @@ static inline int hpet_rtc_dropped_irq(void)
> > >   return 0;
> > >  }
> > >  
> > > +static inline int hpet_rtc_timer_counter_init(void)
> > > +{
> > > + return 0;
> > > +}
> > > +
> > > +static inline int hpet_rtc_timer_enable(void)
> > > +{
> > > + return 0;
> > > +}
> > > +
> > 
> > > Can these dummy functions go to /usr/include/linux/hpet.h along with
> > > the #ifdef etc.?
> > 
> > Hmm, it seems CONFIG_HPET_EMULATE_RTC is x86 only, so maybe moving them
> > to asm/hpet.h would be better.

Oops, asm/hpet.h will not work since rtc-cmos is also used on other
arches; please ignore the comment.

Thanks
Dave

Re: [PATCH V3 2/2] rtc/rtc-cmos: Initialize software counters before irq is registered

2016-08-30 Thread Dave Young

On 08/30/16 at 04:22pm, Dave Young wrote:
> Hi, Pratyush
> 
> On 08/16/16 at 08:55am, Pratyush Anand wrote:
> > We have observed on few x86 machines with rtc-cmos device that
> > hpet_rtc_interrupt() is called just after irq registration and before
> > cmos_do_probe() could call hpet_rtc_timer_init().
> > 
> > So, neither hpet_default_delta nor hpet_t1_cmp is initialized by the time
> > interrupt is raised in the given situation, and this results in NMI
> > watchdog LOCKUP.
> > 
> > It has only been observed sporadically on kdump secondary kernels.
> > 
> > See the call trace:
> > ---<-snip->---
> > > [   27.913194] Kernel panic - not syncing: Watchdog detected hard LOCKUP on
> > cpu 0
> > [   27.915371] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
> > 3.10.0-342.el7.x86_64 #1
> > [   27.917503] Hardware name: HP ProLiant DL160 Gen8, BIOS J03 02/10/2014
> > [   27.919455]  8186a728 59c82488 880034e05af0
> > 81637bd4
> > [   27.921870]  880034e05b70 8163144a 0010
> > 880034e05b80
> > [   27.924257]  880034e05b20 59c82488 
> > 
> > [   27.926599] Call Trace:
> > [   27.927352][] dump_stack+0x19/0x1b
> > [   27.929080]  [] panic+0xd8/0x1e7
> > [   27.930588]  [] ? restart_watchdog_hrtimer+0x50/0x50
> > [   27.932502]  [] watchdog_overflow_callback+0xc2/0xd0
> > [   27.934427]  [] __perf_event_overflow+0xa1/0x250
> > [   27.936232]  [] perf_event_overflow+0x14/0x20
> > [   27.937957]  [] intel_pmu_handle_irq+0x1e8/0x470
> > [   27.939799]  [] perf_event_nmi_handler+0x2b/0x50
> > [   27.941649]  [] nmi_handle.isra.0+0x69/0xb0
> > [   27.943348]  [] do_nmi+0x169/0x340
> > [   27.944802]  [] end_repeat_nmi+0x1e/0x2e
> > [   27.946424]  [] ? hpet_rtc_interrupt+0x85/0x380
> > [   27.948197]  [] ? hpet_rtc_interrupt+0x85/0x380
> > [   27.949992]  [] ? hpet_rtc_interrupt+0x85/0x380
> > [   27.951816]  <>[] ?
> > run_timer_softirq+0x43/0x340
> > [   27.954114]  [] handle_irq_event_percpu+0x3e/0x1e0
> > [   27.955962]  [] handle_irq_event+0x3d/0x60
> > [   27.957635]  [] handle_edge_irq+0x77/0x130
> > [   27.959332]  [] handle_irq+0xbf/0x150
> > [   27.960949]  [] do_IRQ+0x4f/0xf0
> > [   27.962434]  [] common_interrupt+0x6d/0x6d
> > [   27.964101][] ?
> > _raw_spin_unlock_irqrestore+0x1b/0x40
> > [   27.966308]  [] __setup_irq+0x2a7/0x570
> > [   28.067859]  [] ? hpet_cpuhp_notify+0x140/0x140
> > [   28.069709]  [] request_threaded_irq+0xcc/0x170
> > [   28.071585]  [] cmos_do_probe+0x1e6/0x450
> > [   28.073240]  [] ? cmos_do_probe+0x450/0x450
> > [   28.074911]  [] cmos_pnp_probe+0xbb/0xc0
> > [   28.076533]  [] pnp_device_probe+0x65/0xd0
> > [   28.078198]  [] driver_probe_device+0x87/0x390
> > [   28.079971]  [] __driver_attach+0x93/0xa0
> > [   28.081660]  [] ? __device_attach+0x40/0x40
> > [   28.083662]  [] bus_for_each_dev+0x73/0xc0
> > [   28.085370]  [] driver_attach+0x1e/0x20
> > [   28.086974]  [] bus_add_driver+0x200/0x2d0
> > [   28.088634]  [] ? rtc_sysfs_init+0xe/0xe
> > [   28.090349]  [] driver_register+0x64/0xf0
> > [   28.091989]  [] pnp_register_driver+0x20/0x30
> > [   28.093707]  [] cmos_init+0x11/0x71
> > ---<-snip->---
> > 
> > The previous patch split hpet_rtc_timer_init into
> > hpet_rtc_timer_counter_init() and hpet_rtc_timer_enable().
> > 
> > Therefore, this patch moved hpet_rtc_timer_counter_init() before IRQ
> > registration, so that we can gracefully handle such spurious interrupts.
> > 
> > We were able to reproduce the problem in maximum 15 trials of kdump
> > secondary kernel boot on an hp-dl160gen8 machine without this patch.
> > However, more than 35 trials went fine after applying this patch.
> > 
> > Signed-off-by: Pratyush Anand <pan...@redhat.com>
> > [dzic...@redhat.com: edited the patch's summary]
> > Signed-off-by: Don Zickus <dzic...@redhat.com>
> > ---
> > >  drivers/rtc/rtc-cmos.c | 13 ++++++++++++-
> >  1 file changed, 12 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/rtc/rtc-cmos.c b/drivers/rtc/rtc-cmos.c
> > index 43745cac0141..089d987f2638 100644
> > --- a/drivers/rtc/rtc-cmos.c
> > +++ b/drivers/rtc/rtc-cmos.c
> > @@ -129,6 +129,16 @@ static inline int hpet_rtc_dropped_irq(void)
> > return 0;
> >  }
> >  
> > +static inline int hpet_rtc_timer_counter_init(void)
> > +{
> > +   return 0;
> > +}
> > +
> > +static inline int hpet_

Re: [PATCH V3 2/2] rtc/rtc-cmos: Initialize software counters before irq is registered

2016-08-30 Thread Dave Young

Hi, Pratyush

On 08/16/16 at 08:55am, Pratyush Anand wrote:
> We have observed on few x86 machines with rtc-cmos device that
> hpet_rtc_interrupt() is called just after irq registration and before
> cmos_do_probe() could call hpet_rtc_timer_init().
> 
> So, neither hpet_default_delta nor hpet_t1_cmp is initialized by the time
> interrupt is raised in the given situation, and this results in NMI
> watchdog LOCKUP.
> 
> It has only been observed sporadically on kdump secondary kernels.
> 
> See the call trace:
> ---<-snip->---
> [   27.913194] Kernel panic - not syncing: Watchdog detected hard LOCKUP on
> cpu 0
> [   27.915371] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
> 3.10.0-342.el7.x86_64 #1
> [   27.917503] Hardware name: HP ProLiant DL160 Gen8, BIOS J03 02/10/2014
> [   27.919455]  8186a728 59c82488 880034e05af0
> 81637bd4
> [   27.921870]  880034e05b70 8163144a 0010
> 880034e05b80
> [   27.924257]  880034e05b20 59c82488 
> 
> [   27.926599] Call Trace:
> [   27.927352][] dump_stack+0x19/0x1b
> [   27.929080]  [] panic+0xd8/0x1e7
> [   27.930588]  [] ? restart_watchdog_hrtimer+0x50/0x50
> [   27.932502]  [] watchdog_overflow_callback+0xc2/0xd0
> [   27.934427]  [] __perf_event_overflow+0xa1/0x250
> [   27.936232]  [] perf_event_overflow+0x14/0x20
> [   27.937957]  [] intel_pmu_handle_irq+0x1e8/0x470
> [   27.939799]  [] perf_event_nmi_handler+0x2b/0x50
> [   27.941649]  [] nmi_handle.isra.0+0x69/0xb0
> [   27.943348]  [] do_nmi+0x169/0x340
> [   27.944802]  [] end_repeat_nmi+0x1e/0x2e
> [   27.946424]  [] ? hpet_rtc_interrupt+0x85/0x380
> [   27.948197]  [] ? hpet_rtc_interrupt+0x85/0x380
> [   27.949992]  [] ? hpet_rtc_interrupt+0x85/0x380
> [   27.951816]  <>[] ?
> run_timer_softirq+0x43/0x340
> [   27.954114]  [] handle_irq_event_percpu+0x3e/0x1e0
> [   27.955962]  [] handle_irq_event+0x3d/0x60
> [   27.957635]  [] handle_edge_irq+0x77/0x130
> [   27.959332]  [] handle_irq+0xbf/0x150
> [   27.960949]  [] do_IRQ+0x4f/0xf0
> [   27.962434]  [] common_interrupt+0x6d/0x6d
> [   27.964101][] ?
> _raw_spin_unlock_irqrestore+0x1b/0x40
> [   27.966308]  [] __setup_irq+0x2a7/0x570
> [   28.067859]  [] ? hpet_cpuhp_notify+0x140/0x140
> [   28.069709]  [] request_threaded_irq+0xcc/0x170
> [   28.071585]  [] cmos_do_probe+0x1e6/0x450
> [   28.073240]  [] ? cmos_do_probe+0x450/0x450
> [   28.074911]  [] cmos_pnp_probe+0xbb/0xc0
> [   28.076533]  [] pnp_device_probe+0x65/0xd0
> [   28.078198]  [] driver_probe_device+0x87/0x390
> [   28.079971]  [] __driver_attach+0x93/0xa0
> [   28.081660]  [] ? __device_attach+0x40/0x40
> [   28.083662]  [] bus_for_each_dev+0x73/0xc0
> [   28.085370]  [] driver_attach+0x1e/0x20
> [   28.086974]  [] bus_add_driver+0x200/0x2d0
> [   28.088634]  [] ? rtc_sysfs_init+0xe/0xe
> [   28.090349]  [] driver_register+0x64/0xf0
> [   28.091989]  [] pnp_register_driver+0x20/0x30
> [   28.093707]  [] cmos_init+0x11/0x71
> ---<-snip->---
> 
> The previous patch split hpet_rtc_timer_init into
> hpet_rtc_timer_counter_init() and hpet_rtc_timer_enable().
> 
> Therefore, this patch moved hpet_rtc_timer_counter_init() before IRQ
> registration, so that we can gracefully handle such spurious interrupts.
> 
> We were able to reproduce the problem in maximum 15 trials of kdump
> secondary kernel boot on an hp-dl160gen8 machine without this patch.
> However, more than 35 trials went fine after applying this patch.
> 
> Signed-off-by: Pratyush Anand <pan...@redhat.com>
> [dzic...@redhat.com: edited the patch's summary]
> Signed-off-by: Don Zickus <dzic...@redhat.com>
> ---
>  drivers/rtc/rtc-cmos.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/rtc/rtc-cmos.c b/drivers/rtc/rtc-cmos.c
> index 43745cac0141..089d987f2638 100644
> --- a/drivers/rtc/rtc-cmos.c
> +++ b/drivers/rtc/rtc-cmos.c
> @@ -129,6 +129,16 @@ static inline int hpet_rtc_dropped_irq(void)
>   return 0;
>  }
>  
> +static inline int hpet_rtc_timer_counter_init(void)
> +{
> + return 0;
> +}
> +
> +static inline int hpet_rtc_timer_enable(void)
> +{
> + return 0;
> +}
> +

Can these dummy functions go to /usr/include/linux/hpet.h along with
the #ifdef etc.?

>  static inline int hpet_rtc_timer_init(void)
>  {
>   return 0;
> @@ -707,6 +717,7 @@ cmos_do_probe(struct device *dev, struct resource *ports, 
> int rtc_irq)
>   goto cleanup1;
>   }
>  
> + hpet_rtc_timer_counter_init();
>   if (is_valid_irq(rtc_irq)) {
>   irq_handler_t rtc_cmos_int_handler;
>  
> @@ -729,7 +740,7 @@ cmos_do_probe(struct device *dev, struct resource *ports, 
> int rtc_irq)
>   goto cleanup1;
>   }
>   }
> - hpet_rtc_timer_init();
> + hpet_rtc_timer_enable();
>  
>   /* export at least the first block of NVRAM */
>   nvram.size = address_space - NVRAM_OFFSET;
> -- 
> 2.5.5
> 

Thanks
Dave

Re: Capturing crash with 4.6.0 and above kernel does not work

2016-08-25 Thread Dave Young

On 08/25/16 at 05:45pm, Himanshu Madhani wrote:
> 
> 
> On 8/25/16, 1:10 AM, "Michal Hocko"  wrote:
> 
> >[Let's add kdump people]
> >
> >On Wed 24-08-16 16:38:56, Himanshu Madhani wrote:
> >> Hello list,
> >> 
> >> I am wondering if anybody has had issues capturing a crash dump with
> >> the 4.6.0 and above kernel.
> >> 
> >> I have a system that, when booted in 4.5.7 kernel, is able to capture crash dump.
> >> However, when I boot this system in 4.6.4 and 4.7.2 kernel, crash dump is 
> >> not able to capture any crash. 
> >> 
> >> I am still facing the same issue with 4.8.0-rc2+ kernel and from the error at 
> >> the command prompt, it seems like kexec is ignoring the “crashkernel” parameter. 
> >> 
> >> I added below information in 
> >> https://bugzilla.kernel.org/show_bug.cgi?id=119291. 
> >> 
> >> # uname -r
> >> 4.8.0-rc2+
> >> 
> >> # cat /proc/cmdline 
> >> ro root=/dev/mapper/vg_dut4110-lv_root rd_NO_LUKS  KEYBOARDTYPE=pc 
> >> KEYTABLE=us LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 
> >> crashkernel=512M rd_LVM_LV=vg_dut4110/lv_swap rd_LVM_LV=vg_dut4110/lv_root 
> >> rd_NO_DM rhgb quiet
> >> 
> >> # service kdump status
> >> Kdump is not operational
> >> 
> >> # service kdump start
> >> Memory for crashkernel is not reserved
> >> Please reserve memory by passing "crashkernel=X@Y" parameter to the kernel
> >> Starting kdump:[FAILED]
> >
> >It smells like the crash kernel reservation has failed. Could you
> >provide the full kernel log?
> 
> Attached is kernel log from fresh kernel compile of 4.6.5 (linux-stable) tree

Where is the kernel log?

It seems the kernel log in bugzilla is for crashkernel=128M, not the
crashkernel=512M shown in the command line above.

Thanks
Dave

Re: [PATCH v3 2/2] powerpc/fadump: parse fadump reserve memory size based on memory range

2016-08-25 Thread Dave Young

On 08/25/16 at 11:00pm, Hari Bathini wrote:
> 
> 
> On Thursday 25 August 2016 12:31 PM, Dave Young wrote:
> > On 08/10/16 at 03:35pm, Hari Bathini wrote:
> > > When fadump is enabled, by default 5% of system RAM is reserved for
> > > fadump kernel. While that works for most cases, it is not good enough
> > > for every case.
> > > 
> > > Currently, to override the default value, fadump supports specifying
> > > memory to reserve with fadump_reserve_mem=size, where only a fixed size
> > > can be specified. This patch adds support to specify memory size to
> > > reserve for different memory ranges as below:
> > > 
> > > >   fadump_reserve_mem=<range1>:<size1>[,<range2>:<size2>,...]
> > Hi, Hari
> 
> Hi Dave,
> 
> > I do not understand why you need introduce the new cmdline param, what's
> > the difference between the "fadump reserved" memory and the memory
> 
> I am not introducing a new parameter but adding a new syntax for
> an existing parameter.

Apologies for that, I was not aware of it because it is not documented in
kernel-parameters.txt.

> 
> > reserved by "crashkernel="? Can fadump just use crashkernel= to reserve
> > memory?
> 
> Not all syntaxes supported by crashkernel apply for fadump_reserve_mem.
> Nonetheless, it is worth considering reuse of crashkernel parameter instead
> of fadump_reserve_mem. Let me see what I can do about this..

Thanks! I originally thought fadump reserved its memory in firmware
code. If the reservation is done in the kernel, it would be better to
just extend and reuse crashkernel=.

Dave
> 
> Thanks
> Hari
> 
> > Thanks
> > Dave
> > 
> > > Supporting range based input for "fadump_reserve_mem" parameter helps
> > > using the same commandline parameter for different system memory sizes.
> > > 
> > > Signed-off-by: Hari Bathini <hbath...@linux.vnet.ibm.com>
> > > Reviewed-by: Mahesh J Salgaonkar <mah...@linux.vnet.ibm.com>
> > > ---
> > > 
> > > Changes from v2:
> > > 1. Updated changelog
> > > 
> > > 
> > >   arch/powerpc/kernel/fadump.c |   63 
> > > --
> > >   1 file changed, 54 insertions(+), 9 deletions(-)
> > > 
> > > diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
> > > index b3a6633..7c01b5b 100644
> > > --- a/arch/powerpc/kernel/fadump.c
> > > +++ b/arch/powerpc/kernel/fadump.c
> > > @@ -193,6 +193,55 @@ static unsigned long init_fadump_mem_struct(struct 
> > > fadump_mem_struct *fdm,
> > >   return addr;
> > >   }
> > > +/*
> > > + * This function parses command line for fadump_reserve_mem=
> > > + *
> > > + * Supports the below two syntaxes:
> > > + *1. fadump_reserve_mem=size
> > > + *2. fadump_reserve_mem=ramsize-range:size[,...]
> > > + *
> > > + * Sets fw_dump.reserve_bootvar with the memory size
> > > + * provided, 0 otherwise
> > > + *
> > > + * The function returns -EINVAL on failure, 0 otherwise.
> > > + */
> > > +static int __init parse_fadump_reserve_mem(void)
> > > +{
> > > + char *name = "fadump_reserve_mem=";
> > > + char *fadump_cmdline = NULL, *cur;
> > > +
> > > + fw_dump.reserve_bootvar = 0;
> > > +
> > > + /* find fadump_reserve_mem and use the last one if there are many */
> > > + cur = strstr(boot_command_line, name);
> > > + while (cur) {
> > > + fadump_cmdline = cur;
> > > + cur = strstr(cur+1, name);
> > > + }
> > > +
> > > + /* when no fadump_reserve_mem= cmdline option is provided */
> > > + if (!fadump_cmdline)
> > > + return 0;
> > > +
> > > + fadump_cmdline += strlen(name);
> > > +
> > > + /* for fadump_reserve_mem=size cmdline syntax */
> > > + if (!is_colon_in_param(fadump_cmdline)) {
> > > + fw_dump.reserve_bootvar = memparse(fadump_cmdline, NULL);
> > > + return 0;
> > > + }
> > > +
> > > + /* for fadump_reserve_mem=ramsize-range:size[,...] cmdline syntax */
> > > + cur = fadump_cmdline;
> > > > + fw_dump.reserve_bootvar = parse_mem_range_size("fadump_reserve_mem",
> > > > + &cur, memblock_phys_mem_size());
> > > + if (cur == fadump_cmdline) {
> > > + return -EINVAL;
> > > + }
> > > +
> > > + return 0;
> > > +}
> > > +
> > >   /**

Re: [PATCH v3 2/2] powerpc/fadump: parse fadump reserve memory size based on memory range

2016-08-25 Thread Dave Young

On 08/10/16 at 03:35pm, Hari Bathini wrote:
> When fadump is enabled, by default 5% of system RAM is reserved for
> fadump kernel. While that works for most cases, it is not good enough
> for every case.
> 
> Currently, to override the default value, fadump supports specifying
> memory to reserve with fadump_reserve_mem=size, where only a fixed size
> can be specified. This patch adds support to specify memory size to
> reserve for different memory ranges as below:
> 
>   fadump_reserve_mem=<range1>:<size1>[,<range2>:<size2>,...]

Hi, Hari

I do not understand why you need to introduce a new cmdline param. What's
the difference between the "fadump reserved" memory and the memory
reserved by "crashkernel="? Can fadump just use crashkernel= to reserve
memory?

Thanks
Dave

> 
> Supporting range based input for "fadump_reserve_mem" parameter helps
> using the same commandline parameter for different system memory sizes.
> 
> Signed-off-by: Hari Bathini 
> Reviewed-by: Mahesh J Salgaonkar 
> ---
> 
> Changes from v2:
> 1. Updated changelog
> 
> 
>  arch/powerpc/kernel/fadump.c |   63 
> --
>  1 file changed, 54 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
> index b3a6633..7c01b5b 100644
> --- a/arch/powerpc/kernel/fadump.c
> +++ b/arch/powerpc/kernel/fadump.c
> @@ -193,6 +193,55 @@ static unsigned long init_fadump_mem_struct(struct fadump_mem_struct *fdm,
>  	return addr;
>  }
>  
> +/*
> + * This function parses command line for fadump_reserve_mem=
> + *
> + * Supports the below two syntaxes:
> + *    1. fadump_reserve_mem=size
> + *    2. fadump_reserve_mem=ramsize-range:size[,...]
> + *
> + * Sets fw_dump.reserve_bootvar with the memory size
> + * provided, 0 otherwise
> + *
> + * The function returns -EINVAL on failure, 0 otherwise.
> + */
> +static int __init parse_fadump_reserve_mem(void)
> +{
> +	char *name = "fadump_reserve_mem=";
> +	char *fadump_cmdline = NULL, *cur;
> +
> +	fw_dump.reserve_bootvar = 0;
> +
> +	/* find fadump_reserve_mem and use the last one if there are many */
> +	cur = strstr(boot_command_line, name);
> +	while (cur) {
> +		fadump_cmdline = cur;
> +		cur = strstr(cur+1, name);
> +	}
> +
> +	/* when no fadump_reserve_mem= cmdline option is provided */
> +	if (!fadump_cmdline)
> +		return 0;
> +
> +	fadump_cmdline += strlen(name);
> +
> +	/* for fadump_reserve_mem=size cmdline syntax */
> +	if (!is_colon_in_param(fadump_cmdline)) {
> +		fw_dump.reserve_bootvar = memparse(fadump_cmdline, NULL);
> +		return 0;
> +	}
> +
> +	/* for fadump_reserve_mem=ramsize-range:size[,...] cmdline syntax */
> +	cur = fadump_cmdline;
> +	fw_dump.reserve_bootvar = parse_mem_range_size("fadump_reserve_mem",
> +					&cur, memblock_phys_mem_size());
> +	if (cur == fadump_cmdline) {
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
>  /**
>   * fadump_calculate_reserve_size(): reserve variable boot area 5% of System RAM
>   *
> @@ -212,12 +261,17 @@ static inline unsigned long fadump_calculate_reserve_size(void)
>  {
>  	unsigned long size;
>  
> +	/* sets fw_dump.reserve_bootvar */
> +	parse_fadump_reserve_mem();
> +
>  	/*
>  	 * Check if the size is specified through fadump_reserve_mem= cmdline
>  	 * option. If yes, then use that.
>  	 */
>  	if (fw_dump.reserve_bootvar)
>  		return fw_dump.reserve_bootvar;
> +	else
> +		printk(KERN_INFO "fadump: calculating default boot size\n");
>  
>  	/* divide by 20 to get 5% of value */
>  	size = memblock_end_of_DRAM() / 20;
> @@ -348,15 +402,6 @@ static int __init early_fadump_param(char *p)
>  }
>  early_param("fadump", early_fadump_param);
>  
> -/* Look for fadump_reserve_mem= cmdline option */
> -static int __init early_fadump_reserve_mem(char *p)
> -{
> -	if (p)
> -		fw_dump.reserve_bootvar = memparse(p, &p);
> -	return 0;
> -}
> -early_param("fadump_reserve_mem", early_fadump_reserve_mem);
> -
>  static void register_fw_dump(struct fadump_mem_struct *fdm)
>  {
>  	int rc;
>
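
Editor's sketch: the range-based syntax discussed in this thread (pick a reservation size according to how much RAM the system has, fall back to 5% otherwise) can be modelled in userspace as below. This is not the kernel's parse_mem_range_size() or memparse(); parse_size, pick_reserve_size and reserve_size are hypothetical names, the spec format "start-[end]:size[,...]" is assumed to follow the crashkernel= range style, and a single `ram` value stands in for both memblock_phys_mem_size() and memblock_end_of_DRAM().

```c
#include <stdlib.h>

/* Parse a number with an optional K/M/G suffix (in the spirit of memparse()).
 * On return *end points past the consumed text; *end == s means no number. */
unsigned long long parse_size(const char *s, char **end)
{
	unsigned long long v = strtoull(s, end, 0);

	if (*end == s)
		return 0;
	switch (**end) {
	case 'G': case 'g':
		v <<= 10; /* fall through */
	case 'M': case 'm':
		v <<= 10; /* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*end)++;
		break;
	default:
		break;
	}
	return v;
}

/* Return the size of the first range [start, end) that contains `ram`,
 * or 0 when no range matches or the spec is malformed. */
unsigned long long pick_reserve_size(const char *spec, unsigned long long ram)
{
	const char *cur = spec;
	char *end;

	while (*cur) {
		unsigned long long start, stop = ~0ULL, size;

		start = parse_size(cur, &end);
		if (end == cur || *end != '-')
			return 0;
		cur = end + 1;
		if (*cur != ':') {		/* an explicit upper bound */
			stop = parse_size(cur, &end);
			if (end == cur || stop <= start)
				return 0;
			cur = end;
		}
		if (*cur != ':')
			return 0;
		cur++;
		size = parse_size(cur, &end);
		if (end == cur)
			return 0;
		cur = end;
		if (ram >= start && ram < stop)
			return size;
		if (*cur == ',')
			cur++;
		else if (*cur)
			return 0;
	}
	return 0;
}

/* Use the matching range if there is one, else default to 5% of RAM. */
unsigned long long reserve_size(const char *spec, unsigned long long ram)
{
	unsigned long long size = spec ? pick_reserve_size(spec, ram) : 0;

	return size ? size : ram / 20;
}
```

With a spec such as "512M-2G:64M,2G-:128M", the same command line yields a different reservation on a 1G machine than on a 4G machine, which is the motivation given in the changelog.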

Re: [PATCH v2 1/2] kexec: Introduce "/sys/kernel/kexec_crash_low_size"

2016-08-24 Thread Dave Young

On 08/23/16 at 06:11pm, Yinghai Lu wrote:
> On Wed, Aug 17, 2016 at 1:20 AM, Dave Young <dyo...@redhat.com> wrote:
> > On 08/17/16 at 09:50am, Xunlei Pang wrote:
> >> "/sys/kernel/kexec_crash_size" only handles crashk_res, it
> >> is fine in most cases, but sometimes we have crashk_low_res.
> >> For example, when "crashkernel=size[KMG],high" combined with
> >> "crashkernel=size[KMG],low" is used for 64-bit x86.
> >>
> >> Like crashk_res, we introduce the corresponding sysfs file
> >> "/sys/kernel/kexec_crash_low_size" for crashk_low_res.
> >>
> >> So, the exact total reserved memory is the sum of the two.
> >>
> >> crashk_low_res can also be shrunk via this new interface,
> >> and users should be aware of what they are doing.
> ...
> >> @@ -218,6 +238,7 @@ static struct attribute * kernel_attrs[] = {
> >>  #ifdef CONFIG_KEXEC_CORE
> >>  	&kexec_loaded_attr.attr,
> >>  	&kexec_crash_loaded_attr.attr,
> >> +	&kexec_crash_low_size_attr.attr,
> >>  	&kexec_crash_size_attr.attr,
> >>  	&vmcoreinfo_attr.attr,
> >>  #endif
> 
> would be better if you can use attribute_group .is_visible to control
> showing of crash_low_size only when the crash_base is above 4G.

I have the same feeling: it looks odd to show the low size in sysfs when
crashkernel=,high is not being used. Although crashkernel=,high is used
only on x86, the crashk_low resource lives in common code. What do you
think about moving it to x86?

Thanks
Dave

> 
> Thanks
> 
> Yinghai
> 
> ___
> kexec mailing list
> ke...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec
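
Editor's sketch: the .is_visible idea suggested above boils down to a callback that returns the file mode for an attribute that should appear and 0 for one that should be hidden. The userspace model below only illustrates that decision; struct demo_attr, demo_is_visible and demo_count_visible are hypothetical names, not kernel definitions, and the rule "hide the low file when nothing is reserved low" is an assumption drawn from the discussion rather than from any merged patch.

```c
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for a sysfs attribute: a name and a file mode. */
struct demo_attr {
	const char *name;
	unsigned int mode;
};

/* Return the mode to expose the attribute, or 0 to hide it. */
unsigned int demo_is_visible(const struct demo_attr *attr,
			     unsigned long long crash_low_size)
{
	if (strcmp(attr->name, "kexec_crash_low_size") == 0 &&
	    crash_low_size == 0)
		return 0;
	return attr->mode;
}

/* Count the attributes that would show up for a given low reservation. */
size_t demo_count_visible(const struct demo_attr *attrs, size_t n,
			  unsigned long long crash_low_size)
{
	size_t i, visible = 0;

	for (i = 0; i < n; i++)
		if (demo_is_visible(&attrs[i], crash_low_size))
			visible++;
	return visible;
}
```

The point of the design is that the attribute table stays static while the decision to expose kexec_crash_low_size is made once, when the group is registered.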

Re: [PATCH] x86/efi-bgrt: remove the check of the version field

2016-08-24 Thread Dave Young

On 08/22/16 at 04:49pm, Icenowy Zheng wrote:
> 
> 
> 22.08.2016, 15:28, "Dave Young" <dyo...@redhat.com>:
> > On 08/18/16 at 09:41pm, Matt Fleming wrote:
> >>  On Wed, 17 Aug, at 01:44:13PM, Dave Young wrote:
> >>  >
> >>  > Could we add some quirk for these broken hardware instead of changing
> >>  > the normal code?
> >>
> >>  I'd prefer not to do that if possible. Due to the way that the BIOS
> >>  ecosystem works, this kind of broken firmware spreads across the
> >>  industry, appearing in newer versions of products from the same vendor
> >>  and even products from different vendors.
> >>
> >>  Continuously updating a quirks table as additional broken platforms
> >>  are discovered simply does not scale.
> >
> > Ok, I assumed that they are limited like one point in the web url
> > http://wiki.osdev.org/Broken_UEFI_implementations
> 
> At least I think all Thinkpads suffer from this.

Icenowy, sorry for the late reply, I missed it. I'm not sure about other
models, but my T440s does work well.

> 
> >
> > But I am probably wrong like you said. Please ignore the comment then.
> >

Thanks
Dave

Re: [PATCH] x86/efi-bgrt: remove the check of the version field

2016-08-22 Thread Dave Young

On 08/18/16 at 09:41pm, Matt Fleming wrote:
> On Wed, 17 Aug, at 01:44:13PM, Dave Young wrote:
> > 
> > Could we add some quirk for these broken hardware instead of changing
> > the normal code?
> 
> I'd prefer not to do that if possible. Due to the way that the BIOS
> ecosystem works, this kind of broken firmware spreads across the
> industry, appearing in newer versions of products from the same vendor
> and even products from different vendors.
> 
> Continuously updating a quirks table as additional broken platforms
> are discovered simply does not scale.

OK, I assumed they were limited in number, like the ones listed at
http://wiki.osdev.org/Broken_UEFI_implementations

But I am probably wrong, as you said. Please ignore the comment then.

Thanks
Dave

Re: [PATCH v2 2/6] powerpc: kexec_file: Add buffer hand-over support for the next kernel

2016-08-22 Thread Dave Young

On 08/22/16 at 12:38am, Thiago Jung Bauermann wrote:
> Am Montag, 22 August 2016, 11:21:35 schrieb Dave Young:
> > On 08/13/16 at 12:18am, Thiago Jung Bauermann wrote:
> > > diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c
> > > index a484a6346146..190c652e49b7 100644
> > > --- a/arch/powerpc/kernel/machine_kexec_64.c
> > > +++ b/arch/powerpc/kernel/machine_kexec_64.c
> > > @@ -490,6 +490,60 @@ int arch_kimage_file_post_load_cleanup(struct kimage *image)
> > >  	return image->fops->cleanup(image->image_loader_data);
> > >  }
> > > 
> > > +bool kexec_can_hand_over_buffer(void)
> > > +{
> > > +	return true;
> > > +}
> > > +
> > > +int arch_kexec_add_handover_buffer(struct kimage *image,
> > > +				   unsigned long load_addr, unsigned long size)
> > > +{
> > > +	image->arch.handover_buffer_addr = load_addr;
> > > +	image->arch.handover_buffer_size = size;
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +int kexec_get_handover_buffer(void **addr, unsigned long *size)
> > > +{
> > > +	int ret;
> > > +	u64 start_addr, end_addr;
> > > +
> > > +	ret = of_property_read_u64(of_chosen,
> > > +				   "linux,kexec-handover-buffer-start",
> > > +				   &start_addr);
> > > +	if (ret == -EINVAL)
> > > +		return -ENOENT;
> > > +	else if (ret)
> > > +		return -EINVAL;
> > > +
> > > +	ret = of_property_read_u64(of_chosen, "linux,kexec-handover-buffer-end",
> > > +				   &end_addr);
> > > +	if (ret == -EINVAL)
> > > +		return -ENOENT;
> > > +	else if (ret)
> > > +		return -EINVAL;
> > > +
> > > +	*addr =  __va(start_addr);
> > > +	/* -end is the first address after the buffer. */
> > > +	*size = end_addr - start_addr;
> > > +
> > > +	return 0;
> > > +}
> > 
> > This depends on dtb, so if IMA want to extend it to arches like x86 in
> > the future you will have to think about other way to pass it.
> > 
> > How about think about a general way now?
> 
> The only general way I can think of is by adding a kernel command line 
> parameter which the first kernel would pass to the second kernel, but IMHO 
> that is ugly, because such parameter wouldn't be useful to a user, and it 
> would also be something that, from the perspective of the user, would 
> magically appear in the kernel command line of the second kernel...

Sorry, I only raised the question; I actually have no idea either.
Maybe we have to do this in arch-specific ways.

Thanks
Dave

Re: [PATCH v2 3/6] kexec_file: Allow skipping checksum calculation for some segments.

2016-08-21 Thread Dave Young

On 08/22/16 at 12:25am, Thiago Jung Bauermann wrote:
> Am Montag, 22 August 2016, 11:17:45 schrieb Dave Young:
> > On 08/18/16 at 06:09pm, Thiago Jung Bauermann wrote:
> > > Hello Dave,
> > > 
> > > Thanks for your review!
> > > 
> > > [ Trimming down Cc: list a little to try to clear the "too many
> > >   recipients" mailing list restriction. ]
> > 
> > I also got "too many recipients".. Thanks for the trimming.
> 
> Didn't work though. What is the maximum number of recipients?

I have no idea either.

> 
> > > Am Donnerstag, 18 August 2016, 17:03:30 schrieb Dave Young:
> > > > On 08/13/16 at 12:18am, Thiago Jung Bauermann wrote:
> > > > > Adds checksum argument to kexec_add_buffer specifying whether the
> > > > > given
> > > > > segment should be part of the checksum calculation.
> > > > 
> > > > Since it is used with add buffer, could it be added to kbuf as a new
> > > > field?
> > > 
> > > I was on the fence about adding it as a new argument to kexec_add_buffer
> > > or as a new field to struct kexec_buf. Both alternatives make sense to
> > > me. I implemented your suggestion in the patch below, what do you
> > > think?
> > > 
> > > > Like kbuf.no_checksum, default value is 0 that means checksum is
> > > > needed
> > > > if it is 1 then no need a checksum.
> > > 
> > > It's an interesting idea and I implemented it that way, though in
> > > practice all current users of struct kexec_buf put it on the stack so
> > > the field needs to be initialized explicitly.
> > 
> > No need to set it as false because it will be initialized to 0 by
> > default?
> 
> As far as I know, variables on the stack are not initialized. Only global 
> and static variables are.

But a designated initializer will zero the fields that are not named
explicitly.

Thanks
Dave
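
Editor's sketch: the point about designated initializers can be shown with a small stand-in structure. In C99 and later, members that are not named in an initializer list are initialized as if they had static storage duration, i.e. to zero, even for a variable on the stack. struct demo_kbuf below is hypothetical and only borrows the no_checksum name from the discussion; it is not the kernel's struct kexec_buf.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct kexec_buf; fields are illustrative. */
struct demo_kbuf {
	void *buffer;
	unsigned long bufsz;
	unsigned long memsz;
	bool no_checksum;
};

/* Only two members are named; the rest, including no_checksum,
 * are zero-initialized by the language, not left as stack garbage. */
struct demo_kbuf make_kbuf(void *data, unsigned long len)
{
	struct demo_kbuf kbuf = { .buffer = data, .bufsz = len };

	return kbuf;
}
```

A plain declaration without any initializer (struct demo_kbuf kbuf;) gives no such guarantee for a stack variable, which is the case Thiago describes.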

Re: [PATCH v2 2/6] powerpc: kexec_file: Add buffer hand-over support for the next kernel

2016-08-21 Thread Dave Young

On 08/13/16 at 12:18am, Thiago Jung Bauermann wrote:
> The buffer hand-over mechanism allows the currently running kernel to pass
> data to kernel that will be kexec'd via a kexec segment. The second kernel
> can check whether the previous kernel sent data and retrieve it.
> 
> This is the architecture-specific part.
> 
> Signed-off-by: Thiago Jung Bauermann 
> ---
>  arch/powerpc/include/asm/kexec.h   |  12 +++-
>  arch/powerpc/kernel/kexec_elf_64.c |   2 +-
>  arch/powerpc/kernel/machine_kexec_64.c | 114 +++--
>  3 files changed, 120 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kexec.h 
> b/arch/powerpc/include/asm/kexec.h
> index 31bc64e07c8f..b20738df26f8 100644
> --- a/arch/powerpc/include/asm/kexec.h
> +++ b/arch/powerpc/include/asm/kexec.h
> @@ -92,12 +92,20 @@ static inline bool kdump_in_progress(void)
>  }
>  
>  #ifdef CONFIG_KEXEC_FILE
> +#define ARCH_HAS_KIMAGE_ARCH
> +
> +struct kimage_arch {
> + phys_addr_t handover_buffer_addr;
> + unsigned long handover_buffer_size;
> +};
> +
>  int setup_purgatory(struct kimage *image, const void *slave_code,
>   const void *fdt, unsigned long kernel_load_addr,
>   unsigned long fdt_load_addr, unsigned long stack_top,
>   int debug);
> -int setup_new_fdt(void *fdt, unsigned long initrd_load_addr,
> -   unsigned long initrd_len, const char *cmdline);
> +int setup_new_fdt(const struct kimage *image, void *fdt,
> +   unsigned long initrd_load_addr, unsigned long initrd_len,
> +   const char *cmdline);
>  bool find_debug_console(const void *fdt, int chosen_node);
>  int merge_partial_dtb(void *to, const void *from);
>  #endif /* CONFIG_KEXEC_FILE */
> diff --git a/arch/powerpc/kernel/kexec_elf_64.c 
> b/arch/powerpc/kernel/kexec_elf_64.c
> index 1b902ad66e2a..22afc7b5ee73 100644
> --- a/arch/powerpc/kernel/kexec_elf_64.c
> +++ b/arch/powerpc/kernel/kexec_elf_64.c
> @@ -219,7 +219,7 @@ void *elf64_load(struct kimage *image, char *kernel_buf,
>   }
>   }
>  
> - ret = setup_new_fdt(fdt, initrd_load_addr, initrd_len, cmdline);
> + ret = setup_new_fdt(image, fdt, initrd_load_addr, initrd_len, cmdline);
>   if (ret)
>   goto out;
>  
> diff --git a/arch/powerpc/kernel/machine_kexec_64.c 
> b/arch/powerpc/kernel/machine_kexec_64.c
> index a484a6346146..190c652e49b7 100644
> --- a/arch/powerpc/kernel/machine_kexec_64.c
> +++ b/arch/powerpc/kernel/machine_kexec_64.c
> @@ -490,6 +490,60 @@ int arch_kimage_file_post_load_cleanup(struct kimage 
> *image)
>   return image->fops->cleanup(image->image_loader_data);
>  }
>  
> +bool kexec_can_hand_over_buffer(void)
> +{
> + return true;
> +}
> +
> +int arch_kexec_add_handover_buffer(struct kimage *image,
> +unsigned long load_addr, unsigned long size)
> +{
> + image->arch.handover_buffer_addr = load_addr;
> + image->arch.handover_buffer_size = size;
> +
> + return 0;
> +}
> +
> +int kexec_get_handover_buffer(void **addr, unsigned long *size)
> +{
> + int ret;
> + u64 start_addr, end_addr;
> +
> + ret = of_property_read_u64(of_chosen,
> +"linux,kexec-handover-buffer-start",
> +&start_addr);
> + if (ret == -EINVAL)
> + return -ENOENT;
> + else if (ret)
> + return -EINVAL;
> +
> + ret = of_property_read_u64(of_chosen, "linux,kexec-handover-buffer-end",
> +&end_addr);
> + if (ret == -EINVAL)
> + return -ENOENT;
> + else if (ret)
> + return -EINVAL;
> +
> + *addr =  __va(start_addr);
> + /* -end is the first address after the buffer. */
> + *size = end_addr - start_addr;
> +
> + return 0;
> +}

This depends on the dtb, so if IMA wants to extend it to arches like x86 in
the future, you will have to think about another way to pass it.

How about thinking about a general way now?
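
One possible shape for this, purely as a sketch: keep the start/end decoding in common code and let each architecture supply only the lookup. Every name below (handover_range, handover_lookup_fn, handover_get, demo_lookup) is hypothetical and not part of the patch. The fixed addresses in demo_lookup are placeholders for whatever a real back end would read from the dtb or from some other channel.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: physical range of the handover buffer, end exclusive. */
struct handover_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Hypothetical per-arch hook: fills *range and returns 0, or returns
 * -ENOENT when the previous kernel passed no buffer. A powerpc back end
 * could read the /chosen properties; x86 would need another channel.
 */
typedef int (*handover_lookup_fn)(struct handover_range *range);

/* Common code: validate the range and convert it to address + size. */
int handover_get(handover_lookup_fn lookup, uint64_t *addr, uint64_t *size)
{
	struct handover_range range;
	int ret;

	if (!lookup)
		return -ENOENT;

	ret = lookup(&range);
	if (ret)
		return ret;

	if (range.end < range.start)
		return -EINVAL;

	*addr = range.start;
	/* end is the first address after the buffer */
	*size = range.end - range.start;
	return 0;
}

/* Placeholder back end used only to exercise the common code. */
int demo_lookup(struct handover_range *range)
{
	range->start = 0x1000;
	range->end = 0x3000;
	return 0;
}
```

With a split like this, the callers would not need to know whether the range came from a device tree property or from something else.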

> +
> +int kexec_free_handover_buffer(void)
> +{
> + int ret;
> + void *addr;
> + unsigned long size;
> +
> + ret = kexec_get_handover_buffer(&addr, &size);
> + if (ret)
> + return ret;
> +
> + return memblock_free((phys_addr_t) addr, size);
> +}
> +
>  /**
>   * arch_kexec_walk_mem() - call func(data) for each unreserved memory block
>   * @kbuf:Context info for the search. Also passed to @func.
> @@ -687,9 +741,52 @@ int setup_purgatory(struct kimage *image, const void 
> *slave_code,
>   return 0;
>  }
>  
> -/*
> - * setup_new_fdt() - modify /chosen and memory reservation for the next 
> kernel
> - * @fdt:
> +/**
> + * setup_handover_buffer() - add properties and reservation for the handover 
> buffer
> + * @image:   kexec image being loaded.
> + * @fdt: Flattened device tree for the next kernel.
> + * @chosen_node: Offset to the chosen node.
> + *
> + * Return: 0 on success,

Re: [PATCH v2 3/6] kexec_file: Allow skipping checksum calculation for some segments.

2016-08-21 Thread Dave Young

On 08/18/16 at 06:09pm, Thiago Jung Bauermann wrote:
> Hello Dave,
> 
> Thanks for your review!
> 
> [ Trimming down Cc: list a little to try to clear the "too many recipients"   
>   mailing list restriction. ]

I also got "too many recipients".. Thanks for the trimming.

> 
> Am Donnerstag, 18 August 2016, 17:03:30 schrieb Dave Young:
> > On 08/13/16 at 12:18am, Thiago Jung Bauermann wrote:
> > > Adds checksum argument to kexec_add_buffer specifying whether the given
> > > segment should be part of the checksum calculation.
> > 
> > Since it is used with add buffer, could it be added to kbuf as a new
> > field?
> 
> I was on the fence about adding it as a new argument to kexec_add_buffer or 
> as a new field to struct kexec_buf. Both alternatives make sense to me. I 
> implemented your suggestion in the patch below, what do you think?
> 
> > Like kbuf.no_checksum, default value is 0 that means checksum is needed
> > if it is 1 then no need a checksum.
> 
> It's an interesting idea and I implemented it that way, though in practice 
> all current users of struct kexec_buf put it on the stack so the field needs 
> to be initialized explicitly.
> 
> -- 
> []'s
> Thiago Jung Bauermann
> IBM Linux Technology Center
> 
> 
> Subject: [PATCH v2 3/6] kexec_file: Allow skipping checksum calculation for
>  some segments.
> 
> Add skip_checksum member to struct kexec_buf to specify whether the
> corresponding segment should be part of the checksum calculation.
> 
> The next patch will add a way to update segments after a kimage is loaded.
> Segments that will be updated in this way should not be checksummed,
> otherwise they will cause the purgatory checksum verification to fail
> when the machine is rebooted.
> 
> As a bonus, we don't need to special-case the purgatory segment anymore
> to avoid checksumming it.
> 
> Adjust places using struct kexec_buf to set skip_checksum.
> 
> Signed-off-by: Thiago Jung Bauermann <bauer...@linux.vnet.ibm.com>
> ---
>  arch/powerpc/kernel/kexec_elf_64.c |  5 +++--
>  arch/x86/kernel/crash.c            |  3 ++-
>  arch/x86/kernel/kexec-bzimage64.c  |  2 +-
>  include/linux/kexec.h              | 23 ++-
>  kernel/kexec_file.c                | 15 +++
>  5 files changed, 27 insertions(+), 21 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/kexec_elf_64.c 
> b/arch/powerpc/kernel/kexec_elf_64.c
> index 22afc7b5ee73..d009f5363968 100644
> --- a/arch/powerpc/kernel/kexec_elf_64.c
> +++ b/arch/powerpc/kernel/kexec_elf_64.c
> @@ -107,7 +107,7 @@ static int elf_exec_load(struct kimage *image, struct 
> elfhdr *ehdr,
>   int ret;
>   size_t i;
>   struct kexec_buf kbuf = { .image = image, .buf_max = ppc64_rma_size,
> -   .top_down = false };
> +   .top_down = false, .skip_checksum = false };

No need to set it as false because it will be initialized to 0 by
default?

>  
>   /* Read in the PT_LOAD segments. */
>   for (i = 0; i < ehdr->e_phnum; i++) {
> @@ -162,7 +162,8 @@ void *elf64_load(struct kimage *image, char *kernel_buf,
>   struct elf_info elf_info;
>   struct fdt_reserve_entry *rsvmap;
>   struct kexec_buf kbuf = { .image = image, .buf_min = 0,
> -   .buf_max = ppc64_rma_size };
> +   .buf_max = ppc64_rma_size,
> +   .skip_checksum = false };
>  
>   ret = build_elf_exec_info(kernel_buf, kernel_len, &ehdr, &elf_info);
>   if (ret)
> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> index 38a1cdf6aa05..7b8f62c86651 100644
> --- a/arch/x86/kernel/crash.c
> +++ b/arch/x86/kernel/crash.c
> @@ -617,7 +617,8 @@ int crash_load_segments(struct kimage *image)
>  {
>   int ret;
>   struct kexec_buf kbuf = { .image = image, .buf_min = 0,
> -   .buf_max = ULONG_MAX, .top_down = false };
> +   .buf_max = ULONG_MAX, .top_down = false,
> +   .skip_checksum = false };
>  
>   /*
>* Determine and load a segment for backup area. First 640K RAM
> diff --git a/arch/x86/kernel/kexec-bzimage64.c 
> b/arch/x86/kernel/kexec-bzimage64.c
> index 4b3a75329fb6..449f433cd225 100644
> --- a/arch/x86/kernel/kexec-bzimage64.c
> +++ b/arch/x86/kernel/kexec-bzimage64.c
> @@ -341,7 +341,7 @@ static void *bzImage64_load(struct kimage *image, char 
> *kernel,
>   unsigned int setup_hdr_offset = offsetof(struct boot_params, hdr);
>   unsigned int efi_map_offset, efi_map_sz, efi_setup_data_offset;
>   struct kexec_

Re: [PATCH v2 3/6] kexec_file: Allow skipping checksum calculation for some segments.

2016-08-18 Thread Dave Young

On 08/13/16 at 12:18am, Thiago Jung Bauermann wrote:
> Adds checksum argument to kexec_add_buffer specifying whether the given
> segment should be part of the checksum calculation.
> 

Since it is used with add buffer, could it be added to kbuf as a new
field?

Something like kbuf.no_checksum: the default value 0 means a checksum is
needed, and 1 means no checksum is needed.
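
To make the suggestion concrete, here is a minimal user-space sketch of a per-buffer flag with a zero default. The struct, the function name, and the simple additive checksum are all stand-ins chosen for illustration. The kernel's real digest code and data structures are different; only the flag semantics follow the description above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified segment descriptor. */
struct demo_segment {
	const unsigned char *buf;
	size_t len;
	bool no_checksum;	/* 0 (default): include in checksum; 1: skip */
};

/* Toy additive checksum over all segments that did not opt out. */
uint32_t demo_checksum(const struct demo_segment *segs, size_t nr)
{
	uint32_t sum = 0;
	size_t i, j;

	for (i = 0; i < nr; i++) {
		if (segs[i].no_checksum)
			continue;
		for (j = 0; j < segs[i].len; j++)
			sum += segs[i].buf[j];
	}
	return sum;
}
```

A segment whose flag is set can be rewritten after the checksum was computed without invalidating it, which is the property the patch description asks for.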

> The next patch will add a way to update segments after a kimage is loaded.
> Segments that will be updated in this way should not be checksummed,
> otherwise they will cause the purgatory checksum verification to fail
> when the machine is rebooted.
> 
> As a bonus, we don't need to special-case the purgatory segment anymore
> to avoid checksumming it.
> 
> Adjust call sites for the new argument.
> 
> Signed-off-by: Thiago Jung Bauermann 
> ---
>  arch/powerpc/kernel/kexec_elf_64.c |  6 +++---
>  arch/x86/kernel/crash.c            |  4 ++--
>  arch/x86/kernel/kexec-bzimage64.c  |  6 +++---
>  include/linux/kexec.h              | 10 +++---
>  kernel/kexec_file.c                | 23 ---
>  5 files changed, 27 insertions(+), 22 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/kexec_elf_64.c 
> b/arch/powerpc/kernel/kexec_elf_64.c
> index 22afc7b5ee73..4c528c81b076 100644
> --- a/arch/powerpc/kernel/kexec_elf_64.c
> +++ b/arch/powerpc/kernel/kexec_elf_64.c
> @@ -128,7 +128,7 @@ static int elf_exec_load(struct kimage *image, struct 
> elfhdr *ehdr,
>   kbuf.memsz = phdr->p_memsz;
>   kbuf.buf_align = phdr->p_align;
>   kbuf.buf_min = phdr->p_paddr + base;
> - ret = kexec_add_buffer(&kbuf);
> + ret = kexec_add_buffer(&kbuf, true);
>   if (ret)
>   goto out;
>   load_addr = kbuf.mem;
> @@ -188,7 +188,7 @@ void *elf64_load(struct kimage *image, char *kernel_buf,
>   kbuf.bufsz = kbuf.memsz = initrd_len;
>   kbuf.buf_align = PAGE_SIZE;
>   kbuf.top_down = false;
> - ret = kexec_add_buffer(&kbuf);
> + ret = kexec_add_buffer(&kbuf, true);
>   if (ret)
>   goto out;
>   initrd_load_addr = kbuf.mem;
> @@ -245,7 +245,7 @@ void *elf64_load(struct kimage *image, char *kernel_buf,
>   kbuf.bufsz = kbuf.memsz = fdt_size;
>   kbuf.buf_align = PAGE_SIZE;
>   kbuf.top_down = true;
> - ret = kexec_add_buffer(&kbuf);
> + ret = kexec_add_buffer(&kbuf, true);
>   if (ret)
>   goto out;
>   fdt_load_addr = kbuf.mem;
> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> index 38a1cdf6aa05..634ab16377b1 100644
> --- a/arch/x86/kernel/crash.c
> +++ b/arch/x86/kernel/crash.c
> @@ -642,7 +642,7 @@ int crash_load_segments(struct kimage *image)
>* copied in purgatory after crash. Just add a zero filled
>* segment for now to make sure checksum logic works fine.
>*/
> - ret = kexec_add_buffer(&kbuf);
> + ret = kexec_add_buffer(&kbuf, true);
>   if (ret)
>   return ret;
>   image->arch.backup_load_addr = kbuf.mem;
> @@ -661,7 +661,7 @@ int crash_load_segments(struct kimage *image)
>  
>   kbuf.memsz = kbuf.bufsz;
>   kbuf.buf_align = ELF_CORE_HEADER_ALIGN;
> - ret = kexec_add_buffer(&kbuf);
> + ret = kexec_add_buffer(&kbuf, true);
>   if (ret) {
>   vfree((void *)image->arch.elf_headers);
>   return ret;
> diff --git a/arch/x86/kernel/kexec-bzimage64.c 
> b/arch/x86/kernel/kexec-bzimage64.c
> index 4b3a75329fb6..a46e3fbb0639 100644
> --- a/arch/x86/kernel/kexec-bzimage64.c
> +++ b/arch/x86/kernel/kexec-bzimage64.c
> @@ -422,7 +422,7 @@ static void *bzImage64_load(struct kimage *image, char 
> *kernel,
>   kbuf.memsz = kbuf.bufsz;
>   kbuf.buf_align = 16;
>   kbuf.buf_min = MIN_BOOTPARAM_ADDR;
> - ret = kexec_add_buffer(&kbuf);
> + ret = kexec_add_buffer(&kbuf, true);
>   if (ret)
>   goto out_free_params;
>   bootparam_load_addr = kbuf.mem;
> @@ -435,7 +435,7 @@ static void *bzImage64_load(struct kimage *image, char 
> *kernel,
>   kbuf.memsz = PAGE_ALIGN(header->init_size);
>   kbuf.buf_align = header->kernel_alignment;
>   kbuf.buf_min = MIN_KERNEL_LOAD_ADDR;
> - ret = kexec_add_buffer(&kbuf);
> + ret = kexec_add_buffer(&kbuf, true);
>   if (ret)
>   goto out_free_params;
>   kernel_load_addr = kbuf.mem;
> @@ -449,7 +449,7 @@ static void *bzImage64_load(struct kimage *image, char 
> *kernel,
>   kbuf.bufsz = kbuf.memsz = initrd_len;
>   kbuf.buf_align = PAGE_SIZE;
>   kbuf.buf_min = MIN_INITRD_LOAD_ADDR;
> - ret = kexec_add_buffer(&kbuf);
> + ret = kexec_add_buffer(&kbuf, true);
>   if (ret)
>   goto out_free_params;
>   initrd_load_addr = kbuf.mem;
> diff --git a/include/linux/kexec.h

Re: [PATCH v2 1/2] kexec: add dtb info to struct kimage

2016-08-18 Thread Dave Young

On 08/11/16 at 08:03pm, Thiago Jung Bauermann wrote:
> From: AKASHI Takahiro <takahiro.aka...@linaro.org>
> 
> Device tree blob must be passed to a second kernel on DTB-capable
> archs, like powerpc and arm64, but the current kernel interface
> lacks this support.
> 
> This patch adds dtb buffer information to struct kimage.
> When users don't specify dtb explicitly and the one used for the current
> kernel can be re-used, this change will be good enough for implementing
> kexec_file_load feature.
> 
> Signed-off-by: AKASHI Takahiro <takahiro.aka...@linaro.org>
> ---
>  include/linux/kexec.h | 3 +++
>  kernel/kexec_file.c   | 3 +++
>  2 files changed, 6 insertions(+)
> 
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index d743baaa..4f85d284ed0b 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -192,6 +192,9 @@ struct kimage {
>   char *cmdline_buf;
>   unsigned long cmdline_buf_len;
>  
> + void *dtb_buf;
> + unsigned long dtb_buf_len;
> +
>   /* File operations provided by image loader */
>   struct kexec_file_ops *fops;
>  
> diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
> index 503bc2d348e5..113af2f219b9 100644
> --- a/kernel/kexec_file.c
> +++ b/kernel/kexec_file.c
> @@ -92,6 +92,9 @@ void kimage_file_post_load_cleanup(struct kimage *image)
>   vfree(image->initrd_buf);
>   image->initrd_buf = NULL;
>  
> + vfree(image->dtb_buf);
> + image->dtb_buf = NULL;
> +
>   kfree(image->cmdline_buf);
>   image->cmdline_buf = NULL;
>  
> -- 
> 1.9.1

Acked-by: Dave Young <dyo...@redhat.com>

Thanks
Dave

Re: [PATCH v2 2/2] kexec: extend kexec_file_load system call

2016-08-18 Thread Dave Young

Since Eric objected to the extension, I think you should convince him,
but I will review it from a code point of view.

On 08/11/16 at 08:03pm, Thiago Jung Bauermann wrote:
> From: AKASHI Takahiro 
> 
> Device tree blob must be passed to a second kernel on DTB-capable
> archs, like powerpc and arm64, but the current kernel interface
> lacks this support.
> 
> This patch extends kexec_file_load system call by adding an extra
> argument to this syscall so that an arbitrary number of file descriptors
> can be handed out from user space to the kernel.
> 
>   long sys_kexec_file_load(int kernel_fd, int initrd_fd,
>unsigned long cmdline_len,
>const char __user *cmdline_ptr,
>unsigned long flags,
>const struct kexec_fdset __user *ufdset);
> 
> If KEXEC_FILE_EXTRA_FDS is set to the "flags" argument, the "ufdset"
> argument points to the following struct buffer:
> 
>   struct kexec_fdset {
>   int nr_fds;
>   struct kexec_file_fd fds[0];
>   }
> 
> Signed-off-by: AKASHI Takahiro 
> Signed-off-by: Thiago Jung Bauermann 
> ---
>  include/linux/fs.h |  1 +
>  include/linux/kexec.h  |  7 ++--
>  include/linux/syscalls.h   |  4 ++-
>  include/uapi/linux/kexec.h | 22 
>  kernel/kexec_file.c| 83 
> ++
>  5 files changed, 108 insertions(+), 9 deletions(-)
> 
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 3523bf62f328..847d9c31f428 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2656,6 +2656,7 @@ extern int do_pipe_flags(int *, int);
>   id(MODULE, kernel-module)   \
>   id(KEXEC_IMAGE, kexec-image)\
>   id(KEXEC_INITRAMFS, kexec-initramfs)\
> + id(KEXEC_PARTIAL_DTB, kexec-partial-dtb)\
>   id(POLICY, security-policy) \
>   id(MAX_ID, )
>  
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index 4f85d284ed0b..29202935055d 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -148,7 +148,10 @@ struct kexec_file_ops {
>   kexec_verify_sig_t *verify_sig;
>  #endif
>  };
> -#endif
> +
> +int __weak arch_kexec_verify_buffer(enum kexec_file_type type, const void 
> *buf,
> + unsigned long size);
> +#endif /* CONFIG_KEXEC_FILE */
>  
>  struct kimage {
>   kimage_entry_t head;
> @@ -280,7 +283,7 @@ extern int kexec_load_disabled;
>  
>  /* List of defined/legal kexec file flags */
>  #define KEXEC_FILE_FLAGS (KEXEC_FILE_UNLOAD | KEXEC_FILE_ON_CRASH | \
> -  KEXEC_FILE_NO_INITRAMFS)
> +  KEXEC_FILE_NO_INITRAMFS | KEXEC_FILE_EXTRA_FDS)
>  
>  #define VMCOREINFO_BYTES   (4096)
>  #define VMCOREINFO_NOTE_NAME   "VMCOREINFO"
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> index d02239022bd0..fc072bdb74e3 100644
> --- a/include/linux/syscalls.h
> +++ b/include/linux/syscalls.h
> @@ -66,6 +66,7 @@ struct perf_event_attr;
>  struct file_handle;
>  struct sigaltstack;
>  union bpf_attr;
> +struct kexec_fdset;
>  
>  #include <linux/types.h>
>  #include <linux/aio_abi.h>
> @@ -321,7 +322,8 @@ asmlinkage long sys_kexec_load(unsigned long entry, 
> unsigned long nr_segments,
>  asmlinkage long sys_kexec_file_load(int kernel_fd, int initrd_fd,
>   unsigned long cmdline_len,
>   const char __user *cmdline_ptr,
> - unsigned long flags);
> + unsigned long flags,
> + const struct kexec_fdset __user *ufdset);
>  
>  asmlinkage long sys_exit(int error_code);
>  asmlinkage long sys_exit_group(int error_code);
> diff --git a/include/uapi/linux/kexec.h b/include/uapi/linux/kexec.h
> index aae5ebf2022b..6279be79efba 100644
> --- a/include/uapi/linux/kexec.h
> +++ b/include/uapi/linux/kexec.h
> @@ -23,6 +23,28 @@
>  #define KEXEC_FILE_UNLOAD0x0001
>  #define KEXEC_FILE_ON_CRASH  0x0002
>  #define KEXEC_FILE_NO_INITRAMFS  0x0004
> +#define KEXEC_FILE_EXTRA_FDS 0x0008
> +
> +enum kexec_file_type {
> + KEXEC_FILE_TYPE_KERNEL,
> + KEXEC_FILE_TYPE_INITRAMFS,
> +
> + /*
> +  * Device Tree Blob containing just the nodes and properties that
> +  * the kexec_file_load caller wants to add or modify.
> +  */
> + KEXEC_FILE_TYPE_PARTIAL_DTB,
> +};
> +
> +struct kexec_file_fd {
> + enum kexec_file_type type;
> + int fd;
> +};
> +
> +struct kexec_fdset {
> + int nr_fds;
> + struct kexec_file_fd fds[0];
> +};
>  
>  /* These values match the ELF architecture values.
>   * Unless there is a good reason that should continue to be the case.
> diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
> index 113af2f219b9..d6803dd884e2 100644
> --- a/kernel/kexec_file.c
> +++
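
The ufdset argument described in the quoted patch is a counted array with a trailing variable-length member. As a rough userspace sketch of how a caller might build one, the block below redefines the patch's types locally (they are taken from the quoted diff, not from a released header) and uses a C99 flexible array member where the patch uses fds[0]. The helper name fdset_for_dtb is hypothetical.

```c
#include <stdlib.h>

/* Local copies of the types shown in the quoted patch. They are redefined
 * here for illustration only; this is not a released uapi header. */
enum kexec_file_type {
	KEXEC_FILE_TYPE_KERNEL,
	KEXEC_FILE_TYPE_INITRAMFS,
	KEXEC_FILE_TYPE_PARTIAL_DTB,
};

struct kexec_file_fd {
	enum kexec_file_type type;
	int fd;
};

struct kexec_fdset {
	int nr_fds;
	struct kexec_file_fd fds[];	/* C99 flexible array member */
};

/* Hypothetical helper: allocate a set holding a single partial-DTB fd.
 * Returns NULL on allocation failure; the caller frees the result. */
struct kexec_fdset *fdset_for_dtb(int dtb_fd)
{
	struct kexec_fdset *set;

	/* Header plus room for exactly one trailing entry. */
	set = malloc(sizeof(*set) + sizeof(set->fds[0]));
	if (!set)
		return NULL;

	set->nr_fds = 1;
	set->fds[0].type = KEXEC_FILE_TYPE_PARTIAL_DTB;
	set->fds[0].fd = dtb_fd;
	return set;
}
```

Under the proposal, the resulting pointer would be passed as the extra syscall argument with KEXEC_FILE_EXTRA_FDS set in flags. The syscall itself is not invoked here because the extension was still under review in this thread.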

Re: [PATCH v8 1/2] Documentation: kdump: remind user of nr_cpus

2016-08-18 Thread Dave Young

On 08/17/16 at 07:36pm, Joe Perches wrote:
> On Thu, 2016-08-18 at 10:31 +0800, Zhou Wenjian wrote:
> > nr_cpus can help to save memory. So we should remind user of it.
> 
> trivia:
> > diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt
> []
> > @@ -390,9 +390,11 @@ Notes on loading the dump-capture kernel:
> >  * Boot parameter "1" boots the dump-capture kernel into single-user
> >    mode without networking. If you want networking, use "3".
> >  
> > -* We generally don' have to bring up a SMP kernel just to capture the
> > +* We generally don' have to bring up an SMP kernel just to capture the
> 
> don't or do not
> 

Using "do not" is better. 'We' also needs to be replaced with 'You' to be
consistent with the other parts.

Re: [PATCH] Map in physical addresses in efi_map_region_fixed

2016-08-18 Thread Dave Young

On 08/17/16 at 11:00am, Alex Thorlton wrote:
> On Wed, Aug 17, 2016 at 03:01:51PM +0800, Dave Young wrote:
> > > > Why do you guys need the physical mapping all of a sudden?
> > > 
> > > It's not that we need it all of the sudden, necessarily, it's just that
> > > we've had to make other changes to make things work with the new,
> > > (almost) completely isolated, EFI page tables.  We ended up choosing the
> > > lesser of two evils, and have decided to temporarily rely on the
> > > physical address of our runtime code, instead of continuing to rely on
> > > EFI_OLD_MEMMAP.
> > 
> > In efi_map_region, there is already mapped md->phys_addr for broken
> > firmware. SGI still need EFI_OLD_MEMMAP? I means in 1st kernel instead
> > of kexec kernel.
> 
> We're actually in the middle of trying to move *away* from
> EFI_OLD_MEMMAP, which is why we're just starting to uncover some of
> these things.  efi_map_region covers us on the primary kernel, because
> it maps in the physical address of each md (as you note here), but that
> little piece is missing in the kexec'd kernel.  So, our primary kernel
> works without efi=old_map, but the second kernel won't, without this
> change (supplying "noefi" on the kexec command line also works, but then
> we don't have any of our runtime stuff available).
> 
> As noted in a previous message, we're aware that our code needs a little
> more work to be "perfect," but this small change buys us most of (all
> of?) the stuff we'd get by implementing the other changes that we're
> aware we need to make, i.e. update our runtime function pointer to its
> efi_va during SetVirtualAddressMap, at least from a kexec perspective.

Thanks for the explanation. I still do not get why the original ioremap way
works if SetVirtualAddressMap does not update the runtime function
pointer.

But if it fixes the problem in the primary kernel, it should be fine to do
the same in the kexec kernel.

Thanks
Dave

Re: [PATCH v2 1/2] kexec: Introduce "/sys/kernel/kexec_crash_low_size"

2016-08-17 Thread Dave Young

On 08/17/16 at 09:50am, Xunlei Pang wrote:
> "/sys/kernel/kexec_crash_size" only handles crashk_res, it
> is fine in most cases, but sometimes we have crashk_low_res.
> For example, when "crashkernel=size[KMG],high" combined with
> "crashkernel=size[KMG],low" is used for 64-bit x86.
> 
> Like crashk_res, we introduce the corresponding sysfs file
> "/sys/kernel/kexec_crash_low_size" for crashk_low_res.
> 
> So, the exact total reserved memory is the sum of the two.
> 
> crashk_low_res can also be shrunk via this new interface,
> and users should be aware of what they are doing.

Cc Yinghai Lu for review since he introduced the ,high and ,low logic.

> 
> Suggested-by: Dave Young 
> Signed-off-by: Xunlei Pang 
> ---
>  include/linux/kexec.h |  4 ++--
>  kernel/kexec_core.c   | 23 ---
>  kernel/ksysfs.c   | 25 +++--
>  3 files changed, 37 insertions(+), 15 deletions(-)
> 
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index d743777..4f271fc 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -304,8 +304,8 @@ int parse_crashkernel_high(char *cmdline, unsigned long 
> long system_ram,
>   unsigned long long *crash_size, unsigned long long *crash_base);
>  int parse_crashkernel_low(char *cmdline, unsigned long long system_ram,
>   unsigned long long *crash_size, unsigned long long *crash_base);
> -int crash_shrink_memory(unsigned long new_size);
> -size_t crash_get_memory_size(void);
> +int crash_shrink_memory(struct resource *res, unsigned long new_size);
> +size_t crash_get_memory_size(struct resource *res);
>  void crash_free_reserved_phys_range(unsigned long begin, unsigned long end);
>  
>  int __weak arch_kexec_kernel_image_probe(struct kimage *image, void *buf,
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index 5616755..707d18e 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -925,13 +925,13 @@ void crash_kexec(struct pt_regs *regs)
>   }
>  }
>  
> -size_t crash_get_memory_size(void)
> +size_t crash_get_memory_size(struct resource *res)
>  {
>   size_t size = 0;
>  
>   mutex_lock(&kexec_mutex);
> - if (crashk_res.end != crashk_res.start)
> - size = resource_size(&crashk_res);
> + if (res->end != res->start)
> + size = resource_size(res);
>   mutex_unlock(&kexec_mutex);
>   return size;
>  }
> @@ -945,7 +945,7 @@ void __weak crash_free_reserved_phys_range(unsigned long 
> begin,
>   free_reserved_page(boot_pfn_to_page(addr >> PAGE_SHIFT));
>  }
>  
> -int crash_shrink_memory(unsigned long new_size)
> +int crash_shrink_memory(struct resource *res, unsigned long new_size)
>  {
>   int ret = 0;
>   unsigned long start, end;
> @@ -958,8 +958,9 @@ int crash_shrink_memory(unsigned long new_size)
>   ret = -ENOENT;
>   goto unlock;
>   }
> - start = crashk_res.start;
> - end = crashk_res.end;
> +
> + start = res->start;
> + end = res->end;
>   old_size = (end == 0) ? 0 : end - start + 1;
>   if (new_size >= old_size) {
>   ret = (new_size == old_size) ? 0 : -EINVAL;
> @@ -975,17 +976,17 @@ int crash_shrink_memory(unsigned long new_size)
>   start = roundup(start, KEXEC_CRASH_MEM_ALIGN);
>   end = roundup(start + new_size, KEXEC_CRASH_MEM_ALIGN);
>  
> - crash_free_reserved_phys_range(end, crashk_res.end);
> + crash_free_reserved_phys_range(end, res->end);
>  
> - if ((start == end) && (crashk_res.parent != NULL))
> - release_resource(&crashk_res);
> + if ((start == end) && (res->parent != NULL))
> + release_resource(res);
>  
>   ram_res->start = end;
> - ram_res->end = crashk_res.end;
> + ram_res->end = res->end;
>   ram_res->flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM;
>   ram_res->name = "System RAM";
>  
> - crashk_res.end = end - 1;
> + res->end = end - 1;
>  
>   insert_resource(&iomem_resource, ram_res);
>  
> diff --git a/kernel/ksysfs.c b/kernel/ksysfs.c
> index ee1bc1b..3336fd5 100644
> --- a/kernel/ksysfs.c
> +++ b/kernel/ksysfs.c
> @@ -105,10 +105,30 @@ static ssize_t kexec_crash_loaded_show(struct kobject 
> *kobj,
>  }
>  KERNEL_ATTR_RO(kexec_crash_loaded);
>  
> +static ssize_t kexec_crash_low_size_show(struct kobject *kobj,
> +struct kobj_attribute *attr, char *buf)
> +{
> + return sprintf(buf, "%zu\n", crash_get_memory_size(&crashk_low_res));
> +}
> +static ssize_t kexec_
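
The commit message's statement that the total reserved memory is the sum of the two regions can be sketched in plain C. This is only a userspace model: struct crash_range is a simplified stand-in for the kernel's struct resource, and the function names are invented for illustration. The size rule follows the check in the quoted crash_get_memory_size(), where a region whose start equals its end counts as empty.

```c
#include <stddef.h>

/* Simplified stand-in for struct resource: an inclusive [start, end]
 * physical address range. */
struct crash_range {
	unsigned long long start;
	unsigned long long end;
};

/* An unreserved region has start == end and contributes zero bytes;
 * otherwise the size is end - start + 1 because end is inclusive. */
size_t crash_range_size(const struct crash_range *r)
{
	if (r->end == r->start)
		return 0;
	return (size_t)(r->end - r->start + 1);
}

/* Total crash reservation: the main region plus the low region. */
size_t total_crash_size(const struct crash_range *res,
			const struct crash_range *low_res)
{
	return crash_range_size(res) + crash_range_size(low_res);
}
```

With a 32M main region and a 128M low region, total_crash_size() reports 160M, which is the figure neither sysfs file shows on its own.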

Re: [PATCH v2 2/2] kexec: Consider crashk_low_res in sanity_check_segment_list()

2016-08-17 Thread Dave Young

Hi, Xunlei,

On 08/17/16 at 09:50am, Xunlei Pang wrote:
> We have crashk_res only in most cases, but sometimes we have
> crashk_low_res.
> 
> For example, on 64-bit x86 systems, when "crashkernel=32M,high"
> combined with "crashkernel=128M,low" is used, so some segments
> may have the chance to be loaded into crashk_low_res area. We
> can't fail it as a memory violation in these cases.
> 
> Thus, we add the case to regard the segment as valid if it is
> within crashk_low_res.

crashkernel low is meant for swiotlb; it can be reserved automatically
when only crashkernel high is specified on the cmdline. I'm not
sure it is useful to use crashk_low_res for other purposes, and
kdump is likely to fail in that case.

I'm not sure it is really necessary to add this check now; we may
handle it only when there is an actual use case and bug report in
the future.

Thanks
Dave
> 
> Signed-off-by: Xunlei Pang 
> ---
>  kernel/kexec_core.c | 11 ---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index 707d18e..9012a60 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -248,9 +248,14 @@ int sanity_check_segment_list(struct kimage *image)
>   mstart = image->segment[i].mem;
>   mend = mstart + image->segment[i].memsz - 1;
>   /* Ensure we are within the crash kernel limits */
> - if ((mstart < phys_to_boot_phys(crashk_res.start)) ||
> - (mend > phys_to_boot_phys(crashk_res.end)))
> - return -EADDRNOTAVAIL;
> + if ((mstart >= phys_to_boot_phys(crashk_res.start)) &&
> + (mend <= phys_to_boot_phys(crashk_res.end)))
> + continue;
> + if ((mstart >= phys_to_boot_phys(crashk_low_res.start)) 
> &&
> + (mend <= phys_to_boot_phys(crashk_low_res.end)))
> + continue;
> +
> + return -EADDRNOTAVAIL;
>   }
>   }
>  
> -- 
> 1.8.3.1
>
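
For reference, the check the quoted patch proposes amounts to accepting a segment when it lies entirely inside either reserved region. A minimal userspace sketch follows, assuming inclusive ranges and leaving out the phys_to_boot_phys() translation; struct mem_range and the function names are illustrative, not kernel API.

```c
#include <stdbool.h>

/* Inclusive physical address range, standing in for struct resource. */
struct mem_range {
	unsigned long start;
	unsigned long end;
};

/* True if [mstart, mend] lies entirely inside r. */
static bool range_contains(const struct mem_range *r,
			   unsigned long mstart, unsigned long mend)
{
	return mstart >= r->start && mend <= r->end;
}

/* A segment passes if it fits in the main crash region or in the low
 * region; a segment straddling a boundary fits in neither. */
bool crash_segment_ok(unsigned long mem, unsigned long memsz,
		      const struct mem_range *crashk,
		      const struct mem_range *crashk_low)
{
	unsigned long mstart = mem;
	unsigned long mend = mstart + memsz - 1;

	return range_contains(crashk, mstart, mend) ||
	       range_contains(crashk_low, mstart, mend);
}
```

Whether the low region should be accepted at all is exactly what the reply above questions, so this shows the proposed behaviour rather than an agreed one.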

Re: [PATCH] Map in physical addresses in efi_map_region_fixed

2016-08-17 Thread Dave Young

> > Why do you guys need the physical mapping all of a sudden?
> 
> It's not that we need it all of the sudden, necessarily, it's just that
> we've had to make other changes to make things work with the new,
> (almost) completely isolated, EFI page tables.  We ended up choosing the
> lesser of two evils, and have decided to temporarily rely on the
> physical address of our runtime code, instead of continuing to rely on
> EFI_OLD_MEMMAP.

efi_map_region already maps md->phys_addr for broken
firmware. Does SGI still need EFI_OLD_MEMMAP? I mean in the 1st kernel
rather than the kexec kernel.

void __init efi_map_region(efi_memory_desc_t *md)
{
	unsigned long size = md->num_pages << PAGE_SHIFT;
	u64 pa = md->phys_addr;

	if (efi_enabled(EFI_OLD_MEMMAP))
		return old_map_region(md);

	/*
	 * Make sure the 1:1 mappings are present as a catch-all for b0rked
	 * firmware which doesn't update all internal pointers after switching
	 * to virtual mode and would otherwise crap on us.
	 */
	__map_region(md, md->phys_addr);

[snip]

Thanks
Dave
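
The effect of the 1:1 mapping mentioned above can be shown with a toy model. This is not the kernel's page-table code: the table type, its fixed capacity, and all function names are made up for illustration. The point is only that once a region is mapped both at its physical address and at its runtime virtual address, a firmware pointer that was never updated still resolves.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_MAPPINGS 8

/* One contiguous va -> pa mapping of `size` bytes. */
struct mapping {
	uint64_t va;
	uint64_t pa;
	uint64_t size;
};

/* Toy "page table": a flat list of mappings. */
struct toy_page_table {
	struct mapping m[MAX_MAPPINGS];
	size_t count;
};

/* Record one mapping; returns 0 on success, -1 if the table is full. */
int toy_map_range(struct toy_page_table *pt, uint64_t va, uint64_t pa,
		  uint64_t size)
{
	if (pt->count >= MAX_MAPPINGS)
		return -1;
	pt->m[pt->count++] = (struct mapping){ va, pa, size };
	return 0;
}

/* Map a firmware region twice: 1:1 (va == pa) as a catch-all for stale
 * pointers, and at the virtual address handed to the firmware. */
int toy_map_efi_region(struct toy_page_table *pt, uint64_t pa, uint64_t va,
		       uint64_t size)
{
	if (toy_map_range(pt, pa, pa, size))
		return -1;
	return toy_map_range(pt, va, pa, size);
}

/* Translate addr; stores the physical address in *out and returns 0 if
 * some mapping covers it, otherwise returns -1. */
int toy_translate(const struct toy_page_table *pt, uint64_t addr,
		  uint64_t *out)
{
	for (size_t i = 0; i < pt->count; i++) {
		const struct mapping *m = &pt->m[i];

		if (addr >= m->va && addr - m->va < m->size) {
			*out = m->pa + (addr - m->va);
			return 0;
		}
	}
	return -1;
}
```

In this model, a lookup through either the physical or the virtual address of the region lands on the same physical byte, which is the property the primary kernel gets from efi_map_region and the thread says is missing in the kexec'd kernel.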

Re: [PATCH] x86/efi-bgrt: remove the check of the version field

2016-08-16 Thread Dave Young

On 08/15/16 at 01:56pm, Matt Fleming wrote:
> On Tue, 09 Aug, at 01:25:46PM, Icenowy Zheng wrote:
> > Some broken firmwares have a wrongly filled version field in BGRT table.
> > (See http://wiki.osdev.org/Broken_UEFI_implementations )
> > 
> > As we know, these firmwares can also provide correct BGRT image, although
> > the table is wrong.
> > 
> > After removing the check of the version field, the kernel can now extract
> > the image correctly, and the information is also correct.
> > 
> > Tested on a Thinkpad E531 (68854UC).
> > 
> > Signed-off-by: Icenowy Zheng 
> > ---
> >  arch/x86/platform/efi/efi-bgrt.c | 5 -
> >  1 file changed, 5 deletions(-)
> > 
> > diff --git a/arch/x86/platform/efi/efi-bgrt.c 
> > b/arch/x86/platform/efi/efi-bgrt.c
> > index 6a2f569..f492ea0 100644
> > --- a/arch/x86/platform/efi/efi-bgrt.c
> > +++ b/arch/x86/platform/efi/efi-bgrt.c
> > @@ -47,11 +47,6 @@ void __init efi_bgrt_init(void)
> >bgrt_tab->header.length, sizeof(*bgrt_tab));
> > return;
> > }
> > -   if (bgrt_tab->version != 1) {
> > -   pr_notice("Ignoring BGRT: invalid version %u (expected 1)\n",
> > -  bgrt_tab->version);
> > -   return;
> > -   }
> > if (bgrt_tab->status & 0xfe) {
> > pr_notice("Ignoring BGRT: reserved status bits are non-zero 
> > %u\n",
> >bgrt_tab->status);
> 
> This would be less scary if we checked for known broken and known good
> version values instead of removing the check altogether, i.e. 0 and 1.

Could we add a quirk for this broken hardware instead of changing
the normal code?

> 
> The whole point of the version field is that it tells us about the
> layout of the BGRT table, so it's not exactly a useless check.

Agreed.

Thanks
Dave
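
Matt's suggestion of checking for known-good and known-broken version values, rather than dropping the check, could look roughly like the sketch below. The function name is hypothetical and the code is standalone C, not a patch against efi-bgrt.c. The values 0 and 1 are the ones named in the quoted review.

```c
#include <stdbool.h>
#include <stdint.h>

/* Accept only the version values named in the review: 1 (known good) and
 * 0 (known broken but otherwise usable). Any other value still causes the
 * table to be ignored, since the version describes the table layout. */
bool bgrt_version_acceptable(uint16_t version)
{
	switch (version) {
	case 0:	/* known-broken value from the review comment */
	case 1:	/* known-good value */
		return true;
	default:
		return false;
	}
}
```

This keeps the layout guarantee that the version field provides while still letting the affected firmware through.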

Re: [PATCH v2 0/6] kexec_file: Add buffer hand-over for the next kernel

2016-08-16 Thread Dave Young

On 08/13/16 at 12:18am, Thiago Jung Bauermann wrote:
> Hello,
> 
> This patch series implements a mechanism which allows the kernel to pass
> on a buffer to the kernel that will be kexec'd. This buffer is passed
> as a segment which is added to the kimage when it is being prepared
> by kexec_file_load.
> 
> How the second kernel is informed of this buffer is architecture-specific.
> On powerpc, this is done via the device tree, by checking
> the properties /chosen/linux,kexec-handover-buffer-start and
> /chosen/linux,kexec-handover-buffer-end, which is analogous to how the
> kernel finds the initrd.
> 
> This is needed because the Integrity Measurement Architecture subsystem
> needs to preserve its measurement list across the kexec reboot. The
> following patch series for the IMA subsystem uses this feature for that
> purpose:
> 
> https://lists.infradead.org/pipermail/kexec/2016-August/016745.html
> 
> This is so that IMA can implement trusted boot support on the OpenPower
> platform, because on such systems an intermediary Linux instance running
> as part of the firmware is used to boot the target operating system via
> kexec. Using this mechanism, IMA on this intermediary instance can
> hand over to the target OS the measurements of the components that were
> used to boot it.
> 
> Because there could be additional measurement events between the
> kexec_file_load call and the actual reboot, IMA needs a way to update the
> buffer with those additional events before rebooting. One can minimize
> the interval between the kexec_file_load and the reboot syscalls, but as
> small as it can be, there is always the possibility that the measurement
> list will be out of date at the time of reboot.
> 
> To address this issue, this patch series also introduces
> kexec_update_segment, which allows a reboot notifier to change the
> contents of the image segment during the reboot process.
> 
> Patch 5 makes kimage_load_normal_segment and kexec_update_segment share
> code. It's not much code that they can share though, so I'm not sure if
> the result is actually better.
> 
> The last patch is not intended to be merged, it just demonstrates how
> this feature can be used.
> 
> This series applies on top of v5 of the "kexec_file_load implementation
> for PowerPC" patch series (which applies on top of v4.8-rc1):
> 
> https://lists.infradead.org/pipermail/kexec/2016-August/016843.html

I'm trying to review your patches, but it seems I cannot apply them
cleanly to either the mainline kernel or v4.8-rc1.

Applying the kexec_file_load series on v4.8-rc1 failed as below:

Applying: kexec_file: Allow arch-specific memory walking for kexec_add_buffer
error: patch failed: include/linux/kexec.h:149
error: include/linux/kexec.h: patch does not apply
Patch failed at 0001 kexec_file: Allow arch-specific memory walking for kexec_add_buffer
The copy of the patch that failed is found in: .git/rebase-apply/patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

In what order should the following three patch series be applied?

[PATCH v2 0/2] extend kexec_file_load system call
[PATCH v5 00/13] kexec_file_load implementation for PowerPC
[PATCH v2 0/6] kexec_file: Add buffer hand-over for the next kernel

Do they depend on other patches?

Thanks
Dave

Re: [PATCH] kexec: Account crashk_low_res to kexec_crash_size

2016-08-16 Thread Dave Young

Hi,

On 08/15/16 at 04:05pm, Xunlei Pang wrote:
> On 2016/08/15 at 15:17, Dave Young wrote:
> > Hi Xunlei,
> >
> > On 08/13/16 at 04:26pm, Xunlei Pang wrote:
> >> "/sys/kernel/kexec_crash_size" only includes crashk_res, it
> >> is fine in most cases, but sometimes we have crashk_low_res.
> >> For example, when "crashkernel=size[KMG],high" combined with
> >> "crashkernel=size[KMG],low" is used for 64-bit x86.
> >>
> >> Let "/sys/kernel/kexec_crash_size" reflect all the reserved
> >> memory including crashk_low_res, this is more understandable
> >> from its naming.
> > Maybe export another file for the kexec_crash_low_size so that
> > we can clearly get how much the low area is.
> 
> I'm fine with it.
> 
> >> Although we can get all the crash memory from "/proc/iomem"
> >> by filtering all "Crash kernel" keyword, it is more convenient
> >> to utilize this file, and the two ways should stay consistent.
> > Shrinking the low area does not make much sense; one may either use it or
> > shrink it to 0.
> >
> > Actually, thinking more about it, crashk_low is only for x86, so
> > it might be even better to move it to x86 code instead of keeping
> > it in common code.
> >
> > Opinion?
> 
> crashk_low is defined in kernel/kexec_core.c; it is an
> architecture-independent definition. Though it is only used by x86
> currently, maybe it can be used by others in the future.
> That is why I'm not handling it specifically for x86.

OK, we can live with it, since it has been in common code from the very
beginning, but I doubt that any other arch will use it.

> 
> I just tested the original proc interface further, and it can be shrunk
> to zero.
> So I guess we can ease the restriction on shrinking the low area as well.
> 
> What do you think?

Ok, agreed.
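
For reference, the behaviour we seem to be converging on (the low area
reported separately and shrinkable on its own, including down to zero) can
be modelled in user space roughly as below. This is only a sketch:
struct res_range, range_size() and range_shrink() are made-up names, not
kernel APIs, and the real code would also have to hold kexec_mutex and
hand the freed memory back.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the kernel's struct resource.
 * 'end' is inclusive; end == 0 is treated as "nothing reserved". */
struct res_range {
	unsigned long start;
	unsigned long end;
};

/* Same size convention as the quoted patch: an unset region reports 0. */
static unsigned long range_size(const struct res_range *r)
{
	assert(r != NULL);
	return (r->end == 0) ? 0 : r->end - r->start + 1;
}

/* Shrink one region to new_size bytes, keeping its start address.
 * Returns 0 on success and -1 if new_size would grow the region. */
static int range_shrink(struct res_range *r, unsigned long new_size)
{
	unsigned long old_size = range_size(r);

	if (new_size > old_size)
		return -1;

	if (new_size == 0) {
		/* Released completely: mark the region as unset. */
		r->start = 0;
		r->end = 0;
	} else {
		r->end = r->start + new_size - 1;
	}
	return 0;
}
```

Each sysfs file would then read range_size() of its own region, and a write
would call range_shrink() on that region only, so shrinking the low area
never touches crashk_res and vice versa.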

Thanks
Dave

> 
> Regards,
> Xunlei
> 
> >
> > Thanks
> > Dave
> >> Note that write to "/sys/kernel/kexec_crash_size" is to shrink
> >> the reserved memory, and we want to shrink crashk_res only.
> >> So we add some additional check in crash_shrink_memory() since
> >> crashk_low_res now is involved.
> >>
> >> Signed-off-by: Xunlei Pang <xlp...@redhat.com>
> >> ---
> >>  kernel/kexec_core.c | 15 ++-
> >>  1 file changed, 14 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> >> index 5616755..d5ae780 100644
> >> --- a/kernel/kexec_core.c
> >> +++ b/kernel/kexec_core.c
> >> @@ -932,6 +932,8 @@ size_t crash_get_memory_size(void)
> >>  	mutex_lock(&kexec_mutex);
> >>  	if (crashk_res.end != crashk_res.start)
> >>  		size = resource_size(&crashk_res);
> >> +	if (crashk_low_res.end != crashk_low_res.start)
> >> +		size += resource_size(&crashk_low_res);
> >>  	mutex_unlock(&kexec_mutex);
> >>  	return size;
> >>  }
> >> @@ -949,7 +951,7 @@ int crash_shrink_memory(unsigned long new_size)
> >>  {
> >>  	int ret = 0;
> >>  	unsigned long start, end;
> >> -	unsigned long old_size;
> >> +	unsigned long low_size, old_size;
> >>  	struct resource *ram_res;
> >>  
> >>  	mutex_lock(&kexec_mutex);
> >> @@ -958,6 +960,17 @@ int crash_shrink_memory(unsigned long new_size)
> >>  		ret = -ENOENT;
> >>  		goto unlock;
> >>  	}
> >> +
> >> +	start = crashk_low_res.start;
> >> +	end = crashk_low_res.end;
> >> +	low_size = (end == 0) ? 0 : end - start + 1;
> >> +	/* Do not shrink crashk_low_res. */
> >> +	if (new_size <= low_size) {
> >> +		ret = -EINVAL;
> >> +		goto unlock;
> >> +	}
> >> +
> >> +	new_size -= low_size;
> >>  	start = crashk_res.start;
> >>  	end = crashk_res.end;
> >>  	old_size = (end == 0) ? 0 : end - start + 1;
> >> -- 
> >> 1.8.3.1
> >>
> >>
> >> ___
> >> kexec mailing list
> >> ke...@lists.infradead.org
> >> http://lists.infradead.org/mailman/listinfo/kexec
> > ___
> > kexec mailing list
> > ke...@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/kexec
>
