Re: [PATCH v22 2/8] arm64: limit memory regions based on DT property, usable-memory-range

2016-07-20 Thread AKASHI Takahiro
James,

On Mon, Jul 18, 2016 at 07:04:33PM +0100, James Morse wrote:
> Hi!
> 
> (CC: Dennis Chen)
> 
> On 12/07/16 06:05, AKASHI Takahiro wrote:
> > Crash dump kernel will be run with a limited range of memory as System
> > RAM.
> > 
> > On arm64, we will use a device-tree property under /chosen,
> >linux,usable-memory-range = 
> > in order for primary kernel either on uefi or non-uefi (device tree only)
> > system to hand over the information about usable memory region to crash
> > dump kernel. This property will supercede entries in uefi memory map table
> > and "memory" nodes in a device tree.
> 
> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > index 51b1302..d8b296f 100644
> > --- a/arch/arm64/mm/init.c
> > +++ b/arch/arm64/mm/init.c
> > @@ -300,10 +300,48 @@ static int __init early_mem(char *p)
> >  }
> >  early_param("mem", early_mem);
> >  
> > +static int __init early_init_dt_scan_usablemem(unsigned long node,
> > +   const char *uname, int depth, void *data)
> > +{
> > +   struct memblock_region *usablemem = (struct memblock_region *)data;
> > +   const __be32 *reg;
> > +   int len;
> > +
> > +   usablemem->size = 0;
> > +
> > +   if (depth != 1 || strcmp(uname, "chosen") != 0)
> > +   return 0;
> > +
> > +   reg = of_get_flat_dt_prop(node, "linux,usable-memory-range", );
> > +   if (!reg || (len < (dt_root_addr_cells + dt_root_size_cells)))
> > +   return 1;
> > +
> > +   usablemem->base = dt_mem_next_cell(dt_root_addr_cells, );
> > +   usablemem->size = dt_mem_next_cell(dt_root_size_cells, );
> > +
> > +   return 1;
> > +}
> > +
> > +static void __init fdt_enforce_memory_region(void)
> > +{
> > +   struct memblock_region reg;
> > +
> > +   of_scan_flat_dt(early_init_dt_scan_usablemem, );
> > +
> > +   if (reg.size) {
> > +   memblock_remove(0, PAGE_ALIGN(reg.base));
> > +   memblock_remove(round_down(reg.base + reg.size, PAGE_SIZE),
> > +   ULLONG_MAX);
> 
> I think this is a new way to trip the problem Dennis Chen has been working on
> [0]. If I kdump with --reuse-cmdline on a kernel booted with 'acpi=on', I get
> the panic below [1]...
> 
> It looks like Dennis's fix involves changes in mm/memblock.c, maybe they can 
> be
> extended to support a range instead of just a limit?

Could you please apply the diff attached below and confirm that
kdump works in your environment?
I can't test it by myself since my hikey board seems to be broken now.

Thanks,
-Takahiro AKASHI

> (It looks like x86 explicitly adds the acpi regions to the crash-kernels 
> memory
> map in crash_setup_memmap_entries()).
> 
> 
> 
> Is it possible for the kernel text to be outside this range? (a bug in
> kexec-tools, or another user of the DT property) If we haven't already failed 
> in
> this case, it may be worth printing a warning, or refusing to
> restrict-memory/expose-vmcore.
> 
> 
> 
> Thanks,
> 
> James
> 
> 
> [0] 
> http://lists.infradead.org/pipermail/linux-arm-kernel/2016-July/443356.html
> [1]
> [0.00] efi: Getting EFI parameters from FDT:
> [0.00] efi: EFI v2.50 by ARM Juno EFI Nov 24 2015 12:36:35
> [0.00] efi:  ACPI=0xf95b  ACPI 2.0=0xf95b0014  PROP=0xfe8db4d8
> [0.00] Reserving 1KB of memory at 0x9f000 for elfcorehdr
> [0.00] cma: Reserved 16 MiB at 0x0009fec0
> [0.00] ACPI: Early table checksum verification disabled
> [0.00] ACPI: RSDP 0xF95B0014 24 (v02 ARMLTD)
> [0.00] ACPI: XSDT 0xF95A00E8 4C (v01 ARMLTD ARM-JUNO 
> 2014072
> 7  0113)
> [0.00] ACPI: FACP 0xF950 00010C (v05 ARMLTD ARM-JUNO 
> 2014072
> 7 ARM  0099)
> [0.00] ACPI: DSDT 0xF94C 000396 (v01 ARMLTD ARM-JUNO 
> 2014072
> 7 INTL 20150619)
> [0.00] ACPI: GTDT 0xF94F 60 (v02 ARMLTD ARM-JUNO 
> 2014072
> 7 ARM  0099)
> [0.00] ACPI: APIC 0xF94E 000224 (v03 ARMLTD ARM-JUNO 
> 2014072
> 7 ARM  0099)
> [0.00] ACPI: SSDT 0xF94D 0001E3 (v01 ARMLTD ARM-JUNO 
> 2014072
> 7 INTL 20150619)
> [0.00] ACPI: MCFG 0xF94B 3C (v01 ARMLTD ARM-JUNO 
> 2014072
> 7 ARM  0099)
> ...
> [0.737577] Serial: AMBA PL011 UART driver
> [0.786086] HugeTLB registered 2 MB page size, pre-allocated 0 pages
> [0.794203] ACPI: Added _OSI(Module Device)
> [0.798659] ACPI: Added _OSI(Processor Device)
> [0.803190] ACPI: Added _OSI(3.0 _SCP Extensions)
> [0.807973] ACPI: Added _OSI(Processor Aggregator Device)
> [0.813653] Unable to handle kernel paging request at virtual address 
> 000
> 00804e027
> [0.821704] pgd = 08cce000
> [0.825155] [0804e027] *pgd=0009d003, 
> *pud=0009c003,
> *pmd=0009b003, *pte=00e8f94c0707
> [0.836319] Internal error: Oops: 9621 [#1] PREEMPT SMP
> [0.841972] Modules linked in:
> [0.845073] CPU: 2 PID: 1 Comm: swapper/0 Tainted: 

Re: [PATCH v1 3/4] arm64: Add arm64 kexec support

2016-07-20 Thread Pratyush Anand
Hi Geoff,

On 19/07/2016:11:28:13 PM, Geoff Levand wrote:
> Add kexec reboot support for ARM64 platforms.
> 
> Signed-off-by: Geoff Levand 
> ---
>  configure.ac|   3 +
>  kexec/Makefile  |   1 +
>  kexec/arch/arm64/Makefile   |  40 ++
>  kexec/arch/arm64/crashdump-arm64.c  |  21 +
>  kexec/arch/arm64/crashdump-arm64.h  |  12 +
>  kexec/arch/arm64/image-header.h |  94 +++
>  kexec/arch/arm64/include/arch/options.h |  43 ++
>  kexec/arch/arm64/kexec-arm64.c  | 995 
> 
>  kexec/arch/arm64/kexec-arm64.h  |  58 ++
>  kexec/arch/arm64/kexec-elf-arm64.c  | 130 +
>  kexec/arch/arm64/kexec-image-arm64.c|  44 ++
>  kexec/kexec-syscall.h   |   8 +-
>  purgatory/Makefile  |   1 +
>  purgatory/arch/arm64/Makefile   |  18 +
>  purgatory/arch/arm64/entry.S|  59 ++
>  purgatory/arch/arm64/purgatory-arm64.c  |  35 ++
>  16 files changed, 1560 insertions(+), 2 deletions(-)
>  create mode 100644 kexec/arch/arm64/Makefile
>  create mode 100644 kexec/arch/arm64/crashdump-arm64.c
>  create mode 100644 kexec/arch/arm64/crashdump-arm64.h
>  create mode 100644 kexec/arch/arm64/image-header.h
>  create mode 100644 kexec/arch/arm64/include/arch/options.h
>  create mode 100644 kexec/arch/arm64/kexec-arm64.c
>  create mode 100644 kexec/arch/arm64/kexec-arm64.h
>  create mode 100644 kexec/arch/arm64/kexec-elf-arm64.c
>  create mode 100644 kexec/arch/arm64/kexec-image-arm64.c
>  create mode 100644 purgatory/arch/arm64/Makefile
>  create mode 100644 purgatory/arch/arm64/entry.S
>  create mode 100644 purgatory/arch/arm64/purgatory-arm64.c
> 
> diff --git a/configure.ac b/configure.ac
> index ca3a9d5..8858c94 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -34,6 +34,9 @@ case $target_cpu in
>   ARCH="ppc64"
>   SUBARCH="LE"
>   ;;
> + aarch64* )
> + ARCH="arm64"
> + ;;
>   arm* )
>   ARCH="arm"
>   ;;
> diff --git a/kexec/Makefile b/kexec/Makefile
> index cc3f08b..39f365f 100644
> --- a/kexec/Makefile
> +++ b/kexec/Makefile
> @@ -79,6 +79,7 @@ KEXEC_SRCS  += $($(ARCH)_DT_OPS)
>  
>  include $(srcdir)/kexec/arch/alpha/Makefile
>  include $(srcdir)/kexec/arch/arm/Makefile
> +include $(srcdir)/kexec/arch/arm64/Makefile
>  include $(srcdir)/kexec/arch/i386/Makefile
>  include $(srcdir)/kexec/arch/ia64/Makefile
>  include $(srcdir)/kexec/arch/m68k/Makefile
> diff --git a/kexec/arch/arm64/Makefile b/kexec/arch/arm64/Makefile
> new file mode 100644
> index 000..37414dc
> --- /dev/null
> +++ b/kexec/arch/arm64/Makefile
> @@ -0,0 +1,40 @@
> +
> +arm64_FS2DT += kexec/fs2dt.c
> +arm64_FS2DT_INCLUDE += -include $(srcdir)/kexec/arch/arm64/kexec-arm64.h \
> + -include $(srcdir)/kexec/arch/arm64/crashdump-arm64.h
> +
> +arm64_DT_OPS += kexec/dt-ops.c
> +
> +arm64_CPPFLAGS += -I $(srcdir)/kexec/
> +
> +arm64_KEXEC_SRCS += \
> + kexec/arch/arm64/kexec-arm64.c \
> + kexec/arch/arm64/kexec-image-arm64.c \
> + kexec/arch/arm64/kexec-elf-arm64.c \
> + kexec/arch/arm64/crashdump-arm64.c
> +
> +arm64_ARCH_REUSE_INITRD =
> +arm64_ADD_SEGMENT =
> +arm64_VIRT_TO_PHYS =
> +arm64_PHYS_TO_VIRT =
> +
> +dist += $(arm64_KEXEC_SRCS) \
> + kexec/arch/arm64/Makefile \
> + kexec/arch/arm64/kexec-arm64.h \
> + kexec/arch/arm64/crashdump-arm64.h
> +
> +ifdef HAVE_LIBFDT
> +
> +LIBS += -lfdt
> +
> +else
> +
> +include $(srcdir)/kexec/libfdt/Makefile.libfdt
> +
> +libfdt_SRCS += $(LIBFDT_SRCS:%=kexec/libfdt/%)
> +
> +arm64_CPPFLAGS += -I$(srcdir)/kexec/libfdt
> +
> +arm64_KEXEC_SRCS += $(libfdt_SRCS)
> +
> +endif
> diff --git a/kexec/arch/arm64/crashdump-arm64.c 
> b/kexec/arch/arm64/crashdump-arm64.c
> new file mode 100644
> index 000..d2272c8
> --- /dev/null
> +++ b/kexec/arch/arm64/crashdump-arm64.c
> @@ -0,0 +1,21 @@
> +/*
> + * ARM64 crashdump.
> + */
> +
> +#define _GNU_SOURCE
> +
> +#include 
> +#include 
> +
> +#include "kexec.h"
> +#include "crashdump.h"
> +#include "crashdump-arm64.h"
> +#include "kexec-arm64.h"
> +#include "kexec-elf.h"
> +
> +struct memory_ranges usablemem_rgns = {};
> +
> +int is_crashkernel_mem_reserved(void)
> +{
> + return 0;
> +}
> diff --git a/kexec/arch/arm64/crashdump-arm64.h 
> b/kexec/arch/arm64/crashdump-arm64.h
> new file mode 100644
> index 000..f33c7a2
> --- /dev/null
> +++ b/kexec/arch/arm64/crashdump-arm64.h
> @@ -0,0 +1,12 @@
> +/*
> + * ARM64 crashdump.
> + */
> +
> +#if !defined(CRASHDUMP_ARM64_H)

Although its a personal choice, but IMHO using #ifndef is better when we check
single define.

> +#define CRASHDUMP_ARM64_H
> +
> +#include "kexec.h"
> +
> +extern struct memory_ranges usablemem_rgns;
> +
> +#endif
> diff --git a/kexec/arch/arm64/image-header.h b/kexec/arch/arm64/image-header.h
> new file mode 100644
> index 000..d766f18
> --- /dev/null
> +++ 

[PATCH v1.2 3/4] arm64: Add arm64 kexec support

2016-07-20 Thread Geoff Levand
Add kexec reboot support for ARM64 platforms.

Signed-off-by: Geoff Levand 
---
 configure.ac|   3 +
 kexec/Makefile  |   1 +
 kexec/arch/arm64/Makefile   |  40 ++
 kexec/arch/arm64/crashdump-arm64.c  |  21 +
 kexec/arch/arm64/crashdump-arm64.h  |  12 +
 kexec/arch/arm64/image-header.h | 104 
 kexec/arch/arm64/include/arch/options.h |  43 ++
 kexec/arch/arm64/kexec-arm64.c  | 979 
 kexec/arch/arm64/kexec-arm64.h  |  70 +++
 kexec/arch/arm64/kexec-elf-arm64.c  | 130 +
 kexec/arch/arm64/kexec-image-arm64.c|  44 ++
 kexec/kexec-syscall.h   |   8 +-
 purgatory/Makefile  |   1 +
 purgatory/arch/arm64/Makefile   |  18 +
 purgatory/arch/arm64/entry.S|  52 ++
 purgatory/arch/arm64/purgatory-arm64.c  |  35 ++
 16 files changed, 1559 insertions(+), 2 deletions(-)
 create mode 100644 kexec/arch/arm64/Makefile
 create mode 100644 kexec/arch/arm64/crashdump-arm64.c
 create mode 100644 kexec/arch/arm64/crashdump-arm64.h
 create mode 100644 kexec/arch/arm64/image-header.h
 create mode 100644 kexec/arch/arm64/include/arch/options.h
 create mode 100644 kexec/arch/arm64/kexec-arm64.c
 create mode 100644 kexec/arch/arm64/kexec-arm64.h
 create mode 100644 kexec/arch/arm64/kexec-elf-arm64.c
 create mode 100644 kexec/arch/arm64/kexec-image-arm64.c
 create mode 100644 purgatory/arch/arm64/Makefile
 create mode 100644 purgatory/arch/arm64/entry.S
 create mode 100644 purgatory/arch/arm64/purgatory-arm64.c

diff --git a/configure.ac b/configure.ac
index ca3a9d5..8858c94 100644
--- a/configure.ac
+++ b/configure.ac
@@ -34,6 +34,9 @@ case $target_cpu in
ARCH="ppc64"
SUBARCH="LE"
;;
+   aarch64* )
+   ARCH="arm64"
+   ;;
arm* )
ARCH="arm"
;;
diff --git a/kexec/Makefile b/kexec/Makefile
index cc3f08b..39f365f 100644
--- a/kexec/Makefile
+++ b/kexec/Makefile
@@ -79,6 +79,7 @@ KEXEC_SRCS+= $($(ARCH)_DT_OPS)
 
 include $(srcdir)/kexec/arch/alpha/Makefile
 include $(srcdir)/kexec/arch/arm/Makefile
+include $(srcdir)/kexec/arch/arm64/Makefile
 include $(srcdir)/kexec/arch/i386/Makefile
 include $(srcdir)/kexec/arch/ia64/Makefile
 include $(srcdir)/kexec/arch/m68k/Makefile
diff --git a/kexec/arch/arm64/Makefile b/kexec/arch/arm64/Makefile
new file mode 100644
index 000..37414dc
--- /dev/null
+++ b/kexec/arch/arm64/Makefile
@@ -0,0 +1,40 @@
+
+arm64_FS2DT += kexec/fs2dt.c
+arm64_FS2DT_INCLUDE += -include $(srcdir)/kexec/arch/arm64/kexec-arm64.h \
+   -include $(srcdir)/kexec/arch/arm64/crashdump-arm64.h
+
+arm64_DT_OPS += kexec/dt-ops.c
+
+arm64_CPPFLAGS += -I $(srcdir)/kexec/
+
+arm64_KEXEC_SRCS += \
+   kexec/arch/arm64/kexec-arm64.c \
+   kexec/arch/arm64/kexec-image-arm64.c \
+   kexec/arch/arm64/kexec-elf-arm64.c \
+   kexec/arch/arm64/crashdump-arm64.c
+
+arm64_ARCH_REUSE_INITRD =
+arm64_ADD_SEGMENT =
+arm64_VIRT_TO_PHYS =
+arm64_PHYS_TO_VIRT =
+
+dist += $(arm64_KEXEC_SRCS) \
+   kexec/arch/arm64/Makefile \
+   kexec/arch/arm64/kexec-arm64.h \
+   kexec/arch/arm64/crashdump-arm64.h
+
+ifdef HAVE_LIBFDT
+
+LIBS += -lfdt
+
+else
+
+include $(srcdir)/kexec/libfdt/Makefile.libfdt
+
+libfdt_SRCS += $(LIBFDT_SRCS:%=kexec/libfdt/%)
+
+arm64_CPPFLAGS += -I$(srcdir)/kexec/libfdt
+
+arm64_KEXEC_SRCS += $(libfdt_SRCS)
+
+endif
diff --git a/kexec/arch/arm64/crashdump-arm64.c 
b/kexec/arch/arm64/crashdump-arm64.c
new file mode 100644
index 000..d2272c8
--- /dev/null
+++ b/kexec/arch/arm64/crashdump-arm64.c
@@ -0,0 +1,21 @@
+/*
+ * ARM64 crashdump.
+ */
+
+#define _GNU_SOURCE
+
+#include 
+#include 
+
+#include "kexec.h"
+#include "crashdump.h"
+#include "crashdump-arm64.h"
+#include "kexec-arm64.h"
+#include "kexec-elf.h"
+
+struct memory_ranges usablemem_rgns = {};
+
+int is_crashkernel_mem_reserved(void)
+{
+   return 0;
+}
diff --git a/kexec/arch/arm64/crashdump-arm64.h 
b/kexec/arch/arm64/crashdump-arm64.h
new file mode 100644
index 000..f33c7a2
--- /dev/null
+++ b/kexec/arch/arm64/crashdump-arm64.h
@@ -0,0 +1,12 @@
+/*
+ * ARM64 crashdump.
+ */
+
+#if !defined(CRASHDUMP_ARM64_H)
+#define CRASHDUMP_ARM64_H
+
+#include "kexec.h"
+
+extern struct memory_ranges usablemem_rgns;
+
+#endif
diff --git a/kexec/arch/arm64/image-header.h b/kexec/arch/arm64/image-header.h
new file mode 100644
index 000..126ca15
--- /dev/null
+++ b/kexec/arch/arm64/image-header.h
@@ -0,0 +1,104 @@
+/*
+ * ARM64 binary image support.
+ */
+
+#if !defined(__ARM64_IMAGE_HEADER_H)
+#define __ARM64_IMAGE_HEADER_H
+
+#if !defined(__KERNEL__)
+#include 
+#include 
+#endif
+
+#if !defined(__ASSEMBLY__)
+
+/**
+ * struct arm64_image_header - arm64 kernel image header.
+ *
+ * @pe_sig: Optional PE format 'MZ' signature.
+ * @branch_code: Reserved for instructions to branch to 

Re: [PATCH v1 3/4] arm64: Add arm64 kexec support

2016-07-20 Thread Geoff Levand
Hi Pratyush,

On Wed, 2016-07-20 at 23:23 +0530, Pratyush Anand wrote:
> On 19/07/2016:11:28:13 PM, Geoff Levand wrote:

> > +++ b/kexec/arch/arm64/include/arch/options.h
> > @@ -0,0 +1,43 @@
> > +#if !defined(KEXEC_ARCH_ARM64_OPTIONS_H)
> > +#define KEXEC_ARCH_ARM64_OPTIONS_H
> > +
> > +#define OPT_APPEND> >  > > > > ((OPT_MAX)+0)
> > +#define OPT_DTB> > > > > > > > ((OPT_MAX)+1)
> > +#define OPT_INITRD> >  > > > > ((OPT_MAX)+2)
> > +#define OPT_PORT> >> > > > ((OPT_MAX)+3)
> > +#define OPT_REUSE_CMDLINE> >   > > ((OPT_MAX)+4)
> > +#define OPT_ARCH_MAX> >> > > > ((OPT_MAX)+5)
> > +
> > +#define KEXEC_ARCH_OPTIONS \
> > +> >> > KEXEC_OPTIONS \
> > +> >> > { "append",1, NULL, OPT_APPEND }, \
> > +> >> > { "command-line",  1, NULL, OPT_APPEND }, \
> > +> >> > { "dtb",   1, NULL, OPT_DTB }, \
> > +> >> > { "initrd",1, NULL, OPT_INITRD }, \
> > +> >> > { "port",  1, NULL, OPT_PORT }, \
> 
> I still think that we should have a way to check TX buffer overflow..Anyway, I
> will send top up patch for that when these patch set are merged.

I was very tempted to just not support the purgatory printing,
as other arches do, because of all the trouble with different
UARTs.  Is it really so important to see the 'I'm in purgatory'
message?

> > +> >> > { "ramdisk",   1, NULL, OPT_INITRD }, \
> > +> >> > { "reuse-cmdline", 0, NULL, OPT_REUSE_CMDLINE }, \
> > +
> > +#define KEXEC_ARCH_OPT_STR KEXEC_OPT_STR /* Only accept long arch options. 
> > */
> > +#define KEXEC_ALL_OPTIONS KEXEC_ARCH_OPTIONS
> > +#define KEXEC_ALL_OPT_STR KEXEC_ARCH_OPT_STR
> > +
> > +static const char arm64_opts_usage[] __attribute__ ((unused)) =
> > +" --append=STRING   Set the kernel command line to STRING.\n"
> 
> "Update the kernel command line with STRING.\n" may be a better description.

That is the text the other arches use, so I'll keep it for consistency.

> 
> > +" --command-line=STRING Set the kernel command line to STRING.\n"
> > +" --dtb=FILEUse FILE as the device tree blob.\n"
> > +" --initrd=FILE Use FILE as the kernel initial ramdisk.\n"
> > +" --port=ADDRESSPurgatory output to port ADDRESS.\n"
> > +" --ramdisk=FILEUse FILE as the kernel initial ramdisk.\n"
> > +" --reuse-cmdline   Use kernel command line from running 
> > system.\n";
> > +
> > +
> > +static int check_cpu_properties(const struct cpu_properties *cp_1,
> > +> >> > const struct cpu_properties *cp_2)
> > +{
> > +> >> > assert(cp_1->hwid == cp_2->hwid);
> > +
> > +> >> > if (cp_1->method != cp_2->method) {
> > +> >> > > > fprintf(stderr,
> > +> >> > > > > > "%s:%d: hwid-%" PRIx64 ": Error: Different 
> > cpu enable methods: %s -> %s\n",
> > +> >> > > > > > __func__, __LINE__, cp_1->hwid,
> > +> >> > > > > > cpu_enable_method_str(cp_1->method),
> > +> >> > > > > > cpu_enable_method_str(cp_2->method));
> > +> >> > > > return -EINVAL;
> > +> >> > }
> > +
> > +> >> > if (cp_2->method != cpu_enable_method_psci) {
> > +> >> > > > fprintf(stderr,
> > +> >> > > > > > "%s:%d: hwid-%" PRIx64 ": Error: 
> > Unsupported cpu enable method: %s\n",
> > +> >> > > > > > __func__, __LINE__, cp_1->hwid,
> > +> >> > > > > > cpu_enable_method_str(cp_1->method));
> > +> >> > > > return -EINVAL;
> > +> >> > }
> 
> What if cp_1->method = cp_2->method = cpu_enable_method_spin_table?
> 
> I think, second if loop should be within 1st loop's scope.

There is no way to get a cpu back into a spin, so spin_table is not supported.

> +>> for (cpu_1 = 0; cpu_1 < info_1.cpu_count; cpu_1++) {
> > +> >> > > > struct cpu_properties *cp_1 = _1.cp[cpu_1];
> > +> >> > > > unsigned int cpu_2;
> > +
> > +> >> > > > for (cpu_2 = 0; cpu_2 < info_2.cpu_count; cpu_2++) {
> > +> >> > > > > > struct cpu_properties *cp_2 = 
> > _2.cp[cpu_2];
> > +
> > +> >> > > > > > if (cp_1->hwid != cp_2->hwid)
> > +> >> > > > > > > > continue;
> > +
> > +> >> > > > > > to_process--;
> > +
> > +> >> > > > > > result = check_cpu_properties(cp_1, cp_2);
> > +
> > +> >> > > > > > if (result)
> > +> >> > > > > > > > goto on_exit;
> 
> I think, you can break the loop when cp_1->hwid and cp_2->hwid matches.

OK.

> +int arm64_load_other_segments(struct kexec_info *info,
> > +> >> > uint64_t kernel_entry)
> > +{
> > +> >> > int result;
> > +> >> > uint64_t dtb_base;
> > +> >> > uint64_t image_base;
> > +> >> > unsigned long hole_min;
> > +> >> > unsigned long hole_max;
> > +> >> > uint64_t 

Re: [PATCH v1 3/4] arm64: Add arm64 kexec support

2016-07-20 Thread Geoff Levand
Hi Mark,

On Wed, 2016-07-20 at 16:39 +0100, Mark Rutland wrote:
> On Tue, Jul 19, 2016 at 11:28:13PM +, Geoff Levand wrote:
> > +/**
> > + * struct arm64_image_header - arm64 kernel image header.
> > + *
> > + * @pe_sig: Optional PE format 'MZ' signature.
> > + * @branch_code: Reserved for instructions to branch to stext.
> > + * @text_offset: The image load offset in LSB byte order.
> > + * @image_size: An estimated size of the memory image size in LSB byte 
> > order.
> > + * @flags: Bit flags:
> > + *  Bit 7.0: Image byte order, 1=MSB.
> > + * @reserved_1: Reserved.
> > + * @magic: Magic number, "ARM\x64".
> > + * @pe_header: Optional offset to a PE format header.
> > + **/
> > +
> > +struct arm64_image_header {
> > +> >> > uint8_t pe_sig[2];
> > +> >> > uint16_t branch_code[3];
> > +> >> > uint64_t text_offset;
> > +> >> > uint64_t image_size;
> > +> >> > uint8_t flags[8];
> 
> The flags field is a 64-bit quantity, and it's rather confusing to treat
> it as something else.
> 
> I think it would be better to have it as a uint64_t, and use explicit
> endianness conversion as necessary to swizzle it. I beleive that's less
> confusing than grabbing individual bytes.
> 
> > +static const uint64_t arm64_image_flag_7_be = 0x01U;
> 
> For this we could have:
> 
> #define ARM64_IMAGE_FLAG_BE>  >   > (1UL << 0)

Sure, we can do it that way.

> > +static inline int arm64_header_check_magic(const struct arm64_image_header 
> > *h)
> > +{
> > +> >> > if (!h)
> > +> >> > > > return 0;
> > +
> > +> >> > if (!h->text_offset)
> > +> >> > > > return 0;
> 
> I believe that with CONFIG_RANDOMIZE_TEXT_OFFSET, it is possible that
> text_offset is 0.
> Regardless, I'm not sure I follow the point of this check; why isn't
> checking the magic sufficient?

I'll remove it.

> > +
> > +> >> > return (h->magic[0] == arm64_image_magic[0]
> > +> >> > > > && h->magic[1] == arm64_image_magic[1]
> > +> >> > > > && h->magic[2] == arm64_image_magic[2]
> > +> >> > > > && h->magic[3] == arm64_image_magic[3]);
> > +}
> 
> > +static inline int arm64_header_check_msb(const struct arm64_image_header 
> > *h)
> > +{
> > +> >> > if (!h)
> > +> >> > > > return 0;
> > +
> > +> >> > return !!(h->flags[7] & arm64_image_flag_7_be);
> > +}
> 
> As above, I think this would be better as the below, perhaps wrapped
> with !! if people don't like implicit bool conversion.
> 
> static inline bool arm64_header_is_be(const struct arm64_image_header *h)
> {
>   > return le64_to_cpu(h->flags) & ARM64_IMAGE_FLAG_BE;
> }
> 
> > +static int check_cpu_properties(const struct cpu_properties *cp_1,
> > +> >> > const struct cpu_properties *cp_2)
> > +{
> > +> >> > assert(cp_1->hwid == cp_2->hwid);
> > +
> > +> >> > if (cp_1->method != cp_2->method) {
> > +> >> > > > fprintf(stderr,
> > +> >> > > > > > "%s:%d: hwid-%" PRIx64 ": Error: Different 
> > cpu enable methods: %s -> %s\n",
> > +> >> > > > > > __func__, __LINE__, cp_1->hwid,
> > +> >> > > > > > cpu_enable_method_str(cp_1->method),
> > +> >> > > > > > cpu_enable_method_str(cp_2->method));
> > +> >> > > > return -EINVAL;
> > +> >> > }
> > +
> > +> >> > if (cp_2->method != cpu_enable_method_psci) {
> > +> >> > > > fprintf(stderr,
> > +> >> > > > > > "%s:%d: hwid-%" PRIx64 ": Error: 
> > Unsupported cpu enable method: %s\n",
> > +> >> > > > > > __func__, __LINE__, cp_1->hwid,
> > +> >> > > > > > cpu_enable_method_str(cp_1->method));
> > +> >> > > > return -EINVAL;
> > +> >> > }
> > +
> > +> >> > dbgprintf("%s: hwid-%" PRIx64 ": OK\n", __func__, 
> > cp_1->hwid);
> > +
> > +> >> > return 0;
> > +}
> 
> Does this really matter to userspace?
>
> I agree that it makes sense to warn the user that kexec might not be
> possible, but producing an error and failing doesn't seem right. Who
> knows what the kernel might support in future?

As of now, we just ignore the return values of the call and
continue on. 
 
> > +static uint64_t read_sink(const char *command_line)
> > +{
> > +> >> > uint64_t v;
> > +> >> > const char *p;
> > +
> > +> >> > if (arm64_opts.port)
> > +> >> > > > return arm64_opts.port;
> > +
> > +#if defined(ARM64_DEBUG_PORT)
> > +> >> > return (uint64_t)(ARM64_DEBUG_PORT);
> > +#endif
> > +> >> > if (!command_line)
> > +> >> > > > return 0;
> > +
> > +> >> > if (!(p = strstr(command_line, "earlyprintk=")) &&
> > +> >> > > > !(p = strstr(command_line, "earlycon=")))
> > +> >> > > > return 0;
> > +
> > +> >> > while (*p != ',')
> > +> >> > > > p++;
> > +
> > +> >> > p++;
> > +
> > +> >> > while 

Re: IO memory read from /proc/vmcore leads to hang.

2016-07-20 Thread Daniel Walker


Mahesh, I didn't get your email for some reason . I saw it in the Archives.

makedumpfile doesn't appear to have a way to drop free form memory 
areas. So I need to drop 0080 to 00807fff , but I don't see a way to 
do that. Any other suggestions on how to prevent this hang ?




On 07/11/2016 02:46 PM, Daniel Walker wrote:


Hi,

I found found that on my Powerpc machine there is some IO memory which 
will cause the box to hang if I read it. It's a custom device that was 
added to the board for a special purpose.


I was looking for a way to exclude this memory from the dump, and 
while doing that I found that kexec makes a list of memory segments 
that go into the core file. I was wondering why most of the kexec 
architecture don't appear to exclude device memory like what's listed 
in /proc/iomem.


Is there a good reason why that's not the case?

Daniel



___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [RFC 0/3] extend kexec_file_load system call

2016-07-20 Thread Thiago Jung Bauermann
Am Mittwoch, 20 Juli 2016, 13:12:20 schrieb Arnd Bergmann:
> On Wednesday, July 20, 2016 8:47:45 PM CEST Michael Ellerman wrote:
> > At least for stdout-path, I can't really see how that would
> > significantly help an attacker, but I'm all ears if anyone has ideas.
> 
> That's actually an easy one that came up before: If an attacker controls
> a tty device (e.g. network console) that can be used to enter a debugger
> (kdb, kgdb, xmon, ...), enabling that to be the console device
> gives you a direct attack vector. The same thing will happen if you
> have a piece of software that intentially gives extra rights to the
> owner of the console device by treating it as "physical presence".

I think people are talking past each other a bit in these arguments about 
what is relevant to security or not.

For the kexec maintainers, kexec_file_load has one very specific and narrow 
purpose: enable Secure Boot as defined by UEFI.

And from what I understand of their arguments so far, there is one and only 
one security concern: when in Secure Boot mode, a system must not allow 
execution of unsigned code with kernel privileges. So even if one can 
specify a different root filesystem and do a lot of nasty things to the 
system with a rogue userspace in that root filesystem, as long as the kernel 
won't load unsigned modules that's not a problem as far as they're 
concerned.

Also, AFAIK attacks requiring "physical presence" are out of scope for the 
UEFI Secure Boot security model. Thus an attack that involves control of a 
console of plugging an USB device is also not a concern.

One thing I don't know is whether an attack involving a networked IPMI 
console or a USB device that can be "plugged" virtually by a managing system 
(BMC) is considered a physical attack or a remote attack in the context of 
UEFI Secure Boot.

-- 
[]'s
Thiago Jung Bauermann
IBM Linux Technology Center


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v1 3/4] arm64: Add arm64 kexec support

2016-07-20 Thread Mark Rutland
Hi,

On Tue, Jul 19, 2016 at 11:28:13PM +, Geoff Levand wrote:
> +/**
> + * struct arm64_image_header - arm64 kernel image header.
> + *
> + * @pe_sig: Optional PE format 'MZ' signature.
> + * @branch_code: Reserved for instructions to branch to stext.
> + * @text_offset: The image load offset in LSB byte order.
> + * @image_size: An estimated size of the memory image size in LSB byte order.
> + * @flags: Bit flags:
> + *  Bit 7.0: Image byte order, 1=MSB.
> + * @reserved_1: Reserved.
> + * @magic: Magic number, "ARM\x64".
> + * @pe_header: Optional offset to a PE format header.
> + **/
> +
> +struct arm64_image_header {
> + uint8_t pe_sig[2];
> + uint16_t branch_code[3];
> + uint64_t text_offset;
> + uint64_t image_size;
> + uint8_t flags[8];

The flags field is a 64-bit quantity, and it's rather confusing to treat
it as something else.

I think it would be better to have it as a uint64_t, and use explicit
endianness conversion as necessary to swizzle it. I beleive that's less
confusing than grabbing individual bytes.

> +static const uint64_t arm64_image_flag_7_be = 0x01U;

For this we could have:

#define ARM64_IMAGE_FLAG_BE (1UL << 0)

> +static inline int arm64_header_check_magic(const struct arm64_image_header 
> *h)
> +{
> + if (!h)
> + return 0;
> +
> + if (!h->text_offset)
> + return 0;

I believe that with CONFIG_RANDOMIZE_TEXT_OFFSET, it is possible that
text_offset is 0.

Regardless, I'm not sure I follow the point of this check; why isn't
checking the magic sufficient?

> +
> + return (h->magic[0] == arm64_image_magic[0]
> + && h->magic[1] == arm64_image_magic[1]
> + && h->magic[2] == arm64_image_magic[2]
> + && h->magic[3] == arm64_image_magic[3]);
> +}

> +static inline int arm64_header_check_msb(const struct arm64_image_header *h)
> +{
> + if (!h)
> + return 0;
> +
> + return !!(h->flags[7] & arm64_image_flag_7_be);
> +}

As above, I think this would be better as the below, perhaps wrapped
with !! if people don't like implicit bool conversion.

static inline bool arm64_header_is_be(const struct arm64_image_header *h)
{
return le64_to_cpu(h->flags) & ARM64_IMAGE_FLAG_BE;
}

> +static int check_cpu_properties(const struct cpu_properties *cp_1,
> + const struct cpu_properties *cp_2)
> +{
> + assert(cp_1->hwid == cp_2->hwid);
> +
> + if (cp_1->method != cp_2->method) {
> + fprintf(stderr,
> + "%s:%d: hwid-%" PRIx64 ": Error: Different cpu enable 
> methods: %s -> %s\n",
> + __func__, __LINE__, cp_1->hwid,
> + cpu_enable_method_str(cp_1->method),
> + cpu_enable_method_str(cp_2->method));
> + return -EINVAL;
> + }
> +
> + if (cp_2->method != cpu_enable_method_psci) {
> + fprintf(stderr,
> + "%s:%d: hwid-%" PRIx64 ": Error: Unsupported cpu enable 
> method: %s\n",
> + __func__, __LINE__, cp_1->hwid,
> + cpu_enable_method_str(cp_1->method));
> + return -EINVAL;
> + }
> +
> + dbgprintf("%s: hwid-%" PRIx64 ": OK\n", __func__, cp_1->hwid);
> +
> + return 0;
> +}

Does this really matter to userspace?

I agree that it makes sense to warn the user that kexec might not be
possible, but producing an error and failing doesn't seem right. Who
knows what the kernel might support in future?

> +static uint64_t read_sink(const char *command_line)
> +{
> + uint64_t v;
> + const char *p;
> +
> + if (arm64_opts.port)
> + return arm64_opts.port;
> +
> +#if defined(ARM64_DEBUG_PORT)
> + return (uint64_t)(ARM64_DEBUG_PORT);
> +#endif
> + if (!command_line)
> + return 0;
> +
> + if (!(p = strstr(command_line, "earlyprintk=")) &&
> + !(p = strstr(command_line, "earlycon=")))
> + return 0;
> +
> + while (*p != ',')
> + p++;
> +
> + p++;
> +
> + while (isspace(*p))
> + p++;

Why do we skip spaces? As far as I am aware, there should not be any
spaces in the option.

> +
> + if (*p == 0)
> + return 0;
> +
> + errno = 0;
> +
> + v = strtoull(p, NULL, 0);
> +
> + if (errno)
> + return 0;
> +
> + return v;
> +}

It looks like the purgatory code expects angel SWI as the earlycon,
whereas many other earlycons exist (with pl011 being extremely popular).

Regardless, if we assume a particular UART type, we should explicitly
verify that here. Otherwise the purgatory code will likely bring down
the system, and it will be very painful to debug.

Please explicitly check for the supported earlycon name.

> +
> +/**
> + * arm64_load_other_segments - Prepare the dtb, initrd and purgatory 
> segments.
> + */
> +
> +int arm64_load_other_segments(struct kexec_info *info,
> + uint64_t kernel_entry)
> +{
> + int result;
> + 

Re: [RFC 0/3] extend kexec_file_load system call

2016-07-20 Thread Vivek Goyal
On Wed, Jul 20, 2016 at 09:35:30AM +0100, Russell King - ARM Linux wrote:
> On Wed, Jul 20, 2016 at 01:45:42PM +1000, Balbir Singh wrote:
> > > IOW, if your kernel forced signature verification, you should not be
> > > able to do sig_enforce=0. If you kernel did not have
> > > CONFIG_MODULE_SIG_FORCE=y, then sig_enforce should be 0 by default anyway
> > > and you are not making it worse using command line.
> > 
> > OK.. I checked and you are right, but that is an example and there are
> > other things like security=, thermal.*, nosmep, nosmap that need auditing
> > for safety and might hurt the system security if used. I still think
> > think that assuming you can pass any command line without breaking security
> > is a broken argument.
> 
> Quite, and you don't need to run code in a privileged environment to do
> any of that.
> 
> It's also not trivial to protect against: new kernels gain new arguments
> which older kernels may not know about.  No matter how much protection
> is built into older kernels, newer kernels can become vulnerable through
> the addition of further arguments.

If a new kernel command line option becomes an issue, new kernel can
block that in secureboot environment. That way it helps kexec
boot as well as regular boot.

Vivek

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [RFC 0/3] extend kexec_file_load system call

2016-07-20 Thread Vivek Goyal
On Wed, Jul 20, 2016 at 01:45:42PM +1000, Balbir Singh wrote:
> >  
> > Command line options are not signed. I thought idea behind secureboot
> > was to execute only trusted code and command line options don't enforce
> > you to execute unsigned code.
> >  
> >>
> >> You can set module.sig_enforce=0 and open up the system a bit assuming
> >> that you can get a module to load with another attack
> > 
> > IIUC, sig_enforce bool_enable_only so it can only be enabled. Default
> > value of it is 0 if CONFIG_MODULE_SIG_FORCE=n.
> > 
> > IOW, if your kernel forced signature verification, you should not be
> > able to do sig_enforce=0. If you kernel did not have
> > CONFIG_MODULE_SIG_FORCE=y, then sig_enforce should be 0 by default anyway
> > and you are not making it worse using command line.
> > 
> 
> OK.. I checked and you are right, but that is an example and there are
> other things like security=, thermal.*, nosmep, nosmap that need auditing
> for safety and might hurt the system security if used. I still think
> think that assuming you can pass any command line without breaking security
> is a broken argument.

I agree that if some command line option allows running unsigned code
at ring 0, then we probably should disable that on secureboot enabled
boot.

In fact, there were bunch of patches which made things tighter on
secureboot enabled machines from matthew garrett. AFAIK, these patches
never went upstream.

Vivek

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [RFC 0/3] extend kexec_file_load system call

2016-07-20 Thread Arnd Bergmann
On Wednesday, July 20, 2016 8:47:45 PM CEST Michael Ellerman wrote:
> At least for stdout-path, I can't really see how that would significantly help
> an attacker, but I'm all ears if anyone has ideas.

That's actually an easy one that came up before: If an attacker controls
a tty device (e.g. network console) that can be used to enter a debugger
(kdb, kgdb, xmon, ...), enabling that to be the console device
gives you a direct attack vector. The same thing will happen if you
have a piece of software that intentially gives extra rights to the
owner of the console device by treating it as "physical presence".

Arnd


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [RFC 0/3] extend kexec_file_load system call

2016-07-20 Thread Michael Ellerman
Russell King - ARM Linux  writes:

> On Wed, Jul 20, 2016 at 01:45:42PM +1000, Balbir Singh wrote:
>> > IOW, if your kernel forced signature verification, you should not be
>> > able to do sig_enforce=0. If you kernel did not have
>> > CONFIG_MODULE_SIG_FORCE=y, then sig_enforce should be 0 by default anyway
>> > and you are not making it worse using command line.
>> 
>> OK.. I checked and you are right, but that is an example and there are
>> other things like security=, thermal.*, nosmep, nosmap that need auditing
>> for safety and might hurt the system security if used. I still think
>> think that assuming you can pass any command line without breaking security
>> is a broken argument.
>
> Quite, and you don't need to run code in a privileged environment to do
> any of that.
>
> It's also not trivial to protect against: new kernels gain new arguments
> which older kernels may not know about.  No matter how much protection
> is built into older kernels, newer kernels can become vulnerable through
> the addition of further arguments.

Indeed. A whitelist of allowed command line arguments is the only option.

But given the existing syscall has shipped without a whitelist of command line
arguments, you can't add a whitelist now without potentially breaking someone's
setup.

Getting back to the device tree, we could similarly have a whitelist of
nodes/properties that we allow to be passed in.

At least for stdout-path, I can't really see how that would significantly help
an attacker, but I'm all ears if anyone has ideas.

> Also, how sure are we that there are no stack overflow issues with kernel
> command line parsing?  Can we be sure that there's none?  This is
> something which happens early in the kernel boot, before the full memory
> protections have been set up.

Yeah that's also a good point. More so for the device tree, because the parsing
is more complicated. I think there has been some work done on fuzzing libfdt,
but we should probably do more.

cheers

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [RFC 0/3] extend kexec_file_load system call

2016-07-20 Thread Russell King - ARM Linux
On Wed, Jul 20, 2016 at 01:45:42PM +1000, Balbir Singh wrote:
> > IOW, if your kernel forced signature verification, you should not be
> > able to do sig_enforce=0. If you kernel did not have
> > CONFIG_MODULE_SIG_FORCE=y, then sig_enforce should be 0 by default anyway
> > and you are not making it worse using command line.
> 
> OK.. I checked and you are right, but that is an example and there are
> other things like security=, thermal.*, nosmep, nosmap that need auditing
> for safety and might hurt the system security if used. I still think
> think that assuming you can pass any command line without breaking security
> is a broken argument.

Quite, and you don't need to run code in a privileged environment to do
any of that.

It's also not trivial to protect against: new kernels gain new arguments
which older kernels may not know about.  No matter how much protection
is built into older kernels, newer kernels can become vulnerable through
the addition of further arguments.

Also, how sure are we that there are no stack overflow issues with kernel
command line parsing?  Can we be sure that there's none?  This is
something which happens early in the kernel boot, before the full memory
protections have been set up.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[ANNOUNCE] kexec-tools v2.0.13 preparation

2016-07-20 Thread Simon Horman
Hi all,

I am planning to release kexec-tools v2.0.13 around the time that
the v4.7 kernel is released.

I would like to ask interested parties to send any patches they would like
included in v2.0.13 within the next week so that I can make an rc release.

For reference the patches queued up since v2.0.12 are as follows:

8d614008609f arm: use zImage size from header
f11d833efcbf arm: plug a zImage corner case
4db7f295d596 arm: fix kernel image size
fe9b19d49be8 arm: take account of TEXT_OFFSET for subsequent images
bf759f2aa662 Remove "max" parameter comment
dfd1de046099 arm: report which ELF core format we will use
9eb4a681fde5 arm: clean up phys/page offset debug
2f10ffb193c2 arm: fix type of phys_offset
fbee2f01cddf arm: add debug of reserved and coredump memory ranges
921a8d1d5bb2 arm: add support for boot-time crash kernel resource
db8209860120 arm: add support for multiple reserved regions
3e6bd29e7b02 arm: rename crash_reserved_mem to crash_kernel_mem
e401f4732157 arm: crashdump needs boot alias of crash kernel region
92e120d8ca5e arm: add support for platforms with boot memory aliases
103656d85960 arm: move crash system RAM parsing earlier
11c52266a2e3 arm: use generic mem_region sorting implementation
a40977d6d7ed arm: parse crash_reserved_mem early
d4b51af1719d arm: add maximum number of memory ranges
66d941fccb47 arm: add memory ranges debug
ccb582a4399f arm: report if crash kernel is out of bounds
8431da92d965 arm: return proper error for missing crash kernel
a5eacd16516f arm: fix ELF32/ELF64 check
38c18bb7d1e2 arm: fix get_kernel_stext_sym() to close its file
2185c79d3bf4 arm: fix off-by-one on memory end
45251676e562 arm: fix pointer signedness warning in kexec-uImage-arm.c
61ac72aefb9e kexec: add helper to exlude a region from a set of memory ranges
7dc06a544ffd kexec: add mem_regions sorting implementation
ba1e37a8ede8 kexec: add generic helper to add to memory_regions
8d8d2229a3af kexec: phys_to_virt() must take unsigned long long
5450d34a8864 kexec: add max_size to memory_ranges
58a5eb73678e kexec: fix warnings caused by selecting 64-bit file IO on 32-bit 
platforms
1c28b55604f3 kdump: print mmap() offset in hex
cb9056e009a9 arm: fix kdump to work on LPAE systems
58afa4991bf8 kdump: fix kdump mapping
e11f8002ce69 kdump: actually write out the memory
5a5ef3b9b71c kdump: fix multiple program header entries
bab7cb1ce778 kdump: mmap() and munmap() only work on page-aligned quantites
2a36d76bbbc3 arm: Change setup_dtb_prop to create nodes by offset and node name
a4360e730675 zImage-arm: Add support for booting android images
977f0ced4343 zImage-arm: Fix a return value check that use the wrong variable
3debb8cf3272 Properly align powerpc64 .toc
1e423dc297d1 ppc64: purgatory: Handle local symbols in ELF ABIv2
4a2ae3a39c64 Pass struct mem_sym into machine_apply_elf_rel()
75f7bc488432 kexec-tools 2.0.12.git

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 2/2] kexec: add a pmd huge entry condition during the page table

2016-07-20 Thread zhong jiang
On 2016/7/14 21:19, Eric W. Biederman wrote:
> zhong jiang  writes:
>
>> On 2016/7/12 23:46, Eric W. Biederman wrote:
>>> zhongjiang  writes:
>>>
 From: zhong jiang 

 when image is loaded into kernel, we need set up page table for it. and 
 all valid pfn also set up new mapping. it will tend to establish a pmd 
 page table in the form of a large page if pud_present is true. 
 relocate_kernel 
 points to code segment can locate in the pmd huge entry in 
 init_transtion_pgtable. 
 therefore, we need to take the situation into account.
>>> I can see how in theory this might be necessary but when is a kernel virtual
>>> address on x86_64 that is above 0x8000 in conflict with an
>>> identity mapped physicall address that are all below 0x8000?
>>>
>>> If anything the code could be simplified to always assume those mappings
>>> are unoccupied.
>>>
>>> Did you run into an actual failure somewhere?
>>>
>>> Eric
>>>
>>I  do not understand what you trying to say,  Maybe I miss your point.
>>   
>>   The key is how to ensure that relocate_kernel points to the pmd
>>   entry is not huge page.
> Kernel virtual addresses are in the negative half of the address space.
> Identity mapped physical addresses are in the positive half of the
> address space.
>
> As the entire negative half of the address space at the time that page
> table entry is being created the are no huge pages present.
>
> Even testing pmd_present is a redundant, and that is probably the bug.
>
> Eric
>
> .
  ok , I know your mean.  we allocate new pgd page, that is  control_code_page,
  to rebuild new mapping machanism in init_pgtable.  because the relocate_kernel
  is in the negative half of the address space.   and The page table is not 
establise
  for the new pgd.  To my surprise,  if the page table is not exist, why we 
need check
  p(g,u,m)d_present() . if not , I still think that it can exist a pmd huge .

 or Maybe I misunderstand its meaning.

  Thanks
  zhongjiang


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 0/3] Enable legacy irq mode before jump to kexec/kdump kernel

2016-07-20 Thread b...@redhat.com
On 07/20/16 at 08:32am, Thomas Gleixner wrote:
> On Wed, 20 Jul 2016, b...@redhat.com wrote:
> > On 07/20/16 at 03:54am, Wei, Jiangang wrote:
> >
> > >  In fact, Eric and Ingo suggested that "it should be fixed in the bootup
> > > path of the dump kernel, not the crash kernel reboot path", which is
> > > convincing and reasonable.
> > 
> > Well this patch doesn't do differently with Eric's original implemention
> > in kexec/kdump path.
> > By taking out clear_IO_APIC from disable_IO_APIC, the left code of
> > disable_IO_APIC will only do the virtual wire setting. So for
> > kexec/kdump path, code basically is the same as Eric's method. But for
> > poweroff/halt/reboot, it's enough to call clear_IO_APIC to disable
> > IO-APIC.
> 
> You're completely ignoring what Jiangang said: 
> 
>  "it should be fixed in the bootup path of the dump kernel, not the crash
>   kernel reboot path"
> 
> and that's the right way to do it. End of story.

Thanks, tglx. What I did is like reverting commit 522e6646446. But it
would be great if we can change to fix it in bootup path.

Thanks
Baoquan


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 0/3] Enable legacy irq mode before jump to kexec/kdump kernel

2016-07-20 Thread Thomas Gleixner
On Wed, 20 Jul 2016, b...@redhat.com wrote:
> On 07/20/16 at 03:54am, Wei, Jiangang wrote:
>
> >  In fact, Eric and Ingo suggested that "it should be fixed in the bootup
> > path of the dump kernel, not the crash kernel reboot path", which is
> > convincing and reasonable.
> 
> Well this patch doesn't do differently with Eric's original implemention
> in kexec/kdump path.
> By taking out clear_IO_APIC from disable_IO_APIC, the left code of
> disable_IO_APIC will only do the virtual wire setting. So for
> kexec/kdump path, code basically is the same as Eric's method. But for
> poweroff/halt/reboot, it's enough to call clear_IO_APIC to disable
> IO-APIC.

You're completely ignoring what Jiangang said: 

 "it should be fixed in the bootup path of the dump kernel, not the crash
  kernel reboot path"

and that's the right way to do it. End of story.

Thanks,

tglx

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 0/3] Enable legacy irq mode before jump to kexec/kdump kernel

2016-07-20 Thread b...@redhat.com
On 07/20/16 at 03:54am, Wei, Jiangang wrote:
> Hi Baoquan He,
> 
> Well, Indeed there‘s a relationship between the dump-capture hangs in
> calibrate_delay_converge() and the interrupt mode.
> 
> but there‘s no essential difference between your patches and mine that
> calls disable_IO_APIC() again.  
> Actually, disable_IO_APIC will set APIC to virtual wire mode.

Well, about this I want to explain a little bit. Usually people posted a
patch, reviewers can give comments, suggestions or other ideas. During
reviewing stage patch author need answer questions from people
interested. So this is an interactive action, reviewers can also learn
knowledge, meanwhile give comments.

When reviewing your patch, I have many questions, and only get two
pieces of information, kdump kernel hang with notsc, and disable_IO_APIC
can save it. You even can't answer people's question why disable_IO_APIC
need be called twice with your patch. I have to dig code and read intel
arch manual and MP spec and Eric's original patch thread, and add debug
bug to verify all. How can it be like you said "no essential difference
between your patches and mine"? 


> 
> In fact,
> Eric and Ingo suggested that "it should be fixed in the bootup path of
> the dump kernel, not the crash kernel reboot path", which is convincing
> and reasonable.
> 
> And i find a better method can fix the problem.
> It's better to set virtual wire mode for apic in init_bsp_APIC(), which
> in the bootup path of dump kernel.
> But now, init_bsp_APIC doesn't initialize the apic to vitual wire mode
> when smp_found_config is non-zero.

This may be do-able, let's see what Eric and Ingo will say.

> 
> FYI, I'm working on this point. later i will send patches to mail list.
> 
> Wei
> 
> On Wed, 2016-07-20 at 10:58 +0800, Baoquan He wrote:
> > Wei Jiangang reported kdump kernel always hang when "notsc" is specified
> > in boot parameter. After debugging I found there's no timer interrupt
> > in the current kexec/kdump kernel. This is caused by commit 522e66464467
> > ("x86/apic: Disable I/O APIC before shutdown of the local APIC"). Originally
> > Eric posted below patch to make system be virtual wire mode in which 8259-
> > equivalent PIC fields all interrupts and the LAPIC becomes a virtual wire.
> > Like this interrupts can be delivered from PIC to CPU via the LAPIC's local
> > interrupt 0 (LINTIN0). In virtual wire APIC mode is disabled while LAPIC
> > is software enabled and its LINT0 and LINT1 need be programmed specifically.
> > 
> > https://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11/2.6.11-mm1/broken-out/x86_64-apic-virtwire-on-shutdown.patch
> > 
> > But with commit 522e66464 you can see after disable_IO_APIC had setting
> > virtual wire mode, lapic_shutdown disabled LAPIC again. Now virtual wire
> > mode doesn't work, then it cause no timer interrupt during kdump kernel
> > initialization stage until system enter into APIC mode.
> > 
> > So people may be wondering why only kdump kernel hang, the normal kernel
> > with "notsc" can still work. This is because BIOS has already built PIC mode
> > or virtual wire mode while kexec/kdump kernel doesn't go through BIOS
> > initialization. That is why we have to change system to be PIC mode or
> > virtual wire mode before jump to kexec/kdump kernel.
> > 
> > Then why kdump kernel didn't hang when "notsc" is not specified. This is
> > because tsc_init will assign the already calibrated value to lpj_fine.
> > Then kernel doesn't need to count cpu loops between jiffies with the help
> > of timer interrupt. So "notsc" is not victim, but a informer.
> > 
> > In patch 1/3 disable_IO_APIC is changed to only contain code of changeing
> > system to be PIC mode or virtual wire mode and is renamed as
> > switch_to_legacy_irq_mode. Now only call clear_IO_APIC where IO-APIC need
> > be disabled, and call switch_to_legacy_irq_mode before jump to kexe/kdump
> > kernel.
> > 
> > Patch 2/3 and 3/3 are clean up patch.
> > 
> > Baoquan He (3):
> >   x86/apic/kexec: Enable legacy irq mode before jump to kexec/kdump
> > kernel
> >   x86/apic: Clean up the names of legacy irq mode setting related
> > functions
> >   x86/apic: Clean up the apic delivery mode macro definition
> > 
> >  arch/x86/include/asm/apic.h|  2 +-
> >  arch/x86/include/asm/apicdef.h |  1 -
> >  arch/x86/include/asm/io_apic.h |  6 +++---
> >  arch/x86/kernel/apic/apic.c| 19 +++
> >  arch/x86/kernel/apic/io_apic.c | 32 +---
> >  arch/x86/kernel/crash.c|  2 +-
> >  arch/x86/kernel/machine_kexec_32.c | 15 +--
> >  arch/x86/kernel/machine_kexec_64.c | 15 +--
> >  arch/x86/kernel/reboot.c   |  2 +-
> >  arch/x86/kernel/x86_init.c |  2 +-
> >  drivers/iommu/irq_remapping.c  |  2 +-
> >  11 files changed, 46 insertions(+), 52 deletions(-)
> > 
> 
> 
> 
> ___
> kexec mailing list
>