Re: [PATCH] crash: use macro to add crashk_res into iomem early for specific arch

2024-03-24 Thread Ingo Molnar


* Baoquan He  wrote:

> On 03/24/24 at 05:06am, Ingo Molnar wrote:
> > 
> > * Baoquan He  wrote:
> > 
> ..snip
> > > ---
> > >  arch/x86/include/asm/crash_reserve.h | 2 ++
> > >  kernel/crash_reserve.c   | 7 +++
> > >  2 files changed, 9 insertions(+)
> > > 
> > > diff --git a/arch/x86/include/asm/crash_reserve.h 
> > > b/arch/x86/include/asm/crash_reserve.h
> > > index 152239f95541..4681a543eba3 100644
> > > --- a/arch/x86/include/asm/crash_reserve.h
> > > +++ b/arch/x86/include/asm/crash_reserve.h
> > > @@ -39,4 +39,6 @@ static inline unsigned long crash_low_size_default(void)
> > >  #endif
> > >  }
> > >  
> > > +# define HAVE_ARCH_ADD_CRASH_RES_TO_IOMEM_EARLY
> > > +
> > 
> > Any reason for that stray space?
> 
> No clear reason. I saw stray space was added for macro definning when my
> below patch was merged, not sure if this is preferred.

No, it's not preferred - and I don't see any stray spaces added in the 
code added by:

> commit 85fcde402db1 ("kexec: split crashkernel reservation code out from 
> crash_core.c")

Anyway, please just remove it.

> And there are a lot of "# define " when searching with 'git grep "# 
> define " arch/x86/include/'.

The overwhelming majority of those are not standalone defines like 
yours, but nested/conditional defines where the space is justified:

#ifdef CONFIG_X86_32
# define MAX_IO_APICS 64
# define MAX_LOCAL_APIC 256
#else
# define MAX_IO_APICS 128
# define MAX_LOCAL_APIC 32768
#endif

Thanks,

Ingo

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] crash: use macro to add crashk_res into iomem early for specific arch

2024-03-23 Thread Ingo Molnar


* Baoquan He  wrote:

> There are regression reports[1][2] that crashkernel region on x86_64 can't
> be added into iomem tree sometime. This causes the later failure of kdump
> loading.
> 
> This happened after commit 4a693ce65b18 ("kdump: defer the insertion of
> crashkernel resources") was merged.
> 
> Even though, these reported issues are proved to be related to other
> component, they are just exposed after above commmit applied, I still
> would like to keep crashk_res and crashk_low_res being added into iomem
> early as before because the early adding has been always there on x86_64
> and working very well. For safety of kdump, Let's change it back.
> 
> Here, add a macro HAVE_ARCH_ADD_CRASH_RES_TO_IOMEM_EARLY to limit that
> only ARCH defining the macro can have the early adding
> crashk_res/_low_res into iomem. Then define
> HAVE_ARCH_ADD_CRASH_RES_TO_IOMEM_EARLY on x86 to enable it.
> 
> Note: In reserve_crashkernel_low(), there's a remnant of crashk_low_res
> hanlding which was mistakenly added back in commit 85fcde402db1 ("kexec:
> split crashkernel reservation code out from crash_core.c").
> 
> [1]
> [PATCH V2] x86/kexec: do not update E820 kexec table for setup_data
> https://lore.kernel.org/all/zfv8icl6ct2jq...@darkstar.users.ipa.redhat.com/T/#u
> 
> [2]
> Question about Address Range Validation in Crash Kernel Allocation
> https://lore.kernel.org/all/4eeac1f733584855965a2ea62fa4d...@huawei.com/T/#u
> 
> Signed-off-by: Baoquan He 
> ---
>  arch/x86/include/asm/crash_reserve.h | 2 ++
>  kernel/crash_reserve.c   | 7 +++
>  2 files changed, 9 insertions(+)
> 
> diff --git a/arch/x86/include/asm/crash_reserve.h 
> b/arch/x86/include/asm/crash_reserve.h
> index 152239f95541..4681a543eba3 100644
> --- a/arch/x86/include/asm/crash_reserve.h
> +++ b/arch/x86/include/asm/crash_reserve.h
> @@ -39,4 +39,6 @@ static inline unsigned long crash_low_size_default(void)
>  #endif
>  }
>  
> +# define HAVE_ARCH_ADD_CRASH_RES_TO_IOMEM_EARLY
> +

Any reason for that stray space?

Thanks,

Ingo

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v11 2/5] x86: kdump: move reserve_crashkernel_low() into crash_core.c

2020-08-06 Thread Ingo Molnar


* Chen Zhou  wrote:

> In preparation for supporting reserve_crashkernel_low in arm64 as
> x86_64 does, move reserve_crashkernel_low() into kernel/crash_core.c.
> 
> BTW, move x86_64 CRASH_ALIGN to 2M suggested by Dave. CONFIG_PHYSICAL_ALIGN
> can be selected from 2M to 16M, move to the same as arm64.
> 
> Signed-off-by: Chen Zhou 
> ---
>  arch/x86/include/asm/kexec.h | 24 ++
>  arch/x86/kernel/setup.c  | 86 +++-
>  include/linux/crash_core.h   |  3 ++
>  include/linux/kexec.h|  2 -
>  kernel/crash_core.c  | 74 +++
>  kernel/kexec_core.c  | 17 ---
>  6 files changed, 107 insertions(+), 99 deletions(-)

Since the changes are centered around arm64, I suppose the arm64 tree 
will carry this patchset?

Assuming that this is a 100% invariant moving of code that doesn't 
regress on x86:

  Acked-by: Ingo Molnar 

Thanks,

Ingo

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] efi/x86: Revert struct layout change to fix kexec boot regression

2020-04-14 Thread Ingo Molnar


* Ard Biesheuvel  wrote:

> On Fri, 10 Apr 2020 at 16:34, Borislav Petkov  wrote:
> >
> > On Fri, Apr 10, 2020 at 04:22:49PM +0200, Ard Biesheuvel wrote:
> > > > BTW, a fixes tag is good to have..
> > >
> > > I usually omit those for patches that fix bugs that were introduced in
> > > the current cycle.
> >
> > A valid use case for having the Fixes: tag anyway are the backporting
> > kernels gangs which might pick up the first patch for whatever reason
> > and would probably be thankful if they find the second one, i.e., the
> > fix for the first one, through grepping or other, automated means.
> >
> 
> Fair point.

I've added:

  Fixes: 0a67361dcdaa29 ("efi/x86: Remove runtime table address from kexec EFI 
setup data")

Out of abundance of caution. :-)

Thanks,

Ingo

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] x86/efi: update e820 about reserved EFI boot services data to fix kexec breakage

2019-12-04 Thread Ingo Molnar


* Dave Young  wrote:

> On 12/04/19 at 03:52pm, Dave Young wrote:
> > Michael Weiser reported he got below error during a kexec rebooting:
> > esrt: Unsupported ESRT version 2904149718861218184.
> > 
> > The ESRT memory stays in EFI boot services data, and it was reserved
> > in kernel via efi_mem_reserve().  The initial purpose of the reservation
> > is to reuse the EFI boot services data across kexec reboot. For example
> > the BGRT image data and some ESRT memory like Michael reported. 
> > 
> > But although the memory is reserved it is not updated in X86 e820 table.
> > And kexec_file_load iterate system ram in io resource list to find places
> > for kernel, initramfs and other stuff. In Michael's case the kexec loaded
> > initramfs overwritten the ESRT memory and then the failure happened.
> 
> s/overwritten/overwrote :)  If need a repost please let me know..
> 
> > 
> > Since kexec_file_load depends on the e820 to be updated, just fix this
> > by updating the reserved EFI boot services memory as reserved type in e820.
> > 
> > Originally any memory descriptors with EFI_MEMORY_RUNTIME attribute are
> > bypassed in the reservation code path because they are assumed as reserved.
> > But the reservation is still needed for multiple kexec reboot.
> > And it is the only possible case we come here thus just drop the code
> > chunk then everything works without side effects. 
> > 
> > On my machine the ESRT memory sits in an EFI runtime data range, it does
> > not trigger the problem, but I successfully tested with BGRT instead.
> > both kexec_load and kexec_file_load work and kdump works as well.
> > 
> > Signed-off-by: Dave Young 


So I edited this to:

 From: Dave Young 

 Michael Weiser reported he got this error during a kexec rebooting:

   esrt: Unsupported ESRT version 2904149718861218184.

 The ESRT memory stays in EFI boot services data, and it was reserved
 in kernel via efi_mem_reserve().  The initial purpose of the reservation
 is to reuse the EFI boot services data across kexec reboot. For example
 the BGRT image data and some ESRT memory like Michael reported.

 But although the memory is reserved it is not updated in the X86 E820 table,
 and kexec_file_load() iterates system RAM in the IO resource list to find 
places
 for kernel, initramfs and other stuff. In Michael's case the kexec loaded
 initramfs overwrote the ESRT memory and then the failure happened.

 Since kexec_file_load() depends on the E820 table being updated, just fix this
 by updating the reserved EFI boot services memory as reserved type in E820.

 Originally any memory descriptors with EFI_MEMORY_RUNTIME attribute are
 bypassed in the reservation code path because they are assumed as reserved.

 But the reservation is still needed for multiple kexec reboots,
 and it is the only possible case we come here thus just drop the code
 chunk, then everything works without side effects.

 On my machine the ESRT memory sits in an EFI runtime data range, it does
 not trigger the problem, but I successfully tested with BGRT instead.
 both kexec_load() and kexec_file_load() work and kdump works as well.

Thanks,

Ingo

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] x86/efi: update e820 about reserved EFI boot services data to fix kexec breakage

2019-12-04 Thread Ingo Molnar


* Dave Young  wrote:

> On 12/04/19 at 03:52pm, Dave Young wrote:
> > Michael Weiser reported he got below error during a kexec rebooting:
> > esrt: Unsupported ESRT version 2904149718861218184.
> > 
> > The ESRT memory stays in EFI boot services data, and it was reserved
> > in kernel via efi_mem_reserve().  The initial purpose of the reservation
> > is to reuse the EFI boot services data across kexec reboot. For example
> > the BGRT image data and some ESRT memory like Michael reported. 
> > 
> > But although the memory is reserved it is not updated in X86 e820 table.
> > And kexec_file_load iterate system ram in io resource list to find places
> > for kernel, initramfs and other stuff. In Michael's case the kexec loaded
> > initramfs overwritten the ESRT memory and then the failure happened.
> 
> s/overwritten/overwrote :)  If need a repost please let me know..

No need, I've edited the typo. :)

Thanks,

Ingo

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v3 2/2] x86/kdump: Reserve extra memory when SME or SEV is active

2019-09-10 Thread Ingo Molnar


* Kairui Song  wrote:

> Since commit c7753208a94c ("x86, swiotlb: Add memory encryption support"),
> SWIOTLB will be enabled even if there is less than 4G of memory when SME
> is active, to support DMA of devices that not support address with the
> encrypt bit.
> 
> And commit aba2d9a6385a ("iommu/amd: Do not disable SWIOTLB if SME is
> active") make the kernel keep SWIOTLB enabled even if there is an IOMMU.
> 
> Then commit d7b417fa08d1 ("x86/mm: Add DMA support for SEV memory
> encryption") will always force SWIOTLB to be enabled when SEV is active
> in all cases.
> 
> Now, when either SME or SEV is active, SWIOTLB will be force enabled,
> and this is also true for kdump kernel. As a result kdump kernel will
> run out of already scarce pre-reserved memory easily.
> 
> So when SME/SEV is active, reserve extra memory for SWIOTLB to ensure
> kdump kernel have enough memory, except when "crashkernel=size[KMG],high"
> is specified or any offset is used. As for the high reservation case, an
> extra low memory region will always be reserved and that is enough for
> SWIOTLB. Else if the offset format is used, user should be fully aware
> of any possible kdump kernel memory requirement and have to organize the
> memory usage carefully.
> 
> Signed-off-by: Kairui Song 
> ---
>  arch/x86/kernel/setup.c | 20 +---
>  1 file changed, 17 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 71f20bb18cb0..ee6a2f1e2226 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -530,7 +530,7 @@ static int __init crashkernel_find_region(unsigned long 
> long *crash_base,
> unsigned long long *crash_size,
> bool high)
>  {
> - unsigned long long base, size;
> + unsigned long long base, size, mem_enc_req = 0;
>  
>   base = *crash_base;
>   size = *crash_size;
> @@ -561,11 +561,25 @@ static int __init crashkernel_find_region(unsigned long 
> long *crash_base,
>   if (high)
>   goto high_reserve;
>  
> + /*
> +  * When SME/SEV is active and not using high reserve,
> +  * it will always required an extra SWIOTLB region.
> +  */
> + if (mem_encrypt_active())
> + mem_enc_req = ALIGN(swiotlb_size_or_default(), SZ_1M);
> +
>   base = memblock_find_in_range(CRASH_ALIGN,
> -   CRASH_ADDR_LOW_MAX, size,
> +   CRASH_ADDR_LOW_MAX,
> +   size + mem_enc_req,
> CRASH_ALIGN);

What sizes are we talking about here?

- What is the possible size range of swiotlb_size_or_default()

- What is the size of CRASH_ADDR_LOW_MAX (the old limit)?

- Why do we replace one fixed limit with another fixed limit instead of 
  accurately sizing the area, with each required feature adding its own 
  requirement to the reservation size?

I.e. please engineer this into a proper solution instead of just 
modifying it around the edges.

For example have you considered adding some sort of 
kdump_memory_reserve(size) facility, which increases the reservation size 
as something like SWIOTLB gets activated? That would avoid the ugly 
mem_encrypt_active() flag, it would just automagically work.

Thanks,

Ingo


Re: [PATCH v6 1/2] x86/kexec: Build identity mapping for EFI systab and ACPI tables

2019-05-14 Thread Ingo Molnar


* Dave Young  wrote:

> On 05/14/19 at 01:38pm, Peter Zijlstra wrote:
> > On Tue, May 14, 2019 at 04:48:41PM +0800, Dave Young wrote:
> > 
> > > > I did some tests on the laptop,  thing is:
> > > > 1. apply the 3 patches (two you posted + Boris's revert commit 
> > > > 52b922c3d49c)
> > > >on latest Linus master branch, everything works fine.
> > > > 
> > > > 2. build and test the tip/next-merge-window branch, kernel hangs early
> > > > without output, (both 1st boot and kexec boot)
> > > 
> > > Update about 2.  It should be not early rsdp related, I got the boot log
> > > Since can not reproduce with Linus master branch it may have been fixed.
> > 
> > Nothing was changed here since PTI.
> > 
> > > [0.685374][T1] rcu: Hierarchical SRCU implementation.
> > > [0.686414][T1] general protection fault:  [#1] SMP PTI
> > > [0.687328][T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
> > > 5.1.0-rc6+ #877
> > > [0.687328][T1] Hardware name: LENOVO 4236NUC/4236NUC, BIOS 
> > > 83ET82WW (1.52 ) 06/04/2018
> > > [0.687328][T1] RIP: 0010:reserve_ds_buffers+0x34e/0x450
> > 
> > > [0.687328][T1] Call Trace:
> > > [0.687328][T1]  ? hardlockup_detector_event_create+0x50/0x50
> > > [0.687328][T1]  x86_reserve_hardware+0x173/0x180
> > > [0.687328][T1]  x86_pmu_event_init+0x39/0x220
> > 
> > The DS buffers are special in that they're part of cpu_entrt_area. If
> > this comes apart it might mean your pagetables are dodgy.
> 
> Hmm, it seems caused by some WIP branch patches, I suspect below:
> commit 124d6af5a5f559e516ed2c6ea857e889ed293b43
> x86/paravirt: Standardize 'insn_buff' variable names

This commit had a bug which I fixed - could you try the latest -tip?

Thanks,

Ingo


Re: [PATCH] x86/boot: Use efi_setup_data for searching RSDP on kexec-ed kernels

2019-04-16 Thread Ingo Molnar


* Borislav Petkov  wrote:

> I'll queue the below in the next days if there are no more complaints:

Just a minor style nit, this was inherited from existing code:

> + efi_config_table_64_t *tbl = (efi_config_table_64_t *) 
> config_tables + i;
> + efi_config_table_32_t *tbl = (efi_config_table_32_t *) 
> config_tables + i;

You might want to update that to the canonical form of:

> + efi_config_table_64_t *tbl = (efi_config_table_64_t 
> *)config_tables + i;
> + efi_config_table_32_t *tbl = (efi_config_table_32_t 
> *)config_tables + i;

Looks good otherwise.

Thanks,

Ingo

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v3 1/4] x86: kdump: move reserve_crashkernel_low() into kexec_core.c

2019-04-12 Thread Ingo Molnar


* Chen Zhou  wrote:

> Hi Ingo,
> 
> On 2019/4/10 15:09, Ingo Molnar wrote:
> > 
> > * Chen Zhou  wrote:
> > 
> >> In preparation for supporting more than one crash kernel regions
> >> in arm64 as x86_64 does, move reserve_crashkernel_low() into
> >> kexec/kexec_core.c.
> >>
> >> Signed-off-by: Chen Zhou 
> >> ---
> >>  arch/x86/include/asm/kexec.h |  3 ++
> >>  arch/x86/kernel/setup.c  | 66 
> >> +---
> >>  include/linux/kexec.h|  1 +
> >>  kernel/kexec_core.c  | 53 +++
> >>  4 files changed, 64 insertions(+), 59 deletions(-)
> > 
> > No objections for this to be merged via the ARM tree, as long as x86 
> > functionality is kept intact.
> 
> This patch has no affect on x86.

In *principle*.

In practice the series does change x86 code:

> >>  arch/x86/kernel/setup.c  | 66 
> >> +---
> >>  include/linux/kexec.h|  1 +
> >>  kernel/kexec_core.c  | 53 +++

which is, *hopefully*, an identity transformation. :-)

I.e. Ack, but only if it doesn't break anything. :-)

Thanks,

Ingo

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v3 1/4] x86: kdump: move reserve_crashkernel_low() into kexec_core.c

2019-04-10 Thread Ingo Molnar


* Chen Zhou  wrote:

> In preparation for supporting more than one crash kernel regions
> in arm64 as x86_64 does, move reserve_crashkernel_low() into
> kexec/kexec_core.c.
> 
> Signed-off-by: Chen Zhou 
> ---
>  arch/x86/include/asm/kexec.h |  3 ++
>  arch/x86/kernel/setup.c  | 66 
> +---
>  include/linux/kexec.h|  1 +
>  kernel/kexec_core.c  | 53 +++
>  4 files changed, 64 insertions(+), 59 deletions(-)

No objections for this to be merged via the ARM tree, as long as x86 
functionality is kept intact.

Thanks,

Ingo

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] x86: fix -Wmissing-prototypes warning

2018-11-22 Thread Ingo Molnar

* wang.y...@zte.com.cn  wrote:

> Hi Ingo,
> 
> > * Yi Wang  wrote:
> >
> > > We may get -Wmissing-prototypes warnings when building
> > > kernel with W=1, it's better to fix them as global function
> > > signature can be changed and caller who use the old unchanged
> > > prototypes will be hosed.
> > >
> > > This patch fixes most of -Wmissing-prototypes warnings which
> > > are in the root directory of arch/x86/kernel, not including
> > > the subdirectories.
> > >
> > > These are the warnings fixed in this patch:
> > > arch/x86/kernel/signal.c:865:17: warning: no previous prototype for 
> > > ‘sys32_x32_rt_sigreturn’ [-Wmissing-prototypes]
> 
> ..
> 
> > >  include/linux/ftrace.h | 3 +++
> > >  include/linux/kexec.h  | 1 +
> > >  include/linux/of_fdt.h | 1 +
> > >  include/linux/ptrace.h | 1 +
> > >  24 files changed, 55 insertions(+), 3 deletions(-)
> >
> > Breaks 32-bit compilation:
> >
> > ../arch/x86/include/asm/setup.h: Assembler messages:
> > ../arch/x86/include/asm/setup.h:30: Error: no such instruction: `unsigned 
> > long __startup_64(unsigned long physaddr,struct boot_params *bp)'
> >
> > I'll skip the setup.h bits - let's hope that's the only bug.
> 
> Sorry for the inconvenience, though I have checked carefully :-)
> 
> How do you produce this error? I tested this way but did not reproduce
> that error:
> # make allmodconfig
> # make -j8
> 
> Thanks for your time again!

I don't remember the exact config, but you could try the 32-bit 
allnoconfig and defconfig.

Note that the commands you used above will use the 64-bit kernel, not the 
32-bit kernel.

Thanks,

Ingo

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] x86: fix -Wmissing-prototypes warning

2018-11-22 Thread Ingo Molnar

* Yi Wang  wrote:

> We may get -Wmissing-prototypes warnings when building
> kernel with W=1, it's better to fix them as global function
> signature can be changed and caller who use the old unchanged
> prototypes will be hosed.
> 
> This patch fixes most of -Wmissing-prototypes warnings which
> are in the root directory of arch/x86/kernel, not including
> the subdirectories.
> 
> These are the warnings fixed in this patch:
> arch/x86/kernel/signal.c:865:17: warning: no previous prototype for 
> ‘sys32_x32_rt_sigreturn’ [-Wmissing-prototypes]
> arch/x86/kernel/signal_compat.c:164:6: warning: no previous prototype for 
> ‘sigaction_compat_abi’ [-Wmissing-prototypes]
> arch/x86/kernel/traps.c:625:46: warning: no previous prototype for 
> ‘sync_regs’ [-Wmissing-prototypes]
> arch/x86/kernel/traps.c:640:24: warning: no previous prototype for 
> ‘fixup_bad_iret’ [-Wmissing-prototypes]
> arch/x86/kernel/traps.c:929:13: warning: no previous prototype for 
> ‘trap_init’ [-Wmissing-prototypes]
> arch/x86/kernel/irq.c:270:28: warning: no previous prototype for 
> ‘smp_x86_platform_ipi’ [-Wmissing-prototypes]
> arch/x86/kernel/irq.c:301:16: warning: no previous prototype for 
> ‘smp_kvm_posted_intr_ipi’ [-Wmissing-prototypes]
> arch/x86/kernel/irq.c:314:16: warning: no previous prototype for 
> ‘smp_kvm_posted_intr_wakeup_ipi’ [-Wmissing-prototypes]
> arch/x86/kernel/irq.c:328:16: warning: no previous prototype for 
> ‘smp_kvm_posted_intr_nested_ipi’ [-Wmissing-prototypes]
> arch/x86/kernel/irq_work.c:16:28: warning: no previous prototype for 
> ‘smp_irq_work_interrupt’ [-Wmissing-prototypes]
> arch/x86/kernel/irqinit.c:79:13: warning: no previous prototype for 
> ‘init_IRQ’ [-Wmissing-prototypes]
> arch/x86/kernel/quirks.c:672:13: warning: no previous prototype for 
> ‘early_platform_quirks’ [-Wmissing-prototypes]
> arch/x86/kernel/tsc.c:1499:15: warning: no previous prototype for 
> ‘calibrate_delay_is_known’ [-Wmissing-prototypes]
> arch/x86/kernel/process.c:653:13: warning: no previous prototype for 
> ‘arch_post_acpi_subsys_init’ [-Wmissing-prototypes]
> arch/x86/kernel/process.c:717:15: warning: no previous prototype for 
> ‘arch_randomize_brk’ [-Wmissing-prototypes]
> arch/x86/kernel/process.c:784:6: warning: no previous prototype for 
> ‘do_arch_prctl_common’ [-Wmissing-prototypes]
> arch/x86/kernel/reboot.c:869:6: warning: no previous prototype for 
> ‘nmi_panic_self_stop’ [-Wmissing-prototypes]
> arch/x86/kernel/smp.c:176:27: warning: no previous prototype for 
> ‘smp_reboot_interrupt’ [-Wmissing-prototypes]
> arch/x86/kernel/smp.c:260:28: warning: no previous prototype for 
> ‘smp_reschedule_interrupt’ [-Wmissing-prototypes]
> arch/x86/kernel/smp.c:281:28: warning: no previous prototype for 
> ‘smp_call_function_interrupt’ [-Wmissing-prototypes]
> arch/x86/kernel/smp.c:291:28: warning: no previous prototype for 
> ‘smp_call_function_single_interrupt’ [-Wmissing-prototypes]
> arch/x86/kernel/ftrace.c:840:6: warning: no previous prototype for 
> ‘arch_ftrace_update_trampoline’ [-Wmissing-prototypes]
> arch/x86/kernel/ftrace.c:934:7: warning: no previous prototype for 
> ‘arch_ftrace_trampoline_func’ [-Wmissing-prototypes]
> arch/x86/kernel/ftrace.c:946:6: warning: no previous prototype for 
> ‘arch_ftrace_trampoline_free’ [-Wmissing-prototypes]
> arch/x86/kernel/crash.c:114:6: warning: no previous prototype for 
> ‘crash_smp_send_stop’ [-Wmissing-prototypes]
> arch/x86/kernel/crash.c:351:5: warning: no previous prototype for 
> ‘crash_setup_memmap_entries’ [-Wmissing-prototypes]
> arch/x86/kernel/crash.c:424:5: warning: no previous prototype for 
> ‘crash_load_segments’ [-Wmissing-prototypes]
> arch/x86/kernel/machine_kexec_64.c:372:7: warning: no previous prototype for 
> ‘arch_kexec_kernel_image_load’ [-Wmissing-prototypes]
> arch/x86/kernel/paravirt-spinlocks.c:12:16: warning: no previous prototype 
> for ‘__native_queued_spin_unlock’ [-Wmissing-prototypes]
> arch/x86/kernel/paravirt-spinlocks.c:18:6: warning: no previous prototype for 
> ‘pv_is_native_spin_unlock’ [-Wmissing-prototypes]
> arch/x86/kernel/paravirt-spinlocks.c:24:16: warning: no previous prototype 
> for ‘__native_vcpu_is_preempted’ [-Wmissing-prototypes]
> arch/x86/kernel/paravirt-spinlocks.c:30:6: warning: no previous prototype for 
> ‘pv_is_native_vcpu_is_preempted’ [-Wmissing-prototypes]
> arch/x86/kernel/kvm.c:258:1: warning: no previous prototype for 
> ‘do_async_page_fault’ [-Wmissing-prototypes]
> arch/x86/kernel/jailhouse.c:200:6: warning: no previous prototype for 
> ‘jailhouse_paravirt’ [-Wmissing-prototypes]
> arch/x86/kernel/check.c:91:13: warning: no previous prototype for 
> ‘setup_bios_corruption_check’ [-Wmissing-prototypes]
> arch/x86/kernel/check.c:139:6: warning: no previous prototype for 
> ‘check_for_bios_corruption’ [-Wmissing-prototypes]
> arch/x86/kernel/devicetree.c:32:13: warning: no previous prototype for 
> ‘early_init_dt_scan_chosen_arch’ [-Wmissing-prototypes]
> arch/x86/kernel/devicetree.c:42:13: warning: 

Re: [GIT PULL] remove in-kernel calls to syscalls

2018-04-03 Thread Ingo Molnar

* Linus Torvalds  wrote:

> On Mon, Apr 2, 2018 at 12:04 PM, Dominik Brodowski
>  wrote:
> >
> > This patchset removes all in-kernel calls to syscall functions in the
> > kernel with the exception of arch/.
> 
> Ok, this finished off my arch updates for today, I'll probably move on
> to driver pulls tomorrow.
> 
> Anyway, it's in my tree, will push out once my test build finishes.

Thanks!

Dominik, if you submit the x86 ptregs conversion patches in the next 1-2 days 
on 
top of Linus's tree (642e7fd23353), then I can apply them and if they are 
problem-free I can perhaps tempt Linus with a pull request early next week or 
so.

The Spectre angle does make me want those changes as well.

Thanks,

Ingo

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] kdump: Write a correct address of mem_section into vmcoreinfo

2018-01-13 Thread Ingo Molnar

* Kirill A. Shutemov  wrote:

> Depending on configuration mem_section can now be an array or a pointer
> to an array allocated dynamically. In most cases, we can continue to refer
> to it as 'mem_section' regardless of what it is.
> 
> But there's one exception: '_section' means "address of the array" if
> mem_section is an array, but if mem_section is a pointer, it would mean
> "address of the pointer".
> 
> We've stepped onto this in kdump code. VMCOREINFO_SYMBOL(mem_section)
> writes down address of pointer into vmcoreinfo, not array as we wanted.
> 
> Let's introduce VMCOREINFO_SYMBOL_ARRAY() that would handle the
> situation correctly for both cases.
> 
> Signed-off-by: Kirill A. Shutemov 
> Fixes: 83e3c48729d9 ("mm/sparsemem: Allocate mem_section at runtime for 
> CONFIG_SPARSEMEM_EXTREME=y")
> Cc: sta...@vger.kernel.org
> Acked-by: Baoquan He 
> Acked-by: Dave Young 

You forgot the Reported-by - I added that to the commit.

Thanks,

Ingo

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v1 1/2] x86/mm, kexec: Fix memory corruption with SME on successive kexecs

2017-07-27 Thread Ingo Molnar

* Tom Lendacky  wrote:

> After issuing successive kexecs it was found that the SHA hash failed
> verification when booting the kexec'd kernel.  When SME is enabled, the
> change from using pages that were marked encrypted to now being marked as
> not encrypted (through new identify mapped page tables) results in memory
> corruption if there are any cache entries for the previously encrypted
> pages. This is because separate cache entries can exist for the same
> physical location but tagged both with and without the encryption bit.
> 
> To prevent this, issue a wbinvd before copying the pages from the source
> location to the destination location to clear any possible cache entry
> conflicts.
> 
> Cc: 
> Signed-off-by: Tom Lendacky 
> ---
>  arch/x86/kernel/relocate_kernel_64.S | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/arch/x86/kernel/relocate_kernel_64.S 
> b/arch/x86/kernel/relocate_kernel_64.S
> index 98111b3..c11d8bc 100644
> --- a/arch/x86/kernel/relocate_kernel_64.S
> +++ b/arch/x86/kernel/relocate_kernel_64.S
> @@ -132,6 +132,13 @@ identity_mapped:
>   /* Flush the TLB (needed?) */
>   movq%r9, %cr3
>  
> + /*
> +  * If SME is/was active, there could be old encrypted cache line
> +  * entries that will conflict with the now unencrypted memory
> +  * used by kexec. Flush the caches before copying the kernel.
> +  */
> + wbinvd

WBINVD is very expensive IIRC - several milliseconds.

So if we change the page table from encrypted to unencrypted we need to do a 
full 
cache flush sounds pretty broken to me - how can then this be done via an API 
such 
as mmap() without executing WBINVD?

Thanks,

Ingo

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v9 00/38] x86: Secure Memory Encryption (AMD)

2017-07-08 Thread Ingo Molnar

* Tom Lendacky  wrote:

> This patch series provides support for AMD's new Secure Memory Encryption 
> (SME)
> feature.

I'm wondering, what's the typical performance hit to DRAM access latency when 
SME 
is enabled?

On that same note, if the performance hit is noticeable I'd expect SME to not 
be 
enabled in native kernels typically - but still it looks like a useful hardware 
feature. Since it's controlled at the page table level, have you considered 
allowing SME-activated vmas via mmap(), even on kernels that are otherwise not 
using encrypted DRAM?

One would think that putting encryption keys into such encrypted RAM regions 
would 
generally improve robustness against various physical space attacks that want 
to 
extract keys but don't have full control of the CPU.

Thanks,

Ingo

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v3 2/2] x86_64/kexec: Use PUD level 1GB page for identity mapping if available

2017-05-08 Thread Ingo Molnar

* Xunlei Pang <xp...@redhat.com> wrote:

> On 05/05/2017 at 05:20 PM, Ingo Molnar wrote:
> > * Xunlei Pang <xp...@redhat.com> wrote:
> >
> >> On 05/05/2017 at 02:52 PM, Ingo Molnar wrote:
> >>> * Xunlei Pang <xlp...@redhat.com> wrote:
> >>>
> >>>> @@ -122,6 +122,10 @@ static int init_pgtable(struct kimage *image, 
> >>>> unsigned long start_pgtable)
> >>>>  
> >>>>  level4p = (pgd_t *)__va(start_pgtable);
> >>>>  clear_page(level4p);
> >>>> +
> >>>> +if (direct_gbpages)
> >>>> +info.direct_gbpages = true;
> >>> No, this should be keyed off the CPU feature (X86_FEATURE_GBPAGES) 
> >>> automatically, 
> >>> not set blindly! AFAICS this patch will crash kexec on any CPU that does 
> >>> not 
> >>> support gbpages.
> >> It should be fine, probe_page_size_mask() already takes care of this:
> >> if (direct_gbpages && boot_cpu_has(X86_FEATURE_GBPAGES)) {
> >> printk(KERN_INFO "Using GB pages for direct mapping\n");
> >> page_size_mask |= 1 << PG_LEVEL_1G;
> >> } else {
> >> direct_gbpages = 0;
> >> }
> >>
> >> So if X86_FEATURE_GBPAGES is not supported, direct_gbpages will be set to 
> >> 0.
> > So why is the introduction of the info.direct_gbpages flag necessary? 
> > AFAICS it 
> > just duplicates the kernel's direct_gbpages flag. One outcome is that 
> > hibernation 
> > won't use gbpages, which is silly.
> 
> boot/compressed/pagetable.c also uses kernel_ident_mapping_init() for kaslr, 
> at 
> the moment we don't have "direct_gbpages" definition or X86_FEATURE_GBPAGES 
> feature detection.
> 
> I thought that we can change the other call sites when found really needed.

Ok, you are right - I'll use the original patches as submitted, with the 
updated 
changelogs.

Thanks,

Ingo

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v3 2/2] x86_64/kexec: Use PUD level 1GB page for identity mapping if available

2017-05-05 Thread Ingo Molnar

* Xunlei Pang <xp...@redhat.com> wrote:

> On 05/05/2017 at 02:52 PM, Ingo Molnar wrote:
> > * Xunlei Pang <xlp...@redhat.com> wrote:
> >
> >> @@ -122,6 +122,10 @@ static int init_pgtable(struct kimage *image, 
> >> unsigned long start_pgtable)
> >>  
> >>level4p = (pgd_t *)__va(start_pgtable);
> >>clear_page(level4p);
> >> +
> >> +  if (direct_gbpages)
> >> +  info.direct_gbpages = true;
> > No, this should be keyed off the CPU feature (X86_FEATURE_GBPAGES) 
> > automatically, 
> > not set blindly! AFAICS this patch will crash kexec on any CPU that does 
> > not 
> > support gbpages.
> 
> It should be fine, probe_page_size_mask() already takes care of this:
> if (direct_gbpages && boot_cpu_has(X86_FEATURE_GBPAGES)) {
> printk(KERN_INFO "Using GB pages for direct mapping\n");
> page_size_mask |= 1 << PG_LEVEL_1G;
> } else {
> direct_gbpages = 0;
> }
> 
> So if X86_FEATURE_GBPAGES is not supported, direct_gbpages will be set to 0.

So why is the introduction of the info.direct_gbpages flag necessary? AFAICS it 
just duplicates the kernel's direct_gbpages flag. One outcome is that 
hibernation 
won't use gbpages, which is silly.

Thanks,

Ingo

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v3 2/2] x86_64/kexec: Use PUD level 1GB page for identity mapping if available

2017-05-05 Thread Ingo Molnar

* Xunlei Pang  wrote:

> @@ -122,6 +122,10 @@ static int init_pgtable(struct kimage *image, unsigned 
> long start_pgtable)
>  
>   level4p = (pgd_t *)__va(start_pgtable);
>   clear_page(level4p);
> +
> + if (direct_gbpages)
> + info.direct_gbpages = true;

No, this should be keyed off the CPU feature (X86_FEATURE_GBPAGES) 
automatically, 
not set blindly! AFAICS this patch will crash kexec on any CPU that does not 
support gbpages.

I only noticed this problem after having fixed/enhanced all the changelogs - so 
please pick up the new changelog up from the log below.

Thanks,

Ingo


>

Author: Xunlei Pang 

x86/mm: Add support for gbpages to kernel_ident_mapping_init()

Kernel identity mappings on x86-64 kernels are created in two
ways: by the early x86 boot code, or by kernel_ident_mapping_init().

Native kernels (which is the dominant usecase) use the former,
but the kexec and the hibernation code uses kernel_ident_mapping_init().

There's a subtle difference between these two ways of how identity
mappings are created, the current kernel_ident_mapping_init() code
creates identity mappings always using 2MB page(PMD level) - while
the native kernel boot path also utilizes gbpages where available.

This difference is suboptimal both for performance and for memory
usage: kernel_ident_mapping_init() needs to allocate pages for the
page tables when creating the new identity mappings.

This patch adds 1GB page(PUD level) support to kernel_ident_mapping_init()
to address these concerns.

The primary advantage would be better TLB coverage/performance,
because we'd utilize 1GB TLBs instead of 2MB ones.

It is also useful for machines with large number of memory to
save paging structure allocations(around 4MB/TB using 2MB page)
when setting identity mappings for all the memory, after using
1GB page it will consume only 8KB/TB.

( Note that this change alone does not activate gbpages in kexec,
  we are doing that in a separate patch. )


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] memmap: Parse "Reserved" together with "reserved"

2017-04-26 Thread Ingo Molnar

* Yinghai Lu  wrote:

> For x86 with recent kernel after
>  commit 640e1b38b0 ("x86/boot/e820: Basic cleanup of e820.c")
> change "reserved" to "Reserved" in /sys firmware memmap and /proc/iomem.
> 
> So here, we add handling for that too.
> 
> Signed-off-by: Yinghai Lu 
> 
> ---
>  kexec/arch/i386/crashdump-x86.c |2 ++
>  kexec/arch/ia64/kexec-ia64.c|2 ++
>  kexec/arch/mips/kexec-mips.c|2 ++
>  kexec/firmware_memmap.c |2 ++
>  4 files changed, 8 insertions(+)

I'd rather fix the bug I introduced and undo the reserved->Reserved string 
change 
in e820.c: I didn't realize that it's exposed in sysfs and had quasi-ABI 
consequences for kexec.

Agreed?

Thanks,

Ingo

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option

2015-10-14 Thread Ingo Molnar

* Thomas Gleixner  wrote:

> Borislav,
> 
> On Mon, 5 Oct 2015, Borislav Petkov wrote:
> > On Mon, Oct 05, 2015 at 02:03:58AM +, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > That's different from my point of view.  I'm not going to pass
> > > some data from the first kernel to the second kernel. I'm just going to
> > > provide a configurable option for the second kernel to users.
> > 
> > Dude, WTF?! You're adding a kernel command line which is supposed to
> > be used *only* by the kdump kernel. But nooo, it is there in the open
> > and visible to people. And anyone can type it in during boot. AND THAT
> > SHOULDN'T BE POSSIBLE IN THE FIRST PLACE!
> > 
> > This information is strictly for the kdump kernel - it shouldn't be a
> > generic command line option. How hard it is to understand that simple
> > fact?!
> 
> Calm down!
> 
> Disabling that NMI in the first kernel is not going to make the world
> explode. We have enough command line options a user can type in, which
> are way worse than that one. If some "expert" types nonsense on the
> first kernel command line, then it's none of our problems, really.
> 
> If Kawai would have marked that option as a debug feature, this
> discussion would have probably never happened.
> 
> Aside of that, the best way to hand in options for the kdump kernel is
> the command line. We have an existing interface for that.
> 
> Let's move on. Nothing to see here.

So Boris kind of has a point: there are numerous problems with boot options as 
kexec parameter interface:

 - boot option strings are not a well defined programmatic interface:
- failures are not obvious (they are often ignored)
- inserting/setting parameters is awkward as well.

 - boot options are not an ABI, so when options have dual use with kexec it's 
easy
   to break things inadvertently: without that failure being apparent other than
   reintroducing obscure kexec failure modes again.

 - in the booted up kexec kernel it's not really obvious which options got set 
by
   kexec and which got set by the user - as there's no separation of namespaces.

 - likewise, if the user specifies an option in conflict with a kexec 
requirement
   it's not really obvious what's happening: the user setting should probably
   dominate - I'm not sure that's the case with the current kexec code.

Boot options are basically a user interface.

On the other hand the hack of (ab-)using boot parameters as kexec parameter 
passing is an existing, many years old practice and we cannot just stop it 
without 
offering an alternative (and better!) interface first.

We could improve things by either adding a separate kexec-only parameter 
passing 
facility (like programmatic boot parameters are) - or by creating some sort of 
boot parameter alias that clearly identifies kexec parameters.

So for example when introducing 'noextnmi' we'd also add a facility to add a 
'kexec_noextnmi' alias - which clearly identifies this boot parameter as a 
kexec 
related one.

Every 'kexec' inserted parameter would be prefixed by kexec_ - and then the 
separation of namespaces (and the prioritization of user vs. kexec 
requirements) 
becomes well defined as well..

We should also probably print a warning if a kexec_* parameter is passed in 
that 
has no matching handler in the kexec()-ed kernel.

I do concur that this patch is probably OK given existing practices.

Thanks,

Ingo

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v3] kexec: fix out of the ELF headers buffer issue in syscall kexec_file_load()

2015-10-02 Thread Ingo Molnar

* Andrew Morton  wrote:

> On Tue, 29 Sep 2015 20:58:57 +0800 "Lee, Chun-Yi"  
> wrote:
> 
> > This patch modified the code in fill_up_crash_elf_data by using
> > walk_system_ram_res instead of walk_system_ram_range to count the max
> > number of crash memory ranges. That's because the walk_system_ram_range
> > filters out small memory regions that are resided in the same page, but
> > walk_system_ram_res does not.
> > 
> > The oringial issue is page fault error that sometimes happened on big 
> > machines
> > when preparing ELF headers:
> > 
> > [  305.291522] BUG: unable to handle kernel paging request at 
> > c90613fc9000
> > [  305.299621] IP: [] 
> > prepare_elf64_ram_headers_callback+0x165/0x260
> > [  305.308300] PGD e32067 PUD 6dcbec54067 PMD 9dc9bdeb067 PTE 0
> > [  305.315393] Oops: 0002 [#1] SMP
> > [...snip]
> > [  305.420953] task: 8e1c01ced600 ti: 8e1c03ec2000 task.ti: 
> > 8e1c03ec2000
> > [  305.429292] RIP: 0010:[]  [] 
> > prepare_elf64_ra
> > m_headers_callback+0x165/0x260
> > [...snip]
> > 
> > After tracing prepare_elf64_headers and prepare_elf64_ram_headers_callback,
> > the code uses walk_system_ram_res to fill-in crash memory regions 
> > information
> > to program header, so it counts those small memory regions that are resided 
> > in
> > a page area. But, when kernel was using walk_system_ram_range in
> > fill_up_crash_elf_data to count the number of crash memory regions, it 
> > filters
> > out small regions. I printed those small memory regions, for example:
> > 
> > kexec: Get nr_ram ranges. vaddr=0x880077592258 paddr=0x77592258, 
> > sz=0xdc0
> > 
> > Base on the code in walk_system_ram_range, this memory region will be 
> > filtered
> > out:
> > 
> > pfn = (0x77592258 + 0x1000 - 1) >> 12 = 0x77593
> > end_pfn = (0x77592258 + 0xfc0 -1 + 1) >> 12 = 0x77593
> > end_pfn - pfn = 0x77593 - 0x77593 = 0  <=== if (end_pfn > pfn) is FALSE
> > 
> > So, the max_nr_ranges that's counted by kernel doesn't include small memory
> > regions. That causes the page fault issue happened in later code path for
> > preparing EFL headers.
> > 
> > This issus is not easy to reproduce on small machines that don't have too
> > many CPUs because the allocated page aligned ELF buffer has more free space
> > to cover those small memory regions' PT_LOAD headers.
> > 
> 
> fyi, I added a cc:stable to my copy of this patch.

Note that I already have it applied, with a much improved changelog:

  e3c41e37b0f4 ("x86/kexec: Fix kexec crash in syscall kexec_file_load()")

Thanks,

Ingo

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 0/3 v3] Fix kdump failures with crashkernel=high

2015-06-07 Thread Ingo Molnar

* Borislav Petkov b...@alien8.de wrote:

  Joerg Roedel (3):
swiotlb: Warn on allocation failure in swiotlb_alloc_coherent
x86, swiotlb: Try coherent allocations with __GFP_NOWARN
x86, crash: Allocate enough low-mem when crashkernel=high
  
   arch/x86/kernel/pci-swiotlb.c |  7 +++
   arch/x86/kernel/setup.c   |  5 -
   lib/swiotlb.c | 11 +--
   3 files changed, 20 insertions(+), 3 deletions(-)
 
 Looks simple enough, all applied, thanks.
 
 hpa, Ingo, any concerns?

Modulo a couple of typos it all looks sane.

Thanks,

Ingo

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: Re: [PATCH v2] kernel/panic/kexec: fix crash_kexec_post_notifiers option issue in oops path

2015-03-24 Thread Ingo Molnar

* Vivek Goyal vgo...@redhat.com wrote:

  Yet the actual bug is in that commit, 'crash_kexec_post_notifiers' 
  was clearly not a no-op in the default case, against expectations.
 
 Hi Ingo,
 
 I did a quick test and in default case crash_kexec() runs before 
 panic notifiers. So it does look like crash_kexec_post_notifiers is 
 a no-op in default case.
 
 What am I missing.

Well, look at f06e5153f4ae:

diff --git a/kernel/panic.c b/kernel/panic.c
index d02fa9fef46a..62e16cef9cc2 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -32,6 +32,7 @@ static unsigned long tainted_mask;
 static int pause_on_oops;
 static int pause_on_oops_flag;
 static DEFINE_SPINLOCK(pause_on_oops_lock);
+static bool crash_kexec_post_notifiers;
 
 int panic_timeout = CONFIG_PANIC_TIMEOUT;
 EXPORT_SYMBOL_GPL(panic_timeout);
@@ -112,9 +113,11 @@ void panic(const char *fmt, ...)
/*
 * If we have crashed and we have a crash kernel loaded let it handle
 * everything else.
-* Do we want to call this before we try to display a message?
+* If we want to run this after calling panic_notifiers, pass
+* the crash_kexec_post_notifiers option to the kernel.
 */
-   crash_kexec(NULL);
+   if (!crash_kexec_post_notifiers)
+   crash_kexec(NULL);
 
/*
 * Note smp_send_stop is the usual smp shutdown function, which
@@ -131,6 +134,15 @@ void panic(const char *fmt, ...)
 
kmsg_dump(KMSG_DUMP_PANIC);
 
+   /*
+* If you doubt kdump always works fine in any situation,
+* crash_kexec_post_notifiers offers you a chance to run
+* panic_notifiers and dumping kmsg before kdump.
+* Note: since some panic_notifiers can make crashed kernel
+* more unstable, it can increase risks of the kdump failure too.
+*/
+   crash_kexec(NULL);
+
bust_spinlocks(0);
 
if (!panic_blink)


Without knowing what crash_kexec() does, the patch looks buggy: it 
should preserve the old behavior by default, yet it will now execute a 
second crash_kexec() after the kmsg_dump() line.

So the invariant change would have been to do:

-   crash_kexec(NULL);
+   if (!crash_kexec_post_notifiers)
+   crash_kexec(NULL);

...

+   if (crash_kexec_post_notifiers)
+   crash_kexec(NULL);

Which in the !crash_kexec_post_notifiers flag case reduces to:

crash_kexec();

...

/* NOP */

I.e. to exactly what the kernel was doing without the patch 
originally.

Which is what my patch does. Nothing more, nothing less.

There might be other bugs with the patch, I didn't consider that.

A revert would be fine as well.

Thanks,

Ingo

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v2] kernel/panic/kexec: fix crash_kexec_post_notifiers option issue in oops path

2015-03-23 Thread Ingo Molnar

* Vivek Goyal vgo...@redhat.com wrote:

 On Mon, Mar 23, 2015 at 08:19:43AM +0100, Ingo Molnar wrote:
  
  * Baoquan He b...@redhat.com wrote:
  
   CC more people ...
   
   On 03/07/15 at 01:31am, Hatayama, Daisuke/畑山 大輔 wrote:
The commit f06e5153f4ae2e2f3b0300f0e260e40cb7fefd45 introduced
crash_kexec_post_notifiers kernel boot option, which toggles
wheather panic() calls crash_kexec() before panic_notifiers and dump
kmsg or after.

The problem is that the commit overlooks panic_on_oops kernel boot
option. If it is enabled, crash_kexec() is called directly without
going through panic() in oops path.

To fix this issue, this patch adds a check to
crash_kexec_post_notifiers in the condition of kexec_should_crash().

Also, put a comment in kexec_should_crash() to explain not obvious
things on this patch.

Signed-off-by: HATAYAMA Daisuke d.hatay...@jp.fujitsu.com
Acked-by: Baoquan He b...@redhat.com
Tested-by: Hidehiro Kawai hidehiro.kawai...@hitachi.com
Reviewed-by: Masami Hiramatsu masami.hiramatsu...@hitachi.com
---
 include/linux/kernel.h |  3 +++
 kernel/kexec.c | 11 +++
 kernel/panic.c |  2 +-
 3 files changed, 15 insertions(+), 1 deletion(-)
  
  This is hack upon hack, but why was this crap merged in the first 
  place?
  
  I see two problems just by cursory review:
  
  1)
  
  Firstly, the real bug in:
  
f06e5153f4ae (kernel/panic.c: add crash_kexec_post_notifiers option 
  for kdump after panic_notifers)
  
  Was that crash_kexec() was called unconditionally after notifiers were 
  called, which should be fixed via the simple patch below (untested). 
  Looks much simpler than your fix.
  
 
 Hi Ingo,
 
 Agreed. Your patch looks good.

In case you want that simpler fix and need my SOB:

  Signed-off-by: Ingo Molnar mi...@kernel.org

(but I have not tested it.)

  Secondly, and more importantly, the whole premise of commit 
  f06e5153f4ae is broken IMHO:
  
   This can help rare situations where kdump fails because of unstable
crashed kernel or hardware failure (memory corruption on critical
data/code)
  
  wtf?
  
  If the kernel crashed due to a kernel crash, then the kernel booting 
  up in whatever hardware state should be able to do a clean bootup. The 
  fix for those 'rare situations' should be to fix the real bug (for 
  example by making hardware driver init (or deinit) sequences more 
  robust), not to paper it over by ordering around crash-time sequences 
  ...
  
  If it crashed due to some hardware failure, there's literally an 
  infinite amount of failure modes that may or may not be impacted by 
  kexec crash-time handling ordering. We don't want to put a zillion 
  such flags into the kernel proper just to allow the perturbation of 
  the kernel.
 
 I think one of the motivations behind this patch was call to kmsg_dump().
 Some vendors have been wanting to have the capability to save kernel logs
 to some NVRAM before transition to second kernel happens. Their argument
 is that kdump does not succeed all the time and if kdump does not succeed
 then atleast they have something to work with (kernel logs retrieved
 from pstore interface).

Doesn't pstore attach itself to printk itself? AFAICS it does:

 fs/pstore/platform.c:   register_console(pstore_console);

so the printk log leading up to and including the crash should be 
available, regardless of this patch. What am I missing?

 Not that I agree fully with this as problem might happen while we 
 try to run panic_notifiers or kmsg_dump hooks and never transition 
 into kdump kernel.

btw., this is the big problem with 'notifiers' in general: they are 
opaque with barely any semantics defined, and a source of constant 
confusion.

 And it has been literally years since some developers have been 
 pushing for allowing to run panic notifiers before crash_kexec(). 
 Eric Biederman has been pushing back saying it reduces the 
 reliability of kdump operation so this is not acceptable.

So what do those notifiers do?

Thanks,

Ingo

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [patch 1/3 v2] Add function get_bootparam

2013-11-11 Thread Ingo Molnar

* H. Peter Anvin h...@zytor.com wrote:

 On 11/11/2013 05:27 PM, Toshi Kani wrote:
  On Tue, 2013-11-05 at 16:29 +0800, dyo...@redhat.com wrote:
  Not only setup_subarch will get data from debugfs file
  boot_params/data, later code for adding efi_info will
  also need do same thing. Thus add a common function here
  for later use.
  
  get_bootparam() calls find_mnt_by_fsname(debugfs), which assumes that
  debugfs is mounted with device name debugfs.  This function fails
  when:
   - debugfs is not mounted, or
   - debugfs is mounted with a different device name, such as nodev [1].
  
  This issue is not introduced by this patch.  But the original code,
  which sets hardware_subarch, seems to work even if it failed to access
  debugfs (which is ignored) since hardware_subarch is zero most of the
  cases anyway.  With this change, however, this failure now makes the 2nd
  kernel unbootable.  So, it needs to be addressed to make this code path
  work reliably (or kexec should fail at least).
  
 
 Greg, Ingo,
 
 Since there is now a legitimate user of this stuff, can we actually 
 export this in sysfs (or something else other than sploit^Wdebugfs)?
 
 kexec-tools can have a fallback to debugfs if we really need it, but 
 making people mount debugfs to have some essential piece of 
 functionality scares the heck out of me.

No principial objections from me: anything that actually turns out to 
prove itself in debugfs and ends up mattering for real ought to move to 
sysfs.

Thanks,

Ingo

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [tip:x86/debug] x86/kdump: No need to disable ioapic/ lapic in crash path

2012-02-12 Thread Ingo Molnar

* Yinghai Lu ying...@kernel.org wrote:

 On Sat, Feb 11, 2012 at 3:09 PM, tip-bot for Don Zickus
 dzic...@redhat.com wrote:
  Commit-ID:  d9bc9be89629445758670220787683e37c93f6c1
  Gitweb:     
  http://git.kernel.org/tip/d9bc9be89629445758670220787683e37c93f6c1
  Author:     Don Zickus dzic...@redhat.com
  AuthorDate: Thu, 9 Feb 2012 16:53:41 -0500
  Committer:  Ingo Molnar mi...@elte.hu
  CommitDate: Sat, 11 Feb 2012 15:38:53 +0100
 
  x86/kdump: No need to disable ioapic/lapic in crash path
 
  A customer of ours noticed when their machine crashed, kdump did
  not work but hung instead.  Using their firmware dumping
  solution they grabbed a vmcore and decoded the stacks on the
  cpus.  What they noticed seemed to be a rare deadlock with the
  ioapic_lock.
 
   CPU4:
   machine_crash_shutdown
   - machine_ops.crash_shutdown
     - native_machine_crash_shutdown
        - kdump_nmi_shootdown_cpus -- Send NMI to other CPUs
        - disable_IO_APIC
           - clear_IO_APIC
              - clear_IO_APIC_pin
                 - ioapic_read_entry
                    - spin_lock_irqsave(ioapic_lock, flags)
                    ---Infinite loop here---
 
   CPU0:
   do_IRQ
   - handle_irq
     - handle_edge_irq
         - ack_apic_edge
            - move_native_irq
                - mask_IO_APIC_irq
                   - mask_IO_APIC_irq_desc
                      - spin_lock_irqsave(ioapic_lock, flags)
                      ---Receive NMI here after getting spinlock---
                         - nmi
                            - do_nmi
                               - crash_nmi_callback
                               ---Infinite loop here---
 
  The problem is that although kdump tries to shutdown minimal
  hardware, it still needs to disable the IO APIC.  This requires
  spinlocks which may be held by another cpu.  This other cpu is
  being held infinitely in an NMI context by kdump in order to
  serialize the crashing path.  Instant deadlock.
 
  Eric brought up a point that because the boot code was
  restructured we may not need to disable the io apic any more in
  the crash path.  The original concern that led to the
  development of disable_IO_APIC, was that the jiffies calibration
  on boot up relied on the PIT timer for reference.  Access to the
  PIT required 8259 interrupts to be working.  This wouldn't work
  if the ioapic needed to be configured.  So on panic path, the
  ioapic was reconfigured to use virtual wire mode to allow the 8259 to 
  passthrough.
 
  Those concerns don't hold true now, thanks to the jiffies
  calibration code not needing the PIT.  As a result, we can
  remove this call and simplify the locking needed in the panic
  path.
 
  The same work allowed us to remove the need to disable the local
  apic on shutdown too.  This should allow us to jump to the
  second a little faster.
 
  I tested kdump on an Ivy Bridge platform, a Pentium4 and an old
  athlon that did not have an ioapic.  All three were successful.
 
  I also tested using lkdtm that would use jprobes to panic the
  system when entering do_IRQ.  The idea was to see how the system
  reacted with an interrupt pending in the second kernel.  My
  core2 quad successfully kdump'd 3 times in a row with no issues.
 
  v2: removed the disable lapic code too
 
 with this commit, kdump is not working anymore on my setups with
 Nehalem, Westmere, sandbridge.
 these setup all have VT-d enabled.
 
 After reverting this commit, kdump is working again.
 
 So assume you need to drop this patch.

Dropped the patch, thanks for reporting this.

Don, Eric?

Ingo

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v2] generic-ipi: Initialize call_single_queue before enabling interrupt

2011-06-17 Thread Ingo Molnar

* Takao Indoh indou.ta...@jp.fujitsu.com wrote:

 Just a reminder...
 
 Thanks for acks, but I'm still waiting for inclusion.
 I tested 3.0.0-rc3 and kdump still failed due to pending IPI
 from 1st kernel. Need this patch to fix it.

For any future patch that has a review discussion (like this one) 
please collect Acks and resend the patch with the acks included once 
the discussion results in a final patch, otherwise maintainers might 
skip the patch as still being reviewed.

I've collected the acks here and started testing your patch.

Thanks,

Ingo

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 0/2][concept RFC] x86: BIOS-save kernel log to disk upon panic

2011-01-26 Thread Ingo Molnar

* Ahmed S. Darwish darwish...@gmail.com wrote:

 - The latest approach (proposed by Linus) is to forget the disk: jump to
   real-mode, but display the kernel log in a fancy format (with scroll
   ups and downs) instead.
 
   Will re-initializing the VGA registers to their POST state be possible?
   If not, what about a fail-safe VGA driver?
 
 I'm most likely going to implement either the second or the third point,
 so I'd really appreciate some input.

The third one suggested by Linus is the most useful and most direct one IMO. 
Such a 
'oops mode' screen would be useful for regular kernel crashes as well.

Also, have you tried BIOS warm reset vector, which is supposed to reboot 
without 
clearing RAM contents - how well does it work in practice on typical laptops? 
If on 
crash we could reboot without memory getting cleared that would open up a vast 
area 
of space to store the kernel log into (RAM).

Thanks,

Ingo

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 0/2][concept RFC] x86: BIOS-save kernel log to disk upon panic

2011-01-26 Thread Ingo Molnar

* Ahmed S. Darwish darwish...@gmail.com wrote:

  Also, have you tried BIOS warm reset vector, which is supposed to reboot 
  without 
  clearing RAM contents - how well does it work in practice on typical 
  laptops? If 
  on crash we could reboot without memory getting cleared that would open up 
  a 
  vast area of space to store the kernel log into (RAM).
 
 AFAIK, the lmode-rmode transition is more forward-compatible.
 
 It seems the only place warm boot was documented is in the Intel MP spec, a 
 12-years old document long obsoleted by ACPI. Meanwhile, the real-mode 
 transition 
 is rigorously documented in the current Intel and AMD manuals, albeit in kind 
 of a 
 holier-than-thou approach.

I mean, use the warm reset vector to truly reset the box.

Then, once a stable known-good kernel boots, *that* kernel can then recover all 
the 
log data which is sitting in a well-known place in RAM, automatically and 
transparently.

Basically a bit like kexec, just more convenient and it also goes through the 
BIOS 
warm reset, so it might work better than kexec ...

Thanks,

Ingo

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 03/14] x86, config: change defaults PHYSICAL_START and PHYSICAL_ALIGN

2009-05-08 Thread Ingo Molnar

* Sam Ravnborg s...@ravnborg.org wrote:

 On Thu, May 07, 2009 at 03:26:51PM -0700, H. Peter Anvin wrote:
  From: H. Peter Anvin h...@zytor.com
  
  Change the default for CONFIG_PHYSICAL_START to 16 MB; 4 MB if
  EMBEDDED.  Change the default for CONFIG_PHYSICAL_ALIGN to match up
  with a large page alignment datum (4 MB for non-PAE, 2 MB for PAE.)
  
  The default of 16 MB is designed to avoid occupying ZONE_DMA, as well
  as avoiding a 15-16 MB memory hole, which some vendors inexplicably
  still have in their systems.
  
  [ Impact: minor performance improvement, leave ZONE_DMA free ]
  
  Signed-off-by: H. Peter Anvin h...@zytor.com
  ---
   arch/x86/Kconfig |   19 +++
   1 files changed, 11 insertions(+), 8 deletions(-)
  
  diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
  index 039c3f0..13b3a5a 100644
  --- a/arch/x86/Kconfig
  +++ b/arch/x86/Kconfig
  @@ -1455,17 +1455,20 @@ config KEXEC_JUMP
   
   config PHYSICAL_START
  hex Physical address where the kernel is loaded if (EMBEDDED || 
  CRASH_DUMP)
  -   default 0x100 if X86_NUMAQ
  -   default 0x20 if X86_64
  -   default 0x10
  +   default 0x40 if EMBEDDED
  +   default 0x100
  ---help---
This gives the physical address where the kernel is loaded.
   
If kernel is a not relocatable (CONFIG_RELOCATABLE=n) then
bzImage will decompress itself to above physical address and
  - run from there. Otherwise, bzImage will run from the address where
  - it has been loaded by the boot loader and will ignore above physical
  - address.
  + run from there. Otherwise, bzImage will run from the either
  + the above physical address or where it has been loaded by
  + the boot loader, whichever is higher.
  +
  + For embedded machines with very small memory (less than
  + about 24 MB) this may need to be adjusted downward from the
  + default value of 0x100 (16 MB).
 
 But in the above you say that default is 0x40 if EMBEDDED
 That is in contradiction with the help - no?

Yes, the help text is wrong.

Ingo

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: kdump hangs up by Sysrq+C trigger in high load.

2008-12-16 Thread Ingo Molnar

* Yoshihiro Takahashi ytakaha...@miraclelinux.com wrote:

 Hi.
 
 kdump hangs up by Sysrq+C trigger once in about 10 times in high load. 
 After the above occurs, kdump cannot collect vmcores with NMI button. 
 When waiting_for_crash_ipi does case more than 0, mdelay seem to make a 
 stall.
 
 While nmi_shootdown_cpus(), mdelay may not work since interruption is 
 disable.  (ex. when crashing kernel by NMI interruption, all 
 interruption may be disabled.).
 
 So I think that it should use other way (TSC) instead of mdelay() in 
 nmi_shootdown_cpus().

mdelay uses either the tsc or CPU loops - neither stop while IRQs are 
disabled.

Could you figure out exactly why it hangs in your case?

Ingo

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 0/8] Make nmi_shootdown_cpus() usable by non-kdump code

2008-11-12 Thread Ingo Molnar

* Eduardo Habkost [EMAIL PROTECTED] wrote:

 Hi, Ingo,
 
 As tip/master is a moving target, I am splitting the previous 
 kdump/reboot virtualization-disable code series[1] into smaller 
 series so the simpler parts can be included sooner. This first 
 series is just for making nmi_shootdown_cpus() usable by non-kdump 
 code, by moving their generic parts to reboot.c.

applied them to tip/x86/crashdump, thanks Eduardo!

this branch btw. is an append-only non-rebasing tree that is not a 
moving target, you can access it via:

  http://people.redhat.com/mingo/tip.git/README

and then do:

  git checkout -b x86/crashdump tip/x86/crashdump

it has the following changes queued up currently:

Eduardo Habkost (8):
  x86 kdump: extract kdump-specific code from crash_nmi_callback()
  x86 kdump: move crashing_cpu assignment to nmi_shootdown_cpus()
  x86 kdump: create kdump_nmi_shootdown_cpus()
  x86 kdump: make kdump_nmi_callback() a function ptr on 
crash_nmi_callback()
  x86 kdump: make nmi_shootdown_cpus() non-static
  x86: move nmi_shootdown_cpus() to reboot.c
  x86: make nmi_shootdown_cpus() available on !SMP and !X86_LOCAL_APIC
  x86: disable IRQs before doing anything on nmi_shootdown_cpus()

Huang Ying (3):
  kexec/i386: remove PAGE_SIZE alignment from relocate_kernel
  kexec/i386: allocate page table pages dynamically
  kexec/i386: setup kexec page table in C

if you prefer less flux then you can send patches against that branch, 
we'll sort out interaction with other trees.

Ingo

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 15/15] Revert x86: default to reboot via ACPI

2008-11-06 Thread Ingo Molnar

* Ingo Molnar [EMAIL PROTECTED] wrote:

  Andrey Borzenkov's patch, for example, adds a new DMI entry 
  because reboot=acpi breaks his keyboard (even without KVM, I 
  guess). Andrey, was that the case?
 
 hm, IIRC the problem was KVM in his case too.

actually, Andrey's problem seems to be unrelated. So i've queued up 
the revert below in tip/x86/urgent for v2.6.28. Thanks guys!

Ingo


From 8d00450d296dedec9ada38d43b83e79cca6fd5a3 Mon Sep 17 00:00:00 2001
From: Eduardo Habkost [EMAIL PROTECTED]
Date: Tue, 4 Nov 2008 12:52:44 -0200
Subject: [PATCH] Revert x86: default to reboot via ACPI

This reverts commit c7ffa6c26277b403920e2255d10df849bd613380.

the assumptio of this change was that this would not break
any existing machine. Andrey Borzenkov reported troubles with
the ACPI reboot method: the system would hang on reboot, necessiating
a power cycle. Probably more systems are affected as well.

Also, there are patches queued up for v2.6.29 to disable virtualization
on emergency_restart() - which was the original motivation of
this change.

Reported-by: Andrey Borzenkov [EMAIL PROTECTED]
Bisected-by: Andrey Borzenkov [EMAIL PROTECTED]
Signed-off-by: Eduardo Habkost [EMAIL PROTECTED]
Acked-by: Avi Kivity [EMAIL PROTECTED]
Signed-off-by: Ingo Molnar [EMAIL PROTECTED]
---
 arch/x86/kernel/reboot.c |6 +-
 1 files changed, 1 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index f4c93f1..724adfc 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -29,11 +29,7 @@ EXPORT_SYMBOL(pm_power_off);
 
 static const struct desc_ptr no_idt = {};
 static int reboot_mode;
-/*
- * Keyboard reset and triple fault may result in INIT, not RESET, which
- * doesn't work when we're in vmx root mode.  Try ACPI first.
- */
-enum reboot_type reboot_type = BOOT_ACPI;
+enum reboot_type reboot_type = BOOT_KBD;
 int reboot_force;
 
 #if defined(CONFIG_X86_32)  defined(CONFIG_SMP)

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 00/14] x86: disable virt on kdump and emergency_restart

2008-11-05 Thread Ingo Molnar

* Avi Kivity [EMAIL PROTECTED] wrote:

 Eduardo Habkost wrote:
 Hi,

 This is a new version of the series to disabling virtualization on kdump,
 now extended to do the same tricks on emergency_restart() if needed.
   

 Looks good.  If you me to push it upstream, I'll need kexec/kdump 
 acks.  Otherwise, ack for the kvm bits.

general ack for the x86 bits, but i'm not sure whether we should be 
pushing this upstream so late in the cycle. If we do it in the next 
cycle then it's best we do it in the x86 tree, the KVM impact seems 
much smaller than the general x86 impact.

Ingo

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 15/15] Revert x86: default to reboot via ACPI

2008-11-05 Thread Ingo Molnar

* Eduardo Habkost [EMAIL PROTECTED] wrote:

 This reverts commit c7ffa6c26277b403920e2255d10df849bd613380.
 
 Now that we have the hooks to disable virtualization on
 emergency_restart(), we can get back to the BOOT_KBD reboot_type default.
 
 Signed-off-by: Eduardo Habkost [EMAIL PROTECTED]

hm, why revert this? There's nothing wrong with the ACPI reboot 
method, it's just that we surprise the BIOS by exiting with an unclean 
VMX state in certain circumstances.

if the ACPI reboot method does not work we do the KBD method as the 
next thing in the reboot chain.

Ingo

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 0/3 -v4] kexec/i386: kexec page table code clean up

2008-10-31 Thread Ingo Molnar

* Huang Ying [EMAIL PROTECTED] wrote:

 This patchset cleans up page table setup code of kexec on i386.
 
 This patchset is based on v2.6.28-rc2-338-g65fc716 and has been 
 tested on i386.

applied to tip/x86/crashdump:

 9868ee6: kexec/i386: setup kexec page table in C
 92be3d6: kexec/i386: allocate page table pages dynamically
 31498a0: kexec/i386: remove PAGE_SIZE alignment from relocate_kernel

thanks! They are nice cleanups - both the .text reduction and the 
linecount reduction is nice:

 4 files changed, 102 insertions(+), 152 deletions(-)

Ingo

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] x86: make oops_begin and oops_end equal

2008-10-22 Thread Ingo Molnar

* Neil Horman [EMAIL PROTECTED] wrote:

  Thanks for the review. I've sent a redone patch series just a moment 
  ago, based on your comments. There was also another problem with 
  these two patches: oops_end(flags, regs, signr) had special 
  behaviour for regs=NULL that I did not consider before. The series 
  has grown due to this issue...

[...]
 Copy that.  Thanks for the quick turn-around.

applied to tip/x86/dumpstack, thanks guys!

Could you please also create arch/x86/kernel/dumpstack.c and move all 
the now-unified bits there - before they get out of sync again?

Ingo

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] kexec jump: fix compiling warning on xchg(kexec_lock, 0) in kernel_kexec()

2008-08-13 Thread Ingo Molnar

* Andrew Morton [EMAIL PROTECTED] wrote:

  instead? Not that that's really right either, but at least it avoids 
  the _ridiculous_ crap. The real solution is probably to use a 
  spinlock and trylock/unlock.
 
 Or test_and_set_bit().  That's what I've been saying too, only 
 differently ;)
 
 But cleaning up the long-standing silly usage of xchg() is a different 
 activity from suppressing this recently-added compile warning.

actually, in this case i disagree: the warning here is a canary that 
there's something wrong about this code - i.e. gcc is _right_ about 
warning us. The warning is also totally harmless - the warning shows us 
the suckiness of the code structure - and squashing the warning doesnt 
fix that.

So im coal-mine analogies, i disagree with squashing the canary, we 
should find and fix the shaft that emits the smelly methane instead ;-)

Ingo

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 2/5] x86: Define elfcorehdr_addr in arch dependent section

2008-07-31 Thread Ingo Molnar

* Vivek Goyal [EMAIL PROTECTED] wrote:

 o Move elfcorehdr_addr definition in arch dependent crash dump file. This is
   equivalent to defining elfcorehdr_addr under CONFIG_CRASH_DUMP instead of
   CONFIG_PROC_VMCORE. This is needed by is_kdump_kernel() which can be
   used irrespective of the fact whether CONFIG_PROC_VMCORE is enabled or
   not.
 
 Signed-off-by: Vivek Goyal [EMAIL PROTECTED]
 ---
 
  arch/x86/kernel/crash_dump_32.c |3 +++
  arch/x86/kernel/crash_dump_64.c |3 +++
  arch/x86/kernel/setup.c |8 +++-
  3 files changed, 13 insertions(+), 1 deletion(-)

the x86 bits look fine to me.

Acked-by: Ingo Molnar [EMAIL PROTECTED]

Ingo

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [patch] crashdump: fix undefined reference to `elfcorehdr_addr'

2008-07-28 Thread Ingo Molnar

* Simon Horman [EMAIL PROTECTED] wrote:

 On Mon, Jul 28, 2008 at 12:45:28PM +1000, Simon Horman wrote:
  The rfc patch I made to rename is_kdump_kernel to kernel_has_vmcore,
  which I appended to my previous post should have looked more like the
  following. Although, as I noted in my previous post, its more a starting
  point for discussion than a solution to the problem at hand.
 
 Sorry, one more time. I forgot to quilt refresh.

doesnt apply cleanly to latest -git:

 Hunk #1 FAILED at 2070.
 1 out of 1 hunk FAILED -- saving rejects to file 
arch/ia64/hp/common/sba_iommu.c.rej

due to a crossing change i guess. Also, i guess this should go via -mm 
as it touches fs/proc/vmcore.c and include/linux/crash_dump.h. The x86 
bits look good to me.

 Acked-by: Ingo Molnar [EMAIL PROTECTED]

Ingo

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] x86: Move dma32_reserve_bootmem() after reserve_crashkernel()

2008-07-18 Thread Ingo Molnar

* Bernhard Walle [EMAIL PROTECTED] wrote:

 On a x86-64 machine (nothing special I could encounter) I had the 
 problem that crashkernel reservation with the usual [EMAIL PROTECTED] 
 failed. 
 While debugging that, I encountered that dma32_reserve_bootmem() 
 reserves a memory region which is in that area.
 
 Because dma32_reserve_bootmem() does not rely on a specific offset but 
 crashkernel does, it makes sense to move the dma32_reserve_bootmem() 
 reservation down a bit. I tested that patch and it works without 
 problems. I don't see any negative effects of that move, but maybe I 
 oversaw something ...
 
 While we strictly don't need that patch in 2.6.27 because we have the 
 automatic, dynamic offset detection, it makes sense to also include it 
 here because:
 
   - it's easier to get it in -stable then,
   - many people are still used to the '[EMAIL PROTECTED]' syntax,
   - not everybody may be using a reloatable kernel.

applied to tip/x86/crashdump, thanks Bernhard.

Ingo

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] x86: Find offset for crashkernel reservation automatically

2008-07-03 Thread Ingo Molnar

* Bernhard Walle [EMAIL PROTECTED] wrote:

   But in general policy should go in userspace (if possible), so I 
   agree with you that kexec-tools can handle that.
  
  At a quick skim the patch looks good.  I thought I had initially 
  implemented the code to work this way but apparently in all of the 
  churn that aspect of it got lost.
  
  Let's start with trying to allocate the memory as low as possible 
  for a crash kernel (I think you are already doing that).
 
 Yes, in 95 % of the cases the crashkernel still gets reserved at 16M.

ok - i've applied it to tip/x86/crashdump - thanks Bernhard.

Ingo

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] x86: Fix documentation bug about relocatability

2008-07-03 Thread Ingo Molnar

* Bernhard Walle [EMAIL PROTECTED] wrote:

 This patch fixes a small bug in documentation: x86_64 also has now the 
 ability to build a relocatable kernel.

applied to tip/x86/crashdump - thanks Bernhard.

Ingo

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] Remove CONFIG_EXPERIMENTAL from kdump

2008-06-26 Thread Ingo Molnar

* Bernhard Walle [EMAIL PROTECTED] wrote:

 I would suggest to remove the experimental status from Kdump. Kdump 
 is now in the kernel since a long time and used by Enterprise 
 distributions. I don't think that experimental is true any more.

agreed. I have applied your patch to tip/x86/crashdump (separate topic 
due to its fs/Kconfig impact).

Thanks Bernhard,

Ingo

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] x86: Limit E820 map when a user-defined memory map is specified

2008-06-26 Thread Ingo Molnar

* Yinghai Lu [EMAIL PROTECTED] wrote:

 On Wed, Jun 25, 2008 at 12:39 PM, Bernhard Walle [EMAIL PROTECTED] wrote:
  This patch brings back limiting of the E820 map when a user-defined
  E820 map is specified. While the behaviour of i386 (32 bit) was to limit
  the E820 map (and /proc/iomem), the behaviour of x86-64 (64 bit) was not to
  limit.
 
  That patch limits the E820 map again for both x86 architectures.
 
  Code was tested for compilation and booting on a 32 bit and 64 bit system.
 
 
  Signed-off-by: Bernhard Walle [EMAIL PROTECTED]
 
 Acked-by: Yinghai Lu [EMAIL PROTECTED]

applied to tip/x86/unify-setup - thanks.

Ingo

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [patch] [PATCH] Use BOOTMEM_EXCLUSIVE for i386

2008-06-18 Thread Ingo Molnar

* Bernhard Walle [EMAIL PROTECTED] wrote:

 This patch uses the BOOTMEM_EXCLUSIVE for crashkernel reservation also 
 for i386 and prints a error message on failure.
 
 The patch is still for 2.6.26 since it is only bug fixing. The 
 unification of reserve_crashkernel() between i386 and x86_64 should be 
 done for 2.6.27.

applied to tip/x86/urgent, thanks Bernhard.

Ingo

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH], issue EOI to APIC prior to calling crash_kexec in die_nmi path

2008-02-07 Thread Ingo Molnar

* Neil Horman [EMAIL PROTECTED] wrote:

 Ingo noted a few posts down the nmi_exit doesn't actually write to the 
 APIC EOI register, so yeah, I agree, its bogus (and I apologize, I 
 should have checked that more carefully).  Nevertheless, this patch 
 consistently allowed a hangning machine to boot through an Nmi lockup.  
 So I'm forced to wonder whats going on then that this patch helps 
 with.  perhaps its a just a very fragile timing issue, I'll need to 
 look more closely.

try a dummy iret, something like:

  asm volatile (pushf; push $1f; iret; 1: \n);

to get the CPU out of its 'nested NMI' state. (totally untested)

the idea is to push down an iret frame to the kernel stack that will 
just jump to the next instruction and gets it out of the NMI nesting. 
Note: interrupts will/must still be disabled, despite the iret. (the 
ordering of the pushes might be wrong, we might need more than that for 
a valid iret, etc. etc.)

Ingo

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH], issue EOI to APIC prior to calling crash_kexec in die_nmi path

2008-02-06 Thread Ingo Molnar

* Neil Horman [EMAIL PROTECTED] wrote:

   if (!user_mode_vm(regs)) {
 + nmi_exit();
 + local_irq_enable();
   current-thread.trap_no = 2;
   crash_kexec(regs);

looks good to me, but please move the local_irq_enable() to within 
crash_kexec() instead - probably inside the got the kexec lock 
section. That makes crash_kexec() use generally safer too i guess: right 
it seems that die() too can call crash_kexec() with irqs disabled - and 
can thus hang in smp_send_stop() [or wherever it hung before].

Ingo

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH], issue EOI to APIC prior to calling crash_kexec in die_nmi path

2008-02-06 Thread Ingo Molnar

* Vivek Goyal [EMAIL PROTECTED] wrote:

 On Wed, Feb 06, 2008 at 11:00:01PM +0100, Ingo Molnar wrote:
  
  * Neil Horman [EMAIL PROTECTED] wrote:
  
 if (!user_mode_vm(regs)) {
   + nmi_exit();
   + local_irq_enable();
 current-thread.trap_no = 2;
 crash_kexec(regs);
  
  looks good to me, but please move the local_irq_enable() to within 
  crash_kexec() instead - probably inside the got the kexec lock 
  section. That makes crash_kexec() use generally safer too i guess: right 
  it seems that die() too can call crash_kexec() with irqs disabled - and 
  can thus hang in smp_send_stop() [or wherever it hung before].
  
 
 In general, I think we should not be servicing interrupts once the 
 system has crashed and crash_kexec() has been invoked.
 
 In fact, right now machine_crash_shutdown() explicity disables 
 interrupt before sending NMIs to other cpus to stop these cpus and 
 which makes sense to me.
 
 I am wondering if interrupts are disabled on crashing cpu or if 
 crashing cpu is inside die_nmi(), how would it stop/prevent delivery 
 of NMI IPI to other cpus.
 
 Am I missing something obivious?

i wondered about that too. kexec should be as atomic as it can be - 
enabling interrupts only opens up a window for another crash (more 
memory corruption, etc. etc) to happen.

Ingo

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH], issue EOI to APIC prior to calling crash_kexec in die_nmi path

2008-02-06 Thread Ingo Molnar

* H. Peter Anvin [EMAIL PROTECTED] wrote:

 I am wondering if interrupts are disabled on crashing cpu or if 
 crashing cpu is inside die_nmi(), how would it stop/prevent delivery 
 of NMI IPI to other cpus.

 I don't see how it would.

cross-CPU IPIs are a bit fragile on some PC platforms. So if the kexec 
code relies on getting IPIs to all other CPUs, it might not be able to 
do it reliably. There might be limitations on how many APIC irqs there 
can be queued at a time, and if those slots are used up and the CPU is 
not servicing irqs then stuff gets retried. This might even affect NMIs 
sent via APIC messages - not sure about that.

Ingo

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH], issue EOI to APIC prior to calling crash_kexec in die_nmi path

2008-02-06 Thread Ingo Molnar

* Eric W. Biederman [EMAIL PROTECTED] wrote:

 Looking at the patch the local_irq_enable() is totally bogus.  As soon 
 was we hit machine_crash_shutdown the first thing we do is disable 
 irqs.

yeah.

 I'm wondering if someone was using the switch cpus on crash patch that 
 was floating around.  That would require the ipis to work.
 
 I don't know if nmi_exit makes sense.  There are enough layers of 
 abstraction in that piece of code I can't quickly spot the part that 
 is banging the hardware.
 
 The location of nmi_exit in the patch is clearly wrong.  crash_kexec 
 is a noop if we don't have a crash kernel loaded (and if we are not 
 the first cpu into it), so if we don't execute the crash code 
 something weird may happen.  Further the code is just more 
 maintainable if that kind of code lives in machine_crash_shutdown.

nmi_exit() has no hw effects - it's just our own bookeeping.

the hw knows that we finished the NMI when we do an iret. Perhaps that's 
the bug or side-effect that made the difference: via enabling irqs we 
get an irq entry, and that does an iret and clears the NMI nested state 
- allowing the kexec context to proceed? I suspect kexec() will do an 
iret eventually (at minimum in the booted up kernel's context) - all 
NMIs are blocked up to that point and maybe the APIC doesnt really like 
being frobbed in that state? In any case, the local_irq_enable() is just 
wrong - it's the worst thing a crashing kernel can do. Perhaps doing an 
intentional iret with a prepared stack-let that just restores to 
still-irqs-off state and jumps to the next instruction could 'exit' the 
NMI context without really having to exit it in the kernel code flow?

Ingo

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] i386: handle an initrd in highmem (version 2)

2008-01-08 Thread Ingo Molnar

* H. Peter Anvin [EMAIL PROTECTED] wrote:

 Fix crash on NUMA reported by Dhaval Giani (reported as being a kexec 
 issue.)

here's the delta fix.

Ingo

Index: linux-x86.q/arch/x86/kernel/setup_32.c
===
--- linux-x86.q.orig/arch/x86/kernel/setup_32.c
+++ linux-x86.q/arch/x86/kernel/setup_32.c
@@ -510,16 +510,20 @@ static void __init reserve_initrd(void)
unsigned long end_of_lowmem = max_low_pfn  PAGE_SHIFT;
unsigned long ramdisk_here;
 
+   initrd_start = 0;
+
+   if (!boot_params.hdr.type_of_loader ||
+   !ramdisk_image || !ramdisk_size)
+   return; /* No initrd provided by bootloader */
+
if (ramdisk_end  ramdisk_image) {
-   printk(KERN_ERR initrd wraps around end of memory\n
-  KERN_ERR disabling initrd\n);
-   initrd_start = 0;
+   printk(KERN_ERR initrd wraps around end of memory, 
+  disabling initrd\n);
return;
}
if (ramdisk_size = end_of_lowmem/2) {
-   printk(KERN_ERR initrd too large to handle\n
-  KERN_ERR disabling initrd\n);
-   initrd_start = 0;
+   printk(KERN_ERR initrd too large to handle, 
+  disabling initrd\n);
return;
}
if (ramdisk_end = end_of_lowmem) {
@@ -554,9 +558,10 @@ static void __init relocate_initrd(void)
char *p, *q;
 
if (!do_relocate_initrd)
-   return; /* Nothing to do here... */
+   return;
 
ramdisk_here = initrd_start - PAGE_OFFSET;
+
q = (char *)initrd_start;
 
/* Copy any lowmem portion of the initrd */
@@ -642,11 +647,10 @@ void __init setup_bootmem_allocator(void
 */
find_smp_config();
 #endif
-   numa_kva_reserve();
 #ifdef CONFIG_BLK_DEV_INITRD
-   if (boot_params.hdr.type_of_loader  boot_params.hdr.ramdisk_image)
-   reserve_initrd();
+   reserve_initrd();
 #endif
+   numa_kva_reserve();
reserve_crashkernel();
 }
 
@@ -799,12 +803,13 @@ void __init setup_arch(char **cmdline_p)
/*
 * NOTE: at this point the bootmem allocator is fully available.
 */
-   paravirt_post_allocator_init();
 
 #ifdef CONFIG_BLK_DEV_INITRD
relocate_initrd();
 #endif
 
+   paravirt_post_allocator_init();
+
dmi_scan_machine();
 
io_delay_init();
Index: linux-x86.q/arch/x86/mm/discontig_32.c
===
--- linux-x86.q.orig/arch/x86/mm/discontig_32.c
+++ linux-x86.q/arch/x86/mm/discontig_32.c
@@ -288,8 +288,8 @@ unsigned long __init setup_memory(void)
 
 #ifdef CONFIG_BLK_DEV_INITRD
/* Numa kva area is below the initrd */
-   if (boot_params.hdr.type_of_loader  boot_params.hdr.ramdisk_image)
-   kva_start_pfn = PFN_DOWN(boot_params.hdr.ramdisk_image)
+   if (initrd_start)
+   kva_start_pfn = PFN_DOWN(initrd_start - PAGE_OFFSET)
- kva_pages;
 #endif
kva_start_pfn -= kva_start_pfn  (PTRS_PER_PTE-1);

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] i386: handle an initrd in highmem (version 2)

2008-01-08 Thread Ingo Molnar

* H. Peter Anvin [EMAIL PROTECTED] wrote:

 Fix crash on NUMA reported by Dhaval Giani (reported as being a kexec 
 issue.)

thanks, applied.

Ingo

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: kexec refuses to boot latest -mm

2008-01-06 Thread Ingo Molnar

* Dhaval Giani [EMAIL PROTECTED] wrote:

 So I went ahead and bisected -mm, and the culprit is git-x86. It boots 
 fine before it, but with git-x86 applied, it fails to boot.
 
 Ingo/Thomas, could you please point me to the git-x86 tree so that I 
 can bisect it? (with instructions on how to pull the -mm branch, I 
 managed to pull the master branch, but not the -mm branch)

sure:

--{ x86.git instructions }--

 git-clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git 
linux-2.6.git
 cd linux-2.6.git
 git-branch x86
 git-checkout x86
 git-pull git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86.git mm
 git-log [EMAIL PROTECTED] # see what's in #mm

(do subsequent pulls via git-pull --force, as we frequently rebase the
git tree. NOTE: this might override your own local changes, so do this
only if you dont mind about losing thse changes in that tree.)

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec