Re: [PATCHv3 06/14] x86/kexec: Keep CR4.MCE set during kexec for TDX guest

2023-11-15 Thread Peter Zijlstra
On Wed, Nov 15, 2023 at 03:00:36PM +0300, Kirill A. Shutemov wrote:
> TDX guests are not allowed to clear CR4.MCE. Attempt to clear it leads
> to #VE.
> 
> Use alternatives to keep the flag during kexec for TDX guests.
> 
> The change doesn't affect non-TDX-guest environments.
> 
> Signed-off-by: Kirill A. Shutemov 
> Reviewed-by: Kai Huang 
> ---
>  arch/x86/kernel/relocate_kernel_64.S | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/arch/x86/kernel/relocate_kernel_64.S 
> b/arch/x86/kernel/relocate_kernel_64.S
> index 56cab1bb25f5..bea89814b48e 100644
> --- a/arch/x86/kernel/relocate_kernel_64.S
> +++ b/arch/x86/kernel/relocate_kernel_64.S
> @@ -145,11 +145,16 @@ SYM_CODE_START_LOCAL_NOALIGN(identity_mapped)
>* Set cr4 to a known state:
>*  - physical address extension enabled
>*  - 5-level paging, if it was enabled before
> +  *  - Machine check exception on TDX guest. Clearing MCE is not allowed
> +  *in TDX guests.
>*/
>   movl$X86_CR4_PAE, %eax
>   testq   $X86_CR4_LA57, %r13
>   jz  1f
>   orl $X86_CR4_LA57, %eax
> +1:
> + ALTERNATIVE "jmp 1f", "", X86_FEATURE_TDX_GUEST
> + orl $X86_CR4_MCE, %eax
>  1:

ALTERNATIVE "", "orl $X86_CR4_MCE, %eax", X86_FEATURE_TDX_GUEST

?

>   movq%rax, %cr4
>  
> -- 
> 2.41.0
> 

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 06/10] scheduler: Remove the now superfluous sentinel elements from ctl_table array

2023-11-07 Thread Peter Zijlstra
On Tue, Nov 07, 2023 at 02:45:06PM +0100, Joel Granados via B4 Relay wrote:
> From: Joel Granados 
> 
> This commit comes at the tail end of a greater effort to remove the
> empty elements at the end of the ctl_table arrays (sentinels) which
> will reduce the overall build time size of the kernel and run time
> memory bloat by ~64 bytes per sentinel (further information Link :
> https://lore.kernel.org/all/zo5yx5jfoggi%2f...@bombadil.infradead.org/)
> 
> rm sentinel element from ctl_table arrays
> 
> Signed-off-by: Joel Granados 
> ---
>  kernel/sched/autogroup.c | 1 -
>  kernel/sched/core.c  | 1 -
>  kernel/sched/deadline.c  | 1 -
>  kernel/sched/fair.c  | 1 -
>  kernel/sched/rt.c| 1 -
>  kernel/sched/topology.c  | 1 -
>  6 files changed, 6 deletions(-)

Acked-by: Peter Zijlstra (Intel) 



Re: [PATCH v2 1/6] mm: introduce vma->vm_flags modifier functions

2023-01-25 Thread Peter Zijlstra
On Wed, Jan 25, 2023 at 12:38:46AM -0800, Suren Baghdasaryan wrote:

> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 2d6d790d9bed..6c7c70bf50dd 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -491,7 +491,13 @@ struct vm_area_struct {
>* See vmf_insert_mixed_prot() for discussion.
>*/
>   pgprot_t vm_page_prot;
> - unsigned long vm_flags; /* Flags, see mm.h. */
> +
> + /*
> +  * Flags, see mm.h.
> +  * WARNING! Do not modify directly.
> +  * Use {init|reset|set|clear|mod}_vm_flags() functions instead.
> +  */
> + unsigned long vm_flags;

We have __private and ACCESS_PRIVATE() to help with enforcing this.

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] Revert "x86/apic/x2apic: Implement IPI shorthands support"

2022-12-20 Thread Peter Zijlstra
On Tue, Dec 20, 2022 at 01:34:58PM +0800, Baoquan He wrote:
> This reverts commit 43931d350f30c6cd8c2f498d54ef7d65750abc92.
> 
> On kvm guest with 4 cpus deployed, when adding 'nr_cpus=2' to normal
> kernel's cmdline, and triggering crash to jump to kdump kernel, kdump
> kernel will stably hang. Reverting commit 43931d350f30 ("x86/apic/x2apic:
> Implement IPI shorthands support") can fix it.
> 
> The problem will disappear if removing 'nr_cpus=2' from normal kerne's
> cmdline.

And the root cause for this is... ? Does the kvm x2apic emulation
somehow get upset when we shorthand CPUs that haven't been initialized?

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v6 1/2] x86/kexec: Build identity mapping for EFI systab and ACPI tables

2019-05-14 Thread Peter Zijlstra
On Tue, May 14, 2019 at 08:58:35PM +0800, Dave Young wrote:

> Hmm, it seems caused by some WIP branch patches, I suspect below:

Grmbl.. Ingo, can you zap all those WIP branches, please? They mostly
just get in the way of things. If you want to run them, merge them in a
private branch or something.

> commit 124d6af5a5f559e516ed2c6ea857e889ed293b43
> x86/paravirt: Standardize 'insn_buff' variable names
> 
> The suspicious line is "per_cpu(insn_buff, cpu) = insn_buff;"

Yah, unfortunatly per-cpu variables live in the same namespace as normal
variables and so the above is incorrect, because the local @insn_buffer
variable shadows the global per-cpu symbol and very weird things will
happen.

This is of course consistent with C rules, where everything lives in the
same namespace...


Re: [PATCH v6 1/2] x86/kexec: Build identity mapping for EFI systab and ACPI tables

2019-05-14 Thread Peter Zijlstra
On Tue, May 14, 2019 at 04:48:41PM +0800, Dave Young wrote:

> > I did some tests on the laptop,  thing is:
> > 1. apply the 3 patches (two you posted + Boris's revert commit 52b922c3d49c)
> >on latest Linus master branch, everything works fine.
> > 
> > 2. build and test the tip/next-merge-window branch, kernel hangs early
> > without output, (both 1st boot and kexec boot)
> 
> Update about 2.  It should be not early rsdp related, I got the boot log
> Since can not reproduce with Linus master branch it may have been fixed.

Nothing was changed here since PTI.

> [0.685374][T1] rcu: Hierarchical SRCU implementation.
> [0.686414][T1] general protection fault:  [#1] SMP PTI
> [0.687328][T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.1.0-rc6+ 
> #877
> [0.687328][T1] Hardware name: LENOVO 4236NUC/4236NUC, BIOS 83ET82WW 
> (1.52 ) 06/04/2018
> [0.687328][T1] RIP: 0010:reserve_ds_buffers+0x34e/0x450

> [0.687328][T1] Call Trace:
> [0.687328][T1]  ? hardlockup_detector_event_create+0x50/0x50
> [0.687328][T1]  x86_reserve_hardware+0x173/0x180
> [0.687328][T1]  x86_pmu_event_init+0x39/0x220

The DS buffers are special in that they're part of cpu_entrt_area. If
this comes apart it might mean your pagetables are dodgy.


Re: [PATCH 09/13] Move bp_type_idx to kernel/event/hw_breakpoint.c

2015-11-09 Thread Peter Zijlstra
On Wed, Nov 04, 2015 at 12:41:06PM +0100, Peter Zijlstra wrote:
> On Tue, Nov 03, 2015 at 11:46:30AM -0800, Palmer Dabbelt wrote:
> > This has a "#ifdef CONFIG_*" that used to be exposed to userspace.
> > 
> > The names in here are so generic that I don't think it's a good idea
> > to expose them to userspace (or even the rest of the kernel).  Since
> > there's only one kernel user, it's been moved to that file.
> > 
> > Signed-off-by: Palmer Dabbelt 
> > Reviewed-by: Andrew Waterman 
> > Reviewed-by: Albert Ou 
> 
> Assuming you want to keep all these patches together in a tree or so.
> Let me know if you want me to take this patch.
> 
> Acked-by: Peter Zijlstra (Intel) 

Ah, build-bot finds your change is broken and you didn't grep right. It
is used in more places.

How about moving it to include/linux/hw_breakpoint.h, instead of having
it in the uapi crud?

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 09/13] Move bp_type_idx to kernel/event/hw_breakpoint.c

2015-11-09 Thread Peter Zijlstra
On Tue, Nov 03, 2015 at 11:46:30AM -0800, Palmer Dabbelt wrote:
> This has a "#ifdef CONFIG_*" that used to be exposed to userspace.
> 
> The names in here are so generic that I don't think it's a good idea
> to expose them to userspace (or even the rest of the kernel).  Since
> there's only one kernel user, it's been moved to that file.
> 
> Signed-off-by: Palmer Dabbelt 
> Reviewed-by: Andrew Waterman 
> Reviewed-by: Albert Ou 

Assuming you want to keep all these patches together in a tree or so.
Let me know if you want me to take this patch.

Acked-by: Peter Zijlstra (Intel) 

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option

2015-09-30 Thread Peter Zijlstra
On Thu, Oct 01, 2015 at 02:33:18AM +, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > On Fri, Sep 25, 2015 at 08:28:11PM +0900, Hidehiro Kawai wrote:
> > > This patch introduces new boot option "noextnmi" which disables
> > > external NMI.  This option is useful for the dump capture kernel
> > > so that an HA application or administrator wouldn't mistakenly
> > > shoot down the kernel by NMI.
> > 
> > So that they can get really stuck when the crash kernel crashes, right?
> > ;-)
> 
> No, it is different from my intention.
> 
> `mistakenly' in the above means; they issue NMI due to a misconception
> that the monitored host is stuck in the 1st kernel while it is actually
> taking a crash dump in the 2nd kernel.  To avoid this kind of accident,
> there is a tool such as fence_kdump which notifies "I'm taking a crash
> dump, so don't send NMI" to the HA clustering software.  However, there
> is a time window between kernel panic and the notification.
> 
> "noextnmi" allows users to avoid this kind of accident all the time of
> 2nd kernel.

Yes yes, I understand. But if the crash kernel also gets stuck they have
no means of recovery, right? (other than power cycling the hardware)

Just playing devils advocate here, I don't actually object to the patch.

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option

2015-09-30 Thread Peter Zijlstra
On Fri, Sep 25, 2015 at 08:28:11PM +0900, Hidehiro Kawai wrote:
> This patch introduces new boot option "noextnmi" which disables
> external NMI.  This option is useful for the dump capture kernel
> so that an HA application or administrator wouldn't mistakenly
> shoot down the kernel by NMI.

So that they can get really stuck when the crash kernel crashes, right?
;-)

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: Re: [V4 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly

2015-09-30 Thread Peter Zijlstra
On Mon, Sep 28, 2015 at 07:08:19AM +, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > >> kernel/kexec_core.c:899:3: note: in expansion of macro 'atomic_xchg'
> >   atomic_xchg(&panic_cpu, -1);
> >   ^
> 
> I changed to use atomic_xchg() instead of atomic_set() in V3
> because atomic_set() doesn't mean memory barrier.  However,
> I thought again and there is no need of barrier; there is no
> problem if a competitor sees old value of panic_cpu or new one.
> So, atomic_set() is sufficient and using it will remove this warning.
> 
> I will resend the fixed version later.

So if you rely on the memory barrier; you should have also put a comment
on explaining the ordering requirements.

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [V4 PATCH 2/4] panic/x86: Allow cpus to save registers even if they are looping in NMI context

2015-09-30 Thread Peter Zijlstra
On Fri, Sep 25, 2015 at 08:28:07PM +0900, Hidehiro Kawai wrote:
> --- a/arch/x86/kernel/reboot.c
> +++ b/arch/x86/kernel/reboot.c
> @@ -718,6 +718,7 @@ void machine_crash_shutdown(struct pt_regs *regs)
>  static nmi_shootdown_cb shootdown_callback;
>  
>  static atomic_t waiting_for_crash_ipi;
> +static int crash_ipi_done;
>  
>  static int crash_nmi_callback(unsigned int val, struct pt_regs *regs)
>  {
> @@ -779,6 +780,7 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
>   wmb();
>  
>   smp_send_nmi_allbutself();
> + crash_ipi_done = 1; /* Kick cpus looping in nmi context */

I would suggest using WRITE_ONCE() for that, because without the
volatile the compiler need not actually emit the store until after the
whole waiting thing _IF_ it can inline the whole thing.

Currently udelay() will end up being a function call and will therefore
force the store to be emitted, but I'd rather not rely on that.

>  
>   msecs = 1000; /* Wait at most a second for the other cpus to stop */
>   while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [V4 PATCH 1/4] panic/x86: Fix re-entrance problem due to panic on NMI

2015-09-30 Thread Peter Zijlstra
On Fri, Sep 25, 2015 at 12:13:55PM +, 河合英宏 / KAWAI,HIDEHIRO wrote:
> Peter saids -tip tree doesn't have panic_on_unrecovered_nmi in the
> previoius discussion, but it still exists.  So, I didn't change
> anything about panic_on_unrecovered_nmi.
> 

> > --- a/arch/x86/kernel/nmi.c
> > +++ b/arch/x86/kernel/nmi.c
> > @@ -231,7 +231,7 @@ void unregister_nmi_handler(unsigned int type, const 
> > char *name)
> >  #endif
> > 
> > if (panic_on_unrecovered_nmi)
> > -   panic("NMI: Not continuing");
> > +   nmi_panic("NMI: Not continuing");
> > 
> > pr_emerg("Dazed and confused, but trying to continue\n");
> > 

I was looking at unregister_nmi_handler() because that's the function
the diff points to. That still doesn't have panic_on_unrecovered_nmi.

It looks like your diff tool is 'broken' and generates nonsense function
data.

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 09/13] Move bp_type_idx to kernel/event/hw_breakpoint.c

2015-09-16 Thread Peter Zijlstra
On Tue, Sep 15, 2015 at 11:40:11AM -0700, Palmer Dabbelt wrote:
> On Tue, 15 Sep 2015 01:06:07 PDT (-0700), pet...@infradead.org wrote:
> > On Mon, Sep 14, 2015 at 03:50:43PM -0700, Palmer Dabbelt wrote:
> >> This has a "#ifdef CONFIG_*" that used to be exposed to userspace.
> >>
> >> The names in here are so generic that I don't think it's a good idea
> >> to expose them to userspace (or even the rest of the kernel).  Since
> >> there's only one kernel user, it's been moved to that file.
> >>
> >> Signed-off-by: Palmer Dabbelt 
> >> Reviewed-by: Andrew Waterman 
> >> Reviewed-by: Albert Ou 
> >> ---
> >>  include/uapi/linux/hw_breakpoint.h | 10 --
> >>  kernel/events/hw_breakpoint.c  | 10 ++
> >>  2 files changed, 10 insertions(+), 10 deletions(-)
> >>
> >> diff --git a/include/uapi/linux/hw_breakpoint.h 
> >> b/include/uapi/linux/hw_breakpoint.h
> >> index b04000a2296a..7a6a5a7f9511 100644
> >> --- a/include/uapi/linux/hw_breakpoint.h
> >> +++ b/include/uapi/linux/hw_breakpoint.h
> >> @@ -17,14 +17,4 @@ enum {
> >>HW_BREAKPOINT_INVALID   = HW_BREAKPOINT_RW | HW_BREAKPOINT_X,
> >>  };
> >>
> >> -enum bp_type_idx {
> >> -  TYPE_INST   = 0,
> >> -#ifdef CONFIG_HAVE_MIXED_BREAKPOINTS_REGS
> >> -  TYPE_DATA   = 0,
> >> -#else
> >> -  TYPE_DATA   = 1,
> >> -#endif
> >> -  TYPE_MAX
> >> -};
> >
> > This is rather unfortunate; you are correct that the naming is too
> > generic (and I tend to agree), but I think these values are required by
> > userspace to fill out:
> >
> >   perf_event_attr::bp_type
> >
> > So removing them will break things.
> >
> > Frederic?
> 
> perf_event_open(2) says
> 
>bp_type (since Linux 2.6.33)
>   This chooses the breakpoint type.  It is one of:
> 
>   HW_BREAKPOINT_EMPTY
>  No breakpoint.
> 
>   HW_BREAKPOINT_R
>  Count when we read the memory location.
> 
>   HW_BREAKPOINT_W
>  Count when we write the memory location.
> 
>   HW_BREAKPOINT_RW
>  Count when we read or write the memory location.
> 
>   HW_BREAKPOINT_X
>  Count when we execute code at the memory location.
> 
>   The values can be combined via a bitwise or, but the combination
>   of HW_BREAKPOINT_R or HW_BREAKPOINT_W  with  HW_BREAKPOINT_X  is
>   not allowed.
> 
> so I think removing this enum from userspace is OK.  Did I miss
> something?

Nah, could've just been me not being awake. Unless Frederic says
otherwise I'll chalk it up to not having drank enough morning juice.

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 09/13] Move bp_type_idx to kernel/event/hw_breakpoint.c

2015-09-16 Thread Peter Zijlstra
On Mon, Sep 14, 2015 at 03:50:43PM -0700, Palmer Dabbelt wrote:
> This has a "#ifdef CONFIG_*" that used to be exposed to userspace.
> 
> The names in here are so generic that I don't think it's a good idea
> to expose them to userspace (or even the rest of the kernel).  Since
> there's only one kernel user, it's been moved to that file.
> 
> Signed-off-by: Palmer Dabbelt 
> Reviewed-by: Andrew Waterman 
> Reviewed-by: Albert Ou 
> ---
>  include/uapi/linux/hw_breakpoint.h | 10 --
>  kernel/events/hw_breakpoint.c  | 10 ++
>  2 files changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/include/uapi/linux/hw_breakpoint.h 
> b/include/uapi/linux/hw_breakpoint.h
> index b04000a2296a..7a6a5a7f9511 100644
> --- a/include/uapi/linux/hw_breakpoint.h
> +++ b/include/uapi/linux/hw_breakpoint.h
> @@ -17,14 +17,4 @@ enum {
>   HW_BREAKPOINT_INVALID   = HW_BREAKPOINT_RW | HW_BREAKPOINT_X,
>  };
>  
> -enum bp_type_idx {
> - TYPE_INST   = 0,
> -#ifdef CONFIG_HAVE_MIXED_BREAKPOINTS_REGS
> - TYPE_DATA   = 0,
> -#else
> - TYPE_DATA   = 1,
> -#endif
> - TYPE_MAX
> -};

This is rather unfortunate; you are correct that the naming is too
generic (and I tend to agree), but I think these values are required by
userspace to fill out:

  perf_event_attr::bp_type

So removing them will break things.

Frederic?

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [V3 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly

2015-08-31 Thread Peter Zijlstra
On Mon, Aug 31, 2015 at 08:53:11AM +, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > I understand your question.  I don't intend to permit the recursive
> > call of crash_kexec() as for 'old_cpu != this_cpu' check.  That is
> > needed for the case of panic() --> crash_kexec().  Since panic_cpu has
> > already been set to this_cpu in panic() (please see PATCH 1/4), no one
> > can run crash_kexec() without 'old_cpu != this_cpu' check.
> > 
> > If you don't like this check, I would also be able to handle this case
> > like below:
> > 
> > crash_kexec()
> > {
> > old_cpu = atomic_cmpxchg(&panic_cpu, -1, this_cpu);
> > if (old_cpu != -1)
> > return;
> > 
> > __crash_kexec();
> > }
> > 
> > panic()
> > {
> > atomic_cmpxchg(&panic_cpu, -1, this_cpu);
> > __crash_kexec();
> > ...
> > 
> 
> Is that OK?

I suppose so, but I think me getting confused means comments can be
added/improved.

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [V3 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly

2015-08-25 Thread Peter Zijlstra
On Sat, Aug 22, 2015 at 02:35:24AM +, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > From: Peter Zijlstra [mailto:pet...@infradead.org]
> > 
> > On Thu, Aug 06, 2015 at 02:45:43PM +0900, Hidehiro Kawai wrote:
> > >  void crash_kexec(struct pt_regs *regs)
> > >  {
> > > + int old_cpu, this_cpu;
> > > +
> > > + /*
> > > +  * `old_cpu == -1' means we are the first comer and crash_kexec()
> > > +  * was called without entering panic().
> > > +  * `old_cpu == this_cpu' means crash_kexec() was called from panic().
> > > +  */
> > > + this_cpu = raw_smp_processor_id();
> > > + old_cpu = atomic_cmpxchg(&panic_cpu, -1, this_cpu);
> > > + if (old_cpu != -1 && old_cpu != this_cpu)
> > > + return;
> > 
> > This allows recursive calling of crash_kexec(), the Changelog did not
> > mention that. Is this really required?
> 
> What part are you arguing?  Recursive call of crash_kexec() doesn't
> happen.  In the first place, one of the purpose of this patch is
> to prevent a recursive call of crash_kexec() in the following case
> as I stated in the description:
> 
> CPU 0:
>   oops_end()
> crash_kexec()
>   mutex_trylock() // acquired
> 
> io_check_error()
>   panic()
> crash_kexec()
>   mutex_trylock() // failed to acquire
> infinite loop
> 

Yes, but what to we want to do there? It seems to me that is wrong, we
do not want to let a recursive crash_kexec() proceed.

Whereas the condition you created explicitly allows this recursion by
virtue of the 'old_cpu != this_cpu' check.

You changelog does not explain why you want a recursive crash_kexec().

> Also, the logic doesn't change form V1 (although the implementation
> changed), so I didn't add changelogs any more.

I cannot remember V1, nor can any prior patch be relevant.


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [V3 PATCH 1/4] panic/x86: Fix re-entrance problem due to panic on NMI

2015-08-20 Thread Peter Zijlstra
On Thu, Aug 06, 2015 at 02:45:43PM +0900, Hidehiro Kawai wrote:
> diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
> index d05bd2e..dcd4038 100644
> --- a/arch/x86/kernel/nmi.c
> +++ b/arch/x86/kernel/nmi.c
> @@ -231,7 +231,7 @@ void unregister_nmi_handler(unsigned int type, const char 
> *name)
>  #endif
>  
>   if (panic_on_unrecovered_nmi)
> - panic("NMI: Not continuing");
> + nmi_panic("NMI: Not continuing");
>  
>   pr_emerg("Dazed and confused, but trying to continue\n");
>  

What tree is this again.. my tree (-tip) doesn't have
panic_on_unrecovered_nmi nonsense.


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [V3 PATCH 0/4] Fix race issues among panic, NMI and crash_kexec

2015-08-20 Thread Peter Zijlstra

User-Agent: StGit/0.16

Fwiw, stgit is broken wrt sending email, all your emails have the exact
same timestamp, which means that the emails will be ordered on received
timestamp when threaded and generate the below mess:


 Aug 06 Hidehiro Kawai  (1.9K) [V3 PATCH 0/4] Fix race issues among panic, NMI 
and crash_kexec
 Aug 06 Hidehiro Kawai  (2.4K) ├─>[V3 PATCH 3/4] kexec: Fix race between 
panic() and crash_kexec() called directly
 Aug 06 Hidehiro Kawai  (4.9K) ├─>[V3 PATCH 1/4] panic/x86: Fix re-entrance 
problem due to panic on NMI
 Aug 06 Hidehiro Kawai  (5.3K) ├─>[V3 PATCH 2/4] panic/x86: Allow cpus to save 
registers even if they are looping in NMI context
 Aug 06 Hidehiro Kawai  (2.5K) ├─>[V3 PATCH 4/4] x86/apic: Introduce noextnmi 
boot option



___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [V3 PATCH 2/4] panic/x86: Allow cpus to save registers even if they are looping in NMI context

2015-08-20 Thread Peter Zijlstra
On Thu, Aug 06, 2015 at 02:45:43PM +0900, Hidehiro Kawai wrote:
> When cpu-A panics on NMI just after cpu-B has panicked, cpu-A loops
> infinitely in NMI context.  Especially for x86, cpu-B issues NMI IPI
> to other cpus to save their register states and do some cleanups if
> kdump is enabled, but cpu-A can't handle the NMI and fails to save
> register states.
> 
> To solve thie issue, we wait for the timing of the NMI IPI, then
> call the NMI handler which saves register states.

Sorry, I don't follow, what?

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [V3 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly

2015-08-20 Thread Peter Zijlstra
On Thu, Aug 06, 2015 at 02:45:43PM +0900, Hidehiro Kawai wrote:
>  void crash_kexec(struct pt_regs *regs)
>  {
> + int old_cpu, this_cpu;
> +
> + /*
> +  * `old_cpu == -1' means we are the first comer and crash_kexec()
> +  * was called without entering panic().
> +  * `old_cpu == this_cpu' means crash_kexec() was called from panic().
> +  */
> + this_cpu = raw_smp_processor_id();
> + old_cpu = atomic_cmpxchg(&panic_cpu, -1, this_cpu);
> + if (old_cpu != -1 && old_cpu != this_cpu)
> + return;

This allows recursive calling of crash_kexec(), the Changelog did not
mention that. Is this really required?

> +
>   /* Take the kexec_mutex here to prevent sys_kexec_load
>* running on one cpu from replacing the crash kernel
>* we are using after a panic on a different cpu.

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [RFC] perf: Clear MSRs on kexec

2015-08-04 Thread Peter Zijlstra
On Tue, Aug 04, 2015 at 07:52:29AM +0200, Jiri Olsa wrote:
> On Mon, Aug 03, 2015 at 11:54:17PM +0200, Peter Zijlstra wrote:
> > On Mon, Aug 03, 2015 at 11:32:28PM +0200, Jiri Olsa wrote:
> > > hi,
> > > I'm getting following message on the kdump kernel start
> > > 
> > >   Broken BIOS detected, complain to your hardware vendor.\
> > >   [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 38d is b0)
> > > 
> > > it seems to be caused by NMI watchdog being configured
> > > and fixed counter values stays in MSRs, which triggers
> > > warning in check_hw_exists and disables perf support
> > > in kdump kernel.. which probably does not hurt ;-)
> > > 
> > > zeroing MSRs during kdump shutdown seems to work (attached)
> > > but I'm not sure thats correct place for kdump perf callback
> > 
> > Right, but why bother? All that kernel needs to do is write a memory
> > dump to someplace and reboot, right? The less you do, the less can go
> > wrong.
> 
> well, I was hunting that 'Broken BIOS..' message which is wrong

Not really. The previous kernel being the BIOS in this case did
leave the counters in a funky state.

> I wouldn't think anyone wants to use perf under kdump kernel,
> but you never know ;-)

Yeah, I think we knew about this back then (might've been 2010) and
chose to not 'fix' it.

http://lkml.iu.edu/hypermail/linux/kernel/1012.1/00380.html

Is what google finds me.

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [RFC] perf: Clear MSRs on kexec

2015-08-03 Thread Peter Zijlstra
On Mon, Aug 03, 2015 at 11:32:28PM +0200, Jiri Olsa wrote:
> hi,
> I'm getting following message on the kdump kernel start
> 
>   Broken BIOS detected, complain to your hardware vendor.\
>   [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 38d is b0)
> 
> it seems to be caused by NMI watchdog being configured
> and fixed counter values stays in MSRs, which triggers
> warning in check_hw_exists and disables perf support
> in kdump kernel.. which probably does not hurt ;-)
> 
> zeroing MSRs during kdump shutdown seems to work (attached)
> but I'm not sure thats correct place for kdump perf callback

Right, but why bother? All that kernel needs to do is write a memory
dump to someplace and reboot, right? The less you do, the less can go
wrong.

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI

2015-07-23 Thread Peter Zijlstra
On Wed, Jul 22, 2015 at 11:14:21AM +0900, Hidehiro Kawai wrote:
> +DEFINE_SPINLOCK(panic_lock);

At the very least this should be a raw spinlock, but wth aren't you
using a simple atomic_xchg() ?

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH -v2 6/8] kexec jump: fix for lockdep

2008-08-10 Thread Peter Zijlstra
On Mon, 2008-08-11 at 08:59 +0800, Huang Ying wrote:
> On Fri, 2008-08-08 at 12:13 +0200, Peter Zijlstra wrote:
> > On Fri, 2008-08-08 at 14:52 +0800, Huang Ying wrote:
> > > Replace local_irq_disable() with raw_local_irq_disable() to prevent
> > > lockdep complain.
> > Uhhm, please provide more information - just using raw_* to silence
> > lockdep is generally the wrong thing to do.
> 
> In traditional kexec, the new kernel will replace current one, so the
> irq is simply disabled. But now jumping back from kexeced kernel is
> supported, so the irq should be enabled again.
> 
> The code sequence of irq during kexec jump is as follow:
> 
> local_irq_disable(); /* in kernel_kexec() */
> local_irq_disable(); /* in machine_kexec() */
> local_irq_enable(); /* in kernel_kexec() */
> 
> The disable and enable is not match. Maybe another method is to use
> local_irq_save(), local_irq_restore() pair in machine_kexec(), so the
> disable and enable is matched.

And its the machine kernel's lockdep instance that goes complain?

whichever annotation gets used - and I think I can agree that raw_*
might be approriate there, this should be accompanied with a rather
elaborate changelog and preferably a comment in the code too. Without
such we'll be wondering in the years to come WTH happens here.



___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH -v2 6/8] kexec jump: fix for lockdep

2008-08-08 Thread Peter Zijlstra
On Fri, 2008-08-08 at 14:52 +0800, Huang Ying wrote:
> Replace local_irq_disable() with raw_local_irq_disable() to prevent
> lockdep complain.


Uhhm, please provide more information - just using raw_* to silence
lockdep is generally the wrong thing to do.

> Signed-off-by: Huang Ying <[EMAIL PROTECTED]>
> 
> ---
>  arch/x86/kernel/machine_kexec_32.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- a/arch/x86/kernel/machine_kexec_32.c
> +++ b/arch/x86/kernel/machine_kexec_32.c
> @@ -123,7 +123,7 @@ void machine_kexec(struct kimage *image)
>   tracer_disable();
>  
>   /* Interrupts aren't acceptable while we reboot */
> - local_irq_disable();
> + raw_local_irq_disable();
>  
>   if (image->preserve_context) {
>  #ifdef CONFIG_X86_IO_APIC
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec