RE: RE: Re: [V4 PATCH 1/2] x86/panic: Replace smp_send_stop() with kdump friendly version in panic path

2016-09-20 Thread / KAWAIHIDEHIRO
Here is the revised commit description reflecting Dave's
comment.  Cc list was copied from -mm version.

From: Hidehiro Kawai 
Subject: x86/panic: replace smp_send_stop() with kdump friendly version in 
panic path

This patch fixes a problem reported by Daniel Walker
(https://lkml.org/lkml/2015/6/24/44).

When kernel panics with crash_kexec_post_notifiers kernel parameter
enabled, other CPUs are stopped by smp_send_stop() instead of
machine_crash_shutdown() in __crash_kexec() path.

  panic()
if crash_kexec_post_notifiers == 1
  smp_send_stop()
  atomic_notifier_call_chain()
  kmsg_dump()
__crash_kexec()
  machine_crash_shutdown()

Different from smp_send_stop(), machine_crash_shutdown() stops other
CPUs with extra works for kdump.  So, if smp_send_stop() stops other
CPUs in advance, these extra works won't be done.  For x86, kdump
routines miss to save other CPUs' registers and disable virtualization
extensions.

To fix this problem, call a new kdump friendly function,
crash_smp_send_stop(), instead of the smp_send_stop() when
crash_kexec_post_notifiers is enabled.  crash_smp_send_stop() is a
weak function, and it just call smp_send_stop().  Architecture
codes should override it so that kdump can work appropriately.
This patch only provides x86-specific version.

For Xen's PV kernel, just keep the current behavior.
As for Dom0, it doesn't use crash_kexec routines, and it relies on
panic notifier chain.  At the end of the chain, a hypercall is
issued which requests the hypervisor to execute kdump.  This means
regardless of crash_kexec_post_notifiers setting, smp_send_stop().
For PV HVM, it would work similarly to baremetal kernels with extra
cleanups for hypervisor.  It doesn't need additional care.

Changes in V4:
- Keep to use smp_send_stop if crash_kexec_post_notifiers is not set
- Rename panic_smp_send_stop to crash_smp_send_stop
- Don't change the behavior for Xen's PV kernel

Changes in V3:
- Revise comments, description, and symbol names

Changes in V2:
- Replace smp_send_stop() call with crash_kexec version which
  saves cpu states and cleans up VMX/SVM
- Drop a fix for Problem 1 at this moment

Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" option)
Link: 
http://lkml.kernel.org/r/20160810080948.11028.15344.st...@sysi4-13.yrl.intra.hitachi.co.jp
Signed-off-by: Hidehiro Kawai 
Reported-by: Daniel Walker 
Cc: Dave Young 
Cc: Baoquan He 
Cc: Vivek Goyal 
Cc: Eric Biederman 
Cc: Masami Hiramatsu 
Cc: Daniel Walker 
Cc: Xunlei Pang 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Borislav Petkov 
Cc: David Vrabel 
Cc: Toshi Kani 
Cc: Ralf Baechle 
Cc: David Daney 
Cc: Aaro Koskinen 
Cc: "Steven J. Hill" 
Cc: Corey Minyard 
Signed-off-by: Andrew Morton 

> -Original Message-
> From: Hidehiro Kawai
> Hi Xunlei,
> 
> > From: Xunlei Pang [mailto:xp...@redhat.com]
> > Sent: Tuesday, September 20, 2016 4:40 PM
> > On 2016/08/15/ at 19:22, Hidehiro Kawai wrote:
> > > Hi Dave,
> > >
> > > Thank you for the review.
> > >
> > >> From: Dave Young [mailto:dyo...@redhat.com]
> > >> Sent: Friday, August 12, 2016 12:17 PM
> > >>
> > >> Thanks for the update.
> > >> On 08/10/16 at 05:09pm, Hidehiro Kawai wrote:
> > >>> Daniel Walker reported problems which happens when
> > >>> crash_kexec_post_notifiers kernel option is enabled
> > >>> (https://lkml.org/lkml/2015/6/24/44).
> > >>>
> > >>> In that case, smp_send_stop() is called before entering kdump routines
> > >>> which assume other CPUs are still online.  As the result, for x86,
> > >>> kdump routines fail to save other CPUs' registers  and disable
> > >>> virtualization extensions.
> > >> Seems you simplified the changelog, but I think a little more details
> > >> will be helpful to understand the patch. You know sometimes lkml.org
> > >> does not work well.
> > > So, I'll try another archives when I post patch set next time.
> >
> > Hi Hidehiro Kawai,
> >
> > What's the status of this patch set, are you going to send an updated 
> > version?
> 
> Sorry, I misunderstood what Dave said, and I thought I don't
> need to revise the patch set.
> 
> Currently, this patch set is in -mm, then -next tree.
> What I need to fix is only commit descriptions, so I'm going to
> post revised descriptions as replies to this thread, and then
> request to Andrew to replace them if there is no objection.
> (or should I resend the whole patch set?)
> 
> Regards,
> Hidehiro Kawai
> 
> > >>> To fix this problem, call a new kdump friendly function,
> > >>> 

RE: Re: [V4 PATCH 2/2] mips/panic: Replace smp_send_stop() with kdump friendly version in panic path

2016-09-20 Thread / KAWAIHIDEHIRO
Dave Young suggested to me to explain the problem in more detail,
so here is the revised commit description.  The patch is now in -mm,
so I copied Cc list from -mm version.  Also I added Corey Minyard's
Tested-by and Reviewed-by.

From: Hidehiro Kawai 
Subject: mips/panic: replace smp_send_stop() with kdump friendly version in 
panic path

This patch fixes the problems reported by Daniel Walker
(https://lkml.org/lkml/2015/6/24/44).

When kernel panics with crash_kexec_post_notifiers kernel parameter
enabled, other CPUs are stopped by smp_send_stop() instead of
machine_crash_shutdown() in __crash_kexec() path.

  panic()
if crash_kexec_post_notifiers == 1
  smp_send_stop()
  atomic_notifier_call_chain()
  kmsg_dump()
__crash_kexec()
  machine_crash_shutdown()
octeon_generic_shutdown() // shutdown watchdog for ONLINE CPUs

Different from smp_send_stop(), machine_crash_shutdown() stops other
CPUs with extra works for kdump.  So, if smp_send_stop() stops other
CPUs in advance, these extra works won't be done.  As the result,
kdump routines miss to save other CPUs' registers.  Additionally for
MIPS OCTEON, it misses to stop the watchdog timer.

To fix this problem, call a new kdump friendly function,
crash_smp_send_stop(), instead of the smp_send_stop() when
crash_kexec_post_notifiers is enabled.  crash_smp_send_stop() is a
weak function, and it just call smp_send_stop().  Architecture
codes should override it so that kdump can work appropriately.
This patch provides MIPS version.

Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" option)
Link: 
http://lkml.kernel.org/r/20160810080950.11028.28000.st...@sysi4-13.yrl.intra.hitachi.co.jp
Signed-off-by: Hidehiro Kawai 
Reported-by: Daniel Walker 
Tested-by: Corey Minyard 
Reviewed-by: Corey Minyard 
Cc: Dave Young 
Cc: Baoquan He 
Cc: Vivek Goyal 
Cc: Eric Biederman 
Cc: Masami Hiramatsu 
Cc: Daniel Walker 
Cc: Xunlei Pang 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Borislav Petkov 
Cc: David Vrabel 
Cc: Toshi Kani 
Cc: Ralf Baechle 
Cc: David Daney 
Cc: Aaro Koskinen 
Cc: "Steven J. Hill" 
Signed-off-by: Andrew Morton 

> From: Corey Minyard [mailto:cminy...@mvista.com]
> Sent: Friday, August 19, 2016 6:18 AM
> Sorry this took so long, but I have finally tested this, it seems to
> work fine:
> 
> Tested-by: Corey Minyard 
> Reviewed-by: Corey Minyard 
> 
> On 08/10/2016 03:09 AM, Hidehiro Kawai wrote:
> > Daniel Walker reported problems which happens when
> > crash_kexec_post_notifiers kernel option is enabled
> > (https://lkml.org/lkml/2015/6/24/44).
> >
> > In that case, smp_send_stop() is called before entering kdump routines
> > which assume other CPUs are still online.  As the result, kdump
> > routines fail to save other CPUs' registers.  Additionally for MIPS
> > OCTEON, it misses to stop the watchdog timer.
> >
> > To fix this problem, call a new kdump friendly function,
> > crash_smp_send_stop(), instead of the smp_send_stop() when
> > crash_kexec_post_notifiers is enabled.  crash_smp_send_stop() is a
> > weak function, and it just call smp_send_stop().  Architecture
> > codes should override it so that kdump can work appropriately.
> > This patch provides MIPS version.
> >
> > Reported-by: Daniel Walker 
> > Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" 
> > option)
> > Signed-off-by: Hidehiro Kawai 
> > Cc: Ralf Baechle 
> > Cc: David Daney 
> > Cc: Aaro Koskinen 
> > Cc: "Steven J. Hill" 
> > Cc: Corey Minyard 
> >
> > ---
> > I'm not familiar with MIPS, and I don't have a test environment and
> > just did build tests only.  Please don't apply this patch until
> > someone does enough tests, otherwise simply drop this patch.
> > ---
> >   arch/mips/cavium-octeon/setup.c  |   14 ++
> >   arch/mips/include/asm/kexec.h|1 +
> >   arch/mips/kernel/crash.c |   18 +-
> >   arch/mips/kernel/machine_kexec.c |1 +
> >   4 files changed, 33 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/mips/cavium-octeon/setup.c 
> > b/arch/mips/cavium-octeon/setup.c
> > index cb16fcc..5537f95 100644
> > --- a/arch/mips/cavium-octeon/setup.c
> > +++ b/arch/mips/cavium-octeon/setup.c
> > @@ -267,6 +267,17 @@ static void 

RE: Re: [V4 PATCH 1/2] x86/panic: Replace smp_send_stop() with kdump friendly version in panic path

2016-09-20 Thread / KAWAIHIDEHIRO
Hi Xunlei,

> From: Xunlei Pang [mailto:xp...@redhat.com]
> Sent: Tuesday, September 20, 2016 4:40 PM
> On 2016/08/15/ at 19:22, Hidehiro Kawai wrote:
> > Hi Dave,
> >
> > Thank you for the review.
> >
> >> From: Dave Young [mailto:dyo...@redhat.com]
> >> Sent: Friday, August 12, 2016 12:17 PM
> >>
> >> Thanks for the update.
> >> On 08/10/16 at 05:09pm, Hidehiro Kawai wrote:
> >>> Daniel Walker reported problems which happens when
> >>> crash_kexec_post_notifiers kernel option is enabled
> >>> (https://lkml.org/lkml/2015/6/24/44).
> >>>
> >>> In that case, smp_send_stop() is called before entering kdump routines
> >>> which assume other CPUs are still online.  As the result, for x86,
> >>> kdump routines fail to save other CPUs' registers  and disable
> >>> virtualization extensions.
> >> Seems you simplified the changelog, but I think a little more details
> >> will be helpful to understand the patch. You know sometimes lkml.org
> >> does not work well.
> > So, I'll try another archives when I post patch set next time.
> 
> Hi Hidehiro Kawai,
> 
> What's the status of this patch set, are you going to send an updated version?

Sorry, I misunderstood what Dave said, and I thought I don't
need to revise the patch set.

Currently, this patch set is in -mm, then -next tree.
What I need to fix is only commit descriptions, so I'm going to
post revised descriptions as replies to this thread, and then
request to Andrew to replace them if there is no objection.
(or should I resend the whole patch set?)

Regards,
Hidehiro Kawai

> >>> To fix this problem, call a new kdump friendly function,
> >>> crash_smp_send_stop(), instead of the smp_send_stop() when
> >>> crash_kexec_post_notifiers is enabled.  crash_smp_send_stop() is a
> >>> weak function, and it just call smp_send_stop().  Architecture
> >>> codes should override it so that kdump can work appropriately.
> >>> This patch only provides x86-specific version.
> >>>
> >>> For Xen's PV kernel, just keep the current behavior.
> >> Could you explain a bit about above Xen PV kernel behavior?
> >>
> >> BTW, this version looks better,  I think I'm fine with this version
> >> besides of the questions about changelog.
> > As for Dom0 kernel, it doesn't use crash_kexec routines, and
> > it relies on panic notifier chain.  At the end of the chain,
> > xen_panic_event is called, and it issues a hypercall which
> > requests Hypervisor to execute kdump.  This means whether
> > crash_kexec_panic_notifiers is set or not, panic notifiers
> > are called after smp_send_stop.  Even if we save registers
> > in Dom0 kernel, they seem to be ignored (Hypervisor is responsible
> > for that).  This is why I kept the current behavior for Xen.
> >
> > For PV DomU kernel, kdump is not supported.  For PV HVM
> > DomU, I'm not sure what will happen on panic because I
> > couldn't boot PV HVM DomU and test it.  But I think it will
> > work similarly to baremetal kernels with extra cleanups
> > for Hypervisor.
> >
> > Best regards,
> >
> > Hidehiro Kawai
> >
> >>> Changes in V4:
> >>> - Keep to use smp_send_stop if crash_kexec_post_notifiers is not set
> >>> - Rename panic_smp_send_stop to crash_smp_send_stop
> >>> - Don't change the behavior for Xen's PV kernel
> >>>
> >>> Changes in V3:
> >>> - Revise comments, description, and symbol names
> >>>
> >>> Changes in V2:
> >>> - Replace smp_send_stop() call with crash_kexec version which
> >>>   saves cpu states and cleans up VMX/SVM
> >>> - Drop a fix for Problem 1 at this moment
> >>>
> >>> Reported-by: Daniel Walker 
> >>> Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" 
> >>> option)
> >>> Signed-off-by: Hidehiro Kawai 
> >>> Cc: Dave Young 
> >>> Cc: Baoquan He 
> >>> Cc: Vivek Goyal 
> >>> Cc: Eric Biederman 
> >>> Cc: Masami Hiramatsu 
> >>> Cc: Daniel Walker 
> >>> Cc: Xunlei Pang 
> >>> Cc: Thomas Gleixner 
> >>> Cc: Ingo Molnar 
> >>> Cc: "H. Peter Anvin" 
> >>> Cc: Borislav Petkov 
> >>> Cc: David Vrabel 
> >>> Cc: Toshi Kani 
> >>> Cc: Andrew Morton 
> >>> ---
> >>>  arch/x86/include/asm/kexec.h |1 +
> >>>  arch/x86/include/asm/smp.h   |1 +
> >>>  arch/x86/kernel/crash.c  |   22 +---
> >>>  arch/x86/kernel/smp.c|5 
> >>>  kernel/panic.c   |   47 
> >>> --
> >>>  5 files changed, 66 insertions(+), 10 deletions(-)
> >>>
> >>> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
> >>> index d2434c1..282630e 100644
> >>> --- a/arch/x86/include/asm/kexec.h
> >>> +++ b/arch/x86/include/asm/kexec.h
> >>> @@ -210,6 +210,7 @@ struct kexec_entry64_regs {
> >>>
> >>>  typedef void 

RE: Re: [V4 PATCH 1/2] x86/panic: Replace smp_send_stop() with kdump friendly version in panic path

2016-08-15 Thread / KAWAIHIDEHIRO
Hi Dave,

Thank you for the review.

> From: Dave Young [mailto:dyo...@redhat.com]
> Sent: Friday, August 12, 2016 12:17 PM
> 
> Thanks for the update.
> On 08/10/16 at 05:09pm, Hidehiro Kawai wrote:
> > Daniel Walker reported problems which happens when
> > crash_kexec_post_notifiers kernel option is enabled
> > (https://lkml.org/lkml/2015/6/24/44).
> >
> > In that case, smp_send_stop() is called before entering kdump routines
> > which assume other CPUs are still online.  As the result, for x86,
> > kdump routines fail to save other CPUs' registers  and disable
> > virtualization extensions.
> 
> Seems you simplified the changelog, but I think a little more details
> will be helpful to understand the patch. You know sometimes lkml.org
> does not work well.

So, I'll try another archives when I post patch set next time.

> > To fix this problem, call a new kdump friendly function,
> > crash_smp_send_stop(), instead of the smp_send_stop() when
> > crash_kexec_post_notifiers is enabled.  crash_smp_send_stop() is a
> > weak function, and it just call smp_send_stop().  Architecture
> > codes should override it so that kdump can work appropriately.
> > This patch only provides x86-specific version.
> >
> > For Xen's PV kernel, just keep the current behavior.
> 
> Could you explain a bit about above Xen PV kernel behavior?
> 
> BTW, this version looks better,  I think I'm fine with this version
> besides of the questions about changelog.

As for Dom0 kernel, it doesn't use crash_kexec routines, and
it relies on panic notifier chain.  At the end of the chain,
xen_panic_event is called, and it issues a hypercall which
requests Hypervisor to execute kdump.  This means whether
crash_kexec_panic_notifiers is set or not, panic notifiers
are called after smp_send_stop.  Even if we save registers
in Dom0 kernel, they seem to be ignored (Hypervisor is responsible
for that).  This is why I kept the current behavior for Xen.

For PV DomU kernel, kdump is not supported.  For PV HVM
DomU, I'm not sure what will happen on panic because I
couldn't boot PV HVM DomU and test it.  But I think it will
work similarly to baremetal kernels with extra cleanups
for Hypervisor.

Best regards,

Hidehiro Kawai

> > Changes in V4:
> > - Keep to use smp_send_stop if crash_kexec_post_notifiers is not set
> > - Rename panic_smp_send_stop to crash_smp_send_stop
> > - Don't change the behavior for Xen's PV kernel
> >
> > Changes in V3:
> > - Revise comments, description, and symbol names
> >
> > Changes in V2:
> > - Replace smp_send_stop() call with crash_kexec version which
> >   saves cpu states and cleans up VMX/SVM
> > - Drop a fix for Problem 1 at this moment
> >
> > Reported-by: Daniel Walker 
> > Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" 
> > option)
> > Signed-off-by: Hidehiro Kawai 
> > Cc: Dave Young 
> > Cc: Baoquan He 
> > Cc: Vivek Goyal 
> > Cc: Eric Biederman 
> > Cc: Masami Hiramatsu 
> > Cc: Daniel Walker 
> > Cc: Xunlei Pang 
> > Cc: Thomas Gleixner 
> > Cc: Ingo Molnar 
> > Cc: "H. Peter Anvin" 
> > Cc: Borislav Petkov 
> > Cc: David Vrabel 
> > Cc: Toshi Kani 
> > Cc: Andrew Morton 
> > ---
> >  arch/x86/include/asm/kexec.h |1 +
> >  arch/x86/include/asm/smp.h   |1 +
> >  arch/x86/kernel/crash.c  |   22 +---
> >  arch/x86/kernel/smp.c|5 
> >  kernel/panic.c   |   47 
> > --
> >  5 files changed, 66 insertions(+), 10 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
> > index d2434c1..282630e 100644
> > --- a/arch/x86/include/asm/kexec.h
> > +++ b/arch/x86/include/asm/kexec.h
> > @@ -210,6 +210,7 @@ struct kexec_entry64_regs {
> >
> >  typedef void crash_vmclear_fn(void);
> >  extern crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss;
> > +extern void kdump_nmi_shootdown_cpus(void);
> >
> >  #endif /* __ASSEMBLY__ */
> >
> > diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
> > index ebd0c16..f70989c 100644
> > --- a/arch/x86/include/asm/smp.h
> > +++ b/arch/x86/include/asm/smp.h
> > @@ -50,6 +50,7 @@ struct smp_ops {
> > void (*smp_cpus_done)(unsigned max_cpus);
> >
> > void (*stop_other_cpus)(int wait);
> > +   void (*crash_stop_other_cpus)(void);
> > void (*smp_send_reschedule)(int cpu);
> >
> > int (*cpu_up)(unsigned cpu, struct task_struct *tidle);
> > diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> > index 9616cf7..650830e 100644
> > --- a/arch/x86/kernel/crash.c
> > +++ b/arch/x86/kernel/crash.c
> > @@ -133,15 +133,31 @@ static void kdump_nmi_callback(int cpu, struct 
> > pt_regs 

RE: Re: [V4 PATCH 2/2] mips/panic: Replace smp_send_stop() with kdump friendly version in panic path

2016-08-15 Thread / KAWAIHIDEHIRO
Hi Corey,

> From: Corey Minyard [mailto:cminy...@mvista.com]
> Sent: Friday, August 12, 2016 10:56 PM
> I'll try to test this, but I have one comment inline...

Thank you very much!

> On 08/11/2016 10:17 PM, Dave Young wrote:
> > On 08/10/16 at 05:09pm, Hidehiro Kawai wrote:
[snip]
> >> diff --git a/arch/mips/kernel/crash.c b/arch/mips/kernel/crash.c
> >> index 610f0f3..1723b17 100644
> >> --- a/arch/mips/kernel/crash.c
> >> +++ b/arch/mips/kernel/crash.c
> >> @@ -47,9 +47,14 @@ static void crash_shutdown_secondary(void *passed_regs)
> >>
> >>   static void crash_kexec_prepare_cpus(void)
> >>   {
> >> +  static int cpus_stopped;
> >>unsigned int msecs;
> >> +  unsigned int ncpus;
> >>
> >> -  unsigned int ncpus = num_online_cpus() - 1;/* Excluding the panic cpu */
> >> +  if (cpus_stopped)
> >> +  return;
> 
> Wouldn't you want an atomic operation and some special handling here to
> ensure that only one CPU does this?  So if a CPU comes in here and
> another CPU is already in the process stopping the CPUs it won't result in a
> deadlock.

Because this function can be called only one panicking CPU,
there is no problem.

There are two paths which crash_kexec_prepare_cpus is called.

Path 1 (panic path):
panic()
  crash_smp_send_stop()
crash_kexec_prepare_cpus()

Path 2 (oops path):
crash_kexec()
  __crash_kexec()
machine_crash_shutdown()
  default_machine_crash_shutdown() // for MIPS
crash_kexec_prepare_cpus()

Here, panic() and crash_kexec() run exclusively via
panic_cpu atomic variable.  So we can use cpus_stopped as
normal variable.

Best regards,

Hidehiro Kawai

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: Re: [V3 PATCH 1/2] x86/panic: Replace smp_send_stop() with kdump friendly version

2016-07-19 Thread / KAWAIHIDEHIRO
> From: 'Dave Young' [mailto:dyo...@redhat.com]
> Sent: Tuesday, July 19, 2016 3:52 PM
> Hi,
> On 07/19/16 at 05:51am, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > Hi,
> >
> > > From: 'Dave Young' [mailto:dyo...@redhat.com]
> > > Sent: Monday, July 18, 2016 6:02 PM
> > > On 07/15/16 at 11:50am, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > > Hi Dave,
> > > >
> > > > Thanks for your reply.
> > > >
> > > > > From: 'Dave Young' [mailto:dyo...@redhat.com]
> > > > > Sent: Wednesday, July 13, 2016 11:04 AM
> > > > >
> > > > > On 07/12/16 at 02:49am, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > > > > Hi Dave,
> > > > > >
> > > > > > Thanks for the comments.
> > > > > >
> > > > > > > From: Dave Young [mailto:dyo...@redhat.com]
> > > > > > > Sent: Monday, July 11, 2016 5:35 PM
> > > > > > >
> > > > > > > On 07/05/16 at 08:33pm, Hidehiro Kawai wrote:
> > [snip]
> > > > > > > As for this patch I'm not sure it is safe to replace the
> > > > > > > smp_send_stop with the kdump friendly function. I'm also not sure 
> > > > > > > if
> > > > > > > the kdump friendly function is safe for kdump. Will glad to hear
> > > > > > > opinions from other arch experts.
> > > > > >
> > > > > > This stuff depends on architectures, so I speak only about
> > > > > > x86 (the logic doesn't change on other architectures at this time).
> > > > > >
> > > > > > kdump path with crash_kexec_post_notifiers disabled:
> > > > > >  panic()
> > > > > >__crash_kexec()
> > > > > >  crash_setup_regs()
> > > > > >  crash_save_vmcoreinfo()
> > > > > >  machine_crash_shutdown()
> > > > > >native_machine_crash_shutdown()
> > > > > >  panic_smp_send_stop() /* mostly same as original
> > > > > > * kdump_nmi_shootdown_cpus()
> > > > > > */
> > > > > >
> > > > > > kdump path with crash_kexec_post_notifiers enabled:
> > > > > >  panic()
> > > > > >panic_smp_send_stop()
> > > > > >__crash_kexec()
> > > > > >  crash_setup_regs()
> > > > > >  crash_save_vmcoreinfo()
> > > > > >  machine_crash_shutdown()
> > > > > >native_machine_crash_shutdown()
> > > > > >  panic_smp_send_stop() // do nothing
> > > > > >
> > > > > > The difference is that stopping other CPUs before crash_setup_regs()
> > > > > > and crash_save_vmcoreinfo() or not.  Since crash_setup_regs() and
> > > > > > crash_save_vmcoreinfo() just save information to some memory area,
> > > > > > they wouldn't be affected by panic_smp_send_stop().  This means
> > > > > > placing panic_smp_send_stop before __crash_kexec is safe.
> > > > > >
> > > > > > BTW, I noticed my patch breaks Xen kernel.  I'll fix it in the next
> > > > > > version.
> > > > >
> > > > > But it does breaks stuff which depends on cpu not being disabled like 
> > > > > problem 1 you mentioned in patch log.
> > > >
> > > > As I mentioned in the description of this patch, we should stop
> > > > other CPUs ASAP to preserve current state either
> > > > crash_kexec_post_notifiers is enabled or not.
> > > > Then, all remaining procedures should work well
> > > > after stopping other CPUs (but keep the CPU map online).
> > > >
> > > > Vivek also mentioned similar things:
> > > > https://lkml.org/lkml/2015/7/14/433
> > >
> > > The implementation in this patchset is different from suggestion in above 
> > > link?
> > >
> > > I think Vivek's suggestion is a good idea, to drop smp_send_stop and do 
> > > below:
> > >
> > > stop_cpus_save_register_state;
> > >
> > > if (!crash_kexec_post_notifiers)
> > >   crash_kexec()
> > > atomic_notifier_call_chain()
> > > kmsg_dump()
> > >
> > > I'm just commenting from code flow point of view, the detail 
> > > 

RE: Re: [V3 PATCH 1/2] x86/panic: Replace smp_send_stop() with kdump friendly version

2016-07-19 Thread / KAWAIHIDEHIRO
Hi,

> From: 'Dave Young' [mailto:dyo...@redhat.com]
> Sent: Monday, July 18, 2016 6:02 PM
> On 07/15/16 at 11:50am, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > Hi Dave,
> >
> > Thanks for your reply.
> >
> > > From: 'Dave Young' [mailto:dyo...@redhat.com]
> > > Sent: Wednesday, July 13, 2016 11:04 AM
> > >
> > > On 07/12/16 at 02:49am, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > > Hi Dave,
> > > >
> > > > Thanks for the comments.
> > > >
> > > > > From: Dave Young [mailto:dyo...@redhat.com]
> > > > > Sent: Monday, July 11, 2016 5:35 PM
> > > > >
> > > > > On 07/05/16 at 08:33pm, Hidehiro Kawai wrote:
[snip] 
> > > > > As for this patch I'm not sure it is safe to replace the
> > > > > smp_send_stop with the kdump friendly function. I'm also not sure if
> > > > > the kdump friendly function is safe for kdump. Will glad to hear
> > > > > opinions from other arch experts.
> > > >
> > > > This stuff depends on architectures, so I speak only about
> > > > x86 (the logic doesn't change on other architectures at this time).
> > > >
> > > > kdump path with crash_kexec_post_notifiers disabled:
> > > >  panic()
> > > >__crash_kexec()
> > > >  crash_setup_regs()
> > > >  crash_save_vmcoreinfo()
> > > >  machine_crash_shutdown()
> > > >native_machine_crash_shutdown()
> > > >  panic_smp_send_stop() /* mostly same as original
> > > > * kdump_nmi_shootdown_cpus()
> > > > */
> > > >
> > > > kdump path with crash_kexec_post_notifiers enabled:
> > > >  panic()
> > > >panic_smp_send_stop()
> > > >__crash_kexec()
> > > >  crash_setup_regs()
> > > >  crash_save_vmcoreinfo()
> > > >  machine_crash_shutdown()
> > > >native_machine_crash_shutdown()
> > > >  panic_smp_send_stop() // do nothing
> > > >
> > > > The difference is that stopping other CPUs before crash_setup_regs()
> > > > and crash_save_vmcoreinfo() or not.  Since crash_setup_regs() and
> > > > crash_save_vmcoreinfo() just save information to some memory area,
> > > > they wouldn't be affected by panic_smp_send_stop().  This means
> > > > placing panic_smp_send_stop before __crash_kexec is safe.
> > > >
> > > > BTW, I noticed my patch breaks Xen kernel.  I'll fix it in the next
> > > > version.
> > >
> > > But it does breaks stuff which depends on cpu not being disabled like 
> > > problem 1 you mentioned in patch log.
> >
> > As I mentioned in the description of this patch, we should stop
> > other CPUs ASAP to preserve current state either
> > crash_kexec_post_notifiers is enabled or not.
> > Then, all remaining procedures should work well
> > after stopping other CPUs (but keep the CPU map online).
> >
> > Vivek also mentioned similar things:
> > https://lkml.org/lkml/2015/7/14/433
> 
> The implementation in this patchset is different from suggestion in above 
> link?
> 
> I think Vivek's suggestion is a good idea, to drop smp_send_stop and do below:
> 
> stop_cpus_save_register_state;
> 
> if (!crash_kexec_post_notifiers)
>   crash_kexec()
> atomic_notifier_call_chain()
> kmsg_dump()
> 
> I'm just commenting from code flow point of view, the detail implementation
> definitely need more comments from Arch experts.
> 
> Any reason did not move the kdump friendly function to earlier point like
> before previous __crash_kexec() below?
> if (!crash_kexec_post_notifiers) {
> printk_nmi_flush_on_panic();
> __crash_kexec(NULL);
> }

The reason why the implementation differs from Vivek's is to keep
the current code flow if crash_kexec_post_notifiers is not specified.

If we apply Vivek's or your suggestion, it may always cause kdump
to fail on MIPS OCTEON due to Problem 1.  I don't want to make things
any worse.  I may post a patch for MIPS OCTEON, but I can't test it.
For other architectures, I'm not sure what problems there are.
So at first, I want to fix the case where crash_kexec_post_notifiers is
specified on x86.  Then, if all other architectures support
`stop other CPUs before crash_kexec', switch to your or Vivek's
suggesting code.

Is this acceptable?

Best regards,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: Re: [V3 PATCH 1/2] x86/panic: Replace smp_send_stop() with kdump friendly version

2016-07-15 Thread / KAWAIHIDEHIRO
Hi Dave,

Thanks for your reply.

> From: 'Dave Young' [mailto:dyo...@redhat.com]
> Sent: Wednesday, July 13, 2016 11:04 AM
> 
> On 07/12/16 at 02:49am, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > Hi Dave,
> >
> > Thanks for the comments.
> >
> > > From: Dave Young [mailto:dyo...@redhat.com]
> > > Sent: Monday, July 11, 2016 5:35 PM
> > >
> > > On 07/05/16 at 08:33pm, Hidehiro Kawai wrote:
> > > > This patch fixes one of the problems reported by Daniel Walker
> > > > (https://lkml.org/lkml/2015/6/24/44).
> > > >
> > > > If crash_kexec_post_notifiers boot option is specified, other CPUs
> > > > are stopped by smp_send_stop() instead of machine_crash_shutdown()
> > > > in crash_kexec() path.  This behavior change leads two problems.
> > > >
> > > >  Problem 1:
> > > >  octeon_generic_shutdown() for MIPS OCTEON assumes that other CPUs
> > > > are  still online and try to stop their watchdog timer.  If
> > > >  smp_send_stop() is called before octeon_generic_shutdown(),
> > > > stopping  watchdog timer will fail because other CPUs have been
> > > > offlined by  smp_send_stop().
> > > >
> > > >panic()
> > > >  if crash_kexec_post_notifiers == 1
> > > >smp_send_stop()
> > > >atomic_notifier_call_chain()
> > > >kmsg_dump()
> > > >  crash_kexec()
> > > >machine_crash_shutdown()
> > > >  octeon_generic_shutdown() // shutdown watchdog for ONLINE
> > > > CPUs
> > > >
> > > >  Problem 2:
> > > >  Most of architectures stop other CPUs in machine_crash_shutdown()
> > > > path, and they also do something needed for kdump.  For example,
> > > > they save registers, disable virtualization extensions, and so on.
> > > >  However, if smp_send_stop() stops other CPUs before
> > > > machine_crash_shutdown(), we miss those operations.
> > > >
> > > > How do we fix these problems?  In the first place, we should stop
> > > > other CPUs as soon as possible when panic() was called, otherwise
> > > > other CPUs may wipe out a clue to the cause of the failure.  So,
> > > > we replace smp_send_stop() with more suitable one for kdump.
> > >
> > > We have been avoiding extra things in panic path, but unfortunately
> > > crash_kexec_post_notifiers were added. I tend to agree the best
> > > place for this stuff is in 2nd kernel or purgatory instead of in 1st 
> > > kernel.
> >
> > Several months ago, I posted a patch set which writes regs to SEL,
> > generate an event to send SNMP message, and start/stop BMC's watchdog
> > timer in purgatory.  This feature requires BMC with KCS (Keyboard
> > Controller Style) I/F, but the most of enterprise grade server would have 
> > it.
> > (http://thread.gmane.org/gmane.linux.kernel.kexec/15382)
> >
> > Doing kmsg_dump things in purgatory wouldn't be suitable (should be
> > done in the 2nd kernel before enabling devices and IRQs?)
> 
> In theory it is doable maybe do something like oldmem_kmsg_dump while 
> /proc/vmcore initializing?

Hmm, I checked the case of using ACPI ERST as a persistent
storage. The feature is initialized by device_initcall():

device_initcall
  erst_init
pstore_register

And vmcore is initialized by fs_initcall() which is called
before device_initcall().  We may be able to change the sequence,
but anyway, these are done in later phase of the kernel initialization.
So, it may get less reliable although it would be doable.

> > > As for this patch I'm not sure it is safe to replace the
> > > smp_send_stop with the kdump friendly function. I'm also not sure if
> > > the kdump friendly function is safe for kdump. Will glad to hear
> > > opinions from other arch experts.
> >
> > This stuff depends on architectures, so I speak only about
> > x86 (the logic doesn't change on other architectures at this time).
> >
> > kdump path with crash_kexec_post_notifiers disabled:
> >  panic()
> >__crash_kexec()
> >  crash_setup_regs()
> >  crash_save_vmcoreinfo()
> >  machine_crash_shutdown()
> >native_machine_crash_shutdown()
> >  panic_smp_send_stop() /* mostly same as original
> > * kdump_nmi_shootdown_cpus()
> > */
> >
> > kdump path with crash_kexec_post_notifiers enabled:
> >  panic()
> >panic_smp_send_stop(

RE: Re: [V3 PATCH 1/2] x86/panic: Replace smp_send_stop() with kdump friendly version

2016-07-12 Thread / KAWAIHIDEHIRO
> From: linux-kernel-ow...@vger.kernel.org 
> [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Xunlei Pang
> Sent: Tuesday, July 12, 2016 3:57 PM
> On 2016/07/12 at 11:56, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > Hi Xunlei,
> >
> > Thanks for the review.
> >
> >> From: Xunlei Pang [mailto:xp...@redhat.com]
> >> Sent: Tuesday, July 12, 2016 12:12 PM
> >> On 2016/07/05 at 19:33, Hidehiro Kawai wrote:
> >>> This patch fixes one of the problems reported by Daniel Walker
> >>> (https://lkml.org/lkml/2015/6/24/44).
> >>>
> >>> If crash_kexec_post_notifiers boot option is specified, other CPUs
> >>> are stopped by smp_send_stop() instead of machine_crash_shutdown()
> >>> in crash_kexec() path.  This behavior change leads two problems.
> >>>
> >>>  Problem 1:
> >>>  octeon_generic_shutdown() for MIPS OCTEON assumes that other CPUs are
> >>>  still online and try to stop their watchdog timer.  If
> >>>  smp_send_stop() is called before octeon_generic_shutdown(), stopping
> >>>  watchdog timer will fail because other CPUs have been offlined by
> >>>  smp_send_stop().
> >>>
> >>>panic()
> >>>  if crash_kexec_post_notifiers == 1
> >>>smp_send_stop()
> >>>atomic_notifier_call_chain()
> >>>kmsg_dump()
> >>>  crash_kexec()
> >>>machine_crash_shutdown()
> >>>  octeon_generic_shutdown() // shutdown watchdog for ONLINE CPUs
> >>>
> >>>  Problem 2:
> >>>  Most of architectures stop other CPUs in machine_crash_shutdown()
> >>>  path, and they also do something needed for kdump.  For example,
> >>>  they save registers, disable virtualization extensions, and so on.
> >>>  However, if smp_send_stop() stops other CPUs before
> >>>  machine_crash_shutdown(), we miss those operations.
> >>>
> >>> How do we fix these problems?  In the first place, we should stop
> >>> other CPUs as soon as possible when panic() was called, otherwise
> >>> other CPUs may wipe out a clue to the cause of the failure.  So, we
> >>> replace smp_send_stop() with more suitable one for kdump.
> >>>
> >>> This patch solves Problem 2 by replacing smp_send_stop() in panic()
> >>> with panic_smp_send_stop().  This is a weak function which calls
> >>> smp_send_stop(), and architecture dependent code may override this
> >>> with appropriate one.  This patch only provides x86-specific version.
> >>>
> >>> Changes in V3:
> >>> - Revise comments, description, and symbol names
> >>>
> >>> Changes in V2:
> >>> - Replace smp_send_stop() call with crash_kexec version which
> >>>   saves cpu states and cleans up VMX/SVM
> >>> - Drop a fix for Problem 1 at this moment
> >>>
> >>> Reported-by: Daniel Walker <dwal...@fifo99.com>
> >>> Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" 
> >>> option)
> >>> Signed-off-by: Hidehiro Kawai <hidehiro.kawai...@hitachi.com>
> >>> Cc: Dave Young <dyo...@redhat.com>
> >>> Cc: Baoquan He <b...@redhat.com>
> >>> Cc: Vivek Goyal <vgo...@redhat.com>
> >>> Cc: Eric Biederman <ebied...@xmission.com>
> >>> Cc: Masami Hiramatsu <mhira...@kernel.org>
> >>> Cc: Thomas Gleixner <t...@linutronix.de>
> >>> Cc: Ingo Molnar <mi...@redhat.com>
> >>> Cc: "H. Peter Anvin" <h...@zytor.com>
> >>> Cc: Borislav Petkov <b...@suse.de>
> >>> Cc: Toshi Kani <toshi.k...@hpe.com>
> >>> Cc: "Peter Zijlstra (Intel)" <pet...@infradead.org>
> >>> Cc: Takao Indoh <indou.ta...@jp.fujitsu.com>
> >>> Cc: "Lee, Chun-Yi" <joeyli.ker...@gmail.com>
> >>> Cc: Minfei Huang <mnfhu...@gmail.com>
> >>> Cc: Andrew Morton <a...@linux-foundation.org>
> >>> Cc: Michal Hocko <mho...@suse.com>
> >>> Cc: Vitaly Kuznetsov <vkuzn...@redhat.com>
> >>> Cc: Petr Mladek <pmla...@suse.com>
> >>> Cc: Tejun Heo <t...@kernel.org>
> >>> Cc: Josh Poimboeuf <jpoim...@redhat.com>
> >>> ---
> >>>  arch/x86/kernel/crash.c |   14 ++
> >>>  kernel/panic.c  |   43 
> >>> 

RE: Re: [V3 PATCH 1/2] x86/panic: Replace smp_send_stop() with kdump friendly version

2016-07-12 Thread / KAWAIHIDEHIRO
Hi Xunlei,

Thanks for the review.

> From: Xunlei Pang [mailto:xp...@redhat.com]
> Sent: Tuesday, July 12, 2016 12:12 PM
> On 2016/07/05 at 19:33, Hidehiro Kawai wrote:
> > This patch fixes one of the problems reported by Daniel Walker
> > (https://lkml.org/lkml/2015/6/24/44).
> >
> > If crash_kexec_post_notifiers boot option is specified, other CPUs
> > are stopped by smp_send_stop() instead of machine_crash_shutdown()
> > in crash_kexec() path.  This behavior change leads two problems.
> >
> >  Problem 1:
> >  octeon_generic_shutdown() for MIPS OCTEON assumes that other CPUs are
> >  still online and try to stop their watchdog timer.  If
> >  smp_send_stop() is called before octeon_generic_shutdown(), stopping
> >  watchdog timer will fail because other CPUs have been offlined by
> >  smp_send_stop().
> >
> >panic()
> >  if crash_kexec_post_notifiers == 1
> >smp_send_stop()
> >atomic_notifier_call_chain()
> >kmsg_dump()
> >  crash_kexec()
> >machine_crash_shutdown()
> >  octeon_generic_shutdown() // shutdown watchdog for ONLINE CPUs
> >
> >  Problem 2:
> >  Most of architectures stop other CPUs in machine_crash_shutdown()
> >  path, and they also do something needed for kdump.  For example,
> >  they save registers, disable virtualization extensions, and so on.
> >  However, if smp_send_stop() stops other CPUs before
> >  machine_crash_shutdown(), we miss those operations.
> >
> > How do we fix these problems?  In the first place, we should stop
> > other CPUs as soon as possible when panic() was called, otherwise
> > other CPUs may wipe out a clue to the cause of the failure.  So, we
> > replace smp_send_stop() with more suitable one for kdump.
> >
> > This patch solves Problem 2 by replacing smp_send_stop() in panic()
> > with panic_smp_send_stop().  This is a weak function which calls
> > smp_send_stop(), and architecture dependent code may override this
> > with appropriate one.  This patch only provides x86-specific version.
> >
> > Changes in V3:
> > - Revise comments, description, and symbol names
> >
> > Changes in V2:
> > - Replace smp_send_stop() call with crash_kexec version which
> >   saves cpu states and cleans up VMX/SVM
> > - Drop a fix for Problem 1 at this moment
> >
> > Reported-by: Daniel Walker 
> > Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" 
> > option)
> > Signed-off-by: Hidehiro Kawai 
> > Cc: Dave Young 
> > Cc: Baoquan He 
> > Cc: Vivek Goyal 
> > Cc: Eric Biederman 
> > Cc: Masami Hiramatsu 
> > Cc: Thomas Gleixner 
> > Cc: Ingo Molnar 
> > Cc: "H. Peter Anvin" 
> > Cc: Borislav Petkov 
> > Cc: Toshi Kani 
> > Cc: "Peter Zijlstra (Intel)" 
> > Cc: Takao Indoh 
> > Cc: "Lee, Chun-Yi" 
> > Cc: Minfei Huang 
> > Cc: Andrew Morton 
> > Cc: Michal Hocko 
> > Cc: Vitaly Kuznetsov 
> > Cc: Petr Mladek 
> > Cc: Tejun Heo 
> > Cc: Josh Poimboeuf 
> > ---
> >  arch/x86/kernel/crash.c |   14 ++
> >  kernel/panic.c  |   43 ---
> >  2 files changed, 42 insertions(+), 15 deletions(-)
> >
> > diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> > index 9ef978d..3305433 100644
> > --- a/arch/x86/kernel/crash.c
> > +++ b/arch/x86/kernel/crash.c
> > @@ -133,15 +133,21 @@ static void kdump_nmi_callback(int cpu, struct 
> > pt_regs *regs)
> > disable_local_APIC();
> >  }
> >
> > -static void kdump_nmi_shootdown_cpus(void)
> > +/* Override the weak function in kernel/panic.c */
> > +void panic_smp_send_stop(void)
> >  {
> > -   nmi_shootdown_cpus(kdump_nmi_callback);
> > +   static int cpus_stopped;
> 
> Should be atomic_t type?

panic_smp_send_stop() can be called by only one panicking CPU
(but can be called twice). It is sufficient to be normal variable.

> > +
> > +   if (cpus_stopped)
> > +   return;
> >
> > +   nmi_shootdown_cpus(kdump_nmi_callback);
> > disable_local_APIC();
> > +   cpus_stopped = 1;
> >  }
> >
> >  #else
> > -static void kdump_nmi_shootdown_cpus(void)
> > +void panic_smp_send_stop(void)
> >  {
> > /* There are no cpus to shootdown */
> >  }
> > @@ -160,7 +166,7 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
> > /* The kernel is broken so disable interrupts */
> > local_irq_disable();
> >
> > -   kdump_nmi_shootdown_cpus();
> > +   panic_smp_send_stop();
> >
> > /*
> >  * VMCLEAR VMCSs loaded on this cpu if needed.
> > diff --git a/kernel/panic.c b/kernel/panic.c
> > index 8aa7449..da8062d2 100644
> > --- a/kernel/panic.c
> > +++ 

RE: Re: [V3 PATCH 1/2] x86/panic: Replace smp_send_stop() with kdump friendly version

2016-07-12 Thread / KAWAIHIDEHIRO
Hi Dave,

Thanks for the comments.

> From: Dave Young [mailto:dyo...@redhat.com]
> Sent: Monday, July 11, 2016 5:35 PM
> 
> On 07/05/16 at 08:33pm, Hidehiro Kawai wrote:
> > This patch fixes one of the problems reported by Daniel Walker
> > (https://lkml.org/lkml/2015/6/24/44).
> >
> > If crash_kexec_post_notifiers boot option is specified, other CPUs
> > are stopped by smp_send_stop() instead of machine_crash_shutdown()
> > in crash_kexec() path.  This behavior change leads two problems.
> >
> >  Problem 1:
> >  octeon_generic_shutdown() for MIPS OCTEON assumes that other CPUs are
> >  still online and try to stop their watchdog timer.  If
> >  smp_send_stop() is called before octeon_generic_shutdown(), stopping
> >  watchdog timer will fail because other CPUs have been offlined by
> >  smp_send_stop().
> >
> >panic()
> >  if crash_kexec_post_notifiers == 1
> >smp_send_stop()
> >atomic_notifier_call_chain()
> >kmsg_dump()
> >  crash_kexec()
> >machine_crash_shutdown()
> >  octeon_generic_shutdown() // shutdown watchdog for ONLINE CPUs
> >
> >  Problem 2:
> >  Most of architectures stop other CPUs in machine_crash_shutdown()
> >  path, and they also do something needed for kdump.  For example,
> >  they save registers, disable virtualization extensions, and so on.
> >  However, if smp_send_stop() stops other CPUs before
> >  machine_crash_shutdown(), we miss those operations.
> >
> > How do we fix these problems?  In the first place, we should stop
> > other CPUs as soon as possible when panic() was called, otherwise
> > other CPUs may wipe out a clue to the cause of the failure.  So, we
> > replace smp_send_stop() with more suitable one for kdump.
> 
> We have been avoiding extra things in panic path, but unfortunately
> crash_kexec_post_notifiers were added. I tend to agree the best place
> for this stuff is in 2nd kernel or purgatory instead of in 1st kernel.

Several months ago, I posted a patch set which writes regs to SEL, generate
an event to send SNMP message, and start/stop BMC's watchdog timer in
purgatory.  This feature requires BMC with KCS (Keyboard Controller Style)
I/F, but the most of enterprise grade server would have it.
(http://thread.gmane.org/gmane.linux.kernel.kexec/15382)

Doing kmsg_dump things in purgatory wouldn't be suitable (should be done
in the 2nd kernel before enabling devices and IRQs?)
 
> As for this patch I'm not sure it is safe to replace the smp_send_stop
> with the kdump friendly function. I'm also not sure if the kdump friendly
> function is safe for kdump. Will glad to hear opinions from other
> arch experts.

This stuff depends on architectures, so I speak only about
x86 (the logic doesn't change on other architectures at this time).

kdump path with crash_kexec_post_notifiers disabled:
 panic()
   __crash_kexec()
 crash_setup_regs()
 crash_save_vmcoreinfo()
 machine_crash_shutdown()
   native_machine_crash_shutdown()
 panic_smp_send_stop() /* mostly same as original 
* kdump_nmi_shootdown_cpus()
*/

kdump path with crash_kexec_post_notifiers enabled:
 panic()
   panic_smp_send_stop()
   __crash_kexec()
 crash_setup_regs()
 crash_save_vmcoreinfo()
 machine_crash_shutdown()
   native_machine_crash_shutdown()
 panic_smp_send_stop() // do nothing

The difference is that stopping other CPUs before crash_setup_regs()
and crash_save_vmcoreinfo() or not.  Since crash_setup_regs() and
crash_save_vmcoreinfo() just save information to some memory area, 
they wouldn't be affected by panic_smp_send_stop().  This means
placing panic_smp_send_stop before __crash_kexec is safe.

BTW, I noticed my patch breaks Xen kernel.  I'll fix it in the
next version.

> BTW, if one want to use crash_kexec_post_notifiers he should take the
> risk of unreliable kdump. How about only call smp_send_stop in case no
> crash_kexec_post_notifiers being used.

Unlike panic_smp_send_stop()/kdump_nmi_shootdown_cpus(), smp_send_stop()
for x86 tries to stop other CPUs with normal IPI before issuing NMI IPI.
This would be because NMI IPI has a risk of deadlock.  We checked if
the kdump path has a risk of deadlock in the case of NMI panic and fixed
it.  But I'm not sure about normal panic path.  I agree with that use
smp_send_stop if crash_kexec_post_notifiers or kdump is disabled.

> > This patch solves Problem 2 by replacing smp_send_stop() in panic()
> > with panic_smp_send_stop().  This is a weak function which calls
> > smp_send_stop(), and architecture dependent code may override this
> > with appropriate one.  This patch only provides x86-specific version.
> 
> It does not fix the Problem 1, it seem not possible to fix it?

Problem 1 depends on architectures, and at least it doesn't happen
on x86.  I can try to fix the Problem 1 for MIPS, but I can't test it.
Possible solution will be to use an smp_send_stop variant which stop
the CPU 

RE: [RFC v2 PATCH 0/7] purgatory: Say last words in kexec on panic case

2016-02-28 Thread / KAWAIHIDEHIRO
Hello,

Are there any comments especially about the direction?
Currently, there are two opinions.

Corey Minyard (IPMI driver maintainer) prefers to do this kind of
works in the 1st kernel because the IPMI driver already supports
various BMC implementations. His comments can be found here with
my RFC v1 patch set:
http://thread.gmane.org/gmane.linux.kernel.kexec/15131

And, preliminary discussion related to this patch set with Eric
Biederman can be found here:
https://lkml.org/lkml/2015/8/4/301

If there is no strong objection, I just complete the missing parts...

Best regards,

Hidehiro Kawai

> From: kexec [mailto:kexec-boun...@lists.infradead.org] On Behalf Of Hidehiro 
> Kawai
> This is the version 2 of RFC patch series previously named
> "purgatory: Add basic support for IPMI command execution."
> 
> Changes since RFC v1:
> - Add --ipmi-kcs-ports option to specify I/O ports for KCS I/F
> - Add --ipmi-handle-panic option to generate an event on BMC to
>   inform and record about the panic
> - Add --output-cpu-ip option to output IP registers to console
>   and BMC's SEL
> ...and various cleanups and improvements
> 
> I performed some Web searches, and I found that major BMCs still
> support KCS I/F.  Although this patch series supports only KCS I/F, it
> should cover many recent servers with BMC.  However, I/O ports used
> for KCS I/F are varies among servers, so I added an option to specify
> port numbers for KCS.
> 
> About error handling for KCS protocol, I re-checked IPMI
> specification, and I found that simply retrying on error is
> sufficient.  So I didn't change the logic for the simplicity.  Please
> see PATCH 3/7 for details.
> 
> 
> Background and purpose
> ==
> 
> If the second kernel for crash dumping hangs up while booting, no
> information related to the first kernel will be saved.  This makes
> crash cause analysis difficult.  Some enterprise users don't permit
> the same faults happen multiple times, so we want to save minimal
> information before booting the second kernel.
> 
> 
> Approaches
> ==
> 
> One of the approaches is (1) to use panic notifier call or kmsg dump
> feature.  For example, a panic notifier callback registered by IPMI
> driver can save the panic message to BMC's SEL before booting the
> second kernel.  Similarly, kmsg dump saves kernel logs to a non-
> volatile memory on the server.  This approach covers multiple
> hardware/firmware implementation.  However, doing many things in
> crashed kernel will reduce the reliability.  Additionally, a part of
> the code is also used for normal operation and still evolving.  This
> would makes it difficult to keep stable.
> 
> Another approach is (2) to save minimal information to BMC's SEL in
> purgatory.  It is difficult to do complicate things in purgatory, but
> fortunately IPMI specification defines a simple interface, KCS
> (Keyboard Controller Style) I/F.  KCS is controlled by two I/O ports
> and supported by most of major BMCs.
> 
> Here, we want more reliable one for the purpose, we adopt (2).
> 
> 
> What are provided?
> ==
> 
> This patch series provides multiple RAS features other than the main
> purpose described above.
> 
> - Timeout mechanism relying on polling RTC
> - API to access BMC via KCS I/F
> - Command line option to start/stop BMC's watchdog timer in purgatory
> - Command line option to write the value of RIP registers to SEL and/or
>   serial console (useful for kernel hang-up cases)
> - Command line option to generate a plantform event on BMC (useful for
>   server monitoring or HA clustering; you can make the BMC issue an
>   SNMP trap)
> - Command line option to change the default I/O ports of KCS I/F
> 
> 
> Limitations of RFC version
> ==
> 
> This patch serires is incomplete, and there are some limitations.
> 
> - Related codes are unconditionally built into the kexec binary
> - Implemented only for x86_64 (it may break the build for other
>   architectures)
> - Timeout value for IPMI operations is hard-coded
> 
> 
> Future plan
> ===
> 
> - Add an option to save the panic message to BMC's SEL (this requires
>   some kernel modifications)
> 
> ---
> 
> Hidehiro Kawai (7):
>   purgatory: Introduce timeout API
>   purgatory/x86: Support CMOS RTC
>   purgatory/ipmi: Support IPMI KCS interface
>   purgatory/ipmi: Support BMC watchdog timer start/stop in purgatory
>   purgatory/ipmi: Add an option for purgatory to handle panic event with 
> IPMI
>   purgatory/x86: Add an option to output IP registers to console in panic 
> case
>   purgatory/ipmi/x86: Support logging of IP registers into SEL
> 
> 
>  kexec/arch/i386/crashdump-x86.c  |   10 -
>  kexec/arch/i386/include/arch/options.h   |4
>  kexec/arch/i386/kexec-x86.h  |1
>  kexec/arch/x86_64/kexec-x86_64.c |   15 +
>  kexec/ipmi.h |9 +
>  kexec/kexec.c 

RE: [RFC PATCH 1/4] purgatory/ipmi: Support BMC watchdog timer start/stop in purgatory

2016-01-21 Thread / KAWAIHIDEHIRO
> A general note here.  It does not appear that you implement the
> error recovery states in your state machine.  If the system fails
> in the middle of doing an IPMI operation, it is likely to fail.

The reason why I din't implement the error handling is that
I think the error rate is low and it may take many seconds (but I
don't have any statistical data, that's my anticipation).

The most important thing is to start booting the 2nd kernel surely
and as soon as possible.  For example, if a user uses a feature
like fence_kdump and if the execution of fence_kdump gets delayed,
the crashed host will be shot down by other host waiting for the
notification from fence_kdump.

Also, to keep the code simple is important for the reliability.

Anyway, I'll rethink whether I can implement the error handling
in simple logic or not.

> If you do this you will need to detect and abort any running
> operation.  Implementing the full state machine is probably the
> best approach, it should handle this, though it is rather complex.
> 
> -corey

Regards,
--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: Re: [V6 PATCH 1/6] panic/x86: Fix re-entrance problem due to panic on NMI

2015-12-10 Thread / KAWAIHIDEHIRO
> From: Borislav Petkov [mailto:b...@alien8.de]
[...]
> Looks better.
> 
> Did some comments cleanup and nmi_panic() macro reformatting so that it
> is more readable and ended up applying this:

Thanks a lot!

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group

> ---
> From: Hidehiro Kawai 
> Date: Thu, 10 Dec 2015 10:46:26 +0900
> Subject: [PATCH] panic, x86: Fix re-entrance problem due to panic on NMI
> 
> If panic on NMI happens just after panic() on the same CPU, panic() is
> recursively called. Kernel stalls, as a result, after failing to acquire
> panic_lock.
> 
> To avoid this problem, don't call panic() in NMI context if we've
> already entered panic().
> 
> For that, introduce nmi_panic() macro to reduce code duplication. In
> the case of panic on NMI, don't return from NMI handlers if another CPU
> already panicked.
> 
> Signed-off-by: Hidehiro Kawai 
> Acked-by: Michal Hocko 
> Cc: Aaron Tomlin 
> Cc: Andrew Morton 
> Cc: Andy Lutomirski 
> Cc: Baoquan He 
> Cc: Chris Metcalf 
> Cc: David Hildenbrand 
> Cc: Don Zickus 
> Cc: "Eric W. Biederman" 
> Cc: Frederic Weisbecker 
> Cc: Gobinda Charan Maji 
> Cc: HATAYAMA Daisuke 
> Cc: "H. Peter Anvin" 
> Cc: Ingo Molnar 
> Cc: Javi Merino 
> Cc: Jonathan Corbet 
> Cc: kexec@lists.infradead.org
> Cc: linux-...@vger.kernel.org
> Cc: lkml 
> Cc: Masami Hiramatsu 
> Cc: Michal Nazarewicz 
> Cc: Nicolas Iooss 
> Cc: Peter Zijlstra 
> Cc: Prarit Bhargava 
> Cc: Rasmus Villemoes 
> Cc: Rusty Russell 
> Cc: Seth Jennings 
> Cc: Steven Rostedt 
> Cc: Thomas Gleixner 
> Cc: Ulrich Obergfell 
> Cc: Vitaly Kuznetsov 
> Cc: Vivek Goyal 
> Link: http://lkml.kernel.org/r/20151210014626.25437.13302.stgit@softrs
> [ Cleanup comments, fixup formatting. ]
> Signed-off-by: Borislav Petkov 
> ---
>  arch/x86/kernel/nmi.c  | 16 
>  include/linux/kernel.h | 20 
>  kernel/panic.c | 16 +---
>  kernel/watchdog.c  |  2 +-
>  4 files changed, 46 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
> index 697f90db0e37..292a24bd0553 100644
> --- a/arch/x86/kernel/nmi.c
> +++ b/arch/x86/kernel/nmi.c
> @@ -231,7 +231,7 @@ pci_serr_error(unsigned char reason, struct pt_regs *regs)
>  #endif
> 
>   if (panic_on_unrecovered_nmi)
> - panic("NMI: Not continuing");
> + nmi_panic("NMI: Not continuing");
> 
>   pr_emerg("Dazed and confused, but trying to continue\n");
> 
> @@ -255,8 +255,16 @@ io_check_error(unsigned char reason, struct pt_regs 
> *regs)
>reason, smp_processor_id());
>   show_regs(regs);
> 
> - if (panic_on_io_nmi)
> - panic("NMI IOCK error: Not continuing");
> + if (panic_on_io_nmi) {
> + nmi_panic("NMI IOCK error: Not continuing");
> +
> + /*
> +  * If end up here, it means we have received an NMI while
> +  * processing panic(). Simply return without delaying and
> +  * re-enabling NMI.
> +  */
> + return;
> + }
> 
>   /* Re-enable the IOCK line, wait for a few seconds */
>   reason = (reason & NMI_REASON_CLEAR_MASK) | NMI_REASON_CLEAR_IOCHK;
> @@ -297,7 +305,7 @@ unknown_nmi_error(unsigned char reason, struct pt_regs 
> *regs)
> 
>   pr_emerg("Do you have a strange power saving mode enabled?\n");
>   if (unknown_nmi_panic || panic_on_unrecovered_nmi)
> - panic("NMI: Not continuing");
> + nmi_panic("NMI: Not continuing");
> 
>   pr_emerg("Dazed and confused, but trying to continue\n");
>  }
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index 350dfb08aee3..750cc5c7c999 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -446,6 +446,26 @@ extern int sysctl_panic_on_stackoverflow;
>  extern bool crash_kexec_post_notifiers;
> 
>  /*
> + * panic_cpu is used for synchronizing panic() and crash_kexec() execution. 
> It
> + * holds a CPU number which is executing panic() currently. A value of
> + * PANIC_CPU_INVALID means no CPU has entered panic() or crash_kexec().
> + */
> +extern atomic_t panic_cpu;
> +#define PANIC_CPU_INVALID-1
> +
> +/*
> + * A variant of panic() called from NMI context. We return if 

RE: Re: [V6 PATCH 5/6] x86/nmi: Fix to save registers for crash dump on external NMI broadcast

2015-12-09 Thread / KAWAIHIDEHIRO
Hi Steven,

> From: Steven Rostedt [mailto:rost...@goodmis.org]
> On Tue, Nov 24, 2015 at 11:48:53AM +0100, Borislav Petkov wrote:
> > > +  */
> > > + while (!raw_spin_trylock(_reason_lock))
> > > + poll_crash_ipi_and_callback(regs);
> >
> > Waaait a minute: so if we're getting NMIs broadcasted on every core but
> > we're *not* crash dumping, we will run into here too. This can't be
> > right. :-\
> 
> This only does something if crash_ipi_done is set, which means you are killing
> the box. But perhaps a comment that states that here would be useful, or maybe
> just put in the check here. There's no need to make it depend on SMP, as
> raw_spin_trylock() will turn to just ({1}) for UP, and that code wont even be
> hit.

It seems that poll_crash_ipi_and_callback (now renamed to 
run_crash_ipi_callback)
is referred for UP if CONFIG_DEBUG_SPINLOCK=y, and it causes a build error
as below.  run_crash_ipi_callback refers crash_ipi_issued and 
crash_nmi_callback,
which are defined only if CONFIG_SMP=y.  So we need to defined it separately
for SMP and UP.

I'll resend this patch later.

> Hi Hidehiro,
> 
> [auto build test ERROR on v4.4-rc4]
> [also build test ERROR on next-20151209]
> [cannot apply to tip/x86/core]
> 
> url:
> https://github.com/0day-ci/linux/commits/Hidehiro-Kawai/Fix-race-issues-among-panic-NMI-and-crash_kexec/20151210-095
> 254
> config: x86_64-randconfig-s4-12101030 (attached as .config)
> reproduce:
> # save the attached .config to linux build tree
> make ARCH=x86_64
> 
> All errors (new ones prefixed by >>):
> 
>arch/x86/built-in.o: In function `do_nmi':
> >> (.text+0x7339): undefined reference to `run_crash_ipi_callback'
> 
> ---
> 0-DAY kernel test infrastructureOpen Source Technology Center
> https://lists.01.org/pipermail/kbuild-all   Intel Corporation
___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: [V5 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly

2015-12-03 Thread / KAWAIHIDEHIRO
> On Thu, Dec 03, 2015 at 02:01:38AM +0000, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > On Wed, Dec 02, 2015 at 11:57:38AM +0000, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > > We can do so, but I think resetting panic_cpu always would be
> > > > simpler and safer.
> >
> > I'll state in detail.
> >
> > When we call crash_kexec() without entering panic() and return from
> > it, panic() should be called eventually.
> 
> Huh, the call chain is
> 
> panic->crash_kexec
> 
> Or do you mean, when crash_kexec() is not called by panic() but by some
> of its other callers?

I was arguing about the case of oops_end --> crash_kexec
--> return from crash_kexec because of !kexec_crash_image -->
panic.

In the case of panic --> __crash_kexec, __crash_kexec is called
only once, so we don't need to check the return value of __crash_kexec
as you suggested.  So I thought you stated about crash_kexec --> panic
case.

> > But the code paths are a bit complicated and there are many
> > implementations for each architecture. So one day, this assumption may
> > be broken; the CPU doesn't call panic(). Or the CPU may fail to call
> > panic() because we are already in insane state. It would be nervous,
> > but allowing another CPU to process panic routines by resetting
> > panic_cpu is safer approach.
> 
> My suggestion was to do this only on the panic path - not necessarily on
> the others.
> 
> > Since this code is executed only once due to panic_cpu,
> > I think introducing this logic is not much valuable.
> > Also, current implementation is already quite simple:
> >
> > panic()
> > {
> > ...
> > __crash_kexec(NULL) {
> > if (mutex_trylock(_mutex)) {
> > if (kexec_crash_image) {
> > /* don't return */
> > }
> 
> I don't mean the kexec_crash_image case - I mean the opposite one:
> !kexec_crash_image.

I also mentioned !kexec_crash_image case...

> And I think I know now what you're trying to tell
> me: the first CPU which hits panic, will finish panic eventually and so
> it will take down the machine.

No.  The first CPU calls panic, and then it calls __crash_kexec.
Because of !kexec_crash_image, it returns from __crash_kexec and
continues to the panic procedure.  At the same time, another CPU
tries to call panic(), but it doesn't run the panic procedure;
panic_cpu prevents the second CPU from running it.
This means __crash_kexec is called only once even if we don't
check the return value of __crash_kexec.
(Please note that crash_kexec can be called multiple times in the
case of oops_end() --> crash_kexec().)

I'm sorry I couldn't tell my thought well.

Regards,
--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: [V5 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly

2015-12-02 Thread / KAWAIHIDEHIRO
Hello Borislav,

Sorry, I haven't replied to this mail yet.

> On Fri, Nov 20, 2015 at 06:36:48PM +0900, Hidehiro Kawai wrote:
...
> > +void crash_kexec(struct pt_regs *regs)
> > +{
> > +   int old_cpu, this_cpu;
> > +
> > +   /*
> > +* Only one CPU is allowed to execute the crash_kexec() code as with
> > +* panic().  Otherwise parallel calls of panic() and crash_kexec()
> > +* may stop each other.  To exclude them, we use panic_cpu here too.
> > +*/
> > +   this_cpu = raw_smp_processor_id();
> > +   old_cpu = atomic_cmpxchg(_cpu, -1, this_cpu);
> > +   if (old_cpu == -1) {
> > +   /* This is the 1st CPU which comes here, so go ahead. */
> > +   __crash_kexec(regs);
> > +
> > +   /*
> > +* Reset panic_cpu to allow another panic()/crash_kexec()
> > +* call.
> 
> So can we make __crash_kexec() return error values?
> 
> * failed to grab kexec_mutex -> reset panic_cpu
> 
> * no kexec_crash_image -> no need to reset it, all future crash_kexec()
> calls won't work so no need to run into that path anymore. However, this could
> be problematic if we want the other CPUs to panic. Do we care?
> 
> * machine_kexec successful -> doesn't matter

We can do so, but I think resetting panic_cpu always would be
simpler and safer.

Although checking kexec_crash_image each time is pointless, it
doesn't cause any actual problem.

Regards,

--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: [V5 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly

2015-12-02 Thread / KAWAIHIDEHIRO
> On Wed, Dec 02, 2015 at 11:57:38AM +0000, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > We can do so, but I think resetting panic_cpu always would be
> > simpler and safer.

I'll state in detail.

When we call crash_kexec() without entering panic() and return from
it, panic() should be called eventually.  But the code paths are
a bit complicated and there are many implementations for each
architecture.  So one day, this assumption may be broken; the CPU
doesn't call panic().  Or the CPU may fail to call panic() because
we are already in insane state.  It would be nervous, but allowing
another CPU to process panic routines by resetting panic_cpu
is safer approach.

> Well, I think executing code needlessly *especially* at panic time is
> not all that rosy either.
> 
> Besides something like this:
> 
>   static bool kexec_failed;
> 
>   ...
> 
> if (crash_kexec_post_notifiers && !kexec_failed)
>   kexec_failed = __crash_kexec(NULL);
> 
> is as simple as it gets.

Since this code is executed only once due to panic_cpu,
I think introducing this logic is not much valuable.
Also, current implementation is already quite simple:

panic()
{
...
__crash_kexec(NULL) {
if (mutex_trylock(_mutex)) {
if (kexec_crash_image) {
/* don't return */
}
}
mutex_unlock(_mutex)
}

How do you think?

Regards,
--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: [V5 PATCH 2/4] panic/x86: Allow cpus to save registers even if they are looping in NMI context

2015-12-02 Thread / KAWAIHIDEHIRO
> @@ -357,7 +358,15 @@ static void default_do_nmi(struct pt_regs *regs)
>   }
> 
>   /* Non-CPU-specific NMI: NMI sources can be processed on any CPU */
> - raw_spin_lock(_reason_lock);
> +
> + /*
> +  * Another CPU may be processing panic routines with holding
> +  * nmi_reason_lock.  Check IPI issuance from the panicking CPU
> +  * and call the callback directly.
> +  */
> + while (!raw_spin_trylock(_reason_lock))
> + poll_crash_ipi_and_callback(regs);
> +
>   reason = x86_platform.get_nmi_reason();

I noticed this logic is unneeded until applying PATCH 4/4.
Currently, unknown NMI can be broadcast to all CPUs, but in that case,
panic()/nmi_panic() are called after releasing nmi_reason_lock.
So CPUs can't loop infinitely here.

PATCH 4/4 allows us to broadcast external NMIs to all CPUs, and it
causes infinite loop in raw_spin_lock(_reason_lock).  So the above
changes are needed.

I'll move these chagnes to a later patch in the next version.

Thanks,

--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group



___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: [V5 PATCH 4/4] x86/apic: Introduce apic_extnmi boot option

2015-11-25 Thread / KAWAIHIDEHIRO
> On Fri, Nov 20, 2015 at 06:36:50PM +0900, Hidehiro Kawai wrote:
> > This patch introduces new boot option, apic_extnmi:
> >
> >  apic_extnmi={ bsp | all | none}
> >
> > The default value is "bsp" and this is the current behavior; only
> > BSP receives external NMI.  "all" allows external NMIs to be
> > broadcast to all CPUs.  This would raise the success rate of panic
> > on NMI when BSP hangs up in NMI context or the external NMI is
> > swallowed by other NMI handlers on BSP.  If you specified "none",
> > any CPUs don't receive external NMIs.  This is useful for dump
> > capture kernel so that it wouldn't be shot down while saving a
> > crash dump.
> >
> > V5:
> > - Rename the option from "noextnmi" to "apic_extnmi"
> > - Add apic_extnmi=all feature
> > - Fix the wrong documentation about "noextnmi" (apic_extnmi=none)
> >
> > Signed-off-by: Hidehiro Kawai 
> > Cc: Thomas Gleixner 
> > Cc: Ingo Molnar 
> > Cc: "H. Peter Anvin" 
> > Cc: Jonathan Corbet 
> > ---
> >  Documentation/kernel-parameters.txt |9 +
> >  arch/x86/include/asm/apic.h |5 +
> >  arch/x86/kernel/apic/apic.c |   31 ++-
> >  3 files changed, 44 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/kernel-parameters.txt 
> > b/Documentation/kernel-parameters.txt
> > index f8aae63..ceed3bc 100644
> > --- a/Documentation/kernel-parameters.txt
> > +++ b/Documentation/kernel-parameters.txt
> > @@ -472,6 +472,15 @@ bytes respectively. Such letter suffixes can also be 
> > entirely omitted.
> > Change the amount of debugging information output
> > when initialising the APIC and IO-APIC components.
> >
> > +   apic_extnmi=[APIC,X86] External NMI delivery setting
> > +   Format: { bsp (default) | all | none }
> > +   bsp:  External NMI is delivered to only CPU 0
> 
>   only to

Thanks for tha correction.

> 
> > +   all:  External NMIs are broadcast to all CPUs as a
> > + backup of CPU 0
> > +   none: External NMI is masked for all CPUs. This is
> > + useful so that a dump capture kernel won't be
> > + shot down by NMI
> > +
> > autoconf=   [IPV6]
> > See Documentation/networking/ipv6.txt.
> >
> > diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
> > index 7f62ad4..c80f6b6 100644
> > --- a/arch/x86/include/asm/apic.h
> > +++ b/arch/x86/include/asm/apic.h
> > @@ -23,6 +23,11 @@
> >  #define APIC_VERBOSE 1
> >  #define APIC_DEBUG   2
> >
> > +/* Macros for apic_extnmi which controls external NMI masking */
> > +#define APIC_EXTNMI_BSP0 /* Default */
> > +#define APIC_EXTNMI_ALL1
> > +#define APIC_EXTNMI_NONE   2
> > +
> >  /*
> >   * Define the default level of output to be very little
> >   * This can be turned up by using apic=verbose for more
> > diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> > index 2f69e3b..a2a8074 100644
> > --- a/arch/x86/kernel/apic/apic.c
> > +++ b/arch/x86/kernel/apic/apic.c
> > @@ -82,6 +82,12 @@ physid_mask_t phys_cpu_present_map;
> >  static unsigned int disabled_cpu_apicid __read_mostly = BAD_APICID;
> >
> >  /*
> > + * This variable controls which CPUs receive external NMIs.  By default,
> > + * external NMIs are delivered to only BSP.
> 
> only to the BSP.

...and again.

> 
> > + */
> > +static int apic_extnmi = APIC_EXTNMI_BSP;
> > +
> > +/*
> >   * Map cpu index to physical APIC ID
> >   */
> >  DEFINE_EARLY_PER_CPU_READ_MOSTLY(u16, x86_cpu_to_apicid, BAD_APICID);
> > @@ -1161,6 +1167,8 @@ void __init init_bsp_APIC(void)
> > value = APIC_DM_NMI;
> > if (!lapic_is_integrated()) /* 82489DX */
> > value |= APIC_LVT_LEVEL_TRIGGER;
> > +   if (apic_extnmi == APIC_EXTNMI_NONE)
> > +   value |= APIC_LVT_MASKED;
> > apic_write(APIC_LVT1, value);
> >  }
> >
> > @@ -1380,7 +1388,8 @@ void setup_local_APIC(void)
> > /*
> >  * only the BP should see the LINT1 NMI signal, obviously.
> >  */
> 
> That comment needs adjusting.

OK, I'll do that.

> 
> > -   if (!cpu)
> > +   if ((!cpu && apic_extnmi != APIC_EXTNMI_NONE) ||
> > +   apic_extnmi == APIC_EXTNMI_ALL)
> > value = APIC_DM_NMI;
> > else
> > value = APIC_DM_NMI | APIC_LVT_MASKED;
> > @@ -2548,3 +2557,23 @@ static int __init apic_set_disabled_cpu_apicid(char 
> > *arg)
> > return 0;
> >  }
> >  early_param("disable_cpu_apicid", apic_set_disabled_cpu_apicid);
> > +
> > +static int __init apic_set_extnmi(char *arg)
> > +{
> > +   if (!arg)
> > +   return -EINVAL;
> > +
> > +   if (strcmp("all", arg) == 0)
> 
>   if (!strncmp("all", arg, 3))
> 

RE: [V5 PATCH 2/4] panic/x86: Allow cpus to save registers even if they are looping in NMI context

2015-11-25 Thread / KAWAIHIDEHIRO
> On Wed, Nov 25, 2015 at 09:46:37AM +0000, 河合英宏 / KAWAI,HIDEHIRO wrote:
...
> > I prefer this, but we might want to add some more prefix or suffix.
> > For example, "conditionally_run_crash_nmi_callback".
> 
> That's unnecessary IMO. If you need to express that, you could write
> that in a comment above the function definition. Anyone who looks at
> the code then will know that it is conditional, like so many other
> kernel functions. :)

OK, I'll use a simple one with a comment.

Regards,

--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: [V5 PATCH 2/4] panic/x86: Allow cpus to save registers even if they are looping in NMI context

2015-11-25 Thread / KAWAIHIDEHIRO
> On Wed, Nov 25, 2015 at 05:51:59AM +0000, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > > `Infinite loop in NMI context' can happen:
> > > >
> > > >   a. when a cpu panics on NMI while another cpu is processing panic
> > > >   b. when a cpu received an external or unknown NMI while another
> > > >  cpu is processing panic on NMI
> > > >
> > > > In the case of a, it loops in panic_smp_self_stop().  In the case
> > > > of b, it loops in raw_spin_lock() of nmi_reason_lock.
> > >
> > > Please describe those two cases more verbosely - it takes slow people
> > > like me a while to figure out what exactly can happen.
> >
> >   a. when a cpu panics on NMI while another cpu is processing panic
> >  Ex.
> >  CPU 0 CPU 1
> >  = =
> >  panic()
> >panic_cpu <-- 0
> >check panic_cpu
> >crash_kexec()
> >receive an unknown NMI
> >unknown_nmi_error()
> >  nmi_panic()
> >panic()
> >  check panic_cpu
> >  panic_smp_self_stop()
> >infinite loop in NMI context
> >
> >   b. when a cpu received an external or unknown NMI while another
> >  cpu is processing panic on NMI
> >  Ex.
> >  CPU 0 CPU 1
> >  ====
> >  receive an unknown NMI
> >  unknown_nmi_error()
> >nmi_panic() receive an unknown NMI
> >  panic_cpu <-- 0   unknown_nmi_error()
> >  panic() nmi_panic()
> >check panic_cpu panic
> >crash_kexec() check panic_cpu
> >  panic_smp_self_stop()
> >infinite loop in NMI context
> 
> Ok, that's what I saw too, thanks for confirming.
> 
> But please write those examples with the *old* code in the commit
> message, i.e. without panic_cpu and nmi_panic() which you're adding.

Does *old* code mean the code without this patch *series* ?
panic_cpu and nmi_panic() are introduced by PATCH 1/4, not this patch.

> Basically, you want to structure your commit message this way:
> 
> This is the problem the current code has: ...
> 
> But we need to do this: ...
> 
> We fix it by doing that: ...

Good suggestion!  I'll revise a bit with following your comment.

> > > > + * directly.  This function is used when we have already been in NMI 
> > > > handler.
> > > > + */
> > > > +void poll_crash_ipi_and_callback(struct pt_regs *regs)
> > >
> > > Why "poll"? We won't return from crash_nmi_callback() if we're not the
> > > crashing CPU.
> >
> > This function polls that crash IPI has been issued by checking
> > crash_ipi_done, then invokes the callback.  This is different
> > from so-called "poll" functions.  Do you have some good name?
> 
> Maybe something as simple as "run_crash_callback"?

I prefer this, but we might want to add some more prefix or suffix.
For example, "conditionally_run_crash_nmi_callback".

> Or since we're calling it from other places, maybe add the "crash"
> prefix:
> 
>   while (!raw_spin_trylock(_reason_lock))
>   crash_run_callback(regs);
> 
> Looks even better to me in code context. :)

Thanks for your deep review!

--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: [V5 PATCH 2/4] panic/x86: Allow cpus to save registers even if they are looping in NMI context

2015-11-24 Thread / KAWAIHIDEHIRO
> On Fri, Nov 20, 2015 at 06:36:46PM +0900, Hidehiro Kawai wrote:
> > nmi_shootdown_cpus(), a subroutine of crash_kexec(), sends NMI IPI
> > to non-panic cpus to stop them while saving their register
> 
>...to stop them and save their register...

Thanks for the correction.

> > information and doing some cleanups for crash dumping.  So if a
> > non-panic cpus is infinitely looping in NMI context, we fail to
> 
> That should be CPU. Please use "CPU" instead of "cpu" in all your text
> in your next submission.

OK, I'll fix that.

> > save its register information and lose the information from the
> > crash dump.
> >
> > `Infinite loop in NMI context' can happen:
> >
> >   a. when a cpu panics on NMI while another cpu is processing panic
> >   b. when a cpu received an external or unknown NMI while another
> >  cpu is processing panic on NMI
> >
> > In the case of a, it loops in panic_smp_self_stop().  In the case
> > of b, it loops in raw_spin_lock() of nmi_reason_lock.
> 
> Please describe those two cases more verbosely - it takes slow people
> like me a while to figure out what exactly can happen.

  a. when a cpu panics on NMI while another cpu is processing panic
 Ex.
 CPU 0 CPU 1
 = =
 panic()
   panic_cpu <-- 0
   check panic_cpu
   crash_kexec()
   receive an unknown NMI
   unknown_nmi_error()
 nmi_panic()
   panic()
 check panic_cpu
 panic_smp_self_stop()
   infinite loop in NMI context

  b. when a cpu received an external or unknown NMI while another
 cpu is processing panic on NMI
 Ex.
 CPU 0 CPU 1
 ====
 receive an unknown NMI
 unknown_nmi_error()
   nmi_panic() receive an unknown NMI
 panic_cpu <-- 0   unknown_nmi_error()
 panic() nmi_panic()
   check panic_cpu panic
   crash_kexec() check panic_cpu
 panic_smp_self_stop()
   infinite loop in NMI context
 
> > This can
> > happen on some servers which broadcasts NMIs to all CPUs when a dump
> > button is pushed.
> >
> > To save registers in these case too, this patch does following things:
> >
> > 1. Move the timing of `infinite loop in NMI context' (actually
> >done by panic_smp_self_stop()) outside of panic() to enable us to
> >refer pt_regs
> 
> I can't parse that sentence. And I really tried :-\
> panic_smp_self_stop() is still in panic().

panic_smp_self_stop() is still used when a CPU in normal context
should go into infinite loop.  Only when a CPU is in NMI context,
nmi_panic_self_stop() is called and the CPU loops infinitely
without entering panic().

I'll try to revise this sentense.

> > 2. call a callback of nmi_shootdown_cpus() directly to save
> >registers and do some cleanups after setting waiting_for_crash_ipi
> >which is used for counting down the number of cpus which handled
> >the callback
> >
> > V5:
> > - Use WRITE_ONCE() when setting crash_ipi_done to 1 so that the
> >   compiler doesn't change the instruction order
> > - Support the case of b in the above description
> > - Add poll_crash_ipi_and_callback()
> >
> > V4:
> > - Rewrite the patch description
> >
> > V3:
> > - Newly introduced
> >
> > Signed-off-by: Hidehiro Kawai 
> > Cc: Andrew Morton 
> > Cc: Thomas Gleixner 
> > Cc: Ingo Molnar 
> > Cc: "H. Peter Anvin" 
> > Cc: Peter Zijlstra 
> > Cc: Eric Biederman 
> > Cc: Vivek Goyal 
> > Cc: Michal Hocko 
> > ---
> >  arch/x86/include/asm/reboot.h |1 +
> >  arch/x86/kernel/nmi.c |   17 +
> >  arch/x86/kernel/reboot.c  |   28 
> >  include/linux/kernel.h|   12 ++--
> >  kernel/panic.c|   10 ++
> >  kernel/watchdog.c |2 +-
> >  6 files changed, 63 insertions(+), 7 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/reboot.h b/arch/x86/include/asm/reboot.h
> > index a82c4f1..964e82f 100644
> > --- a/arch/x86/include/asm/reboot.h
> > +++ b/arch/x86/include/asm/reboot.h
> > @@ -25,5 +25,6 @@ void __noreturn machine_real_restart(unsigned int type);
> >
> >  typedef void (*nmi_shootdown_cb)(int, struct pt_regs*);
> >  void nmi_shootdown_cpus(nmi_shootdown_cb callback);
> > +void poll_crash_ipi_and_callback(struct pt_regs *regs);
> >
> >  #endif /* _ASM_X86_REBOOT_H */
> > diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c

RE: [V5 PATCH 2/4] panic/x86: Allow cpus to save registers even if they are looping in NMI context

2015-11-24 Thread / KAWAIHIDEHIRO
> On Tue, Nov 24, 2015 at 11:48:53AM +0100, Borislav Petkov wrote:
> >
> > > +  */
> > > + while (!raw_spin_trylock(_reason_lock))
> > > + poll_crash_ipi_and_callback(regs);
> >
> > Waaait a minute: so if we're getting NMIs broadcasted on every core but
> > we're *not* crash dumping, we will run into here too. This can't be
> > right. :-\
> 
> This only does something if crash_ipi_done is set, which means you are killing
> the box. But perhaps a comment that states that here would be useful, or maybe
> just put in the check here. 

OK, I'll add more comments around this.

> There's no need to make it depend on SMP, as
> raw_spin_trylock() will turn to just ({1}) for UP, and that code wont even be
> hit.

I'll integrate these SMP and UP versions with a comment about
that.

Regards,
--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group




___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: [V5 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly

2015-11-24 Thread / KAWAIHIDEHIRO
> On Fri, Nov 20, 2015 at 06:36:48PM +0900, Hidehiro Kawai wrote:
> > Currently, panic() and crash_kexec() can be called at the same time.
> > For example (x86 case):
> >
> > CPU 0:
> >   oops_end()
> > crash_kexec()
> >   mutex_trylock() // acquired
> > nmi_shootdown_cpus() // stop other cpus
> >
> > CPU 1:
> >   panic()
> > crash_kexec()
> >   mutex_trylock() // failed to acquire
> > smp_send_stop() // stop other cpus
> > infinite loop
> >
> > If CPU 1 calls smp_send_stop() before nmi_shootdown_cpus(), kdump
> > fails.
> 
> So the smp_send_stop() stops CPU 0 from calling nmi_shootdown_cpus(), right?

Yes, but the important thing is that CPU 1 stops CPU 0 which is
only CPU processing crash_ kexec routines.

> >
> > In another case:
> >
> > CPU 0:
> >   oops_end()
> > crash_kexec()
> >   mutex_trylock() // acquired
> > 
> > io_check_error()
> >   panic()
> > crash_kexec()
> >   mutex_trylock() // failed to acquire
> > infinite loop
> >
> > Clearly, this is an undesirable result.
> 
> I'm trying to see how this patch fixes this case.
> 
> >
> > To fix this problem, this patch changes crash_kexec() to exclude
> > others by using atomic_t panic_cpu.
> >
> > V5:
> > - Add missing dummy __crash_kexec() for !CONFIG_KEXEC_CORE case
> > - Replace atomic_xchg() with atomic_set() in crash_kexec() because
> >   it is used as a release operation and there is no need of memory
> >   barrier effect.  This change also removes an unused value warning
> >
> > V4:
> > - Use new __crash_kexec(), no exclusion check version of crash_kexec(),
> >   instead of checking if panic_cpu is the current cpu or not
> >
> > V2:
> > - Use atomic_cmpxchg() instead of spin_trylock() on panic_lock
> >   to exclude concurrent accesses
> > - Don't introduce no-lock version of crash_kexec()
> >
> > Signed-off-by: Hidehiro Kawai 
> > Cc: Eric Biederman 
> > Cc: Vivek Goyal 
> > Cc: Andrew Morton 
> > Cc: Michal Hocko 
> > ---
> >  include/linux/kexec.h |2 ++
> >  kernel/kexec_core.c   |   26 +-
> >  kernel/panic.c|4 ++--
> >  3 files changed, 29 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> > index d140b1e..7b68d27 100644
> > --- a/include/linux/kexec.h
> > +++ b/include/linux/kexec.h
> > @@ -237,6 +237,7 @@ extern int kexec_purgatory_get_set_symbol(struct kimage 
> > *image,
> >   unsigned int size, bool get_value);
> >  extern void *kexec_purgatory_get_symbol_addr(struct kimage *image,
> >  const char *name);
> > +extern void __crash_kexec(struct pt_regs *);
> >  extern void crash_kexec(struct pt_regs *);
> >  int kexec_should_crash(struct task_struct *);
> >  void crash_save_cpu(struct pt_regs *regs, int cpu);
> > @@ -332,6 +333,7 @@ int __weak arch_kexec_apply_relocations(const Elf_Ehdr 
> > *ehdr, Elf_Shdr *sechdrs,
> >  #else /* !CONFIG_KEXEC_CORE */
> >  struct pt_regs;
> >  struct task_struct;
> > +static inline void __crash_kexec(struct pt_regs *regs) { }
> >  static inline void crash_kexec(struct pt_regs *regs) { }
> >  static inline int kexec_should_crash(struct task_struct *p) { return 0; }
> >  #define kexec_in_progress false
> > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> > index 11b64a6..9d097f5 100644
> > --- a/kernel/kexec_core.c
> > +++ b/kernel/kexec_core.c
> > @@ -853,7 +853,8 @@ struct kimage *kexec_image;
> >  struct kimage *kexec_crash_image;
> >  int kexec_load_disabled;
> >
> > -void crash_kexec(struct pt_regs *regs)
> > +/* No panic_cpu check version of crash_kexec */
> > +void __crash_kexec(struct pt_regs *regs)
> >  {
> > /* Take the kexec_mutex here to prevent sys_kexec_load
> >  * running on one cpu from replacing the crash kernel
> > @@ -876,6 +877,29 @@ void crash_kexec(struct pt_regs *regs)
> > }
> >  }
> >
> > +void crash_kexec(struct pt_regs *regs)
> > +{
> > +   int old_cpu, this_cpu;
> > +
> > +   /*
> > +* Only one CPU is allowed to execute the crash_kexec() code as with
> > +* panic().  Otherwise parallel calls of panic() and crash_kexec()
> > +* may stop each other.  To exclude them, we use panic_cpu here too.
> > +*/
> > +   this_cpu = raw_smp_processor_id();
> > +   old_cpu = atomic_cmpxchg(_cpu, -1, this_cpu);
> > +   if (old_cpu == -1) {
> > +   /* This is the 1st CPU which comes here, so go ahead. */
> > +   __crash_kexec(regs);
> > +
> > +   /*
> > +* Reset panic_cpu to allow another panic()/crash_kexec()
> > +* call.
> > +*/
> > +   atomic_set(_cpu, -1);
> > +   }
> > +}
> > +
> >  size_t crash_get_memory_size(void)
> >  {
> > size_t size = 0;
> > diff --git a/kernel/panic.c b/kernel/panic.c
> > index 

RE: [V5 PATCH 1/4] panic/x86: Fix re-entrance problem due to panic on NMI

2015-11-23 Thread / KAWAIHIDEHIRO
Hi,

> On Fri, Nov 20, 2015 at 06:36:44PM +0900, Hidehiro Kawai wrote:
> > If panic on NMI happens just after panic() on the same CPU, panic()
> > is recursively called.  As the result, it stalls after failing to
> > acquire panic_lock.
> >
> > To avoid this problem, don't call panic() in NMI context if
> > we've already entered panic().
> >
> > V4:
> > - Improve comments in io_check_error() and panic()
> >
> > V3:
> > - Introduce nmi_panic() macro to reduce code duplication
> > - In the case of panic on NMI, don't return from NMI handlers
> >   if another cpu already panicked
> >
> > V2:
> > - Use atomic_cmpxchg() instead of current spin_trylock() to
> >   exclude concurrent accesses to the panic routines
> > - Don't introduce no-lock version of panic()
> >
> > Signed-off-by: Hidehiro Kawai 
> > Cc: Andrew Morton 
> > Cc: Thomas Gleixner 
> > Cc: Ingo Molnar 
> > Cc: "H. Peter Anvin" 
> > Cc: Peter Zijlstra 
> > Cc: Michal Hocko 
> > ---
> >  arch/x86/kernel/nmi.c  |   16 
> >  include/linux/kernel.h |   13 +
> >  kernel/panic.c |   15 ---
> >  kernel/watchdog.c  |2 +-
> >  4 files changed, 38 insertions(+), 8 deletions(-)
> >
> > diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
> > index 697f90d..5131714 100644
> > --- a/arch/x86/kernel/nmi.c
> > +++ b/arch/x86/kernel/nmi.c
> > @@ -231,7 +231,7 @@ pci_serr_error(unsigned char reason, struct pt_regs 
> > *regs)
> >  #endif
> >
> > if (panic_on_unrecovered_nmi)
> > -   panic("NMI: Not continuing");
> > +   nmi_panic("NMI: Not continuing");
> >
> > pr_emerg("Dazed and confused, but trying to continue\n");
> >
> > @@ -255,8 +255,16 @@ io_check_error(unsigned char reason, struct pt_regs 
> > *regs)
> >  reason, smp_processor_id());
> > show_regs(regs);
> >
> > -   if (panic_on_io_nmi)
> > -   panic("NMI IOCK error: Not continuing");
> > +   if (panic_on_io_nmi) {
> > +   nmi_panic("NMI IOCK error: Not continuing");
> 
> Btw, that panic_on_io_nmi seems undocumented in
> Documentation/sysctl/kernel.txt. Care to document it, please, as a
> separate patch?

Sure. I'll post a documentation patch for it in a separate patch.
Because panic_on_io_nmi has been available since relatively old days,
I didn't check it. 

> > +
> > +   /*
> > +* If we return from nmi_panic(), it means we have received
> > +* NMI while processing panic().  So, simply return without
> > +* a delay and re-enabling NMI.
> > +*/
> > +   return;
> > +   }
> >
> > /* Re-enable the IOCK line, wait for a few seconds */
> > reason = (reason & NMI_REASON_CLEAR_MASK) | NMI_REASON_CLEAR_IOCHK;
> > @@ -297,7 +305,7 @@ unknown_nmi_error(unsigned char reason, struct pt_regs 
> > *regs)
> >
> > pr_emerg("Do you have a strange power saving mode enabled?\n");
> > if (unknown_nmi_panic || panic_on_unrecovered_nmi)
> > -   panic("NMI: Not continuing");
> > +   nmi_panic("NMI: Not continuing");
> >
> > pr_emerg("Dazed and confused, but trying to continue\n");
> >  }
> > diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> > index 350dfb0..480a4fd 100644
> > --- a/include/linux/kernel.h
> > +++ b/include/linux/kernel.h
> > @@ -445,6 +445,19 @@ extern int sysctl_panic_on_stackoverflow;
> >
> >  extern bool crash_kexec_post_notifiers;
> >
> > +extern atomic_t panic_cpu;
> 
> This needs a comment explaining what this variable is and what it
> denotes.

OK, I'll do that in the next version.

> 
> > +
> > +/*
> > + * A variant of panic() called from NMI context.
> > + * If we've already panicked on this cpu, return from here.
> > + */
> > +#define nmi_panic(fmt, ...)
> > \
> > +   do {\
> > +   int this_cpu = raw_smp_processor_id();  \
> > +   if (atomic_cmpxchg(_cpu, -1, this_cpu) != this_cpu) \
> > +   panic(fmt, ##__VA_ARGS__);  \
> > +   } while (0)
> > +
> >  /*
> >   * Only to be used by arch init code. If the user over-wrote the default
> >   * CONFIG_PANIC_TIMEOUT, honor it.
> > diff --git a/kernel/panic.c b/kernel/panic.c
> > index 4579dbb..24ee2ea 100644
> > --- a/kernel/panic.c
> > +++ b/kernel/panic.c
> > @@ -61,6 +61,8 @@ void __weak panic_smp_self_stop(void)
> > cpu_relax();
> >  }
> >
> > +atomic_t panic_cpu = ATOMIC_INIT(-1);
> > +
> >  /**
> >   * panic - halt the system
> >   * @fmt: The text string to print
> > @@ -71,17 +73,17 @@ void __weak panic_smp_self_stop(void)
> >   */
> >  void panic(const char *fmt, ...)
> >  {
> > -   static DEFINE_SPINLOCK(panic_lock);
> > static char buf[1024];
> > va_list args;
> > long i, 

RE: Re: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option

2015-10-27 Thread / KAWAIHIDEHIRO
Hi,

> I just have a look at this thread. I am wondering why we don't use
> existing is_kdump_kernel() directly to disable external NMI if it's
> in kdump kernel. Then no need to introduce another boot option "noextnmi"
> which is used only for kdump kernel.

As I stated in another mail, there is a case where we don't want to
mask external NMIs in the second kernel.  So, we need some
configurable way.  Please see the following quotation.

> We souldn't enable this feature silently.  Some users wouldn't like
> to enable this feature.  For example, a user enables a watchdog timer
> which raises an external NMI when the counter is not reset for a
> specific duration.  Then, the second kernel hangs up while saving
> crash dump, and NMI is delivered to the CPU.  The kernel gets panic
> due to the NMI, prints some information to the display and serial
> console, and then automatically reboot.  In this case, users don't
> want to block external NMIs.

Regards,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option

2015-10-15 Thread / KAWAIHIDEHIRO
> * Thomas Gleixner <t...@linutronix.de> wrote:
> 
> > Borislav,
> >
> > On Mon, 5 Oct 2015, Borislav Petkov wrote:
> > > On Mon, Oct 05, 2015 at 02:03:58AM +, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > > That's different from my point of view.  I'm not going to pass
> > > > some data from the first kernel to the second kernel. I'm just going to
> > > > provide a configurable option for the second kernel to users.
> > >
> > > Dude, WTF?! You're adding a kernel command line which is supposed to
> > > be used *only* by the kdump kernel. But nooo, it is there in the open
> > > and visible to people. And anyone can type it in during boot. AND THAT
> > > SHOULDN'T BE POSSIBLE IN THE FIRST PLACE!
> > >
> > > This information is strictly for the kdump kernel - it shouldn't be a
> > > generic command line option. How hard it is to understand that simple
> > > fact?!
> >
> > Calm down!
> >
> > Disabling that NMI in the first kernel is not going to make the world
> > explode. We have enough command line options a user can type in, which
> > are way worse than that one. If some "expert" types nonsense on the
> > first kernel command line, then it's none of our problems, really.
> >
> > If Kawai would have marked that option as a debug feature, this
> > discussion would have probably never happened.
> >
> > Aside of that, the best way to hand in options for the kdump kernel is
> > the command line. We have an existing interface for that.
> >
> > Let's move on. Nothing to see here.
> 
> So Boris kind of has a point: there are numerous problems with boot options as
> kexec parameter interface:
> 
>  - boot option strings are not a well defined programmatic interface:
> - failures are not obvious (they are often ignored)
> - inserting/setting parameters is awkward as well.
> 
>  - boot options are not an ABI, so when options have dual use with kexec it's 
> easy
>to break things inadvertently: without that failure being apparent other 
> than
>reintroducing obscure kexec failure modes again.
> 
>  - in the booted up kexec kernel it's not really obvious which options got 
> set by
>kexec and which got set by the user - as there's no separation of 
> namespaces.
> 
>  - likewise, if the user specifies an option in conflict with a kexec 
> requirement
>it's not really obvious what's happening: the user setting should probably
>dominate - I'm not sure that's the case with the current kexec code.

Thanks for the detailed explanation.  I can understand the disadvantages
of using boot option.  So these are essential problems with boot options
rather than new boot option added for kexec'ed kernel.
 
> Boot options are basically a user interface.
> 
> On the other hand the hack of (ab-)using boot parameters as kexec parameter
> passing is an existing, many years old practice and we cannot just stop it 
> without
> offering an alternative (and better!) interface first.
> 
> We could improve things by either adding a separate kexec-only parameter 
> passing
> facility (like programmatic boot parameters are) - or by creating some sort of
> boot parameter alias that clearly identifies kexec parameters.
> 
> So for example when introducing 'noextnmi' we'd also add a facility to add a
> 'kexec_noextnmi' alias - which clearly identifies this boot parameter as a 
> kexec
> related one.
> 
> Every 'kexec' inserted parameter would be prefixed by kexec_ - and then the
> separation of namespaces (and the prioritization of user vs. kexec 
> requirements)
> becomes well defined as well..
> 
> We should also probably print a warning if a kexec_* parameter is passed in 
> that
> has no matching handler in the kexec()-ed kernel.

It would be reasonable.  Or we might improve kexec command so that
it removes conflict boot options with warnings.

As I stated in another mail, I'm going to change "noextnmi" to
"apic_extnmi={bsp|all|none}", which can be used both the first and
second kernels.  So, I'll add this option without "kexec_" prefix
at this point.

> I do concur that this patch is probably OK given existing practices.

Thanks!

--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option

2015-10-15 Thread / KAWAIHIDEHIRO
> > By the way, I have a pending patch which expands this option like
> > this:
> >
> > apic_extnmi={ bsp | all | none }
> >
> > If apic_extnmi=all is specified, external NMIs are broadcast to
> > all CPUs.  This raises the successful rate of kernel panic in the case
> > where an external NMI to CPU 0 is swallowed by other NMI handlers or
> > blocked due to hang-up in NMI context.  The patch works without any
> > problems, but I'm going to drop the feature if it will cause long
> > discussion.  I'd like to settle this patch set down once.  At least,
> > I'm going to change this option to apic_extnmi={bsp|none} style for
> > the future expansion.
> >
> > How do you think about this?
> 
> Do it right away with all three variants. They make a lot of sense to
> me.

OK. I'll do that.

Regards,

--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group



___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option

2015-10-13 Thread / KAWAIHIDEHIRO
> On Fri, 25 Sep 2015, Hidehiro Kawai wrote:
> 
> > This patch introduces new boot option "noextnmi" which disables
> > external NMI.  This option is useful for the dump capture kernel
> > so that an HA application or administrator wouldn't mistakenly
> > shoot down the kernel by NMI.
> >
> > Currently, only x86 supports this option.
> 
> You might add that is can be used for debugging purposes as
> well. External NMIs can be their own source of trouble. :)

Thanks for your comments!  I'll do that.

By the way, I have a pending patch which expands this option like
this:

apic_extnmi={ bsp | all | none }

If apic_extnmi=all is specified, external NMIs are broadcast to
all CPUs.  This raises the successful rate of kernel panic in the case
where an external NMI to CPU 0 is swallowed by other NMI handlers or
blocked due to hang-up in NMI context.  The patch works without any
problems, but I'm going to drop the feature if it will cause long
discussion.  I'd like to settle this patch set down once.  At least,
I'm going to change this option to apic_extnmi={bsp|none} style for
the future expansion.

How do you think about this?

> > Signed-off-by: Hidehiro Kawai 
> > Cc: Thomas Gleixner 
> > Cc: Ingo Molnar 
> > Cc: "H. Peter Anvin" 
> > Cc: Jonathan Corbet 
> > ---
> >  Documentation/kernel-parameters.txt |4 
> >  arch/x86/kernel/apic/apic.c |   17 -
> >  2 files changed, 20 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/kernel-parameters.txt 
> > b/Documentation/kernel-parameters.txt
> > index 22a4b68..8bcaccd 100644
> > --- a/Documentation/kernel-parameters.txt
> > +++ b/Documentation/kernel-parameters.txt
> > @@ -2379,6 +2379,10 @@ bytes respectively. Such letter suffixes can also be 
> > entirely omitted.
> > noexec=on: enable non-executable mappings (default)
> > noexec=off: disable non-executable mappings
> >
> > +   noextnmi[X86]
> > +   Mask external NMI.  This option is useful for a
> > +   dump capture kernel to be shot down by NMI.
> 
> That should read: "...not to be shot down", right?

Yes, you are right.  I'll fix it.

> Other than that.
> 
> Acked-by: Thomas Gleixner 

Regards,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group



___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option

2015-10-13 Thread / KAWAIHIDEHIRO
Hello, Boris

Sorry for the late reply.

> On Mon, Oct 05, 2015 at 09:21:02AM +0000, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > So, the problem for you is that "noextnmi" option is visible and effective
> > in the first kernel, isn't it?
> 
> No, such an option shouldn't exist at all. You should be passing
> information *in* *a* *different* *manner* to the kdump kernel - not with
> a kernel command line option.

Sorry, I couldn't find out the reason why I shouldn't use cmdline option.
It doesn't need new user I/F to inform the 1st kernel about masking/unmasking
external NMI in the 2nd kernel, doesn't need new data passing infrastructure,
and is easy to configure that.  Also, "elfcorehdr" and "disable_cpu_apicid"
have already been introduced as cmdline options for dump capture kernel.
This means the cmdline option approach would be mostly acceptable.

> I get the feeling I'm starting to sound like a broken record on this
> mail thread... :-(
> 
> One other thing we could probably try to do is use boot_params which
> is, IIUC, passed to the second kernel. So we can add another bit to
> boot_params.hdr.loadflags or so and use that. Or something similar.

I think using boot_params would be worse than ELF header approach.
It needs to reserve boot_params bits for all boot loaders.

Regards,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option

2015-10-05 Thread / KAWAIHIDEHIRO
> On Mon, Oct 05, 2015 at 02:03:58AM +0000, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > That's different from my point of view.  I'm not going to pass
> > some data from the first kernel to the second kernel. I'm just going to
> > provide a configurable option for the second kernel to users.
> 
> Dude, WTF?! You're adding a kernel command line which is supposed to
> be used *only* by the kdump kernel. But nooo, it is there in the open
> and visible to people. And anyone can type it in during boot. AND THAT
> SHOULDN'T BE POSSIBLE IN THE FIRST PLACE!
> 
> This information is strictly for the kdump kernel - it shouldn't be a
> generic command line option. How hard it is to understand that simple
> fact?!

So, the problem for you is that "noextnmi" option is visible and effective
in the first kernel, isn't it?  If so, we can ignore "noextnmi" option
if we are in the first kernel and remove it from the documentation.
"elfcorehdr" cmdline option prepared by kexec command is passed to only
the second kernel, and it is also used to check if the booted kernel is
a kdump kernel.  Thus, if "elfcorehdr" is NOT specified, then ignore
"noextnmi".

Documentation/kernel-parameters.txt:
> elfcorehdr=[size[KMG]@]offset[KMG] [IA64,PPC,SH,X86,S390]
> Specifies physical address of start of kernel core
> image elf header and optionally the size. Generally
> kexec loader will pass this option to capture kernel.
> See Documentation/kdump/kdump.txt for details.

> 
> 
> > I think we should use the ELF header only if the passed information
> > is saved to a crash dump.
> 
> So what?! ELF header will contain the additional bit of information that
> the second kernel wasn't reacting to NMIs. But that's fine, that *is*
> the desired behavior anyway.
> 
> All I'm saying is, this is a strict kdump kernel "command", so to speak,
> and it doesn't belong with the generic kernel command line parameters.

Regards,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group



___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option

2015-10-04 Thread / KAWAIHIDEHIRO
> On Fri, Oct 02, 2015 at 12:58:02AM +0000, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > On Thu, Oct 01, 2015 at 10:24:19AM +0000, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > > But how do we check if the starting kernel is a dump capture kernel?
> > >
> > > How does that first kernel pass info to the capture kernel?
> >
> > As I described in the previous mail,
> 
> I meant: "How does the first kernel pass info to the capture kernel by
> *not* using the kernel command line"?
> 
> The kernel command line is not the channel to pass data to the kdump
> kernel.

That's different from my point of view.  I'm not going to pass
some data from the first kernel to the second kernel. I'm just going to
provide a configurable option for the second kernel to users.

We souldn't enable this feature silently.  Some users wouldn't like
to enable this feature.  For example, a user enables a watchdog timer
which raises an external NMI when the counter is not reset for a
specific duration.  Then, the second kernel hangs up while saving
crash dump, and NMI is delivered to the CPU.  The kernel gets panic
due to the NMI, prints some information to the display and serial
console, and then automatically reboot.  In this case, users don't
want to block external NMIs.

So, making this feature configurable by command line option is
reasonable.

> > Yes, your first kernel doesn't get external NMIs, but basically
> > you don't have to set "noextnmi" option to the first kernel.
> 
> So it doesn't belong there as a kernel command line parameter in the
> first place.
> 
> IOW, you need a different method to pass data to the second kernel. Be
> it an ELF header, be it a shared page, whatever.

I think we should use the ELF header only if the passed information
is saved to a crash dump.

Also, we wouldn't want to introduce new shared page for that purpose.
A memory segment provided by kexec syscall is not usable because
the second kernel doesn't know what there is in a segment without a
command line option.  Please note that "elfcorehdr" command line option
prepared by kexec command is used to inform the second kernel about
the address of the ELF header memory segment.

Regards,


Hidehiro Kawai
Hitachi, Ltd. Research & Development Group


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option

2015-10-01 Thread / KAWAIHIDEHIRO
> On Thu, Oct 01, 2015 at 07:01:50AM +0000, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > I suppose that a sever which uses this feature will equip a BMC
> > and BMC mandatorily supports hard reset command for the server.
> > If the HA clustering software detects no response from the server
> > after relatively long timeout, it might want to insert hard reset
> > to the server by IPMI over LAN.
> 
> So why doesn't the capture kernel *automatically* ignore external NMIs,
> without a cmdline option?
> 
> Before it starts capturing, it says: "I'm starting capturing and am
> ignoring external NMIs from now on."

But how do we check if the starting kernel is a dump capture kernel?
I think using cmdline option is the simplest way.  You just have to
add "noextnmi" to KDUMP_COMMANDLINE_APPEND in /etc/sysconfig/kdump.

Regards,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option

2015-10-01 Thread / KAWAIHIDEHIRO
> On Thu, Oct 01, 2015 at 02:33:18AM +0000, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > On Fri, Sep 25, 2015 at 08:28:11PM +0900, Hidehiro Kawai wrote:
> > > > This patch introduces new boot option "noextnmi" which disables
> > > > external NMI.  This option is useful for the dump capture kernel
> > > > so that an HA application or administrator wouldn't mistakenly
> > > > shoot down the kernel by NMI.
> > >
> > > So that they can get really stuck when the crash kernel crashes, right?
> > > ;-)
> >
> > No, it is different from my intention.
> >
> > `mistakenly' in the above means; they issue NMI due to a misconception
> > that the monitored host is stuck in the 1st kernel while it is actually
> > taking a crash dump in the 2nd kernel.  To avoid this kind of accident,
> > there is a tool such as fence_kdump which notifies "I'm taking a crash
> > dump, so don't send NMI" to the HA clustering software.  However, there
> > is a time window between kernel panic and the notification.
> >
> > "noextnmi" allows users to avoid this kind of accident all the time of
> > 2nd kernel.
> 
> Yes yes, I understand. But if the crash kernel also gets stuck they have
> no means of recovery, right? (other than power cycling the hardware)

Yes, but I think it's not a big problem.

I suppose that a sever which uses this feature will equip a BMC
and BMC mandatorily supports hard reset command for the server.
If the HA clustering software detects no response from the server
after relatively long timeout, it might want to insert hard reset
to the server by IPMI over LAN.

> Just playing devils advocate here, I don't actually object to the patch.

Regards,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group



___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option

2015-10-01 Thread / KAWAIHIDEHIRO
> On Thu, Oct 01, 2015 at 10:24:19AM +0000, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > But how do we check if the starting kernel is a dump capture kernel?
> 
> How does that first kernel pass info to the capture kernel?

As I described in the previous mail, You just have to add "noextnmi"
to KDUMP_COMMANDLINE_APPEND in /etc/sysconfig/kdump.  Then, "noextnmi"
option is passed to the capture kernel by the action of kexec command.

Cmdline option gives users flexibility.  I'm not sure all users
want to disable external NMIs in the 2nd kernel.

> > I think using cmdline option is the simplest way.
> 
> More often than not, simplest != correct.
> 
> What happens if I pass this option to the first kernel? All of a sudden
> my *first* kernel doesn't get external NMIs.

Yes, your first kernel doesn't get external NMIs, but basically
you don't have to set "noextnmi" option to the first kernel.


Hidehiro Kawai
Hitachi, Ltd. Research & Development Group


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: [V4 PATCH 2/4] panic/x86: Allow cpus to save registers even if they are looping in NMI context

2015-09-30 Thread / KAWAIHIDEHIRO
> On Fri, Sep 25, 2015 at 08:28:07PM +0900, Hidehiro Kawai wrote:
> > --- a/arch/x86/kernel/reboot.c
> > +++ b/arch/x86/kernel/reboot.c
> > @@ -718,6 +718,7 @@ void machine_crash_shutdown(struct pt_regs *regs)
> >  static nmi_shootdown_cb shootdown_callback;
> >
> >  static atomic_t waiting_for_crash_ipi;
> > +static int crash_ipi_done;
> >
> >  static int crash_nmi_callback(unsigned int val, struct pt_regs *regs)
> >  {
> > @@ -779,6 +780,7 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
> > wmb();
> >
> > smp_send_nmi_allbutself();
> > +   crash_ipi_done = 1; /* Kick cpus looping in nmi context */
> 
> I would suggest using WRITE_ONCE() for that, because without the
> volatile the compiler need not actually emit the store until after the
> whole waiting thing _IF_ it can inline the whole thing.
> 
> Currently udelay() will end up being a function call and will therefore
> force the store to be emitted, but I'd rather not rely on that.

OK, I use WRITE_ONCE().
Thanks!

> >
> > msecs = 1000; /* Wait at most a second for the other cpus to stop */
> > while ((atomic_read(_for_crash_ipi) > 0) && msecs) {


Hidehiro Kawai
Hitachi, Ltd. Research & Development Group



___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option

2015-09-30 Thread / KAWAIHIDEHIRO
> On Fri, Sep 25, 2015 at 08:28:11PM +0900, Hidehiro Kawai wrote:
> > This patch introduces new boot option "noextnmi" which disables
> > external NMI.  This option is useful for the dump capture kernel
> > so that an HA application or administrator wouldn't mistakenly
> > shoot down the kernel by NMI.
> 
> So that they can get really stuck when the crash kernel crashes, right?
> ;-)

No, it is different from my intention.

`mistakenly' in the above means; they issue NMI due to a misconception
that the monitored host is stuck in the 1st kernel while it is actually
taking a crash dump in the 2nd kernel.  To avoid this kind of accident,
there is a tool such as fence_kdump which notifies "I'm taking a crash
dump, so don't send NMI" to the HA clustering software.  However, there
is a time window between kernel panic and the notification.

"noextnmi" allows users to avoid this kind of accident all the time of
2nd kernel.

Regards,


Hidehiro Kawai
Hitachi, Ltd. Research & Development Group



___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: [V4 PATCH 1/4] panic/x86: Fix re-entrance problem due to panic on NMI

2015-09-30 Thread / KAWAIHIDEHIRO
> On Fri, Sep 25, 2015 at 12:13:55PM +0000, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > Peter saids -tip tree doesn't have panic_on_unrecovered_nmi in the
> > previoius discussion, but it still exists.  So, I didn't change
> > anything about panic_on_unrecovered_nmi.
> >
> 
> > > --- a/arch/x86/kernel/nmi.c
> > > +++ b/arch/x86/kernel/nmi.c
> > > @@ -231,7 +231,7 @@ void unregister_nmi_handler(unsigned int type, const 
> > > char *name)
> > >  #endif
> > >
> > >   if (panic_on_unrecovered_nmi)
> > > - panic("NMI: Not continuing");
> > > + nmi_panic("NMI: Not continuing");
> > >
> > >   pr_emerg("Dazed and confused, but trying to continue\n");
> > >
> 
> I was looking at unregister_nmi_handler() because that's the function
> the diff points to. That still doesn't have panic_on_unrecovered_nmi.
> 
> It looks like your diff tool is 'broken' and generates nonsense function
> data.

I had noticed the function name is wrong, but I didn't know how
do I fix that, sorry.  Now, I updated git to the latest version
and the issue disappeared.

Regards,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: Re: [V4 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly

2015-09-30 Thread / KAWAIHIDEHIRO
> On Mon, Sep 28, 2015 at 07:08:19AM +0000, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > >> kernel/kexec_core.c:899:3: note: in expansion of macro 'atomic_xchg'
> > >   atomic_xchg(_cpu, -1);
> > >   ^
> >
> > I changed to use atomic_xchg() instead of atomic_set() in V3
> > because atomic_set() doesn't mean memory barrier.  However,
> > I thought again and there is no need of barrier; there is no
> > problem if a competitor sees old value of panic_cpu or new one.
> > So, atomic_set() is sufficient and using it will remove this warning.
> >
> > I will resend the fixed version later.
> 
> So if you rely on the memory barrier; you should have also put a comment
> on explaining the ordering requirements.

I don't intend to use an explicit memory barrier.  There is no
memory ordering requirement here.  Also, atomic_set() which will be
used instead of atomic_xchg() is used as a RELEASE operation, so
I believe there is no problem.

Documentation/memory-barriers.txt:
> The following operations are potential problems as they do _not_ imply memory
> barriers, but might be used for implementing such things as RELEASE-class
> operations:
> 
> atomic_set();
> ...

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: Re: [V4 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly

2015-09-28 Thread / KAWAIHIDEHIRO
> Hi Hidehiro,
> 
> [auto build test results on v4.3-rc2 -- if it's inappropriate base, please 
> ignore]
> 
> config: ia64-allyesconfig (attached as .config)
> reproduce:
>   wget 
> https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
>  -O ~/bin/make.cross
>   chmod +x ~/bin/make.cross
>   git checkout 0077681103150af584e5e592c0238fd010654c26
>   # save the attached .config to linux build tree
>   make.cross ARCH=ia64
[snip]
>arch/ia64/include/uapi/asm/cmpxchg.h:56:2: warning: value computed is not 
> used [-Wunused-value]
> ((__typeof__(*(ptr))) __xchg((unsigned long) (x), (ptr), sizeof(*(ptr
>  ^
>arch/ia64/include/asm/atomic.h:135:30: note: in expansion of macro 'xchg'
> #define atomic_xchg(v, new) (xchg(&((v)->counter), new))
>  ^
> >> kernel/kexec_core.c:899:3: note: in expansion of macro 'atomic_xchg'
>   atomic_xchg(_cpu, -1);
>   ^

I changed to use atomic_xchg() instead of atomic_set() in V3
because atomic_set() doesn't mean memory barrier.  However,
I thought again and there is no need of barrier; there is no
problem if a competitor sees old value of panic_cpu or new one.
So, atomic_set() is sufficient and using it will remove this warning.

I will resend the fixed version later.

> vim +/atomic_xchg +899 kernel/kexec_core.c
> 
>883
>884/*
>885 * Only one CPU is allowed to execute the crash_kexec() 
> code as with
>886 * panic().  Otherwise parallel calls of panic() and 
> crash_kexec()
>887 * may stop each other.  To exclude them, we use 
> panic_cpu here too.
>888 */
>889this_cpu = raw_smp_processor_id();
>890old_cpu = atomic_cmpxchg(_cpu, -1, this_cpu);
>891if (old_cpu == -1) {
>892/* This is the 1st CPU which comes here, so go 
> ahead. */
>893__crash_kexec(regs);
>894
>895/*
>896 * Reset panic_cpu to allow another 
> panic()/crash_kexec()
>897 * call.
>898 */
>  > 899atomic_xchg(_cpu, -1);
>900}
>901}
>902
>903size_t crash_get_memory_size(void)
>904{
>905size_t size = 0;
>906
>907mutex_lock(_mutex);
> 
> ---
> 0-DAY kernel test infrastructureOpen Source Technology Center
> https://lists.01.org/pipermail/kbuild-all   Intel Corporation


Hidehiro Kawai
Hitachi, Ltd. Research & Development Group



___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: Re: [V4 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly

2015-09-27 Thread / KAWAIHIDEHIRO
> Hi Hidehiro,
> 
> [auto build test results on v4.3-rc2 -- if it's inappropriate base, please 
> ignore]
> 
> config: x86_64-allnoconfig (attached as .config)
> reproduce:
>   git checkout 0077681103150af584e5e592c0238fd010654c26
>   # save the attached .config to linux build tree
>   make ARCH=x86_64
> 
> All error/warnings (new ones prefixed by >>):
> 
>kernel/panic.c: In function 'panic':
> >> kernel/panic.c:140:3: error: implicit declaration of function 
> >> '__crash_kexec' [-Werror=implicit-function-declaration]
>   __crash_kexec(NULL);
>   ^

Sorry, I missed to take into account the case of !CONFIG_KEXEC_CORE.

 #else /* !CONFIG_KEXEC_CORE */
 struct pt_regs;
 struct task_struct;
+static inline void __crash_kexec(struct pt_regs *regs) { }
 static inline void crash_kexec(struct pt_regs *regs) { }
 static inline int kexec_should_crash(struct task_struct *p) { return 0; }

I'll resend the revised version later.

>cc1: some warnings being treated as errors
> 
> vim +/__crash_kexec +140 kernel/panic.c
> 
>134 * If we have crashed and we have a crash kernel loaded 
> let it handle
>135 * everything else.
>136 * If we want to run this after calling 
> panic_notifiers, pass
>137 * the "crash_kexec_post_notifiers" option to the 
> kernel.
>138 */
>139if (!crash_kexec_post_notifiers)
>  > 140__crash_kexec(NULL);
>141
>142/*
>143 * Note smp_send_stop is the usual smp shutdown 
> function, which
> 
> ---
> 0-DAY kernel test infrastructureOpen Source Technology Center
> https://lists.01.org/pipermail/kbuild-all   Intel Corporation


Hidehiro Kawai
Hitachi, Ltd. Research & Development Group


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: [V4 PATCH 1/4] panic/x86: Fix re-entrance problem due to panic on NMI

2015-09-25 Thread / KAWAIHIDEHIRO
Peter saids -tip tree doesn't have panic_on_unrecovered_nmi in the
previoius discussion, but it still exists.  So, I didn't change
anything about panic_on_unrecovered_nmi.

Thanks,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group

> From: Hidehiro Kawai [mailto:hidehiro.kawai...@hitachi.com]
> 
> If panic on NMI happens just after panic() on the same CPU, panic()
> is recursively called.  As the result, it stalls after failing to
> acquire panic_lock.
> 
> To avoid this problem, don't call panic() in NMI context if
> we've already entered panic().
> 
> V4:
> - Improve comments in io_check_error() and panic()
> 
> V3:
> - Introduce nmi_panic() macro to reduce code duplication
> - In the case of panic on NMI, don't return from NMI handlers
>   if another cpu already panicked
> 
> V2:
> - Use atomic_cmpxchg() instead of current spin_trylock() to
>   exclude concurrent accesses to the panic routines
> - Don't introduce no-lock version of panic()
> 
> Signed-off-by: Hidehiro Kawai 
> Cc: Andrew Morton 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: "H. Peter Anvin" 
> Cc: Peter Zijlstra 
> Cc: Michal Hocko 
> ---
>  arch/x86/kernel/nmi.c  |   16 
>  include/linux/kernel.h |   13 +
>  kernel/panic.c |   15 ---
>  kernel/watchdog.c  |2 +-
>  4 files changed, 38 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
> index 697f90d..5131714 100644
> --- a/arch/x86/kernel/nmi.c
> +++ b/arch/x86/kernel/nmi.c
> @@ -231,7 +231,7 @@ void unregister_nmi_handler(unsigned int type, const char 
> *name)
>  #endif
> 
>   if (panic_on_unrecovered_nmi)
> - panic("NMI: Not continuing");
> + nmi_panic("NMI: Not continuing");
> 
>   pr_emerg("Dazed and confused, but trying to continue\n");
> 
> @@ -255,8 +255,16 @@ void unregister_nmi_handler(unsigned int type, const 
> char *name)
>reason, smp_processor_id());
>   show_regs(regs);
> 
> - if (panic_on_io_nmi)
> - panic("NMI IOCK error: Not continuing");
> + if (panic_on_io_nmi) {
> + nmi_panic("NMI IOCK error: Not continuing");
> +
> + /*
> +  * If we return from nmi_panic(), it means we have received
> +  * NMI while processing panic().  So, simply return without
> +  * a delay and re-enabling NMI.
> +  */
> + return;
> + }
> 
>   /* Re-enable the IOCK line, wait for a few seconds */
>   reason = (reason & NMI_REASON_CLEAR_MASK) | NMI_REASON_CLEAR_IOCHK;
> @@ -297,7 +305,7 @@ void unregister_nmi_handler(unsigned int type, const char 
> *name)
> 
>   pr_emerg("Do you have a strange power saving mode enabled?\n");
>   if (unknown_nmi_panic || panic_on_unrecovered_nmi)
> - panic("NMI: Not continuing");
> + nmi_panic("NMI: Not continuing");
> 
>   pr_emerg("Dazed and confused, but trying to continue\n");
>  }
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index 5582410..57c33da 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -443,6 +443,19 @@ extern __scanf(2, 0)
> 
>  extern bool crash_kexec_post_notifiers;
> 
> +extern atomic_t panic_cpu;
> +
> +/*
> + * A variant of panic() called from NMI context.
> + * If we've already panicked on this cpu, return from here.
> + */
> +#define nmi_panic(fmt, ...)  \
> + do {\
> + int this_cpu = raw_smp_processor_id();  \
> + if (atomic_cmpxchg(_cpu, -1, this_cpu) != this_cpu) \
> + panic(fmt, ##__VA_ARGS__);  \
> + } while (0)
> +
>  /*
>   * Only to be used by arch init code. If the user over-wrote the default
>   * CONFIG_PANIC_TIMEOUT, honor it.
> diff --git a/kernel/panic.c b/kernel/panic.c
> index 04e91ff..a105e67 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -60,6 +60,8 @@ void __weak panic_smp_self_stop(void)
>   cpu_relax();
>  }
> 
> +atomic_t panic_cpu = ATOMIC_INIT(-1);
> +
>  /**
>   *   panic - halt the system
>   *   @fmt: The text string to print
> @@ -70,17 +72,17 @@ void __weak panic_smp_self_stop(void)
>   */
>  void panic(const char *fmt, ...)
>  {
> - static DEFINE_SPINLOCK(panic_lock);
>   static char buf[1024];
>   va_list args;
>   long i, i_next = 0;
>   int state = 0;
> + int old_cpu, this_cpu;
> 
>   /*
>* Disable local interrupts. This will prevent panic_smp_self_stop
>* from deadlocking the first cpu that invokes the panic, since
>* there is nothing to prevent an interrupt handler (that runs
> -  * after the panic_lock is acquired) from invoking panic 

RE: [V3 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly

2015-08-31 Thread / KAWAIHIDEHIRO
Hello Peter,

> From: linux-kernel-ow...@vger.kernel.org 
> [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of 河合英宏 / KAWAI,
> 
> Hi,
> 
> > From: Peter Zijlstra [mailto:pet...@infradead.org]
> >
> > On Sat, Aug 22, 2015 at 02:35:24AM +, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > > From: Peter Zijlstra [mailto:pet...@infradead.org]
> > > >
> > > > On Thu, Aug 06, 2015 at 02:45:43PM +0900, Hidehiro Kawai wrote:
> > > > >  void crash_kexec(struct pt_regs *regs)
> > > > >  {
> > > > > + int old_cpu, this_cpu;
> > > > > +
> > > > > + /*
> > > > > +  * `old_cpu == -1' means we are the first comer and 
> > > > > crash_kexec()
> > > > > +  * was called without entering panic().
> > > > > +  * `old_cpu == this_cpu' means crash_kexec() was called from 
> > > > > panic().
> > > > > +  */
> > > > > + this_cpu = raw_smp_processor_id();
> > > > > + old_cpu = atomic_cmpxchg(_cpu, -1, this_cpu);
> > > > > + if (old_cpu != -1 && old_cpu != this_cpu)
> > > > > + return;
> > > >
> > > > This allows recursive calling of crash_kexec(), the Changelog did not
> > > > mention that. Is this really required?
> > >
> > > What part are you arguing?  Recursive call of crash_kexec() doesn't
> > > happen.  In the first place, one of the purpose of this patch is
> > > to prevent a recursive call of crash_kexec() in the following case
> > > as I stated in the description:
> > >
> > > CPU 0:
> > >   oops_end()
> > > crash_kexec()
> > >   mutex_trylock() // acquired
> > > 
> > > io_check_error()
> > >   panic()
> > > crash_kexec()
> > >   mutex_trylock() // failed to acquire
> > > infinite loop
> > >
> >
> > Yes, but what to we want to do there? It seems to me that is wrong, we
> > do not want to let a recursive crash_kexec() proceed.
> >
> > Whereas the condition you created explicitly allows this recursion by
> > virtue of the 'old_cpu != this_cpu' check.
> 
> I understand your question.  I don't intend to permit the recursive
> call of crash_kexec() as for 'old_cpu != this_cpu' check.  That is
> needed for the case of panic() --> crash_kexec().  Since panic_cpu has
> already been set to this_cpu in panic() (please see PATCH 1/4), no one
> can run crash_kexec() without 'old_cpu != this_cpu' check.
> 
> If you don't like this check, I would also be able to handle this case
> like below:
> 
> crash_kexec()
> {
>   old_cpu = atomic_cmpxchg(_cpu, -1, this_cpu);
>   if (old_cpu != -1)
>   return;
> 
>   __crash_kexec();
> }
> 
> panic()
> {
>   atomic_cmpxchg(_cpu, -1, this_cpu);
>   __crash_kexec();
> ...
> 

Is that OK?

Regards,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: [V3 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly

2015-08-25 Thread / KAWAIHIDEHIRO
Hi,

 From: Peter Zijlstra [mailto:pet...@infradead.org]
 
 On Sat, Aug 22, 2015 at 02:35:24AM +, 河合英宏 / KAWAI,HIDEHIRO wrote:
   From: Peter Zijlstra [mailto:pet...@infradead.org]
  
   On Thu, Aug 06, 2015 at 02:45:43PM +0900, Hidehiro Kawai wrote:
 void crash_kexec(struct pt_regs *regs)
 {
+   int old_cpu, this_cpu;
+
+   /*
+* `old_cpu == -1' means we are the first comer and 
crash_kexec()
+* was called without entering panic().
+* `old_cpu == this_cpu' means crash_kexec() was called from 
panic().
+*/
+   this_cpu = raw_smp_processor_id();
+   old_cpu = atomic_cmpxchg(panic_cpu, -1, this_cpu);
+   if (old_cpu != -1  old_cpu != this_cpu)
+   return;
  
   This allows recursive calling of crash_kexec(), the Changelog did not
   mention that. Is this really required?
 
  What part are you arguing?  Recursive call of crash_kexec() doesn't
  happen.  In the first place, one of the purpose of this patch is
  to prevent a recursive call of crash_kexec() in the following case
  as I stated in the description:
 
  CPU 0:
oops_end()
  crash_kexec()
mutex_trylock() // acquired
  NMI
  io_check_error()
panic()
  crash_kexec()
mutex_trylock() // failed to acquire
  infinite loop
 
 
 Yes, but what to we want to do there? It seems to me that is wrong, we
 do not want to let a recursive crash_kexec() proceed.
 
 Whereas the condition you created explicitly allows this recursion by
 virtue of the 'old_cpu != this_cpu' check.

I understand your question.  I don't intend to permit the recursive
call of crash_kexec() as for 'old_cpu != this_cpu' check.  That is
needed for the case of panic() -- crash_kexec().  Since panic_cpu has
already been set to this_cpu in panic() (please see PATCH 1/4), no one
can run crash_kexec() without 'old_cpu != this_cpu' check.

If you don't like this check, I would also be able to handle this case
like below:

crash_kexec()
{
old_cpu = atomic_cmpxchg(panic_cpu, -1, this_cpu);
if (old_cpu != -1)
return;

__crash_kexec();
}

panic()
{
atomic_cmpxchg(panic_cpu, -1, this_cpu);
__crash_kexec();
...


Regards,

Hidehiro Kawai
Hitachi, Ltd. Research  Development Group

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: [V3 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly

2015-08-21 Thread / KAWAIHIDEHIRO
 From: Peter Zijlstra [mailto:pet...@infradead.org]
 
 On Thu, Aug 06, 2015 at 02:45:43PM +0900, Hidehiro Kawai wrote:
   void crash_kexec(struct pt_regs *regs)
   {
  +   int old_cpu, this_cpu;
  +
  +   /*
  +* `old_cpu == -1' means we are the first comer and crash_kexec()
  +* was called without entering panic().
  +* `old_cpu == this_cpu' means crash_kexec() was called from panic().
  +*/
  +   this_cpu = raw_smp_processor_id();
  +   old_cpu = atomic_cmpxchg(panic_cpu, -1, this_cpu);
  +   if (old_cpu != -1  old_cpu != this_cpu)
  +   return;
 
 This allows recursive calling of crash_kexec(), the Changelog did not
 mention that. Is this really required?

What part are you arguing?  Recursive call of crash_kexec() doesn't
happen.  In the first place, one of the purpose of this patch is
to prevent a recursive call of crash_kexec() in the following case
as I stated in the description:

CPU 0:
  oops_end()
crash_kexec()
  mutex_trylock() // acquired
NMI
io_check_error()
  panic()
crash_kexec()
  mutex_trylock() // failed to acquire
infinite loop


Also, the logic doesn't change form V1 (although the implementation
changed), so I didn't add changelogs any more.

Regards,

Hidehiro Kawai
Hitachi, Ltd. Research  Development Group


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: [V3 PATCH 0/4] Fix race issues among panic, NMI and crash_kexec

2015-08-21 Thread / KAWAIHIDEHIRO
 From: Peter Zijlstra [mailto:pet...@infradead.org]
 User-Agent: StGit/0.16
 
 Fwiw, stgit is broken wrt sending email, all your emails have the exact
 same timestamp, which means that the emails will be ordered on received
 timestamp when threaded and generate the below mess:

Sorry for the inconvenience.  I'll try to find some workaround.

Regards,

Hidehiro Kawai
Hitachi, Ltd. Research  Development Group

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: [V3 PATCH 1/4] panic/x86: Fix re-entrance problem due to panic on NMI

2015-08-21 Thread / KAWAIHIDEHIRO
 From: Peter Zijlstra [mailto:pet...@infradead.org]
 
 On Thu, Aug 06, 2015 at 02:45:43PM +0900, Hidehiro Kawai wrote:
  diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
  index d05bd2e..dcd4038 100644
  --- a/arch/x86/kernel/nmi.c
  +++ b/arch/x86/kernel/nmi.c
  @@ -231,7 +231,7 @@ void unregister_nmi_handler(unsigned int type, const 
  char *name)
   #endif
 
  if (panic_on_unrecovered_nmi)
  -   panic(NMI: Not continuing);
  +   nmi_panic(NMI: Not continuing);
 
  pr_emerg(Dazed and confused, but trying to continue\n);
 
 
 What tree is this again.. my tree (-tip) doesn't have
 panic_on_unrecovered_nmi nonsense.

I created this patch set on top of the v4.2-rc5, but this is
mostly related to x86 and should be based on -tip tree.
I'll do rebase in the next version.

Regards,

Hidehiro Kawai
Hitachi, Ltd. Research  Development Group

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: [V3 PATCH 2/4] panic/x86: Allow cpus to save registers even if they are looping in NMI context

2015-08-21 Thread / KAWAIHIDEHIRO
 From: Peter Zijlstra [mailto:pet...@infradead.org]
 
 On Thu, Aug 06, 2015 at 02:45:43PM +0900, Hidehiro Kawai wrote:
  When cpu-A panics on NMI just after cpu-B has panicked, cpu-A loops
  infinitely in NMI context.  Especially for x86, cpu-B issues NMI IPI
  to other cpus to save their register states and do some cleanups if
  kdump is enabled, but cpu-A can't handle the NMI and fails to save
  register states.
 
  To solve thie issue, we wait for the timing of the NMI IPI, then
  call the NMI handler which saves register states.
 
 Sorry, I don't follow, what?

First, a subroutine of crash_kexec(), nmi_shootdown_cpus()
send NMI IPI to non-panic cpus to stop them while saving their
registers ans doing some cleanups for crash dumping.  So if a non-panic
cpu is looping in NMI context infinitely at that time, we fail to save
its register information and lose the information from the crash dump.

`Infinite loop in NMI context' can happen when panic on NMI is about
to happen while another cpu has already been processing panic().
To save regs and do some cleanups in that case too, this patch does
two things:

1. Moves the timing of `infinite loop in NMI context' (actually
   panic_smp_self_stop()) outside of panic() to keep the pt_regs object
2. call a callback of nmi_shootdown_cpus() directly to save regs and
   do some cleanups after setting waiting_for_crash_ipi which is used
   for counting down the number of cpus which handled the callback

Does that answer your question?

Regards,

Hidehiro Kawai
Hitachi, Ltd. Research  Development Group

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: [RFC V2 PATCH 0/1] kexec: crash_kexec_post_notifiers boot option related fixes

2015-08-06 Thread / KAWAIHIDEHIRO
 From: Eric W. Biederman [mailto:ebied...@xmission.com]
  From: Eric W. Biederman [mailto:ebied...@xmission.com]
  [...]
  A specific hook for a very specific purpose when there is no other way
  we can consider.
 
  So, is kmsg_dump like feature admissible?
 
  If you don't have something that generalises well into a general purpose
  operation that it makes sense for everyone to call you can always use
  the world's largest aka you can run code before the new kernel starts
  that is loaded with kexec_load.
 
  One of our purposes, notifying I'm dying, would be achieved by purgatory
  code provided by kexec command as I stated before.  Since the way of the
  notification will differ from each vendor, I think we need to modify
  the purgatory codes pluggable.  Also, I think we need some parameter
  passing mechanism to the purgatory code.  For example, passing the panic
  message via boot parameter to save it to SEL.  Although I'm not sure
  we can do that (I've not investigated well yet).  Is that acceptable?
 
 I think the address of panic message is available in crash notes.  If
 not that is very reasonable to add.

I believed the boot parameter is prepared by the 1st kernel, but
it's wrong.  The boot parameter is completely provieded kexec command.
So, passing the panic message through boot parameter will not
be feasible.  I'm not sure we can easily access to the crash notes
from purgatory, but I think it's a reasonable way to pass panic message.

 Updating the SEL from purgatory after purgatory has validated the
 checksums of the crash handling code is acceptable.
 
 All that is desired is to run as little code as possible in a kernel
 that is known broken.  Once the checksums have verified things in
 purgatory you should be in good shape, and there is no possibility of
 relying on broken infrastructure because that code simply is not present
 in purgatory.
 
 We already have a few early_printk style drivers in purgatory and I
 don't the code to update the SEL would be much worse.

For developers, early_printk style feature will be better solution.
For end users, however, it will not be true.  Sometimes they cannot
use a serial port for early_printk because the serial port is used
for other purpose.  Sometimes they cannot place additional machine
which receives messages from the serial port.  So we need some
plugin or enable/disable mechanism for specific purgatory code.

 On the flip side there are enough firmware bugs that I personally would
 not want to rely on firmware code running properly when the machine is
 in a known broken state, so I don't want the SEL update to be
 unconditional.

Yes, I don't also trust BMC firmware.  The most simple I/F to BMC
is KCS (Keyboard Controller Style) I/F which is accessible via
two I/O ports.  If BMC becomes insane, the state machine for the I/F
can go into infinite loop.  However, we can avoid this by introducing
proper timeout.  Of course, I think we should add some enable/disable
mechanism.


Regards,
Kawai

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: [RFC V2 PATCH 0/1] kexec: crash_kexec_post_notifiers boot option related fixes

2015-08-04 Thread / KAWAIHIDEHIRO
Hello,

Thanks for the reply.

 From: Eric W. Biederman [mailto:ebied...@xmission.com]
[...]
 A specific hook for a very specific purpose when there is no other way
 we can consider.

So, is kmsg_dump like feature admissible?

 If you don't have something that generalises well into a general purpose
 operation that it makes sense for everyone to call you can always use
 the world's largest aka you can run code before the new kernel starts
 that is loaded with kexec_load.

One of our purposes, notifying I'm dying, would be achieved by purgatory
code provided by kexec command as I stated before.  Since the way of the
notification will differ from each vendor, I think we need to modify
the purgatory codes pluggable.  Also, I think we need some parameter
passing mechanism to the purgatory code.  For example, passing the panic
message via boot parameter to save it to SEL.  Although I'm not sure
we can do that (I've not investigated well yet).  Is that acceptable?

Regards,
Kawai

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI

2015-08-04 Thread / KAWAIHIDEHIRO
Hi,

 From: Michal Hocko [mailto:mho...@kernel.org]
 On Fri 31-07-15 11:23:00, 河合英宏 / KAWAI,HIDEHIRO wrote:
   From: Michal Hocko [mailto:mho...@kernel.org]
There is a timeout of 1000ms in nmi_shootdown_cpus(), so I don't know
why CPU 130 waits so long.  I'll try to consider for a while.
  
   Yes, I do not understand the timing here either and the fact that the
   log is a complete mess in the important parts doesn't help a wee bit.
 
  I'm interested in where kernel panic -not syncing:  is.
  It may give us a clue.
 
 This one is lost in the mangled text:
 [  167.843771] U0[  167.843771] hhuh. NMI received for unkn00[  
 167.843765] Uh[  16NM843774I own rea reived for
 unknow0 r  16n 2d 765] Uhhuh. CPU recei11. 0known reason 7. on770] Ker[ - 
 not rn NMI:nic - not contt sing
 
 0 [ : Not con.inu437azed and confused, b] Dtryingaed annue
 
 fu 167.8ut trying[   to 7.0377 167.843775] U0[  167.843776] ]hhu.ived for 
 u3nknown rMason 3 re oived for [nk167.843781]

Thanks for the information.

I anticipated that some lock contention on issuing messages (e.g.
locks used by network/netconsole driver) delayed the panic procedure,
but it seems not to be related because the panic message finished to
be issued early.

If I come up with something, I will post a mail.  I think there
may be potential issues.

 1.
 . N0[  167.843781] Uh. NMI recen 3d on CPU 0.i [ nowon 3d on] Chhuh.MI
 eceived[ or7.843nknoUhhuh.wn rMason e3d ceCPivUd 120.
 0nkno 167.wn843ason 3na s p120.
 o0er savi d6 e843ab88] Do yeu have a
 trange0[ er saving mode e nabl1d?74][  167 84hu94]MIuh. NceIived for 
 unknown reas vdfor 1no3was0[ 2d 67.84380on CI
 rUe 12e.
 ive7d8u3800wn rveaseo f2d on CPo3.r uk[o 1 rea6s.o2d8 oo you hn aPve 0sta 
 e power 1s7.843816] Do yoauv ng moade
 enbslra?ng[ e 167.8438p41o]er shhuhavi.ngIroenived fbled?nknow
  reaso0 2d on [PU1626.41]0   Uh67.h. NM387I] receihed for .nknown reason  
 2Nn MC U ceived for .
 [son 2d on CPU 6.
   1607.8467.84873] Uhhuh. 3MI received 908 o knstra
 [ n167.843908] Do ygo pave westrangesa pvnv mode enableng mode ed?
 nb0ed?

Regards,
Kawai


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI

2015-07-31 Thread / KAWAIHIDEHIRO
 From: Michal Hocko [mailto:mho...@kernel.org]
 
 On Thu 30-07-15 11:55:52, 河合英宏 / KAWAI,HIDEHIRO wrote:
   From: Michal Hocko [mailto:mho...@kernel.org]
 [...]
   Could you point me to the code which does that, please? Maybe we are
   missing that in our 3.0 kernel. I was quite surprised to see this
   behavior as well.
 
  Please see the snippet below.
 
  void setup_local_APIC(void)
  {
  ...
  /*
   * only the BP should see the LINT1 NMI signal, obviously.
   */
  if (!cpu)
  value = APIC_DM_NMI;
  else
  value = APIC_DM_NMI | APIC_LVT_MASKED;
  if (!lapic_is_integrated()) /* 82489DX */
  value |= APIC_LVT_LEVEL_TRIGGER;
  apic_write(APIC_LVT1, value);
 
 
  LINT1 pins of cpus other than CPU 0 are masked here.
  However, at least on some of Hitachi servers, NMI caused by NMI
  button doesn't seem to be delivered through LINT1.  So, my `external NMI'
  word may not be correct.
 
 I am not familiar with details here but I can tell you that this
 particular code snippet is the same in our 3.0 based kernel so it seems
 that the HW is indeed doing something differently.

Yes, and it turned out my PATCH 3/3 doesn't work at all on some
hardware...

   You might still get a panic on hardlockup which will happen on all CPUs
   from the NMI context so we have to be able to handle panic in NMI on
   many CPUs.
 
  Do you say about the case of a kerne panic while other cpus locks up
  in NMI context?  In that case, there is no way to do things needed by
  kdump procedure including saving registeres...
 
 I am saying that watchdog_overflow_callback might trigger on more CPUs
 and panic from NMI context as well. So this is not reduced to the NMI
 button sends NMI to more CPUs.

I understand.  So, I have to also modify watchdog_overflow_callback
to call nmi_panic().

 Why cannot the panic() context save all the registers if we are going to
 loop in NMI context? This would be imho preferable to returning from
 panic IMO.

I'm not saying we cannot save registers and do some cleanups in NMI
context.  I fell that it would introduce unneeded complexity.
Since watchdog_overflow_callback is defined as generic code,
we need to implement the preparation for kdump for other architectures.
I haven't checked which architectures support both nmi watchdog and
kdump, though.

Anyway, I came up with a simple solution for x86.  Waiting for the
timing of nmi_shootdown_cpus() in nmi_panic(), then invoke the
callback registered by nmi_shootdown_cpus().

   I can provide the full log but it is quite mangled. I guess the
   CPU130 was the only one allowed to proceed with the panic while others
   returned from the unknown NMI handling. It took a lot of time until
   CPU130 managed to boot the crash kernel with soft lockups and RCU stalls
   reports. CPU0 is most probably locked up waiting for CPU130 to
   acknowledge the IPI which will not happen apparently.
 
  There is a timeout of 1000ms in nmi_shootdown_cpus(), so I don't know
  why CPU 130 waits so long.  I'll try to consider for a while.
 
 Yes, I do not understand the timing here either and the fact that the
 log is a complete mess in the important parts doesn't help a wee bit.

I'm interested in where kernel panic -not syncing:  is.
It may give us a clue.


Regards,
Kawai

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI

2015-07-30 Thread / KAWAIHIDEHIRO
 From: Michal Hocko [mailto:mho...@kernel.org]
 
 On Thu 30-07-15 01:45:35, 河合英宏 / KAWAI,HIDEHIRO wrote:
  Hi,
 
   From: Michal Hocko [mailto:mho...@kernel.org]
  
   On Wed 29-07-15 09:09:18, 河合英宏 / KAWAI,HIDEHIRO wrote:
 [...]
#define nmi_panic(fmt, ...)\
   do {\
   if (atomic_cmpxchg(panic_cpu, -1, 
raw_smp_processor_id()) \
   == -1)  \
   panic(fmt, ##__VA_ARGS__);  \
   } while (0)
  
   This would allow to return from NMI too eagerly.
 
  Yes, but what's the problem?
 
 I believe that panic should be noreturn as much as possible and return
 only when we do not have any other options.

In my case, external NMI is delivered to only CPU 0.  This means
other CPUs continues to run until receiving NMI IPI.  I think returning
from NMI handler soon doesn't differ from this so much.

 Moreover I would ask an
 opposite question, what is the problem to loop in NMI on other CPUs than
 the one which is performing crash_kexec? We will not save registers, so
 what?

Actually, we have to do at least save registers, clean up VMX/SVM states
and announce that the cpu has stopped.  Just returning from NMI handler
is simpler.

  The root cause of your case hasn't been clarified yet.
  I can't fix for an unclear issue because I don't know what's the right
  solution.
 
   When I was testing my
   previous approach (on 3.0 based kernel) I had basically the same thing
   (one NMI to process panic) and others to return. This led to a strange
   behavior when the NMI button triggered NMI on all (hundreds) CPUs.
 
  It's strange.  Usually, NMI caused by NMI button is routed to only CPU 0
  as an external NMI.  External NMI for CPUs other than CPU 0 are masked
  at boot time.  Does it really happen?
 
 Could you point me to the code which does that, please? Maybe we are
 missing that in our 3.0 kernel. I was quite surprised to see this
 behavior as well.

Please see the snippet below.

void setup_local_APIC(void)
{
...
/*
 * only the BP should see the LINT1 NMI signal, obviously.
 */
if (!cpu)
value = APIC_DM_NMI;
else
value = APIC_DM_NMI | APIC_LVT_MASKED;
if (!lapic_is_integrated()) /* 82489DX */
value |= APIC_LVT_LEVEL_TRIGGER;
apic_write(APIC_LVT1, value);


LINT1 pins of cpus other than CPU 0 are masked here.
However, at least on some of Hitachi servers, NMI caused by NMI
button doesn't seem to be delivered through LINT1.  So, my `external NMI'
word may not be correct.

  Does the problem still happen on the latest kernel?
 
 I do not have machine accessible so I have to rely on the customer to
 test and the current vanilla might be an issue.

Sure.

  What kind of NMI is deliverd to each CPU?
 
 See the log below.
 
  Traditionally, we should have assumed that NMI for crash dumping is
  delivered to only one cpu.  Otherwise, we should often fail to take
  a proper crash dump.
 
 You might still get a panic on hardlockup which will happen on all CPUs
 from the NMI context so we have to be able to handle panic in NMI on
 many CPUs.

Do you say about the case of a kerne panic while other cpus locks up
in NMI context?  In that case, there is no way to do things needed by
kdump procedure including saving registeres...

  It seems that your case is another problem to be solved separately.
 
 I do not think so, quite contrary. If you want to solve the reentrancy
 then other CPUs might be spinning in NMI if there is a guarantee that at
 least one CPU can progress to finish crash_kexec().
 
   The
   crash kernel booted eventually but the log contained lockups when a
   CPU waited for an IPI to the CPU which was handling the NMI panic.
 
  Could you explain more precisely?
 
 [  167.843761] Uhhuh. NMI received for unknown reason 3d on CPU 130.
 [  167.843763] Do you have a strange power saving mode enabled?
 [... Mangled output ]
 [  167.856415] Dazed and confused, but trying to continue
 [  167.856428] Dazed and confused, but trying to continue
 [  167.856442] Dazed and confused, but trying to continue
 [...]
 [  193.108440] BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:0:4]
 [...]
 [  193.108586] Call Trace:
 [  193.108595]  [8109baeb] smp_call_function_single+0x15b/0x170
 [  193.108600]  [8109bb4e] smp_call_function_any+0x4e/0x110
 [  193.108607]  [a04a332c] get_cur_val+0xbc/0x130 [acpi_cpufreq]
 [  193.108630]  [a04a3417] get_cur_freq_on_cpu+0x77/0xf0 
 [acpi_cpufreq]
 [  193.108638]  [8137bc37] cpufreq_update_policy+0x97/0x140
 [  193.108646]  [a00ca04b] acpi_processor_notify+0x4b/0x145 
 [processor]
 [  193.108654]  [812d2eca] acpi_ev_notify_dispatch+0x61/0x77
 [  193.108659]  [812c1785

RE: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI

2015-07-30 Thread / KAWAIHIDEHIRO
Hi,

 -Original Message-
 From: Michal Hocko [mailto:mho...@kernel.org]
 
 On Thu 30-07-15 07:33:15, 河合英宏 / KAWAI,HIDEHIRO wrote:
 [...]
  Are you using SGI UV?  On that platform, NMIs may be delivered to
  all cpus because LVT1 of all cpus are not masked as follows:
 
 This is Compute Blade 520XB1 from Hitachi with 240 cpus.

Thanks for the information!

I asked my colleague in other department about NMI button behavior
of our server just before receive your mail, and certainly what you
said happens; all cpus say Uhhuh. NMI received for unknown reason 3d
on CPU N when NMI button is pusshed.  So, this was also our problem...

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI

2015-07-29 Thread / KAWAIHIDEHIRO
Hi,

 From: Michal Hocko [mailto:mho...@kernel.org]
 
 On Wed 29-07-15 09:09:18, 河合英宏 / KAWAI,HIDEHIRO wrote:
   From: Michal Hocko [mailto:mho...@kernel.org]
   On Wed 29-07-15 05:48:47, 河合英宏 / KAWAI,HIDEHIRO wrote:
Hi,
   
 From: linux-kernel-ow...@vger.kernel.org 
 [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Hidehiro 
 Kawai
 (2015/07/27 23:34), Michal Hocko wrote:
  On Mon 27-07-15 10:58:50, Hidehiro Kawai wrote:
[...]
  The check could be also relaxed a bit and nmi_panic would
  return only if the ongoing panic is the current cpu when we really 
  have
  to return and allow the preempted panic to finish.

 It's reasonable.  I'll do that in the next version.
   
I noticed atomic_read() is insufficient.  Please consider the following
scenario.
   
CPU 1: call panic() in the normal context
CPU 0: call nmi_panic(), check the value of panic_cpu, then call panic()
CPU 1: set 1 to panic_cpu
CPU 0: fail to set 0 to panic_cpu, then do an infinite loop
CPU 1: call crash_kexec(), then call kdump_nmi_shootdown_cpus()
   
At this point, since CPU 0 loops in NMI context, it never executes
the NMI handler registered by kdump_nmi_shootdown_cpus().  This means
that no register states are saved and no cleanups for VMX/SVM are
performed.
  
   Yes this is true but it is no different from the current state, isn't
   it? So if you want to handle that then it deserves a separate patch.
   It is certainly not harmful wrt. panic behavior.
  
So, we should still use atomic_cmpxchg() in nmi_panic() to
prevent other cpus from running panic routines.
  
   Not sure what you mean by that.
 
  I mean that we should use the same logic as my V2 patch like this:
 
  #define nmi_panic(fmt, ...)\
 do {\
 if (atomic_cmpxchg(panic_cpu, -1, raw_smp_processor_id()) \
 == -1)  \
 panic(fmt, ##__VA_ARGS__);  \
 } while (0)
 
 This would allow to return from NMI too eagerly.

Yes, but what's the problem?
The root cause of your case hasn't been clarified yet.
I can't fix for an unclear issue because I don't know what's the right
solution.

 When I was testing my
 previous approach (on 3.0 based kernel) I had basically the same thing
 (one NMI to process panic) and others to return. This led to a strange
 behavior when the NMI button triggered NMI on all (hundreds) CPUs.

It's strange.  Usually, NMI caused by NMI button is routed to only CPU 0
as an external NMI.  External NMI for CPUs other than CPU 0 are masked
at boot time.  Does it really happen?  Does the problem still happen on
the latest kernel?  What kind of NMI is deliverd to each CPU?

Traditionally, we should have assumed that NMI for crash dumping is
delivered to only one cpu.  Otherwise, we should often fail to take
a proper crash dump.  It seems that your case is another problem to be
solved separately.

 The
 crash kernel booted eventually but the log contained lockups when a
 CPU waited for an IPI to the CPU which was handling the NMI panic.

Could you explain more precisely?

 Anyway, I do not thing this is really necessary to solve the panic
 reentrancy issue.
 If the missing saved state is a real problem then it
 should be handled separately - maybe it can be achieved without an IPI
 and directly from the panic context if we are in NMI.

What I would like to do via this patchse is to solve race issues
among NMI, panic() and crash_kexec().  So, I don't think we should fix
that separately, although I would need to reword some descriptions
and titles.

Anyway, I'm going to sent out my revised version once in order to
tidy up.  I also would like to hear kexec/kdump guys' opinions.

Regards,
Kawai

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI

2015-07-29 Thread / KAWAIHIDEHIRO
 From: Michal Hocko [mailto:mho...@kernel.org]
 On Wed 29-07-15 05:48:47, 河合英宏 / KAWAI,HIDEHIRO wrote:
  Hi,
 
   From: linux-kernel-ow...@vger.kernel.org 
   [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Hidehiro Kawai
   (2015/07/27 23:34), Michal Hocko wrote:
On Mon 27-07-15 10:58:50, Hidehiro Kawai wrote:
  [...]
The check could be also relaxed a bit and nmi_panic would
return only if the ongoing panic is the current cpu when we really have
to return and allow the preempted panic to finish.
  
   It's reasonable.  I'll do that in the next version.
 
  I noticed atomic_read() is insufficient.  Please consider the following
  scenario.
 
  CPU 1: call panic() in the normal context
  CPU 0: call nmi_panic(), check the value of panic_cpu, then call panic()
  CPU 1: set 1 to panic_cpu
  CPU 0: fail to set 0 to panic_cpu, then do an infinite loop
  CPU 1: call crash_kexec(), then call kdump_nmi_shootdown_cpus()
 
  At this point, since CPU 0 loops in NMI context, it never executes
  the NMI handler registered by kdump_nmi_shootdown_cpus().  This means
  that no register states are saved and no cleanups for VMX/SVM are
  performed.
 
 Yes this is true but it is no different from the current state, isn't
 it? So if you want to handle that then it deserves a separate patch.
 It is certainly not harmful wrt. panic behavior.
 
  So, we should still use atomic_cmpxchg() in nmi_panic() to
  prevent other cpus from running panic routines.
 
 Not sure what you mean by that.

I mean that we should use the same logic as my V2 patch like this:

#define nmi_panic(fmt, ...)\
   do {\
   if (atomic_cmpxchg(panic_cpu, -1, raw_smp_processor_id()) \
   == -1)  \
   panic(fmt, ##__VA_ARGS__);  \
   } while (0)

By using atomic_cmpxchg here, we can ensure that only this cpu
runs panic routines.  It is important to prevent a NMI-context cpu
from calling panic_smp_self_stop(). 

void panic(const char *fmt, ...)
{
...
* `old_cpu == -1' means we are the first comer.
* `old_cpu == this_cpu' means we came here due to panic on NMI.
*/
   this_cpu = raw_smp_processor_id();
   old_cpu = atomic_cmpxchg(panic_cpu, -1, this_cpu);
   if (old_cpu != -1  old_cpu != this_cpu)
panic_smp_self_stop();

Please assume that CPU 0 calls nmi_panic() in NMI context
and CPU 1 calls panic() in normal context at tha same time.

If CPU 1 set panic_cpu before CPU 0 does, CPU 1 runs panic routines
and CPU 0 return from the nmi handler.  Eventually CPU 0 is stopped
by nmi_shootdown_cpus().

If CPU 0 set panic_cpu before CPU 1 does, CPU 0 runs panic routines.
CPU 1 calls panic_smp_self_stop(), and wait for NMI by
nmi_shootdown_cpus().

Anyway, I tested my approach and it worked fine.

Regards,
Kawai

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI

2015-07-28 Thread / KAWAIHIDEHIRO
Hi,

 From: linux-kernel-ow...@vger.kernel.org 
 [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Hidehiro Kawai
 (2015/07/27 23:34), Michal Hocko wrote:
  On Mon 27-07-15 10:58:50, Hidehiro Kawai wrote:
[...]
  The check could be also relaxed a bit and nmi_panic would
  return only if the ongoing panic is the current cpu when we really have
  to return and allow the preempted panic to finish.
 
 It's reasonable.  I'll do that in the next version.

I noticed atomic_read() is insufficient.  Please consider the following
scenario.

CPU 1: call panic() in the normal context
CPU 0: call nmi_panic(), check the value of panic_cpu, then call panic()
CPU 1: set 1 to panic_cpu
CPU 0: fail to set 0 to panic_cpu, then do an infinite loop
CPU 1: call crash_kexec(), then call kdump_nmi_shootdown_cpus()

At this point, since CPU 0 loops in NMI context, it never executes
the NMI handler registered by kdump_nmi_shootdown_cpus().  This means
that no register states are saved and no cleanups for VMX/SVM are
performed.  So, we should still use atomic_cmpxchg() in nmi_panic() to
prevent other cpus from running panic routines.

  +void nmi_panic(const char *fmt, ...)
  +{
  +   /*
  +* We have to back off if the NMI has preempted an ongoing panic and
  +* allow it to finish
  +*/
  +   if (atomic_read(panic_cpu) == raw_smp_processor_id())
  +   return;
  +
  +   panic();
  +}
  +EXPORT_SYMBOL(nmi_panic);


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec