Re: [PATCH] align crash_notes allocation to make it be inside one physical page

2015-07-29 Thread Minfei Huang
On 07/30/15 at 11:07am, Baoquan He wrote:
> People reported that crash_notes in /proc/vmcore were corrupted and
> that this caused kdump failure. With code debugging and logs we found
> the root cause: the percpu variable crash_notes is allocated across
> 2 vmalloc pages. Percpu is based on vmalloc by default, and vmalloc
> cannot guarantee that 2 contiguous vmalloc pages are also on 2
> contiguous physical pages. The 1st kernel exports the starting
> address and size, and the kdump kernel uses them to read the content
> of crash_notes, so the 2nd part may not be in the neighbouring
> physical page as assumed. That is why nhdr_ptr->n_namesz or
> nhdr_ptr->n_descsz can be huge in update_note_header_size_elf64()
> and cause note header merging failures or warnings.
> 
> This patch changes the code to call __alloc_percpu() and pass in an
> align value which is the nearest power of two not smaller than
> sizeof(note_buf_t). This alignment makes sure crash_notes is
> allocated inside one physical page, since sizeof(note_buf_t) is
> smaller than PAGE_SIZE on all architectures. A WARN_ON is added in
> case it grows bigger than PAGE_SIZE in the future.
> 
> Signed-off-by: Baoquan He 
> ---
>  kernel/kexec.c | 11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/kexec.c b/kernel/kexec.c
> index a785c10..1740c42 100644
> --- a/kernel/kexec.c
> +++ b/kernel/kexec.c
> @@ -1620,7 +1620,16 @@ void crash_save_cpu(struct pt_regs *regs, int cpu)
>  static int __init crash_notes_memory_init(void)
>  {
>   /* Allocate memory for saving cpu registers. */
> - crash_notes = alloc_percpu(note_buf_t);
> + size_t size, align;
> + int order;
> +
> + size = sizeof(note_buf_t);
> + order = get_count_order(size);
> + align = 1 << order;
> +
> + WARN_ON(size > PAGE_SIZE);

It is fine without this warning, since percpu will fail to allocate
memory larger than PAGE_SIZE anyway.

Thanks
Minfei

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH] align crash_notes allocation to make it be inside one physical page

2015-07-29 Thread Baoquan He
People reported that crash_notes in /proc/vmcore were corrupted and
that this caused kdump failure. With code debugging and logs we found
the root cause: the percpu variable crash_notes is allocated across
2 vmalloc pages. Percpu is based on vmalloc by default, and vmalloc
cannot guarantee that 2 contiguous vmalloc pages are also on 2
contiguous physical pages. The 1st kernel exports the starting
address and size, and the kdump kernel uses them to read the content
of crash_notes, so the 2nd part may not be in the neighbouring
physical page as assumed. That is why nhdr_ptr->n_namesz or
nhdr_ptr->n_descsz can be huge in update_note_header_size_elf64()
and cause note header merging failures or warnings.

This patch changes the code to call __alloc_percpu() and pass in an
align value which is the nearest power of two not smaller than
sizeof(note_buf_t). This alignment makes sure crash_notes is
allocated inside one physical page, since sizeof(note_buf_t) is
smaller than PAGE_SIZE on all architectures. A WARN_ON is added in
case it grows bigger than PAGE_SIZE in the future.

Signed-off-by: Baoquan He 
---
 kernel/kexec.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/kernel/kexec.c b/kernel/kexec.c
index a785c10..1740c42 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1620,7 +1620,16 @@ void crash_save_cpu(struct pt_regs *regs, int cpu)
 static int __init crash_notes_memory_init(void)
 {
/* Allocate memory for saving cpu registers. */
-   crash_notes = alloc_percpu(note_buf_t);
+   size_t size, align;
+   int order;
+
+   size = sizeof(note_buf_t);
+   order = get_count_order(size);
+   align = 1 << order;
+
+   WARN_ON(size > PAGE_SIZE);
+
+   crash_notes = __alloc_percpu(size, align);
if (!crash_notes) {
		pr_warn("Kexec: Memory allocation for saving cpu register states failed\n");
return -ENOMEM;
-- 
2.1.0




RE: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI

2015-07-29 Thread 河合英宏 / KAWAI,HIDEHIRO
Hi,

> From: Michal Hocko [mailto:mho...@kernel.org]
> 
> On Wed 29-07-15 09:09:18, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > From: Michal Hocko [mailto:mho...@kernel.org]
> > > On Wed 29-07-15 05:48:47, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > > Hi,
> > > >
> > > > > From: linux-kernel-ow...@vger.kernel.org 
> > > > > [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Hidehiro 
> > > > > Kawai
> > > > > (2015/07/27 23:34), Michal Hocko wrote:
> > > > > > On Mon 27-07-15 10:58:50, Hidehiro Kawai wrote:
> > > > [...]
> > > > > > The check could be also relaxed a bit and nmi_panic would
> > > > > > return only if the ongoing panic is the current cpu when we really
> > > > > > have to return and allow the preempted panic to finish.
> > > > >
> > > > > It's reasonable.  I'll do that in the next version.
> > > >
> > > > I noticed atomic_read() is insufficient.  Please consider the following
> > > > scenario.
> > > >
> > > > CPU 1: calls panic() in the normal context
> > > > CPU 0: calls nmi_panic(), checks the value of panic_cpu, then calls panic()
> > > > CPU 1: sets panic_cpu to 1
> > > > CPU 0: fails to set panic_cpu to 0, then loops infinitely
> > > > CPU 1: calls crash_kexec(), then calls kdump_nmi_shootdown_cpus()
> > > >
> > > > At this point, since CPU 0 loops in NMI context, it never executes
> > > > the NMI handler registered by kdump_nmi_shootdown_cpus().  This means
> > > > that no register states are saved and no cleanups for VMX/SVM are
> > > > performed.
> > >
> > > Yes this is true but it is no different from the current state, isn't
> > > it? So if you want to handle that then it deserves a separate patch.
> > > It is certainly not harmful wrt. panic behavior.
> > >
> > > > So, we should still use atomic_cmpxchg() in nmi_panic() to
> > > > prevent other cpus from running panic routines.
> > >
> > > Not sure what you mean by that.
> >
> > I mean that we should use the same logic as my V2 patch like this:
> >
> > #define nmi_panic(fmt, ...)\
> >do {\
> >if (atomic_cmpxchg(&panic_cpu, -1, raw_smp_processor_id()) \
> >== -1)  \
> >panic(fmt, ##__VA_ARGS__);  \
> >} while (0)
> 
> This would allow to return from NMI too eagerly.

Yes, but what's the problem?  The root cause of your case hasn't been
clarified yet, and I can't fix an unclear issue because I don't know
what the right solution is.

> When I was testing my
> previous approach (on 3.0 based kernel) I had basically the same thing
> (one NMI to process panic) and others to return. This led to a strange
> behavior when the NMI button triggered NMI on all (hundreds) CPUs.

It's strange.  Usually, an NMI caused by the NMI button is routed only
to CPU 0 as an external NMI; external NMIs for CPUs other than CPU 0
are masked at boot time.  Does it really happen?  Does the problem
still happen on the latest kernel?  What kind of NMI is delivered to
each CPU?

Traditionally, we have assumed that the NMI for crash dumping is
delivered to only one cpu; otherwise, we would often fail to take a
proper crash dump.  Your case seems to be a separate problem that
should be solved separately.

> The
> crash kernel booted eventually but the log contained lockups when a
> CPU waited for an IPI to the CPU which was handling the NMI panic.

Could you explain more precisely?

> Anyway, I do not think this is really necessary to solve the panic
> reentrancy issue.
> If the missing saved state is a real problem then it
> should be handled separately - maybe it can be achieved without an IPI
> and directly from the panic context if we are in NMI.

What I would like to do with this patchset is to solve the race issues
among NMI, panic() and crash_kexec().  So, I don't think we should fix
that separately, although I would need to reword some descriptions
and titles.

Anyway, I'm going to send out my revised version to tidy things up.
I would also like to hear the kexec/kdump folks' opinions.

Regards,
Kawai



Re: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI

2015-07-29 Thread Michal Hocko
On Wed 29-07-15 09:09:18, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > From: Michal Hocko [mailto:mho...@kernel.org]
> > On Wed 29-07-15 05:48:47, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > Hi,
> > >
> > > > From: linux-kernel-ow...@vger.kernel.org 
> > > > [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Hidehiro Kawai
> > > > (2015/07/27 23:34), Michal Hocko wrote:
> > > > > On Mon 27-07-15 10:58:50, Hidehiro Kawai wrote:
> > > [...]
> > > > > The check could be also relaxed a bit and nmi_panic would
> > > > > return only if the ongoing panic is the current cpu when we really
> > > > > have to return and allow the preempted panic to finish.
> > > >
> > > > It's reasonable.  I'll do that in the next version.
> > >
> > > I noticed atomic_read() is insufficient.  Please consider the following
> > > scenario.
> > >
> > > CPU 1: calls panic() in the normal context
> > > CPU 0: calls nmi_panic(), checks the value of panic_cpu, then calls panic()
> > > CPU 1: sets panic_cpu to 1
> > > CPU 0: fails to set panic_cpu to 0, then loops infinitely
> > > CPU 1: calls crash_kexec(), then calls kdump_nmi_shootdown_cpus()
> > >
> > > At this point, since CPU 0 loops in NMI context, it never executes
> > > the NMI handler registered by kdump_nmi_shootdown_cpus().  This means
> > > that no register states are saved and no cleanups for VMX/SVM are
> > > performed.
> > 
> > Yes this is true but it is no different from the current state, isn't
> > it? So if you want to handle that then it deserves a separate patch.
> > It is certainly not harmful wrt. panic behavior.
> > 
> > > So, we should still use atomic_cmpxchg() in nmi_panic() to
> > > prevent other cpus from running panic routines.
> > 
> > Not sure what you mean by that.
> 
> I mean that we should use the same logic as my V2 patch like this:
> 
> #define nmi_panic(fmt, ...)\
>do {\
>if (atomic_cmpxchg(&panic_cpu, -1, raw_smp_processor_id()) \
>== -1)  \
>panic(fmt, ##__VA_ARGS__);  \
>} while (0)

This would allow to return from NMI too eagerly. When I was testing my
previous approach (on 3.0 based kernel) I had basically the same thing
(one NMI to process panic) and others to return. This led to a strange
behavior when the NMI button triggered NMI on all (hundreds) CPUs. The
crash kernel booted eventually but the log contained lockups when a
CPU waited for an IPI to the CPU which was handling the NMI panic.

Anyway, I do not think this is really necessary to solve the panic
reentrancy issue. If the missing saved state is a real problem then it
should be handled separately - maybe it can be achieved without an IPI
and directly from the panic context if we are in NMI.
-- 
Michal Hocko
SUSE Labs



RE: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI

2015-07-29 Thread 河合英宏 / KAWAI,HIDEHIRO
> From: Michal Hocko [mailto:mho...@kernel.org]
> On Wed 29-07-15 05:48:47, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > Hi,
> >
> > > From: linux-kernel-ow...@vger.kernel.org 
> > > [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Hidehiro Kawai
> > > (2015/07/27 23:34), Michal Hocko wrote:
> > > > On Mon 27-07-15 10:58:50, Hidehiro Kawai wrote:
> > [...]
> > > > The check could be also relaxed a bit and nmi_panic would
> > > > return only if the ongoing panic is the current cpu when we really have
> > > > to return and allow the preempted panic to finish.
> > >
> > > It's reasonable.  I'll do that in the next version.
> >
> > I noticed atomic_read() is insufficient.  Please consider the following
> > scenario.
> >
> > CPU 1: calls panic() in the normal context
> > CPU 0: calls nmi_panic(), checks the value of panic_cpu, then calls panic()
> > CPU 1: sets panic_cpu to 1
> > CPU 0: fails to set panic_cpu to 0, then loops infinitely
> > CPU 1: calls crash_kexec(), then calls kdump_nmi_shootdown_cpus()
> >
> > At this point, since CPU 0 loops in NMI context, it never executes
> > the NMI handler registered by kdump_nmi_shootdown_cpus().  This means
> > that no register states are saved and no cleanups for VMX/SVM are
> > performed.
> 
> Yes this is true but it is no different from the current state, isn't
> it? So if you want to handle that then it deserves a separate patch.
> It is certainly not harmful wrt. panic behavior.
> 
> > So, we should still use atomic_cmpxchg() in nmi_panic() to
> > prevent other cpus from running panic routines.
> 
> Not sure what you mean by that.

I mean that we should use the same logic as my V2 patch like this:

#define nmi_panic(fmt, ...)\
   do {\
   if (atomic_cmpxchg(&panic_cpu, -1, raw_smp_processor_id()) \
   == -1)  \
   panic(fmt, ##__VA_ARGS__);  \
   } while (0)

By using atomic_cmpxchg() here, we can ensure that only this cpu runs
the panic routines.  It is important to prevent an NMI-context cpu
from calling panic_smp_self_stop().

void panic(const char *fmt, ...)
{
...
	/*
	 * `old_cpu == -1' means we are the first comer.
	 * `old_cpu == this_cpu' means we came here due to panic on NMI.
	 */
	this_cpu = raw_smp_processor_id();
	old_cpu = atomic_cmpxchg(&panic_cpu, -1, this_cpu);
	if (old_cpu != -1 && old_cpu != this_cpu)
		panic_smp_self_stop();

Please assume that CPU 0 calls nmi_panic() in NMI context
and CPU 1 calls panic() in normal context at the same time.

If CPU 1 sets panic_cpu before CPU 0 does, CPU 1 runs the panic
routines and CPU 0 returns from the NMI handler.  Eventually CPU 0 is
stopped by nmi_shootdown_cpus().

If CPU 0 sets panic_cpu before CPU 1 does, CPU 0 runs the panic
routines.  CPU 1 calls panic_smp_self_stop() and waits for the NMI
sent by nmi_shootdown_cpus().

Anyway, I tested my approach and it worked fine.

Regards,
Kawai



Re: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI

2015-07-29 Thread Michal Hocko
On Wed 29-07-15 05:48:47, 河合英宏 / KAWAI,HIDEHIRO wrote:
> Hi,
> 
> > From: linux-kernel-ow...@vger.kernel.org 
> > [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Hidehiro Kawai
> > (2015/07/27 23:34), Michal Hocko wrote:
> > > On Mon 27-07-15 10:58:50, Hidehiro Kawai wrote:
> [...]
> > > The check could be also relaxed a bit and nmi_panic would
> > > return only if the ongoing panic is the current cpu when we really have
> > > to return and allow the preempted panic to finish.
> > 
> > It's reasonable.  I'll do that in the next version.
> 
> I noticed atomic_read() is insufficient.  Please consider the following
> scenario.
> 
> CPU 1: calls panic() in the normal context
> CPU 0: calls nmi_panic(), checks the value of panic_cpu, then calls panic()
> CPU 1: sets panic_cpu to 1
> CPU 0: fails to set panic_cpu to 0, then loops infinitely
> CPU 1: calls crash_kexec(), then calls kdump_nmi_shootdown_cpus()
> 
> At this point, since CPU 0 loops in NMI context, it never executes
> the NMI handler registered by kdump_nmi_shootdown_cpus().  This means
> that no register states are saved and no cleanups for VMX/SVM are
> performed.

Yes this is true but it is no different from the current state, isn't
it? So if you want to handle that then it deserves a separate patch.
It is certainly not harmful wrt. panic behavior.

> So, we should still use atomic_cmpxchg() in nmi_panic() to
> prevent other cpus from running panic routines.

Not sure what you mean by that.

-- 
Michal Hocko
SUSE Labs
