Re: [PATCHv2 2/2] [fs] proc/vmcore: check the dummy place holder for offline cpu to avoid warning

2016-12-20 Thread Liu ping fan
On Wed, Dec 21, 2016 at 11:57 AM, Pratyush Anand  wrote:
>
>
> On Wednesday 21 December 2016 08:56 AM, Xunlei Pang wrote:
>>
>> On 12/20/2016 at 11:38 PM, Pratyush Anand wrote:
>>>
>>>
>>>
>>> On Monday 19 December 2016 08:10 AM, Dave Young wrote:

 Hi, Pingfan

 On 12/19/16 at 10:08am, Pingfan Liu wrote:
>>
>> kexec-tools always allocates program headers for present cpus. But
>> when crashing, offline cpus have dummy headers. We do not copy these
>> dummy notes into ELF file, also have no need of warning on them.

 I still think it is not worth such a fix, if you feel a lot of warnings
 in case large cpu numbers, I think you can change the pr_warn to
 pr_warn_once, we do not care the null cpu notes if it has nothing bad
 to the vmcore.

>>>
>>> I agree. Warning is more like information here. May be, we can count the
>>> number of times real_sz was 0, and then can print an info at the end in
>>> stead of warning, like..."N number of CPUs would have been offline, PT_NOTE
>>> entries was absent for them."
>>
>>
>> Well, OTOH the warning may also be due to some user-space misuse, we can't
>> distinguish that without extra information added.
>
>
> Yes, yes..I agree, I meant that the above info is just indicative. May be
> "might have been" could be better word than "would have been" in the above
> info print message.
>
>
>>
>> Another possible user-space fix would be: Firstly fix kexec-tools to add
>> notes only for online cpus,
>> then utilize udev rules(cpu online/offline events) to automatically
>> trigger kdump kernel reload.
>
>
> Hummm..this is certainly possible. But can we do much even when we get the
> info that the PT_NOTE was compromised by user space?
>
If doing this, kexec should be re-forked each time when
online/offline. I think it is not worth to do so just for suppressing
some warning.

> Therefore, I am of the view that if at all we are concerned about number of
> warning messages in case of multiple offline cpu, then we can just print the
> total number of NULL PT_NOTE at the end of loop.
>
Yes, I think it is a good idea. NULL PT_NOTES are only caused by
offlne cpu, it is better to change the waring to info

Thx,
Pingfan

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 1/2] kexec: add a dummy note for each offline cpu

2016-12-15 Thread Liu ping fan
On Thu, Dec 15, 2016 at 3:16 PM, Dave Young  wrote:
> Hi, Pingfan
> On 12/14/16 at 02:11pm, Pingfan Liu wrote:
>> kexec-tools always allocates program headers for each possible cpu. This
>> incurs zero PT_NOTE for offline cpu. We mark this case so that later,
>> the capture kernel can distinguish it from the mistake of allocated
>> program header.
>> The counterpart of the capture kernel comes in next patch.
>
> I thought you saw the warnings on ppc64 and it might be a ppc64 issue.
It affects all archs, since kexec-tools creates each program header
for each _present_ cpu. So if the cpu is offline, then the warning
buzzes.

> But if this is instead a general issue, can we think about if this is
> really necessary?
>
> Does it have any side effect other than the warning messages? If there
> is nothing bad other than the warnings maybe leave it as is will be
> a better way.
>
I think kernel warning always made people upset  who do not know the
back ground of the related field.
While on the other hand, the program behaves correctly against our
expectation, so just suppress the warning to make everyone easy.

Regards,
Pingfan
>>
>> Signed-off-by: Pingfan Liu 
>> ---
>> This unnecessary warning buzz on all archs when there is offline cpu
>>
>>  include/uapi/linux/elf.h | 1 +
>>  kernel/kexec_core.c  | 9 +
>>  2 files changed, 10 insertions(+)
>>
>> diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
>> index b59ee07..9744f1e 100644
>> --- a/include/uapi/linux/elf.h
>> +++ b/include/uapi/linux/elf.h
>> @@ -367,6 +367,7 @@ typedef struct elf64_shdr {
>>   * using the corresponding note types via the PTRACE_GETREGSET and
>>   * PTRACE_SETREGSET requests.
>>   */
>> +#define NT_DUMMY 0
>>  #define NT_PRSTATUS  1
>>  #define NT_PRFPREG   2
>>  #define NT_PRPSINFO  3
>> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
>> index 5616755..aeac16e 100644
>> --- a/kernel/kexec_core.c
>> +++ b/kernel/kexec_core.c
>> @@ -891,9 +891,12 @@ void __crash_kexec(struct pt_regs *regs)
>>   if (mutex_trylock(_mutex)) {
>>   if (kexec_crash_image) {
>>   struct pt_regs fixed_regs;
>> + unsigned int cpu;
>>
>>   crash_setup_regs(_regs, regs);
>>   crash_save_vmcoreinfo();
>> + for_each_cpu_not(cpu, cpu_online_mask)
>> + crash_save_cpu(NULL, cpu);
>>   machine_crash_shutdown(_regs);
>>   machine_kexec(kexec_crash_image);
>>   }
>> @@ -1040,6 +1043,12 @@ void crash_save_cpu(struct pt_regs *regs, int cpu)
>>   buf = (u32 *)per_cpu_ptr(crash_notes, cpu);
>>   if (!buf)
>>   return;
>> + if (regs == NULL) {
>> + buf = append_elf_note(buf, KEXEC_CORE_NOTE_NAME, NT_DUMMY,
>> + NULL, 0);
>> + final_note(buf);
>> + return;
>> + }
>>   memset(, 0, sizeof(prstatus));
>>   prstatus.pr_pid = current->pid;
>>   elf_core_copy_kernel_regs(_reg, regs);
>> --
>> 2.7.4
>>
>
> Thanks
> Dave

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 2/2] [fs] proc/vmcore: check the dummy place holder for offline cpu to avoid warning

2016-12-14 Thread Liu ping fan
On Thu, Dec 15, 2016 at 7:56 AM, Xunlei Pang  wrote:
> On 12/14/2016 at 02:11 PM, Pingfan Liu wrote:
>> kexec-tools always allocates program headers for possible cpus. But
>> when crashing, offline cpus have dummy headers. We do not copy these
>> dummy notes into ELF file, also have no need of warning on them.
>>
>> Signed-off-by: Pingfan Liu 
>> ---
>>  fs/proc/vmcore.c | 21 +
>>  1 file changed, 17 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
>> index 8ab782d..bbc9dad 100644
>> --- a/fs/proc/vmcore.c
>> +++ b/fs/proc/vmcore.c
>> @@ -526,9 +526,10 @@ static u64 __init get_vmcore_size(size_t elfsz, size_t 
>> elfnotesegsz,
>>   */
>>  static int __init update_note_header_size_elf64(const Elf64_Ehdr *ehdr_ptr)
>>  {
>> - int i, rc=0;
>> + int i, j, rc = 0;
>>   Elf64_Phdr *phdr_ptr;
>>   Elf64_Nhdr *nhdr_ptr;
>> + bool warn;
>>
>>   phdr_ptr = (Elf64_Phdr *)(ehdr_ptr + 1);
>>   for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
>> @@ -536,6 +537,7 @@ static int __init update_note_header_size_elf64(const 
>> Elf64_Ehdr *ehdr_ptr)
>>   u64 offset, max_sz, sz, real_sz = 0;
>>   if (phdr_ptr->p_type != PT_NOTE)
>>   continue;
>> + warn = true;
>>   max_sz = phdr_ptr->p_memsz;
>>   offset = phdr_ptr->p_offset;
>>   notes_section = kmalloc(max_sz, GFP_KERNEL);
>> @@ -547,7 +549,7 @@ static int __init update_note_header_size_elf64(const 
>> Elf64_Ehdr *ehdr_ptr)
>>   return rc;
>>   }
>>   nhdr_ptr = notes_section;
>> - while (nhdr_ptr->n_namesz != 0) {
>> + for (j = 0; nhdr_ptr->n_namesz != 0; j++) {
>
> Hi Pingfan,
>
> I think we don't need to be this complex, how about simply check before while 
> loop,
> if it is the cpu dummy note(initialize it with some magic), then handle it 
> differently,
> e.g. set a "nowarn" flag to use afterwards and make sure it has zero p_memsz?
>
I had thought that how the percpu note section was filled. But you are
right,  we can suppose that for all archs, cpus just overwrite the
note, not append the note.

> Also do the similar thing for update_note_header_size_elf32()?
>
Yes, will fix it.

Thx,

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 1/2] kexec: add a dummy note for each offline cpu

2016-12-14 Thread Liu ping fan
[...]
>>>
>> No. This patch just place a mark on these offline cpu. The next patch
>> for capture kernel will recognize this case, and ignore this kind of
>> pt_note by the code:
>> real_sz = 0; // although the size of this kind of PT_NOTE is not zero,
>> but it contains nothing useful, so just ignore it
>> phdr_ptr->p_memsz = real_sz
>
> If there is any other vmcore functional issue besides throwing "Warning: Zero 
> PT_NOTE entries found"?
>
Not at present when I debugged.
I just think we can not suppose the behaviour of different archs, so
just mark out the dummy pt_note. If some archs want to use these notes
memory,
they will just overwrite the dummy.

Thx,
Pingfan

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 1/2] kexec: add a dummy note for each offline cpu

2016-12-14 Thread Liu ping fan
[...]
>> >> >
>> >> > When you execute dmesg on your testing machine and grep nr_cpu_ids,
>> >> > what's the value of nr_cpu_ids?
>> >> >
>> >> nr_cpu_ids=128
>> >
>> > And what's the cpu number of in "lscpu" command?
>>
>> NUMA node1 CPU(s): 0-7
>>
>> The system booted up with 128 possible cpu and only 8 online.
>> Also I tested on x86 guest, after bootup with 8 cpus, then  offline 4
>> of them, the zero PT_NOTE warning buzz too.
>
> Yes, this is what I think not quite appropriate using
> for_each_cpu_not(cpu, cpu_online_mask). Maybe it need try to save on
> those cpus which is present but not online. not online seems not good,
> it's not reasonable to save those getting apic but no cpu plugged.
>
In the file:kexec-tools/kexec/crashdump-elf.c
   nr_cpus = sysconf(_SC_NPROCESSORS_CONF);
And this is why this patch needs to make a mark on these offline cpu.
This is no something like "_SC_NPROCESSORS_PRESENT" option, so just
work around it in kernel side.
Anyway for crash kernel, we only write "core" in percpu notes without
no more info, and it costs nothing when capture kernel gather the
PT_NOTE.

Thx,
Pingfan

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 1/2] kexec: add a dummy note for each offline cpu

2016-12-14 Thread Liu ping fan
On Wed, Dec 14, 2016 at 4:48 PM, Xunlei Pang  wrote:
> On 12/14/2016 at 02:11 PM, Pingfan Liu wrote:
>> kexec-tools always allocates program headers for each possible cpu. This
>> incurs zero PT_NOTE for offline cpu. We mark this case so that later,
>> the capture kernel can distinguish it from the mistake of allocated
>> program header.
>> The counterpart of the capture kernel comes in next patch.
>
> Hmm, we can initialize the cpu crash note buf in crash_notes_memory_init(), 
> needless
> to do it at the crash moment, right?
>
The cpus can be on-off-on.., We can not know the user's action.

> BTW, does this cause any issue, for example the crash utility can't parse the 
> vmcore
> properly? or just reproduce lots of warnings after offline multiple cpus?
>
No. This patch just place a mark on these offline cpu. The next patch
for capture kernel will recognize this case, and ignore this kind of
pt_note by the code:
real_sz = 0; // although the size of this kind of PT_NOTE is not zero,
but it contains nothing useful, so just ignore it
phdr_ptr->p_memsz = real_sz

Thx,
Pingfan

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 1/2] kexec: add a dummy note for each offline cpu

2016-12-14 Thread Liu ping fan
On Wed, Dec 14, 2016 at 4:25 PM, Baoquan He <b...@redhat.com> wrote:
> On 12/14/16 at 04:15pm, Liu ping fan wrote:
>> On Wed, Dec 14, 2016 at 3:40 PM, Baoquan He <b...@redhat.com> wrote:
>> > On 12/14/16 at 02:11pm, Pingfan Liu wrote:
>> >> kexec-tools always allocates program headers for each possible cpu. This
>> >> incurs zero PT_NOTE for offline cpu. We mark this case so that later,
>> >> the capture kernel can distinguish it from the mistake of allocated
>> >> program header.
>> >> The counterpart of the capture kernel comes in next patch.
>> >
>> > When you execute dmesg on your testing machine and grep nr_cpu_ids,
>> > what's the value of nr_cpu_ids?
>> >
>> nr_cpu_ids=128
>
> And what's the cpu number of in "lscpu" command?

NUMA node1 CPU(s): 0-7

The system booted up with 128 possible cpu and only 8 online.
Also I tested on x86 guest, after bootup with 8 cpus, then  offline 4
of them, the zero PT_NOTE warning buzz too.

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 1/2] kexec: add a dummy note for each offline cpu

2016-12-14 Thread Liu ping fan
On Wed, Dec 14, 2016 at 3:40 PM, Baoquan He  wrote:
> On 12/14/16 at 02:11pm, Pingfan Liu wrote:
>> kexec-tools always allocates program headers for each possible cpu. This
>> incurs zero PT_NOTE for offline cpu. We mark this case so that later,
>> the capture kernel can distinguish it from the mistake of allocated
>> program header.
>> The counterpart of the capture kernel comes in next patch.
>
> When you execute dmesg on your testing machine and grep nr_cpu_ids,
> what's the value of nr_cpu_ids?
>
nr_cpu_ids=128

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 1/2] kexec: add a dummy note for each offline cpu

2016-12-13 Thread Liu ping fan
On Wed, Dec 14, 2016 at 2:11 PM, Pingfan Liu  wrote:
> kexec-tools always allocates program headers for each possible cpu. This

The code is in the file:kexec-tools/kexec/crashdump-elf.c
   nr_cpus = sysconf(_SC_NPROCESSORS_CONF);

> incurs zero PT_NOTE for offline cpu. We mark this case so that later,
> the capture kernel can distinguish it from the mistake of allocated
> program header.
> The counterpart of the capture kernel comes in next patch.
>
> Signed-off-by: Pingfan Liu 
> ---
> This unnecessary warning buzz on all archs when there is offline cpu
>
>  include/uapi/linux/elf.h | 1 +
>  kernel/kexec_core.c  | 9 +
>  2 files changed, 10 insertions(+)
>
> diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
> index b59ee07..9744f1e 100644
> --- a/include/uapi/linux/elf.h
> +++ b/include/uapi/linux/elf.h
> @@ -367,6 +367,7 @@ typedef struct elf64_shdr {
>   * using the corresponding note types via the PTRACE_GETREGSET and
>   * PTRACE_SETREGSET requests.
>   */
> +#define NT_DUMMY   0
>  #define NT_PRSTATUS1
>  #define NT_PRFPREG 2
>  #define NT_PRPSINFO3
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index 5616755..aeac16e 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -891,9 +891,12 @@ void __crash_kexec(struct pt_regs *regs)
> if (mutex_trylock(_mutex)) {
> if (kexec_crash_image) {
> struct pt_regs fixed_regs;
> +   unsigned int cpu;
>
> crash_setup_regs(_regs, regs);
> crash_save_vmcoreinfo();
> +   for_each_cpu_not(cpu, cpu_online_mask)
> +   crash_save_cpu(NULL, cpu);
> machine_crash_shutdown(_regs);
> machine_kexec(kexec_crash_image);
> }
> @@ -1040,6 +1043,12 @@ void crash_save_cpu(struct pt_regs *regs, int cpu)
> buf = (u32 *)per_cpu_ptr(crash_notes, cpu);
> if (!buf)
> return;
> +   if (regs == NULL) {
> +   buf = append_elf_note(buf, KEXEC_CORE_NOTE_NAME, NT_DUMMY,
> +   NULL, 0);
> +   final_note(buf);
> +   return;
> +   }
> memset(, 0, sizeof(prstatus));
> prstatus.pr_pid = current->pid;
> elf_core_copy_kernel_regs(_reg, regs);
> --
> 2.7.4
>

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec