On Thu, 23 Mar 2017 17:23:53 +0800, Xunlei Pang <xp...@redhat.com> wrote:

> On 03/23/2017 at 04:48 AM, Michael Holzheu wrote:
> > On Wed, 22 Mar 2017 12:30:04 +0800,
> > Dave Young <dyo...@redhat.com> wrote:
> >
> >> On 03/21/17 at 10:18pm, Eric W. Biederman wrote:
> >>> Dave Young <dyo...@redhat.com> writes:
> >>>
> > [snip]
> >
> >>>> I think makedumpfile is using it, but I also vote to remove the
> >>>> CRASHTIME. It is better not to do this while crashing and a makedumpfile
> >>>> userspace patch is needed to drop the use of it.
> >>>>
> >>>>> As we are looking at reliability concerns removing CRASHTIME should make
> >>>>> everything in vmcoreinfo a boot time constant. Which should simplify
> >>>>> everything considerably.
> >>>> It is a nice improvement..
> >>> We also need to take a close look at what s390 is doing with vmcoreinfo.
> >>> As apparently it is reading it in a different kind of crashdump process.
> >> Yes, need careful review from s390 and maybe ppc64 especially about
> >> patch 2/3, better to have comments from IBM about s390 dump tool and ppc
> >> fadump. Added more cc.
> > On s390 we have at least an issue with patch 1/3. For stand-alone dump
> > and also because we create the ELF header for kdump in the new
> > kernel we save the pointer to the vmcoreinfo note in the old kernel on a
> > defined memory address in our absolute zero lowcore.
> >
> > This is done in arch/s390/kernel/setup.c:
> >
> > static void __init setup_vmcoreinfo(void)
> > {
> >         mem_assign_absolute(S390_lowcore.vmcore_info,
> >                             paddr_vmcoreinfo_note());
> > }
> >
> > Since with patch 1/3 paddr_vmcoreinfo_note() returns NULL at this point in
> > time we have a problem here.
> >
> > To solve this - I think - we could move the initialization to
> > arch/s390/kernel/machine_kexec.c:
> >
> > void arch_crash_save_vmcoreinfo(void)
> > {
> >         VMCOREINFO_SYMBOL(lowcore_ptr);
> >         VMCOREINFO_SYMBOL(high_memory);
> >         VMCOREINFO_LENGTH(lowcore_ptr, NR_CPUS);
> >         mem_assign_absolute(S390_lowcore.vmcore_info,
> >                             paddr_vmcoreinfo_note());
> > }
> >
> > Probably related to this is my observation that patch 3/3 leads to
> > an empty VMCOREINFO note for kdump on s390. The note is there ...
> >
> > # readelf -n /var/crash/127.0.0.1-2017-03-22-21:14:39/vmcore | grep VMCORE
> >   VMCOREINFO           0x0000068e  Unknown note type: (0x00000000)
> >
> > But it contains only zeros.
>
> Yes, this is a good catch, I will do more tests.

Hello Xunlei,

After spending some time on this, I now understand the problem:

In patch 3/3 you copy vmcoreinfo into the control page before
machine_kexec_prepare() is called. For s390 we give back all the
crashkernel memory to the hypervisor before the new crashkernel
is loaded:

/*
 * Give back memory to hypervisor before new kdump is loaded
 */
static int machine_kexec_prepare_kdump(void)
{
#ifdef CONFIG_CRASH_DUMP
        if (MACHINE_IS_VM)
                diag10_range(PFN_DOWN(crashk_res.start),
                             PFN_DOWN(crashk_res.end - crashk_res.start + 1));
        return 0;
#else
        return -EINVAL;
#endif
}

So after machine_kexec_prepare_kdump() the contents of your control page
are gone and therefore the vmcoreinfo ELF note contains only zeros.

If you call kimage_crash_copy_vmcoreinfo() after
machine_kexec_prepare_kdump() the problem should be solved for s390.

Regards
Michael
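
P.S. To make the ordering issue concrete, here is a small userspace
model. It is only a sketch and not kernel code: every model_* name is
made up for illustration, the byte array merely stands in for the
crashkernel memory, and zeroing it is an assumption that mirrors the
all-zero note seen in the readelf output above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sizes and buffers; nothing here is a kernel API. */
#define MODEL_CRASH_MEM_SIZE 4096
#define MODEL_NOTE_SIZE      64

/* Stands in for the crashkernel memory that holds the control page. */
static unsigned char model_crash_mem[MODEL_CRASH_MEM_SIZE];

/* Stands in for the vmcoreinfo note of the running kernel. */
static const char model_vmcoreinfo[MODEL_NOTE_SIZE] = "OSRELEASE=model\n";

/* Models kimage_crash_copy_vmcoreinfo(): copy the note into the control page. */
static void model_copy_vmcoreinfo(void)
{
        assert(MODEL_NOTE_SIZE <= MODEL_CRASH_MEM_SIZE);
        memcpy(model_crash_mem, model_vmcoreinfo, MODEL_NOTE_SIZE);
}

/*
 * Models machine_kexec_prepare_kdump() under a hypervisor: the memory is
 * given back, so (by assumption) its previous contents read back as zeros.
 */
static void model_prepare_kdump(void)
{
        memset(model_crash_mem, 0, sizeof(model_crash_mem));
}

/* True if the copied note consists only of zero bytes. */
static bool model_note_is_zero(void)
{
        for (size_t i = 0; i < MODEL_NOTE_SIZE; i++)
                if (model_crash_mem[i] != 0)
                        return false;
        return true;
}

/* Runs both steps in the requested order; true if the note survives. */
static bool model_note_survives(bool copy_before_prepare)
{
        memset(model_crash_mem, 0, sizeof(model_crash_mem));
        if (copy_before_prepare) {
                model_copy_vmcoreinfo();
                model_prepare_kdump();
        } else {
                model_prepare_kdump();
                model_copy_vmcoreinfo();
        }
        return !model_note_is_zero();
}
```

In this model, model_note_survives(true) corresponds to the order in
patch 3/3 and loses the note, while model_note_survives(false)
corresponds to the order suggested above and keeps it.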