Re: [PATCH] makedumpfile: ppc64: do page traversal if vmemmap_list not populated
On Wed, Sep 27, 2023 at 06:32:44AM +, HAGIO KAZUHITO(萩尾 一仁) wrote: > On 2023/09/14 18:22, Aditya Gupta wrote: > > Currently 'makedumpfile' fails to collect vmcore on upstream kernel, > > with the errors: > > > > readpage_elf: Attempt to read non-existent page at 0x4000. > > readmem: type_addr: 0, addr:0, size:8 > > get_vmemmap_list_info: Can't get vmemmap region addresses > > get_machdep_info_ppc64: Can't get vmemmap list info. > > > > This occurs since makedumpfile depends on 'vmemmap_list' for translating > > vmemmap addresses. But with below commit in Linux, vmemmap_list can be > > empty, in case of Radix MMU on PowerPC64 > > > > 368a0590d954: (powerpc/book3s64/vmemmap: switch radix to use a > > different vmemmap handling function) > > > > In case vmemmap_list is empty, then it's head is NULL, which causes > > makedumpfile to fail with above error. > > > > Since with above commit, 'vmemmap_list' is not populated (when MMU is > > Radix MMU), kernel populates corresponding page table entries in kernel > > page table. Hence, instead of depending on 'vmemmap_list' for address > > translation for vmemmap addresses, do a kernel pagetable walk. > > > > And since the pte can also be introduced at higher levels in the page > > table, such as at PMD level, add hugepage support, by checking for > > PAGE_PTE flag > > > > Reported-by: Sachin Sant > > Signed-off-by: Aditya Gupta > > Thank you for the patch, applied. > > https://github.com/makedumpfile/makedumpfile/commit/a34f017965583e89c4cb0b00117c200a6c191e54 Thanks for the update. > > Sorry for the delay, exceptionally busy this month.. No issues :) Thanks, - Aditya Gupta > > Thanks, > Kazu > > > --- > > arch/ppc64.c | 111 ++--- > > makedumpfile.h | 6 +++ > > 2 files changed, 84 insertions(+), 33 deletions(-) > > > > diff --git a/arch/ppc64.c b/arch/ppc64.c > > index 5e70acb51aba..9456b8b570c5 100644 > > --- a/arch/ppc64.c > > +++ b/arch/ppc64.c > > @@ -196,6 +196,10 @@ ppc64_vmemmap_init(void) > > int psize, shift; > > ulong head; > > > > + /* initialise vmemmap_list in case SYMBOL(vmemmap_list) is not found */ > > + info->vmemmap_list = NULL; > > + info->vmemmap_cnt = 0; > > + > > if ((SYMBOL(vmemmap_list) == NOT_FOUND_SYMBOL) > > || (SYMBOL(mmu_psize_defs) == NOT_FOUND_SYMBOL) > > || (SYMBOL(mmu_vmemmap_psize) == NOT_FOUND_SYMBOL) > > @@ -216,15 +220,24 @@ ppc64_vmemmap_init(void) > > return FALSE; > > info->vmemmap_psize = 1 << shift; > > > > - if (!readmem(VADDR, SYMBOL(vmemmap_list), &head, sizeof(unsigned long))) > > - return FALSE; > > - > > /* > > -* Get vmemmap list count and populate vmemmap regions info > > -*/ > > - info->vmemmap_cnt = get_vmemmap_list_info(head); > > - if (info->vmemmap_cnt == 0) > > - return FALSE; > > +* vmemmap_list symbol can be missing or set to 0 in the kernel. > > +* This would imply vmemmap region is mapped in the kernel pagetable. > > +* > > +* So, read vmemmap_list anyway, and use 'vmemmap_list' if it's not > > empty > > +* (head != NULL), or we will do a kernel pagetable walk for vmemmap > > address > > +* translation later > > +**/ > > + readmem(VADDR, SYMBOL(vmemmap_list), &head, sizeof(unsigned long)); > > + > > + if (head) { > > + /* > > +* Get vmemmap list count and populate vmemmap regions info > > +*/ > > + info->vmemmap_cnt = get_vmemmap_list_info(head); > > + if (info->vmemmap_cnt == 0) > > + return FALSE; > > + } > > > > info->flag_vmemmap = TRUE; > > return TRUE; > > @@ -347,29 +360,6 @@ ppc64_vmalloc_init(void) > > return TRUE; > > } > > > > -/* > > - * If the vmemmap address translation information is stored in the kernel, > > - * make the translation. > > - */ > > -static unsigned long long > > -ppc64_vmemmap_to_phys(unsigned long vaddr) > > -{ > > - int i; > > - ulong offset; > > - unsigned long long paddr = NOT_PADDR; > > - > > - for (i = 0; i < info->vmemmap_cnt; i++) { > > - if ((vaddr >= info->vmemmap_list[i].virt) && (vaddr < > > - (info->vmemmap_list[i].virt + info->vmemmap_psize))) { > > - offset = vaddr - info->vmemmap_list[i].virt; > > - paddr = info->vmemmap_list[i].phys + offset; > > - break; > > - } > > - } > > - > > - return paddr; > > -} > > - > > static unsigned long long > > ppc64_vtop_level4(unsigned long vaddr) > > { > > @@ -379,6 +369,8 @@ ppc64_vtop_level4(unsigned long vaddr) > > unsigned long long pgd_pte, pud_pte; > > unsigned long long pmd_pte, pte; > > unsigned long long paddr = NOT_PADDR; > > + uint is_hugepage = 0; > > + uint pdshift; > > uint swap = 0; > > > > if (info->page_buf == NULL) { > > @@ -413,6
Re: [PATCH] makedumpfile: ppc64: do page traversal if vmemmap_list not populated
On 2023/09/14 18:22, Aditya Gupta wrote: > Currently 'makedumpfile' fails to collect vmcore on upstream kernel, > with the errors: > > readpage_elf: Attempt to read non-existent page at 0x4000. > readmem: type_addr: 0, addr:0, size:8 > get_vmemmap_list_info: Can't get vmemmap region addresses > get_machdep_info_ppc64: Can't get vmemmap list info. > > This occurs since makedumpfile depends on 'vmemmap_list' for translating > vmemmap addresses. But with below commit in Linux, vmemmap_list can be > empty, in case of Radix MMU on PowerPC64 > > 368a0590d954: (powerpc/book3s64/vmemmap: switch radix to use a > different vmemmap handling function) > > In case vmemmap_list is empty, then it's head is NULL, which causes > makedumpfile to fail with above error. > > Since with above commit, 'vmemmap_list' is not populated (when MMU is > Radix MMU), kernel populates corresponding page table entries in kernel > page table. Hence, instead of depending on 'vmemmap_list' for address > translation for vmemmap addresses, do a kernel pagetable walk. > > And since the pte can also be introduced at higher levels in the page > table, such as at PMD level, add hugepage support, by checking for > PAGE_PTE flag > > Reported-by: Sachin Sant > Signed-off-by: Aditya Gupta Thank you for the patch, applied. https://github.com/makedumpfile/makedumpfile/commit/a34f017965583e89c4cb0b00117c200a6c191e54 Sorry for the delay, exceptionally busy this month.. Thanks, Kazu > --- > arch/ppc64.c | 111 ++--- > makedumpfile.h | 6 +++ > 2 files changed, 84 insertions(+), 33 deletions(-) > > diff --git a/arch/ppc64.c b/arch/ppc64.c > index 5e70acb51aba..9456b8b570c5 100644 > --- a/arch/ppc64.c > +++ b/arch/ppc64.c > @@ -196,6 +196,10 @@ ppc64_vmemmap_init(void) > int psize, shift; > ulong head; > > + /* initialise vmemmap_list in case SYMBOL(vmemmap_list) is not found */ > + info->vmemmap_list = NULL; > + info->vmemmap_cnt = 0; > + > if ((SYMBOL(vmemmap_list) == NOT_FOUND_SYMBOL) > || (SYMBOL(mmu_psize_defs) == NOT_FOUND_SYMBOL) > || (SYMBOL(mmu_vmemmap_psize) == NOT_FOUND_SYMBOL) > @@ -216,15 +220,24 @@ ppc64_vmemmap_init(void) > return FALSE; > info->vmemmap_psize = 1 << shift; > > - if (!readmem(VADDR, SYMBOL(vmemmap_list), &head, sizeof(unsigned long))) > - return FALSE; > - > /* > - * Get vmemmap list count and populate vmemmap regions info > - */ > - info->vmemmap_cnt = get_vmemmap_list_info(head); > - if (info->vmemmap_cnt == 0) > - return FALSE; > + * vmemmap_list symbol can be missing or set to 0 in the kernel. > + * This would imply vmemmap region is mapped in the kernel pagetable. > + * > + * So, read vmemmap_list anyway, and use 'vmemmap_list' if it's not > empty > + * (head != NULL), or we will do a kernel pagetable walk for vmemmap > address > + * translation later > + **/ > + readmem(VADDR, SYMBOL(vmemmap_list), &head, sizeof(unsigned long)); > + > + if (head) { > + /* > + * Get vmemmap list count and populate vmemmap regions info > + */ > + info->vmemmap_cnt = get_vmemmap_list_info(head); > + if (info->vmemmap_cnt == 0) > + return FALSE; > + } > > info->flag_vmemmap = TRUE; > return TRUE; > @@ -347,29 +360,6 @@ ppc64_vmalloc_init(void) > return TRUE; > } > > -/* > - * If the vmemmap address translation information is stored in the kernel, > - * make the translation. > - */ > -static unsigned long long > -ppc64_vmemmap_to_phys(unsigned long vaddr) > -{ > - int i; > - ulong offset; > - unsigned long long paddr = NOT_PADDR; > - > - for (i = 0; i < info->vmemmap_cnt; i++) { > - if ((vaddr >= info->vmemmap_list[i].virt) && (vaddr < > - (info->vmemmap_list[i].virt + info->vmemmap_psize))) { > - offset = vaddr - info->vmemmap_list[i].virt; > - paddr = info->vmemmap_list[i].phys + offset; > - break; > - } > - } > - > - return paddr; > -} > - > static unsigned long long > ppc64_vtop_level4(unsigned long vaddr) > { > @@ -379,6 +369,8 @@ ppc64_vtop_level4(unsigned long vaddr) > unsigned long long pgd_pte, pud_pte; > unsigned long long pmd_pte, pte; > unsigned long long paddr = NOT_PADDR; > + uint is_hugepage = 0; > + uint pdshift; > uint swap = 0; > > if (info->page_buf == NULL) { > @@ -413,6 +405,13 @@ ppc64_vtop_level4(unsigned long vaddr) > if (!pgd_pte) > return NOT_PADDR; > > + if (IS_HUGEPAGE(pgd_pte)) { > + is_hugepage = 1; > + pte = pgd_pte; > + pdshift = info->l4_shift; > + goto out; > + } >
Re: [PATCH] makedumpfile: ppc64: do page traversal if vmemmap_list not populated
> On 14-Sep-2023, at 2:52 PM, Aditya Gupta wrote: > > Currently 'makedumpfile' fails to collect vmcore on upstream kernel, > with the errors: > >readpage_elf: Attempt to read non-existent page at 0x4000. >readmem: type_addr: 0, addr:0, size:8 >get_vmemmap_list_info: Can't get vmemmap region addresses >get_machdep_info_ppc64: Can't get vmemmap list info. > > This occurs since makedumpfile depends on 'vmemmap_list' for translating > vmemmap addresses. But with below commit in Linux, vmemmap_list can be > empty, in case of Radix MMU on PowerPC64 > >368a0590d954: (powerpc/book3s64/vmemmap: switch radix to use a >different vmemmap handling function) > > In case vmemmap_list is empty, then it's head is NULL, which causes > makedumpfile to fail with above error. > > Since with above commit, 'vmemmap_list' is not populated (when MMU is > Radix MMU), kernel populates corresponding page table entries in kernel > page table. Hence, instead of depending on 'vmemmap_list' for address > translation for vmemmap addresses, do a kernel pagetable walk. > > And since the pte can also be introduced at higher levels in the page > table, such as at PMD level, add hugepage support, by checking for > PAGE_PTE flag > > Reported-by: Sachin Sant > Signed-off-by: Aditya Gupta > — Thanks for the fix. With this (and the corresponding kernel) fix applied, I am able to capture crash dump. Tested-by: Sachin Sant - Sachin ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH] makedumpfile: ppc64: do page traversal if vmemmap_list not populated
To be able to test this on a system with Radix MMU, the kernel needs to be patched with given kernel patch, since makedumpfile requires 'cpu_features.mmu_features' to check if MMU was Radix MMU: https://lore.kernel.org/linuxppc-dev/20230911091409.415662-1-adit...@linux.ibm.com/T/#u Or, the vmlinux can be provided to makedumpfile, so that it finds the 'cur_cpu_spec.mmu_features' symbol, run makedumpfile with '-x' flag similar to this: makedumpfile -l -d 31 -x VMLINUX /proc/vmcore /var/crash/vmcore.kdump Thanks, Aditya Gupta ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec