Re: [PATCH v5 31/32] x86: Add sysfs support for Secure Memory Encryption
On 05/26/2017 at 10:49 AM, Dave Young wrote:
> Ccing Xunlei, he is reading the patches to see what needs to be done for
> kdump. There should still be several places to handle to make kdump work.
>
> On 05/18/17 at 07:01pm, Borislav Petkov wrote:
>> On Tue, Apr 18, 2017 at 04:22:12PM -0500, Tom Lendacky wrote:
>>> Add sysfs support for SME so that user-space utilities (kdump, etc.) can
>>> determine if SME is active.
>>
>> But why do user-space tools need to know that?
>>
>> I mean, when we load the kdump kernel, we do it with the first kernel,
>> with the kexec_load() syscall, AFAICT. And that code does a lot of
>> things during that init, like machine_kexec_prepare()->init_pgtable() to
>> prepare the ident mapping of the second kernel, for example.
>>
>> What I'm aiming at is that the first kernel knows *exactly* whether SME
>> is enabled or not and doesn't need to tell the second one through some
>> sysfs entries - it can do that during loading.
>>
>> So I don't think we need any userspace things at all...
>
> If the kdump kernel can get the SME status from a hardware register then
> this should not be necessary and this patch can be dropped.

Yes, I also agree with dropping this one.

Regards,
Xunlei

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v5 28/32] x86/mm, kexec: Allow kexec to be used with SME
On 04/19/2017 at 05:21 AM, Tom Lendacky wrote:
> Provide support so that kexec can be used to boot a kernel when SME is
> enabled.
>
> Support is needed to allocate pages for kexec without encryption. This
> is needed in order to be able to reboot in the kernel in the same manner
> as originally booted.

Hi Tom,

Looks like kdump will break; I didn't see similar handling for the kdump
cases, see kernel: kimage_alloc_crash_control_pages(),
kimage_load_crash_segment(), etc.

We need to support kdump with SME. The kdump
kernel/initramfs/purgatory/elfcorehdr/etc. are all loaded into the
reserved memory (see crashkernel=X) by userspace kexec-tools. I think a
straightforward way would be to mark the whole reserved memory range
without encryption before loading all the kexec segments for kdump; I
guess we can handle this easily in arch_kexec_unprotect_crashkres().

Moreover, now that "elfcorehdr=X" is left decrypted, it needs to be
remapped to the encrypted data.

Regards,
Xunlei

> Additionally, when shutting down all of the CPUs we need to be sure to
> flush the caches and then halt. This is needed when booting from a state
> where SME was not active into a state where SME is active (or vice-versa).
> Without these steps, it is possible for cache lines to exist for the same
> physical location but tagged both with and without the encryption bit. This
> can cause random memory corruption when caches are flushed depending on
> which cacheline is written last.
>
> Signed-off-by: Tom Lendacky
> ---
>  arch/x86/include/asm/init.h          |  1 +
>  arch/x86/include/asm/irqflags.h      |  5 +
>  arch/x86/include/asm/kexec.h         |  8 ++
>  arch/x86/include/asm/pgtable_types.h |  1 +
>  arch/x86/kernel/machine_kexec_64.c   | 35 +-
>  arch/x86/kernel/process.c            | 26 +++--
>  arch/x86/mm/ident_map.c              | 11 +++
>  include/linux/kexec.h                | 14 ++
>  kernel/kexec_core.c                  |  7 +++
>  9 files changed, 101 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/include/asm/init.h b/arch/x86/include/asm/init.h
> index 737da62..b2ec511 100644
> --- a/arch/x86/include/asm/init.h
> +++ b/arch/x86/include/asm/init.h
> @@ -6,6 +6,7 @@ struct x86_mapping_info {
>  	void *context;			 /* context for alloc_pgt_page */
>  	unsigned long pmd_flag;		 /* page flag for PMD entry */
>  	unsigned long offset;		 /* ident mapping offset */
> +	unsigned long kernpg_flag;	 /* kernel pagetable flag override */
>  };
>
>  int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
> diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
> index ac7692d..38b5920 100644
> --- a/arch/x86/include/asm/irqflags.h
> +++ b/arch/x86/include/asm/irqflags.h
> @@ -58,6 +58,11 @@ static inline __cpuidle void native_halt(void)
>  	asm volatile("hlt": : :"memory");
>  }
>
> +static inline __cpuidle void native_wbinvd_halt(void)
> +{
> +	asm volatile("wbinvd; hlt" : : : "memory");
> +}
> +
>  #endif
>
>  #ifdef CONFIG_PARAVIRT
> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
> index 70ef205..e8183ac 100644
> --- a/arch/x86/include/asm/kexec.h
> +++ b/arch/x86/include/asm/kexec.h
> @@ -207,6 +207,14 @@ struct kexec_entry64_regs {
>  	uint64_t r15;
>  	uint64_t rip;
>  };
> +
> +extern int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages,
> +				       gfp_t gfp);
> +#define arch_kexec_post_alloc_pages arch_kexec_post_alloc_pages
> +
> +extern void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages);
> +#define arch_kexec_pre_free_pages arch_kexec_pre_free_pages
> +
>  #endif
>
>  typedef void crash_vmclear_fn(void);
> diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
> index ce8cb1c..0f326f4 100644
> --- a/arch/x86/include/asm/pgtable_types.h
> +++ b/arch/x86/include/asm/pgtable_types.h
> @@ -213,6 +213,7 @@ enum page_cache_mode {
>  #define PAGE_KERNEL		__pgprot(__PAGE_KERNEL | _PAGE_ENC)
>  #define PAGE_KERNEL_RO		__pgprot(__PAGE_KERNEL_RO | _PAGE_ENC)
>  #define PAGE_KERNEL_EXEC	__pgprot(__PAGE_KERNEL_EXEC | _PAGE_ENC)
> +#define PAGE_KERNEL_EXEC_NOENC	__pgprot(__PAGE_KERNEL_EXEC)
>  #define PAGE_KERNEL_RX		__pgprot(__PAGE_KERNEL_RX | _PAGE_ENC)
>  #define PAGE_KERNEL_NOCACHE	__pgprot(__PAGE_KERNEL_NOCACHE | _PAGE_ENC)
>  #define PAGE_KERNEL_LARGE	__pgprot(__PAGE_KERNEL_LARGE | _PAGE_ENC)
> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
> index 085c3b3..11c0ca9 100644
> --- a/arch/x86/kernel/machine_kexec_64.c
> +++ b/arch/x86/kernel/machine_kexec_64.c
> @@ -86,7 +86,7 @@ static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
>  	set_pmd(pmd, __pmd(__pa(
[Makedumpfile PATCH v4 0/2] Fix refiltering when kaslr enabled
Hi All,

We came across another failure in makedumpfile when kaslr is enabled.
This failure occurs when we try re-filtering: we try to erase some
symbol from a dumpfile which was copied/compressed from /proc/vmcore
using makedumpfile.

We have very limited symbol information in vmcoreinfo, so symbols to be
erased may not be available in vmcoreinfo and we look for them in
vmlinux. However, a symbol address from vmlinux is a static address
which differs from the run time address by the KASLR offset. Therefore,
reading any "virtual address of vmlinux" from vmcore is not possible.

These patches find the runtime KASLR offset and then calculate the run
time address of symbols read from vmlinux.

Hatayama Daisuke also found some issues [1] when he was working with a
sadump and a virsh dump of a non-KASLR kernel. Patch 2/2 of this series
has been improved to take care of those issues as well.

[1] http://lists.infradead.org/pipermail/kexec/2017-May/018833.html

Thanks
~Pratyush

v1->v2:
 - reading KERNELOFFSET from vmcoreinfo now instead of calculating it
   from _stext
v2->v3:
 - Fixed initialization of info->file_vmcoreinfo
 - Improved page_offset calculation logic to take care of different
   dump scenarios.
v3->v4:
 - Removed info->kaslr_offset write to VMCOREINFO

Pratyush Anand (2):
  makedumpfile: add runtime kaslr offset if it exists
  x86_64: calculate page_offset in case of re-filtering/sadump/virsh dump

 arch/x86_64.c  | 72 --
 erase_info.c   |  1 +
 makedumpfile.c | 46 +
 makedumpfile.h | 16 +
 4 files changed, 128 insertions(+), 7 deletions(-)

-- 
2.9.3
[Makedumpfile PATCH v4 1/2] makedumpfile: add runtime kaslr offset if it exists
If we have to erase a symbol from vmcore whose address is not present in
vmcoreinfo, then we need to pass vmlinux as well to get the symbol
address. When kaslr is enabled, the virtual addresses of all the kernel
symbols are randomized with an offset. vmlinux always has a static
address, but all the arch specific calculations are based on the run
time kernel address. So we need to find a way to translate a symbol
address from vmlinux to the kernel run time address.

without this patch:

# cat > scrub.conf << EOF
[vmlinux]
erase jiffies
erase init_task.utime
for tsk in init_task.tasks.next within task_struct:tasks
erase tsk.utime
endfor
EOF

# makedumpfile --split -d 5 -x vmlinux --config scrub.conf vmcore dumpfile_{1,2,3}
readpage_kdump_compressed: pfn(f97ea) is excluded from vmcore.
readmem: type_addr: 1, addr:f97eaff8, size:8
vtop4_x86_64: Can't get pml4 (page_dir:f97eaff8).
readmem: Can't convert a virtual address(819f1284) to physical address.
readmem: type_addr: 0, addr:819f1284, size:390
check_release: Can't get the address of system_utsname.

After this patch check_release() is ok, and also we are able to erase
symbols from vmcore.

Signed-off-by: Pratyush Anand
---
 arch/x86_64.c  | 36
 erase_info.c   |  1 +
 makedumpfile.c | 46 ++
 makedumpfile.h | 16
 4 files changed, 99 insertions(+)

diff --git a/arch/x86_64.c b/arch/x86_64.c
index e978a36f8878..fd2e8ac154d6 100644
--- a/arch/x86_64.c
+++ b/arch/x86_64.c
@@ -33,6 +33,42 @@ get_xen_p2m_mfn(void)
 	return NOT_FOUND_LONG_VALUE;
 }
 
+unsigned long
+get_kaslr_offset_x86_64(unsigned long vaddr)
+{
+	unsigned int i;
+	char buf[BUFSIZE_FGETS], *endp;
+
+	if (!info->kaslr_offset && info->file_vmcoreinfo) {
+		if (fseek(info->file_vmcoreinfo, 0, SEEK_SET) < 0) {
+			ERRMSG("Can't seek the vmcoreinfo file(%s). %s\n",
+			       info->name_vmcoreinfo, strerror(errno));
+			return FALSE;
+		}
+
+		while (fgets(buf, BUFSIZE_FGETS, info->file_vmcoreinfo)) {
+			i = strlen(buf);
+			if (!i)
+				break;
+			if (buf[i - 1] == '\n')
+				buf[i - 1] = '\0';
+			if (strncmp(buf, STR_KERNELOFFSET,
+				    strlen(STR_KERNELOFFSET)) == 0)
+				info->kaslr_offset =
+					strtoul(buf+strlen(STR_KERNELOFFSET), &endp, 16);
+		}
+	}
+	if (vaddr >= __START_KERNEL_map &&
+	    vaddr < __START_KERNEL_map + info->kaslr_offset)
+		return info->kaslr_offset;
+	else
+		/*
+		 * TODO: we need to check if it is a vmalloc/vmemmap/module
+		 * address, we will have a different offset
+		 */
+		return 0;
+}
+
 static int
 get_page_offset_x86_64(void)
 {
diff --git a/erase_info.c b/erase_info.c
index f2ba9149e93e..60abfa1a1adf 100644
--- a/erase_info.c
+++ b/erase_info.c
@@ -1088,6 +1088,7 @@ resolve_config_entry(struct config_entry *ce, unsigned long long base_vaddr,
 			ce->line, ce->name);
 		return FALSE;
 	}
+	ce->sym_addr += get_kaslr_offset(ce->sym_addr);
 	ce->type_name = get_symbol_type_name(ce->name,
 		DWARF_INFO_GET_SYMBOL_TYPE, &ce->size, &ce->type_flag);
diff --git a/makedumpfile.c b/makedumpfile.c
index 301772a8820c..9babf1a07154 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -3782,6 +3782,46 @@ free_for_parallel()
 }
 
 int
+find_kaslr_offsets()
+{
+	off_t offset;
+	unsigned long size;
+	int ret = FALSE;
+
+	get_vmcoreinfo(&offset, &size);
+
+	if (!(info->name_vmcoreinfo = strdup(FILENAME_VMCOREINFO))) {
+		MSG("Can't duplicate strings(%s).\n", FILENAME_VMCOREINFO);
+		return FALSE;
+	}
+	if (!copy_vmcoreinfo(offset, size))
+		goto out;
+
+	if (!open_vmcoreinfo("r"))
+		goto out;
+
+	unlink(info->name_vmcoreinfo);
+
+	/*
+	 * This arch specific function should update info->kaslr_offset. If
+	 * kaslr is not enabled then offset will be set to 0. arch specific
+	 * function might need to read from vmcoreinfo, therefore we have
+	 * called this function between open_vmcoreinfo() and
+	 * close_vmcoreinfo()
+	 */
+	get_kaslr_offset(SYMBOL(_stext));
+
+	close_vmcoreinfo();
+
+	ret = TRUE;
out:
+	free(info->name_vmcoreinfo);
+	info->name_vmcoreinfo = NULL;
+
+	return ret;
+
[Makedumpfile PATCH v4 2/2] x86_64: calculate page_offset in case of re-filtering/sadump/virsh dump
We do not call get_elf_info() in case of re-filtering and sadump.
Therefore, we will not have any pt_load in that case, and so we get:

get_page_offset_x86_64: Can't get any pt_load to calculate page offset.

However, we will have vmcoreinfo and vmlinux information in case of
re-filtering. So, we are able to find the kaslr offset and we can get
the page_offset_base address. Thus we can read the page offset as well.

If kaslr is not enabled and we also do not have a valid PT_LOAD to
calculate the page offset, then use the old method to find the fixed
page offset.

In case of virsh dump, virtual addresses in PT_LOAD are 0. Ignore such
addresses for the page_offset calculation.

Suggested-by: HATAYAMA Daisuke
Signed-off-by: Pratyush Anand
---
 arch/x86_64.c | 36 +---
 1 file changed, 29 insertions(+), 7 deletions(-)

diff --git a/arch/x86_64.c b/arch/x86_64.c
index fd2e8ac154d6..18384a8dd684 100644
--- a/arch/x86_64.c
+++ b/arch/x86_64.c
@@ -75,17 +75,39 @@ get_page_offset_x86_64(void)
 	int i;
 	unsigned long long phys_start;
 	unsigned long long virt_start;
+	unsigned long page_offset_base;
+
+	if (info->kaslr_offset) {
+		page_offset_base = get_symbol_addr("page_offset_base");
+		page_offset_base += info->kaslr_offset;
+		if (!readmem(VADDR, page_offset_base, &info->page_offset,
+			     sizeof(info->page_offset))) {
+			ERRMSG("Can't read page_offset_base.\n");
+			return FALSE;
+		}
+		return TRUE;
+	}
 
-	for (i = 0; get_pt_load(i, &phys_start, NULL, &virt_start, NULL); i++) {
-		if (virt_start < __START_KERNEL_map
-		    && phys_start != NOT_PADDR) {
-			info->page_offset = virt_start - phys_start;
-			return TRUE;
+	if (get_num_pt_loads()) {
+		for (i = 0;
+		     get_pt_load(i, &phys_start, NULL, &virt_start, NULL);
+		     i++) {
+			if (virt_start != NOT_KV_ADDR
+			    && virt_start < __START_KERNEL_map
+			    && phys_start != NOT_PADDR) {
+				info->page_offset = virt_start - phys_start;
+				return TRUE;
+			}
 		}
 	}
 
-	ERRMSG("Can't get any pt_load to calculate page offset.\n");
-	return FALSE;
+	if (info->kernel_version < KERNEL_VERSION(2, 6, 27)) {
+		info->page_offset = __PAGE_OFFSET_ORIG;
+	} else {
+		info->page_offset = __PAGE_OFFSET_2_6_27;
+	}
+
+	return TRUE;
 }
 
 int
-- 
2.9.3
Re: [Makedumpfile PATCH v3 1/2] makedumpfile: add runtime kaslr offset if it exists
On Friday 26 May 2017 07:17 AM, Atsushi Kumagai wrote:
>>>> write_vmcoreinfo_data(void)
>>>> {
>>>> 	/*
>>>> +	 * write 1st kernel's KERNELOFFSET
>>>> +	 */
>>>> +	if (info->kaslr_offset)
>>>> +		fprintf(info->file_vmcoreinfo, "%s%lx\n", STR_KERNELOFFSET,
>>>> +			info->kaslr_offset);
>>>
>>> When will this data written to the VMCOREINFO file be used ?
>>> info->kaslr_offset is necessary for vmlinux but -x and -i are exclusive.
>>
>> This is what I thought:
>>
>> Lets say we have got a vmcore1 after re-filtering the original vmcore.
>> Now, if we would like to re-filter vmcore1 then we will need
>> kaslr_offset again. So, should we not write kaslr_offset in the
>> vmcoreinfo of vmcore1 as well?
>
> write_vmcoreinfo_data() is called only for the -g option, it makes a
> VMCOREINFO file as a separate file, it doesn't overwrite VMCOREINFO in
> vmcore.

OK... got it. Will remove this function and send v4.

Thanks
~Pratyush
Re: [PATCH v5 31/32] x86: Add sysfs support for Secure Memory Encryption
Ccing Xunlei, he is reading the patches to see what needs to be done for
kdump. There should still be several places to handle to make kdump work.

On 05/18/17 at 07:01pm, Borislav Petkov wrote:
> On Tue, Apr 18, 2017 at 04:22:12PM -0500, Tom Lendacky wrote:
>> Add sysfs support for SME so that user-space utilities (kdump, etc.) can
>> determine if SME is active.
>
> But why do user-space tools need to know that?
>
> I mean, when we load the kdump kernel, we do it with the first kernel,
> with the kexec_load() syscall, AFAICT. And that code does a lot of
> things during that init, like machine_kexec_prepare()->init_pgtable() to
> prepare the ident mapping of the second kernel, for example.
>
> What I'm aiming at is that the first kernel knows *exactly* whether SME
> is enabled or not and doesn't need to tell the second one through some
> sysfs entries - it can do that during loading.
>
> So I don't think we need any userspace things at all...

If the kdump kernel can get the SME status from a hardware register then
this should not be necessary and this patch can be dropped.

Thanks
Dave
RE: [Makedumpfile PATCH v3 1/2] makedumpfile: add runtime kaslr offset if it exists
>>> diff --git a/makedumpfile.c b/makedumpfile.c
>>> index 301772a8820c..4986d098d69a 100644
>>> --- a/makedumpfile.c
>>> +++ b/makedumpfile.c
>>> @@ -2099,6 +2099,13 @@ void
>>>  write_vmcoreinfo_data(void)
>>>  {
>>>  	/*
>>> +	 * write 1st kernel's KERNELOFFSET
>>> +	 */
>>> +	if (info->kaslr_offset)
>>> +		fprintf(info->file_vmcoreinfo, "%s%lx\n", STR_KERNELOFFSET,
>>> +			info->kaslr_offset);
>>
>> When will this data written to VMCOREINFO file be used ?
>> info->kaslr_offset is necessary for vmlinux but -x and -i are exclusive.
>
> This is what I thought:
>
> Lets say we have got a vmcore1 after re-filtering the original vmcore.
> Now, if we would like to re-filter vmcore1 then we will need
> kaslr_offset again. So, should we not write kaslr_offset in the
> vmcoreinfo of vmcore1 as well?

write_vmcoreinfo_data() is called only for the -g option, it makes a
VMCOREINFO file as a separate file, it doesn't overwrite VMCOREINFO in
vmcore.

    if (info->flag_generate_vmcoreinfo)
        generate_vmcoreinfo()
            + write_vmcoreinfo_data()

find_kaslr_offsets() doesn't refer to the separate VMCOREINFO file, so
writing STR_KERNELOFFSET in it is meaningless.

Thanks,
Atsushi Kumagai
RE: [Makedumpfile Patch] Fix get_kcore_dump_loads() error case
> commit f10d1e2e94c50 introduced another bug while fixing a memory leak.
> Use braces with the if condition.

Thanks, I'll merge this into v1.6.2.

Atsushi Kumagai

> Signed-off-by: Pratyush Anand
> ---
>  elf_info.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/elf_info.c b/elf_info.c
> index 601d66e3f176..69b1719b020f 100644
> --- a/elf_info.c
> +++ b/elf_info.c
> @@ -893,9 +893,10 @@ int get_kcore_dump_loads(void)
>  		if (p->phys_start == NOT_PADDR
>  		    || !is_phys_addr(p->virt_start))
>  			continue;
> -		if (j >= loads)
> +		if (j >= loads) {
>  			free(pls);
>  			return FALSE;
> +		}
>
>  		if (j == 0) {
>  			offset_pt_load_memory = p->file_offset;
> --
> 2.9.3
Re: [PATCH v5 29/32] x86/mm: Add support to encrypt the kernel in-place
On 5/18/2017 7:46 AM, Borislav Petkov wrote:
> On Tue, Apr 18, 2017 at 04:21:49PM -0500, Tom Lendacky wrote:
>> Add the support to encrypt the kernel in-place. This is done by creating
>> new page mappings for the kernel - a decrypted write-protected mapping
>> and an encrypted mapping. The kernel is encrypted by copying it through
>> a temporary buffer.
>>
>> Signed-off-by: Tom Lendacky
>> ---
>>  arch/x86/include/asm/mem_encrypt.h |   6 +
>>  arch/x86/mm/Makefile               |   2
>>  arch/x86/mm/mem_encrypt.c          | 262
>>  arch/x86/mm/mem_encrypt_boot.S     | 151 +
>>  4 files changed, 421 insertions(+)
>>  create mode 100644 arch/x86/mm/mem_encrypt_boot.S
>>
>> diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
>> index b406df2..8f6f9b4 100644
>> --- a/arch/x86/include/asm/mem_encrypt.h
>> +++ b/arch/x86/include/asm/mem_encrypt.h
>> @@ -31,6 +31,12 @@ static inline u64 sme_dma_mask(void)
>>  	return ((u64)sme_me_mask << 1) - 1;
>>  }
>>
>> +void sme_encrypt_execute(unsigned long encrypted_kernel_vaddr,
>> +			 unsigned long decrypted_kernel_vaddr,
>> +			 unsigned long kernel_len,
>> +			 unsigned long encryption_wa,
>> +			 unsigned long encryption_pgd);
>> +
>>  void __init sme_early_encrypt(resource_size_t paddr, unsigned long size);
>>
>>  void __init sme_early_decrypt(resource_size_t paddr,
>> diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
>> index 9e13841..0633142 100644
>> --- a/arch/x86/mm/Makefile
>> +++ b/arch/x86/mm/Makefile
>> @@ -38,3 +38,5 @@ obj-$(CONFIG_NUMA_EMU)		+= numa_emulation.o
>>  obj-$(CONFIG_X86_INTEL_MPX)	+= mpx.o
>>  obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) += pkeys.o
>>  obj-$(CONFIG_RANDOMIZE_MEMORY)	+= kaslr.o
>> +
>> +obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_boot.o
>> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
>> index 30b07a3..0ff41a4 100644
>> --- a/arch/x86/mm/mem_encrypt.c
>> +++ b/arch/x86/mm/mem_encrypt.c
>> @@ -24,6 +24,7 @@
>>  #include
>>  #include
>>  #include
>> +#include
>>
>>  /*
>>   * Since SME related variables are set early in the boot process they must
>> @@ -216,8 +217,269 @@ void swiotlb_set_mem_attributes(void *vaddr, unsigned long size)
>>  	set_memory_decrypted((unsigned long)vaddr, size >> PAGE_SHIFT);
>>  }
>>
>> +void __init sme_clear_pgd(pgd_t *pgd_base, unsigned long start,
>
> static

Yup.

>> +			  unsigned long end)
>> +{
>> +	unsigned long addr = start;
>> +	pgdval_t *pgd_p;
>> +
>> +	while (addr < end) {
>> +		unsigned long pgd_end;
>> +
>> +		pgd_end = (addr & PGDIR_MASK) + PGDIR_SIZE;
>> +		if (pgd_end > end)
>> +			pgd_end = end;
>> +
>> +		pgd_p = (pgdval_t *)pgd_base + pgd_index(addr);
>> +		*pgd_p = 0;
>
> Hmm, so this is a contiguous range from [start:end] which translates to
> 8-byte PGD pointers in the PGD page so you can simply memset that range,
> no? Instead of iterating over each one?

I guess I could do that, but this will probably only end up clearing a
single PGD entry anyway since it's highly doubtful the address range
would cross a 512GB boundary.

>> +
>> +		addr = pgd_end;
>> +	}
>> +}
>> +
>> +#define PGD_FLAGS	_KERNPG_TABLE_NOENC
>> +#define PUD_FLAGS	_KERNPG_TABLE_NOENC
>> +#define PMD_FLAGS	(__PAGE_KERNEL_LARGE_EXEC & ~_PAGE_GLOBAL)
>> +
>> +static void __init *sme_populate_pgd(pgd_t *pgd_base, void *pgtable_area,
>> +				     unsigned long vaddr, pmdval_t pmd_val)
>> +{
>> +	pgdval_t pgd, *pgd_p;
>> +	pudval_t pud, *pud_p;
>> +	pmdval_t pmd, *pmd_p;
>
> You should use the enclosing type, not the underlying one. I.e.,
>
> 	pgd_t *pgd;
> 	pud_t *pud;
>
> ... and then the macros native_p*d_val(), p*d_offset() and so on. I say
> native_* because we don't want to have any paravirt nastyness here. I
> believe your previous version was using the proper interfaces.

I won't be able to use the p*d_offset() macros since they use __va() and
we're identity mapped during this time (which is why I would guess the
proposed changes for the 5-level pagetables in arch/x86/kernel/head64.c,
__startup_64, don't use these macros either). I should be able to use
the native_set_p*d() and others though, I'll look into that.

> And the kernel has gotten 5-level pagetables support in the meantime, so
> this'll need to start at p4d AFAICT. arch/x86/mm/fault.c::dump_pagetable()
> looks like a good example to stare at.

Yeah, I accounted for that in the other parts of the code but I need to
do that here also.

>> +	pgd_p = (pgdval_t *)pgd_base + pgd_index(vaddr);
>> +	pgd = *pgd_p;
>> +	if (pgd) {
>> +		pud_p = (pudval_t *)(pgd & ~PTE_FLAGS_MASK);
>> +	} else {
>> +		pud_p = pgtable_area;
>> +		memset(pud_p, 0, sizeof(*pud_p) * PTRS_PER_PUD);
>> +		pgtable_area += sizeof(*pud_p) * PTRS_PER_PUD;
>> +
>> +		*pgd_p = (pgdval_t)pud_p + PGD_FLAGS
Re: [PATCH 1/2] sadump: set info->page_size before cache_init()
On Tuesday 23 May 2017 08:22 AM, Hatayama, Daisuke wrote:
> Currently, makedumpfile results in a Segmentation fault on sadump dump
> files as follows:
>
> # LANG=C makedumpfile -f --message-level=31 -ld31 -x vmlinux ./sadump_vmcore sadump_vmcore-ld31
> sadump: read dump device as single partition
> sadump: single partition configuration
> page_size    : 4096
> sadump: timezone information is missing
> Segmentation fault
>
> By bisect, I found that this issue is caused by the following commit
> that moves invocation of cache_init() in initial() a bit early:
>
> # git bisect bad
> 8e2834bac4f62da3894da297f083068431be6d80 is the first bad commit
> commit 8e2834bac4f62da3894da297f083068431be6d80
> Author: Pratyush Anand
> Date:   Thu Mar 2 17:37:11 2017 +0900
>
>     [PATCH v3 2/7] initial(): call cache_init() a bit early
>
>     Call cache_init() before get_kcore_dump_loads(), because the latter
>     uses cache_search(). The call path is like this:
>
>     get_kcore_dump_loads() -> process_dump_load() -> vaddr_to_paddr() ->
>     vtop4_x86_64() -> readmem() -> cache_search()
>
>     Signed-off-by: Pratyush Anand
>
> :100644 100644 6942047199deb09dd1fff2121e264584dbb05587 3b8e9810468de26b0d8b73d456f0bd4f3d3aa2fe M	makedumpfile.c
>
> In this timing, on sadump vmcores, info->page_size has not been
> initialized yet so has 0. So, malloc() in cache_init() returns a chunk
> of 0 size. A bit later, info->page_size is initialized with 4096. Later
> processing in cache.c behaves assuming the chunk size is 8 * 4096. This
> destroys objects allocated after the chunk, resulting in the above
> Segmentation fault.
>
> To fix this issue, this commit moves setting info->page_size before
> cache_init().
>
> Signed-off-by: HATAYAMA Daisuke
> Cc: Pratyush Anand

For 1/2:

Reviewed-by: Pratyush Anand

> ---
>  makedumpfile.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/makedumpfile.c b/makedumpfile.c
> index 301772a..f300b19 100644
> --- a/makedumpfile.c
> +++ b/makedumpfile.c
> @@ -3878,6 +3878,9 @@ initial(void)
>  	if (!get_value_for_old_linux())
>  		return FALSE;
>
> +	if (info->flag_sadump && !set_page_size(sadump_page_size()))
> +		return FALSE;
> +
>  	if (!is_xen_memory() && !cache_init())
>  		return FALSE;
>
> @@ -3906,9 +3909,6 @@ initial(void)
>  		return FALSE;
>  	}
>
> -	if (!set_page_size(sadump_page_size()))
> -		return FALSE;
> -
>  	if (!sadump_initialize_bitmap_memory())
>  		return FALSE;
[Makedumpfile Patch] Fix get_kcore_dump_loads() error case
commit f10d1e2e94c50 introduced another bug while fixing a memory leak.
Use braces with the if condition.

Signed-off-by: Pratyush Anand
---
 elf_info.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/elf_info.c b/elf_info.c
index 601d66e3f176..69b1719b020f 100644
--- a/elf_info.c
+++ b/elf_info.c
@@ -893,9 +893,10 @@ int get_kcore_dump_loads(void)
 		if (p->phys_start == NOT_PADDR
 		    || !is_phys_addr(p->virt_start))
 			continue;
-		if (j >= loads)
+		if (j >= loads) {
 			free(pls);
 			return FALSE;
+		}
 
 		if (j == 0) {
 			offset_pt_load_memory = p->file_offset;
-- 
2.9.3