Re: [PATCH] efi: Fix handling of multiple contiguous efi_fake_mem= entries
Hi Dan, On 12/30/19 at 11:58am, Dan Williams wrote: > A recent test of efi_fake_mem=4G@9G:0x4,4G@13G:0x4 crashed with > a signature of: > > BUG: unable to handle page fault for address: ff281000 > [..] > RIP: 0010:efi_memmap_insert+0x11d/0x191 > [..] > Call Trace: > ? bgrt_init+0xbe/0xbe > ? efi_arch_mem_reserve+0x1cb/0x228 > ? acpi_parse_bgrt+0xa/0xd > ? acpi_table_parse+0x86/0xb8 > ? acpi_boot_init+0x494/0x4e3 > ? acpi_parse_x2apic+0x87/0x87 > ? setup_acpi_sci+0xa2/0xa2 > ? setup_arch+0x8db/0x9e1 > ? start_kernel+0x6a/0x547 > ? secondary_startup_64+0xb6/0xc0 > > efi_memmap_insert() is attempting to insert entries past the end of the > new map. This condition is setup by efi_fake_mem() leaking empty entries > to the end of memory map, and then efi_arch_mem_reserve() trips over the > bad entry when attempting an additional efi_memmap_insert(). The empty > entry causes efi_memmap_insert() to attempt more memmap splits / copies > than efi_memmap_split_count() accounted for when sizing the new map. > > Update efi_fake_memmap() to cleanup lagging empty entries. > > This change is related to commit af1648984828 "x86/efi: Update e820 with > reserved EFI boot services data to fix kexec breakage" since that > introduces more occurrences where efi_memmap_insert() is invoked after > an efi_fake_mem= configuration has been parsed. Previously the side > effects of vestigial empty entries were benign, but with commit > af1648984828 that follow-on efi_memmap_insert() invocation triggers the > above crash signature. > > Fixes: 0f96a99dab36 ("efi: Add 'efi_fake_mem' boot option") > Fixes: af1648984828 ("x86/efi: Update e820 with reserved EFI boot > services...") > Cc: Taku Izumi > Cc: Michael Weiser > Cc: Dave Young > Cc: Ard Biesheuvel > Cc: Thomas Gleixner > Cc: Ingo Molnar > Signed-off-by: Dan Williams > --- > drivers/firmware/efi/fake_mem.c | 22 +- > 1 file changed, 21 insertions(+), 1 deletion(-) > > diff --git a/drivers/firmware/efi/fake_mem.c b/drivers/firmware/efi/fake_mem.c > index bb9fc70d0cfa..6df51ba93ae8 100644 > --- a/drivers/firmware/efi/fake_mem.c > +++ b/drivers/firmware/efi/fake_mem.c > @@ -67,13 +67,33 @@ void __init efi_fake_memmap(void) > return; > } > > + memset(new_memmap, 0, efi.memmap.desc_size * new_nr_map); > for (i = 0; i < nr_fake_mem; i++) > efi_memmap_insert(, new_memmap, _fake_mems[i]); > > + /* > + * efi_memmap_split_count() may over count the number of > + * required splits in the case when contiguous fake entries are > + * specified. Check that all new_nr_map entries were consumed. > + */ > + for (i = new_nr_map; i > 0; i--) { > + efi_memory_desc_t *md; > + u64 start, end; > + > + md = new_memmap + efi.memmap.desc_size * (new_nr_map - i - 1); > + end = md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT) - 1; > + start = md->phys_addr; > + > + if (start == 0 && end + 1 == 0) > + continue; > + break; > + } > + > /* swap into new EFI memmap */ > early_memunmap(new_memmap, efi.memmap.desc_size * new_nr_map); > > - efi_memmap_install(new_memmap_phy, new_nr_map); > + /* install only the valid entries */ > + efi_memmap_install(new_memmap_phy, i); > > /* print new EFI memmap */ > efi_print_memmap(); > Although kernel bootup works with this patch, it still does not fix the issue I noticed, you can see: [root@localhost ~]# cat /proc/cmdline BOOT_IMAGE=/bzImage root=/dev/vda3 ro audit=0 selinux=0 crashkernel=160M efi=debug console=ttyS0 console=tty0 3 efi_fake_mem=200M@5G:0x4,300M@5600M:0x4 earlyprintk=serial [root@localhost ~]# dmesg|grep fake_mem [0.00] Command line: BOOT_IMAGE=/bzImage root=/dev/vda3 ro audit=0 selinux=0 crashkernel=160M efi=debug console=ttyS0 console=tty0 3 efi_fake_mem=200M@5G:0x4,300M@5600M:0x4 earlyprintk=serial [0.00] efi_fake_mem: add attr=0x0004 to [mem 0x00014000-0x00014c7f] [0.00] efi_fake_mem: add attr=0x0004 to [mem 0x00015e00-0x000170bf] [root@localhost ~]# dmesg|grep SP [0.085762] efi: mem48: [Conventional Memory| | |SP| | | | | | |WB|WT|WC|UC] range=[0x00015e00-0x000170bf] (300MB) With this patch applied, there is still only one md set "SP" attr. That means only the last insert worked. void __init efi_memmap_insert(struct efi_memory_map *old_memmap, void *buf, struct efi_mem_range *mem) The above function will use the old_memmap as the base for each inserting. the old_memmap == , so when you do below: for (i = 0; i < nr_fake_mem; i++) efi_memmap_insert(, new_memmap, _fake_mems[i]); Only the last inserting will take effect. Below
Re: [PATCH v7 1/4] x86: kdump: move reserve_crashkernel_low() into crash_core.c
Hi Dave, On 2019/12/28 17:32, Dave Young wrote: > On 12/27/19 at 07:04pm, Chen Zhou wrote: >> Hi Dave >> >> On 2019/12/27 13:54, Dave Young wrote: >>> Hi, >>> On 12/23/19 at 11:23pm, Chen Zhou wrote: In preparation for supporting reserve_crashkernel_low in arm64 as x86_64 does, move reserve_crashkernel_low() into kernel/crash_core.c. Note, in arm64, we reserve low memory if and only if crashkernel=X,low is specified. Different with x86_64, don't set low memory automatically. >>> >>> Do you have any reason for the difference? I'd expect we have same >>> logic if possible and remove some of the ifdefs. >> >> In x86_64, if we reserve crashkernel above 4G, then we call >> reserve_crashkernel_low() >> to reserve low memory. >> >> In arm64, to simplify, we call reserve_crashkernel_low() at the beginning of >> reserve_crashkernel() >> and then relax the arm64_dma32_phys_limit if reserve_crashkernel_low() >> allocated something. >> In this case, if reserve crashkernel below 4G there will be 256M low memory >> set automatically >> and this needs extra considerations. > > Sorry that I did not read the old thread details and thought that is > arch dependent. But rethink about that, it would be better that we can > have same semantic about crashkernel parameters across arches. If we > make them different then it causes confusion, especially for > distributions. > > OTOH, I thought if we reserve high memory then the low memory should be > needed. There might be some exceptions, but I do not know the exact > one, can we make the behavior same, and special case those systems which > do not need low memory reservation. > I thought like this and did implement with crashkernel parameters arch independent. This is my v4: https://lkml.org/lkml/2019/5/6/1361, i implemented according to x86_64's behavior. >> >> previous discusses: >> https://lkml.org/lkml/2019/6/5/670 >> https://lkml.org/lkml/2019/6/13/229 > > Another concern from James: > " > With both crashk_low_res and crashk_res, we end up with two entries in > /proc/iomem called > "Crash kernel". Because its sorted by address, and kexec-tools stops > searching when it > find "Crash kernel", you are always going to get the kernel placed in the > lower portion. > " > > The kexec-tools code is iterating all "Crash kernel" ranges and add them > in an array. In X86 code, it uses the higher range to locate memory. We also discussed about this: https://lkml.org/lkml/2019/6/13/227. I guess James's opinion is that kexec-tools should take forward compatibility into account. "But we can't rely on people updating user-space when they update the kernel!" -- James > >> >>> Reported-by: kbuild test robot Signed-off-by: Chen Zhou --- arch/x86/kernel/setup.c| 62 - include/linux/crash_core.h | 3 ++ include/linux/kexec.h | 2 -- kernel/crash_core.c| 87 ++ kernel/kexec_core.c| 17 - 5 files changed, 96 insertions(+), 75 deletions(-) diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index cedfe20..5f38942 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -486,59 +486,6 @@ static void __init memblock_x86_reserve_range_setup_data(void) # define CRASH_ADDR_HIGH_MAX SZ_64T #endif -static int __init reserve_crashkernel_low(void) -{ -#ifdef CONFIG_X86_64 - unsigned long long base, low_base = 0, low_size = 0; - unsigned long total_low_mem; - int ret; - - total_low_mem = memblock_mem_size(1UL << (32 - PAGE_SHIFT)); - - /* crashkernel=Y,low */ - ret = parse_crashkernel_low(boot_command_line, total_low_mem, _size, ); - if (ret) { - /* - * two parts from kernel/dma/swiotlb.c: - * -swiotlb size: user-specified with swiotlb= or default. - * - * -swiotlb overflow buffer: now hardcoded to 32k. We round it - * to 8M for other buffers that may need to stay low too. Also - * make sure we allocate enough extra low memory so that we - * don't run out of DMA buffers for 32-bit devices. - */ - low_size = max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20); - } else { - /* passed with crashkernel=0,low ? */ - if (!low_size) - return 0; - } - - low_base = memblock_find_in_range(0, 1ULL << 32, low_size, CRASH_ALIGN); - if (!low_base) { - pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n", - (unsigned long)(low_size >> 20)); - return -ENOMEM; - } - - ret = memblock_reserve(low_base,
Re: [PATCH] x86/efi: update e820 about reserved EFI boot services data to fix kexec breakage
On Mon, Dec 30, 2019 at 1:42 AM Dan Williams wrote: [..] > I'll send a patch to fix up efi_fake_memmap(). For others following this thread, that patch is here: http://lore.kernel.org/r/157773590338.4153451.5898675419563883883.st...@dwillia2-desk3.amr.corp.intel.com ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH] efi: Fix handling of multiple contiguous efi_fake_mem= entries
A recent test of efi_fake_mem=4G@9G:0x4,4G@13G:0x4 crashed with a signature of: BUG: unable to handle page fault for address: ff281000 [..] RIP: 0010:efi_memmap_insert+0x11d/0x191 [..] Call Trace: ? bgrt_init+0xbe/0xbe ? efi_arch_mem_reserve+0x1cb/0x228 ? acpi_parse_bgrt+0xa/0xd ? acpi_table_parse+0x86/0xb8 ? acpi_boot_init+0x494/0x4e3 ? acpi_parse_x2apic+0x87/0x87 ? setup_acpi_sci+0xa2/0xa2 ? setup_arch+0x8db/0x9e1 ? start_kernel+0x6a/0x547 ? secondary_startup_64+0xb6/0xc0 efi_memmap_insert() is attempting to insert entries past the end of the new map. This condition is setup by efi_fake_mem() leaking empty entries to the end of memory map, and then efi_arch_mem_reserve() trips over the bad entry when attempting an additional efi_memmap_insert(). The empty entry causes efi_memmap_insert() to attempt more memmap splits / copies than efi_memmap_split_count() accounted for when sizing the new map. Update efi_fake_memmap() to cleanup lagging empty entries. This change is related to commit af1648984828 "x86/efi: Update e820 with reserved EFI boot services data to fix kexec breakage" since that introduces more occurrences where efi_memmap_insert() is invoked after an efi_fake_mem= configuration has been parsed. Previously the side effects of vestigial empty entries were benign, but with commit af1648984828 that follow-on efi_memmap_insert() invocation triggers the above crash signature. Fixes: 0f96a99dab36 ("efi: Add 'efi_fake_mem' boot option") Fixes: af1648984828 ("x86/efi: Update e820 with reserved EFI boot services...") Cc: Taku Izumi Cc: Michael Weiser Cc: Dave Young Cc: Ard Biesheuvel Cc: Thomas Gleixner Cc: Ingo Molnar Signed-off-by: Dan Williams --- drivers/firmware/efi/fake_mem.c | 22 +- 1 file changed, 21 insertions(+), 1 deletion(-) diff --git a/drivers/firmware/efi/fake_mem.c b/drivers/firmware/efi/fake_mem.c index bb9fc70d0cfa..6df51ba93ae8 100644 --- a/drivers/firmware/efi/fake_mem.c +++ b/drivers/firmware/efi/fake_mem.c @@ -67,13 +67,33 @@ void __init efi_fake_memmap(void) return; } + memset(new_memmap, 0, efi.memmap.desc_size * new_nr_map); for (i = 0; i < nr_fake_mem; i++) efi_memmap_insert(, new_memmap, _fake_mems[i]); + /* +* efi_memmap_split_count() may over count the number of +* required splits in the case when contiguous fake entries are +* specified. Check that all new_nr_map entries were consumed. +*/ + for (i = new_nr_map; i > 0; i--) { + efi_memory_desc_t *md; + u64 start, end; + + md = new_memmap + efi.memmap.desc_size * (new_nr_map - i - 1); + end = md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT) - 1; + start = md->phys_addr; + + if (start == 0 && end + 1 == 0) + continue; + break; + } + /* swap into new EFI memmap */ early_memunmap(new_memmap, efi.memmap.desc_size * new_nr_map); - efi_memmap_install(new_memmap_phy, new_nr_map); + /* install only the valid entries */ + efi_memmap_install(new_memmap_phy, i); /* print new EFI memmap */ efi_print_memmap(); ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH] x86/efi: update e820 about reserved EFI boot services data to fix kexec breakage
On 12/30/19 at 01:42am, Dan Williams wrote: > On Sat, Dec 28, 2019 at 10:13 PM Dan Williams > wrote: > > > > On Sat, Dec 28, 2019 at 12:54 PM Dan Williams > > wrote: > > > > > > On Tue, Dec 3, 2019 at 11:53 PM Dave Young wrote: > > > > > > > > Michael Weiser reported he got below error during a kexec rebooting: > > > > esrt: Unsupported ESRT version 2904149718861218184. > > > > > > > > The ESRT memory stays in EFI boot services data, and it was reserved > > > > in kernel via efi_mem_reserve(). The initial purpose of the reservation > > > > is to reuse the EFI boot services data across kexec reboot. For example > > > > the BGRT image data and some ESRT memory like Michael reported. > > > > > > > > But although the memory is reserved it is not updated in X86 e820 table. > > > > And kexec_file_load iterate system ram in io resource list to find > > > > places > > > > for kernel, initramfs and other stuff. In Michael's case the kexec > > > > loaded > > > > initramfs overwritten the ESRT memory and then the failure happened. > > > > > > > > Since kexec_file_load depends on the e820 to be updated, just fix this > > > > by updating the reserved EFI boot services memory as reserved type in > > > > e820. > > > > > > > > Originally any memory descriptors with EFI_MEMORY_RUNTIME attribute are > > > > bypassed in the reservation code path because they are assumed as > > > > reserved. > > > > But the reservation is still needed for multiple kexec reboot. > > > > And it is the only possible case we come here thus just drop the code > > > > chunk then everything works without side effects. > > > > > > > > On my machine the ESRT memory sits in an EFI runtime data range, it does > > > > not trigger the problem, but I successfully tested with BGRT instead. > > > > both kexec_load and kexec_file_load work and kdump works as well. > > > > > > > > Signed-off-by: Dave Young > > > > --- > > > > arch/x86/platform/efi/quirks.c |6 ++ > > > > 1 file changed, 2 insertions(+), 4 deletions(-) > > > > > > > > --- linux-x86.orig/arch/x86/platform/efi/quirks.c > > > > +++ linux-x86/arch/x86/platform/efi/quirks.c > > > > @@ -260,10 +260,6 @@ void __init efi_arch_mem_reserve(phys_ad > > > > return; > > > > } > > > > > > > > - /* No need to reserve regions that will never be freed. */ > > > > - if (md.attribute & EFI_MEMORY_RUNTIME) > > > > - return; > > > > - > > > > size += addr % EFI_PAGE_SIZE; > > > > size = round_up(size, EFI_PAGE_SIZE); > > > > addr = round_down(addr, EFI_PAGE_SIZE); > > > > @@ -293,6 +289,8 @@ void __init efi_arch_mem_reserve(phys_ad > > > > early_memunmap(new, new_size); > > > > > > > > efi_memmap_install(new_phys, num_entries); > > > > + e820__range_update(addr, size, E820_TYPE_RAM, > > > > E820_TYPE_RESERVED); > > > > + e820__update_table(e820_table); > > > > } > > > > > > > > /* > > > > > > > > > > Bisect says this change (commit af1648984828) is triggering a > > > regression, likely not urgent, in my testing of the new efi_fake_mem= > > > facility to allow memory to be marked "soft reserved" via the kernel > > > command line (commit 199c84717612 x86/efi: Add efi_fake_mem support > > > for EFI_MEMORY_SP). The following command line triggers the crash > > > signature below: > > > > > > efi_fake_mem=4G@9G:0x4,4G@13G:0x4 > > > > > > However, this command line works ok: > > > > > > efi_fake_mem=8G@9G:0x4 > > > > > > So, something about multiple efi_fake_mem statements interacts badly > > > with this change. Nothing obvious occurs to me at the moment, I'll > > > keep debugging, but wanted to highlight this in the meantime in case > > > someone else sees a deeper issue or the root cause. > > > > Still looking, but this failure does not seem to be specific to the > > "soft reservation" changes. Any update to the efi memmap that pushes > > it over a page boundary triggers this failure. I.e. I can fix the > > problem by over-allocating the efi memmap and then page aligning the > > result. __early_ioremap "should" be handling this case, but it appears > > something else is messing this up. > > Found it. Neither this patch nor the soft reservation changes are at > fault, they are just helping to trigger a long standing bug in > efi_fake_memmap(). Its usage of efi_memmap_split_count() can over > count the number of splits needed for new entries. Consider the case > of 2 contiguous fake entries intersecting the end of a single entry. > The first call to efi_memmap_split_count() determines the resulting > split will be (old1, new1, old2), the second call determines (old1, > new2). The result is 2 splits when only 1 is needed to get a result of > (old1, new1, new2) and the new map ends up with an empty entry. > efi_memmap_install() interprets an empty entry as start = 0 end = > 0x and attempts an extra split / copy past the end of > the new map. > > I'll send a patch
Re: [PATCH] makedumpfile/s390: Add get_kaslr_offset() for s390x
> Hi Lianbo, > >> -Original Message- >> 在 2019年12月26日 11:38, lijiang 写道: >>> Hi, Kazu and Mikhail, >>> Hi Mikhail, > -Original Message- > Hi, > > On 12.12.2019 17:12, Kazuhito Hagio wrote: >> Hi Mikhail, >> >>> -Original Message- >>> Hello Kazu, >>> >>> I think we can try to generalize the kaslr offset extraction. >>> I won't speak for other architectures, but for s390 that >>> get_kaslr_offset_arm64() >>> should work fine. The only concern of mine is this TODO statement: >>> >>> if (_text <= vaddr && vaddr <= _end) { >>> DEBUG_MSG("info->kaslr_offset: %lx\n", info->kaslr_offset); >>> return info->kaslr_offset; >>> } else { >>> /* >>> * TODO: we need to check if it is vmalloc/vmmemmap/module >>> * address, we will have different offset >>> */ >>> return 0; >>> } >>> >>> Could you explain this one? >> >> Probably it was considered that the check would be needed to support >> the whole KASLR behavior when get_kaslr_offset_x86_64() was written >> originally. >> >> But in the current makedumpfile for x86_64 and arm64 supporting KASLR, >> the offset we need is the one for symbol addresses in vmlinux only. >> As I said below, module symbol addresses are retrieved from vmcore. >> Other addresses should not be passed to the function for now, as far >> as I know. >> >> So I think the TODO comment is confusing, and it would be better to >> remove it or change it to something like: >> /* >> * Returns 0 if vaddr does not need the offset to be added, >> * e.g. for module address. >> */ >> >> But if s390 uses get_kaslr_offset() in its arch-specific code to >> adjust addresses other than kernel text address, we might need to >> modify it for s390, not generalize it. > > Currently, s390 doesn't use get_kaslr_offset() in its arch-specific > code. OK, I pushed a patch that generalizes it to my test repository. Could you enable s390 to use it and test? https://github.com/k-hagio/makedumpfile/tree/add-get_kaslr_offset_general >>> >>> I enabled it on s390 as below and tested, it worked. > > Thank you for testing. > It's my pleasure. >>> >>> @@ -1075,7 +1075,7 @@ int is_iomem_phys_addr_s390x(unsigned long addr); >>> #define get_phys_base()stub_true() >>> #define get_machdep_info() get_machdep_info_s390x() >>> #define get_versiondep_info() stub_true() >>> -#define get_kaslr_offset(X)stub_false() >>> +#define get_kaslr_offset(X)get_kaslr_offset_general(X) >>> #define vaddr_to_paddr(X) vaddr_to_paddr_s390x(X) >>> >>> But, there is still a problem that needs to be improved. In the >>> find_kaslr_offsets(), >>> the value of SYMBOL(_stext) is always 0(zero) and it is passed to the >>> get_kaslr_offset(). >>> For the following code in the get_kaslr_offset_general(), it does not work >>> as expected. >>> ... >>> if (_text <= vaddr && vaddr <= _end) >>> return info->kaslr_offset; >>> else >>> return 0; > > I don't know why the SYMBOL(_stext) is passed to the get_kaslr_offset() > there, but > since the return value of get_kaslr_offset() is not used in the > find_kaslr_offsets(), > it's meaningless and not harmful. So it is not worth doing > READ_SYMBOL(_stext) there > for now. > Sounds good. >> >> In addition, the above code confused me, it will always return 0 on >> s390(please refer to my logs). > > The aim of get_kaslr_offset() here is only setting info->kaslr_offset to the > value > from vmcoreinfo for the SYMBOL_INIT() macro. > (get_kaslr_offset() returns the kaslr offset in the resolve_config_entry().) > Thanks for your explanation, Kazu. > But yeah, the get_kaslr_offset(SYMBOL(_stext)) is confusing and not good. > Passing 0 might be a bit better..? > Yes, looks good to me. Lianbo > Thanks, > Kazu > >> >> Thanks. >> >>> ... >>> Here is my log: >>> get_kaslr_offset_general: info->kaslr_offset: 67ebc000, _text:10, >>> _end:10ba000, vaddr:0 >>> >>> After applied the following patch, got the expected result. >>> int >>> find_kaslr_offsets() >>> { >>> @@ -3973,6 +4042,11 @@ find_kaslr_offsets() >>> * called this function between open_vmcoreinfo() and >>> * close_vmcoreinfo() >>> */ >>> + READ_SYMBOL("_stext", _stext); >>> + if (SYMBOL(_stext) == NOT_FOUND_SYMBOL) { >>> +ERRMSG("Can't get the symbol of _stext.\n"); >>> +goto out; >>> + } >>> get_kaslr_offset(SYMBOL(_stext)); >>> >>> Here is my log: >>> get_kaslr_offset_general: info->kaslr_offset: 67ebc000, _text:10, >>> _end:10ba000, vaddr:67fbc000 >>> >>> Basically, before using the value of SYMBOL(_stext), need to ensure that >>> the SYMBOL(_stext) is parsed >>> correctly.
Re: [PATCH] x86/efi: update e820 about reserved EFI boot services data to fix kexec breakage
On Sat, Dec 28, 2019 at 10:13 PM Dan Williams wrote: > > On Sat, Dec 28, 2019 at 12:54 PM Dan Williams > wrote: > > > > On Tue, Dec 3, 2019 at 11:53 PM Dave Young wrote: > > > > > > Michael Weiser reported he got below error during a kexec rebooting: > > > esrt: Unsupported ESRT version 2904149718861218184. > > > > > > The ESRT memory stays in EFI boot services data, and it was reserved > > > in kernel via efi_mem_reserve(). The initial purpose of the reservation > > > is to reuse the EFI boot services data across kexec reboot. For example > > > the BGRT image data and some ESRT memory like Michael reported. > > > > > > But although the memory is reserved it is not updated in X86 e820 table. > > > And kexec_file_load iterate system ram in io resource list to find places > > > for kernel, initramfs and other stuff. In Michael's case the kexec loaded > > > initramfs overwritten the ESRT memory and then the failure happened. > > > > > > Since kexec_file_load depends on the e820 to be updated, just fix this > > > by updating the reserved EFI boot services memory as reserved type in > > > e820. > > > > > > Originally any memory descriptors with EFI_MEMORY_RUNTIME attribute are > > > bypassed in the reservation code path because they are assumed as > > > reserved. > > > But the reservation is still needed for multiple kexec reboot. > > > And it is the only possible case we come here thus just drop the code > > > chunk then everything works without side effects. > > > > > > On my machine the ESRT memory sits in an EFI runtime data range, it does > > > not trigger the problem, but I successfully tested with BGRT instead. > > > both kexec_load and kexec_file_load work and kdump works as well. > > > > > > Signed-off-by: Dave Young > > > --- > > > arch/x86/platform/efi/quirks.c |6 ++ > > > 1 file changed, 2 insertions(+), 4 deletions(-) > > > > > > --- linux-x86.orig/arch/x86/platform/efi/quirks.c > > > +++ linux-x86/arch/x86/platform/efi/quirks.c > > > @@ -260,10 +260,6 @@ void __init efi_arch_mem_reserve(phys_ad > > > return; > > > } > > > > > > - /* No need to reserve regions that will never be freed. */ > > > - if (md.attribute & EFI_MEMORY_RUNTIME) > > > - return; > > > - > > > size += addr % EFI_PAGE_SIZE; > > > size = round_up(size, EFI_PAGE_SIZE); > > > addr = round_down(addr, EFI_PAGE_SIZE); > > > @@ -293,6 +289,8 @@ void __init efi_arch_mem_reserve(phys_ad > > > early_memunmap(new, new_size); > > > > > > efi_memmap_install(new_phys, num_entries); > > > + e820__range_update(addr, size, E820_TYPE_RAM, E820_TYPE_RESERVED); > > > + e820__update_table(e820_table); > > > } > > > > > > /* > > > > > > > Bisect says this change (commit af1648984828) is triggering a > > regression, likely not urgent, in my testing of the new efi_fake_mem= > > facility to allow memory to be marked "soft reserved" via the kernel > > command line (commit 199c84717612 x86/efi: Add efi_fake_mem support > > for EFI_MEMORY_SP). The following command line triggers the crash > > signature below: > > > > efi_fake_mem=4G@9G:0x4,4G@13G:0x4 > > > > However, this command line works ok: > > > > efi_fake_mem=8G@9G:0x4 > > > > So, something about multiple efi_fake_mem statements interacts badly > > with this change. Nothing obvious occurs to me at the moment, I'll > > keep debugging, but wanted to highlight this in the meantime in case > > someone else sees a deeper issue or the root cause. > > Still looking, but this failure does not seem to be specific to the > "soft reservation" changes. Any update to the efi memmap that pushes > it over a page boundary triggers this failure. I.e. I can fix the > problem by over-allocating the efi memmap and then page aligning the > result. __early_ioremap "should" be handling this case, but it appears > something else is messing this up. Found it. Neither this patch nor the soft reservation changes are at fault, they are just helping to trigger a long standing bug in efi_fake_memmap(). Its usage of efi_memmap_split_count() can over count the number of splits needed for new entries. Consider the case of 2 contiguous fake entries intersecting the end of a single entry. The first call to efi_memmap_split_count() determines the resulting split will be (old1, new1, old2), the second call determines (old1, new2). The result is 2 splits when only 1 is needed to get a result of (old1, new1, new2) and the new map ends up with an empty entry. efi_memmap_install() interprets an empty entry as start = 0 end = 0x and attempts an extra split / copy past the end of the new map. I'll send a patch to fix up efi_fake_memmap(). ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec