Re: [PATCH] x86/efi: update e820 about reserved EFI boot services data to fix kexec breakage
On Thu, Dec 05, 2019 at 06:55:45PM +0800, Dave Young wrote:
> > esrt: Unsupported ESRT version 2904149718861218184.
> >
> > The ESRT memory stays in EFI boot services data, and it was reserved
> > in kernel via efi_mem_reserve(). The initial purpose of the reservation
> > is to reuse the EFI boot services data across kexec reboot. For example
> > the BGRT image data and some ESRT memory like Michael reported.
> >
> > But although the memory is reserved it is not updated in the X86 E820
> > table, and kexec_file_load() iterates system RAM in the IO resource
> > list to find places for kernel, initramfs and other stuff. In Michael's
> > case the kexec loaded initramfs overwrote the ESRT memory and then the
> > failure happened.
> >
> > Since kexec_file_load() depends on the E820 table being updated, just
> > fix this by updating the reserved EFI boot services memory as reserved
> > type in E820.

> Thanks for amending this, and thanks to all for the review and testing.

Same from me, particularly for everyone's patience with my haphazard
guesswork around an area I clearly know nothing about. :)
--
Thanks,
Michael

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
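To make the effect of retyping a sub-range concrete, here is a minimal user-space sketch of what happens to a single usable range when one page in its middle is marked reserved. It is not the kernel's e820__range_update(); the names sketch_entry, sketch_push and sketch_range_update are hypothetical, and the range start 0x00100000 is an assumed reading of the truncated value in the listings in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for E820_TYPE_RAM / E820_TYPE_RESERVED. */
enum sketch_type { SKETCH_RAM = 1, SKETCH_RESERVED = 2 };

struct sketch_entry {
	uint64_t start;		/* first byte of the range */
	uint64_t end;		/* last byte of the range, inclusive */
	enum sketch_type type;
};

/* Append one entry to out[]; returns 0 on success, -1 if out[] is full. */
static int sketch_push(struct sketch_entry *out, size_t *count, size_t cap,
		       uint64_t start, uint64_t end, enum sketch_type type)
{
	if (*count >= cap)
		return -1;
	out[*count].start = start;
	out[*count].end = end;
	out[*count].type = type;
	(*count)++;
	return 0;
}

/*
 * Copy in[] to out[], retyping the part of every old_type entry that falls
 * inside [start, start + size) to new_type and splitting partly covered
 * entries. Returns the number of entries written, or 0 if out[] is too small.
 */
size_t sketch_range_update(const struct sketch_entry *in, size_t n,
			   struct sketch_entry *out, size_t cap,
			   uint64_t start, uint64_t size,
			   enum sketch_type old_type, enum sketch_type new_type)
{
	uint64_t end;
	size_t count = 0;
	size_t i;

	assert(size > 0);
	end = start + size - 1;

	for (i = 0; i < n; i++) {
		const struct sketch_entry *e = &in[i];
		uint64_t lo, hi;

		/* Entries of another type or outside the range pass through. */
		if (e->type != old_type || e->end < start || e->start > end) {
			if (sketch_push(out, &count, cap, e->start, e->end, e->type))
				return 0;
			continue;
		}

		lo = e->start > start ? e->start : start;
		hi = e->end < end ? e->end : end;

		/* Part below the updated range keeps the old type. */
		if (e->start < lo &&
		    sketch_push(out, &count, cap, e->start, lo - 1, old_type))
			return 0;
		/* The overlapping part gets the new type. */
		if (sketch_push(out, &count, cap, lo, hi, new_type))
			return 0;
		/* Part above the updated range keeps the old type. */
		if (hi < e->end &&
		    sketch_push(out, &count, cap, hi + 1, e->end, old_type))
			return 0;
	}
	return count;
}
```

Called on one usable entry covering 0x00100000-0x763f5fff with the page at 0x74dd2000, this yields three entries: usable, reserved, usable. That is the shape a resource walk over System RAM needs in order to skip the ESRT page.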
Re: [PATCH] x86/efi: update e820 about reserved EFI boot services data to fix kexec breakage
Hello Dave,

On Wed, Dec 04, 2019 at 03:59:17PM +0800, Dave Young wrote:
> > Signed-off-by: Dave Young
> > ---
> >  arch/x86/platform/efi/quirks.c |    6 ++----
> >  1 file changed, 2 insertions(+), 4 deletions(-)
> >
> > --- linux-x86.orig/arch/x86/platform/efi/quirks.c
> > +++ linux-x86/arch/x86/platform/efi/quirks.c
> > @@ -260,10 +260,6 @@ void __init efi_arch_mem_reserve(phys_ad
> >  		return;
> >  	}
> >
> > -	/* No need to reserve regions that will never be freed. */
> > -	if (md.attribute & EFI_MEMORY_RUNTIME)
> > -		return;
> > -
> >  	size += addr % EFI_PAGE_SIZE;
> >  	size = round_up(size, EFI_PAGE_SIZE);
> >  	addr = round_down(addr, EFI_PAGE_SIZE);
> > @@ -293,6 +289,8 @@ void __init efi_arch_mem_reserve(phys_ad
> >  	early_memunmap(new, new_size);
> >
> >  	efi_memmap_install(new_phys, num_entries);
> > +	e820__range_update(addr, size, E820_TYPE_RAM, E820_TYPE_RESERVED);
> > +	e820__update_table(e820_table);
> >  }
> >
> >  /*

> Michael, could you do one more test and provide a Tested-by if it works
> for you?

Did three successful kexecs in sequence of mainline 5.4.0 plus the patch
(had problems getting recent -next to boot on my machine). The ESRT region
stayed reserved and intact, so the "Invalid version" error message is
gone.

Tested-by: Michael Weiser
--
Thanks!
Michael
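The three alignment lines in the quoted hunk widen the byte range to whole EFI pages before it is retyped. A minimal user-space sketch of that arithmetic follows, assuming a 4 KiB EFI page; sketch_range and sketch_page_align are hypothetical names, not kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_EFI_PAGE_SIZE 0x1000ULL	/* assumed 4 KiB EFI page */

struct sketch_range {
	uint64_t addr;	/* page-aligned start */
	uint64_t size;	/* length in bytes, a multiple of the page size */
};

/* Widen [addr, addr + size) so that it starts and ends on page boundaries. */
struct sketch_range sketch_page_align(uint64_t addr, uint64_t size)
{
	struct sketch_range r;

	assert(size > 0);

	/* Grow the size by the offset of addr within its page ... */
	size += addr % SKETCH_EFI_PAGE_SIZE;
	/* ... round the size up to a whole number of pages ... */
	size = (size + SKETCH_EFI_PAGE_SIZE - 1) / SKETCH_EFI_PAGE_SIZE
		* SKETCH_EFI_PAGE_SIZE;
	/* ... and move the start down to the page boundary. */
	addr -= addr % SKETCH_EFI_PAGE_SIZE;

	r.addr = addr;
	r.size = size;
	return r;
}
```

For the ESRT range reported in this thread, 0x74dd2f98 up to 0x74dd2fd0 (0x38 bytes), this gives addr 0x74dd2000 and size 0x1000, i.e. the single page 0x74dd2000-0x74dd2fff that the e820 update message refers to.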
Re: kexec_file overwrites reserved EFI ESRT memory
Hi Dave,

On Tue, Dec 03, 2019 at 07:54:35PM +0800, Dave Young wrote:
> > Neither adding add_efi_memmap nor adding your patch and setting that option
> > does make the ESRT memory region appear in /proc/iomem. kexec_file still
> > loads the kernel across the ESRT region.

> Hmm, sorry, my bad, actually add_efi_memmap does not consider the
> EFI_MEMORY_RUNTIME attribute, it only reads the memory descriptor types.

> Will read your replied information later, did not get time today, but
> probably below chunk can help?

> diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
> index 3b9fd679cea9..516307617621 100644
> --- a/arch/x86/platform/efi/quirks.c
> +++ b/arch/x86/platform/efi/quirks.c
> @@ -293,6 +293,8 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
>  	early_memunmap(new, new_size);
>  	efi_memmap_install(new_phys, num_entries);
> +	e820__range_update(addr, size, E820_TYPE_RAM, E820_TYPE_RESERVED);
> +	e820__update_table(e820_table);
>  }
>  /*

Yes, that did it:

-0fff : Reserved
1000-0009efff : System RAM
0009f000-000f : Reserved
000a-000b : PCI Bus :00
000e-000e3fff : PCI Bus :00
000e4000-000e7fff : PCI Bus :00
000e8000-000ebfff : PCI Bus :00
000ec000-000e : PCI Bus :00
000f-000f : PCI Bus :00
000f-000f : System ROM
0010-74dd1fff : System RAM
6500-6aff : Crash kernel
74dd2000-74dd2fff : Reserved <- ESRT
74dd3000-763f5fff : System RAM
763f6000-79974fff : Reserved
79975000-799f1fff : ACPI Tables
799f2000-79aa6fff : ACPI Non-volatile Storage
79a17000-79a17fff : USBC000:00

[0.001381] esrt: Reserving ESRT space from 0x74dd2f98 to 0x74dd2fd0.
[0.001382] memblock_reserve: [0x74dd2f98-0x74dd2fcf] efi_mem_reserve+0x1d/0x2b
[0.001383] memblock_reserve: [0x0009e640-0x0009efcf] memblock_alloc_range_nid+0x93/0xfa
[0.001384] e820: update [mem 0x74dd2000-0x74dd2fff] usable ==> reserved
[...]
[0.043610] PM: Registered nosave memory: [mem 0x-0x0fff]
[0.043611] memblock_alloc_try_nid: 32 bytes align=0x40 nid=-1 from=0x max_addr=0x __register_nosave_region+0x6b/0xca
[0.043612] memblock_reserve: [0x00047dff95c0-0x00047dff95df] memblock_alloc_range_nid+0x93/0xfa
[0.043613] PM: Registered nosave memory: [mem 0x0009f000-0x000f]
[0.043615] memblock_alloc_try_nid: 32 bytes align=0x40 nid=-1 from=0x max_addr=0x __register_nosave_region+0x6b/0xca
[0.043616] memblock_reserve: [0x00047dff9580-0x00047dff959f] memblock_alloc_range_nid+0x93/0xfa
[0.043617] PM: Registered nosave memory: [mem 0x74dd2000-0x74dd2fff] < ESRT
[0.043618] memblock_alloc_try_nid: 32 bytes align=0x40 nid=-1 from=0x max_addr=0x __register_nosave_region+0x6b/0xca
[0.043619] memblock_reserve: [0x00047dff9540-0x00047dff955f] memblock_alloc_range_nid+0x93/0xfa
[0.043620] PM: Registered nosave memory: [mem 0x763f6000-0x79974fff]
[0.043620] PM: Registered nosave memory: [mem 0x79975000-0x799f1fff]
[0.043621] PM: Registered nosave memory: [mem 0x799f2000-0x79aa6fff]
[0.043621] PM: Registered nosave memory: [mem 0x79aa7000-0x7a40dfff]
[...]
[5.993928] PCI: pci_cache_line_size set to 64 bytes
[5.994563] e820: reserve RAM buffer [mem 0x0009f000-0x0009]
[5.994565] e820: reserve RAM buffer [mem 0x74dd2000-0x77ff] <- ESRT
[5.994565] e820: reserve RAM buffer [mem 0x763f6000-0x77ff]
[5.994566] e820: reserve RAM buffer [mem 0x7a40f000-0x7bff]
[5.994567] e820: reserve RAM buffer [mem 0x47e00-0x47fff]
[5.995513] acpi PNP0C14:02: duplicate WMI GUID 05901221-D566-11D1-B2F0-00A0C9062910 (first instance was on PNP0C14:01)
[5.995549] acpi PNP0C14:03: duplicate WMI GUID 05901221-D566-11D1-B2F0-00A0C9062910 (first instance was on PNP0C14:01)
[...]
[ 86.508053] kexec-bzImage64: Loaded purgatory at 0x98000
[ 86.508056] kexec_file: Considering 0x1000-0x9efff
[ 86.508057] kexec-bzImage64: Loaded boot_param, command line and misc at 0x96000 bufsz=0x1240 memsz=0x1240
[ 86.508057] kexec_file: Considering 0x10-0x74dd1fff
[ 86.508058] kexec-bzImage64: Loaded 64bit kernel at 0x7200 bufsz=0x1140888 memsz=0x24b7000
[ 86.508058] kexec-bzImage64: Final command line is:
[ 86.584668] kexec_file: Loading segment 0: buf=0xd5ec82bc bufsz=0x5000 mem=0x98000 memsz=0x6000
[ 86.584672] kexec_file: Loading segment 1: buf=0xaf539c69 bufsz=0x1240 mem=0x96000 memsz=0x2000
[ 86.584674] kexec_file: Loading segment 2: buf=0x29f9b9a8 bufsz=0x1140888 mem=0x7200 memsz=0x24b7000 < not ESRT :)

And no more invalid version error message from the kexec'd kernel.
--
Thanks,
Michael
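The reason the extra reserved entry changes where the kernel lands can be sketched as follows. This is a simplified user-space model, not the kernel's kexec_file code: it walks System RAM ranges from low to high and, inside the first range that is large enough, picks the highest aligned address at which the segment still fits. The names sketch_res and sketch_find_hole are hypothetical, the 16 MiB alignment is an assumption, and the range addresses are an assumed reading of the partly truncated values in the logs above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_res {
	uint64_t start;	/* first byte of a System RAM range */
	uint64_t end;	/* last byte, inclusive */
};

/*
 * Return the load address for a segment of memsz bytes, or 0 if no range
 * in ram[] can hold it. Ranges are tried in array order; within a range
 * the highest address aligned to align (a power of two) is chosen.
 */
uint64_t sketch_find_hole(const struct sketch_res *ram, size_t n,
			  uint64_t memsz, uint64_t align)
{
	size_t i;

	assert(memsz > 0);
	assert(align != 0 && (align & (align - 1)) == 0);

	for (i = 0; i < n; i++) {
		uint64_t len = ram[i].end - ram[i].start + 1;
		uint64_t cand;

		if (len < memsz)
			continue;	/* range too small, try the next one */

		/* Highest possible start, then aligned down. */
		cand = (ram[i].end - memsz + 1) & ~(align - 1);
		if (cand >= ram[i].start)
			return cand;	/* aligning down kept us inside the range */
	}
	return 0;
}
```

With a single RAM range ending at 0x763f5fff, a 0x24be000-byte segment lands at 0x73000000 and therefore covers the ESRT page at 0x74dd2000. Once that page splits the range, the same search stays below it and returns 0x72000000.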
Re: kexec_file overwrites reserved EFI ESRT memory
Hi Dave,

On Mon, Dec 02, 2019 at 05:05:20PM +0800, Dave Young wrote:
> > It seems a serious problem, the EFI modified memmap does not get an
> > /proc/iomem resource update, but kexec_file relies on /proc/iomem in
> > X86.
> >
> > There is a question from Sai about why add_efi_memmap is not enabled by
> > default:
> > https://www.spinics.net/lists/linux-mm/msg185166.html

Incidentally, a data point I did not think to mention: I boot the
kernel as an EFI application directly from the firmware, as a boot entry
with compiled-in initrd and command line:

$ grep EFI nobak/kernel/linux/.config
CONFIG_EFI=y
CONFIG_EFI_STUB=y
# CONFIG_EFI_MIXED is not set
CONFIG_DMI_SCAN_MACHINE_NON_EFI_FALLBACK=y
# EFI (Extensible Firmware Interface) Support
CONFIG_EFI_VARS=m
CONFIG_EFI_ESRT=y
CONFIG_EFI_VARS_PSTORE=m
# CONFIG_EFI_VARS_PSTORE_DEFAULT_DISABLE is not set
CONFIG_EFI_RUNTIME_MAP=y
# CONFIG_EFI_FAKE_MEMMAP is not set
CONFIG_EFI_RUNTIME_WRAPPERS=y
# CONFIG_EFI_BOOTLOADER_CONTROL is not set
# CONFIG_EFI_CAPSULE_LOADER is not set
# CONFIG_EFI_TEST is not set
# CONFIG_EFI_RCI2_TABLE is not set
# end of EFI (Extensible Firmware Interface) Support
CONFIG_UEFI_CPER=y
CONFIG_UEFI_CPER_X86=y
CONFIG_EFI_EARLYCON=y
CONFIG_EFI_PARTITION=y
CONFIG_FB_EFI=y
CONFIG_EFIVAR_FS=y
# CONFIG_EFI_PGT_DUMP is not set

$ grep CMDLINE nobak/kernel/linux/.config
CONFIG_CMDLINE_BOOL=y
CONFIG_CMDLINE="root=UUID=97[...]e4 rd.luks.uuid=8a[...]c3 rd.luks.allow-discards=8a[...]c3 mem_sleep_default=deep resume=UUID=97[...]e4 resume_offset=96256 efi=debug memblock=debug"
CONFIG_CMDLINE_OVERRIDE=y
# CONFIG_BLK_CMDLINE_PARSER is not set
# CONFIG_CMDLINE_PARTITION is not set
CONFIG_FB_CMDLINE=y

$ efibootmgr -v
BootCurrent: 000A
Timeout: 2 seconds
BootOrder: 000A,0009,0008,0005,0007,0006,0004,0002,0001,,0003
[...]
Boot0005* gentoo-5.4.0-next-20191127+-clear HD(1,GPT,e7[...]f2,0x800,0x64000)/File(\kernel-5.4.0-next-20191127+-clear)
[...]
Boot000A* gentoo-5.4.1-gentoo HD(1,GPT,e7[...]f2,0x800,0x64000)/File(\kernel-5.4.1-gentoo)

So there's no boot loader that could construct an e820 table for the
kernel to consume. I understand it's then up to the EFI stub to come up
with an e820 table from the EFI memory map.

> > Long time ago the add_efi_memmap is only enabled in case we explicitly
> > enable it on cmdline, I'm not sure if we can do it by default, maybe we
> > should. Need opinion from X86 maintainers..

> > Can you try below diff see if it works for you? (not tested, and need
> > explicitly 'add_efi_memmap' in kernel cmdline param)

Neither adding add_efi_memmap nor adding your patch and setting that
option makes the ESRT memory region appear in /proc/iomem. kexec_file
still loads the kernel across the ESRT region.

What occurs to me is that the ESRT memory region does not appear in any
externally provided memory map. Neither e820 nor EFI seem to declare it.
Is that expected or a bug of my particular system?

For example, the e820 map (derived from the EFI map by the EFI stub?)
has these regions:

BIOS-provided physical RAM map:
BIOS-e820: [mem 0x-0x0009efff] usable
BIOS-e820: [mem 0x0009f000-0x000f] reserved
BIOS-e820: [mem 0x0010-0x763f5fff] usable
BIOS-e820: [mem 0x763f6000-0x79974fff] reserved
BIOS-e820: [mem 0x79975000-0x799f1fff] ACPI data
BIOS-e820: [mem 0x799f2000-0x79aa6fff] ACPI NVS
BIOS-e820: [mem 0x79aa7000-0x7a40dfff] reserved
BIOS-e820: [mem 0x7a40e000-0x7a40efff] usable
BIOS-e820: [mem 0x7a40f000-0x7fff] reserved
BIOS-e820: [mem 0xf000-0xf7ff] reserved
BIOS-e820: [mem 0xfe00-0xfe010fff] reserved
BIOS-e820: [mem 0xfec0-0xfec00fff] reserved
BIOS-e820: [mem 0xfed0-0xfed03fff] reserved
BIOS-e820: [mem 0xfee0-0xfee00fff] reserved
BIOS-e820: [mem 0xff00-0x] reserved
BIOS-e820: [mem 0x0001-0x00047dff] usable

The ESRT region sits smack in the middle of a large system RAM region:

BIOS-e820: [mem 0x0010-0x763f5fff] usable

Consequently, the relevant part of /proc/iomem looks like this:

-0fff : Reserved
1000-0009efff : System RAM
0009f000-000f : Reserved
000a-000b : PCI Bus :00
000e-000e3fff : PCI Bus :00
000e4000-000e7fff : PCI Bus :00
000e8000-000ebfff : PCI Bus :00
000ec000-000e : PCI Bus :00
000f-000f : PCI Bus :00
000f-000f : System ROM
0010-763f5fff : System RAM
6500-6aff : Crash kernel
763f6000-79974fff : Reserved
79975000-799f1fff : ACPI Tables
799f2000-79aa6fff : ACPI Non-volatile Storage
79a17000-79a17fff : USBC000:00

What it would need to look like for kexec to leave ESRT alone, I guess,
is:

-0fff : Reserved
1000-0009efff : System RAM
0009f000-000f : Reserved
Re: kexec_file overwrites reserved EFI ESRT memory
Hello Dave,

On Mon, Nov 25, 2019 at 01:52:01PM +0800, Dave Young wrote:
> > > Fundamentally when deciding where to place a new kernel kexec (either
> > > user space or the in kernel kexec_file implementation) needs to be able
> > > to ask the question which memory areas are reserved.
[...]
> > > So my question is why doesn't the ESRT reservation wind up in
> > > /proc/iomem?
> >
> > My guess is that the focus was that some EFI structures need to be kept
> > around across the life cycle of *one* running kernel and
> > memblock_reserve() was enough for that. Marking them so they survive
> > kexecing another kernel might just never have cropped up thus far. Ard
> > or Matt would know.

> Can you check your un-reserved memory, if your memory falls into EFI
> BOOT* then in X86 you can use something like below if it is not covered:

> void __init efi_esrt_init(void)
> {
> ...
> 	pr_info("Reserving ESRT space from %pa to %pa.\n", &esrt_data, &end);
> 	if (md.type == EFI_BOOT_SERVICES_DATA)
> 		efi_mem_reserve(esrt_data, esrt_data_size);
> ...
> }

Please bear with me if I'm a bit slow on the uptake here: On my machine,
the esrt module reports at boot:

[0.001244] esrt: Reserving ESRT space from 0x74dd2f98 to 0x74dd2fd0.
This area is of type "Boot Data" (== BOOT_SERVICES_DATA), which makes the
code you quote reserve it using memblock_reserve(), as shown by
memblock=debug:

[0.001246] memblock_reserve: [0x74dd2f98-0x74dd2fcf] efi_mem_reserve+0x1d/0x2b

It also calls into arch/x86/platform/efi/quirks.c:efi_arch_mem_reserve(),
which tags it as EFI_MEMORY_RUNTIME while the surrounding ones aren't, as
shown by efi=debug:

[0.178111] efi: mem10: [Boot Data | | | | | | | | | |WB|WT|WC|UC] range=[0x74dd3000-0x75becfff] (14MB)
[0.178113] efi: mem11: [Boot Data |RUN| | | | | | | | |WB|WT|WC|UC] range=[0x74dd2000-0x74dd2fff] (0MB)
[0.178114] efi: mem12: [Boot Data | | | | | | | | | |WB|WT|WC|UC] range=[0x6d635000-0x74dd1fff] (119MB)

This prevents arch/x86/platform/efi/quirks.c:efi_free_boot_services()
from calling __memblock_free_late() on it. And indeed, memblock=debug does
not report this area as being freed while the surrounding ones are:

[0.178369] __memblock_free_late: [0x74dd3000-0x75becfff] efi_free_boot_services+0x126/0x1f8
[0.178658] __memblock_free_late: [0x6d635000-0x74dd1fff] efi_free_boot_services+0x126/0x1f8

The esrt area does not show up in /proc/iomem though:

0010-763f5fff : System RAM
6200-62a00d80 : Kernel code
62c0-62f15fff : Kernel rodata
6300-630ea8bf : Kernel data
63fed000-641f : Kernel bss
6500-6aff : Crash kernel

And thus kexec loads the new kernel right over that area, as shown when
enabling -DDEBUG on kexec_file.c (0x74dd3000 being in between 0x73000000
and 0x73000000+0x24be000 = 0x754be000):

[ 650.007695] kexec_file: Loading segment 0: buf=0x3a9c84d6 bufsz=0x5000 mem=0x98000 memsz=0x6000
[ 650.007699] kexec_file: Loading segment 1: buf=0x17b2b9e6 bufsz=0x1240 mem=0x96000 memsz=0x2000
[ 650.007703] kexec_file: Loading segment 2: buf=0xfdf72ba2 bufsz=0x1150888 mem=0x73000000 memsz=0x24be000

... because it looks for any memory hole large enough in iomem resources
tagged as System RAM, from which 0x74dd2000-0x74dd2fff would then need to
be excluded on my system.
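The overlap argument above is plain interval arithmetic. As a minimal sketch, with the hypothetical helper name sketch_overlaps and the segment start 0x73000000 taken from the sum 0x754be000 quoted above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if the half-open byte ranges [a_start, a_start + a_size)
 * and [b_start, b_start + b_size) share at least one byte.
 */
bool sketch_overlaps(uint64_t a_start, uint64_t a_size,
		     uint64_t b_start, uint64_t b_size)
{
	assert(a_size > 0 && b_size > 0);

	return a_start < b_start + b_size && b_start < a_start + a_size;
}
```

A segment of 0x24be000 bytes at 0x73000000 ends at 0x754be000, so it overlaps the ESRT page at 0x74dd2000. A segment that ends at or below 0x74dd2000 would not.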
Looking some more at efi_arch_mem_reserve() I see that it also registers
the area with efi.memmap and installs it using efi_memmap_install(),
which seems to call memremap(MEMREMAP_WB) on it. From my understanding of
the comments in the source of memremap(), MEMREMAP_WB specifically does
*not* reserve that memory in any way.

> Unfortunately I noticed there are different requirements/ways for
> different types of "reserved" memory. But that is another topic..

I tried to reserve the area with something like this:

diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 4de244683a7e..b86a5df027a2 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -249,6 +249,7 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
 	efi_memory_desc_t md;
 	int num_entries;
 	void *new;
+	struct resource *res;

 	if (efi_mem_desc_lookup(addr, &md) ||
 	    md.type != EFI_BOOT_SERVICES_DATA) {
@@ -294,6 +295,21 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
 	early_memunmap(new, new_size);

 	efi_memmap_install(new_phys, num_entries);
+
+	res = memblock_alloc(sizeof(*res), SMP_CACHE_BYTES);
+	if (!res) {
+		pr_err("Failed to allocate EFI io resource allocator for "
+		       "0x%llx:0x%llx", mr.range.start, mr.range.end);
+		return;
+	}
+
+	res->start = mr.range.start;
+
kexec_file overwrites reserved EFI ESRT memory
Hello Eric, Hello Ard,

on my machine, kexec_file loads the normal (not crash) kernel image right
across the EFI ESRT reserved memory range:

esrt: Reserving ESRT space from 0x74dd6f98 to 0x74dd6fd0.
[...]
kexec_file: kernel signature verification successful.
kexec_file: Loading segment 0: buf=0xe99b31ad bufsz=0x5000 mem=0x91000 memsz=0x6000
kexec_file: Loading segment 1: buf=0xe45cdeb8 bufsz=0x1240 mem=0x8f000 memsz=0x2000
kexec_file: Loading segment 2: buf=0x096e6de9 bufsz=0x1133888 mem=0x7300 memsz=0x249a000

This causes the following message by the kexec'd kernel:

esrt: Unsupported ESRT version 2904149718861218184.

(The image is rather large at 18MiB as it has a built-in initrd.)

Poking at the involved code a bit (as a layman) I found that the EFI code
reserves the memory range using memblock_reserve(), which is by all
appearances correctly handed over to the buddy allocator as
in-use/reserved. kexec_file on the other hand by default looks at iomem
regions of type System RAM using walk_system_ram_res() and does not seem
to have that particular information available to consider.

(As may have become clear from this explanation I'm still somewhat fuzzy
(to put it mildly) on the relationship of memblock, buddy and slab
allocator and how (if at all) kexec_file interacts with them to a.) find
available memory regions for the new kernel to load to and b.) tell them
where it loaded the new kernel to so they don't use it any more.)

As is to be expected, activating CONFIG_ARCH_KEEP_MEMBLOCK makes
kexec_file use the preserved memblock structures and indeed end up using
totally different memory regions and gets rid of the message:

kexec_file: kernel signature verification successful.
kexec_file: Loading segment 0: buf=0x2dea71f8 bufsz=0x5000 mem=0x47df8e000 memsz=0x6000
kexec_file: Loading segment 1: buf=0x0686ff17 bufsz=0x1240 mem=0x47df8c000 memsz=0x2000
kexec_file: Loading segment 2: buf=0xfc444e67 bufsz=0x1133888 mem=0x46900 memsz=0x2497000

This is with 5.3.11 mainline and linux-next 5.4.0-rc8-next-20191122.

I'm not actually trying to use ESRT for anything at this point but want
to stop the boot message from messing up silent boot and suspect that
this could potentially happen to other, more important EFI memory regions
as well.

I'm willing to chase this further but at this point I'm wondering whether
it's the EFI code not reserving this memory area with enough emphasis (as
iomem?) or kexec_file not checking usability of candidate memory regions
rigorously enough (based on what other criteria?). Are there maybe any
upcoming patches or subsystem-specific kernel trees I should try?

Please let me know what other information may be helpful or if I should
open a bug on bugzilla.kernel.org.

Boot messages on normal boot:

Linux version 5.3.11-gentoo (m@n) (gcc version 9.2.0 (Gentoo 9.2.0-r2 p3)) #29 SMP Thu Nov 21 20:40:28 CET 2019
Command line:
x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds registers'
x86/fpu: Supporting XSAVE feature 0x010: 'MPX CSR'
x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
x86/fpu: xstate_offset[3]: 832, xstate_sizes[3]: 64
x86/fpu: xstate_offset[4]: 896, xstate_sizes[4]: 64
x86/fpu: Enabled xstate features 0x1f, context size is 960 bytes, using 'compacted' format.
BIOS-provided physical RAM map:
BIOS-e820: [mem 0x-0x0009efff] usable
BIOS-e820: [mem 0x0009f000-0x000f] reserved
BIOS-e820: [mem 0x0010-0x763fafff] usable
BIOS-e820: [mem 0x763fb000-0x79979fff] reserved
BIOS-e820: [mem 0x7997a000-0x799f6fff] ACPI data
BIOS-e820: [mem 0x799f7000-0x79aabfff] ACPI NVS
BIOS-e820: [mem 0x79aac000-0x7a40dfff] reserved
BIOS-e820: [mem 0x7a40e000-0x7a40efff] usable
BIOS-e820: [mem 0x7a40f000-0x7fff] reserved
BIOS-e820: [mem 0xf000-0xf7ff] reserved
BIOS-e820: [mem 0xfe00-0xfe010fff] reserved
BIOS-e820: [mem 0xfec0-0xfec00fff] reserved
BIOS-e820: [mem 0xfed0-0xfed03fff] reserved
BIOS-e820: [mem 0xfee0-0xfee00fff] reserved
BIOS-e820: [mem 0xff00-0x] reserved
BIOS-e820: [mem 0x0001-0x00047dff] usable
NX (Execute Disable) protection: active
efi: EFI v2.70 by American Megatrends
efi: ACPI 2.0=0x79993000 ACPI=0x79993000 TPMFinalLog=0x79a35000 SMBIOS=0x7a1cf000 SMBIOS 3.0=0x7a1ce000 ESRT=0x74dd6f98 TPMEventLog=0x6d634018
efi: mem00: [Conventional Memory| | | | | | | | |WB|WT|WC|UC] range=[0x-0x0fff] (0MB)
efi: mem01: [Loader Data| | | | | | | |
Re: kexec_file overwrites reserved EFI ESRT memory
Hi Eric,

On Fri, Nov 22, 2019 at 02:00:22PM -0600, Eric W. Biederman wrote:
> > esrt: Unsupported ESRT version 2904149718861218184.
> >
> > (The image is rather large at 18MiB as it has a built-in initrd.)

> When did x86_64 get support for ARCH_KEEP_MEMBLOCK? I can't find it
> anywhere.

No, it hasn't. I temporarily hacked that in to see if it'd change
anything and it did. Sorry for not being clearer about that.

> Fundamentally when deciding where to place a new kernel kexec (either
> user space or the in kernel kexec_file implementation) needs to be able
> to ask the question which memory areas are reserved.

> What the buddy
> allocator does is unimportant as kexec copies memory from all over
> the place and places it in the destined memory addresses at the
> time of the kexec operation.

> So my question is why doesn't the ESRT reservation wind up in
> /proc/iomem?

My guess is that the focus was that some EFI structures need to be kept
around across the life cycle of *one* running kernel and
memblock_reserve() was enough for that. Marking them so they survive
kexecing another kernel might just never have cropped up thus far. Ard or
Matt would know.

> Are you dealing with an embedded port that is being clever?

I'm not an expert but think it's rather the opposite: It's just a memory
area provided by EFI containing some potentially interesting information
about the EFI firmware structure itself. The aim is to aid firmware
upgrades.

This information needs to survive kexec so the user would be able to use
that information (e.g. for upgrades) after a kexec. So apart from leaving
that memory untouched, I guess it could also be copied over to a staging
area by kexec explicitly to be preserved across the kexec.

Or it could be blanked out in such a way that the esrt driver would not
find it after kexec and just be unavailable, if it's decided that you
should only use data about a firmware for upgrades that you really just
used to boot.
I guess a bigger question could be asked whether it would actually be
useful and safe for esrt to be available after kexec.

> Or is there some subtle breakage now that x86 has memblock support that
> /proc/iomem is no longer being properly maintained?

Uuuh, let me backpedal very hard here: x86 has not gained memblock
preserve support. That was just me mucking about. Sorry.
--
Thanks,
Michael