Re: kexec_file overwrites reserved EFI ESRT memory
On 12/03/19 at 10:11pm, Michael Weiser wrote: > Hi Dave, > > On Tue, Dec 03, 2019 at 07:54:35PM +0800, Dave Young wrote: > > > > Neither adding add_efi_memmap nor adding your patch and setting that > > > option > > > does make the ESRT memory region appear in /proc/iomem. kexec_file still > > > loads the kernel across the ESRT region. > > Hmm, sorry, my bad, actuall add_efi_memmap does not consider the > > EFI_MEMORY_RUNTIME attribute, it only reads the memory descriptor types. > > > Will read your replied information later, did not get time today, but > > probably below chunk can help? > > > diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c > > index 3b9fd679cea9..516307617621 100644 > > --- a/arch/x86/platform/efi/quirks.c > > +++ b/arch/x86/platform/efi/quirks.c > > @@ -293,6 +293,8 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 > > size) > > early_memunmap(new, new_size); > > > efi_memmap_install(new_phys, num_entries); > > + e820__range_update(addr, size, E820_TYPE_RAM, E820_TYPE_RESERVED); > > + e820__update_table(e820_table); > > } > > > /* > > Yes, that did it: > > -0fff : Reserved > 1000-0009efff : System RAM > 0009f000-000f : Reserved > 000a-000b : PCI Bus :00 > 000e-000e3fff : PCI Bus :00 > 000e4000-000e7fff : PCI Bus :00 > 000e8000-000ebfff : PCI Bus :00 > 000ec000-000e : PCI Bus :00 > 000f-000f : PCI Bus :00 > 000f-000f : System ROM > 0010-74dd1fff : System RAM > 6500-6aff : Crash kernel > 74dd2000-74dd2fff : Reserved <- ESRT > 74dd3000-763f5fff : System RAM > 763f6000-79974fff : Reserved > 79975000-799f1fff : ACPI Tables > 799f2000-79aa6fff : ACPI Non-volatile Storage > 79a17000-79a17fff : USBC000:00 Ok, good to know it works. I will think about it and file a patch later. There are more things to consider, eg. kexec reboot multiple times, userspace kexec loader etc. 
If we choose to fix it in the kexec_file path by avoiding those regions, then we need to do the same in userspace, and there will be compatibility issues, so I would still prefer to go with the approach you tested.

BTW, on my laptop the ESRT stays in an EFI runtime area, so I do not see the problem. This should be machine/firmware specific. Here is the info on my laptop:

[0.00] efi: mem34: [Runtime Data |RUN| | | | | | | |WB|WT|WC|UC] range=[0x7a4b-0x7a676fff] (1MB)
[0.020670] esrt: Reserving ESRT space from 0x7a4ec000 to 0x7a4ec088.

Thanks
Dave
___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
Re: kexec_file overwrites reserved EFI ESRT memory
Hi Dave, On Tue, Dec 03, 2019 at 07:54:35PM +0800, Dave Young wrote: > > Neither adding add_efi_memmap nor adding your patch and setting that option > > does make the ESRT memory region appear in /proc/iomem. kexec_file still > > loads the kernel across the ESRT region. > Hmm, sorry, my bad, actuall add_efi_memmap does not consider the > EFI_MEMORY_RUNTIME attribute, it only reads the memory descriptor types. > Will read your replied information later, did not get time today, but > probably below chunk can help? > diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c > index 3b9fd679cea9..516307617621 100644 > --- a/arch/x86/platform/efi/quirks.c > +++ b/arch/x86/platform/efi/quirks.c > @@ -293,6 +293,8 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 > size) > early_memunmap(new, new_size); > efi_memmap_install(new_phys, num_entries); > + e820__range_update(addr, size, E820_TYPE_RAM, E820_TYPE_RESERVED); > + e820__update_table(e820_table); > } > /* Yes, that did it: -0fff : Reserved 1000-0009efff : System RAM 0009f000-000f : Reserved 000a-000b : PCI Bus :00 000e-000e3fff : PCI Bus :00 000e4000-000e7fff : PCI Bus :00 000e8000-000ebfff : PCI Bus :00 000ec000-000e : PCI Bus :00 000f-000f : PCI Bus :00 000f-000f : System ROM 0010-74dd1fff : System RAM 6500-6aff : Crash kernel 74dd2000-74dd2fff : Reserved <- ESRT 74dd3000-763f5fff : System RAM 763f6000-79974fff : Reserved 79975000-799f1fff : ACPI Tables 799f2000-79aa6fff : ACPI Non-volatile Storage 79a17000-79a17fff : USBC000:00 [0.001381] esrt: Reserving ESRT space from 0x74dd2f98 to 0x74dd2fd0. [0.001382] memblock_reserve: [0x74dd2f98-0x74dd2fcf] efi_mem_reserve+0x1d/0x2b [0.001383] memblock_reserve: [0x0009e640-0x0009efcf] memblock_alloc_range_nid+0x93/0xfa [0.001384] e820: update [mem 0x74dd2000-0x74dd2fff] usable ==> reserved [...] 
[0.043610] PM: Registered nosave memory: [mem 0x-0x0fff] [0.043611] memblock_alloc_try_nid: 32 bytes align=0x40 nid=-1 from=0x max_addr=0x __register_nosave_region+0x6b/0xca [0.043612] memblock_reserve: [0x00047dff95c0-0x00047dff95df] memblock_alloc_range_nid+0x93/0xfa [0.043613] PM: Registered nosave memory: [mem 0x0009f000-0x000f] [0.043615] memblock_alloc_try_nid: 32 bytes align=0x40 nid=-1 from=0x max_addr=0x __register_nosave_region+0x6b/0xca [0.043616] memblock_reserve: [0x00047dff9580-0x00047dff959f] memblock_alloc_range_nid+0x93/0xfa [0.043617] PM: Registered nosave memory: [mem 0x74dd2000-0x74dd2fff] < ESRT [0.043618] memblock_alloc_try_nid: 32 bytes align=0x40 nid=-1 from=0x max_addr=0x __register_nosave_region+0x6b/0xca [0.043619] memblock_reserve: [0x00047dff9540-0x00047dff955f] memblock_alloc_range_nid+0x93/0xfa [0.043620] PM: Registered nosave memory: [mem 0x763f6000-0x79974fff] [0.043620] PM: Registered nosave memory: [mem 0x79975000-0x799f1fff] [0.043621] PM: Registered nosave memory: [mem 0x799f2000-0x79aa6fff] [0.043621] PM: Registered nosave memory: [mem 0x79aa7000-0x7a40dfff] [...] [5.993928] PCI: pci_cache_line_size set to 64 bytes [5.994563] e820: reserve RAM buffer [mem 0x0009f000-0x0009] [5.994565] e820: reserve RAM buffer [mem 0x74dd2000-0x77ff] <- ESRT [5.994565] e820: reserve RAM buffer [mem 0x763f6000-0x77ff] [5.994566] e820: reserve RAM buffer [mem 0x7a40f000-0x7bff] [5.994567] e820: reserve RAM buffer [mem 0x47e00-0x47fff] [5.995513] acpi PNP0C14:02: duplicate WMI GUID 05901221-D566-11D1-B2F0-00A0C9062910 (first instance was on PNP0C14:01) [5.995549] acpi PNP0C14:03: duplicate WMI GUID 05901221-D566-11D1-B2F0-00A0C9062910 (first instance was on PNP0C14:01) [...] 
[ 86.508053] kexec-bzImage64: Loaded purgatory at 0x98000
[ 86.508056] kexec_file: Considering 0x1000-0x9efff
[ 86.508057] kexec-bzImage64: Loaded boot_param, command line and misc at 0x96000 bufsz=0x1240 memsz=0x1240
[ 86.508057] kexec_file: Considering 0x10-0x74dd1fff
[ 86.508058] kexec-bzImage64: Loaded 64bit kernel at 0x7200 bufsz=0x1140888 memsz=0x24b7000
[ 86.508058] kexec-bzImage64: Final command line is:
[ 86.584668] kexec_file: Loading segment 0: buf=0xd5ec82bc bufsz=0x5000 mem=0x98000 memsz=0x6000
[ 86.584672] kexec_file: Loading segment 1: buf=0xaf539c69 bufsz=0x1240 mem=0x96000 memsz=0x2000
[ 86.584674] kexec_file: Loading segment 2: buf=0x29f9b9a8 bufsz=0x1140888 mem=0x7200 memsz=0x24b7000 < not ESRT :)

And no more invalid version error message from the kexec'd kernel.
-- 
Thanks,
Michael
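For reference, the overlap discussed in this thread can be checked with a few lines of user-space C. This is only a sketch, not kernel code: PAGE_SIZE_4K, page_floor(), page_ceil() and ranges_overlap() are names made up for illustration, and 4 KiB pages are assumed. The ESRT bounds come from the esrt log line; the old segment start 0x73000000 is inferred from the sum quoted elsewhere in the thread (start + 0x24be000 = 0x754be000).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on x86. */
#define PAGE_SIZE_4K 0x1000ULL

/* Round an address down to the start of its page. */
static uint64_t page_floor(uint64_t addr)
{
	return addr & ~(PAGE_SIZE_4K - 1);
}

/* Round an address up to the next page boundary (unchanged if aligned). */
static uint64_t page_ceil(uint64_t addr)
{
	return (addr + PAGE_SIZE_4K - 1) & ~(PAGE_SIZE_4K - 1);
}

/* Do the half-open ranges [a_start, a_end) and [b_start, b_end) share a byte? */
static bool ranges_overlap(uint64_t a_start, uint64_t a_end,
			   uint64_t b_start, uint64_t b_end)
{
	assert(a_start <= a_end && b_start <= b_end);
	return a_start < b_end && b_start < a_end;
}
```

With the values from the logs, page_floor(0x74dd2f98) and page_ceil(0x74dd2fd0) bound the page 0x74dd2000-0x74dd2fff that the e820 update line reports. A segment at 0x73000000 with memsz 0x24be000 overlaps that page, while a search window that ends at 0x74dd1fff does not.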
Re: kexec_file overwrites reserved EFI ESRT memory
On 12/03/19 at 10:01am, Ard Biesheuvel wrote: > On Mon, 2 Dec 2019 at 09:05, Dave Young wrote: > > > > Add more cc > > On 12/02/19 at 04:58pm, Dave Young wrote: > > > On 11/29/19 at 04:27pm, Michael Weiser wrote: > > > > Hello Dave, > > > > > > > > On Mon, Nov 25, 2019 at 01:52:01PM +0800, Dave Young wrote: > > > > > > > > > > > Fundamentally when deciding where to place a new kernel kexec > > > > > > > (either > > > > > > > user space or the in kernel kexec_file implementation) needs to > > > > > > > be able > > > > > > > to ask the question which memory ares are reserved. > > > > [...] > > > > > > > So my question is why doesn't the ESRT reservation wind up in > > > > > > > /proc/iomem? > > > > > > > > > > > > My guess is that the focus was that some EFI structures need to be > > > > > > kept > > > > > > around accross the life cycle of *one* running kernel and > > > > > > memblock_reserve() was enough for that. Marking them so they survive > > > > > > kexecing another kernel might just never have cropped up thus far. > > > > > > Ard > > > > > > or Matt would know. > > > > > Can you check your un-reserved memory, if your memory falls into EFI > > > > > BOOT* then in X86 you can use something like below if it is not > > > > > covered: > > > > > > > > > void __init efi_esrt_init(void) > > > > > { > > > > > ... > > > > > pr_info("Reserving ESRT space from %pa to %pa.\n", &esrt_data, > > > > > &end); > > > > > if (md.type == EFI_BOOT_SERVICES_DATA) > > > > > efi_mem_reserve(esrt_data, esrt_data_size); > > > > > ... > > > > > } > > > > > > > > Please bear with me if I'm a bit slow on the uptake here: On my machine, > > > > the esrt module reports at boot: > > > > > > > > [0.001244] esrt: Reserving ESRT space from 0x74dd2f98 to > > > > 0x74dd2fd0. 
> > > > > > > > This area is of type "Boot Data" (== BOOT_SERVICES_DATA) which makes the > > > > code you quote reserve it using memblock_reserve() shown by > > > > memblock=debug: > > > > > > > > [0.001246] memblock_reserve: > > > > [0x74dd2f98-0x74dd2fcf] efi_mem_reserve+0x1d/0x2b > > > > > > > > It also calls into arch/x86/platform/efi/quirks.c:efi_arch_mem_reserve() > > > > which tags it as EFI_MEMORY_RUNTIME while the surrounding ones aren't > > > > as shown by efi=debug: > > > > > > > > [0.178111] efi: mem10: [Boot Data | | | | | | | | > > > > | |WB|WT|WC|UC] range=[0x74dd3000-0x75becfff] (14MB) > > > > [0.178113] efi: mem11: [Boot Data |RUN| | | | | | | > > > > | |WB|WT|WC|UC] range=[0x74dd2000-0x74dd2fff] (0MB) > > > > [0.178114] efi: mem12: [Boot Data | | | | | | | | > > > > | |WB|WT|WC|UC] range=[0x6d635000-0x74dd1fff] (119MB) > > > > > > > > This prevents arch/x86/platform/efi/quirks.c:efi_free_boot_services() > > > > from calling __memblock_free_late() on it. And indeed, memblock=debug > > > > does > > > > not report this area as being free'd while the surrounding ones are: > > > > > > > > [0.178369] __memblock_free_late: > > > > [0x74dd3000-0x75becfff] > > > > efi_free_boot_services+0x126/0x1f8 > > > > [0.178658] __memblock_free_late: > > > > [0x6d635000-0x74dd1fff] > > > > efi_free_boot_services+0x126/0x1f8 > > > > > > > > The esrt area does not show up in /proc/iomem though: > > > > > > > > 0010-763f5fff : System RAM > > > > 6200-62a00d80 : Kernel code > > > > 62c0-62f15fff : Kernel rodata > > > > 6300-630ea8bf : Kernel data > > > > 63fed000-641f : Kernel bss > > > > 6500-6aff : Crash kernel > > > > > > > > And thus kexec loads the new kernel right over that area as shown when > > > > enabling -DDEBUG on kexec_file.c (0x74dd3000 being inbetween 0x7300 > > > > and 0x7300+0x24be000 = 0x754be000): > > > > > > > > [ 650.007695] kexec_file: Loading segment 0: buf=0x3a9c84d6 > > > > bufsz=0x5000 mem=0x98000 memsz=0x6000 > > > > [ 650.007699] kexec_file: 
Loading segment 1: buf=0x17b2b9e6 > > > > bufsz=0x1240 mem=0x96000 memsz=0x2000 > > > > [ 650.007703] kexec_file: Loading segment 2: buf=0xfdf72ba2 > > > > bufsz=0x1150888 mem=0x7300 memsz=0x24be000 > > > > > > > > ... because it looks for any memory hole large enough in iomem resources > > > > tagged as System RAM, which 0x74dd2000-0x74dd2fff would then need to be > > > > excluded from on my system. > > > > > > > > Looking some more at efi_arch_mem_reserve() I see that it also registers > > > > the area with efi.memmap and installs it using efi_memmap_install(). > > > > which seems to call memremap(MEMREMAP_WB) on it. From my understanding > > > > of the comments in the source of memremap(), MEMREMAP_WB does > > > > specifically > > > > *not* reserve that memory in any way. > > > > > > > > > Unfortunately I noticed there are different requirements/ways for > > > > > different types of "reserved" memory.
Re: kexec_file overwrites reserved EFI ESRT memory
On 12/03/19 at 12:45am, Michael Weiser wrote: > Hi Dave, > > On Mon, Dec 02, 2019 at 05:05:20PM +0800, Dave Young wrote: > > > > It seems a serious problem, the EFI modified memmap does not get an > > > /proc/iomem resource update, but kexec_file relies on /proc/iomem in > > > X86. > > > > > > There is an question from Sai about why add_efi_memmap is not enabled by > > > default: > > > https://www.spinics.net/lists/linux-mm/msg185166.html > > Incidentally, a data point I did not think to mention: I do boot the > kernel as EFI application directly from the firmware as a boot entry > with compiled in initrd and command line: > > $ grep EFI nobak/kernel/linux/.config > CONFIG_EFI=y > CONFIG_EFI_STUB=y > # CONFIG_EFI_MIXED is not set > CONFIG_DMI_SCAN_MACHINE_NON_EFI_FALLBACK=y > # EFI (Extensible Firmware Interface) Support > CONFIG_EFI_VARS=m > CONFIG_EFI_ESRT=y > CONFIG_EFI_VARS_PSTORE=m > # CONFIG_EFI_VARS_PSTORE_DEFAULT_DISABLE is not set > CONFIG_EFI_RUNTIME_MAP=y > # CONFIG_EFI_FAKE_MEMMAP is not set > CONFIG_EFI_RUNTIME_WRAPPERS=y > # CONFIG_EFI_BOOTLOADER_CONTROL is not set > # CONFIG_EFI_CAPSULE_LOADER is not set > # CONFIG_EFI_TEST is not set > # CONFIG_EFI_RCI2_TABLE is not set > # end of EFI (Extensible Firmware Interface) Support > CONFIG_UEFI_CPER=y > CONFIG_UEFI_CPER_X86=y > CONFIG_EFI_EARLYCON=y > CONFIG_EFI_PARTITION=y > CONFIG_FB_EFI=y > CONFIG_EFIVAR_FS=y > # CONFIG_EFI_PGT_DUMP is not set > > $ grep CMDLINE nobak/kernel/linux/.config > CONFIG_CMDLINE_BOOL=y > CONFIG_CMDLINE="root=UUID=97[...]e4 rd.luks.uuid=8a[...]c3 > rd.luks.allow-discards=8a[...]c3 mem_sleep_default=deep resume=UUID=97[...]e4 > resume_offset=96256 efi=debug memblock=debug" > CONFIG_CMDLINE_OVERRIDE=y > # CONFIG_BLK_CMDLINE_PARSER is not set > # CONFIG_CMDLINE_PARTITION is not set > CONFIG_FB_CMDLINE=y > > $ efibootmgr -v > BootCurrent: 000A > Timeout: 2 seconds > BootOrder: 000A,0009,0008,0005,0007,0006,0004,0002,0001,,0003 > [...] 
> Boot0005* gentoo-5.4.0-next-20191127+-clear
> HD(1,GPT,e7[...]f2,0x800,0x64000)/File(\kernel-5.4.0-next-20191127+-clear)
> [...]
> Boot000A* gentoo-5.4.1-gentoo
> HD(1,GPT,e7[...]f2,0x800,0x64000)/File(\kernel-5.4.1-gentoo)
>
> So there's no boot loader that could construct an e820 table for the
> kernel to consume. I understand it's then up to the EFI stub to come up
> with an e820 table from the EFI memory map.
>
> > > Long time ago the add_efi_memmap is only enabled in case we explicitly
> > > enable it on cmdline, I'm not sure if we can do it by default, maybe we
> > > should. Need opinion from X86 maintainers..
> > > Can you try below diff see if it works for you? (not tested, and need
> > > explicitly 'add_efi_memmap' in kernel cmdline param)
>
> Neither adding add_efi_memmap nor adding your patch and setting that option
> does make the ESRT memory region appear in /proc/iomem. kexec_file still
> loads the kernel across the ESRT region.

Hmm, sorry, my bad, actually add_efi_memmap does not consider the
EFI_MEMORY_RUNTIME attribute, it only reads the memory descriptor types.

I will read the information in your reply later, I did not get time today, but
the chunk below can probably help?

diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 3b9fd679cea9..516307617621 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -293,6 +293,8 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
 	early_memunmap(new, new_size);

 	efi_memmap_install(new_phys, num_entries);
+	e820__range_update(addr, size, E820_TYPE_RAM, E820_TYPE_RESERVED);
+	e820__update_table(e820_table);
 }

 /*

Thanks
Dave
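What the two added lines are meant to achieve can be modelled in user space. The sketch below is not the kernel's e820 code: struct region_table, table_add(), table_retype() and table_type_at() are hypothetical names, and the 0x100000 start of the RAM region is an assumption (the /proc/iomem dumps in this thread lost digits). It retypes one page inside a larger RAM range and splits that range into three entries; unlike the kernel, it does not sort or merge the table afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum region_type { REGION_NONE = 0, REGION_RAM, REGION_RESERVED };

struct region {
	uint64_t start;		/* first byte of the region */
	uint64_t end;		/* one past the last byte */
	enum region_type type;
};

#define MAX_REGIONS 16

struct region_table {
	struct region r[MAX_REGIONS];
	size_t n;
};

/* Append a region; returns 0 on success, -1 if full or the range is empty. */
static int table_add(struct region_table *t, uint64_t start, uint64_t end,
		     enum region_type type)
{
	if (t->n >= MAX_REGIONS || start >= end)
		return -1;
	t->r[t->n].start = start;
	t->r[t->n].end = end;
	t->r[t->n].type = type;
	t->n++;
	return 0;
}

/*
 * Change [start, end) from old_type to new_type, splitting entries that
 * only partly overlap. On -1 the table may be left partially updated.
 */
static int table_retype(struct region_table *t, uint64_t start, uint64_t end,
			enum region_type old_type, enum region_type new_type)
{
	size_t i, n = t->n;	/* entries appended below are not revisited */

	for (i = 0; i < n; i++) {
		struct region *e = &t->r[i];
		uint64_t lo, hi;

		if (e->type != old_type || e->end <= start || e->start >= end)
			continue;

		lo = e->start > start ? e->start : start;
		hi = e->end < end ? e->end : end;
		assert(lo < hi);

		/* The part after the retyped span keeps the old type. */
		if (hi < e->end && table_add(t, hi, e->end, old_type))
			return -1;

		if (e->start < lo) {
			/* Keep the head in place, append the retyped middle. */
			e->end = lo;
			if (table_add(t, lo, hi, new_type))
				return -1;
		} else {
			/* No head: retype this entry in place. */
			e->end = hi;
			e->type = new_type;
		}
	}
	return 0;
}

/* Type of the region containing addr, or REGION_NONE. */
static enum region_type table_type_at(const struct region_table *t, uint64_t addr)
{
	size_t i;

	for (i = 0; i < t->n; i++)
		if (addr >= t->r[i].start && addr < t->r[i].end)
			return t->r[i].type;
	return REGION_NONE;
}
```

Starting from a single RAM entry that ends at 0x763f5fff and retyping the page 0x74dd2000-0x74dd2fff gives RAM up to 0x74dd1fff, a reserved page, and RAM again from 0x74dd3000, which is the layout a loader scanning for System RAM needs to see in order to skip the ESRT.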
Re: kexec_file overwrites reserved EFI ESRT memory
On Mon, 2 Dec 2019 at 09:05, Dave Young wrote: > > Add more cc > On 12/02/19 at 04:58pm, Dave Young wrote: > > On 11/29/19 at 04:27pm, Michael Weiser wrote: > > > Hello Dave, > > > > > > On Mon, Nov 25, 2019 at 01:52:01PM +0800, Dave Young wrote: > > > > > > > > > Fundamentally when deciding where to place a new kernel kexec > > > > > > (either > > > > > > user space or the in kernel kexec_file implementation) needs to be > > > > > > able > > > > > > to ask the question which memory ares are reserved. > > > [...] > > > > > > So my question is why doesn't the ESRT reservation wind up in > > > > > > /proc/iomem? > > > > > > > > > > My guess is that the focus was that some EFI structures need to be > > > > > kept > > > > > around accross the life cycle of *one* running kernel and > > > > > memblock_reserve() was enough for that. Marking them so they survive > > > > > kexecing another kernel might just never have cropped up thus far. Ard > > > > > or Matt would know. > > > > Can you check your un-reserved memory, if your memory falls into EFI > > > > BOOT* then in X86 you can use something like below if it is not covered: > > > > > > > void __init efi_esrt_init(void) > > > > { > > > > ... > > > > pr_info("Reserving ESRT space from %pa to %pa.\n", &esrt_data, &end); > > > > if (md.type == EFI_BOOT_SERVICES_DATA) > > > > efi_mem_reserve(esrt_data, esrt_data_size); > > > > ... > > > > } > > > > > > Please bear with me if I'm a bit slow on the uptake here: On my machine, > > > the esrt module reports at boot: > > > > > > [0.001244] esrt: Reserving ESRT space from 0x74dd2f98 to > > > 0x74dd2fd0. 
> > > > > > This area is of type "Boot Data" (== BOOT_SERVICES_DATA) which makes the > > > code you quote reserve it using memblock_reserve() shown by > > > memblock=debug: > > > > > > [0.001246] memblock_reserve: [0x74dd2f98-0x74dd2fcf] > > > efi_mem_reserve+0x1d/0x2b > > > > > > It also calls into arch/x86/platform/efi/quirks.c:efi_arch_mem_reserve() > > > which tags it as EFI_MEMORY_RUNTIME while the surrounding ones aren't > > > as shown by efi=debug: > > > > > > [0.178111] efi: mem10: [Boot Data | | | | | | | | | > > > |WB|WT|WC|UC] range=[0x74dd3000-0x75becfff] (14MB) > > > [0.178113] efi: mem11: [Boot Data |RUN| | | | | | | | > > > |WB|WT|WC|UC] range=[0x74dd2000-0x74dd2fff] (0MB) > > > [0.178114] efi: mem12: [Boot Data | | | | | | | | | > > > |WB|WT|WC|UC] range=[0x6d635000-0x74dd1fff] (119MB) > > > > > > This prevents arch/x86/platform/efi/quirks.c:efi_free_boot_services() > > > from calling __memblock_free_late() on it. And indeed, memblock=debug does > > > not report this area as being free'd while the surrounding ones are: > > > > > > [0.178369] __memblock_free_late: > > > [0x74dd3000-0x75becfff] efi_free_boot_services+0x126/0x1f8 > > > [0.178658] __memblock_free_late: > > > [0x6d635000-0x74dd1fff] efi_free_boot_services+0x126/0x1f8 > > > > > > The esrt area does not show up in /proc/iomem though: > > > > > > 0010-763f5fff : System RAM > > > 6200-62a00d80 : Kernel code > > > 62c0-62f15fff : Kernel rodata > > > 6300-630ea8bf : Kernel data > > > 63fed000-641f : Kernel bss > > > 6500-6aff : Crash kernel > > > > > > And thus kexec loads the new kernel right over that area as shown when > > > enabling -DDEBUG on kexec_file.c (0x74dd3000 being inbetween 0x7300 > > > and 0x7300+0x24be000 = 0x754be000): > > > > > > [ 650.007695] kexec_file: Loading segment 0: buf=0x3a9c84d6 > > > bufsz=0x5000 mem=0x98000 memsz=0x6000 > > > [ 650.007699] kexec_file: Loading segment 1: buf=0x17b2b9e6 > > > bufsz=0x1240 mem=0x96000 memsz=0x2000 > > > [ 650.007703] kexec_file: 
Loading segment 2: buf=0xfdf72ba2 > > > bufsz=0x1150888 mem=0x7300 memsz=0x24be000 > > > > > > ... because it looks for any memory hole large enough in iomem resources > > > tagged as System RAM, which 0x74dd2000-0x74dd2fff would then need to be > > > excluded from on my system. > > > > > > Looking some more at efi_arch_mem_reserve() I see that it also registers > > > the area with efi.memmap and installs it using efi_memmap_install(). > > > which seems to call memremap(MEMREMAP_WB) on it. From my understanding > > > of the comments in the source of memremap(), MEMREMAP_WB does specifically > > > *not* reserve that memory in any way. > > > > > > > Unfortunately I noticed there are different requirements/ways for > > > > different types of "reserved" memory. But that is another topic.. > > > > > > I tried to reserve the area with something like this: > > > > > > t a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c > > > index 4de244683a7e..b86a5df027a2 100644 > > > --- a/arch/x86/platform/efi/quirks.c > > > +++ b/arch/x86/platform/efi/quirks.c > > > @@ -24
Re: kexec_file overwrites reserved EFI ESRT memory
Hi Dave, On Mon, Dec 02, 2019 at 05:05:20PM +0800, Dave Young wrote: > > It seems a serious problem, the EFI modified memmap does not get an > > /proc/iomem resource update, but kexec_file relies on /proc/iomem in > > X86. > > > > There is an question from Sai about why add_efi_memmap is not enabled by > > default: > > https://www.spinics.net/lists/linux-mm/msg185166.html Incidentally, a data point I did not think to mention: I do boot the kernel as EFI application directly from the firmware as a boot entry with compiled in initrd and command line: $ grep EFI nobak/kernel/linux/.config CONFIG_EFI=y CONFIG_EFI_STUB=y # CONFIG_EFI_MIXED is not set CONFIG_DMI_SCAN_MACHINE_NON_EFI_FALLBACK=y # EFI (Extensible Firmware Interface) Support CONFIG_EFI_VARS=m CONFIG_EFI_ESRT=y CONFIG_EFI_VARS_PSTORE=m # CONFIG_EFI_VARS_PSTORE_DEFAULT_DISABLE is not set CONFIG_EFI_RUNTIME_MAP=y # CONFIG_EFI_FAKE_MEMMAP is not set CONFIG_EFI_RUNTIME_WRAPPERS=y # CONFIG_EFI_BOOTLOADER_CONTROL is not set # CONFIG_EFI_CAPSULE_LOADER is not set # CONFIG_EFI_TEST is not set # CONFIG_EFI_RCI2_TABLE is not set # end of EFI (Extensible Firmware Interface) Support CONFIG_UEFI_CPER=y CONFIG_UEFI_CPER_X86=y CONFIG_EFI_EARLYCON=y CONFIG_EFI_PARTITION=y CONFIG_FB_EFI=y CONFIG_EFIVAR_FS=y # CONFIG_EFI_PGT_DUMP is not set $ grep CMDLINE nobak/kernel/linux/.config CONFIG_CMDLINE_BOOL=y CONFIG_CMDLINE="root=UUID=97[...]e4 rd.luks.uuid=8a[...]c3 rd.luks.allow-discards=8a[...]c3 mem_sleep_default=deep resume=UUID=97[...]e4 resume_offset=96256 efi=debug memblock=debug" CONFIG_CMDLINE_OVERRIDE=y # CONFIG_BLK_CMDLINE_PARSER is not set # CONFIG_CMDLINE_PARTITION is not set CONFIG_FB_CMDLINE=y $ efibootmgr -v BootCurrent: 000A Timeout: 2 seconds BootOrder: 000A,0009,0008,0005,0007,0006,0004,0002,0001,,0003 [...] Boot0005* gentoo-5.4.0-next-20191127+-clear HD(1,GPT,e7[...]f2,0x800,0x64000)/File(\kernel-5.4.0-next-20191127+-clear) [...] 
Boot000A* gentoo-5.4.1-gentoo HD(1,GPT,e7[...]f2,0x800,0x64000)/File(\kernel-5.4.1-gentoo) So there's no boot loader that could construct an e820 table for the kernel to consume. I understand it's then up to the EFI stub to come up with a e820 table from the EFI memory map. > > Long time ago the add_efi_memmap is only enabled in case we explict > > enable it on cmdline, I'm not sure if we can do it by default, maybe we > > should. Need opinion from X86 maintainers.. > > Can you try below diff see if it works for you? (not tested, and need > > explicitly 'add_efi_memmap' in kernel cmdline param) Neither adding add_efi_memmap nor adding your patch and setting that option does make the ESRT memory region appear in /proc/iomem. kexec_file still loads the kernel across the ESRT region. What occurs to me is that nowhere does the ESRT memory region appear in any externally provided memory map. Neither e820 nor EFI seem to declare it. Is that expected or a bug of my particular system? For example, the e820 map (derived from the EFI map by the EFI stub?) 
has these regions: BIOS-provided physical RAM map: BIOS-e820: [mem 0x-0x0009efff] usable BIOS-e820: [mem 0x0009f000-0x000f] reserved BIOS-e820: [mem 0x0010-0x763f5fff] usable BIOS-e820: [mem 0x763f6000-0x79974fff] reserved BIOS-e820: [mem 0x79975000-0x799f1fff] ACPI data BIOS-e820: [mem 0x799f2000-0x79aa6fff] ACPI NVS BIOS-e820: [mem 0x79aa7000-0x7a40dfff] reserved BIOS-e820: [mem 0x7a40e000-0x7a40efff] usable BIOS-e820: [mem 0x7a40f000-0x7fff] reserved BIOS-e820: [mem 0xf000-0xf7ff] reserved BIOS-e820: [mem 0xfe00-0xfe010fff] reserved BIOS-e820: [mem 0xfec0-0xfec00fff] reserved BIOS-e820: [mem 0xfed0-0xfed03fff] reserved BIOS-e820: [mem 0xfee0-0xfee00fff] reserved BIOS-e820: [mem 0xff00-0x] reserved BIOS-e820: [mem 0x0001-0x00047dff] usable The ESRT region sits smack in the middle of a large system RAM region: BIOS-e820: [mem 0x0010-0x763f5fff] usable Consequently, the relevant part of /proc/iomem looks like this: -0fff : Reserved 1000-0009efff : System RAM 0009f000-000f : Reserved 000a-000b : PCI Bus :00 000e-000e3fff : PCI Bus :00 000e4000-000e7fff : PCI Bus :00 000e8000-000ebfff : PCI Bus :00 000ec000-000e : PCI Bus :00 000f-000f : PCI Bus :00 000f-000f : System ROM 0010-763f5fff : System RAM 6500-6aff : Crash kernel 763f6000-79974fff : Reserved 79975000-799f1fff : ACPI Tables 799f2000-79aa6fff : ACPI Non-volatile Storage 79a17000-79a17fff : USBC000:00 What it would need to look like for kexec to leave ESRT alone, I guess, is: -0fff : Reserved 1000-0009efff : System RAM 0009f000-000f : Reserved 000
Re: kexec_file overwrites reserved EFI ESRT memory
Add more cc On 12/02/19 at 04:58pm, Dave Young wrote: > On 11/29/19 at 04:27pm, Michael Weiser wrote: > > Hello Dave, > > > > On Mon, Nov 25, 2019 at 01:52:01PM +0800, Dave Young wrote: > > > > > > > Fundamentally when deciding where to place a new kernel kexec (either > > > > > user space or the in kernel kexec_file implementation) needs to be > > > > > able > > > > > to ask the question which memory ares are reserved. > > [...] > > > > > So my question is why doesn't the ESRT reservation wind up in > > > > > /proc/iomem? > > > > > > > > My guess is that the focus was that some EFI structures need to be kept > > > > around accross the life cycle of *one* running kernel and > > > > memblock_reserve() was enough for that. Marking them so they survive > > > > kexecing another kernel might just never have cropped up thus far. Ard > > > > or Matt would know. > > > Can you check your un-reserved memory, if your memory falls into EFI > > > BOOT* then in X86 you can use something like below if it is not covered: > > > > > void __init efi_esrt_init(void) > > > { > > > ... > > > pr_info("Reserving ESRT space from %pa to %pa.\n", &esrt_data, &end); > > > if (md.type == EFI_BOOT_SERVICES_DATA) > > > efi_mem_reserve(esrt_data, esrt_data_size); > > > ... > > > } > > > > Please bear with me if I'm a bit slow on the uptake here: On my machine, > > the esrt module reports at boot: > > > > [0.001244] esrt: Reserving ESRT space from 0x74dd2f98 to > > 0x74dd2fd0. 
> > > > This area is of type "Boot Data" (== BOOT_SERVICES_DATA) which makes the > > code you quote reserve it using memblock_reserve() shown by > > memblock=debug: > > > > [0.001246] memblock_reserve: [0x74dd2f98-0x74dd2fcf] > > efi_mem_reserve+0x1d/0x2b > > > > It also calls into arch/x86/platform/efi/quirks.c:efi_arch_mem_reserve() > > which tags it as EFI_MEMORY_RUNTIME while the surrounding ones aren't > > as shown by efi=debug: > > > > [0.178111] efi: mem10: [Boot Data | | | | | | | | | > > |WB|WT|WC|UC] range=[0x74dd3000-0x75becfff] (14MB) > > [0.178113] efi: mem11: [Boot Data |RUN| | | | | | | | > > |WB|WT|WC|UC] range=[0x74dd2000-0x74dd2fff] (0MB) > > [0.178114] efi: mem12: [Boot Data | | | | | | | | | > > |WB|WT|WC|UC] range=[0x6d635000-0x74dd1fff] (119MB) > > > > This prevents arch/x86/platform/efi/quirks.c:efi_free_boot_services() > > from calling __memblock_free_late() on it. And indeed, memblock=debug does > > not report this area as being free'd while the surrounding ones are: > > > > [0.178369] __memblock_free_late: > > [0x74dd3000-0x75becfff] efi_free_boot_services+0x126/0x1f8 > > [0.178658] __memblock_free_late: > > [0x6d635000-0x74dd1fff] efi_free_boot_services+0x126/0x1f8 > > > > The esrt area does not show up in /proc/iomem though: > > > > 0010-763f5fff : System RAM > > 6200-62a00d80 : Kernel code > > 62c0-62f15fff : Kernel rodata > > 6300-630ea8bf : Kernel data > > 63fed000-641f : Kernel bss > > 6500-6aff : Crash kernel > > > > And thus kexec loads the new kernel right over that area as shown when > > enabling -DDEBUG on kexec_file.c (0x74dd3000 being inbetween 0x7300 > > and 0x7300+0x24be000 = 0x754be000): > > > > [ 650.007695] kexec_file: Loading segment 0: buf=0x3a9c84d6 > > bufsz=0x5000 mem=0x98000 memsz=0x6000 > > [ 650.007699] kexec_file: Loading segment 1: buf=0x17b2b9e6 > > bufsz=0x1240 mem=0x96000 memsz=0x2000 > > [ 650.007703] kexec_file: Loading segment 2: buf=0xfdf72ba2 > > bufsz=0x1150888 mem=0x7300 memsz=0x24be000 > > > > ... 
because it looks for any memory hole large enough in iomem resources > > tagged as System RAM, which 0x74dd2000-0x74dd2fff would then need to be > > excluded from on my system. > > > > Looking some more at efi_arch_mem_reserve() I see that it also registers > > the area with efi.memmap and installs it using efi_memmap_install(). > > which seems to call memremap(MEMREMAP_WB) on it. From my understanding > > of the comments in the source of memremap(), MEMREMAP_WB does specifically > > *not* reserve that memory in any way. > > > > > Unfortunately I noticed there are different requirements/ways for > > > different types of "reserved" memory. But that is another topic.. > > > > I tried to reserve the area with something like this: > > > > t a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c > > index 4de244683a7e..b86a5df027a2 100644 > > --- a/arch/x86/platform/efi/quirks.c > > +++ b/arch/x86/platform/efi/quirks.c > > @@ -249,6 +249,7 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 > > size) > > efi_memory_desc_t md; > > int num_entries; > > void *new; > > + struct resource *res; > > > > if (efi_mem_desc_lookup(addr, &md) || > >
Re: kexec_file overwrites reserved EFI ESRT memory
On 11/29/19 at 04:27pm, Michael Weiser wrote: > Hello Dave, > > On Mon, Nov 25, 2019 at 01:52:01PM +0800, Dave Young wrote: > > > > > Fundamentally when deciding where to place a new kernel kexec (either > > > > user space or the in kernel kexec_file implementation) needs to be able > > > > to ask the question which memory ares are reserved. > [...] > > > > So my question is why doesn't the ESRT reservation wind up in > > > > /proc/iomem? > > > > > > My guess is that the focus was that some EFI structures need to be kept > > > around accross the life cycle of *one* running kernel and > > > memblock_reserve() was enough for that. Marking them so they survive > > > kexecing another kernel might just never have cropped up thus far. Ard > > > or Matt would know. > > Can you check your un-reserved memory, if your memory falls into EFI > > BOOT* then in X86 you can use something like below if it is not covered: > > > void __init efi_esrt_init(void) > > { > > ... > > pr_info("Reserving ESRT space from %pa to %pa.\n", &esrt_data, &end); > > if (md.type == EFI_BOOT_SERVICES_DATA) > > efi_mem_reserve(esrt_data, esrt_data_size); > > ... > > } > > Please bear with me if I'm a bit slow on the uptake here: On my machine, > the esrt module reports at boot: > > [0.001244] esrt: Reserving ESRT space from 0x74dd2f98 to > 0x74dd2fd0. 
> > This area is of type "Boot Data" (== BOOT_SERVICES_DATA) which makes the > code you quote reserve it using memblock_reserve() shown by > memblock=debug: > > [0.001246] memblock_reserve: [0x74dd2f98-0x74dd2fcf] > efi_mem_reserve+0x1d/0x2b > > It also calls into arch/x86/platform/efi/quirks.c:efi_arch_mem_reserve() > which tags it as EFI_MEMORY_RUNTIME while the surrounding ones aren't > as shown by efi=debug: > > [0.178111] efi: mem10: [Boot Data | | | | | | | | | > |WB|WT|WC|UC] range=[0x74dd3000-0x75becfff] (14MB) > [0.178113] efi: mem11: [Boot Data |RUN| | | | | | | | > |WB|WT|WC|UC] range=[0x74dd2000-0x74dd2fff] (0MB) > [0.178114] efi: mem12: [Boot Data | | | | | | | | | > |WB|WT|WC|UC] range=[0x6d635000-0x74dd1fff] (119MB) > > This prevents arch/x86/platform/efi/quirks.c:efi_free_boot_services() > from calling __memblock_free_late() on it. And indeed, memblock=debug does > not report this area as being free'd while the surrounding ones are: > > [0.178369] __memblock_free_late: [0x74dd3000-0x75becfff] > efi_free_boot_services+0x126/0x1f8 > [0.178658] __memblock_free_late: [0x6d635000-0x74dd1fff] > efi_free_boot_services+0x126/0x1f8 > > The esrt area does not show up in /proc/iomem though: > > 0010-763f5fff : System RAM > 6200-62a00d80 : Kernel code > 62c0-62f15fff : Kernel rodata > 6300-630ea8bf : Kernel data > 63fed000-641f : Kernel bss > 6500-6aff : Crash kernel > > And thus kexec loads the new kernel right over that area as shown when > enabling -DDEBUG on kexec_file.c (0x74dd3000 being inbetween 0x7300 > and 0x7300+0x24be000 = 0x754be000): > > [ 650.007695] kexec_file: Loading segment 0: buf=0x3a9c84d6 > bufsz=0x5000 mem=0x98000 memsz=0x6000 > [ 650.007699] kexec_file: Loading segment 1: buf=0x17b2b9e6 > bufsz=0x1240 mem=0x96000 memsz=0x2000 > [ 650.007703] kexec_file: Loading segment 2: buf=0xfdf72ba2 > bufsz=0x1150888 mem=0x7300 memsz=0x24be000 > > ... 
because it looks for any memory hole large enough in iomem resources > tagged as System RAM, which 0x74dd2000-0x74dd2fff would then need to be > excluded from on my system. > > Looking some more at efi_arch_mem_reserve() I see that it also registers > the area with efi.memmap and installs it using efi_memmap_install(). > which seems to call memremap(MEMREMAP_WB) on it. From my understanding > of the comments in the source of memremap(), MEMREMAP_WB does specifically > *not* reserve that memory in any way. > > > Unfortunately I noticed there are different requirements/ways for > > different types of "reserved" memory. But that is another topic.. > > I tried to reserve the area with something like this: > > t a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c > index 4de244683a7e..b86a5df027a2 100644 > --- a/arch/x86/platform/efi/quirks.c > +++ b/arch/x86/platform/efi/quirks.c > @@ -249,6 +249,7 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 > size) > efi_memory_desc_t md; > int num_entries; > void *new; > + struct resource *res; > > if (efi_mem_desc_lookup(addr, &md) || > md.type != EFI_BOOT_SERVICES_DATA) { > @@ -294,6 +295,21 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 > size) > early_memunmap(new, new_size); > > efi_memmap_install(new_phys, num_entries); > + > + res = memblock_alloc(sizeof(*res),
Re: kexec_file overwrites reserved EFI ESRT memory
Hello Dave,

On Mon, Nov 25, 2019 at 01:52:01PM +0800, Dave Young wrote:

> > > Fundamentally when deciding where to place a new kernel kexec (either
> > > user space or the in kernel kexec_file implementation) needs to be able
> > > to ask the question which memory areas are reserved.
[...]
> > > So my question is why doesn't the ESRT reservation wind up in
> > > /proc/iomem?
> >
> > My guess is that the focus was that some EFI structures need to be kept
> > around across the life cycle of *one* running kernel and
> > memblock_reserve() was enough for that. Marking them so they survive
> > kexecing another kernel might just never have cropped up thus far. Ard
> > or Matt would know.
> Can you check your un-reserved memory, if your memory falls into EFI
> BOOT* then in X86 you can use something like below if it is not covered:
> void __init efi_esrt_init(void)
> {
> ...
>	pr_info("Reserving ESRT space from %pa to %pa.\n", &esrt_data, &end);
>	if (md.type == EFI_BOOT_SERVICES_DATA)
>		efi_mem_reserve(esrt_data, esrt_data_size);
> ...
> }

Please bear with me if I'm a bit slow on the uptake here: On my machine,
the esrt module reports at boot:

[0.001244] esrt: Reserving ESRT space from 0x74dd2f98 to 0x74dd2fd0.

This area is of type "Boot Data" (== BOOT_SERVICES_DATA) which makes the
code you quote reserve it using memblock_reserve() shown by
memblock=debug:

[0.001246] memblock_reserve: [0x74dd2f98-0x74dd2fcf] efi_mem_reserve+0x1d/0x2b

It also calls into arch/x86/platform/efi/quirks.c:efi_arch_mem_reserve()
which tags it as EFI_MEMORY_RUNTIME while the surrounding ones aren't
as shown by efi=debug:

[0.178111] efi: mem10: [Boot Data | | | | | | | | | |WB|WT|WC|UC] range=[0x74dd3000-0x75becfff] (14MB)
[0.178113] efi: mem11: [Boot Data |RUN| | | | | | | | |WB|WT|WC|UC] range=[0x74dd2000-0x74dd2fff] (0MB)
[0.178114] efi: mem12: [Boot Data | | | | | | | | | |WB|WT|WC|UC] range=[0x6d635000-0x74dd1fff] (119MB)

This prevents arch/x86/platform/efi/quirks.c:efi_free_boot_services()
from calling __memblock_free_late() on it. And indeed, memblock=debug does
not report this area as being freed while the surrounding ones are:

[0.178369] __memblock_free_late: [0x74dd3000-0x75becfff] efi_free_boot_services+0x126/0x1f8
[0.178658] __memblock_free_late: [0x6d635000-0x74dd1fff] efi_free_boot_services+0x126/0x1f8

The esrt area does not show up in /proc/iomem though:

0010-763f5fff : System RAM
  6200-62a00d80 : Kernel code
  62c0-62f15fff : Kernel rodata
  6300-630ea8bf : Kernel data
  63fed000-641f : Kernel bss
  6500-6aff : Crash kernel

And thus kexec loads the new kernel right over that area as shown when
enabling -DDEBUG on kexec_file.c (0x74dd3000 being in between 0x73000000
and 0x73000000+0x24be000 = 0x754be000):

[ 650.007695] kexec_file: Loading segment 0: buf=0x3a9c84d6 bufsz=0x5000 mem=0x98000 memsz=0x6000
[ 650.007699] kexec_file: Loading segment 1: buf=0x17b2b9e6 bufsz=0x1240 mem=0x96000 memsz=0x2000
[ 650.007703] kexec_file: Loading segment 2: buf=0xfdf72ba2 bufsz=0x1150888 mem=0x73000000 memsz=0x24be000

... because it looks for any memory hole large enough in iomem resources
tagged as System RAM, which 0x74dd2000-0x74dd2fff would then need to be
excluded from on my system.
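
The overlap described above comes down to a few lines of range arithmetic.
What follows is only an illustrative user-space sketch, not kernel code:
the struct and function names are made up for this example, and the
segment 2 load address is taken to be 0x73000000, which is what the sum
0x754be000 quoted above implies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Inclusive physical address range, in the style of the logs above. */
struct phys_range {
	uint64_t start;
	uint64_t end;	/* inclusive */
};

/* Build the inclusive range covered by a kexec segment (mem, memsz). */
static struct phys_range segment_range(uint64_t mem, uint64_t memsz)
{
	struct phys_range r;

	assert(memsz > 0);	/* an empty segment has no last byte */
	r.start = mem;
	r.end = mem + memsz - 1;
	return r;
}

/* Two inclusive ranges overlap when neither lies entirely below the other. */
static bool ranges_overlap(struct phys_range a, struct phys_range b)
{
	return a.start <= b.end && b.start <= a.end;
}
```

With mem=0x73000000 and memsz=0x24be000, segment_range() gives
0x73000000-0x754bdfff, and ranges_overlap() against the ESRT page
0x74dd2000-0x74dd2fff is true. Segment 0 (mem=0x98000, memsz=0x6000)
does not overlap that page.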

Looking some more at efi_arch_mem_reserve() I see that it also registers
the area with efi.memmap and installs it using efi_memmap_install(),
which seems to call memremap(MEMREMAP_WB) on it. From my understanding
of the comments in the source of memremap(), MEMREMAP_WB does specifically
*not* reserve that memory in any way.

> Unfortunately I noticed there are different requirements/ways for
> different types of "reserved" memory. But that is another topic..

I tried to reserve the area with something like this:

diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 4de244683a7e..b86a5df027a2 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -249,6 +249,7 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
 	efi_memory_desc_t md;
 	int num_entries;
 	void *new;
+	struct resource *res;
 
 	if (efi_mem_desc_lookup(addr, &md) ||
 	    md.type != EFI_BOOT_SERVICES_DATA) {
@@ -294,6 +295,21 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
 	early_memunmap(new, new_size);
 
 	efi_memmap_install(new_phys, num_entries);
+
+	res = memblock_alloc(sizeof(*res), SMP_CACHE_BYTES);
+	if (!res) {
+		pr_err("Failed to allocate EFI io resource allocator for "
+		       "0x%llx:0x%llx", mr.range.start, mr.range.end);
+		return;
+	}
+
+	res->start = mr.range.start;
+
kexec_file overwrites reserved EFI ESRT memory
Hello Eric,
Hello Ard,

on my machine, kexec_file loads the normal (not crash) kernel image
right across the EFI ESRT reserved memory range:

esrt: Reserving ESRT space from 0x74dd6f98 to 0x74dd6fd0.
[...]
kexec_file: kernel signature verification successful.
kexec_file: Loading segment 0: buf=0xe99b31ad bufsz=0x5000 mem=0x91000 memsz=0x6000
kexec_file: Loading segment 1: buf=0xe45cdeb8 bufsz=0x1240 mem=0x8f000 memsz=0x2000
kexec_file: Loading segment 2: buf=0x096e6de9 bufsz=0x1133888 mem=0x7300 memsz=0x249a000

This causes the following message by the kexec'd kernel:

esrt: Unsupported ESRT version 2904149718861218184.

(The image is rather large at 18MiB as it has a built-in initrd.)

Poking at the involved code a bit (as a layman) I found that the EFI
code reserves the memory range using memblock_reserve() which is by all
appearances correctly handed over to the buddy allocator as
in-use/reserved. kexec_file on the other hand by default looks at iomem
regions of type System RAM using walk_system_ram_res() and does not seem
to have that particular information available to consider. (As may have
become clear from this explanation I'm still somewhat fuzzy (to put it
mildly) on the relationship of memblock, buddy and slab allocator and how
(if at all) kexec_file interacts with them to a.) find available memory
regions for the new kernel to load to and b.) tell them where it
loaded the new kernel to so they don't use it any more.)

As is to be expected, activating CONFIG_ARCH_KEEP_MEMBLOCK makes
kexec_file use the preserved memblock structures and indeed end up using
totally different memory regions and gets rid of the message:

kexec_file: kernel signature verification successful.
kexec_file: Loading segment 0: buf=0x2dea71f8 bufsz=0x5000 mem=0x47df8e000 memsz=0x6000 kexec_file: Loading segment 1: buf=0x0686ff17 bufsz=0x1240 mem=0x47df8c000 memsz=0x2000 kexec_file: Loading segment 2: buf=0xfc444e67 bufsz=0x1133888 mem=0x46900 memsz=0x2497000 This is with 5.3.11 mainline and linux-next 5.4.0-rc8-next-20191122. I'm not actually trying to use ESRT for anything at this point but want to stop the boot message from messing up silent boot and suspect that this could potentially happen to other, more important EFI memory regions as well. I'm willing to chase this further but at this point I'm wondering whether it's the EFI code not reserving this memory area with enough emphasis (as iomem?) or kexec_file not checking usability of candidate memory regions rigorously enough (based on what other criteria?). Are there maybe any upcoming patches or subsystem-specific kernel trees I should try? Please let me know what other information may be helpful or if I should open a bug on bugzilla.kernel.org. Boot messages on normal boot: Linux version 5.3.11-gentoo (m@n) (gcc version 9.2.0 (Gentoo 9.2.0-r2 p3)) #29 SMP Thu Nov 21 20:40:28 CET 2019 Command line: x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers' x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers' x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers' x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds registers' x86/fpu: Supporting XSAVE feature 0x010: 'MPX CSR' x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256 x86/fpu: xstate_offset[3]: 832, xstate_sizes[3]: 64 x86/fpu: xstate_offset[4]: 896, xstate_sizes[4]: 64 x86/fpu: Enabled xstate features 0x1f, context size is 960 bytes, using 'compacted' format. 
BIOS-provided physical RAM map: BIOS-e820: [mem 0x-0x0009efff] usable BIOS-e820: [mem 0x0009f000-0x000f] reserved BIOS-e820: [mem 0x0010-0x763fafff] usable BIOS-e820: [mem 0x763fb000-0x79979fff] reserved BIOS-e820: [mem 0x7997a000-0x799f6fff] ACPI data BIOS-e820: [mem 0x799f7000-0x79aabfff] ACPI NVS BIOS-e820: [mem 0x79aac000-0x7a40dfff] reserved BIOS-e820: [mem 0x7a40e000-0x7a40efff] usable BIOS-e820: [mem 0x7a40f000-0x7fff] reserved BIOS-e820: [mem 0xf000-0xf7ff] reserved BIOS-e820: [mem 0xfe00-0xfe010fff] reserved BIOS-e820: [mem 0xfec0-0xfec00fff] reserved BIOS-e820: [mem 0xfed0-0xfed03fff] reserved BIOS-e820: [mem 0xfee0-0xfee00fff] reserved BIOS-e820: [mem 0xff00-0x] reserved BIOS-e820: [mem 0x0001-0x00047dff] usable NX (Execute Disable) protection: active efi: EFI v2.70 by American Megatrends efi: ACPI 2.0=0x79993000 ACPI=0x79993000 TPMFinalLog=0x79a35000 SMBIOS=0x7a1cf000 SMBIOS 3.0=0x7a1ce000 ESRT=0x74dd6f98 TPMEventLog=0x6d634018 efi: mem00: [Conventional Memory| | | | | | | | |WB|WT|WC|UC] range=[0x-0x0fff] (0MB) efi: mem01: [Loader Data| | | | | | | | |W
Re: kexec_file overwrites reserved EFI ESRT memory
Michael Weiser writes:

> Hello Eric,
> Hello Ard,
>
> on my machine, kexec_file loads the normal (not crash) kernel image
> right across the EFI ESRT reserved memory range:
>
> esrt: Reserving ESRT space from 0x74dd6f98 to 0x74dd6fd0.
> [...]
> kexec_file: kernel signature verification successful.
> kexec_file: Loading segment 0: buf=0xe99b31ad bufsz=0x5000
> mem=0x91000 memsz=0x6000
> kexec_file: Loading segment 1: buf=0xe45cdeb8 bufsz=0x1240
> mem=0x8f000 memsz=0x2000
> kexec_file: Loading segment 2: buf=0x096e6de9 bufsz=0x1133888
> mem=0x7300 memsz=0x249a000
>
> This causes the following message by the kexec'd kernel:
>
> esrt: Unsupported ESRT version 2904149718861218184.
>
> (The image is rather large at 18MiB as it has a built-in initrd.)

When did x86_64 get support for ARCH_KEEP_MEMBLOCK? I can't find it
anywhere.

My recollection is that on x86 the definitive specification of what is
reserved and what is not is the struct resource (aka /proc/iomem), while
on some other architectures they do something else, apparently the
memblock implementation.

Fundamentally, when deciding where to place a new kernel, kexec (either
user space or the in kernel kexec_file implementation) needs to be able
to ask the question which memory areas are reserved.

What the buddy allocator does is unimportant as kexec copies memory from
all over the place and places it in the destined memory addresses at the
time of the kexec operation.

So my question is why doesn't the ESRT reservation wind up in
/proc/iomem?

Are you dealing with an embedded port that is being clever?
Or is there some subtle breakage now that x86 has memblock support that
/proc/iomem is no longer being properly maintained?

Eric

> Poking at the involved code a bit (as a layman) I found that the EFI
> code reserves the memory range using memblock_reserve() which is by all
> appearances correctly handed over to the buddy allocator as
> in-use/reserved.
kexec_file on the other hand by default looks at iomem > regions of type System RAM using walk_system_ram_res() and does not seem > to have that particular information available to consider. (As may have > become clear from this explanation I'm still somewhat fuzzy (to put it > midly) on the relationship of memblock, buddy and slab allocator and how > (if at all) kexec_file interacts with them to a.) find available memory > regions for the new kernel to load to and b.) tell them where it > loaded the new kernel to so they don't use it any more.) > > As is to be expected, activating CONFIG_ARCH_KEEP_MEMBLOCK makes > kexec_file use the preserved memblock structures and indeed end up using > totally different memory regions and gets rid of the message: > > kexec_file: kernel signature verification successful. > kexec_file: Loading segment 0: buf=0x2dea71f8 bufsz=0x5000 > mem=0x47df8e000 memsz=0x6000 > kexec_file: Loading segment 1: buf=0x0686ff17 bufsz=0x1240 > mem=0x47df8c000 memsz=0x2000 > kexec_file: Loading segment 2: buf=0xfc444e67 bufsz=0x1133888 > mem=0x46900 memsz=0x2497000 > > This is with 5.3.11 mainline and linux-next 5.4.0-rc8-next-20191122. > > I'm not actually trying to use ESRT for anything at this point but want > to stop the boot message from messing up silent boot and suspect that > this could potentially happen to other, more important EFI memory > regions as well. > > I'm willing to chase this further but at this point I'm wondering > whether it's the EFI code not reserving this memory area with enough > emphasis (as iomem?) or kexec_file not checking usability of > candidate memory regions rigorously enough (based on what other > criteria?). > > Are there maybe any upcoming patches or subsystem-specific kernel trees > I should try? > > Please let me know what other information may be helpful or if I should > open a bug on bugzilla.kernel.org. 
> > Boot messages on normal boot: > Linux version 5.3.11-gentoo (m@n) (gcc version 9.2.0 (Gentoo 9.2.0-r2 p3)) > #29 SMP Thu Nov 21 20:40:28 CET 2019 > Command line: > x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers' > x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers' > x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers' > x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds registers' > x86/fpu: Supporting XSAVE feature 0x010: 'MPX CSR' > x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256 > x86/fpu: xstate_offset[3]: 832, xstate_sizes[3]: 64 > x86/fpu: xstate_offset[4]: 896, xstate_sizes[4]: 64 > x86/fpu: Enabled xstate features 0x1f, context size is 960 bytes, using > 'compacted' format. > BIOS-provided physical RAM map: > BIOS-e820: [mem 0x-0x0009efff] usable > BIOS-e820: [mem 0x0009f000-0x000f] reserved > BIOS-e820: [mem 0x0010-0x763fafff] usable > BIOS-e820: [mem 0x763fb000-0x79979fff] reserved > BIOS-e820: [mem 0x7997a000-0x7
Re: kexec_file overwrites reserved EFI ESRT memory
On 11/22/19 at 10:07pm, Michael Weiser wrote:
> Hi Eric,
>
> On Fri, Nov 22, 2019 at 02:00:22PM -0600, Eric W. Biederman wrote:
>
> > > esrt: Unsupported ESRT version 2904149718861218184.
> > >
> > > (The image is rather large at 18MiB as it has a built-in initrd.)
> > When did x86_64 get support for ARCH_KEEP_MEMBLOCK? I can't find it
> > anywhere.
>
> No, it hasn't. I temporarily hacked that in to see if it'd change
> anything and it did. Sorry to not be more clear about that.
>
> > Fundamentally when deciding where to place a new kernel kexec (either
> > user space or the in kernel kexec_file implementation) needs to be able
> > to ask the question which memory areas are reserved.
> > What the buddy
> > allocator does is unimportant as kexec copies memory from all over
> > the place and places it in the destined memory addresses at the
> > time of the kexec operation.
>
> > So my question is why doesn't the ESRT reservation wind up in
> > /proc/iomem?
>
> My guess is that the focus was that some EFI structures need to be kept
> around across the life cycle of *one* running kernel and
> memblock_reserve() was enough for that. Marking them so they survive
> kexecing another kernel might just never have cropped up thus far. Ard
> or Matt would know.

Can you check your un-reserved memory? If your memory falls into EFI
BOOT* then on x86 you can use something like the below if it is not
covered:

void __init efi_esrt_init(void)
{
...
	pr_info("Reserving ESRT space from %pa to %pa.\n", &esrt_data, &end);
	if (md.type == EFI_BOOT_SERVICES_DATA)
		efi_mem_reserve(esrt_data, esrt_data_size);
...
}

Unfortunately I noticed there are different requirements/ways for
different types of "reserved" memory. But that is another topic..

Thanks
Dave
Re: kexec_file overwrites reserved EFI ESRT memory
Hi Eric,

On Fri, Nov 22, 2019 at 02:00:22PM -0600, Eric W. Biederman wrote:

> > esrt: Unsupported ESRT version 2904149718861218184.
> >
> > (The image is rather large at 18MiB as it has a built-in initrd.)
> When did x86_64 get support for ARCH_KEEP_MEMBLOCK? I can't find it
> anywhere.

No, it hasn't. I temporarily hacked that in to see if it'd change
anything and it did. Sorry to not be more clear about that.

> Fundamentally when deciding where to place a new kernel kexec (either
> user space or the in kernel kexec_file implementation) needs to be able
> to ask the question which memory areas are reserved.
> What the buddy
> allocator does is unimportant as kexec copies memory from all over
> the place and places it in the destined memory addresses at the
> time of the kexec operation.

> So my question is why doesn't the ESRT reservation wind up in
> /proc/iomem?

My guess is that the focus was that some EFI structures need to be kept
around across the life cycle of *one* running kernel and
memblock_reserve() was enough for that. Marking them so they survive
kexecing another kernel might just never have cropped up thus far. Ard
or Matt would know.

> Are you dealing with an embedded port that is being clever?

I'm not an expert but think it's rather the opposite: It's just a memory
area provided by EFI containing some potentially interesting information
about the EFI firmware structure itself. The aim is to aid firmware
upgrades. This information needs to survive kexec so the user would be
able to use that information (e.g. for upgrades) after a kexec.

So apart from leaving that memory untouched, I guess it could also be
copied over to a staging area by kexec explicitly to be preserved across
the kexec. Or it could be blanked out in such a way that the esrt driver
would not find it after kexec and just be unavailable, if it's decided
that you should only use data about a firmware for upgrades that you
really just used to boot.

I guess a bigger question could be asked whether it would actually be
useful and safe for esrt to be available after kexec.

> Or is there some subtle breakage now that x86 has memblock support that
> /proc/iomem is no longer being properly maintained?

Uuuh, let me backpedal very hard here: x86 has not gained memblock
preserve support. That was just me mucking about. Sorry.
--
Thanks,
Michael
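
Eric's question of whether the ESRT range shows up in /proc/iomem can
also be checked from user space. Below is a minimal sketch, assuming the
usual /proc/iomem layout of "start-end : name" with two spaces of
indentation per nesting level. The struct and function names are
hypothetical, and the caller is assumed to have read the file into an
array of lines already.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* One parsed /proc/iomem line (hypothetical helper type). */
struct iomem_entry {
	uint64_t start;
	uint64_t end;		/* inclusive */
	int depth;		/* 0 = top level, 1 = child, ... */
	char name[64];
};

/* Parse "  start-end : name"; returns false for lines that do not match. */
static bool parse_iomem_line(const char *line, struct iomem_entry *e)
{
	unsigned long long start, end;
	char name[64];
	int indent = 0;

	assert(line != NULL && e != NULL);
	while (line[indent] == ' ')
		indent++;
	if (sscanf(line + indent, "%llx-%llx : %63[^\n]",
		   &start, &end, name) != 3)
		return false;
	e->start = start;
	e->end = end;
	e->depth = indent / 2;
	strcpy(e->name, name);	/* name holds at most 63 chars plus NUL */
	return true;
}

/*
 * Children follow their parent in the file, so the last entry that
 * contains addr is the innermost one. The address counts as plain
 * System RAM only if that innermost entry is named "System RAM".
 */
static bool addr_in_plain_system_ram(const char *const *lines, size_t n,
				     uint64_t addr)
{
	struct iomem_entry e;
	bool plain = false;

	for (size_t i = 0; i < n; i++) {
		if (!parse_iomem_line(lines[i], &e))
			continue;
		if (addr >= e.start && addr <= e.end)
			plain = strcmp(e.name, "System RAM") == 0;
	}
	return plain;
}
```

If addr_in_plain_system_ram() returns true for the ESRT address, nothing
in /proc/iomem marks that range as special. The file typically shows
zeroed addresses to unprivileged readers, so it would need to be read
as root.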