Re: [PATCH] x86/efi: update e820 about reserved EFI boot services data to fix kexec breakage

2019-12-05 Thread Michael Weiser
On Thu, Dec 05, 2019 at 06:55:45PM +0800, Dave Young wrote:

> >esrt: Unsupported ESRT version 2904149718861218184.
> > 
> >  The ESRT memory stays in EFI boot services data, and it was reserved
> >  in kernel via efi_mem_reserve().  The initial purpose of the reservation
> >  is to reuse the EFI boot services data across kexec reboot. For example
> >  the BGRT image data and some ESRT memory like Michael reported.
> > 
> >  But although the memory is reserved it is not updated in the X86 E820 table,
> >  and kexec_file_load() iterates system RAM in the IO resource list to find places
> >  for kernel, initramfs and other stuff. In Michael's case the kexec loaded
> >  initramfs overwrote the ESRT memory and then the failure happened.
> > 
> >  Since kexec_file_load() depends on the E820 table being updated, just fix this
> >  by updating the reserved EFI boot services memory as reserved type in E820.
> Thanks for amending it, and thanks to all for the review and testing.

Same from me, particularly everyone's patience with my haphazard
guesswork around an area I clearly know nothing about. :)
-- 
Thanks,
Michael

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] x86/efi: update e820 about reserved EFI boot services data to fix kexec breakage

2019-12-04 Thread Michael Weiser
Hello Dave,

On Wed, Dec 04, 2019 at 03:59:17PM +0800, Dave Young wrote:
> > Signed-off-by: Dave Young 
> > ---
> >  arch/x86/platform/efi/quirks.c |    6 ++----
> >  1 file changed, 2 insertions(+), 4 deletions(-)
> > 
> > --- linux-x86.orig/arch/x86/platform/efi/quirks.c
> > +++ linux-x86/arch/x86/platform/efi/quirks.c
> > @@ -260,10 +260,6 @@ void __init efi_arch_mem_reserve(phys_ad
> >         return;
> >     }
> >  
> > -   /* No need to reserve regions that will never be freed. */
> > -   if (md.attribute & EFI_MEMORY_RUNTIME)
> > -       return;
> > -
> >     size += addr % EFI_PAGE_SIZE;
> >     size = round_up(size, EFI_PAGE_SIZE);
> >     addr = round_down(addr, EFI_PAGE_SIZE);
> > @@ -293,6 +289,8 @@ void __init efi_arch_mem_reserve(phys_ad
> >     early_memunmap(new, new_size);
> >  
> >     efi_memmap_install(new_phys, num_entries);
> > +   e820__range_update(addr, size, E820_TYPE_RAM, E820_TYPE_RESERVED);
> > +   e820__update_table(e820_table);
> >  }
> >  
> >  /*
> Michael, could you do one more test and provide a Tested-by if it works
> for you?

Did three successful kexecs in sequence of mainline 5.4.0 plus the patch
(had problems getting recent -next to boot on my machine). ESRT region
stayed reserved and intact so that the "Unsupported ESRT version" error
message is gone.

Tested-by: Michael Weiser 
--
Thanks!
Michael



Re: kexec_file overwrites reserved EFI ESRT memory

2019-12-03 Thread Michael Weiser
Hi Dave,

On Tue, Dec 03, 2019 at 07:54:35PM +0800, Dave Young wrote:

> > Neither adding add_efi_memmap nor adding your patch and setting that option
> > makes the ESRT memory region appear in /proc/iomem. kexec_file still
> > loads the kernel across the ESRT region.
> Hmm, sorry, my bad, actually add_efi_memmap does not consider the
> EFI_MEMORY_RUNTIME attribute, it only reads the memory descriptor types.

> I will read the information in your reply later, I did not have time today,
> but maybe the chunk below helps?

> diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
> index 3b9fd679cea9..516307617621 100644
> --- a/arch/x86/platform/efi/quirks.c
> +++ b/arch/x86/platform/efi/quirks.c
> @@ -293,6 +293,8 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
>   early_memunmap(new, new_size);

>   efi_memmap_install(new_phys, num_entries);
> + e820__range_update(addr, size, E820_TYPE_RAM, E820_TYPE_RESERVED);
> + e820__update_table(e820_table);
>  }

>  /*

Yes, that did it:

-0fff : Reserved
1000-0009efff : System RAM
0009f000-000f : Reserved
  000a-000b : PCI Bus :00
  000e-000e3fff : PCI Bus :00
  000e4000-000e7fff : PCI Bus :00
  000e8000-000ebfff : PCI Bus :00
  000ec000-000e : PCI Bus :00
  000f-000f : PCI Bus :00
000f-000f : System ROM
0010-74dd1fff : System RAM
  6500-6aff : Crash kernel
74dd2000-74dd2fff : Reserved   <- ESRT
74dd3000-763f5fff : System RAM
763f6000-79974fff : Reserved
79975000-799f1fff : ACPI Tables
799f2000-79aa6fff : ACPI Non-volatile Storage
  79a17000-79a17fff : USBC000:00

[0.001381] esrt: Reserving ESRT space from 0x74dd2f98 to 0x74dd2fd0.
[0.001382] memblock_reserve: [0x74dd2f98-0x74dd2fcf] efi_mem_reserve+0x1d/0x2b
[0.001383] memblock_reserve: [0x0009e640-0x0009efcf] memblock_alloc_range_nid+0x93/0xfa
[0.001384] e820: update [mem 0x74dd2000-0x74dd2fff] usable ==> reserved
[...]
[0.043610] PM: Registered nosave memory: [mem 0x-0x0fff]
[0.043611] memblock_alloc_try_nid: 32 bytes align=0x40 nid=-1 from=0x max_addr=0x __register_nosave_region+0x6b/0xca
[0.043612] memblock_reserve: [0x00047dff95c0-0x00047dff95df] memblock_alloc_range_nid+0x93/0xfa
[0.043613] PM: Registered nosave memory: [mem 0x0009f000-0x000f]
[0.043615] memblock_alloc_try_nid: 32 bytes align=0x40 nid=-1 from=0x max_addr=0x __register_nosave_region+0x6b/0xca
[0.043616] memblock_reserve: [0x00047dff9580-0x00047dff959f] memblock_alloc_range_nid+0x93/0xfa
[0.043617] PM: Registered nosave memory: [mem 0x74dd2000-0x74dd2fff]   <- ESRT
[0.043618] memblock_alloc_try_nid: 32 bytes align=0x40 nid=-1 from=0x max_addr=0x __register_nosave_region+0x6b/0xca
[0.043619] memblock_reserve: [0x00047dff9540-0x00047dff955f] memblock_alloc_range_nid+0x93/0xfa
[0.043620] PM: Registered nosave memory: [mem 0x763f6000-0x79974fff]
[0.043620] PM: Registered nosave memory: [mem 0x79975000-0x799f1fff]
[0.043621] PM: Registered nosave memory: [mem 0x799f2000-0x79aa6fff]
[0.043621] PM: Registered nosave memory: [mem 0x79aa7000-0x7a40dfff]
[...]
[5.993928] PCI: pci_cache_line_size set to 64 bytes
[5.994563] e820: reserve RAM buffer [mem 0x0009f000-0x0009]
[5.994565] e820: reserve RAM buffer [mem 0x74dd2000-0x77ff]   <- ESRT
[5.994565] e820: reserve RAM buffer [mem 0x763f6000-0x77ff]
[5.994566] e820: reserve RAM buffer [mem 0x7a40f000-0x7bff]
[5.994567] e820: reserve RAM buffer [mem 0x47e00-0x47fff]
[5.995513] acpi PNP0C14:02: duplicate WMI GUID 05901221-D566-11D1-B2F0-00A0C9062910 (first instance was on PNP0C14:01)
[5.995549] acpi PNP0C14:03: duplicate WMI GUID 05901221-D566-11D1-B2F0-00A0C9062910 (first instance was on PNP0C14:01)
[...]
[   86.508053] kexec-bzImage64: Loaded purgatory at 0x98000
[   86.508056] kexec_file: Considering 0x1000-0x9efff
[   86.508057] kexec-bzImage64: Loaded boot_param, command line and misc at 0x96000 bufsz=0x1240 memsz=0x1240
[   86.508057] kexec_file: Considering 0x10-0x74dd1fff
[   86.508058] kexec-bzImage64: Loaded 64bit kernel at 0x7200 bufsz=0x1140888 memsz=0x24b7000
[   86.508058] kexec-bzImage64: Final command line is: 
[   86.584668] kexec_file: Loading segment 0: buf=0xd5ec82bc bufsz=0x5000 mem=0x98000 memsz=0x6000
[   86.584672] kexec_file: Loading segment 1: buf=0xaf539c69 bufsz=0x1240 mem=0x96000 memsz=0x2000
[   86.584674] kexec_file: Loading segment 2: buf=0x29f9b9a8 bufsz=0x1140888 mem=0x7200 memsz=0x24b7000   <- not ESRT :)

And no more invalid version error message from the kexec'd kernel.
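
For the archives, the effect on the table as I understand it: retyping one
page inside a usable entry leaves three entries, which is the System RAM /
Reserved / System RAM split in /proc/iomem above. A toy user-space model of
that split follows. It is not the kernel's e820__range_update(); struct
range_table, table_add() and table_retype() are made-up names, and the toy
leaves the table unsorted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum range_type { RANGE_RAM = 1, RANGE_RESERVED = 2 };

/* One table entry covering the half-open range [start, end). */
struct range_entry {
    uint64_t start;
    uint64_t end;
    enum range_type type;
};

#define MAX_ENTRIES 16

struct range_table {
    struct range_entry e[MAX_ENTRIES];
    size_t n;
};

/* Append one entry; the toy table has a fixed capacity. */
static void table_add(struct range_table *t, uint64_t start, uint64_t end,
                      enum range_type type)
{
    assert(t->n < MAX_ENTRIES && start < end);
    t->e[t->n].start = start;
    t->e[t->n].end = end;
    t->e[t->n].type = type;
    t->n++;
}

/*
 * Retype the part of [start, end) that currently has old_type to new_type.
 * An entry that is only partly covered is split: the covered piece is
 * retyped in place, the uncovered pieces are appended with the old type.
 * Returns the number of bytes retyped.
 */
static uint64_t table_retype(struct range_table *t, uint64_t start,
                             uint64_t end, enum range_type old_type,
                             enum range_type new_type)
{
    uint64_t changed = 0;
    size_t n = t->n; /* only visit entries that existed on entry */

    for (size_t i = 0; i < n; i++) {
        struct range_entry *cur = &t->e[i];
        uint64_t cur_start = cur->start, cur_end = cur->end;
        uint64_t lo, hi;

        if (cur->type != old_type || cur_end <= start || cur_start >= end)
            continue;

        lo = cur_start > start ? cur_start : start;
        hi = cur_end < end ? cur_end : end;

        cur->start = lo;
        cur->end = hi;
        cur->type = new_type;
        if (cur_start < lo)
            table_add(t, cur_start, lo, old_type);
        if (hi < cur_end)
            table_add(t, hi, cur_end, old_type);
        changed += hi - lo;
    }
    return changed;
}
```

Retyping [0x74dd2000, 0x74dd3000) inside a single usable entry ending at
0x763f6000 reports 0x1000 bytes changed and leaves three entries, the
middle one reserved.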
-- 
Thanks,
Michael


Re: kexec_file overwrites reserved EFI ESRT memory

2019-12-02 Thread Michael Weiser
Hi Dave,

On Mon, Dec 02, 2019 at 05:05:20PM +0800, Dave Young wrote:

> > It seems a serious problem, the EFI modified memmap does not get an
> > /proc/iomem resource update, but kexec_file relies on /proc/iomem in
> > X86.
> > 
> > There is a question from Sai about why add_efi_memmap is not enabled by
> > default:
> > https://www.spinics.net/lists/linux-mm/msg185166.html

Incidentally, a data point I did not think to mention: I do boot the
kernel as EFI application directly from the firmware as a boot entry
with compiled in initrd and command line:

$ grep EFI nobak/kernel/linux/.config
CONFIG_EFI=y
CONFIG_EFI_STUB=y
# CONFIG_EFI_MIXED is not set
CONFIG_DMI_SCAN_MACHINE_NON_EFI_FALLBACK=y
# EFI (Extensible Firmware Interface) Support
CONFIG_EFI_VARS=m
CONFIG_EFI_ESRT=y
CONFIG_EFI_VARS_PSTORE=m
# CONFIG_EFI_VARS_PSTORE_DEFAULT_DISABLE is not set
CONFIG_EFI_RUNTIME_MAP=y
# CONFIG_EFI_FAKE_MEMMAP is not set
CONFIG_EFI_RUNTIME_WRAPPERS=y
# CONFIG_EFI_BOOTLOADER_CONTROL is not set
# CONFIG_EFI_CAPSULE_LOADER is not set
# CONFIG_EFI_TEST is not set
# CONFIG_EFI_RCI2_TABLE is not set
# end of EFI (Extensible Firmware Interface) Support
CONFIG_UEFI_CPER=y
CONFIG_UEFI_CPER_X86=y
CONFIG_EFI_EARLYCON=y
CONFIG_EFI_PARTITION=y
CONFIG_FB_EFI=y
CONFIG_EFIVAR_FS=y
# CONFIG_EFI_PGT_DUMP is not set

$ grep CMDLINE nobak/kernel/linux/.config
CONFIG_CMDLINE_BOOL=y
CONFIG_CMDLINE="root=UUID=97[...]e4 rd.luks.uuid=8a[...]c3 rd.luks.allow-discards=8a[...]c3 mem_sleep_default=deep resume=UUID=97[...]e4 resume_offset=96256 efi=debug memblock=debug"
CONFIG_CMDLINE_OVERRIDE=y
# CONFIG_BLK_CMDLINE_PARSER is not set
# CONFIG_CMDLINE_PARTITION is not set
CONFIG_FB_CMDLINE=y

$ efibootmgr -v
BootCurrent: 000A
Timeout: 2 seconds
BootOrder: 000A,0009,0008,0005,0007,0006,0004,0002,0001,,0003
[...]
Boot0005* gentoo-5.4.0-next-20191127+-clear
HD(1,GPT,e7[...]f2,0x800,0x64000)/File(\kernel-5.4.0-next-20191127+-clear)
[...]
Boot000A* gentoo-5.4.1-gentoo
HD(1,GPT,e7[...]f2,0x800,0x64000)/File(\kernel-5.4.1-gentoo)

So there's no boot loader that could construct an e820 table for the
kernel to consume. I understand it's then up to the EFI stub to come up
with an e820 table from the EFI memory map.

> > A long time ago add_efi_memmap was only enabled when explicitly
> > requested on the cmdline. I'm not sure if we can enable it by default,
> > maybe we should. Need opinions from the X86 maintainers.
> > Can you try the diff below and see if it works for you? (not tested, and
> > it needs an explicit 'add_efi_memmap' kernel cmdline param)

Neither adding add_efi_memmap nor adding your patch and setting that option
makes the ESRT memory region appear in /proc/iomem. kexec_file still
loads the kernel across the ESRT region.

What occurs to me is that nowhere does the ESRT memory region appear in
any externally provided memory map. Neither e820 nor EFI seem to declare
it. Is that expected or a bug of my particular system?

For example, the e820 map (derived from the EFI map by the EFI stub?)
has these regions:

BIOS-provided physical RAM map:
BIOS-e820: [mem 0x-0x0009efff] usable
BIOS-e820: [mem 0x0009f000-0x000f] reserved
BIOS-e820: [mem 0x0010-0x763f5fff] usable
BIOS-e820: [mem 0x763f6000-0x79974fff] reserved
BIOS-e820: [mem 0x79975000-0x799f1fff] ACPI data
BIOS-e820: [mem 0x799f2000-0x79aa6fff] ACPI NVS
BIOS-e820: [mem 0x79aa7000-0x7a40dfff] reserved
BIOS-e820: [mem 0x7a40e000-0x7a40efff] usable
BIOS-e820: [mem 0x7a40f000-0x7fff] reserved
BIOS-e820: [mem 0xf000-0xf7ff] reserved
BIOS-e820: [mem 0xfe00-0xfe010fff] reserved
BIOS-e820: [mem 0xfec0-0xfec00fff] reserved
BIOS-e820: [mem 0xfed0-0xfed03fff] reserved
BIOS-e820: [mem 0xfee0-0xfee00fff] reserved
BIOS-e820: [mem 0xff00-0x] reserved
BIOS-e820: [mem 0x0001-0x00047dff] usable

The ESRT region sits smack in the middle of a large system RAM region:

BIOS-e820: [mem 0x0010-0x763f5fff] usable

Consequently, the relevant part of /proc/iomem looks like this:

-0fff : Reserved
1000-0009efff : System RAM
0009f000-000f : Reserved
  000a-000b : PCI Bus :00
  000e-000e3fff : PCI Bus :00
  000e4000-000e7fff : PCI Bus :00
  000e8000-000ebfff : PCI Bus :00
  000ec000-000e : PCI Bus :00
  000f-000f : PCI Bus :00
000f-000f : System ROM
0010-763f5fff : System RAM
  6500-6aff : Crash kernel
763f6000-79974fff : Reserved
79975000-799f1fff : ACPI Tables
799f2000-79aa6fff : ACPI Non-volatile Storage
  79a17000-79a17fff : USBC000:00

What it would need to look like for kexec to leave ESRT alone, I guess, is:

-0fff : Reserved
1000-0009efff : System RAM
0009f000-000f : Reserved
  

Re: kexec_file overwrites reserved EFI ESRT memory

2019-11-29 Thread Michael Weiser
Hello Dave,

On Mon, Nov 25, 2019 at 01:52:01PM +0800, Dave Young wrote:

> > > Fundamentally when deciding where to place a new kernel kexec (either
> > > user space or the in kernel kexec_file implementation) needs to be able
> > > to ask the question which memory areas are reserved.
[...]
> > > So my question is why doesn't the ESRT reservation wind up in
> > > /proc/iomem?
> > 
> > My guess is that the focus was that some EFI structures need to be kept
> > around across the life cycle of *one* running kernel and
> > memblock_reserve() was enough for that. Marking them so they survive
> > kexecing another kernel might just never have cropped up thus far. Ard
> > or Matt would know.
> Can you check your un-reserved memory, if your memory falls into EFI
> BOOT* then in X86 you can use something like below if it is not covered:

> void __init efi_esrt_init(void)
> {
> ...
>   pr_info("Reserving ESRT space from %pa to %pa.\n", &esrt_data, &end);
>   if (md.type == EFI_BOOT_SERVICES_DATA)
>           efi_mem_reserve(esrt_data, esrt_data_size);
> ...
> }

Please bear with me if I'm a bit slow on the uptake here: On my machine,
the esrt module reports at boot:

[0.001244] esrt: Reserving ESRT space from 0x74dd2f98 to 0x74dd2fd0.

This area is of type "Boot Data" (== BOOT_SERVICES_DATA) which makes the
code you quote reserve it using memblock_reserve() shown by
memblock=debug:

[0.001246] memblock_reserve: [0x74dd2f98-0x74dd2fcf] efi_mem_reserve+0x1d/0x2b

It also calls into arch/x86/platform/efi/quirks.c:efi_arch_mem_reserve()
which tags it as EFI_MEMORY_RUNTIME while the surrounding ones aren't
as shown by efi=debug:

[0.178111] efi: mem10: [Boot Data  |   |  |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x74dd3000-0x75becfff] (14MB)
[0.178113] efi: mem11: [Boot Data  |RUN|  |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x74dd2000-0x74dd2fff] (0MB)
[0.178114] efi: mem12: [Boot Data  |   |  |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x6d635000-0x74dd1fff] (119MB)

This prevents arch/x86/platform/efi/quirks.c:efi_free_boot_services()
from calling __memblock_free_late() on it. And indeed, memblock=debug does
not report this area as being freed while the surrounding ones are:

[0.178369] __memblock_free_late: [0x74dd3000-0x75becfff] efi_free_boot_services+0x126/0x1f8
[0.178658] __memblock_free_late: [0x6d635000-0x74dd1fff] efi_free_boot_services+0x126/0x1f8
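
The one-page gap between the two freed ranges, 0x74dd2000-0x74dd2fff, is
the 0x38-byte ESRT reservation rounded out to EFI page granularity, which
is what the round_up()/round_down() steps in efi_arch_mem_reserve() do. As
a check of my own arithmetic, a user-space sketch (efi_page_align() and
struct page_range are made-up names, not kernel API):

```c
#include <assert.h>
#include <stdint.h>

#define EFI_PAGE_SIZE 4096ULL /* EFI pages are 4 KiB */

/* Hypothetical helper type: a page-aligned physical range. */
struct page_range {
    uint64_t addr; /* first byte, page aligned */
    uint64_t size; /* length in bytes, a multiple of EFI_PAGE_SIZE */
};

/*
 * Same three steps as in the kernel function: grow size by the offset
 * into the first page, round size up to whole pages, round addr down.
 */
static struct page_range efi_page_align(uint64_t addr, uint64_t size)
{
    struct page_range r;

    assert(size > 0);
    size += addr % EFI_PAGE_SIZE;
    size = (size + EFI_PAGE_SIZE - 1) / EFI_PAGE_SIZE * EFI_PAGE_SIZE;
    r.addr = addr - addr % EFI_PAGE_SIZE;
    r.size = size;
    return r;
}
```

efi_page_align(0x74dd2f98, 0x38) gives addr 0x74dd2000 and size 0x1000,
which is exactly the mem11 range tagged RUN above.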

The esrt area does not show up in /proc/iomem though:

0010-763f5fff : System RAM
  6200-62a00d80 : Kernel code
  62c0-62f15fff : Kernel rodata
  6300-630ea8bf : Kernel data
  63fed000-641f : Kernel bss
  6500-6aff : Crash kernel

And thus kexec loads the new kernel right over that area, as shown when
building kexec_file.c with -DDEBUG (0x74dd3000 lies between 0x7300
and 0x7300+0x24be000 = 0x754be000):

[  650.007695] kexec_file: Loading segment 0: buf=0x3a9c84d6 bufsz=0x5000 mem=0x98000 memsz=0x6000
[  650.007699] kexec_file: Loading segment 1: buf=0x17b2b9e6 bufsz=0x1240 mem=0x96000 memsz=0x2000
[  650.007703] kexec_file: Loading segment 2: buf=0xfdf72ba2 bufsz=0x1150888 mem=0x7300 memsz=0x24be000

... because it looks for any memory hole large enough in iomem resources
tagged as System RAM, which 0x74dd2000-0x74dd2fff would then need to be
excluded from on my system.
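
To double-check the overlap: with half-open ranges, two ranges overlap
exactly when each one starts before the other ends. A small sketch
(ranges_overlap() and segment2_hits_esrt() are names I made up; the
segment start is derived from the end address and memsz quoted above
rather than typed in):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if [a_start, a_end) and [b_start, b_end) share at least one byte. */
static bool ranges_overlap(uint64_t a_start, uint64_t a_end,
                           uint64_t b_start, uint64_t b_end)
{
    assert(a_start <= a_end && b_start <= b_end);
    return a_start < b_end && b_start < a_end;
}

/* Segment 2 as logged: it ends at 0x754be000 and is memsz=0x24be000 long. */
static bool segment2_hits_esrt(void)
{
    const uint64_t seg_end = 0x754be000ULL;
    const uint64_t seg_start = seg_end - 0x24be000ULL;

    /* ESRT reservation as printed by the esrt boot message. */
    return ranges_overlap(seg_start, seg_end, 0x74dd2f98ULL, 0x74dd2fd0ULL);
}
```

segment2_hits_esrt() returns true, so the ESRT does sit inside the kernel
segment.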

Looking some more at efi_arch_mem_reserve() I see that it also registers
the area with efi.memmap and installs it using efi_memmap_install(),
which seems to call memremap(MEMREMAP_WB) on it. From my understanding
of the comments in the source of memremap(), MEMREMAP_WB specifically
does *not* reserve that memory in any way.

> Unfortunately I noticed there are different requirements/ways for
> different types of "reserved" memory.  But that is another topic..

I tried to reserve the area with something like this:

diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 4de244683a7e..b86a5df027a2 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -249,6 +249,7 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
efi_memory_desc_t md;
int num_entries;
void *new;
+   struct resource *res;
 
if (efi_mem_desc_lookup(addr, &md) ||
md.type != EFI_BOOT_SERVICES_DATA) {
@@ -294,6 +295,21 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
early_memunmap(new, new_size);
 
efi_memmap_install(new_phys, num_entries);
+
+   res = memblock_alloc(sizeof(*res), SMP_CACHE_BYTES);
+   if (!res) {
+   pr_err("Failed to allocate EFI io resource allocator for "
+   "0x%llx:0x%llx", mr.range.start, mr.range.end);
+   return;
+   }
+
+   res->start  = mr.range.start;
+   

kexec_file overwrites reserved EFI ESRT memory

2019-11-26 Thread Michael Weiser
Hello Eric,
Hello Ard,

on my machine, kexec_file loads the normal (not crash) kernel image
right across the EFI ESRT reserved memory range:

esrt: Reserving ESRT space from 0x74dd6f98 to 0x74dd6fd0.
[...]
kexec_file: kernel signature verification successful.
kexec_file: Loading segment 0: buf=0xe99b31ad bufsz=0x5000 mem=0x91000 memsz=0x6000
kexec_file: Loading segment 1: buf=0xe45cdeb8 bufsz=0x1240 mem=0x8f000 memsz=0x2000
kexec_file: Loading segment 2: buf=0x096e6de9 bufsz=0x1133888 mem=0x7300 memsz=0x249a000

This causes the following message by the kexec'd kernel:

esrt: Unsupported ESRT version 2904149718861218184.

(The image is rather large at 18MiB as it has a built-in initrd.)
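
That number is presumably just whatever the loaded image left at the place
where the ESRT header used to be. As far as I can tell the esrt driver
only accepts version 1 in the header. A user-space sketch of such a check
follows; the struct layout and all names are my assumption from reading
the driver, not a copy of it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout of the table header; the entries follow it in memory. */
struct esrt_header {
    uint32_t fw_resource_count;
    uint32_t fw_resource_count_max;
    uint64_t fw_resource_version;
};

/* Copy the header out of a raw buffer and accept only version 1. */
static bool esrt_header_ok(const void *buf, size_t len,
                           struct esrt_header *out)
{
    assert(buf != NULL && out != NULL);
    if (len < sizeof(*out))
        return false;
    memcpy(out, buf, sizeof(*out));
    return out->fw_resource_version == 1;
}
```

A header whose version field reads 2904149718861218184 fails this check,
which would explain the message without anything being wrong with the
firmware's table itself.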

Poking at the involved code a bit (as a layman) I found that the EFI
code reserves the memory range using memblock_reserve() which is by all
appearances correctly handed over to the buddy allocator as
in-use/reserved. kexec_file on the other hand by default looks at iomem
regions of type System RAM using walk_system_ram_res() and does not seem
to have that particular information available to consider. (As may have
become clear from this explanation, I'm still somewhat fuzzy (to put it
mildly) on the relationship between the memblock, buddy and slab allocators
and how (if at all) kexec_file interacts with them to a) find available
memory regions to load the new kernel into and b) tell them where it
loaded the new kernel so they don't use that memory any more.)
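
For a), my reading so far is that kexec_file walks the System RAM
resources in ascending order and, inside the first one that fits, places
the buffer as high as the alignment allows. A much simplified user-space
sketch of that idea, not the kernel code: struct ram_range,
find_hole_top_down() and the align parameter are names and assumptions of
mine, and the real code also has to avoid segments it has already placed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one "System RAM" resource; end is inclusive. */
struct ram_range {
    uint64_t start;
    uint64_t end;
};

#define NO_HOLE UINT64_MAX

/*
 * Walk the ranges in ascending order. In the first range that can hold
 * memsz bytes, return the highest start address that is a multiple of
 * align (a power of two). Returns NO_HOLE if nothing fits.
 */
static uint64_t find_hole_top_down(const struct ram_range *ranges, size_t n,
                                   uint64_t memsz, uint64_t align)
{
    assert(memsz > 0 && align > 0 && (align & (align - 1)) == 0);

    for (size_t i = 0; i < n; i++) {
        uint64_t len = ranges[i].end - ranges[i].start + 1;
        uint64_t candidate;

        if (len < memsz)
            continue;
        /* Highest possible start, then align it downwards. */
        candidate = (ranges[i].end - memsz + 1) & ~(align - 1);
        if (candidate >= ranges[i].start)
            return candidate;
    }
    return NO_HOLE;
}
```

Nothing in such a walk knows about memblock reservations, so a reserved
page in the middle of a System RAM resource is fair game unless it is
carved out of the resource list itself.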

As is to be expected, activating CONFIG_ARCH_KEEP_MEMBLOCK makes
kexec_file use the preserved memblock structures and indeed end up using
totally different memory regions and gets rid of the message:

kexec_file: kernel signature verification successful.
kexec_file: Loading segment 0: buf=0x2dea71f8 bufsz=0x5000 mem=0x47df8e000 memsz=0x6000
kexec_file: Loading segment 1: buf=0x0686ff17 bufsz=0x1240 mem=0x47df8c000 memsz=0x2000
kexec_file: Loading segment 2: buf=0xfc444e67 bufsz=0x1133888 mem=0x46900 memsz=0x2497000

This is with 5.3.11 mainline and linux-next 5.4.0-rc8-next-20191122.

I'm not actually trying to use ESRT for anything at this point but want
to stop the boot message from messing up silent boot and suspect that
this could potentially happen to other, more important EFI memory
regions as well.

I'm willing to chase this further but at this point I'm wondering
whether it's the EFI code not reserving this memory area with enough
emphasis (as iomem?) or kexec_file not checking usability of
candidate memory regions rigorously enough (based on what other
criteria?).

Are there maybe any upcoming patches or subsystem-specific kernel trees
I should try?

Please let me know what other information may be helpful or if I should
open a bug on bugzilla.kernel.org.

Boot messages on normal boot:
Linux version 5.3.11-gentoo (m@n) (gcc version 9.2.0 (Gentoo 9.2.0-r2 p3)) #29 SMP Thu Nov 21 20:40:28 CET 2019
Command line: 
x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds registers'
x86/fpu: Supporting XSAVE feature 0x010: 'MPX CSR'
x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
x86/fpu: xstate_offset[3]:  832, xstate_sizes[3]:   64
x86/fpu: xstate_offset[4]:  896, xstate_sizes[4]:   64
x86/fpu: Enabled xstate features 0x1f, context size is 960 bytes, using 'compacted' format.
BIOS-provided physical RAM map:
BIOS-e820: [mem 0x-0x0009efff] usable
BIOS-e820: [mem 0x0009f000-0x000f] reserved
BIOS-e820: [mem 0x0010-0x763fafff] usable
BIOS-e820: [mem 0x763fb000-0x79979fff] reserved
BIOS-e820: [mem 0x7997a000-0x799f6fff] ACPI data
BIOS-e820: [mem 0x799f7000-0x79aabfff] ACPI NVS
BIOS-e820: [mem 0x79aac000-0x7a40dfff] reserved
BIOS-e820: [mem 0x7a40e000-0x7a40efff] usable
BIOS-e820: [mem 0x7a40f000-0x7fff] reserved
BIOS-e820: [mem 0xf000-0xf7ff] reserved
BIOS-e820: [mem 0xfe00-0xfe010fff] reserved
BIOS-e820: [mem 0xfec0-0xfec00fff] reserved
BIOS-e820: [mem 0xfed0-0xfed03fff] reserved
BIOS-e820: [mem 0xfee0-0xfee00fff] reserved
BIOS-e820: [mem 0xff00-0x] reserved
BIOS-e820: [mem 0x0001-0x00047dff] usable
NX (Execute Disable) protection: active
efi: EFI v2.70 by American Megatrends
efi:  ACPI 2.0=0x79993000  ACPI=0x79993000  TPMFinalLog=0x79a35000  SMBIOS=0x7a1cf000  SMBIOS 3.0=0x7a1ce000  ESRT=0x74dd6f98  TPMEventLog=0x6d634018 
efi: mem00: [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x-0x0fff] (0MB)
efi: mem01: [Loader Data|   |  |  |  |  |  |  |   

Re: kexec_file overwrites reserved EFI ESRT memory

2019-11-22 Thread Michael Weiser
Hi Eric,

On Fri, Nov 22, 2019 at 02:00:22PM -0600, Eric W. Biederman wrote:

> > esrt: Unsupported ESRT version 2904149718861218184.
> >
> > (The image is rather large at 18MiB as it has a built-in initrd.)
> When did x86_64 get support for ARCH_KEEP_MEMBLOCK?  I can't find it
> anywhere.

No, it hasn't. I temporarily hacked that in to see if it'd change
anything and it did. Sorry to not be more clear about that.

> Fundamentally when deciding where to place a new kernel kexec (either
> user space or the in kernel kexec_file implementation) needs to be able
> to ask the question which memory areas are reserved. What the buddy
> allocator does is unimportant as kexec copies memory from all over
> the place and places it in the destined memory addresses at the
> time of the kexec operation.

> So my question is why doesn't the ESRT reservation wind up in
> /proc/iomem?

My guess is that the focus was that some EFI structures need to be kept
around across the life cycle of *one* running kernel and
memblock_reserve() was enough for that. Marking them so they survive
kexecing another kernel might just never have cropped up thus far. Ard
or Matt would know.

> Are you dealing with an embedded port that is being clever?

I'm not an expert but think it's rather the opposite: It's just a memory
area provided by EFI containing some potentially interesting information
about the EFI firmware structure itself. The aim is to aid firmware
upgrades. This information needs to survive kexec so the user would be
able to use that information (e.g. for upgrades) after a kexec.

So apart from leaving that memory untouched, I guess it could also be
copied over to a staging area by kexec explicitly to be preserved across
the kexec. Or it could be blanked out in such a way that the esrt driver
would not find it after kexec and just be unavailable, if it's decided
that you should only use data about a firmware for upgrades that you
really just used to boot. I guess a bigger question could be asked
whether it would actually be useful and safe for esrt to be available
after kexec.

> Or is there some subtle breakage now that x86 has memblock support that
> /proc/iomem is no longer being properly maintained?

Uuuh, let me backpedal very hard here: x86 has not gained memblock
preserve support. That was just me mucking about. Sorry.
-- 
Thanks,
Michael
