Re: [PATCH] efi: Fix handling of multiple contiguous efi_fake_mem= entries

2019-12-30 Thread Dave Young
Hi Dan,
On 12/30/19 at 11:58am, Dan Williams wrote:
> A recent test of efi_fake_mem=4G@9G:0x4,4G@13G:0x4 crashed with
> a signature of:
> 
> BUG: unable to handle page fault for address: ff281000
> [..]
> RIP: 0010:efi_memmap_insert+0x11d/0x191
> [..]
> Call Trace:
>  ? bgrt_init+0xbe/0xbe
>  ? efi_arch_mem_reserve+0x1cb/0x228
>  ? acpi_parse_bgrt+0xa/0xd
>  ? acpi_table_parse+0x86/0xb8
>  ? acpi_boot_init+0x494/0x4e3
>  ? acpi_parse_x2apic+0x87/0x87
>  ? setup_acpi_sci+0xa2/0xa2
>  ? setup_arch+0x8db/0x9e1
>  ? start_kernel+0x6a/0x547
>  ? secondary_startup_64+0xb6/0xc0
> 
> efi_memmap_insert() is attempting to insert entries past the end of the
> new map. This condition is setup by efi_fake_mem() leaking empty entries
> to the end of memory map, and then efi_arch_mem_reserve() trips over the
> bad entry when attempting an additional efi_memmap_insert(). The empty
> entry causes efi_memmap_insert() to attempt more memmap splits / copies
> than efi_memmap_split_count() accounted for when sizing the new map.
> 
> Update efi_fake_memmap() to cleanup lagging empty entries.
> 
> This change is related to commit af1648984828 "x86/efi: Update e820 with
> reserved EFI boot services data to fix kexec breakage" since that
> introduces more occurrences where efi_memmap_insert() is invoked after
> an efi_fake_mem= configuration has been parsed. Previously the side
> effects of vestigial empty entries were benign, but with commit
> af1648984828 that follow-on efi_memmap_insert() invocation triggers the
> above crash signature.
> 
> Fixes: 0f96a99dab36 ("efi: Add 'efi_fake_mem' boot option")
> Fixes: af1648984828 ("x86/efi: Update e820 with reserved EFI boot 
> services...")
> Cc: Taku Izumi 
> Cc: Michael Weiser 
> Cc: Dave Young 
> Cc: Ard Biesheuvel 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Signed-off-by: Dan Williams 
> ---
>  drivers/firmware/efi/fake_mem.c |   22 +-
>  1 file changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/firmware/efi/fake_mem.c b/drivers/firmware/efi/fake_mem.c
> index bb9fc70d0cfa..6df51ba93ae8 100644
> --- a/drivers/firmware/efi/fake_mem.c
> +++ b/drivers/firmware/efi/fake_mem.c
> @@ -67,13 +67,33 @@ void __init efi_fake_memmap(void)
>   return;
>   }
>  
> + memset(new_memmap, 0, efi.memmap.desc_size * new_nr_map);
>   for (i = 0; i < nr_fake_mem; i++)
>   efi_memmap_insert(, new_memmap, _fake_mems[i]);
>  
> + /*
> +  * efi_memmap_split_count() may over count the number of
> +  * required splits in the case when contiguous fake entries are
> +  * specified. Check that all new_nr_map entries were consumed.
> +  */
> + for (i = new_nr_map; i > 0; i--) {
> + efi_memory_desc_t *md;
> + u64 start, end;
> +
> + md = new_memmap + efi.memmap.desc_size * (new_nr_map - i - 1);
> + end = md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT) - 1;
> + start = md->phys_addr;
> +
> + if (start == 0 && end + 1 == 0)
> + continue;
> + break;
> + }
> +
>   /* swap into new EFI memmap */
>   early_memunmap(new_memmap, efi.memmap.desc_size * new_nr_map);
>  
> - efi_memmap_install(new_memmap_phy, new_nr_map);
> + /* install only the valid entries */
> + efi_memmap_install(new_memmap_phy, i);
>  
>   /* print new EFI memmap */
>   efi_print_memmap();
> 

Although kernel bootup works with this patch, it still does not fix the
issue I noticed, you can see:
[root@localhost ~]# cat /proc/cmdline
BOOT_IMAGE=/bzImage root=/dev/vda3 ro audit=0 selinux=0 crashkernel=160M 
efi=debug console=ttyS0 console=tty0 3 
efi_fake_mem=200M@5G:0x4,300M@5600M:0x4 earlyprintk=serial
[root@localhost ~]# dmesg|grep fake_mem
[0.00] Command line: BOOT_IMAGE=/bzImage root=/dev/vda3 ro audit=0 
selinux=0 crashkernel=160M efi=debug console=ttyS0 console=tty0 3 
efi_fake_mem=200M@5G:0x4,300M@5600M:0x4 earlyprintk=serial
[0.00] efi_fake_mem: add attr=0x0004 to [mem 
0x00014000-0x00014c7f]
[0.00] efi_fake_mem: add attr=0x0004 to [mem 
0x00015e00-0x000170bf]
[root@localhost ~]# dmesg|grep SP
[0.085762] efi: mem48: [Conventional Memory|   |  |SP|  |  |  |  |  |   
|WB|WT|WC|UC] range=[0x00015e00-0x000170bf] (300MB)


With this patch applied, there is still only one md set "SP" attr.  That
means only the last insert worked.

void __init efi_memmap_insert(struct efi_memory_map *old_memmap, void *buf,
  struct efi_mem_range *mem)

The above function will use the old_memmap as the base for each
inserting.  the old_memmap == , so when you do below:
for (i = 0; i < nr_fake_mem; i++)
efi_memmap_insert(, new_memmap, _fake_mems[i]);

Only the last inserting will take effect.  Below 

Re: [PATCH v7 1/4] x86: kdump: move reserve_crashkernel_low() into crash_core.c

2019-12-30 Thread Chen Zhou
Hi Dave,

On 2019/12/28 17:32, Dave Young wrote:
> On 12/27/19 at 07:04pm, Chen Zhou wrote:
>> Hi Dave
>>
>> On 2019/12/27 13:54, Dave Young wrote:
>>> Hi,
>>> On 12/23/19 at 11:23pm, Chen Zhou wrote:
 In preparation for supporting reserve_crashkernel_low in arm64 as
 x86_64 does, move reserve_crashkernel_low() into kernel/crash_core.c.

 Note, in arm64, we reserve low memory if and only if crashkernel=X,low
 is specified. Different with x86_64, don't set low memory automatically.
>>>
>>> Do you have any reason for the difference?  I'd expect we have same
>>> logic if possible and remove some of the ifdefs.
>>
>> In x86_64, if we reserve crashkernel above 4G, then we call 
>> reserve_crashkernel_low()
>> to reserve low memory.
>>
>> In arm64, to simplify, we call reserve_crashkernel_low() at the beginning of 
>> reserve_crashkernel()
>> and then relax the arm64_dma32_phys_limit if reserve_crashkernel_low() 
>> allocated something.
>> In this case, if reserve crashkernel below 4G there will be 256M low memory 
>> set automatically
>> and this needs extra considerations.
> 
> Sorry that I did not read the old thread details and thought that is
> arch dependent.  But rethink about that, it would be better that we can
> have same semantic about crashkernel parameters across arches.  If we
> make them different then it causes confusion, especially for
> distributions.
> 
> OTOH, I thought if we reserve high memory then the low memory should be
> needed.  There might be some exceptions, but I do not know the exact
> one, can we make the behavior same, and special case those systems which
> do not need low memory reservation.
> 
I thought like this and did implement with crashkernel parameters arch 
independent.
This is my v4: https://lkml.org/lkml/2019/5/6/1361, i implemented according to 
x86_64's
behavior.

>>
>> previous discusses:
>>  https://lkml.org/lkml/2019/6/5/670
>>  https://lkml.org/lkml/2019/6/13/229
> 
> Another concern from James:
> "
> With both crashk_low_res and crashk_res, we end up with two entries in 
> /proc/iomem called
> "Crash kernel". Because its sorted by address, and kexec-tools stops 
> searching when it
> find "Crash kernel", you are always going to get the kernel placed in the 
> lower portion.
> "
> 
> The kexec-tools code is iterating all "Crash kernel" ranges and add them
> in an array.  In X86 code, it uses the higher range to locate memory.

We also discussed about this: https://lkml.org/lkml/2019/6/13/227.
I guess James's opinion is that kexec-tools should take forward compatibility 
into account.
"But we can't rely on people updating user-space when they update the kernel!" 
-- James

> 
>>
>>>

 Reported-by: kbuild test robot 
 Signed-off-by: Chen Zhou 
 ---
  arch/x86/kernel/setup.c| 62 -
  include/linux/crash_core.h |  3 ++
  include/linux/kexec.h  |  2 --
  kernel/crash_core.c| 87 
 ++
  kernel/kexec_core.c| 17 -
  5 files changed, 96 insertions(+), 75 deletions(-)

 diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
 index cedfe20..5f38942 100644
 --- a/arch/x86/kernel/setup.c
 +++ b/arch/x86/kernel/setup.c
 @@ -486,59 +486,6 @@ static void __init 
 memblock_x86_reserve_range_setup_data(void)
  # define CRASH_ADDR_HIGH_MAX  SZ_64T
  #endif
  
 -static int __init reserve_crashkernel_low(void)
 -{
 -#ifdef CONFIG_X86_64
 -  unsigned long long base, low_base = 0, low_size = 0;
 -  unsigned long total_low_mem;
 -  int ret;
 -
 -  total_low_mem = memblock_mem_size(1UL << (32 - PAGE_SHIFT));
 -
 -  /* crashkernel=Y,low */
 -  ret = parse_crashkernel_low(boot_command_line, total_low_mem, 
 _size, );
 -  if (ret) {
 -  /*
 -   * two parts from kernel/dma/swiotlb.c:
 -   * -swiotlb size: user-specified with swiotlb= or default.
 -   *
 -   * -swiotlb overflow buffer: now hardcoded to 32k. We round it
 -   * to 8M for other buffers that may need to stay low too. Also
 -   * make sure we allocate enough extra low memory so that we
 -   * don't run out of DMA buffers for 32-bit devices.
 -   */
 -  low_size = max(swiotlb_size_or_default() + (8UL << 20), 256UL 
 << 20);
 -  } else {
 -  /* passed with crashkernel=0,low ? */
 -  if (!low_size)
 -  return 0;
 -  }
 -
 -  low_base = memblock_find_in_range(0, 1ULL << 32, low_size, CRASH_ALIGN);
 -  if (!low_base) {
 -  pr_err("Cannot reserve %ldMB crashkernel low memory, please try 
 smaller size.\n",
 - (unsigned long)(low_size >> 20));
 -  return -ENOMEM;
 -  }
 -
 -  ret = memblock_reserve(low_base, 

Re: [PATCH] x86/efi: update e820 about reserved EFI boot services data to fix kexec breakage

2019-12-30 Thread Dan Williams
On Mon, Dec 30, 2019 at 1:42 AM Dan Williams  wrote:
[..]
> I'll send a patch to fix up efi_fake_memmap().

For others following this thread, that patch is here:

http://lore.kernel.org/r/157773590338.4153451.5898675419563883883.st...@dwillia2-desk3.amr.corp.intel.com

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH] efi: Fix handling of multiple contiguous efi_fake_mem= entries

2019-12-30 Thread Dan Williams
A recent test of efi_fake_mem=4G@9G:0x4,4G@13G:0x4 crashed with
a signature of:

BUG: unable to handle page fault for address: ff281000
[..]
RIP: 0010:efi_memmap_insert+0x11d/0x191
[..]
Call Trace:
 ? bgrt_init+0xbe/0xbe
 ? efi_arch_mem_reserve+0x1cb/0x228
 ? acpi_parse_bgrt+0xa/0xd
 ? acpi_table_parse+0x86/0xb8
 ? acpi_boot_init+0x494/0x4e3
 ? acpi_parse_x2apic+0x87/0x87
 ? setup_acpi_sci+0xa2/0xa2
 ? setup_arch+0x8db/0x9e1
 ? start_kernel+0x6a/0x547
 ? secondary_startup_64+0xb6/0xc0

efi_memmap_insert() is attempting to insert entries past the end of the
new map. This condition is setup by efi_fake_mem() leaking empty entries
to the end of memory map, and then efi_arch_mem_reserve() trips over the
bad entry when attempting an additional efi_memmap_insert(). The empty
entry causes efi_memmap_insert() to attempt more memmap splits / copies
than efi_memmap_split_count() accounted for when sizing the new map.

Update efi_fake_memmap() to cleanup lagging empty entries.

This change is related to commit af1648984828 "x86/efi: Update e820 with
reserved EFI boot services data to fix kexec breakage" since that
introduces more occurrences where efi_memmap_insert() is invoked after
an efi_fake_mem= configuration has been parsed. Previously the side
effects of vestigial empty entries were benign, but with commit
af1648984828 that follow-on efi_memmap_insert() invocation triggers the
above crash signature.

Fixes: 0f96a99dab36 ("efi: Add 'efi_fake_mem' boot option")
Fixes: af1648984828 ("x86/efi: Update e820 with reserved EFI boot services...")
Cc: Taku Izumi 
Cc: Michael Weiser 
Cc: Dave Young 
Cc: Ard Biesheuvel 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Signed-off-by: Dan Williams 
---
 drivers/firmware/efi/fake_mem.c |   22 +-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/firmware/efi/fake_mem.c b/drivers/firmware/efi/fake_mem.c
index bb9fc70d0cfa..6df51ba93ae8 100644
--- a/drivers/firmware/efi/fake_mem.c
+++ b/drivers/firmware/efi/fake_mem.c
@@ -67,13 +67,33 @@ void __init efi_fake_memmap(void)
return;
}
 
+   memset(new_memmap, 0, efi.memmap.desc_size * new_nr_map);
for (i = 0; i < nr_fake_mem; i++)
efi_memmap_insert(, new_memmap, _fake_mems[i]);
 
+   /*
+* efi_memmap_split_count() may over count the number of
+* required splits in the case when contiguous fake entries are
+* specified. Check that all new_nr_map entries were consumed.
+*/
+   for (i = new_nr_map; i > 0; i--) {
+   efi_memory_desc_t *md;
+   u64 start, end;
+
+   md = new_memmap + efi.memmap.desc_size * (new_nr_map - i - 1);
+   end = md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT) - 1;
+   start = md->phys_addr;
+
+   if (start == 0 && end + 1 == 0)
+   continue;
+   break;
+   }
+
/* swap into new EFI memmap */
early_memunmap(new_memmap, efi.memmap.desc_size * new_nr_map);
 
-   efi_memmap_install(new_memmap_phy, new_nr_map);
+   /* install only the valid entries */
+   efi_memmap_install(new_memmap_phy, i);
 
/* print new EFI memmap */
efi_print_memmap();


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] x86/efi: update e820 about reserved EFI boot services data to fix kexec breakage

2019-12-30 Thread Dave Young
On 12/30/19 at 01:42am, Dan Williams wrote:
> On Sat, Dec 28, 2019 at 10:13 PM Dan Williams  
> wrote:
> >
> > On Sat, Dec 28, 2019 at 12:54 PM Dan Williams
> >  wrote:
> > >
> > > On Tue, Dec 3, 2019 at 11:53 PM Dave Young  wrote:
> > > >
> > > > Michael Weiser reported he got below error during a kexec rebooting:
> > > > esrt: Unsupported ESRT version 2904149718861218184.
> > > >
> > > > The ESRT memory stays in EFI boot services data, and it was reserved
> > > > in kernel via efi_mem_reserve().  The initial purpose of the reservation
> > > > is to reuse the EFI boot services data across kexec reboot. For example
> > > > the BGRT image data and some ESRT memory like Michael reported.
> > > >
> > > > But although the memory is reserved it is not updated in X86 e820 table.
> > > > And kexec_file_load iterate system ram in io resource list to find 
> > > > places
> > > > for kernel, initramfs and other stuff. In Michael's case the kexec 
> > > > loaded
> > > > initramfs overwritten the ESRT memory and then the failure happened.
> > > >
> > > > Since kexec_file_load depends on the e820 to be updated, just fix this
> > > > by updating the reserved EFI boot services memory as reserved type in 
> > > > e820.
> > > >
> > > > Originally any memory descriptors with EFI_MEMORY_RUNTIME attribute are
> > > > bypassed in the reservation code path because they are assumed as 
> > > > reserved.
> > > > But the reservation is still needed for multiple kexec reboot.
> > > > And it is the only possible case we come here thus just drop the code
> > > > chunk then everything works without side effects.
> > > >
> > > > On my machine the ESRT memory sits in an EFI runtime data range, it does
> > > > not trigger the problem, but I successfully tested with BGRT instead.
> > > > both kexec_load and kexec_file_load work and kdump works as well.
> > > >
> > > > Signed-off-by: Dave Young 
> > > > ---
> > > >  arch/x86/platform/efi/quirks.c |6 ++
> > > >  1 file changed, 2 insertions(+), 4 deletions(-)
> > > >
> > > > --- linux-x86.orig/arch/x86/platform/efi/quirks.c
> > > > +++ linux-x86/arch/x86/platform/efi/quirks.c
> > > > @@ -260,10 +260,6 @@ void __init efi_arch_mem_reserve(phys_ad
> > > > return;
> > > > }
> > > >
> > > > -   /* No need to reserve regions that will never be freed. */
> > > > -   if (md.attribute & EFI_MEMORY_RUNTIME)
> > > > -   return;
> > > > -
> > > > size += addr % EFI_PAGE_SIZE;
> > > > size = round_up(size, EFI_PAGE_SIZE);
> > > > addr = round_down(addr, EFI_PAGE_SIZE);
> > > > @@ -293,6 +289,8 @@ void __init efi_arch_mem_reserve(phys_ad
> > > > early_memunmap(new, new_size);
> > > >
> > > > efi_memmap_install(new_phys, num_entries);
> > > > +   e820__range_update(addr, size, E820_TYPE_RAM, 
> > > > E820_TYPE_RESERVED);
> > > > +   e820__update_table(e820_table);
> > > >  }
> > > >
> > > >  /*
> > > >
> > >
> > > Bisect says this change (commit af1648984828) is triggering a
> > > regression, likely not urgent, in my testing of the new efi_fake_mem=
> > > facility to allow memory to be marked "soft reserved" via the kernel
> > > command line (commit 199c84717612 x86/efi: Add efi_fake_mem support
> > > for EFI_MEMORY_SP). The following command line triggers the crash
> > > signature below:
> > >
> > > efi_fake_mem=4G@9G:0x4,4G@13G:0x4
> > >
> > > However, this command line works ok:
> > >
> > > efi_fake_mem=8G@9G:0x4
> > >
> > > So, something about multiple efi_fake_mem statements interacts badly
> > > with this change. Nothing obvious occurs to me at the moment, I'll
> > > keep debugging, but wanted to highlight this in the meantime in case
> > > someone else sees a deeper issue or the root cause.
> >
> > Still looking, but this failure does not seem to be specific to the
> > "soft reservation" changes. Any update to the efi memmap that pushes
> > it over a page boundary triggers this failure. I.e. I can fix the
> > problem by over-allocating the efi memmap and then page aligning the
> > result. __early_ioremap "should" be handling this case, but it appears
> > something else is messing this up.
> 
> Found it. Neither this patch nor the soft reservation changes are at
> fault, they are just helping to trigger a long standing bug in
> efi_fake_memmap(). Its usage of efi_memmap_split_count() can over
> count the number of splits needed for new entries. Consider the case
> of 2 contiguous fake entries intersecting the end of a single entry.
> The first call to efi_memmap_split_count() determines the resulting
> split will be (old1, new1, old2), the second call determines (old1,
> new2). The result is 2 splits when only 1 is needed to get a result of
> (old1, new1, new2) and the new map ends up with an empty entry.
> efi_memmap_install() interprets an empty entry as start = 0 end =
> 0x and attempts an extra split / copy past the end of
> the new map.
> 
> I'll send a patch 

Re: [PATCH] makedumpfile/s390: Add get_kaslr_offset() for s390x

2019-12-30 Thread lijiang
> Hi Lianbo,
> 
>> -Original Message-
>> 在 2019年12月26日 11:38, lijiang 写道:
>>> Hi, Kazu and Mikhail,
>>>
 Hi Mikhail,

> -Original Message-
> Hi,
>
> On 12.12.2019 17:12, Kazuhito Hagio wrote:
>> Hi Mikhail,
>>
>>> -Original Message-
>>> Hello Kazu,
>>>
>>> I think we can try to generalize the kaslr offset extraction.
>>> I won't speak for other architectures, but for s390 that 
>>> get_kaslr_offset_arm64()
>>> should work fine. The only concern of mine is this TODO statement:
>>>
>>> if (_text <= vaddr && vaddr <= _end) {
>>> DEBUG_MSG("info->kaslr_offset: %lx\n", info->kaslr_offset);
>>> return info->kaslr_offset;
>>> } else {
>>> /*
>>> * TODO: we need to check if it is vmalloc/vmmemmap/module
>>> * address, we will have different offset
>>> */
>>> return 0;
>>> }
>>>
>>> Could you explain this one?
>>
>> Probably it was considered that the check would be needed to support
>> the whole KASLR behavior when get_kaslr_offset_x86_64() was written
>> originally.
>>
>> But in the current makedumpfile for x86_64 and arm64 supporting KASLR,
>> the offset we need is the one for symbol addresses in vmlinux only.
>> As I said below, module symbol addresses are retrieved from vmcore.
>> Other addresses should not be passed to the function for now, as far
>> as I know.
>>
>> So I think the TODO comment is confusing, and it would be better to
>> remove it or change it to something like:
>> /*
>>  * Returns 0 if vaddr does not need the offset to be added,
>>  * e.g. for module address.
>>  */
>>
>> But if s390 uses get_kaslr_offset() in its arch-specific code to
>> adjust addresses other than kernel text address, we might need to
>> modify it for s390, not generalize it.
>
> Currently, s390 doesn't use get_kaslr_offset() in its arch-specific
> code.

 OK, I pushed a patch that generalizes it to my test repository.
 Could you enable s390 to use it and test?
 https://github.com/k-hagio/makedumpfile/tree/add-get_kaslr_offset_general

>>>
>>> I enabled it on s390 as below and tested, it worked.
> 
> Thank you for testing.
> 
It's my pleasure. 

>>>
>>> @@ -1075,7 +1075,7 @@ int is_iomem_phys_addr_s390x(unsigned long addr);
>>>  #define get_phys_base()stub_true()
>>>  #define get_machdep_info() get_machdep_info_s390x()
>>>  #define get_versiondep_info()  stub_true()
>>> -#define get_kaslr_offset(X)stub_false()
>>> +#define get_kaslr_offset(X)get_kaslr_offset_general(X)
>>>  #define vaddr_to_paddr(X)  vaddr_to_paddr_s390x(X)
>>>
>>> But, there is still a problem that needs to be improved. In the 
>>> find_kaslr_offsets(),
>>> the value of SYMBOL(_stext) is always 0(zero) and it is passed to the 
>>> get_kaslr_offset().
>>> For the following code in the get_kaslr_offset_general(), it does not work 
>>> as expected.
>>> ...
>>> if (_text <= vaddr && vaddr <= _end)
>>> return info->kaslr_offset;
>>> else
>>> return 0;
> 
> I don't know why the SYMBOL(_stext) is passed to the get_kaslr_offset() 
> there, but
> since the return value of get_kaslr_offset() is not used in the 
> find_kaslr_offsets(),
> it's meaningless and not harmful. So it is not worth doing 
> READ_SYMBOL(_stext) there
> for now.
> 
Sounds good.

>>
>> In addition, the above code confused me, it will always return 0 on 
>> s390(please refer to my logs).
> 
> The aim of get_kaslr_offset() here is only setting info->kaslr_offset to the 
> value
> from vmcoreinfo for the SYMBOL_INIT() macro.
> (get_kaslr_offset() returns the kaslr offset in the resolve_config_entry().)
> 
Thanks for your explanation, Kazu.

> But yeah, the get_kaslr_offset(SYMBOL(_stext)) is confusing and not good.
> Passing 0 might be a bit better..?
> 
Yes, looks good to me.

Lianbo
> Thanks,
> Kazu
> 
>>
>> Thanks.
>>
>>> ...
>>> Here is my log:
>>> get_kaslr_offset_general: info->kaslr_offset: 67ebc000, _text:10, 
>>> _end:10ba000, vaddr:0
>>>
>>> After applied the following patch, got the expected result.
>>>  int
>>>  find_kaslr_offsets()
>>>  {
>>> @@ -3973,6 +4042,11 @@ find_kaslr_offsets()
>>>  * called this function between open_vmcoreinfo() and
>>>  * close_vmcoreinfo()
>>>  */
>>> +   READ_SYMBOL("_stext", _stext);
>>> +   if (SYMBOL(_stext) == NOT_FOUND_SYMBOL) {
>>> +ERRMSG("Can't get the symbol of _stext.\n");
>>> +goto out;
>>> +   }
>>> get_kaslr_offset(SYMBOL(_stext));
>>>
>>> Here is my log:
>>> get_kaslr_offset_general: info->kaslr_offset: 67ebc000, _text:10, 
>>> _end:10ba000, vaddr:67fbc000
>>>
>>> Basically, before using the value of SYMBOL(_stext), need to ensure that 
>>> the SYMBOL(_stext) is parsed
>>> correctly.

Re: [PATCH] x86/efi: update e820 about reserved EFI boot services data to fix kexec breakage

2019-12-30 Thread Dan Williams
On Sat, Dec 28, 2019 at 10:13 PM Dan Williams  wrote:
>
> On Sat, Dec 28, 2019 at 12:54 PM Dan Williams
>  wrote:
> >
> > On Tue, Dec 3, 2019 at 11:53 PM Dave Young  wrote:
> > >
> > > Michael Weiser reported he got below error during a kexec rebooting:
> > > esrt: Unsupported ESRT version 2904149718861218184.
> > >
> > > The ESRT memory stays in EFI boot services data, and it was reserved
> > > in kernel via efi_mem_reserve().  The initial purpose of the reservation
> > > is to reuse the EFI boot services data across kexec reboot. For example
> > > the BGRT image data and some ESRT memory like Michael reported.
> > >
> > > But although the memory is reserved it is not updated in X86 e820 table.
> > > And kexec_file_load iterate system ram in io resource list to find places
> > > for kernel, initramfs and other stuff. In Michael's case the kexec loaded
> > > initramfs overwritten the ESRT memory and then the failure happened.
> > >
> > > Since kexec_file_load depends on the e820 to be updated, just fix this
> > > by updating the reserved EFI boot services memory as reserved type in 
> > > e820.
> > >
> > > Originally any memory descriptors with EFI_MEMORY_RUNTIME attribute are
> > > bypassed in the reservation code path because they are assumed as 
> > > reserved.
> > > But the reservation is still needed for multiple kexec reboot.
> > > And it is the only possible case we come here thus just drop the code
> > > chunk then everything works without side effects.
> > >
> > > On my machine the ESRT memory sits in an EFI runtime data range, it does
> > > not trigger the problem, but I successfully tested with BGRT instead.
> > > both kexec_load and kexec_file_load work and kdump works as well.
> > >
> > > Signed-off-by: Dave Young 
> > > ---
> > >  arch/x86/platform/efi/quirks.c |6 ++
> > >  1 file changed, 2 insertions(+), 4 deletions(-)
> > >
> > > --- linux-x86.orig/arch/x86/platform/efi/quirks.c
> > > +++ linux-x86/arch/x86/platform/efi/quirks.c
> > > @@ -260,10 +260,6 @@ void __init efi_arch_mem_reserve(phys_ad
> > > return;
> > > }
> > >
> > > -   /* No need to reserve regions that will never be freed. */
> > > -   if (md.attribute & EFI_MEMORY_RUNTIME)
> > > -   return;
> > > -
> > > size += addr % EFI_PAGE_SIZE;
> > > size = round_up(size, EFI_PAGE_SIZE);
> > > addr = round_down(addr, EFI_PAGE_SIZE);
> > > @@ -293,6 +289,8 @@ void __init efi_arch_mem_reserve(phys_ad
> > > early_memunmap(new, new_size);
> > >
> > > efi_memmap_install(new_phys, num_entries);
> > > +   e820__range_update(addr, size, E820_TYPE_RAM, E820_TYPE_RESERVED);
> > > +   e820__update_table(e820_table);
> > >  }
> > >
> > >  /*
> > >
> >
> > Bisect says this change (commit af1648984828) is triggering a
> > regression, likely not urgent, in my testing of the new efi_fake_mem=
> > facility to allow memory to be marked "soft reserved" via the kernel
> > command line (commit 199c84717612 x86/efi: Add efi_fake_mem support
> > for EFI_MEMORY_SP). The following command line triggers the crash
> > signature below:
> >
> > efi_fake_mem=4G@9G:0x4,4G@13G:0x4
> >
> > However, this command line works ok:
> >
> > efi_fake_mem=8G@9G:0x4
> >
> > So, something about multiple efi_fake_mem statements interacts badly
> > with this change. Nothing obvious occurs to me at the moment, I'll
> > keep debugging, but wanted to highlight this in the meantime in case
> > someone else sees a deeper issue or the root cause.
>
> Still looking, but this failure does not seem to be specific to the
> "soft reservation" changes. Any update to the efi memmap that pushes
> it over a page boundary triggers this failure. I.e. I can fix the
> problem by over-allocating the efi memmap and then page aligning the
> result. __early_ioremap "should" be handling this case, but it appears
> something else is messing this up.

Found it. Neither this patch nor the soft reservation changes are at
fault, they are just helping to trigger a long standing bug in
efi_fake_memmap(). Its usage of efi_memmap_split_count() can over
count the number of splits needed for new entries. Consider the case
of 2 contiguous fake entries intersecting the end of a single entry.
The first call to efi_memmap_split_count() determines the resulting
split will be (old1, new1, old2), the second call determines (old1,
new2). The result is 2 splits when only 1 is needed to get a result of
(old1, new1, new2) and the new map ends up with an empty entry.
efi_memmap_install() interprets an empty entry as start = 0 end =
0x and attempts an extra split / copy past the end of
the new map.

I'll send a patch to fix up efi_fake_memmap().

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec