Re: [PATCH v7 1/3] efi/x86: Fix EFI memory map corruption with kexec

2024-06-05 Thread Dave Young
On Wed, 5 Jun 2024 at 19:09, Borislav Petkov  wrote:
>
> Moving Ard and Dan to To:
>
> On Wed, Jun 05, 2024 at 10:28:18AM +0800, Dave Young wrote:
> > Ok, thanks!  I think the right way is creating two patches,  one to
> > remove the __efi_memmap_free,
>
> Yap, that
>
>   f0ef6523475f ("efi: Fix efi_memmap_alloc() leaks")
>
> needs revisiting.
>
> So AFAIU, the flow is this:
>
> In a kexec-ed kernel:
>
> 1. efi_arch_mem_reserve() gets called by bgrt, erst, mokvar... whatever
>to hold on to boot services regions for longer otherwise EFI
>"implementations" explode.
>
> 2. On same kexec-ed kernel, we call into kexec_enter_virtual_mode()
>because it needs to get the runtime services regions from the first
>kernel
>
> 3. As part of that call, it'll do
>efi_memmap_init_late->__efi_memmap_init():
>
> if (efi.memmap.flags & (EFI_MEMMAP_MEMBLOCK | EFI_MEMMAP_SLAB))
> __efi_memmap_free(efi.memmap.phys_map,
>
> and the memory which got allocated in step 1 is gone, thus reverting
> what efi_arch_mem_reserve() is trying to fix.
>
> IOW, we need a
>
> EFI_MEMMAP_DO_NOT_TOUCH_MY_MEMORY
>
> flag which'll stop this from happening. But I'd prefer it if Ard decides
> what the right thing to do here is.
>
> > another is  skip efi_arch_mem_reserve when the EFI_MEMORY_RUNTIME bit
> > was set already.
>
> Can that even happen?

Yes, let's say we have two different cases both go through
drivers/firmware/efi/efi-bgrt.c -> efi_mem_reserve ->
efi_arch_mem_reserve
1. normal boot (non kexec-ed)
The bgrt region is reserved and mark as EFI_MEMORY_RUNTIME with a
new efi mem range which is inserted in the memmap, later kexec will
carry over to 2nd kernel (drop those boot service areas without
EFI_MEMORY_RUNTIME)
2. kexec-ed boot
 In the same call path, the previous kernel saved bgrt region has
already set EFI_MEMORY_RUNTIME, but it is re-reserved with a new mem
entry in memmap, this is not necessary and duplicate.   I did not
check the efi boot code if it will de-duplicate the memmap later, but
anyway this is useless and it should be skipped.

Thanks
Dave


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v7 1/3] efi/x86: Fix EFI memory map corruption with kexec

2024-06-05 Thread Borislav Petkov
On Wed, Jun 05, 2024 at 10:17:22AM +0200, Ard Biesheuvel wrote:
> I'd argue for the opposite: ideally, the difference between the first
> boot and not-the-first-boot should be abstracted away by the
> 'bootloader' side of kexec as much as possible, so that the tricky
> early startup code doesn't have to be riddled with different code
> paths depending on !kexec vs kexec.

Well, off and on we end up needing to be able to ask whether the current
kernel is kexec-ed. So you need to be able to access that aspect in
kernel code - not in the bootloader. Perhaps read it from the
bootloader, sure.

But see my other mail from just now - it might end up not needing it
after all and I'd prefer if we never ever have to ask that question but
just from staring at EFI code it reminded me that we do need to ask that
question already:

if (efi_setup)
kexec_enter_virtual_mode();
else
__efi_enter_virtual_mode();

*exactly* because of EFI and that virtual_map call nonsense of allowing
it only once.

And we check efi_setup here because that works. But you can't use that
globally. And so on...

> TDX is a good case in point here: rather than add more conditionals,
> I'd urge to remove them so the TDX startup code doesn't have to care
> about the difference at all. If there is anything special that needs
> to be done, it belongs in the kexec implementation of the previous
> kernel.

Sure, but reality is not as easy sometimes.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v7 1/3] efi/x86: Fix EFI memory map corruption with kexec

2024-06-05 Thread Borislav Petkov
Moving Ard and Dan to To:

On Wed, Jun 05, 2024 at 10:28:18AM +0800, Dave Young wrote:
> Ok, thanks!  I think the right way is creating two patches,  one to
> remove the __efi_memmap_free,

Yap, that 

  f0ef6523475f ("efi: Fix efi_memmap_alloc() leaks")

needs revisiting.

So AFAIU, the flow is this:

In a kexec-ed kernel:

1. efi_arch_mem_reserve() gets called by bgrt, erst, mokvar... whatever
   to hold on to boot services regions for longer otherwise EFI
   "implementations" explode.

2. On same kexec-ed kernel, we call into kexec_enter_virtual_mode()
   because it needs to get the runtime services regions from the first
   kernel

3. As part of that call, it'll do
   efi_memmap_init_late->__efi_memmap_init():

if (efi.memmap.flags & (EFI_MEMMAP_MEMBLOCK | EFI_MEMMAP_SLAB))
__efi_memmap_free(efi.memmap.phys_map,

and the memory which got allocated in step 1 is gone, thus reverting
what efi_arch_mem_reserve() is trying to fix.

IOW, we need a

EFI_MEMMAP_DO_NOT_TOUCH_MY_MEMORY

flag which'll stop this from happening. But I'd prefer it if Ard decides
what the right thing to do here is.

> another is  skip efi_arch_mem_reserve when the EFI_MEMORY_RUNTIME bit
> was set already.

Can that even happen?

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v7 1/3] efi/x86: Fix EFI memory map corruption with kexec

2024-06-05 Thread Ard Biesheuvel
On Wed, 5 Jun 2024 at 09:43, Borislav Petkov  wrote:
>
> On Wed, Jun 05, 2024 at 10:53:44AM +0800, Dave Young wrote:
> > It's something good to have but not must for the time being,  also no
> > idea how to save the status across boot, for EFI boot case probably a
> > EFI var can be used;
>
> Yes.
>
> > but how can it be cleared in case of physical boot.  Otherwise
> > probably injecting some kernel parameters, anyway this needs more
> > thinking.
>
> Yeah, this'll need proper analysis whether we can even do that reliably.
>
> We need to increment it only on the kexec reboot paths and clear it on
> the normal reboot paths.
>

I'd argue for the opposite: ideally, the difference between the first
boot and not-the-first-boot should be abstracted away by the
'bootloader' side of kexec as much as possible, so that the tricky
early startup code doesn't have to be riddled with different code
paths depending on !kexec vs kexec.

TDX is a good case in point here: rather than add more conditionals,
I'd urge to remove them so the TDX startup code doesn't have to care
about the difference at all. If there is anything special that needs
to be done, it belongs in the kexec implementation of the previous
kernel.

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v7 1/3] efi/x86: Fix EFI memory map corruption with kexec

2024-06-05 Thread Borislav Petkov
On Wed, Jun 05, 2024 at 10:53:44AM +0800, Dave Young wrote:
> It's something good to have but not must for the time being,  also no
> idea how to save the status across boot, for EFI boot case probably a
> EFI var can be used;

Yes.

> but how can it be cleared in case of physical boot.  Otherwise
> probably injecting some kernel parameters, anyway this needs more
> thinking.

Yeah, this'll need proper analysis whether we can even do that reliably.

We need to increment it only on the kexec reboot paths and clear it on
the normal reboot paths.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v7 1/3] efi/x86: Fix EFI memory map corruption with kexec

2024-06-04 Thread Dave Young
On Wed, 5 Jun 2024 at 02:03, Borislav Petkov  wrote:
>
> On Tue, Jun 04, 2024 at 07:09:56PM +0800, Dave Young wrote:
> > Anyway there is not such a helper for all cases.
>
> But maybe there should be...
>
> This is not the first case where the need arises to be able to say:
>
> if (am I a kexeced kernel)
>
> in code.
>
> Perhaps we should have a global var kexeced or so which gets incremented
> on each kexec-ed kernel, somewhere in very early boot of the kexec-ed
> kernel we do
>
> kexeced++;
>
> and then other code can query it and know whether this is a kexec-ed
> kernel and how many times it got kexec-ed...

It's something good to have but not must for the time being,  also no
idea how to save the status across boot, for EFI boot case probably a
EFI var can be used, but how can it be cleared in case of physical
boot.Otherwise probably injecting some kernel parameters, anyway
this needs more thinking.

>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
>


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v7 1/3] efi/x86: Fix EFI memory map corruption with kexec

2024-06-04 Thread Dave Young
On Wed, 5 Jun 2024 at 10:09, Kalra, Ashish  wrote:
>
> Hello Dave,
>
> On 6/4/2024 8:58 PM, Dave Young wrote:
> > On Wed, 5 Jun 2024 at 09:52, Dave Young  wrote:
>  ...
>  if (efi.memmap.flags & (EFI_MEMMAP_MEMBLOCK | EFI_MEMMAP_SLAB)) {
>  __efi_memmap_free(efi.memmap.phys_map,
>  efi.memmap.desc_size * 
>  efi.memmap.nr_map, efi.memmap.flags);
>  }
> >>> From your debugging the memmap should not be freed.  This piece of
> >>> code was added in below commit,  added Dan Williams in cc list:
> >>> commit f0ef6523475f18ccd213e22ee593dfd131a2c5ea
> >>> Author: Dan Williams 
> >>> Date:   Mon Jan 13 18:22:44 2020 +0100
> >>>
> >>> efi: Fix efi_memmap_alloc() leaks
> >>>
> >>> With efi_fake_memmap() and efi_arch_mem_reserve() the efi table may be
> >>> updated and replaced multiple times. When that happens a previous
> >>> dynamically allocated efi memory map can be garbage collected. Use the
> >>> new EFI_MEMMAP_{SLAB,MEMBLOCK} flags to detect when a dynamically
> >>> allocated memory map is being replaced.
> >>>
> >> Dan, probably those regions should be freed only for "fake" memmap?
> > Ashish, can you comment out the __efi_memmap_free see if it works for
> > you just confirm about the behavior.
>
> Yes, i have already tried and tested that, if i avoid __efi_memmap_free(), 
> then i don't see this memory map corruption.

Ok, thanks!  I think the right way is creating two patches,  one to
remove the __efi_memmap_free, another is  skip efi_arch_mem_reserve
when the EFI_MEMORY_RUNTIME bit was set already.  But the first one
should be the fix for the root cause.

efi fake mem is only for debugging purposes,  the "memleak" mentioned
in commit 0f96a99dab36 should be solved in another way if needed (are
they really leaked? or just not useful anymore)

Anyway this is my opinion, please wait for x86 and efi reviewer's inputs.

>
> Thanks, Ashish
>


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v7 1/3] efi/x86: Fix EFI memory map corruption with kexec

2024-06-04 Thread Kalra, Ashish
On 6/4/2024 8:48 PM, Dave Young wrote:

> On Wed, 5 Jun 2024 at 06:36, Kalra, Ashish  wrote:
>> Re-sending as the earlier response got line-wrapped.
>>
>> On 6/3/2024 12:12 PM, Borislav Petkov wrote:
>>> On Mon, Jun 03, 2024 at 12:08:48PM -0500, Kalra, Ashish wrote:
 efi_arch_mem_reserve().
>>> Now it only remains for you to explain why...
>> Here is a detailed explanation of what is causing the EFI memory map 
>> corruption, with added debug logs and memblock debugging enabled:
>>
>> Initially at boot, efi_memblock_x86_reserve_range() does early_memremap() of 
>> the EFI memory map passed as part of setup_data, as the following logs show:
>>
>> ...
>>
>> [ 0.00] efi: in efi_memblock_x86_reserve_range, phys map 0x27fff9110
>> [ 0.00] memblock_reserve: [0x00027fff9110-0x00027fffa12f] 
>> efi_memblock_x86_reserve_range+0x168/0x2a0
>>
>> ...
>>
>> Later, efi_arch_mem_reserve() is invoked, which calls efi_memmap_alloc() 
>> which does memblock_phys_alloc() to insert new EFI memory descriptor into 
>> efi.memap:
>>
>> ...
>>
>> [ 0.733263] memblock_reserve: [0x00027ffcaf80-0x00027ffcbfff] 
>> memblock_alloc_range_nid+0xf1/0x1b0
>> [ 0.734787] efi: efi_arch_mem_reserve, efi phys map 0x27ffcaf80
>>
>> ...
>>
>> Finally, at the end of boot, kexec_enter_virtual_mode() is called.
>>
>> It does mapping of efi regions which were passed via setup_data.
>>
>> So it unregisters the early mem-remapped EFI memmap and installs the new EFI 
>> memory map as below:
>>
>> ( Because of efi_arch_mem_reserve() getting invoked, the new EFI memmap phys 
>> base being remapped now is the memblock allocation done in 
>> efi_arch_mem_reserve()).
>>
>> [ 4.042160] efi: efi memmap phys map 0x27ffcaf80
>>
>> So, kexec_enter_virtual_mode() does the following :
>>
>> if (efi_memmap_init_late(efi.memmap.phys_map, <- refers to the new 
>> EFI memmap phys base allocated via memblock in efi_arch_mem_reserve().
>> efi.memmap.desc_size * efi.memmap.nr_map)) { ...
>>
>> This late init, does a memremap() on this memblock-allocated memory, but 
>> then immediately frees it :
>>
>> drivers/firmware/efi/memmap.c:
>>
>> int __init __efi_memmap_init(struct efi_memory_map_data *data)
>> {
>>
>> ..
>>
>> phys_map = data->phys_map; <- refers to the new EFI memmap phys base 
>> allocated via memblock in efi_arch_mem_reserve().
>>
>> if (data->flags & EFI_MEMMAP_LATE)
>> map.map = memremap(phys_map, data->size, MEMREMAP_WB);
>> ...
>> ...
>> if (efi.memmap.flags & (EFI_MEMMAP_MEMBLOCK | EFI_MEMMAP_SLAB)) {
>> __efi_memmap_free(efi.memmap.phys_map,
>> efi.memmap.desc_size * efi.memmap.nr_map, 
>> efi.memmap.flags);
>> }
> From your debugging the memmap should not be freed.  

Yes, it looks like that it should not be freed, as the new and previous efi 
memory map can be same.

Thanks, Ashish

> This piece of
> code was added in below commit,  added Dan Williams in cc list:
> commit f0ef6523475f18ccd213e22ee593dfd131a2c5ea
> Author: Dan Williams 
> Date:   Mon Jan 13 18:22:44 2020 +0100
>
> efi: Fix efi_memmap_alloc() leaks
>
> With efi_fake_memmap() and efi_arch_mem_reserve() the efi table may be
> updated and replaced multiple times. When that happens a previous
> dynamically allocated efi memory map can be garbage collected. Use the
> new EFI_MEMMAP_{SLAB,MEMBLOCK} flags to detect when a dynamically
> allocated memory map is being replaced.
>
>
>> ...
>> map.phys_map = data->phys_map;
>>
>> ...
>>
>> efi.memmap = map;
>>
>> ...
>>
>> This happens as kexec_enter_virtual_mode() can only handle the early mapped 
>> EFI memmap and not the one which is memblock allocated by 
>> efi_arch_mem_reserve(). As seen above this memblock allocated 
>> (EFI_MEMMAP_MEMBLOCK tagged) memory gets freed.
>>
>> This is confirmed by memblock debugging:
>>
>> [ 4.044057] memblock_free_late: [0x00027ffcaf80-0x00027ffcbfff] 
>> __efi_memmap_free+0x66/0x80
>>
>> So while this memory is memremapped, it has also been freed and then it gets 
>> into a use-after-free condition and subsequently gets corrupted.
>>
>> This corruption is seen just before kexec-ing into the new kernel:
>>
>> ...
>> [   11.045522] PEFILE: Unsigned PE binary^M
>> [   11.060801] kexec-bzImage64: efi memmap phys map 0x27ffcaf80^M
>> ...
>> [   11.061220] kexec-bzImage64: mmap entry, type = 11, va = 
>> 0xfffeffc0, pa = 0xffc0, np = 0x400, attr = 0x8001^M
>> [   11.061225] kexec-bzImage64: mmap entry, type = 6, va = 
>> 0xfffeffb04000, pa = 0x7f704000, np = 0x84, attr = 0x800f^M
>> [   11.061228] kexec-bzImage64: mmap entry, type = 4, va = 
>> 0xfffeff70, pa = 0x7f10, np = 0x300, attr = 0x0^M
>> [   11.061231] kexec-bzImage64: mmap entry, type = 0, va = 0x0, pa = 0x0, np 
>> = 0x0, attr = 0x0^M <- CORRUPTION!!!
>> [   

Re: [PATCH v7 1/3] efi/x86: Fix EFI memory map corruption with kexec

2024-06-04 Thread Kalra, Ashish
Hello Dave,

On 6/4/2024 8:58 PM, Dave Young wrote:
> On Wed, 5 Jun 2024 at 09:52, Dave Young  wrote:
 ...
 if (efi.memmap.flags & (EFI_MEMMAP_MEMBLOCK | EFI_MEMMAP_SLAB)) {
 __efi_memmap_free(efi.memmap.phys_map,
 efi.memmap.desc_size * efi.memmap.nr_map, 
 efi.memmap.flags);
 }
>>> From your debugging the memmap should not be freed.  This piece of
>>> code was added in below commit,  added Dan Williams in cc list:
>>> commit f0ef6523475f18ccd213e22ee593dfd131a2c5ea
>>> Author: Dan Williams 
>>> Date:   Mon Jan 13 18:22:44 2020 +0100
>>>
>>> efi: Fix efi_memmap_alloc() leaks
>>>
>>> With efi_fake_memmap() and efi_arch_mem_reserve() the efi table may be
>>> updated and replaced multiple times. When that happens a previous
>>> dynamically allocated efi memory map can be garbage collected. Use the
>>> new EFI_MEMMAP_{SLAB,MEMBLOCK} flags to detect when a dynamically
>>> allocated memory map is being replaced.
>>>
>> Dan, probably those regions should be freed only for "fake" memmap?
> Ashish, can you comment out the __efi_memmap_free see if it works for
> you just confirm about the behavior.

Yes, i have already tried and tested that, if i avoid __efi_memmap_free(), then 
i don't see this memory map corruption.

Thanks, Ashish


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v7 1/3] efi/x86: Fix EFI memory map corruption with kexec

2024-06-04 Thread Dave Young
On Wed, 5 Jun 2024 at 09:52, Dave Young  wrote:
>
> > > ...
> > > if (efi.memmap.flags & (EFI_MEMMAP_MEMBLOCK | EFI_MEMMAP_SLAB)) {
> > > __efi_memmap_free(efi.memmap.phys_map,
> > > efi.memmap.desc_size * efi.memmap.nr_map, 
> > > efi.memmap.flags);
> > > }
> >
> > From your debugging the memmap should not be freed.  This piece of
> > code was added in below commit,  added Dan Williams in cc list:
> > commit f0ef6523475f18ccd213e22ee593dfd131a2c5ea
> > Author: Dan Williams 
> > Date:   Mon Jan 13 18:22:44 2020 +0100
> >
> > efi: Fix efi_memmap_alloc() leaks
> >
> > With efi_fake_memmap() and efi_arch_mem_reserve() the efi table may be
> > updated and replaced multiple times. When that happens a previous
> > dynamically allocated efi memory map can be garbage collected. Use the
> > new EFI_MEMMAP_{SLAB,MEMBLOCK} flags to detect when a dynamically
> > allocated memory map is being replaced.
> >
>
> Dan, probably those regions should be freed only for "fake" memmap?

Ashish, can you comment out the __efi_memmap_free see if it works for
you just confirm about the behavior.


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v7 1/3] efi/x86: Fix EFI memory map corruption with kexec

2024-06-04 Thread Dave Young
> > ...
> > if (efi.memmap.flags & (EFI_MEMMAP_MEMBLOCK | EFI_MEMMAP_SLAB)) {
> > __efi_memmap_free(efi.memmap.phys_map,
> > efi.memmap.desc_size * efi.memmap.nr_map, 
> > efi.memmap.flags);
> > }
>
> From your debugging the memmap should not be freed.  This piece of
> code was added in below commit,  added Dan Williams in cc list:
> commit f0ef6523475f18ccd213e22ee593dfd131a2c5ea
> Author: Dan Williams 
> Date:   Mon Jan 13 18:22:44 2020 +0100
>
> efi: Fix efi_memmap_alloc() leaks
>
> With efi_fake_memmap() and efi_arch_mem_reserve() the efi table may be
> updated and replaced multiple times. When that happens a previous
> dynamically allocated efi memory map can be garbage collected. Use the
> new EFI_MEMMAP_{SLAB,MEMBLOCK} flags to detect when a dynamically
> allocated memory map is being replaced.
>

Dan, probably those regions should be freed only for "fake" memmap?


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v7 1/3] efi/x86: Fix EFI memory map corruption with kexec

2024-06-04 Thread Dave Young
On Wed, 5 Jun 2024 at 06:36, Kalra, Ashish  wrote:
>
> Re-sending as the earlier response got line-wrapped.
>
> On 6/3/2024 12:12 PM, Borislav Petkov wrote:
> > On Mon, Jun 03, 2024 at 12:08:48PM -0500, Kalra, Ashish wrote:
> >> efi_arch_mem_reserve().
> > Now it only remains for you to explain why...
>
> Here is a detailed explanation of what is causing the EFI memory map 
> corruption, with added debug logs and memblock debugging enabled:
>
> Initially at boot, efi_memblock_x86_reserve_range() does early_memremap() of 
> the EFI memory map passed as part of setup_data, as the following logs show:
>
> ...
>
> [ 0.00] efi: in efi_memblock_x86_reserve_range, phys map 0x27fff9110
> [ 0.00] memblock_reserve: [0x00027fff9110-0x00027fffa12f] 
> efi_memblock_x86_reserve_range+0x168/0x2a0
>
> ...
>
> Later, efi_arch_mem_reserve() is invoked, which calls efi_memmap_alloc() 
> which does memblock_phys_alloc() to insert new EFI memory descriptor into 
> efi.memap:
>
> ...
>
> [ 0.733263] memblock_reserve: [0x00027ffcaf80-0x00027ffcbfff] 
> memblock_alloc_range_nid+0xf1/0x1b0
> [ 0.734787] efi: efi_arch_mem_reserve, efi phys map 0x27ffcaf80
>
> ...
>
> Finally, at the end of boot, kexec_enter_virtual_mode() is called.
>
> It does mapping of efi regions which were passed via setup_data.
>
> So it unregisters the early mem-remapped EFI memmap and installs the new EFI 
> memory map as below:
>
> ( Because of efi_arch_mem_reserve() getting invoked, the new EFI memmap phys 
> base being remapped now is the memblock allocation done in 
> efi_arch_mem_reserve()).
>
> [ 4.042160] efi: efi memmap phys map 0x27ffcaf80
>
> So, kexec_enter_virtual_mode() does the following :
>
> if (efi_memmap_init_late(efi.memmap.phys_map, <- refers to the new 
> EFI memmap phys base allocated via memblock in efi_arch_mem_reserve().
> efi.memmap.desc_size * efi.memmap.nr_map)) { ...
>
> This late init, does a memremap() on this memblock-allocated memory, but then 
> immediately frees it :
>
> drivers/firmware/efi/memmap.c:
>
> int __init __efi_memmap_init(struct efi_memory_map_data *data)
> {
>
> ..
>
> phys_map = data->phys_map; <- refers to the new EFI memmap phys base 
> allocated via memblock in efi_arch_mem_reserve().
>
> if (data->flags & EFI_MEMMAP_LATE)
> map.map = memremap(phys_map, data->size, MEMREMAP_WB);
> ...
> ...
> if (efi.memmap.flags & (EFI_MEMMAP_MEMBLOCK | EFI_MEMMAP_SLAB)) {
> __efi_memmap_free(efi.memmap.phys_map,
> efi.memmap.desc_size * efi.memmap.nr_map, 
> efi.memmap.flags);
> }

>From your debugging the memmap should not be freed.  This piece of
code was added in below commit,  added Dan Williams in cc list:
commit f0ef6523475f18ccd213e22ee593dfd131a2c5ea
Author: Dan Williams 
Date:   Mon Jan 13 18:22:44 2020 +0100

efi: Fix efi_memmap_alloc() leaks

With efi_fake_memmap() and efi_arch_mem_reserve() the efi table may be
updated and replaced multiple times. When that happens a previous
dynamically allocated efi memory map can be garbage collected. Use the
new EFI_MEMMAP_{SLAB,MEMBLOCK} flags to detect when a dynamically
allocated memory map is being replaced.


>
> ...
> map.phys_map = data->phys_map;
>
> ...
>
> efi.memmap = map;
>
> ...
>
> This happens as kexec_enter_virtual_mode() can only handle the early mapped 
> EFI memmap and not the one which is memblock allocated by 
> efi_arch_mem_reserve(). As seen above this memblock allocated 
> (EFI_MEMMAP_MEMBLOCK tagged) memory gets freed.
>
> This is confirmed by memblock debugging:
>
> [ 4.044057] memblock_free_late: [0x00027ffcaf80-0x00027ffcbfff] 
> __efi_memmap_free+0x66/0x80
>
> So while this memory is memremapped, it has also been freed and then it gets 
> into a use-after-free condition and subsequently gets corrupted.
>
> This corruption is seen just before kexec-ing into the new kernel:
>
> ...
> [   11.045522] PEFILE: Unsigned PE binary^M
> [   11.060801] kexec-bzImage64: efi memmap phys map 0x27ffcaf80^M
> ...
> [   11.061220] kexec-bzImage64: mmap entry, type = 11, va = 
> 0xfffeffc0, pa = 0xffc0, np = 0x400, attr = 0x8001^M
> [   11.061225] kexec-bzImage64: mmap entry, type = 6, va = 
> 0xfffeffb04000, pa = 0x7f704000, np = 0x84, attr = 0x800f^M
> [   11.061228] kexec-bzImage64: mmap entry, type = 4, va = 
> 0xfffeff70, pa = 0x7f10, np = 0x300, attr = 0x0^M
> [   11.061231] kexec-bzImage64: mmap entry, type = 0, va = 0x0, pa = 0x0, np 
> = 0x0, attr = 0x0^M <- CORRUPTION!!!
> [   11.061234] kexec-bzImage64: mmap entry, type = 0, va = 0x0, pa = 0x0, np 
> = 0x0, attr = 0x0^M
> [   11.061236] kexec-bzImage64: mmap entry, type = 0, va = 0x0, pa = 0x0, np 
> = 0x0, attr = 0x0^M
> [   11.061239] kexec-bzImage64: mmap entry, type = 0, va = 0x0, pa = 0x0, np 
> = 0x0, attr 

Re: [PATCH v7 1/3] efi/x86: Fix EFI memory map corruption with kexec

2024-06-04 Thread Kalra, Ashish
Re-sending as the earlier response got line-wrapped.

On 6/3/2024 12:12 PM, Borislav Petkov wrote:
> On Mon, Jun 03, 2024 at 12:08:48PM -0500, Kalra, Ashish wrote:
>> efi_arch_mem_reserve().
> Now it only remains for you to explain why...

Here is a detailed explanation of what is causing the EFI memory map 
corruption, with added debug logs and memblock debugging enabled:

Initially at boot, efi_memblock_x86_reserve_range() does early_memremap() of 
the EFI memory map passed as part of setup_data, as the following logs show:

...

[ 0.00] efi: in efi_memblock_x86_reserve_range, phys map 0x27fff9110 
[ 0.00] memblock_reserve: [0x00027fff9110-0x00027fffa12f] 
efi_memblock_x86_reserve_range+0x168/0x2a0

...

Later, efi_arch_mem_reserve() is invoked, which calls efi_memmap_alloc() which 
does memblock_phys_alloc() to insert new EFI memory descriptor into efi.memap:

...

[ 0.733263] memblock_reserve: [0x00027ffcaf80-0x00027ffcbfff] 
memblock_alloc_range_nid+0xf1/0x1b0 
[ 0.734787] efi: efi_arch_mem_reserve, efi phys map 0x27ffcaf80

...

Finally, at the end of boot, kexec_enter_virtual_mode() is called.

It does mapping of efi regions which were passed via setup_data.

So it unregisters the early mem-remapped EFI memmap and installs the new EFI 
memory map as below:

( Because of efi_arch_mem_reserve() getting invoked, the new EFI memmap phys 
base being remapped now is the memblock allocation done in 
efi_arch_mem_reserve()).

[ 4.042160] efi: efi memmap phys map 0x27ffcaf80

So, kexec_enter_virtual_mode() does the following :

if (efi_memmap_init_late(efi.memmap.phys_map, <- refers to the new EFI 
memmap phys base allocated via memblock in efi_arch_mem_reserve().
efi.memmap.desc_size * efi.memmap.nr_map)) { ...

This late init, does a memremap() on this memblock-allocated memory, but then 
immediately frees it :

drivers/firmware/efi/memmap.c:

int __init __efi_memmap_init(struct efi_memory_map_data *data) 
{

..

phys_map = data->phys_map; <- refers to the new EFI memmap phys base 
allocated via memblock in efi_arch_mem_reserve().

if (data->flags & EFI_MEMMAP_LATE) 
map.map = memremap(phys_map, data->size, MEMREMAP_WB);
... 
... 
if (efi.memmap.flags & (EFI_MEMMAP_MEMBLOCK | EFI_MEMMAP_SLAB)) { 
__efi_memmap_free(efi.memmap.phys_map, 
efi.memmap.desc_size * efi.memmap.nr_map, 
efi.memmap.flags); 
}

...
map.phys_map = data->phys_map;

...

efi.memmap = map;

...

This happens as kexec_enter_virtual_mode() can only handle the early mapped EFI 
memmap and not the one which is memblock allocated by efi_arch_mem_reserve(). 
As seen above this memblock allocated (EFI_MEMMAP_MEMBLOCK tagged) memory gets 
freed.

This is confirmed by memblock debugging:

[ 4.044057] memblock_free_late: [0x00027ffcaf80-0x00027ffcbfff] 
__efi_memmap_free+0x66/0x80

So while this memory is memremapped, it has also been freed and then it gets 
into a use-after-free condition and subsequently gets corrupted.

This corruption is seen just before kexec-ing into the new kernel:

...
[   11.045522] PEFILE: Unsigned PE binary^M
[   11.060801] kexec-bzImage64: efi memmap phys map 0x27ffcaf80^M
...
[   11.061220] kexec-bzImage64: mmap entry, type = 11, va = 0xfffeffc0, 
pa = 0xffc0, np = 0x400, attr = 0x8001^M
[   11.061225] kexec-bzImage64: mmap entry, type = 6, va = 0xfffeffb04000, 
pa = 0x7f704000, np = 0x84, attr = 0x800f^M
[   11.061228] kexec-bzImage64: mmap entry, type = 4, va = 0xfffeff70, 
pa = 0x7f10, np = 0x300, attr = 0x0^M
[   11.061231] kexec-bzImage64: mmap entry, type = 0, va = 0x0, pa = 0x0, np = 
0x0, attr = 0x0^M <- CORRUPTION!!!
[   11.061234] kexec-bzImage64: mmap entry, type = 0, va = 0x0, pa = 0x0, np = 
0x0, attr = 0x0^M
[   11.061236] kexec-bzImage64: mmap entry, type = 0, va = 0x0, pa = 0x0, np = 
0x0, attr = 0x0^M
[   11.061239] kexec-bzImage64: mmap entry, type = 0, va = 0x0, pa = 0x0, np = 
0x0, attr = 0x0^M
[   11.061241] kexec-bzImage64: mmap entry, type = 0, va = 0x0, pa = 0x0, np = 
0x0, attr = 0x0^M
[   11.061243] kexec-bzImage64: mmap entry, type = 0, va = 0x0, pa = 0x0, np = 
0x0, attr = 0x0^M
[   11.061245] kexec-bzImage64: mmap entry, type = 0, va = 0x0, pa = 0x0, np = 
0x0, attr = 0x0^M
[   11.061248] kexec-bzImage64: mmap entry, type = 0, va = 0x0, pa = 0x0, np = 
0x0, attr = 0x0^M
[   11.061250] kexec-bzImage64: mmap entry, type = 0, va = 0x0, pa = 0x0, np = 
0x0, attr = 0x0^M
[   11.061252] kexec-bzImage64: mmap entry, type = 0, va = 0x0, pa = 0x0, np = 
0x0, attr = 0x0^M
[   11.061255] kexec-bzImage64: mmap entry, type = 0, va = 0x0, pa = 0x0, np = 
0x0, attr = 0x0^M
[   11.061257] kexec-bzImage64: mmap entry, type = 0, va = 0x0, pa = 0x0, np = 
0x0, attr = 0x0^M
[   11.061259] kexec-bzImage64: mmap entry, type = 0, va = 0x0, pa = 

Re: [PATCH v7 1/3] efi/x86: Fix EFI memory map corruption with kexec

2024-06-04 Thread Kalra, Ashish
On 6/3/2024 12:12 PM, Borislav Petkov wrote:

> On Mon, Jun 03, 2024 at 12:08:48PM -0500, Kalra, Ashish wrote:
>> efi_arch_mem_reserve().
> Now it only remains for you to explain why...

Here is a detailed explanation of what is causing the EFI memory map 
corruption, with added debug logs and memblock debugging enabled:

Initially at boot, efi_memblock_x86_reserve_range() does early_memremap() of 
the EFI memory map passed as part of setup_data, as the following logs show:

...

[ 0.00] efi: in efi_memblock_x86_reserve_range, phys map 0x27fff9110 [ 
0.00] memblock_reserve: [0x00027fff9110-0x00027fffa12f] 
efi_memblock_x86_reserve_range+0x168/0x2a0

...

Later, efi_arch_mem_reserve() is invoked, which calls efi_memmap_alloc() which 
does memblock_phys_alloc() to insert new EFI memory descriptor into efi.memap:

...

[ 0.733263] memblock_reserve: [0x00027ffcaf80-0x00027ffcbfff] 
memblock_alloc_range_nid+0xf1/0x1b0 [ 0.734787] efi: efi_arch_mem_reserve, efi 
phys map 0x27ffcaf80

...

Finally, at the end of boot, kexec_enter_virtual_mode() is called.

It does mapping of efi regions which were passed via setup_data.

So it unregisters the early mem-remapped EFI memmap and installs the new EFI 
memory map as below:

( Because of efi_arch_mem_reserve() getting invoked, the new EFI memmap phys 
base being remapped now is the memblock allocation done in 
efi_arch_mem_reserve()).

[ 4.042160] efi: efi memmap phys map 0x27ffcaf80

So, kexec_enter_virtual_mode() does the following :

if (efi_memmap_init_late(efi.memmap.phys_map, < refers to the new EFI 
memmap phys base allocated via memblock in efi_arch_mem_reserve(). 
efi.memmap.desc_size * efi.memmap.nr_map)) { ...

This late init, does a memremap() on this memblock-allocated memory, but then 
immediately frees it :

drivers/firmware/efi/memmap.c:

*/ int __init __efi_memmap_init(struct efi_memory_map_data *data) {

..

phys_map = data->phys_map; <--- refers to the new EFI 
memmap phys base allocated via memblock in efi_arch_mem_reserve().

if (data->flags & EFI_MEMMAP_LATE) map.map = memremap(phys_map, data->size, 
MEMREMAP_WB); ... ... if (efi.memmap.flags & (EFI_MEMMAP_MEMBLOCK | 
EFI_MEMMAP_SLAB)) { __efi_memmap_free(efi.memmap.phys_map, efi.memmap.desc_size 
* efi.memmap.nr_map, efi.memmap.flags); }

map.phys_map = data->phys_map;

...

efi.memmap = map;

...

This happens as kexec_enter_virtual_mode() can only handle the early mapped EFI 
memmap and not the one which is memblock allocated by efi_arch_mem_reserve(). 
As seen above this memblock allocated (EFI_MEMMAP_MEMBLOCK tagged) memory gets 
freed.

This is confirmed by memblock debugging:

[ 4.044057] memblock_free_late: [0x00027ffcaf80-0x00027ffcbfff] 
__efi_memmap_free+0x66/0x80

So while this memory is memremapped, it has also been freed and then it gets 
into a use-after-free condition and subsequently gets corrupted.

This corruption is seen just before kexec-ing into the new kernel:

...

[ 11.045522] PEFILE: Unsigned PE binary^M [ 11.060801] kexec-bzImage64: efi 
memmap phys map 0x27ffcaf80 ... [ 11.061220] kexec-bzImage64: mmap entry, type 
= 11, va = 0xfffeffc0, pa = 0xffc0, np = 0x400, attr = 
0x8001^M [ 11.061225] kexec-bzImage64: mmap entry, type = 6, va = 
0xfffeffb04000, pa = 0x7f704000, np = 0x84, attr = 0x800f^M [ 
11.061228] kexec-bzImage64: mmap entry, type = 4, va = 0xfffeff70, pa = 
0x7f10, np = 0x300, attr = 0x0^M [ 11.061231] kexec-bzImage64: mmap entry, 
type = 0, va = 0x0, pa = 0x0, np = 0x0, attr = 0x0^M < 
CORRUPTED!!! [ 11.061234] kexec-bzImage64: mmap entry, type = 0, va = 0x0, pa = 
0x0, np = 0x0, attr = 0x0^M [ 11.061236] kexec-bzImage64: mmap entry, type = 0, 
va = 0x0, pa = 0x0, np = 0x0, attr = 0x0^M [ 11.061239] kexec-bzImage64: mmap 
entry, type = 0, va = 0x0, pa = 0x0, np = 0x0, attr = 0x0^M [ 11.061241] 
kexec-bzImage64: mmap entry, type = 0, va = 0x0, pa = 0x0, np = 0x0,
attr = 0x0^M [ 11.061243] kexec-bzImage64: mmap entry, type = 0, va = 0x0, pa = 
0x0, np = 0x0, attr = 0x0^M [ 11.061245] kexec-bzImage64: mmap entry, type = 0, 
va = 0x0, pa = 0x0, np = 0x0, attr = 0x0^M [ 11.061248] kexec-bzImage64: mmap 
entry, type = 0, va = 0x0, pa = 0x0, np = 0x0, attr = 0x0^M [ 11.061250] 
kexec-bzImage64: mmap entry, type = 0, va = 0x0, pa = 0x0, np = 0x0, attr = 
0x0^M [ 11.061252] kexec-bzImage64: mmap entry, type = 0, va = 0x0, pa = 0x0, 
np = 0x0, attr = 0x0^M [ 11.061255] kexec-bzImage64: mmap entry, type = 0, va = 
0x0, pa = 0x0, np = 0x0, attr = 0x0^M [ 11.061257] kexec-bzImage64: mmap entry, 
type = 0, va = 0x0, pa = 0x0, np = 0x0, attr = 0x0^M [ 11.061259] 
kexec-bzImage64: mmap entry, type = 0, va = 0x0, pa = 0x0, np = 0x0, attr = 
0x0^M [ 11.061262] kexec-bzImage64: mmap entry, type = 0, va = 0x0, pa = 0x0, 
np = 0x0, attr = 0x0^M [ 11.061264] kexec-bzImage64: mmap entry, type = 0, va = 
0x0, pa = 0x0, np = 0x0, attr = 0x0^M [ 11.061266]

Re: [PATCH v7 1/3] efi/x86: Fix EFI memory map corruption with kexec

2024-06-04 Thread Borislav Petkov
On Tue, Jun 04, 2024 at 07:09:56PM +0800, Dave Young wrote:
> Anyway there is not such a helper for all cases.

But maybe there should be...

This is not the first case where the need arises to be able to say:

if (am I a kexeced kernel)

in code.

Perhaps we should have a global var kexeced or so which gets incremented
on each kexec-ed kernel, somewhere in very early boot of the kexec-ed
kernel we do

kexeced++;

and then other code can query it and know whether this is a kexec-ed
kernel and how many times it got kexec-ed...

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v7 1/3] efi/x86: Fix EFI memory map corruption with kexec

2024-06-04 Thread Dave Young
On Tue, 4 Jun 2024 at 17:44, Borislav Petkov  wrote:
>
> On Tue, Jun 04, 2024 at 09:23:58AM +0800, Dave Young wrote:
> > kexec_in_progress is only for checking if this is in a reboot (kexec) code 
> > path.
> > But eif_mem_reserve is only called during the boot time so checking
> > kexec_in_progress is meaningless here.
> > current_kernel_is_booted_via_kexec != is_rebooting_with_kexec
>
> That's exactly what I wanna check: whether this is a kexec-ed kernel. Or
> is there a better helper for that?

No general way to check if it is a kexec-ed kernel or not,  for x86
one can check the efi_setup as Ashish's original patch did, as the
kexec booted kernel (efi boot) will have efi setup_data passed in.

Otherwise there is a type_of_loader field for x86 boot protocol,
kexec-tools is 0x0D, the kexec_file_load also uses this.  But adding
the type_of_loader was only added in kexec-tools code when Yinghai
worked on the kexec-tools bzImage64 load, so older kexec-tools will
not set this field.  Anyway the in-kernel kexec_file_load code for x86
added 0x0D as loader type from the beginning.

Anyway there is not such a helper for all cases.

>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
>


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v7 1/3] efi/x86: Fix EFI memory map corruption with kexec

2024-06-04 Thread Borislav Petkov
On Tue, Jun 04, 2024 at 09:23:58AM +0800, Dave Young wrote:
> kexec_in_progress is only for checking if this is in a reboot (kexec) code 
> path.
> But eif_mem_reserve is only called during the boot time so checking
> kexec_in_progress is meaningless here.
> current_kernel_is_booted_via_kexec != is_rebooting_with_kexec

That's exactly what I wanna check: whether this is a kexec-ed kernel. Or
is there a better helper for that?

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v7 1/3] efi/x86: Fix EFI memory map corruption with kexec

2024-06-03 Thread Dave Young
On Mon, 3 Jun 2024 at 23:33, Mike Rapoport  wrote:
>
> On Mon, Jun 03, 2024 at 04:46:39PM +0200, Borislav Petkov wrote:
> > On Mon, Jun 03, 2024 at 09:01:49AM -0500, Kalra, Ashish wrote:
> > > If we skip efi_arch_mem_reserve() (which should probably be anyway skipped
> > > for kexec case), then for kexec boot, EFI memmap is memremapped in the 
> > > same
> > > virtual address as the first kernel and not the allocated memblock 
> > > address.
> >
> > Are you saying that we should simply do
> >
> > diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
> > index fdf07dd6f459..410cb0743289 100644
> > --- a/drivers/firmware/efi/efi.c
> > +++ b/drivers/firmware/efi/efi.c
> > @@ -577,6 +577,9 @@ void __init efi_mem_reserve(phys_addr_t addr, u64 size)
> >   if (WARN_ON_ONCE(efi_enabled(EFI_PARAVIRT)))
> >   return;
> >
> > + if (kexec_in_progress)
> > + return;
> > +

kexec_in_progress is only for checking if this is in a reboot (kexec) code path.
But eif_mem_reserve is only called during the boot time so checking
kexec_in_progress is meaningless here.
current_kernel_is_booted_via_kexec != is_rebooting_with_kexec

The code change below in the patch looks good to me, but I'm not sure
what caused the memory corruption, it indeed worth some more digging,
maybe SEV/SNP related.
+   if (md.attribute & EFI_MEMORY_RUNTIME)
+   return;

Thanks
Dave


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v7 1/3] efi/x86: Fix EFI memory map corruption with kexec

2024-06-03 Thread Mike Rapoport
On Mon, Jun 03, 2024 at 11:56:01AM -0500, Kalra, Ashish wrote:
> On 6/3/2024 10:29 AM, Mike Rapoport wrote:
> 
> > On Mon, Jun 03, 2024 at 09:01:49AM -0500, Kalra, Ashish wrote:
> > > On 6/3/2024 8:39 AM, Mike Rapoport wrote:
> > > 
> > > > On Mon, Jun 03, 2024 at 08:06:56AM -0500, Kalra, Ashish wrote:
> > > > > On 6/3/2024 3:56 AM, Borislav Petkov wrote
> > > > > 
> > > > > > > EFI memory map and due to early allocation it uses memblock 
> > > > > > > allocation.
> > > > > > > 
> > > > > > > Later during boot, efi_enter_virtual_mode() calls 
> > > > > > > kexec_enter_virtual_mode()
> > > > > > > in case of a kexec-ed kernel boot.
> > > > > > > 
> > > > > > > This function kexec_enter_virtual_mode() installs the new EFI 
> > > > > > > memory map by
> > > > > > > calling efi_memmap_init_late() which remaps the efi_memmap 
> > > > > > > physically allocated
> > > > > > > in efi_arch_mem_reserve(), but this remapping is still using 
> > > > > > > memblock allocation.
> > > > > > > 
> > > > > > > Subsequently, when memblock is freed later in boot flow, this 
> > > > > > > remapped
> > > > > > > efi_memmap will have random corruption (similar to a 
> > > > > > > use-after-free scenario).
> > > > > > > 
> > > > > > > The corrupted EFI memory map is then passed to the next kexec-ed 
> > > > > > > kernel
> > > > > > > which causes a panic when trying to use the corrupted EFI memory 
> > > > > > > map.
> > > > > > This sounds fishy: memblock allocated memory is not freed later in 
> > > > > > the
> > > > > > boot - it remains reserved. Only free memory is freed from memblock 
> > > > > > to
> > > > > > the buddy allocator.
> > > > > > 
> > > > > > Or is the problem that memblock-allocated memory cannot be 
> > > > > > memremapped
> > > > > > because *raisins*?
> > > > > This is what seems to be happening:
> > > > > 
> > > > > efi_arch_mem_reserve() calls efi_memmap_alloc() to allocate memory for
> > > > > EFI memory map and due to early allocation it uses memblock 
> > > > > allocation.
> > > > > 
> > > > > And later efi_enter_virtual_mode() calls kexec_enter_virtual_mode()
> > > > > in case of a kexec-ed kernel boot.
> > > > > 
> > > > > This function kexec_enter_virtual_mode() installs the new EFI memory 
> > > > > map by
> > > > > calling efi_memmap_init_late() which does memremap() on 
> > > > > memblock-allocated memory.
> > > > Does the issue happen only with SNP?
> > > This is observed under SNP as efi_arch_mem_reserve() is only being called
> > > with SNP enabled and then efi_arch_mem_reserve() allocates EFI memory map
> > > using memblock.
> > I don't see how efi_arch_mem_reserve() is only called with SNP. What did I
> > miss?
> 
> This is the call stack for efi_arch_mem_reserve():
> 
> [0.310010]  efi_arch_mem_reserve+0xb1/0x220
> [0.311382]  efi_mem_reserve+0x36/0x60
> [0.311973]  efi_bgrt_init+0x17d/0x1a0
> [0.313265]  acpi_parse_bgrt+0x12/0x20
> [0.313858]  acpi_table_parse+0x77/0xd0
> [0.314463]  acpi_boot_init+0x362/0x630
> [0.315069]  setup_arch+0xa88/0xf80
> [0.315629]  start_kernel+0x68/0xa90
> [0.316194]  x86_64_start_reservations+0x1c/0x30
> [0.316921]  x86_64_start_kernel+0xbf/0x110
> [0.317582]  common_startup_64+0x13e/0x141
> 
> So, probably it is being invoked specifically for AMD platform ?

AFAIU, efi_bgrt_init() can be called for any x86 platform, with or without
encryption. 
So if my understating is correct, efi_arch_mem_reserve() will be called with SNP
disabled as well. And if kexec works ok without SNP but fails with SNP this
may give as a clue to the root cause of the failure.
 
> > > If we skip efi_arch_mem_reserve() (which should probably be anyway skipped
> > > for kexec case), then for kexec boot, EFI memmap is memremapped in the 
> > > same
> > > virtual address as the first kernel and not the allocated memblock 
> > > address.
> > Maybe we should skip efi_arch_mem_reserve() for kexec case, but I think we
> > still need to understand what's causing memory corruption.
> 
> When, efi_arch_mem_reserve() allocates memory for EFI memory map using
> memblock and then later in boot, kexec_enter_virtual_mode() does memremap on
> this memblock allocated memory, subsequently after this i see EFI memory map
> corruption, so are there are any issues doing memremap on memblock-allocated
> memory ?

memblock-allocated memory is just RAM, so my take is that memremap() cannot
figure out the encryption bits properly.

You can check if there are issues with memrmapp()ing memblock-allocated
memory by sticking memblock_phys_alloc() somewhere, filling that memory with a
pattern and then calling memremap(addr, size, MEMREMAP_WB) and checking if
the pattern is still there.
 
> Thanks, Ashish
> 
> > > > I didn't really dig, but my theory would be that it has something to do
> > > > with arch_memremap_can_ram_remap() in arch/x86/mm/ioremap.c
> > > > > Thanks, Ashish

-- 
Sincerely yours,
Mike.

___
kexec mailing 

Re: [PATCH v7 1/3] efi/x86: Fix EFI memory map corruption with kexec

2024-06-03 Thread Kalra, Ashish

Re-sending this, last response got garbled.

On 6/3/2024 11:48 AM, Kalra, Ashish wrote:

On 6/3/2024 10:31 AM, Mike Rapoport wrote:


On Mon, Jun 03, 2024 at 04:46:39PM +0200, Borislav Petkov wrote:

On Mon, Jun 03, 2024 at 09:01:49AM -0500, Kalra, Ashish wrote:
If we skip efi_arch_mem_reserve() (which should probably be anyway 
skipped
for kexec case), then for kexec boot, EFI memmap is memremapped in 
the same
virtual address as the first kernel and not the allocated memblock 
address.

Are you saying that we should simply do

diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index fdf07dd6f459..410cb0743289 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -577,6 +577,9 @@ void __init efi_mem_reserve(phys_addr_t addr, 
u64 size)

  if (WARN_ON_ONCE(efi_enabled(EFI_PARAVIRT)))
  return;
  +    if (kexec_in_progress)
+    return;
+
  if (!memblock_is_region_reserved(addr, size))
  memblock_reserve(addr, size);
  and skip that whole call?

I think Ashish suggested rather

diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index fdf07dd6f459..eccc10ab15a4 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -580,6 +580,9 @@ void __init efi_mem_reserve(phys_addr_t addr, u64 
size)

  if (!memblock_is_region_reserved(addr, size))
  memblock_reserve(addr, size);
  +    if (kexec_in_progress)
+    return;
+
  /*
   * Some architectures (x86) reserve all boot services ranges
   * until efi_free_boot_services() because of buggy firmware
Yes, something similar as above, as efi_mem_reserve() is used to 
reserve boot service memory and is not necessary for kexec boot.


So, Dave Young (dyo...@redhat.com) had suggested that we skip 
efi_arch_mem_reserve() for kexec by checking the set 
EFI_MEMORY_RUNTIME attribute as below:



diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index f0cc00032751..6f398c59278a 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -255,15 +255,39 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, 
u64 size)

    struct efi_memory_map_data data = { 0 };
    struct efi_mem_range mr;
    efi_memory_desc_t md;
-   int num_entries;
+   int num_entries, ret;
    void *new;

-   if (efi_mem_desc_lookup(addr, ) ||
-   md.type != EFI_BOOT_SERVICES_DATA) {
+   /*
+    * efi_mem_reserve() is used to reserve boot service memory, eg. 
bgrt,
+    * but it is not neccasery for kexec, as there are no boot 
services in

+    * kexec reboot at all after the first kernel's ExitBootServices().
+    *
+    * Therefore, skip efi_mem_reserve for kexec booting by checking the
+    * EFI_MEMORY_RUNTIME attribute which indicates boot service memory
+    * ranges reserved by the first kernel using efi_mem_reserve and 
marked

+    * with EFI_MEMORY_RUNTIME attribute.
+    */
+
+   ret = efi_mem_desc_lookup(addr, );

+   if (ret) {

    pr_err("Failed to lookup EFI memory descriptor for 
%pa\n", );

    return;
    }

+   if (md.type != EFI_BOOT_SERVICES_DATA) {
+   pr_err("Skip reserving non EFI Boot Service Data memory 
for %pa\n", );

+   return;
+   }
+
+   /* Kexec copied the efi memmap from the first kernel, thus skip 
the case */

+   if (md.attribute & EFI_MEMORY_RUNTIME)
+   return;
+
    if (addr + size > md.phys_addr + (md.num_pages << 
EFI_PAGE_SHIFT)) {
    pr_err("Region spans EFI memory descriptors, %pa\n", 
);

    return;

Thanks, Ashish

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v7 1/3] efi/x86: Fix EFI memory map corruption with kexec

2024-06-03 Thread Borislav Petkov
On Mon, Jun 03, 2024 at 12:08:48PM -0500, Kalra, Ashish wrote:
> efi_arch_mem_reserve().

Now it only remains for you to explain why...

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v7 1/3] efi/x86: Fix EFI memory map corruption with kexec

2024-06-03 Thread Borislav Petkov
On Mon, Jun 03, 2024 at 12:05:45PM -0500, Kalra, Ashish wrote:
> Re-sending this, last response got garbled.

And this got linewrapped.

Thunderbird section in Documentation/process/email-clients.rst.

> index f0cc00032751..6f398c59278a 100644
> --- a/arch/x86/platform/efi/quirks.c
> +++ b/arch/x86/platform/efi/quirks.c
> @@ -255,15 +255,39 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64
> size)

^^^

>     struct efi_memory_map_data data = { 0 };
>     struct efi_mem_range mr;
>     efi_memory_desc_t md;
> -   int num_entries;
> +   int num_entries, ret;
>     void *new;
> 
> -   if (efi_mem_desc_lookup(addr, ) ||
> -   md.type != EFI_BOOT_SERVICES_DATA) {
> +   /*
> +    * efi_mem_reserve() is used to reserve boot service memory, eg.
> bgrt,

^^^

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v7 1/3] efi/x86: Fix EFI memory map corruption with kexec

2024-06-03 Thread Kalra, Ashish

On 6/3/2024 11:57 AM, Borislav Petkov wrote:


On Mon, Jun 03, 2024 at 11:48:03AM -0500, Kalra, Ashish wrote:

Yes, something similar as above, as efi_mem_reserve() is used to reserve
boot service memory and is not necessary for kexec boot.

So, Dave Young (dyo...@redhat.com) had suggested that we skip
efi_arch_mem_reserve() for kexec by checking the set EFI_MEMORY_RUNTIME
attribute as below:a

efi_arch_mem_reserve() or efi_mem_reserve() altogether?


efi_arch_mem_reserve().

Thanks, Ashish



Btw, that below got really gibberished by your mail client. Snipped.



___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v7 1/3] efi/x86: Fix EFI memory map corruption with kexec

2024-06-03 Thread Kalra, Ashish

On 6/3/2024 10:31 AM, Mike Rapoport wrote:


On Mon, Jun 03, 2024 at 04:46:39PM +0200, Borislav Petkov wrote:

On Mon, Jun 03, 2024 at 09:01:49AM -0500, Kalra, Ashish wrote:

If we skip efi_arch_mem_reserve() (which should probably be anyway skipped
for kexec case), then for kexec boot, EFI memmap is memremapped in the same
virtual address as the first kernel and not the allocated memblock address.

Are you saying that we should simply do

diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index fdf07dd6f459..410cb0743289 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -577,6 +577,9 @@ void __init efi_mem_reserve(phys_addr_t addr, u64 size)
if (WARN_ON_ONCE(efi_enabled(EFI_PARAVIRT)))
return;
  
+	if (kexec_in_progress)

+   return;
+
if (!memblock_is_region_reserved(addr, size))
memblock_reserve(addr, size);
  
and skip that whole call?

I think Ashish suggested rather

diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index fdf07dd6f459..eccc10ab15a4 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -580,6 +580,9 @@ void __init efi_mem_reserve(phys_addr_t addr, u64 size)
if (!memblock_is_region_reserved(addr, size))
memblock_reserve(addr, size);
  
+	if (kexec_in_progress)

+   return;
+
/*
 * Some architectures (x86) reserve all boot services ranges
 * until efi_free_boot_services() because of buggy firmware
  
Yes, something similar as above, as efi_mem_reserve() is used to reserve 
boot service memory and is not necessary for kexec boot.


So, Dave Young (dyo...@redhat.com) had suggested that we skip 
efi_arch_mem_reserve() for kexec by checking the set EFI_MEMORY_RUNTIME 
attribute as below:


diff 
 
--git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c 
index f0cc00032751..6f398c59278a 100644 --- 
a/arch/x86/platform/efi/quirks.c +++ b/arch/x86/platform/efi/quirks.c @@ 
-255,15 +255,39 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, 
u64 size)   	struct efi_memory_map_data data = { 0 };

struct efi_mem_range mr;
efi_memory_desc_t md;
- int num_entries; + int num_entries, ret;  void *new;
 
- if (efi_mem_desc_lookup(addr, ) || - md.type != 
EFI_BOOT_SERVICES_DATA) { + /* + * efi_mem_reserve() is used to reserve 
boot service memory, eg. bgrt, + * but it is not neccasery for kexec, as 
there are no boot services in + * kexec reboot at all after the first 
kernel's ExitBootServices(). + * + * Therefore, skip efi_mem_reserve for 
kexec booting by checking the + * EFI_MEMORY_RUNTIME attribute which 
indicates boot service memory + * ranges reserved by the first kernel 
using efi_mem_reserve and marked + * with EFI_MEMORY_RUNTIME attribute. 
+ */ + + ret = efi_mem_desc_lookup(addr, ); + if (ret) {   		pr_err("Failed to lookup EFI memory descriptor for %pa\n", );

return;
}
 
+ if (md.type != EFI_BOOT_SERVICES_DATA) { + pr_err("Skip reserving non 
EFI Boot Service Data memory for %pa\n", ); + return; + } + + /* 
Kexec copied the efi memmap from the first kernel, thus skip the case */ 
+ if (md.attribute & EFI_MEMORY_RUNTIME) + return; +   	if (addr + size > md.phys_addr + (md.num_pages << EFI_PAGE_SHIFT)) {

pr_err("Region spans EFI memory descriptors, %pa\n", );
return;
--
2.34.1


--
Regards/Gruss,
 Boris.

https://people.kernel.org/tglx/notes-about-netiquette


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v7 1/3] efi/x86: Fix EFI memory map corruption with kexec

2024-06-03 Thread Borislav Petkov
On Mon, Jun 03, 2024 at 11:48:03AM -0500, Kalra, Ashish wrote:
> Yes, something similar as above, as efi_mem_reserve() is used to reserve
> boot service memory and is not necessary for kexec boot.
> 
> So, Dave Young (dyo...@redhat.com) had suggested that we skip
> efi_arch_mem_reserve() for kexec by checking the set EFI_MEMORY_RUNTIME
> attribute as below:a

efi_arch_mem_reserve() or efi_mem_reserve() altogether?

Btw, that below got really gibberished by your mail client. Snipped.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v7 1/3] efi/x86: Fix EFI memory map corruption with kexec

2024-06-03 Thread Kalra, Ashish

On 6/3/2024 10:29 AM, Mike Rapoport wrote:


On Mon, Jun 03, 2024 at 09:01:49AM -0500, Kalra, Ashish wrote:

On 6/3/2024 8:39 AM, Mike Rapoport wrote:


On Mon, Jun 03, 2024 at 08:06:56AM -0500, Kalra, Ashish wrote:

On 6/3/2024 3:56 AM, Borislav Petkov wrote


EFI memory map and due to early allocation it uses memblock allocation.

Later during boot, efi_enter_virtual_mode() calls kexec_enter_virtual_mode()
in case of a kexec-ed kernel boot.

This function kexec_enter_virtual_mode() installs the new EFI memory map by
calling efi_memmap_init_late() which remaps the efi_memmap physically allocated
in efi_arch_mem_reserve(), but this remapping is still using memblock 
allocation.

Subsequently, when memblock is freed later in boot flow, this remapped
efi_memmap will have random corruption (similar to a use-after-free scenario).

The corrupted EFI memory map is then passed to the next kexec-ed kernel
which causes a panic when trying to use the corrupted EFI memory map.

This sounds fishy: memblock allocated memory is not freed later in the
boot - it remains reserved. Only free memory is freed from memblock to
the buddy allocator.

Or is the problem that memblock-allocated memory cannot be memremapped
because *raisins*?

This is what seems to be happening:

efi_arch_mem_reserve() calls efi_memmap_alloc() to allocate memory for
EFI memory map and due to early allocation it uses memblock allocation.

And later efi_enter_virtual_mode() calls kexec_enter_virtual_mode()
in case of a kexec-ed kernel boot.

This function kexec_enter_virtual_mode() installs the new EFI memory map by
calling efi_memmap_init_late() which does memremap() on memblock-allocated 
memory.

Does the issue happen only with SNP?

This is observed under SNP as efi_arch_mem_reserve() is only being called
with SNP enabled and then efi_arch_mem_reserve() allocates EFI memory map
using memblock.

I don't see how efi_arch_mem_reserve() is only called with SNP. What did I
miss?
  


This is the call stack for efi_arch_mem_reserve():

[0.310010]  efi_arch_mem_reserve+0xb1/0x220
[0.311382]  efi_mem_reserve+0x36/0x60
[0.311973]  efi_bgrt_init+0x17d/0x1a0
[0.313265]  acpi_parse_bgrt+0x12/0x20
[0.313858]  acpi_table_parse+0x77/0xd0
[0.314463]  acpi_boot_init+0x362/0x630
[0.315069]  setup_arch+0xa88/0xf80
[0.315629]  start_kernel+0x68/0xa90
[0.316194]  x86_64_start_reservations+0x1c/0x30
[0.316921]  x86_64_start_kernel+0xbf/0x110
[0.317582]  common_startup_64+0x13e/0x141

So, probably it is being invoked specifically for AMD platform ?


If we skip efi_arch_mem_reserve() (which should probably be anyway skipped
for kexec case), then for kexec boot, EFI memmap is memremapped in the same
virtual address as the first kernel and not the allocated memblock address.

Maybe we should skip efi_arch_mem_reserve() for kexec case, but I think we
still need to understand what's causing memory corruption.


When, efi_arch_mem_reserve() allocates memory for EFI memory map using 
memblock and then later in boot, kexec_enter_virtual_mode() does 
memremap on this memblock allocated memory, subsequently after this i 
see EFI memory map corruption, so are there are any issues doing 
memremap on memblock-allocated memory ?


Thanks, Ashish


I didn't really dig, but my theory would be that it has something to do
with arch_memremap_can_ram_remap() in arch/x86/mm/ioremap.c

Thanks, Ashish


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v7 1/3] efi/x86: Fix EFI memory map corruption with kexec

2024-06-03 Thread Mike Rapoport
On Mon, Jun 03, 2024 at 04:46:39PM +0200, Borislav Petkov wrote:
> On Mon, Jun 03, 2024 at 09:01:49AM -0500, Kalra, Ashish wrote:
> > If we skip efi_arch_mem_reserve() (which should probably be anyway skipped
> > for kexec case), then for kexec boot, EFI memmap is memremapped in the same
> > virtual address as the first kernel and not the allocated memblock address.
> 
> Are you saying that we should simply do
> 
> diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
> index fdf07dd6f459..410cb0743289 100644
> --- a/drivers/firmware/efi/efi.c
> +++ b/drivers/firmware/efi/efi.c
> @@ -577,6 +577,9 @@ void __init efi_mem_reserve(phys_addr_t addr, u64 size)
>   if (WARN_ON_ONCE(efi_enabled(EFI_PARAVIRT)))
>   return;
>  
> + if (kexec_in_progress)
> + return;
> +
>   if (!memblock_is_region_reserved(addr, size))
>   memblock_reserve(addr, size);
>  
> and skip that whole call?

I think Ashish suggested rather 

diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index fdf07dd6f459..eccc10ab15a4 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -580,6 +580,9 @@ void __init efi_mem_reserve(phys_addr_t addr, u64 size)
if (!memblock_is_region_reserved(addr, size))
memblock_reserve(addr, size);
 
+   if (kexec_in_progress)
+   return;
+
/*
 * Some architectures (x86) reserve all boot services ranges
 * until efi_free_boot_services() because of buggy firmware
 
> -- 
> Regards/Gruss,
> Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette

-- 
Sincerely yours,
Mike.

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v7 1/3] efi/x86: Fix EFI memory map corruption with kexec

2024-06-03 Thread Mike Rapoport
On Mon, Jun 03, 2024 at 09:01:49AM -0500, Kalra, Ashish wrote:
> On 6/3/2024 8:39 AM, Mike Rapoport wrote:
> 
> > On Mon, Jun 03, 2024 at 08:06:56AM -0500, Kalra, Ashish wrote:
> > > On 6/3/2024 3:56 AM, Borislav Petkov wrote
> > > 
> > > > > EFI memory map and due to early allocation it uses memblock 
> > > > > allocation.
> > > > > 
> > > > > Later during boot, efi_enter_virtual_mode() calls 
> > > > > kexec_enter_virtual_mode()
> > > > > in case of a kexec-ed kernel boot.
> > > > > 
> > > > > This function kexec_enter_virtual_mode() installs the new EFI memory 
> > > > > map by
> > > > > calling efi_memmap_init_late() which remaps the efi_memmap physically 
> > > > > allocated
> > > > > in efi_arch_mem_reserve(), but this remapping is still using memblock 
> > > > > allocation.
> > > > > 
> > > > > Subsequently, when memblock is freed later in boot flow, this remapped
> > > > > efi_memmap will have random corruption (similar to a use-after-free 
> > > > > scenario).
> > > > > 
> > > > > The corrupted EFI memory map is then passed to the next kexec-ed 
> > > > > kernel
> > > > > which causes a panic when trying to use the corrupted EFI memory map.
> > > > This sounds fishy: memblock allocated memory is not freed later in the
> > > > boot - it remains reserved. Only free memory is freed from memblock to
> > > > the buddy allocator.
> > > > 
> > > > Or is the problem that memblock-allocated memory cannot be memremapped
> > > > because *raisins*?
> > > This is what seems to be happening:
> > > 
> > > efi_arch_mem_reserve() calls efi_memmap_alloc() to allocate memory for
> > > EFI memory map and due to early allocation it uses memblock allocation.
> > > 
> > > And later efi_enter_virtual_mode() calls kexec_enter_virtual_mode()
> > > in case of a kexec-ed kernel boot.
> > > 
> > > This function kexec_enter_virtual_mode() installs the new EFI memory map 
> > > by
> > > calling efi_memmap_init_late() which does memremap() on 
> > > memblock-allocated memory.
> > Does the issue happen only with SNP?
> 
> This is observed under SNP as efi_arch_mem_reserve() is only being called
> with SNP enabled and then efi_arch_mem_reserve() allocates EFI memory map
> using memblock.

I don't see how efi_arch_mem_reserve() is only called with SNP. What did I
miss?
 
> If we skip efi_arch_mem_reserve() (which should probably be anyway skipped
> for kexec case), then for kexec boot, EFI memmap is memremapped in the same
> virtual address as the first kernel and not the allocated memblock address.

Maybe we should skip efi_arch_mem_reserve() for kexec case, but I think we
still need to understand what's causing memory corruption.

> Thanks, Ashish
> 
> > 
> > I didn't really dig, but my theory would be that it has something to do
> > with arch_memremap_can_ram_remap() in arch/x86/mm/ioremap.c
> > > Thanks, Ashish

-- 
Sincerely yours,
Mike.

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v7 1/3] efi/x86: Fix EFI memory map corruption with kexec

2024-06-03 Thread Borislav Petkov
On Mon, Jun 03, 2024 at 09:01:49AM -0500, Kalra, Ashish wrote:
> If we skip efi_arch_mem_reserve() (which should probably be anyway skipped
> for kexec case), then for kexec boot, EFI memmap is memremapped in the same
> virtual address as the first kernel and not the allocated memblock address.

Are you saying that we should simply do

diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index fdf07dd6f459..410cb0743289 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -577,6 +577,9 @@ void __init efi_mem_reserve(phys_addr_t addr, u64 size)
if (WARN_ON_ONCE(efi_enabled(EFI_PARAVIRT)))
return;
 
+   if (kexec_in_progress)
+   return;
+
if (!memblock_is_region_reserved(addr, size))
memblock_reserve(addr, size);
 
and skip that whole call?

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v7 1/3] efi/x86: Fix EFI memory map corruption with kexec

2024-06-03 Thread Kalra, Ashish

On 6/3/2024 8:39 AM, Mike Rapoport wrote:


On Mon, Jun 03, 2024 at 08:06:56AM -0500, Kalra, Ashish wrote:

On 6/3/2024 3:56 AM, Borislav Petkov wrote


EFI memory map and due to early allocation it uses memblock allocation.

Later during boot, efi_enter_virtual_mode() calls kexec_enter_virtual_mode()
in case of a kexec-ed kernel boot.

This function kexec_enter_virtual_mode() installs the new EFI memory map by
calling efi_memmap_init_late() which remaps the efi_memmap physically allocated
in efi_arch_mem_reserve(), but this remapping is still using memblock 
allocation.

Subsequently, when memblock is freed later in boot flow, this remapped
efi_memmap will have random corruption (similar to a use-after-free scenario).

The corrupted EFI memory map is then passed to the next kexec-ed kernel
which causes a panic when trying to use the corrupted EFI memory map.

This sounds fishy: memblock allocated memory is not freed later in the
boot - it remains reserved. Only free memory is freed from memblock to
the buddy allocator.

Or is the problem that memblock-allocated memory cannot be memremapped
because *raisins*?

This is what seems to be happening:

efi_arch_mem_reserve() calls efi_memmap_alloc() to allocate memory for
EFI memory map and due to early allocation it uses memblock allocation.

And later efi_enter_virtual_mode() calls kexec_enter_virtual_mode()
in case of a kexec-ed kernel boot.

This function kexec_enter_virtual_mode() installs the new EFI memory map by
calling efi_memmap_init_late() which does memremap() on memblock-allocated 
memory.

Does the issue happen only with SNP?


This is observed under SNP as efi_arch_mem_reserve() is only being 
called with SNP enabled and then efi_arch_mem_reserve() allocates EFI 
memory map using memblock.


If we skip efi_arch_mem_reserve() (which should probably be anyway 
skipped for kexec case), then for kexec boot, EFI memmap is memremapped 
in the same virtual address as the first kernel and not the allocated 
memblock address.


Thanks, Ashish



I didn't really dig, but my theory would be that it has something to do
with arch_memremap_can_ram_remap() in arch/x86/mm/ioremap.c
  

Thanks, Ashish


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v7 1/3] efi/x86: Fix EFI memory map corruption with kexec

2024-06-03 Thread Mike Rapoport
On Mon, Jun 03, 2024 at 08:06:56AM -0500, Kalra, Ashish wrote:
> On 6/3/2024 3:56 AM, Borislav Petkov wrote
> 
> > > EFI memory map and due to early allocation it uses memblock allocation.
> > > 
> > > Later during boot, efi_enter_virtual_mode() calls 
> > > kexec_enter_virtual_mode()
> > > in case of a kexec-ed kernel boot.
> > > 
> > > This function kexec_enter_virtual_mode() installs the new EFI memory map 
> > > by
> > > calling efi_memmap_init_late() which remaps the efi_memmap physically 
> > > allocated
> > > in efi_arch_mem_reserve(), but this remapping is still using memblock 
> > > allocation.
> > > 
> > > Subsequently, when memblock is freed later in boot flow, this remapped
> > > efi_memmap will have random corruption (similar to a use-after-free 
> > > scenario).
> > > 
> > > The corrupted EFI memory map is then passed to the next kexec-ed kernel
> > > which causes a panic when trying to use the corrupted EFI memory map.
> > This sounds fishy: memblock allocated memory is not freed later in the
> > boot - it remains reserved. Only free memory is freed from memblock to
> > the buddy allocator.
> > 
> > Or is the problem that memblock-allocated memory cannot be memremapped
> > because *raisins*?
> 
> This is what seems to be happening:
> 
> efi_arch_mem_reserve() calls efi_memmap_alloc() to allocate memory for
> EFI memory map and due to early allocation it uses memblock allocation.
> 
> And later efi_enter_virtual_mode() calls kexec_enter_virtual_mode()
> in case of a kexec-ed kernel boot.
> 
> This function kexec_enter_virtual_mode() installs the new EFI memory map by
> calling efi_memmap_init_late() which does memremap() on memblock-allocated 
> memory.

Does the issue happen only with SNP?

I didn't really dig, but my theory would be that it has something to do
with arch_memremap_can_ram_remap() in arch/x86/mm/ioremap.c
 
> Thanks, Ashish

-- 
Sincerely yours,
Mike.

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v7 1/3] efi/x86: Fix EFI memory map corruption with kexec

2024-06-03 Thread Kalra, Ashish

On 6/3/2024 3:56 AM, Borislav Petkov wrote


EFI memory map and due to early allocation it uses memblock allocation.

Later during boot, efi_enter_virtual_mode() calls kexec_enter_virtual_mode()
in case of a kexec-ed kernel boot.

This function kexec_enter_virtual_mode() installs the new EFI memory map by
calling efi_memmap_init_late() which remaps the efi_memmap physically allocated
in efi_arch_mem_reserve(), but this remapping is still using memblock 
allocation.

Subsequently, when memblock is freed later in boot flow, this remapped
efi_memmap will have random corruption (similar to a use-after-free scenario).

The corrupted EFI memory map is then passed to the next kexec-ed kernel
which causes a panic when trying to use the corrupted EFI memory map.

This sounds fishy: memblock allocated memory is not freed later in the
boot - it remains reserved. Only free memory is freed from memblock to
the buddy allocator.

Or is the problem that memblock-allocated memory cannot be memremapped
because *raisins*?


This is what seems to be happening:

efi_arch_mem_reserve() calls efi_memmap_alloc() to allocate memory for
EFI memory map and due to early allocation it uses memblock allocation.

And later efi_enter_virtual_mode() calls kexec_enter_virtual_mode()
in case of a kexec-ed kernel boot.

This function kexec_enter_virtual_mode() installs the new EFI memory map by
calling efi_memmap_init_late() which does memremap() on memblock-allocated 
memory.

Thanks, Ashish



Mike?



___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v7 1/3] efi/x86: Fix EFI memory map corruption with kexec

2024-06-03 Thread Borislav Petkov
On Thu, May 30, 2024 at 11:36:55PM +, Ashish Kalra wrote:
> From: Ashish Kalra 
> 
> With SNP guest kexec observe the following efi memmap corruption :
> 
> [0.00] efi: EFI v2.7 by EDK II
> [0.00] efi: SMBIOS=0x7e33f000 SMBIOS 3.0=0x7e33d000 ACPI=0x7e57e000 
> ACPI 2.0=0x7e57e014 MEMATTR=0x7cc3c018 Unaccepted=0x7c09e018
> [0.00] efi: [Firmware Bug]: Invalid EFI memory map entries:
> [0.00] efi: mem03: [type=269370880|attr=0x0e42100e42180e41] 
> range=[0x0486200e41038c18-0x200e898a0eee713ac17] (invalid)
> [0.00] efi: mem04: [type=12336|attr=0x0e410686300e4105] 
> range=[0x100e42000176-0x8c290f26248d200e175] (invalid)
> [0.00] efi: mem06: [type=1124304408|attr=0x30b40028] 
> range=[0x0e51300e45280e77-0xb44ed2142f460c1e76] (invalid)
> [0.00] efi: mem08: [type=68|attr=0x300e540583280e41] 
> range=[0x011a3cd8-0x486200e54b38c0bcd7] (invalid)
> [0.00] efi: mem10: [type=1107529240|attr=0x0e42280e41300e41] 
> range=[0x300e41058c280e42-0x38010ae54c5c328ee41] (invalid)
> [0.00] efi: mem11: [type=189335566|attr=0x048d200e42038e18] 
> range=[0x318c0048-0xe42029228ce4200047] (invalid)
> [0.00] efi: mem12: [type=239142534|attr=0x00240b4b] 
> range=[0x0e41380e0a7d700e-0x80f26238f22bfe500d] (invalid)
> [0.00] efi: mem14: [type=239207055|attr=0x0e41300e43380e0a] 
> range=[0x8c280e42048d200e-0xc70b028f2f27cc0a00d] (invalid)
> [0.00] efi: mem15: [type=239210510|attr=0x00080e660b47080e] 
> range=[0x324c001c-0xa78028634ce490001b] (invalid)
> [0.00] efi: mem16: [type=4294848528|attr=0x32940014] 
> range=[0x0e410286100e4100-0x80f252036a218f20ff] (invalid)
> [0.00] efi: mem19: [type=2250772033|attr=0x42180e42200e4328] 
> range=[0x41280e0ab9020683-0xe0e538c28b39e62682] (invalid)
> [0.00] efi: mem20: [type=16|   |  |  |  |  |  |  |  |  |   |WB|  |WC| 
>  ] range=[0x00084438-0x44340090333c437] (invalid)
> [0.00] efi: mem22: [Reserved|attr=0x00c14420] 
> range=[0x44243398-0x1033a04240003f397] (invalid)
> [0.00] efi: mem23: [type=1141080856|attr=0x080e41100e43180e] 
> range=[0x280e66300e4b280e-0x440dc5ee7141f4c080d] (invalid)
> [0.00] efi: mem25: [Reserved|attr=0x000a44a0] 
> range=[0x44a43428-0x1034304a400013427] (invalid)
> [0.00] efi: mem28: [type=16|   |  |  |  |  |  |  |  |  |   |WB|  |WC| 
>  ] range=[0x000a4488-0x448400b034bc487] (invalid)
> [0.00] efi: mem30: [Reserved|attr=0x000a4470] 
> range=[0x44743518-0x10352047400013517] (invalid)
> [0.00] efi: mem33: [type=16|   |  |  |  |  |  |  |  |  |   |WB|  |WC| 
>  ] range=[0x000a4458-0x445400b035ac457] (invalid)
> [0.00] efi: mem35: [type=269372416|attr=0x0e42100e42180e41] 
> range=[0x0486200e44038c18-0x200e8b8a0eee823ac17] (invalid)
> [0.00] efi: mem37: [type=2351435330|attr=0x0e42100e42180e42] 
> range=[0x470783380e410686-0x2002b2a041c2141e685] (invalid)
> [0.00] efi: mem38: [type=1093668417|attr=0x100e42000270] 
> range=[0x42100e42180e4220-0xfff366a4e421b78c21f] (invalid)
> [0.00] efi: mem39: [type=76357646|attr=0x180e42200e42280e] 
> range=[0x0e410686300e4105-0x4130f251a0710ae5104] (invalid)
> [0.00] efi: mem40: [type=940444268|attr=0x0e42200e42280e41] 
> range=[0x180e42200e42280e-0x300fc71c300b4f2480d] (invalid)
> [0.00] efi: mem41: [MMIO|attr=0x8c280e42048d200e] 
> range=[0x47943728-0x42138e0c87820292727] (invalid)
> [0.00] efi: mem42: [type=1191674680|attr=0x004c000b] 
> range=[0x300e41380e0a0246-0x470b0f26238f22b8245] (invalid)
> [0.00] efi: mem43: [type=2010|attr=0x0301f00e4d078338] 
> range=[0x45038e180e42028f-0xe4556bf118f282528e] (invalid)
> [0.00] efi: mem44: [type=1109921345|attr=0x300e446c] 
> range=[0x44080e42100e4218-0xfff39254e42138ac217] (invalid)
> ...
> 
> This EFI memap corruption is happening with efi_arch_mem_reserve() invocation 
> in case of kexec boot.
> 
> ( efi_arch_mem_reserve() is invoked with the following call-stack: )
> 
> [0.310010]  efi_arch_mem_reserve+0xb1/0x220
> [0.311382]  efi_mem_reserve+0x36/0x60
> [0.311973]  efi_bgrt_init+0x17d/0x1a0
> [0.313265]  acpi_parse_bgrt+0x12/0x20
> [0.313858]  acpi_table_parse+0x77/0xd0
> [0.314463]  acpi_boot_init+0x362/0x630
> [0.315069]  setup_arch+0xa88/0xf80
> [0.315629]  start_kernel+0x68/0xa90
> [0.316194]  x86_64_start_reservations+0x1c/0x30
> [0.316921]  x86_64_start_kernel+0xbf/0x110
> [0.317582]  common_startup_64+0x13e/0x141
> 
> efi_arch_mem_reserve() calls efi_memmap_alloc() to allocate memory for
> EFI memory map and due to early allocation it uses memblock allocation.
> 
> Later during boot, efi_enter_virtual_mode() calls kexec_enter_virtual_mode()
> in case of a kexec-ed kernel boot.
> 
> This function kexec_enter_virtual_mode() 

Re: [PATCH v7 1/3] efi/x86: Fix EFI memory map corruption with kexec

2024-05-31 Thread Alexander Kuleshov

On 30.05.2024 23:36, Ashish Kalra wrote:

From: Ashish Kalra 
+* but it is not neccasery for kexec, as there are no boot services in


A typo in necessary

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH v7 1/3] efi/x86: Fix EFI memory map corruption with kexec

2024-05-30 Thread Ashish Kalra
From: Ashish Kalra 

With SNP guest kexec observe the following efi memmap corruption :

[0.00] efi: EFI v2.7 by EDK II
[0.00] efi: SMBIOS=0x7e33f000 SMBIOS 3.0=0x7e33d000 ACPI=0x7e57e000 
ACPI 2.0=0x7e57e014 MEMATTR=0x7cc3c018 Unaccepted=0x7c09e018
[0.00] efi: [Firmware Bug]: Invalid EFI memory map entries:
[0.00] efi: mem03: [type=269370880|attr=0x0e42100e42180e41] 
range=[0x0486200e41038c18-0x200e898a0eee713ac17] (invalid)
[0.00] efi: mem04: [type=12336|attr=0x0e410686300e4105] 
range=[0x100e42000176-0x8c290f26248d200e175] (invalid)
[0.00] efi: mem06: [type=1124304408|attr=0x30b40028] 
range=[0x0e51300e45280e77-0xb44ed2142f460c1e76] (invalid)
[0.00] efi: mem08: [type=68|attr=0x300e540583280e41] 
range=[0x011a3cd8-0x486200e54b38c0bcd7] (invalid)
[0.00] efi: mem10: [type=1107529240|attr=0x0e42280e41300e41] 
range=[0x300e41058c280e42-0x38010ae54c5c328ee41] (invalid)
[0.00] efi: mem11: [type=189335566|attr=0x048d200e42038e18] 
range=[0x318c0048-0xe42029228ce4200047] (invalid)
[0.00] efi: mem12: [type=239142534|attr=0x00240b4b] 
range=[0x0e41380e0a7d700e-0x80f26238f22bfe500d] (invalid)
[0.00] efi: mem14: [type=239207055|attr=0x0e41300e43380e0a] 
range=[0x8c280e42048d200e-0xc70b028f2f27cc0a00d] (invalid)
[0.00] efi: mem15: [type=239210510|attr=0x00080e660b47080e] 
range=[0x324c001c-0xa78028634ce490001b] (invalid)
[0.00] efi: mem16: [type=4294848528|attr=0x32940014] 
range=[0x0e410286100e4100-0x80f252036a218f20ff] (invalid)
[0.00] efi: mem19: [type=2250772033|attr=0x42180e42200e4328] 
range=[0x41280e0ab9020683-0xe0e538c28b39e62682] (invalid)
[0.00] efi: mem20: [type=16|   |  |  |  |  |  |  |  |  |   |WB|  |WC|  
] range=[0x00084438-0x44340090333c437] (invalid)
[0.00] efi: mem22: [Reserved|attr=0x00c14420] 
range=[0x44243398-0x1033a04240003f397] (invalid)
[0.00] efi: mem23: [type=1141080856|attr=0x080e41100e43180e] 
range=[0x280e66300e4b280e-0x440dc5ee7141f4c080d] (invalid)
[0.00] efi: mem25: [Reserved|attr=0x000a44a0] 
range=[0x44a43428-0x1034304a400013427] (invalid)
[0.00] efi: mem28: [type=16|   |  |  |  |  |  |  |  |  |   |WB|  |WC|  
] range=[0x000a4488-0x448400b034bc487] (invalid)
[0.00] efi: mem30: [Reserved|attr=0x000a4470] 
range=[0x44743518-0x10352047400013517] (invalid)
[0.00] efi: mem33: [type=16|   |  |  |  |  |  |  |  |  |   |WB|  |WC|  
] range=[0x000a4458-0x445400b035ac457] (invalid)
[0.00] efi: mem35: [type=269372416|attr=0x0e42100e42180e41] 
range=[0x0486200e44038c18-0x200e8b8a0eee823ac17] (invalid)
[0.00] efi: mem37: [type=2351435330|attr=0x0e42100e42180e42] 
range=[0x470783380e410686-0x2002b2a041c2141e685] (invalid)
[0.00] efi: mem38: [type=1093668417|attr=0x100e42000270] 
range=[0x42100e42180e4220-0xfff366a4e421b78c21f] (invalid)
[0.00] efi: mem39: [type=76357646|attr=0x180e42200e42280e] 
range=[0x0e410686300e4105-0x4130f251a0710ae5104] (invalid)
[0.00] efi: mem40: [type=940444268|attr=0x0e42200e42280e41] 
range=[0x180e42200e42280e-0x300fc71c300b4f2480d] (invalid)
[0.00] efi: mem41: [MMIO|attr=0x8c280e42048d200e] 
range=[0x47943728-0x42138e0c87820292727] (invalid)
[0.00] efi: mem42: [type=1191674680|attr=0x004c000b] 
range=[0x300e41380e0a0246-0x470b0f26238f22b8245] (invalid)
[0.00] efi: mem43: [type=2010|attr=0x0301f00e4d078338] 
range=[0x45038e180e42028f-0xe4556bf118f282528e] (invalid)
[0.00] efi: mem44: [type=1109921345|attr=0x300e446c] 
range=[0x44080e42100e4218-0xfff39254e42138ac217] (invalid)
...

This EFI memap corruption is happening with efi_arch_mem_reserve() invocation 
in case of kexec boot.

( efi_arch_mem_reserve() is invoked with the following call-stack: )

[0.310010]  efi_arch_mem_reserve+0xb1/0x220
[0.311382]  efi_mem_reserve+0x36/0x60
[0.311973]  efi_bgrt_init+0x17d/0x1a0
[0.313265]  acpi_parse_bgrt+0x12/0x20
[0.313858]  acpi_table_parse+0x77/0xd0
[0.314463]  acpi_boot_init+0x362/0x630
[0.315069]  setup_arch+0xa88/0xf80
[0.315629]  start_kernel+0x68/0xa90
[0.316194]  x86_64_start_reservations+0x1c/0x30
[0.316921]  x86_64_start_kernel+0xbf/0x110
[0.317582]  common_startup_64+0x13e/0x141

efi_arch_mem_reserve() calls efi_memmap_alloc() to allocate memory for
EFI memory map and due to early allocation it uses memblock allocation.

Later during boot, efi_enter_virtual_mode() calls kexec_enter_virtual_mode()
in case of a kexec-ed kernel boot.

This function kexec_enter_virtual_mode() installs the new EFI memory map by
calling efi_memmap_init_late() which remaps the efi_memmap physically allocated
in efi_arch_mem_reserve(), but this remapping is still using memblock 
allocation.

Subsequently, when memblock is freed