Thank you for the comment, H. Peter, Andy and Baoquan.

在 2021年04月12日 17:52, Baoquan He 写道:
> On 04/11/21 at 06:49pm, Andy Lutomirski wrote:
>>
>>
>>> On Apr 11, 2021, at 6:14 PM, Baoquan He <[email protected]> wrote:
>>>
>>> On 04/09/21 at 07:59pm, H. Peter Anvin wrote:
>>>> Why don't we do this unconditionally? At the very best we gain half a 
>>>> megabyte of memory (except the trampoline, which has to live there, but it 
>>>> is only a few kilobytes.)
>>>
>>> This is a great suggestion, thanks. I think we can fix it in this way to
>>> make code simpler. Then the specific caring of real mode in
>>> efi_free_boot_services() can be removed too.
>>>
>>
>> This whole situation makes me think that the code is buggy before and buggy 
>> after.
>>
>> The issue here (I think) is that various pieces of code want to reserve 
>> specific pieces of otherwise-available low memory for their own nefarious 
>> uses. I don’t know *why* crash kernel needs this, but that doesn’t matter 
>> too much.
> 
> Kdump kernel also need go through real mode code path during bootup. It
> is not different than normal kernel except that it skips the firmware
> resetting. So kdump kernel needs low 1M as system RAM just as normal
> kernel does. Here we reserve the whole low 1M with memblock_reserve()
> to avoid any later kernel or driver data reside in this area. Otherwise,
> we need dump the content of this area to vmcore. As we know, when crash
> happened, the old memory of 1st kernel should be untouched until vmcore
> dumping read out its content. Meanwhile, kdump kernel need reuse low 1M.
> In the past, we used a back up region to copy out the low 1M area, and
> map the back up region into the low 1M area in vmcore elf file. In
> 6f599d84231fd27 ("x86/kdump: Always reserve the low 1M when the crashkernel
> option is specified"), we changed to lock the whole low 1M to avoid
> writting any kernel data into, like this we can skip this area when
> dumping vmcore.
> 
> Above is why we try to memblock reserve the whole low 1M. We don't want
> to use it, just don't want anyone to use it in 1st kernel.
> 
>>
>> I propose that the right solution is to give low-memory-reserving code paths 
>> two chances to do what they need: once at the very beginning and once after 
>> EFI boot services are freed.
>>
>> Alternatively, just reserve *all* otherwise unused sub 1M memory up front, 
>> then release it right after releasing boot services, and then invoke the 
>> special cases exactly once.
> 

After EFI boot services are freed, I'm worried that it's a bit late. All sub-1M 
memory regions need to be reserved early as soon as possible.

> I am not sure if I got both suggested ways clearly. They look a little
> complicated in our case. As I explained at above, we want the whole low
> 1M locked up, not one piece or some pieces of it.
> 
>>
>> In either case, the result is that the crashkernel mess gets unified with 
>> the trampoline mess.  One way the result is called twice and needs to be 
>> more careful, and the other way it’s called only once.
>>

That may still have a chance to allocate memory from sub-1M regions at some 
point, because EFI boot services will be freed after EFI enters virtual mode, 
it looks late.

>> Just skipping freeing boot services seems wrong.  It doesn’t unmap boot 
>> services, and skipping that is incorrect, I think. And it seems to result in 
>> a bogus memory map in which the system thinks that some crashkernel memory 
>> is EFI memory instead.
> 
> I like hpa's thought to lock the whole low 1M unconditionally since only
> a few KB except of trampoline area is there. Rethinking about it, doing
> it in can_free_region() may be risky because efi memory region could
> cross the 1M boundary, e.g [640K, 100M] with type of
> EFI_BOOT_SERVICES_CODE|EFI_BOOT_SERVICES_DATA, it could cause loss of memory.

Theoretically, yes. But so far I haven't seen the situation of crossing the 1M 
boundary.

Thanks.
Lianbo

> Just a wild guess, not very sure if the 1M boundary corssing can really
> happen. efi_reserve_boot_services() won't split regions.
> 
> If moving efi_reserve_boot_services() after reserve_real_mode() is not
> accepted, maybe we can call efi_mem_reserve(0, 1M) just as
> efi_esrt_init() has done.
> 

Reply via email to