Re: [RFC] arm64: extra entries in /proc/iomem for kexec

2018-04-25 Thread James Morse
Hi Akashi,

On 25/04/18 10:20, AKASHI Takahiro wrote:
> On Tue, Apr 24, 2018 at 05:08:57PM +0100, James Morse wrote:
>> On 16/04/18 11:08, AKASHI Takahiro wrote:
>>> On Thu, Apr 12, 2018 at 05:01:52PM +0100, James Morse wrote:
 On 05/04/18 03:42, AKASHI Takahiro wrote:
> On Mon, Apr 02, 2018 at 10:53:32AM +0900, AKASHI Takahiro wrote:
>> either because
>> a. new kernel (or initrd/dtb) may have been allocated on a NOMAP region
>>which are not suitable for usable memory, or
>> b. new kernel (or initrd/dtb) may have been allocated on a reserved 
>> region
>>whose contents can be overwritten.
>>
>> While we see (b) even today, (a) is a backward compatibility issue.

 (a) doesn't happen because request_standard_resources() checks
 memblock_is_nomap(), and reports those regions as 'reserved'.
>>>
>>> I might have confused you. The assumption here was that we adopt format (D),
>>> where all NOMAP regions are sub nodes of "System RAM", but still use
>>> the current kexec-tools.
>>> As I said above, this will end up an un-expected behavior.
>>
>> I'd like to fix this without having to fix user-space at the same time. It 
>> looks
>> like no-one else has second level reserved regions,
> 
> This was my assumption when I sent out a patch to kexec-tools.

But this would still leave user-space that isn't updated broken.


> # I don't know yet whether people are happy with this fix, and also have
>   kernel patches for my other approaches. They are neither not much
>   complicated.

 I don't think we should fix this in userspace, exporting all the
 memblock_reserved() regions as 'reserved' in /proc/iomem looks like the 
 right
 thing to do.
>>>
>>> Again, if you modify /proc/iomem, you have to update kexec-tools, too.
>>
>> If we squash the memblock_reserved() stuff down so it appears as a top level
>> 'reserved' region too, I don't think we do.
> 
> If I correctly understand, you're talking about my format (E).
> As I said, it will fix the issue without modifying user-space, but
> 
> || This does not only look quite noisy but also ignores the fact that
> || reserved regions are part of System RAM (or memblock.memory).

I agree its noisy, there are significantly more 'reserved' areas, but these are
all either nomap or memblock_reserved().

Why does it matter if a reserved-region is nomap or memblock_reserved()? Any new
kernel will learn the difference from the EFI memory map and make its own 
decisions.

Kexec-tools only needs to know what it can overwrite without clobbering
important data like the UEFI memory map, or the APCI tables covered by the
linear map.


>> This prevents the efi-memory-map
>> being overwritten on kernels since kexec was merged.
>>
>> Its horribly fiddly to do this. The kernel code/data are special reserved
>> regions that we already describe as a subset of system-ram, even though they 
>> are
>> both also fragments of a bigger memblock_reserved() block.
> 
> Actually, we don't have to avoid kernel code/data regions as copying
> loaded data to the final destinations will be done at the very end of kexec.

For kexec yes, but that is the existing format of the file, which we shouldn't
change, otherwise we break something else.


>> While we can walk memblock for regions that aren't reserved, allocating 
>> memory
>> in the loop changes what is reserved. That one O(N) walk ends up being 
>> four...
> 
> At most O(n^2)?

I think for_each_free_mem_range() is smart enough not to do that. Patch 
incoming...


Thanks,

James

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [RFC] arm64: extra entries in /proc/iomem for kexec

2018-04-25 Thread AKASHI Takahiro
On Tue, Apr 24, 2018 at 05:08:57PM +0100, James Morse wrote:
> Hi Akashi,
> 
> On 16/04/18 11:08, AKASHI Takahiro wrote:
> > On Thu, Apr 12, 2018 at 05:01:52PM +0100, James Morse wrote:
> >> On 05/04/18 03:42, AKASHI Takahiro wrote:
> >>> On Mon, Apr 02, 2018 at 10:53:32AM +0900, AKASHI Takahiro wrote:
>  Basically, changes that I made on /proc/iomem in my new format D were:
>  1. to move NOMAP region entries, formerly named "reserved" and now named
> "reserved (no map)", under System RAM
>  2. to add new entries for firmware-reserved regions as "reserved" also
> under System RAM
> 
>  On the other hand, current kexec-tools, in particular kexec command,
>  only scan top-level "System RAM" entries as well as "reserved" entries.
> >>
> >> as well as?
> > 
> > I had few words here.
> > The current kexec-tools assumes that "reserved" entries appear only
> > at the top level. So,
> > 
> >> Does this mean kexec will pick up the reserved region if its written as:
> >> | 1000-0009d7ff : System RAM
> >> |1000-1fff  : reserved
> > 
> > if this is the case, the range "0x1000-0x1fff" is added to an internal
> > list of memory ranges 
> 
> I found this in get_memory_ranges_iomem_cb()...
> 
> 
> > but will later be *ignored* by locate_hole() function
> > due to its memory type.
> 
> Ugh. Great.
> 
> 
> > That is, the range can potentially be overwritten by loaded kernel/initrd.
> 
> So two kernel bugs, one user-space bug, all conspiring.
> 
> 
>  either because
>  a. new kernel (or initrd/dtb) may have been allocated on a NOMAP region
> which are not suitable for usable memory, or
>  b. new kernel (or initrd/dtb) may have been allocated on a reserved 
>  region
> whose contents can be overwritten.
> 
>  While we see (b) even today, (a) is a backward compatibility issue.
> >>
> >> (a) doesn't happen because request_standard_resources() checks
> >> memblock_is_nomap(), and reports those regions as 'reserved'.
> > 
> > I might have confused you. The assumption here was that we adopt format (D),
> > where all NOMAP regions are sub nodes of "System RAM", but still use
> > the current kexec-tools.
> > As I said above, this will end up an un-expected behavior.
> 
> I'd like to fix this without having to fix user-space at the same time. It 
> looks
> like no-one else has second level reserved regions,

This was my assumption when I sent out a patch to kexec-tools.

> so we can't blame
> kexec-tools for looking straight at them, then ignoring them.
> 
> 
> >> We can't expect user-space to upgrade to fix this issue.
> > 
> > I'm not sure what you mean here; we can't fix the issue anyway
> > without changing user-space/kexec-tools as kexec_load system call totally
> > relies on parameters passed by kexec-tools.
> > (The only difference is whether we need additional kernel changes or not.)
> 
> It looks like this was always broken because the efi memory map isn't listed 
> as
> 'reserved' in /proc/iomem. The fallout for the new stuff is secondary.
> 
> 
> >>> # I don't know yet whether people are happy with this fix, and also have
> >>>   kernel patches for my other approaches. They are neither not much
> >>>   complicated.
> >>
> >> I don't think we should fix this in userspace, exporting all the
> >> memblock_reserved() regions as 'reserved' in /proc/iomem looks like the 
> >> right
> >> thing to do.
> > 
> > Again, if you modify /proc/iomem, you have to update kexec-tools, too.
> 
> If we squash the memblock_reserved() stuff down so it appears as a top level
> 'reserved' region too, I don't think we do.

If I correctly understand, you're talking about my format (E).
As I said, it will fix the issue without modifying user-space, but

|| This does not only look quite noisy but also ignores the fact that
|| reserved regions are part of System RAM (or memblock.memory).

> This prevents the efi-memory-map
> being overwritten on kernels since kexec was merged.
>
> Its horribly fiddly to do this. The kernel code/data are special reserved
> regions that we already describe as a subset of system-ram, even though they 
> are
> both also fragments of a bigger memblock_reserved() block.

Actually, we don't have to avoid kernel code/data regions as copying
loaded data to the final destinations will be done at the very end of kexec.

> While we can walk memblock for regions that aren't reserved, allocating memory
> in the loop changes what is reserved. That one O(N) walk ends up being four...

At most O(n^2)?

Thanks,
-Takhairo AKASHI

> I'm almost done tearing my hair out, I should have a working patch soon...
> 
> 
> >> wasn't there going to be another version, with the core EFI
> >> stuff split out?
> > 
> > ? I don't remember well ...
> 
> https://lkml.org/lkml/2018/2/1/496
> 
> 
> Thanks,
> 
> James
> 

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec