Re: [RFC 1/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory

2021-01-27 Thread Anshuman Khandual


On 1/27/21 2:59 PM, David Hildenbrand wrote:
> On 27.01.21 05:06, Anshuman Khandual wrote:
>>
>>
>> On 1/25/21 2:43 PM, David Hildenbrand wrote:
>>> On 25.01.21 07:22, Anshuman Khandual wrote:

 On 12/22/20 12:42 PM, Anshuman Khandual wrote:
> pfn_valid() asserts that there is a memblock entry for a given pfn without
> MEMBLOCK_NOMAP flag being set. The problem with ZONE_DEVICE based memory 
> is
> that they do not have memblock entries. Hence memblock_is_map_memory() 
> will
> invariably fail via memblock_search() for a ZONE_DEVICE based address. 
> This
> eventually fails pfn_valid() which is wrong. memblock_is_map_memory() 
> needs
> to be skipped for such memory ranges. As ZONE_DEVICE memory gets 
> hotplugged
> into the system via memremap_pages() called from a driver, their 
> respective
> memory sections will not have SECTION_IS_EARLY set.
>
> Normal hotplug memory will never have MEMBLOCK_NOMAP set in their memblock
> regions. Because the flag MEMBLOCK_NOMAP was specifically designed and set
> for firmware reserved memory regions. memblock_is_map_memory() can just be
> skipped as its always going to be positive and that will be an 
> optimization
> for the normal hotplug memory. Like ZONE_DEVIE based memory, all 
> hotplugged
> normal memory too will not have SECTION_IS_EARLY set for their sections.
>
> Skipping memblock_is_map_memory() for all non early memory sections would
> fix pfn_valid() problem for ZONE_DEVICE based memory and also improve its
> performance for normal hotplug memory as well.
>
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Cc: Ard Biesheuvel 
> Cc: Robin Murphy 
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> Fixes: 73b20c84d42d ("arm64: mm: implement pte_devmap support")
> Signed-off-by: Anshuman Khandual 

 Hello David/Mike,

 Given that we would need to rework early sections, memblock semantics via a
 new config i.e EARLY_SECTION_MEMMAP_HOLES and also some possible changes to
 ARCH_KEEP_MEMBLOCK and HAVE_ARCH_PFN_VALID, wondering if these patches here
 which fixes a problem (and improves performance) can be merged first. After
 that, I could start working on the proposed rework. Could you please let me
 know your thoughts on this. Thank you.
>>>
>>> As I said, we might have to throw in an pfn_section_valid() check, to
>>> catch not-section-aligned ZONE_DEVICE ranges (I assume this is possible
>>> on arm64 as well, no?).
>>
>> pfn_section_valid() should be called only for !early_section() i.e normal
>> hotplug and ZONE_DEVICE memory ? Because early boot memory should always
>> be section aligned.
> 
> Well, at least not on x86-64 you can have early sections intersect with 
> ZONE_DEVICE memory.
> 
> E.g., have 64MB boot memory in a section. Later, we add ZONE_DEVICE memory 
> which might cover the remaining 64MB. For pfn_valid() on x86-64, we always 
> return "true" for such sections, because we always have the memmap for the 
> whole early section allocated during boot. So, there it's "simple".

This is the generic pfn_valid() used on X86. As you mentioned this
does not test pfn_section_valid() if the section is early assuming
that vmemmap coverage is complete.

#ifndef CONFIG_HAVE_ARCH_PFN_VALID
static inline int pfn_valid(unsigned long pfn)
{
struct mem_section *ms;

if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
return 0;
ms = __nr_to_section(pfn_to_section_nr(pfn));
if (!valid_section(ms))
return 0;
/*
 * Traditionally early sections always returned pfn_valid() for
 * the entire section-sized span.
 */
return early_section(ms) || pfn_section_valid(ms, pfn);
}
#endif

Looking at the code, seems like early sections get initialized via
sparse_init() only in section granularity but then please correct
me otherwise.

> 
> Now, arm64 seems to discard some parts of the vmemmap, so the remaining 64MB 
> in such an early section might not have a memmap anymore? TBH, I don't know.

Did not get that. Could you please be more specific on how arm64 discards
parts of the vmemmap.

> 
> Most probably only performing the check for
> !early_section() is sufficient on arm64, but I really can't tell as I don't 
> know what we're actually discarding and if something as described for x86-64 
> is even possible on arm64.

Seems like direct users for arch_add_memory() and __add_pages() like
pagemap_range() can cause subsection hotplug and vmemmap mapping. So
pfn_section_valid() should be applicable only for !early_sections().

Although a simple test on arm64 shows that both boot memory and
traditional memory hotplug gets entire subsection_map populated. But
that might not be always true for ZONE_DEVICE memory.

> 
> We should really try to take the magic out of arm64 

Re: [RFC 1/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory

2021-01-27 Thread David Hildenbrand

On 27.01.21 05:06, Anshuman Khandual wrote:



On 1/25/21 2:43 PM, David Hildenbrand wrote:

On 25.01.21 07:22, Anshuman Khandual wrote:


On 12/22/20 12:42 PM, Anshuman Khandual wrote:

pfn_valid() asserts that there is a memblock entry for a given pfn without
MEMBLOCK_NOMAP flag being set. The problem with ZONE_DEVICE based memory is
that they do not have memblock entries. Hence memblock_is_map_memory() will
invariably fail via memblock_search() for a ZONE_DEVICE based address. This
eventually fails pfn_valid() which is wrong. memblock_is_map_memory() needs
to be skipped for such memory ranges. As ZONE_DEVICE memory gets hotplugged
into the system via memremap_pages() called from a driver, their respective
memory sections will not have SECTION_IS_EARLY set.

Normal hotplug memory will never have MEMBLOCK_NOMAP set in their memblock
regions. Because the flag MEMBLOCK_NOMAP was specifically designed and set
for firmware reserved memory regions. memblock_is_map_memory() can just be
skipped as its always going to be positive and that will be an optimization
for the normal hotplug memory. Like ZONE_DEVIE based memory, all hotplugged
normal memory too will not have SECTION_IS_EARLY set for their sections.

Skipping memblock_is_map_memory() for all non early memory sections would
fix pfn_valid() problem for ZONE_DEVICE based memory and also improve its
performance for normal hotplug memory as well.

Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Ard Biesheuvel 
Cc: Robin Murphy 
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Fixes: 73b20c84d42d ("arm64: mm: implement pte_devmap support")
Signed-off-by: Anshuman Khandual 


Hello David/Mike,

Given that we would need to rework early sections, memblock semantics via a
new config i.e EARLY_SECTION_MEMMAP_HOLES and also some possible changes to
ARCH_KEEP_MEMBLOCK and HAVE_ARCH_PFN_VALID, wondering if these patches here
which fixes a problem (and improves performance) can be merged first. After
that, I could start working on the proposed rework. Could you please let me
know your thoughts on this. Thank you.


As I said, we might have to throw in an pfn_section_valid() check, to
catch not-section-aligned ZONE_DEVICE ranges (I assume this is possible
on arm64 as well, no?).


pfn_section_valid() should be called only for !early_section() i.e normal
hotplug and ZONE_DEVICE memory ? Because early boot memory should always
be section aligned.


Well, at least not on x86-64 you can have early sections intersect with 
ZONE_DEVICE memory.


E.g., have 64MB boot memory in a section. Later, we add ZONE_DEVICE 
memory which might cover the remaining 64MB. For pfn_valid() on x86-64, 
we always return "true" for such sections, because we always have the 
memmap for the whole early section allocated during boot. So, there it's 
"simple".


Now, arm64 seems to discard some parts of the vmemmap, so the remaining 
64MB in such an early section might not have a memmap anymore? TBH, I 
don't know.


Most probably only performing the check for
!early_section() is sufficient on arm64, but I really can't tell as I 
don't know what we're actually discarding and if something as described 
for x86-64 is even possible on arm64.


We should really try to take the magic out of arm64 vmemmap handling.





Apart from that, I'm fine with a simple fix upfront, that can be more
easily backported if needed. (Q: do we? is this stable material?)



Right, an upfront fix here would help in backporting. AFAICS it should be
backported to the stable as pte_devmap and ZONE_DEVICE have been around
for some time now. Do you have a particular stable version which needs to
be tagged in the patch ?


I haven't looked yet TBH. I guess it is broken since ZONE_DEVICE was 
enabled on arm64?


--
Thanks,

David / dhildenb



Re: [RFC 1/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory

2021-01-26 Thread Anshuman Khandual



On 1/25/21 1:01 PM, Mike Rapoport wrote:
> On Mon, Jan 25, 2021 at 11:52:32AM +0530, Anshuman Khandual wrote:
>>
>> On 12/22/20 12:42 PM, Anshuman Khandual wrote:
>>> pfn_valid() asserts that there is a memblock entry for a given pfn without
>>> MEMBLOCK_NOMAP flag being set. The problem with ZONE_DEVICE based memory is
>>> that they do not have memblock entries. Hence memblock_is_map_memory() will
>>> invariably fail via memblock_search() for a ZONE_DEVICE based address. This
>>> eventually fails pfn_valid() which is wrong. memblock_is_map_memory() needs
>>> to be skipped for such memory ranges. As ZONE_DEVICE memory gets hotplugged
>>> into the system via memremap_pages() called from a driver, their respective
>>> memory sections will not have SECTION_IS_EARLY set.
>>>
>>> Normal hotplug memory will never have MEMBLOCK_NOMAP set in their memblock
>>> regions. Because the flag MEMBLOCK_NOMAP was specifically designed and set
>>> for firmware reserved memory regions. memblock_is_map_memory() can just be
>>> skipped as its always going to be positive and that will be an optimization
>>> for the normal hotplug memory. Like ZONE_DEVIE based memory, all hotplugged
>>> normal memory too will not have SECTION_IS_EARLY set for their sections.
>>>
>>> Skipping memblock_is_map_memory() for all non early memory sections would
>>> fix pfn_valid() problem for ZONE_DEVICE based memory and also improve its
>>> performance for normal hotplug memory as well.
>>>
>>> Cc: Catalin Marinas 
>>> Cc: Will Deacon 
>>> Cc: Ard Biesheuvel 
>>> Cc: Robin Murphy 
>>> Cc: linux-arm-ker...@lists.infradead.org
>>> Cc: linux-kernel@vger.kernel.org
>>> Fixes: 73b20c84d42d ("arm64: mm: implement pte_devmap support")
>>> Signed-off-by: Anshuman Khandual 
>>
>> Hello David/Mike,
>>
>> Given that we would need to rework early sections, memblock semantics via a
>> new config i.e EARLY_SECTION_MEMMAP_HOLES and also some possible changes to
>> ARCH_KEEP_MEMBLOCK and HAVE_ARCH_PFN_VALID, wondering if these patches here
>> which fixes a problem (and improves performance) can be merged first. After
>> that, I could start working on the proposed rework. Could you please let me
>> know your thoughts on this. Thank you.
> 
> I didn't object to these patches, I think they are fine.
> I agree that we can look into update of arm64's pfn_valid(), maybe right
> after decrease of section size lands in.

Sure, will drop the RFC tag and prepare these patches.


Re: [RFC 1/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory

2021-01-26 Thread Anshuman Khandual



On 1/25/21 2:43 PM, David Hildenbrand wrote:
> On 25.01.21 07:22, Anshuman Khandual wrote:
>>
>> On 12/22/20 12:42 PM, Anshuman Khandual wrote:
>>> pfn_valid() asserts that there is a memblock entry for a given pfn without
>>> MEMBLOCK_NOMAP flag being set. The problem with ZONE_DEVICE based memory is
>>> that they do not have memblock entries. Hence memblock_is_map_memory() will
>>> invariably fail via memblock_search() for a ZONE_DEVICE based address. This
>>> eventually fails pfn_valid() which is wrong. memblock_is_map_memory() needs
>>> to be skipped for such memory ranges. As ZONE_DEVICE memory gets hotplugged
>>> into the system via memremap_pages() called from a driver, their respective
>>> memory sections will not have SECTION_IS_EARLY set.
>>>
>>> Normal hotplug memory will never have MEMBLOCK_NOMAP set in their memblock
>>> regions. Because the flag MEMBLOCK_NOMAP was specifically designed and set
>>> for firmware reserved memory regions. memblock_is_map_memory() can just be
>>> skipped as its always going to be positive and that will be an optimization
>>> for the normal hotplug memory. Like ZONE_DEVIE based memory, all hotplugged
>>> normal memory too will not have SECTION_IS_EARLY set for their sections.
>>>
>>> Skipping memblock_is_map_memory() for all non early memory sections would
>>> fix pfn_valid() problem for ZONE_DEVICE based memory and also improve its
>>> performance for normal hotplug memory as well.
>>>
>>> Cc: Catalin Marinas 
>>> Cc: Will Deacon 
>>> Cc: Ard Biesheuvel 
>>> Cc: Robin Murphy 
>>> Cc: linux-arm-ker...@lists.infradead.org
>>> Cc: linux-kernel@vger.kernel.org
>>> Fixes: 73b20c84d42d ("arm64: mm: implement pte_devmap support")
>>> Signed-off-by: Anshuman Khandual 
>>
>> Hello David/Mike,
>>
>> Given that we would need to rework early sections, memblock semantics via a
>> new config i.e EARLY_SECTION_MEMMAP_HOLES and also some possible changes to
>> ARCH_KEEP_MEMBLOCK and HAVE_ARCH_PFN_VALID, wondering if these patches here
>> which fixes a problem (and improves performance) can be merged first. After
>> that, I could start working on the proposed rework. Could you please let me
>> know your thoughts on this. Thank you.
> 
> As I said, we might have to throw in an pfn_section_valid() check, to
> catch not-section-aligned ZONE_DEVICE ranges (I assume this is possible
> on arm64 as well, no?).

pfn_section_valid() should be called only for !early_section() i.e normal
hotplug and ZONE_DEVICE memory ? Because early boot memory should always
be section aligned.

> 
> Apart from that, I'm fine with a simple fix upfront, that can be more
> easily backported if needed. (Q: do we? is this stable material?)
> 

Right, an upfront fix here would help in backporting. AFAICS it should be
backported to the stable as pte_devmap and ZONE_DEVICE have been around
for some time now. Do you have a particular stable version which needs to
be tagged in the patch ?


Re: [RFC 1/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory

2021-01-25 Thread David Hildenbrand
On 25.01.21 07:22, Anshuman Khandual wrote:
> 
> On 12/22/20 12:42 PM, Anshuman Khandual wrote:
>> pfn_valid() asserts that there is a memblock entry for a given pfn without
>> MEMBLOCK_NOMAP flag being set. The problem with ZONE_DEVICE based memory is
>> that they do not have memblock entries. Hence memblock_is_map_memory() will
>> invariably fail via memblock_search() for a ZONE_DEVICE based address. This
>> eventually fails pfn_valid() which is wrong. memblock_is_map_memory() needs
>> to be skipped for such memory ranges. As ZONE_DEVICE memory gets hotplugged
>> into the system via memremap_pages() called from a driver, their respective
>> memory sections will not have SECTION_IS_EARLY set.
>>
>> Normal hotplug memory will never have MEMBLOCK_NOMAP set in their memblock
>> regions. Because the flag MEMBLOCK_NOMAP was specifically designed and set
>> for firmware reserved memory regions. memblock_is_map_memory() can just be
>> skipped as its always going to be positive and that will be an optimization
>> for the normal hotplug memory. Like ZONE_DEVIE based memory, all hotplugged
>> normal memory too will not have SECTION_IS_EARLY set for their sections.
>>
>> Skipping memblock_is_map_memory() for all non early memory sections would
>> fix pfn_valid() problem for ZONE_DEVICE based memory and also improve its
>> performance for normal hotplug memory as well.
>>
>> Cc: Catalin Marinas 
>> Cc: Will Deacon 
>> Cc: Ard Biesheuvel 
>> Cc: Robin Murphy 
>> Cc: linux-arm-ker...@lists.infradead.org
>> Cc: linux-kernel@vger.kernel.org
>> Fixes: 73b20c84d42d ("arm64: mm: implement pte_devmap support")
>> Signed-off-by: Anshuman Khandual 
> 
> Hello David/Mike,
> 
> Given that we would need to rework early sections, memblock semantics via a
> new config i.e EARLY_SECTION_MEMMAP_HOLES and also some possible changes to
> ARCH_KEEP_MEMBLOCK and HAVE_ARCH_PFN_VALID, wondering if these patches here
> which fixes a problem (and improves performance) can be merged first. After
> that, I could start working on the proposed rework. Could you please let me
> know your thoughts on this. Thank you.

As I said, we might have to throw in an pfn_section_valid() check, to
catch not-section-aligned ZONE_DEVICE ranges (I assume this is possible
on arm64 as well, no?).

Apart from that, I'm fine with a simple fix upfront, that can be more
easily backported if needed. (Q: do we? is this stable material?)

-- 
Thanks,

David / dhildenb



Re: [RFC 1/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory

2021-01-24 Thread Mike Rapoport
On Mon, Jan 25, 2021 at 11:52:32AM +0530, Anshuman Khandual wrote:
> 
> On 12/22/20 12:42 PM, Anshuman Khandual wrote:
> > pfn_valid() asserts that there is a memblock entry for a given pfn without
> > MEMBLOCK_NOMAP flag being set. The problem with ZONE_DEVICE based memory is
> > that they do not have memblock entries. Hence memblock_is_map_memory() will
> > invariably fail via memblock_search() for a ZONE_DEVICE based address. This
> > eventually fails pfn_valid() which is wrong. memblock_is_map_memory() needs
> > to be skipped for such memory ranges. As ZONE_DEVICE memory gets hotplugged
> > into the system via memremap_pages() called from a driver, their respective
> > memory sections will not have SECTION_IS_EARLY set.
> > 
> > Normal hotplug memory will never have MEMBLOCK_NOMAP set in their memblock
> > regions. Because the flag MEMBLOCK_NOMAP was specifically designed and set
> > for firmware reserved memory regions. memblock_is_map_memory() can just be
> > skipped as its always going to be positive and that will be an optimization
> > for the normal hotplug memory. Like ZONE_DEVIE based memory, all hotplugged
> > normal memory too will not have SECTION_IS_EARLY set for their sections.
> > 
> > Skipping memblock_is_map_memory() for all non early memory sections would
> > fix pfn_valid() problem for ZONE_DEVICE based memory and also improve its
> > performance for normal hotplug memory as well.
> > 
> > Cc: Catalin Marinas 
> > Cc: Will Deacon 
> > Cc: Ard Biesheuvel 
> > Cc: Robin Murphy 
> > Cc: linux-arm-ker...@lists.infradead.org
> > Cc: linux-kernel@vger.kernel.org
> > Fixes: 73b20c84d42d ("arm64: mm: implement pte_devmap support")
> > Signed-off-by: Anshuman Khandual 
> 
> Hello David/Mike,
> 
> Given that we would need to rework early sections, memblock semantics via a
> new config i.e EARLY_SECTION_MEMMAP_HOLES and also some possible changes to
> ARCH_KEEP_MEMBLOCK and HAVE_ARCH_PFN_VALID, wondering if these patches here
> which fixes a problem (and improves performance) can be merged first. After
> that, I could start working on the proposed rework. Could you please let me
> know your thoughts on this. Thank you.

I didn't object to these patches, I think they are fine.
I agree that we can look into update of arm64's pfn_valid(), maybe right
after decrease of section size lands in.
 
> - Anshuman

-- 
Sincerely yours,
Mike.


Re: [RFC 1/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory

2021-01-24 Thread Anshuman Khandual


On 12/22/20 12:42 PM, Anshuman Khandual wrote:
> pfn_valid() asserts that there is a memblock entry for a given pfn without
> MEMBLOCK_NOMAP flag being set. The problem with ZONE_DEVICE based memory is
> that they do not have memblock entries. Hence memblock_is_map_memory() will
> invariably fail via memblock_search() for a ZONE_DEVICE based address. This
> eventually fails pfn_valid() which is wrong. memblock_is_map_memory() needs
> to be skipped for such memory ranges. As ZONE_DEVICE memory gets hotplugged
> into the system via memremap_pages() called from a driver, their respective
> memory sections will not have SECTION_IS_EARLY set.
> 
> Normal hotplug memory will never have MEMBLOCK_NOMAP set in their memblock
> regions. Because the flag MEMBLOCK_NOMAP was specifically designed and set
> for firmware reserved memory regions. memblock_is_map_memory() can just be
> skipped as its always going to be positive and that will be an optimization
> for the normal hotplug memory. Like ZONE_DEVIE based memory, all hotplugged
> normal memory too will not have SECTION_IS_EARLY set for their sections.
> 
> Skipping memblock_is_map_memory() for all non early memory sections would
> fix pfn_valid() problem for ZONE_DEVICE based memory and also improve its
> performance for normal hotplug memory as well.
> 
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Cc: Ard Biesheuvel 
> Cc: Robin Murphy 
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> Fixes: 73b20c84d42d ("arm64: mm: implement pte_devmap support")
> Signed-off-by: Anshuman Khandual 

Hello David/Mike,

Given that we would need to rework early sections, memblock semantics via a
new config i.e EARLY_SECTION_MEMMAP_HOLES and also some possible changes to
ARCH_KEEP_MEMBLOCK and HAVE_ARCH_PFN_VALID, wondering if these patches here
which fixes a problem (and improves performance) can be merged first. After
that, I could start working on the proposed rework. Could you please let me
know your thoughts on this. Thank you.

- Anshuman


Re: [RFC 1/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory

2021-01-11 Thread Mike Rapoport
On Mon, Jan 11, 2021 at 11:31:02AM +0100, David Hildenbrand wrote:
> On 04.01.21 07:18, Anshuman Khandual wrote:

...

> >> Actually, I think we want to check for partial present sections.
> >>
> >> Maybe we can rather switch to generic pfn_valid() and tweak it to
> >> something like
> >>
> >> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> >> index fb3bf696c05e..7b1fcce5bd5a 100644
> >> --- a/include/linux/mmzone.h
> >> +++ b/include/linux/mmzone.h
> >> @@ -1382,9 +1382,13 @@ static inline int pfn_valid(unsigned long pfn)
> >> return 0;
> >> /*
> >>  * Traditionally early sections always returned pfn_valid() for
> >> -* the entire section-sized span.
> >> +* the entire section-sized span. Some archs might have holes in
> >> +* early sections, so double check with memblock if configured.
> >>  */
> >> -   return early_section(ms) || pfn_section_valid(ms, pfn);
> >> +   if (early_section(ms))
> >> +   return IS_ENABLED(CONFIG_EARLY_SECTION_MEMMAP_HOLES) ?
> >> +  memblock_is_map_memory(pfn << PAGE_SHIFT) : 1;
> >> +   return pfn_section_valid(ms, pfn);
> >>  }
> >>  #endif
> > 
> > Could not find CONFIG_EARLY_SECTION_MEMMAP_HOLES. Are you suggesting to
> > create this config which could track platform scenarios where all early
> > sections might not have mmap coverage such as arm64 ?
> 
> Yes, a new config that states what's actually happening.

Just to clarify, there are several architectures that may free parts of
memmap with FLATMEM/SPARSEMEM and this functionality is gated by
CONFIG_HAVE_ARCH_PFN_VALID.

So if arm64 is to use a generic version, this should be also taken into
account.

> >> Which users are remaining that require us to add/remove memblocks when
> >> hot(un)plugging memory
> >>
> >>  $ git grep KEEP_MEM | grep memory_hotplug
> >> mm/memory_hotplug.c:if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK))
> >> mm/memory_hotplug.c:if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK))
> >> mm/memory_hotplug.c:if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK)) {
> > 
> > Did not follow, do we want to drop ARCH_KEEP_MEMBLOCK ? Without it arm64
> > will not be able to track MEMBLOCK_NOMAP memory at runtime.
> 
> I'd only like the hot(un)plug parts gone for now, if possible: I don't
> see the need for that handling really that cannot be handled easier, as
> in the proposed pfn_valid() changes.
> 
> I understand that current handling of memory holes in early sections and
> memory marked as MEMBLOCK_NOMAP requires ARCH_KEEP_MEMBLOCK for now.
 
This is mostly about the holes in the memory map. I believe it does not
matter if that memory is NOMAP or not, the holes in the memmap are punched
for anything in memblock.memory.

It seems to me that for arm64's pfn_valid() we can safely replace
memblock_is_map_memory() with memblock_is_memory(), not that it would
change much.

> >> I think one user we would have to handle is
> >> arch/arm64/mm/mmap.c:valid_phys_addr_range(). AFAIS, powerpc at least
> >> does not rely on memblock_is_map_memory.
> > 
> > memblock_is_map_memory() is currently used only on arm/arm64 platforms.
> > Apart from the above example in valid_phys_addr_range(), there are some
> > other memblock_is_map_memory() call sites as well. But then, we are not
> > trying to completely drop memblock_is_map_memory() or are we ?
> 
> No, just change the semantics: only relevant for early sections. Imagine
> freezing MEMBLOCK state after boot.
> 
> Only early sections can have memory holes and might be marked
> MEMBLOCK_NOMAP. For hotplugged memory, we don't have to call
> memblock_is_map_memory().
>
> -- 
> Thanks,
> 
> David / dhildenb

-- 
Sincerely yours,
Mike.


Re: [RFC 1/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory

2021-01-11 Thread David Hildenbrand
On 04.01.21 07:18, Anshuman Khandual wrote:
> 
> On 12/22/20 2:41 PM, David Hildenbrand wrote:
>> On 22.12.20 08:12, Anshuman Khandual wrote:
>>> pfn_valid() validates a pfn but basically it checks for a valid struct page
>>> backing for that pfn. It should always return positive for memory ranges
>>> backed with struct page mapping. But currently pfn_valid() fails for all
>>> ZONE_DEVICE based memory types even though they have struct page mapping.
>>>
>>> pfn_valid() asserts that there is a memblock entry for a given pfn without
>>> MEMBLOCK_NOMAP flag being set. The problem with ZONE_DEVICE based memory is
>>> that they do not have memblock entries. Hence memblock_is_map_memory() will
>>> invariably fail via memblock_search() for a ZONE_DEVICE based address. This
>>> eventually fails pfn_valid() which is wrong. memblock_is_map_memory() needs
>>> to be skipped for such memory ranges. As ZONE_DEVICE memory gets hotplugged
>>> into the system via memremap_pages() called from a driver, their respective
>>> memory sections will not have SECTION_IS_EARLY set.
>>>
>>> Normal hotplug memory will never have MEMBLOCK_NOMAP set in their memblock
>>> regions. Because the flag MEMBLOCK_NOMAP was specifically designed and set
>>> for firmware reserved memory regions. memblock_is_map_memory() can just be
>>> skipped as its always going to be positive and that will be an optimization
>>> for the normal hotplug memory. Like ZONE_DEVIE based memory, all hotplugged
>>> normal memory too will not have SECTION_IS_EARLY set for their sections.
>>>
>>> Skipping memblock_is_map_memory() for all non early memory sections would
>>> fix pfn_valid() problem for ZONE_DEVICE based memory and also improve its
>>> performance for normal hotplug memory as well.
>>>
>>> Cc: Catalin Marinas 
>>> Cc: Will Deacon 
>>> Cc: Ard Biesheuvel 
>>> Cc: Robin Murphy 
>>> Cc: linux-arm-ker...@lists.infradead.org
>>> Cc: linux-kernel@vger.kernel.org
>>> Fixes: 73b20c84d42d ("arm64: mm: implement pte_devmap support")
>>> Signed-off-by: Anshuman Khandual 
>>> ---
>>>  arch/arm64/mm/init.c | 12 
>>>  1 file changed, 12 insertions(+)
>>>
>>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>>> index 75addb36354a..ee23bda00c28 100644
>>> --- a/arch/arm64/mm/init.c
>>> +++ b/arch/arm64/mm/init.c
>>> @@ -225,6 +225,18 @@ int pfn_valid(unsigned long pfn)
>>>  
>>> if (!valid_section(__pfn_to_section(pfn)))
>>> return 0;
>>> +
>>> +   /*
>>> +* ZONE_DEVICE memory does not have the memblock entries.
>>> +* memblock_is_map_memory() check for ZONE_DEVICE based
>>> +* addresses will always fail. Even the normal hotplugged
>>> +* memory will never have MEMBLOCK_NOMAP flag set in their
>>> +* memblock entries. Skip memblock search for all non early
>>> +* memory sections covering all of hotplug memory including
>>> +* both normal and ZONE_DEVIE based.
>>> +*/
>>> +   if (!early_section(__pfn_to_section(pfn)))
>>> +   return 1;
>>
>> Actually, I think we want to check for partial present sections.
>>
>> Maybe we can rather switch to generic pfn_valid() and tweak it to
>> something like
>>
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index fb3bf696c05e..7b1fcce5bd5a 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -1382,9 +1382,13 @@ static inline int pfn_valid(unsigned long pfn)
>> return 0;
>> /*
>>  * Traditionally early sections always returned pfn_valid() for
>> -* the entire section-sized span.
>> +* the entire section-sized span. Some archs might have holes in
>> +* early sections, so double check with memblock if configured.
>>  */
>> -   return early_section(ms) || pfn_section_valid(ms, pfn);
>> +   if (early_section(ms))
>> +   return IS_ENABLED(CONFIG_EARLY_SECTION_MEMMAP_HOLES) ?
>> +  memblock_is_map_memory(pfn << PAGE_SHIFT) : 1;
>> +   return pfn_section_valid(ms, pfn);
>>  }
>>  #endif
> 
> Could not find CONFIG_EARLY_SECTION_MEMMAP_HOLES. Are you suggesting to
> create this config which could track platform scenarios where all early
> sections might not have mmap coverage such as arm64 ?

Yes, a new config that states what's actually happening.

> 
>>
>>
>>
>> Which users are remaining that require us to add/remove memblocks when
>> hot(un)plugging memory
>>
>>  $ git grep KEEP_MEM | grep memory_hotplug
>> mm/memory_hotplug.c:if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK))
>> mm/memory_hotplug.c:if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK))
>> mm/memory_hotplug.c:if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK)) {
> 
> Did not follow, do we want to drop ARCH_KEEP_MEMBLOCK ? Without it arm64
> will not be able to track MEMBLOCK_NOMAP memory at runtime.

I'd only like the hot(un)plug parts gone for now, if possible: I don't
see the need for that handling really that cannot be handled easier, as
in the proposed pfn_valid() 

Re: [RFC 1/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory

2021-01-04 Thread Anshuman Khandual


On 1/4/21 9:06 PM, Jonathan Cameron wrote:
> On Tue, 22 Dec 2020 12:42:23 +0530
> Anshuman Khandual  wrote:
> 
>> pfn_valid() validates a pfn but basically it checks for a valid struct page
>> backing for that pfn. It should always return positive for memory ranges
>> backed with struct page mapping. But currently pfn_valid() fails for all
>> ZONE_DEVICE based memory types even though they have struct page mapping.
>>
>> pfn_valid() asserts that there is a memblock entry for a given pfn without
>> MEMBLOCK_NOMAP flag being set. The problem with ZONE_DEVICE based memory is
>> that they do not have memblock entries. Hence memblock_is_map_memory() will
>> invariably fail via memblock_search() for a ZONE_DEVICE based address. This
>> eventually fails pfn_valid() which is wrong. memblock_is_map_memory() needs
>> to be skipped for such memory ranges. As ZONE_DEVICE memory gets hotplugged
>> into the system via memremap_pages() called from a driver, their respective
>> memory sections will not have SECTION_IS_EARLY set.
>>
>> Normal hotplug memory will never have MEMBLOCK_NOMAP set in their memblock
>> regions. Because the flag MEMBLOCK_NOMAP was specifically designed and set
>> for firmware reserved memory regions. memblock_is_map_memory() can just be
>> skipped as its always going to be positive and that will be an optimization
>> for the normal hotplug memory. Like ZONE_DEVIE based memory, all hotplugged
> 
> typo: ZONE_DEVIE
> 
>> normal memory too will not have SECTION_IS_EARLY set for their sections.
>>
>> Skipping memblock_is_map_memory() for all non early memory sections would
>> fix pfn_valid() problem for ZONE_DEVICE based memory and also improve its
>> performance for normal hotplug memory as well.
>>
>> Cc: Catalin Marinas 
>> Cc: Will Deacon 
>> Cc: Ard Biesheuvel 
>> Cc: Robin Murphy 
>> Cc: linux-arm-ker...@lists.infradead.org
>> Cc: linux-kernel@vger.kernel.org
>> Fixes: 73b20c84d42d ("arm64: mm: implement pte_devmap support")
>> Signed-off-by: Anshuman Khandual 
>> ---
>>  arch/arm64/mm/init.c | 12 
>>  1 file changed, 12 insertions(+)
>>
>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>> index 75addb36354a..ee23bda00c28 100644
>> --- a/arch/arm64/mm/init.c
>> +++ b/arch/arm64/mm/init.c
>> @@ -225,6 +225,18 @@ int pfn_valid(unsigned long pfn)
>>  
>>  if (!valid_section(__pfn_to_section(pfn)))
>>  return 0;
>> +
>> +/*
>> + * ZONE_DEVICE memory does not have the memblock entries.
>> + * memblock_is_map_memory() check for ZONE_DEVICE based
>> + * addresses will always fail. Even the normal hotplugged
>> + * memory will never have MEMBLOCK_NOMAP flag set in their
>> + * memblock entries. Skip memblock search for all non early
>> + * memory sections covering all of hotplug memory including
>> + * both normal and ZONE_DEVIE based.
> 
> Here as well + the cover letter title.

My bad, will fix all the three instances.


Re: [RFC 1/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory

2021-01-04 Thread Jonathan Cameron
On Tue, 22 Dec 2020 12:42:23 +0530
Anshuman Khandual  wrote:

> pfn_valid() validates a pfn but basically it checks for a valid struct page
> backing for that pfn. It should always return positive for memory ranges
> backed with struct page mapping. But currently pfn_valid() fails for all
> ZONE_DEVICE based memory types even though they have struct page mapping.
> 
> pfn_valid() asserts that there is a memblock entry for a given pfn without
> MEMBLOCK_NOMAP flag being set. The problem with ZONE_DEVICE based memory is
> that they do not have memblock entries. Hence memblock_is_map_memory() will
> invariably fail via memblock_search() for a ZONE_DEVICE based address. This
> eventually fails pfn_valid() which is wrong. memblock_is_map_memory() needs
> to be skipped for such memory ranges. As ZONE_DEVICE memory gets hotplugged
> into the system via memremap_pages() called from a driver, their respective
> memory sections will not have SECTION_IS_EARLY set.
> 
> Normal hotplug memory will never have MEMBLOCK_NOMAP set in their memblock
> regions. Because the flag MEMBLOCK_NOMAP was specifically designed and set
> for firmware reserved memory regions. memblock_is_map_memory() can just be
> skipped as its always going to be positive and that will be an optimization
> for the normal hotplug memory. Like ZONE_DEVIE based memory, all hotplugged

typo: ZONE_DEVIE

> normal memory too will not have SECTION_IS_EARLY set for their sections.
> 
> Skipping memblock_is_map_memory() for all non early memory sections would
> fix pfn_valid() problem for ZONE_DEVICE based memory and also improve its
> performance for normal hotplug memory as well.
> 
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Cc: Ard Biesheuvel 
> Cc: Robin Murphy 
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> Fixes: 73b20c84d42d ("arm64: mm: implement pte_devmap support")
> Signed-off-by: Anshuman Khandual 
> ---
>  arch/arm64/mm/init.c | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 75addb36354a..ee23bda00c28 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -225,6 +225,18 @@ int pfn_valid(unsigned long pfn)
>  
>   if (!valid_section(__pfn_to_section(pfn)))
>   return 0;
> +
> + /*
> +  * ZONE_DEVICE memory does not have the memblock entries.
> +  * memblock_is_map_memory() check for ZONE_DEVICE based
> +  * addresses will always fail. Even the normal hotplugged
> +  * memory will never have MEMBLOCK_NOMAP flag set in their
> +  * memblock entries. Skip memblock search for all non early
> +  * memory sections covering all of hotplug memory including
> +  * both normal and ZONE_DEVIE based.

Here as well + the cover letter title.

> +  */
> + if (!early_section(__pfn_to_section(pfn)))
> + return 1;
>  #endif
>   return memblock_is_map_memory(addr);
>  }



Re: [RFC 1/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory

2021-01-03 Thread Anshuman Khandual


On 12/22/20 2:41 PM, David Hildenbrand wrote:
> On 22.12.20 08:12, Anshuman Khandual wrote:
>> pfn_valid() validates a pfn but basically it checks for a valid struct page
>> backing for that pfn. It should always return positive for memory ranges
>> backed with struct page mapping. But currently pfn_valid() fails for all
>> ZONE_DEVICE based memory types even though they have struct page mapping.
>>
>> pfn_valid() asserts that there is a memblock entry for a given pfn without
>> MEMBLOCK_NOMAP flag being set. The problem with ZONE_DEVICE based memory is
>> that they do not have memblock entries. Hence memblock_is_map_memory() will
>> invariably fail via memblock_search() for a ZONE_DEVICE based address. This
>> eventually fails pfn_valid() which is wrong. memblock_is_map_memory() needs
>> to be skipped for such memory ranges. As ZONE_DEVICE memory gets hotplugged
>> into the system via memremap_pages() called from a driver, their respective
>> memory sections will not have SECTION_IS_EARLY set.
>>
>> Normal hotplug memory will never have MEMBLOCK_NOMAP set in their memblock
>> regions. Because the flag MEMBLOCK_NOMAP was specifically designed and set
>> for firmware reserved memory regions. memblock_is_map_memory() can just be
>> skipped as its always going to be positive and that will be an optimization
>> for the normal hotplug memory. Like ZONE_DEVIE based memory, all hotplugged
>> normal memory too will not have SECTION_IS_EARLY set for their sections.
>>
>> Skipping memblock_is_map_memory() for all non early memory sections would
>> fix pfn_valid() problem for ZONE_DEVICE based memory and also improve its
>> performance for normal hotplug memory as well.
>>
>> Cc: Catalin Marinas 
>> Cc: Will Deacon 
>> Cc: Ard Biesheuvel 
>> Cc: Robin Murphy 
>> Cc: linux-arm-ker...@lists.infradead.org
>> Cc: linux-kernel@vger.kernel.org
>> Fixes: 73b20c84d42d ("arm64: mm: implement pte_devmap support")
>> Signed-off-by: Anshuman Khandual 
>> ---
>>  arch/arm64/mm/init.c | 12 
>>  1 file changed, 12 insertions(+)
>>
>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>> index 75addb36354a..ee23bda00c28 100644
>> --- a/arch/arm64/mm/init.c
>> +++ b/arch/arm64/mm/init.c
>> @@ -225,6 +225,18 @@ int pfn_valid(unsigned long pfn)
>>  
>>  if (!valid_section(__pfn_to_section(pfn)))
>>  return 0;
>> +
>> +/*
>> + * ZONE_DEVICE memory does not have the memblock entries.
>> + * memblock_is_map_memory() check for ZONE_DEVICE based
>> + * addresses will always fail. Even the normal hotplugged
>> + * memory will never have MEMBLOCK_NOMAP flag set in their
>> + * memblock entries. Skip memblock search for all non early
>> + * memory sections covering all of hotplug memory including
>> + * both normal and ZONE_DEVIE based.
>> + */
>> +if (!early_section(__pfn_to_section(pfn)))
>> +return 1;
> 
> Actually, I think we want to check for partial present sections.
> 
> Maybe we can rather switch to generic pfn_valid() and tweak it to
> something like
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index fb3bf696c05e..7b1fcce5bd5a 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -1382,9 +1382,13 @@ static inline int pfn_valid(unsigned long pfn)
> return 0;
> /*
>  * Traditionally early sections always returned pfn_valid() for
> -* the entire section-sized span.
> +* the entire section-sized span. Some archs might have holes in
> +* early sections, so double check with memblock if configured.
>  */
> -   return early_section(ms) || pfn_section_valid(ms, pfn);
> +   if (early_section(ms))
> +   return IS_ENABLED(CONFIG_EARLY_SECTION_MEMMAP_HOLES) ?
> +  memblock_is_map_memory(pfn << PAGE_SHIFT) : 1;
> +   return pfn_section_valid(ms, pfn);
>  }
>  #endif

Could not find CONFIG_EARLY_SECTION_MEMMAP_HOLES. Are you suggesting to
create this config which could track platform scenarios where all early
sections might not have mmap coverage such as arm64 ?

> 
> 
> 
> Which users are remaining that require us to add/remove memblocks when
> hot(un)plugging memory
> 
>  $ git grep KEEP_MEM | grep memory_hotplug
> mm/memory_hotplug.c:if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK))
> mm/memory_hotplug.c:if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK))
> mm/memory_hotplug.c:if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK)) {

Did not follow, do we want to drop ARCH_KEEP_MEMBLOCK ? Without it arm64
will not be able to track MEMBLOCK_NOMAP memory at runtime.

> 
> 
> I think one user we would have to handle is
> arch/arm64/mm/mmap.c:valid_phys_addr_range(). AFAIS, powerpc at least
> does not rely on memblock_is_map_memory.

memblock_is_map_memory() is currently used only on arm/arm64 platforms.
Apart from the above example in valid_phys_addr_range(), there are some
other memblock_is_map_memory() call sites as well. But 

Re: [RFC 1/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory

2020-12-22 Thread David Hildenbrand
On 22.12.20 08:12, Anshuman Khandual wrote:
> pfn_valid() validates a pfn but basically it checks for a valid struct page
> backing for that pfn. It should always return positive for memory ranges
> backed with struct page mapping. But currently pfn_valid() fails for all
> ZONE_DEVICE based memory types even though they have struct page mapping.
> 
> pfn_valid() asserts that there is a memblock entry for a given pfn without
> MEMBLOCK_NOMAP flag being set. The problem with ZONE_DEVICE based memory is
> that they do not have memblock entries. Hence memblock_is_map_memory() will
> invariably fail via memblock_search() for a ZONE_DEVICE based address. This
> eventually fails pfn_valid() which is wrong. memblock_is_map_memory() needs
> to be skipped for such memory ranges. As ZONE_DEVICE memory gets hotplugged
> into the system via memremap_pages() called from a driver, their respective
> memory sections will not have SECTION_IS_EARLY set.
> 
> Normal hotplug memory will never have MEMBLOCK_NOMAP set in their memblock
> regions. Because the flag MEMBLOCK_NOMAP was specifically designed and set
> for firmware reserved memory regions. memblock_is_map_memory() can just be
> skipped as its always going to be positive and that will be an optimization
> for the normal hotplug memory. Like ZONE_DEVIE based memory, all hotplugged
> normal memory too will not have SECTION_IS_EARLY set for their sections.
> 
> Skipping memblock_is_map_memory() for all non early memory sections would
> fix pfn_valid() problem for ZONE_DEVICE based memory and also improve its
> performance for normal hotplug memory as well.
> 
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Cc: Ard Biesheuvel 
> Cc: Robin Murphy 
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> Fixes: 73b20c84d42d ("arm64: mm: implement pte_devmap support")
> Signed-off-by: Anshuman Khandual 
> ---
>  arch/arm64/mm/init.c | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 75addb36354a..ee23bda00c28 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -225,6 +225,18 @@ int pfn_valid(unsigned long pfn)
>  
>   if (!valid_section(__pfn_to_section(pfn)))
>   return 0;
> +
> + /*
> +  * ZONE_DEVICE memory does not have the memblock entries.
> +  * memblock_is_map_memory() check for ZONE_DEVICE based
> +  * addresses will always fail. Even the normal hotplugged
> +  * memory will never have MEMBLOCK_NOMAP flag set in their
> +  * memblock entries. Skip memblock search for all non early
> +  * memory sections covering all of hotplug memory including
> +  * both normal and ZONE_DEVIE based.
> +  */
> + if (!early_section(__pfn_to_section(pfn)))
> + return 1;

Actually, I think we want to check for partial present sections.

Maybe we can rather switch to generic pfn_valid() and tweak it to
something like

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index fb3bf696c05e..7b1fcce5bd5a 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1382,9 +1382,13 @@ static inline int pfn_valid(unsigned long pfn)
return 0;
/*
 * Traditionally early sections always returned pfn_valid() for
-* the entire section-sized span.
+* the entire section-sized span. Some archs might have holes in
+* early sections, so double check with memblock if configured.
 */
-   return early_section(ms) || pfn_section_valid(ms, pfn);
+   if (early_section(ms))
+   return IS_ENABLED(CONFIG_EARLY_SECTION_MEMMAP_HOLES) ?
+  memblock_is_map_memory(pfn << PAGE_SHIFT) : 1;
+   return pfn_section_valid(ms, pfn);
 }
 #endif



Which users are remaining that require us to add/remove memblocks when
hot(un)plugging memory

 $ git grep KEEP_MEM | grep memory_hotplug
mm/memory_hotplug.c:if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK))
mm/memory_hotplug.c:if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK))
mm/memory_hotplug.c:if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK)) {


I think one user we would have to handle is
arch/arm64/mm/mmap.c:valid_phys_addr_range(). AFAIS, powerpc at least
does not rely on memblock_is_map_memory.

-- 
Thanks,

David / dhildenb