On Wed, Jul 08, 2020 at 09:16:01AM +0200, David Hildenbrand wrote:
> On 08.07.20 09:04, Dan Williams wrote:
> > On Tue, Jul 7, 2020 at 11:59 PM David Hildenbrand <da...@redhat.com> wrote:
> >>
> >> On 08.07.20 08:22, Mike Rapoport wrote:
> >>> On Tue, Jul 07, 2020 at 09:27:43PM -0700, Dan Williams wrote:
> >>>> On Tue, Jul 7, 2020 at 9:08 PM Justin He <justin...@arm.com> wrote:
> >>>> [..]
> >>>>>> Especially for architectures that use memblock info for numa info
> >>>>>> (which seems to be everyone except x86) why not implement a generic
> >>>>>> memory_add_physaddr_to_nid() that does:
> >>>>>>
> >>>>>> int memory_add_physaddr_to_nid(u64 addr)
> >>>>>> {
> >>>>>>         unsigned long start_pfn, end_pfn, pfn = PHYS_PFN(addr);
> >>>>>>         int nid;
> >>>>>>
> >>>>>>         for_each_online_node(nid) {
> >>>>>>                 get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
> >>>>>>                 if (pfn >= start_pfn && pfn <= end_pfn)
> >>>>>>                         return nid;
> >>>>>>         }
> >>>>>>         return NUMA_NO_NODE;
> >>>>>> }
> >>>>>
> >>>>> Thanks for your suggestion,
> >>>>> Could I wrap the codes and let memory_add_physaddr_to_nid simply invoke
> >>>>> phys_to_target_node()?
> >>>>
> >>>> I think it needs to be the reverse. phys_to_target_node() should call
> >>>> memory_add_physaddr_to_nid() by default, but fall back to searching
> >>>> reserved memory address ranges in memblock. See phys_to_target_node()
> >>>> in arch/x86/mm/numa.c. That one uses numa_meminfo instead of memblock,
> >>>> but the principle is the same i.e. that a target node may not be
> >>>> represented in memblock.memory, but memblock.reserved. I'm working on
> >>>> a patch to provide a function similar to get_pfn_range_for_nid() that
> >>>> operates on reserved memory.
> >>>
> >>> Do we really need yet another memblock iterator?
> >>> I think only x86 has memory that is not in memblock.memory but only in
> >>> memblock.reserved.
> >>
> >> Reading about abusing the memblock allocator once again in memory
> >> hotplug paths makes me shiver.
> >
> > Technical reasoning please?
>
> ARCH_KEEP_MEMBLOCK is (AFAIK) only a hack for arm64 to implement
> pfn_valid(), because they zap out individual pages corresponding to
> memory holes of full sections.
>
> I am not a friend of adding more post-init code to rely on memblock
> data. It just makes it harder to eventually get rid of ARCH_KEEP_MEMBLOCK.

The heaviest user of memblock in post-init code is powerpc. It won't be
easy to get rid of it there.

> > arm64 numa information is established from memblock data. It seems
> > counterproductive to ignore that fact if we're already touching
> > memory_add_physaddr_to_nid() and have a use case for a driver to call
> > it.
>
> ... and we are trying to handle the "only a single dummy node" case
> (patch #2), or what am I missing? What is there to optimize currently?
>
> --
> Thanks,
>
> David / dhildenb
>

--
Sincerely yours,
Mike.
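
A minimal userspace sketch of the layering Dan describes in the quoted text: phys_to_target_node() tries memory_add_physaddr_to_nid() first and only then falls back to searching reserved ranges. This is a toy model, not kernel code. The model_* names, struct pfn_range, the two tables, and the 4 KiB page size are all assumptions made for illustration; real code would walk memblock (or numa_meminfo on x86) rather than static arrays. The sketch also assumes each range is half-open, i.e. end_pfn is one past the last page.

```c
#include <stddef.h>
#include <stdint.h>

#define NUMA_NO_NODE     (-1)
#define MODEL_PAGE_SHIFT 12   /* assumption: 4 KiB pages */

/* Hypothetical stand-in for a memblock region: [start_pfn, end_pfn) on node nid. */
struct pfn_range {
        uint64_t start_pfn;
        uint64_t end_pfn;     /* exclusive in this model */
        int nid;
};

/* Toy "memblock.memory": two online nodes. Values are made up. */
static const struct pfn_range model_memory[] = {
        { 0x00000, 0x40000, 0 },
        { 0x40000, 0x80000, 1 },
};

/* Toy "memblock.reserved": a target node with no entry in model_memory. */
static const struct pfn_range model_reserved[] = {
        { 0x100000, 0x140000, 2 },
};

/* Linear search of one table; returns NUMA_NO_NODE when pfn is not covered. */
static int model_lookup_nid(const struct pfn_range *tbl, size_t n, uint64_t pfn)
{
        for (size_t i = 0; i < n; i++) {
                if (pfn >= tbl[i].start_pfn && pfn < tbl[i].end_pfn)
                        return tbl[i].nid;
        }
        return NUMA_NO_NODE;
}

/* Models the generic memory_add_physaddr_to_nid(): memory ranges only. */
static int model_memory_add_physaddr_to_nid(uint64_t addr)
{
        return model_lookup_nid(model_memory,
                                sizeof(model_memory) / sizeof(model_memory[0]),
                                addr >> MODEL_PAGE_SHIFT);
}

/* Models phys_to_target_node(): memory ranges first, reserved as fallback. */
static int model_phys_to_target_node(uint64_t addr)
{
        int nid = model_memory_add_physaddr_to_nid(addr);

        if (nid != NUMA_NO_NODE)
                return nid;

        return model_lookup_nid(model_reserved,
                                sizeof(model_reserved) / sizeof(model_reserved[0]),
                                addr >> MODEL_PAGE_SHIFT);
}
```

In this model an address inside the reserved-only range resolves through model_phys_to_target_node() but not through model_memory_add_physaddr_to_nid(), which is the asymmetry the "reverse" ordering is meant to preserve.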