On Tue, Feb 02, 2021 at 01:39:29PM +0100, David Hildenbrand wrote:
> On 02.02.21 13:35, Will Deacon wrote:
> > On Tue, Feb 02, 2021 at 12:32:15PM +0000, Will Deacon wrote:
> > > On Tue, Feb 02, 2021 at 09:41:53AM +0530, Anshuman Khandual wrote:
> > > > pfn_valid() validates a pfn but basically it checks for a valid struct 
> > > > page
> > > > backing for that pfn. It should always return positive for memory ranges
> > > > backed with struct page mapping. But currently pfn_valid() fails for all
> > > > ZONE_DEVICE based memory types even though they have struct page 
> > > > mapping.
> > > > 
> > > > pfn_valid() asserts that there is a memblock entry for a given pfn 
> > > > without
> > > > MEMBLOCK_NOMAP flag being set. The problem with ZONE_DEVICE based 
> > > > memory is
> > > > that they do not have memblock entries. Hence memblock_is_map_memory() 
> > > > will
> > > > invariably fail via memblock_search() for a ZONE_DEVICE based address. 
> > > > This
> > > > eventually fails pfn_valid() which is wrong. memblock_is_map_memory() 
> > > > needs
> > > > to be skipped for such memory ranges. As ZONE_DEVICE memory gets 
> > > > hotplugged
> > > > into the system via memremap_pages() called from a driver, their 
> > > > respective
> > > > memory sections will not have SECTION_IS_EARLY set.
> > > > 
> > > > Normal hotplug memory will never have MEMBLOCK_NOMAP set in their 
> > > > memblock
> > > > regions. Because the flag MEMBLOCK_NOMAP was specifically designed and 
> > > > set
> > > > for firmware reserved memory regions. memblock_is_map_memory() can just 
> > > > be
> > > > skipped as its always going to be positive and that will be an 
> > > > optimization
> > > > for the normal hotplug memory. Like ZONE_DEVICE based memory, all normal
> > > > hotplugged memory too will not have SECTION_IS_EARLY set for their 
> > > > sections
> > > > 
> > > > Skipping memblock_is_map_memory() for all non early memory sections 
> > > > would
> > > > fix pfn_valid() problem for ZONE_DEVICE based memory and also improve 
> > > > its
> > > > performance for normal hotplug memory as well.
> > > 
> > > Hmm. Although I follow your logic, this does seem to rely on an awful lot 
> > > of
> > > assumptions to continue to hold true as the kernel evolves. In particular,
> > > how do we ensure that early sections are always fully backed with
> > 
> > Sorry, typo here:       ^^^ should be *non-early* sections.
> 
> It might be a good idea to have a look at generic
> include/linux/mmzone.h:pfn_valid()

The generic implementation already makes assumptions that aren't true on
arm64, so that's why we've ended up with our own implementation. But the
patches here put us in a position where I worry that pfn_valid() may return
'true' in future for cases where the underlying struct page is either
non-existent or bogus, and debugging those failures really sucks. We had a
raft of those back when NOMAP was introduced and I don't want to re-live
that experience.

> As I expressed already, long term we should really get rid of the arm64
> variant and rather special-case the generic one. Then we won't go out of
> sync - just as it happened with ZONE_DEVICE handling here.

Why does this have to be long term? This ZONE_DEVICE stuff could be the
carrot on the stick :)

Will

Reply via email to