On Wed, 18 Dec 2019 18:08:04 +0100 David Hildenbrand <da...@redhat.com> wrote:

> On 01.12.19 00:21, Andrew Morton wrote:
> > On Sun, 27 Oct 2019 23:45:52 +0100 David Hildenbrand <da...@redhat.com> 
> > wrote:
> > 
> >> I think I just found an issue with try_offline_node(). 
> >> try_offline_node() is pretty much broken already (touches garbage 
> >> memmaps and will not consider mixed NIDs within sections), however, 
> >> relies on the node span to look for memory sections to probe. So it 
> >> seems to rely on the nodes getting shrunk when removing memory, not when 
> >> offlining.
> >>
> >> As we shrink the node span when offlining now and not when removing, 
> >> this can go wrong once we offline the last memory block of the node and 
> >> offline the last CPU. We could still have memory around that we could 
> >> re-online, however, the node would already be offline. Unlikely, but 
> >> possible.
> >>
> >> Note that the same is also broken without this patch in case memory is 
> >> never onlined. The "pfn_to_nid(pfn) != nid" can easily succeed on the 
> >> garbage memmap, resulting in no memory being detected as belonging to 
> >> the node. Also, resize_pgdat_range() is called when onlining memory, not 
> >> when adding it. :/ Oh this is so broken :)
> >>
> >> The right fix is probably to walk over all memory blocks that could 
> >> exist and test if they belong to the nid (if offline, check the 
> >> block->nid, if online check all pageblocks). A fix we can then move in 
> >> front of this patch.
> >>
> >> Will look into this this week.
> > 
> > And this series shows almost no sign of having been reviewed.  I'll hold
> > it over for 5.6.
> > 
> 
> Hi Andrew, any chance we can get the (now at least reviewed - thx Oscar)
> fix in patch #5 into 5.5? (I want to do the final stable backports for
> the uninitialized memmap stuff)

Sure, I queued it for the next batch of 5.5 fixes.
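
For readers following the thread, the approach proposed in the quoted
27 Oct message (walk over all memory blocks, trust the recorded nid for
offline blocks, and inspect every pageblock for online ones) can be
sketched as a small userspace model. This is only an illustration under
stated assumptions: struct model_block, MAX_PAGEBLOCKS,
block_belongs_to_node() and node_has_memory() are hypothetical names
invented here, not kernel APIs, and the sketch is not the fix that was
actually queued.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_PAGEBLOCKS 8 /* hypothetical bound, for this sketch only */

/* Hypothetical model of a memory block; NOT the kernel's struct memory_block. */
struct model_block {
    bool online;                        /* is the block currently online? */
    int nid;                            /* node recorded when the block was added */
    size_t nr_pageblocks;               /* number of valid entries below */
    int pageblock_nid[MAX_PAGEBLOCKS];  /* per-pageblock nid, meaningful only if online */
};

/*
 * Offline block: its memmap may be uninitialized, so rely on the recorded nid.
 * Online block: nids may be mixed, so check every pageblock.
 */
bool block_belongs_to_node(const struct model_block *b, int nid)
{
    if (!b->online)
        return b->nid == nid;

    for (size_t i = 0; i < b->nr_pageblocks; i++)
        if (b->pageblock_nid[i] == nid)
            return true;
    return false;
}

/* A node may only be taken offline if no block (online or offline) belongs to it. */
bool node_has_memory(const struct model_block *blocks, size_t n, int nid)
{
    for (size_t i = 0; i < n; i++)
        if (block_belongs_to_node(&blocks[i], nid))
            return true;
    return false;
}
```

The point of the model is that the decision never depends on the node
span or on the memmap of an offline block, which are the two things the
quoted message identifies as unreliable.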
