On Wed, 18 Dec 2019 18:08:04 +0100 David Hildenbrand <da...@redhat.com> wrote:
> On 01.12.19 00:21, Andrew Morton wrote:
> > On Sun, 27 Oct 2019 23:45:52 +0100 David Hildenbrand <da...@redhat.com>
> > wrote:
> >
> >> I think I just found an issue with try_offline_node().
> >> try_offline_node() is pretty much broken already (touches garbage
> >> memmaps and will not considers mixed NIDs within sections), however,
> >> relies on the node span to look for memory sections to probe. So it
> >> seems to rely on the nodes getting shrunk when removing memory, not when
> >> offlining.
> >>
> >> As we shrink the node span when offlining now and not when removing,
> >> this can go wrong once we offline the last memory block of the node and
> >> offline the last CPU. We could still have memory around that we could
> >> re-online, however, the node would already be offline. Unlikely, but
> >> possible.
> >>
> >> Note that the same is also broken without this patch in case memory is
> >> never onlined. The "pfn_to_nid(pfn) != nid" can easily succeed on the
> >> garbage memmap, resulting in no memory being detected as belonging to
> >> the node. Also, resize_pgdat_range() is called when onlining memory, not
> >> when adding it. :/ Oh this is so broken :)
> >>
> >> The right fix is probably to walk over all memory blocks that could
> >> exist and test if they belong to the nid (if offline, check the
> >> block->nid, if online check all pageblocks). A fix we can then move in
> >> front of this patch.
> >>
> >> Will look into this this week.
> >
> > And this series shows almost no sign of having been reviewed. I'll hold
> > it over for 5.6.
> >
>
> Hi Andrew, any chance we can get the (now at least reviewed - thx Oscar)
> fix in patch #5 into 5.5? (I want to do the final stable backports for
> the uninitialized memmap stuff)

Sure, I queued it for the next batch of 5.5 fixes.
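
For reference, the walk proposed in the quoted message ("if offline, check the block->nid, if online check all pageblocks") could look roughly like the userspace sketch below. This is only an illustration under stated assumptions, not the code from patch #5: struct mem_block, PAGEBLOCKS_PER_BLOCK, block_belongs_to_node() and node_has_memory() are hypothetical names invented here and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical constant; the real ratio depends on architecture and config. */
#define PAGEBLOCKS_PER_BLOCK 4

/* Hypothetical userspace stand-in for a memory block, not the kernel struct. */
struct mem_block {
    bool present;  /* the block exists, i.e. memory was added */
    bool online;   /* the block is online, so its memmap is initialized */
    int nid;       /* node recorded for the block when it was added */
    int pageblock_nid[PAGEBLOCKS_PER_BLOCK]; /* only meaningful while online */
};

/*
 * Offline block: trust only the nid stored in the block, because the
 * memmap may still be uninitialized garbage.
 * Online block: check every pageblock, since nodes can be mixed.
 */
bool block_belongs_to_node(const struct mem_block *blk, int nid)
{
    if (!blk->present)
        return false;
    if (!blk->online)
        return blk->nid == nid;
    for (size_t i = 0; i < PAGEBLOCKS_PER_BLOCK; i++) {
        if (blk->pageblock_nid[i] == nid)
            return true;
    }
    return false;
}

/* Walk all blocks that could exist instead of relying on the node span. */
bool node_has_memory(const struct mem_block *blocks, size_t count, int nid)
{
    assert(blocks != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (block_belongs_to_node(&blocks[i], nid))
            return true;
    }
    return false;
}
```

In this model a node would only be a candidate for going offline when node_has_memory() returns false for it and it has no online CPUs left; an offline but still present block keeps the node alive, which is the case the node-span-based check can miss.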