On 06/13/2018 09:42 PM, David Rientjes wrote: > On Tue, 12 Jun 2018, Vlastimil Babka wrote: > >> The __alloc_pages_slowpath() function has for a long time contained code to >> ignore node restrictions from memory policies for high priority allocations. >> The current code that resets the zonelist iterator however does effectively >> nothing after commit 7810e6781e0f ("mm, page_alloc: do not break >> __GFP_THISNODE >> by zonelist reset") removed a buggy zonelist reset. Even before that commit, >> mempolicy restrictions were still not ignored, as they are passed in >> ac->nodemask which is untouched by the code. >> >> We can either remove the code, or make it work as intended. Since >> ac->nodemask can be set from task's mempolicy via alloc_pages_current() and >> thus also alloc_pages(), it may indeed affect kernel allocations, and it >> makes >> sense to ignore it to allow progress for high priority allocations. >> >> Thus, this patch resets ac->nodemask to NULL in such cases. This assumes all >> callers can handle it (i.e. there are no guarantees as in the case of >> __GFP_THISNODE) which seems to be the case. The same assumption is already >> present in check_retry_cpuset() for some time. >> >> The expected effect is that high priority kernel allocations in the context >> of >> userspace tasks (e.g. OOM victims) restricted by mempolicies will have higher >> chance to succeed if they are restricted to nodes with depleted memory, while >> there are other nodes with free memory left. >> > > Hi Vlastimil, > > Is this expected as a change back to previous behavior that we have lost > or is this new development for high priority allocations? I don't think > we have ignored mempolicies for things like GFP_ATOMIC allocations in the > past.
Well, it's not a new intention, but for the first time the code will match the intention, AFAICS. It was intended by commit 183f6371aac2 ("mm: ignore mempolicies when using ALLOC_NO_WATERMARK") in v3.6 but I think it never really worked, as mempolicy restriction was already encoded in nodemask, not zonelist, at that time. So originally that was for ALLOC_NO_WATERMARK only. Then it was adjusted by e46e7b77c909 ("mm, page_alloc: recalculate the preferred zoneref if the context can ignore memory policies") and cd04ae1e2dc8 ("mm, oom: do not rely on TIF_MEMDIE for memory reserves access") to the current state. So yeah even GFP_ATOMIC would now ignore mempolicies after the initial attempts fail... if the code worked as people thought it does. >> Signed-off-by: Vlastimil Babka <vba...@suse.cz> >> Cc: Mel Gorman <mgor...@techsingularity.net> >> Cc: Michal Hocko <mho...@kernel.org> >> Cc: David Rientjes <rient...@google.com> >> Cc: Joonsoo Kim <iamjoonsoo....@lge.com> >> --- >> mm/page_alloc.c | 7 ++++--- >> 1 file changed, 4 insertions(+), 3 deletions(-) >> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >> index 07b3c23762ad..ec8c92ff8b3c 100644 >> --- a/mm/page_alloc.c >> +++ b/mm/page_alloc.c >> @@ -4164,11 +4164,12 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int >> order, >> alloc_flags = reserve_flags; >> >> /* >> - * Reset the zonelist iterators if memory policies can be ignored. >> - * These allocations are high priority and system rather than user >> - * orientated. >> + * Reset the nodemask and zonelist iterators if memory policies can be >> + * ignored. These allocations are high priority and system rather than >> + * user oriented. >> */ >> if (!(alloc_flags & ALLOC_CPUSET) || reserve_flags) { >> + ac->nodemask = NULL; >> ac->preferred_zoneref = first_zones_zonelist(ac->zonelist, >> ac->high_zoneidx, ac->nodemask); >> }