Re: [PATCH] mm, page_alloc: actually ignore mempolicies for high priority allocations
On Wed 15-08-18 15:16:52, Andrew Morton wrote:
[...]
> From: Vlastimil Babka
> Subject: mm, page_alloc: actually ignore mempolicies for high priority
> allocations
>
> The __alloc_pages_slowpath() function has for a long time contained code
> to ignore node restrictions from memory policies for high priority
> allocations. The current code that resets the zonelist iterator however
> does effectively nothing after commit 7810e6781e0f ("mm, page_alloc: do
> not break __GFP_THISNODE by zonelist reset") removed a buggy zonelist
> reset. Even before that commit, mempolicy restrictions were still not
> ignored, as they are passed in ac->nodemask which is untouched by the
> code.
>
> We can either remove the code, or make it work as intended. Since
> ac->nodemask can be set from task's mempolicy via alloc_pages_current()
> and thus also alloc_pages(), it may indeed affect kernel allocations, and
> it makes sense to ignore it to allow progress for high priority
> allocations.
>
> Thus, this patch resets ac->nodemask to NULL in such cases. This assumes
> all callers can handle it (i.e. there are no guarantees as in the case of
> __GFP_THISNODE) which seems to be the case. The same assumption is
> already present in check_retry_cpuset() for some time.
>
> The expected effect is that high priority kernel allocations in the
> context of userspace tasks (e.g. OOM victims) restricted by mempolicies
> will have higher chance to succeed if they are restricted to nodes with
> depleted memory, while there are other nodes with free memory left.
>
>
> It's not a new intention, but for the first time the code will match the
> intention, AFAICS. It was intended by commit 183f6371aac2 ("mm: ignore
> mempolicies when using ALLOC_NO_WATERMARK") in v3.6 but I think it never
> really worked, as mempolicy restriction was already encoded in nodemask,
> not zonelist, at that time.
>
> So originally that was for ALLOC_NO_WATERMARK only. Then it was adjusted
> by e46e7b77c909 ("mm, page_alloc: recalculate the preferred zoneref if the
> context can ignore memory policies") and cd04ae1e2dc8 ("mm, oom: do not
> rely on TIF_MEMDIE for memory reserves access") to the current state. So
> even GFP_ATOMIC would now ignore mempolicies after the initial attempts
> fail - if the code worked as people thought it does.
>
> Link: http://lkml.kernel.org/r/20180612122624.8045-1-vba...@suse.cz
> Signed-off-by: Vlastimil Babka
> Cc: Mel Gorman
> Cc: Michal Hocko
> Cc: David Rientjes
> Cc: Joonsoo Kim
> Signed-off-by: Andrew Morton

The code is quite subtle and we have a bad history of copying stuff
without rethinking whether the code still is needed. Which is sad and a
clear sign that the code is too complex. I cannot say this change
doesn't have any subtle side effects but it makes the intention clear at
least so I _think_ it is good to go. If we find some unintended side
effects we should simply rethink the whole reset zonelist thing.

That being said
Acked-by: Michal Hocko

> ---
>
>  mm/page_alloc.c |    7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> --- a/mm/page_alloc.c~mm-page_alloc-actually-ignore-mempolicies-for-high-priority-allocations
> +++ a/mm/page_alloc.c
> @@ -4165,11 +4165,12 @@ retry:
>  		alloc_flags = reserve_flags;
>  
>  	/*
> -	 * Reset the zonelist iterators if memory policies can be ignored.
> -	 * These allocations are high priority and system rather than user
> -	 * orientated.
> +	 * Reset the nodemask and zonelist iterators if memory policies can be
> +	 * ignored. These allocations are high priority and system rather than
> +	 * user oriented.
>  	 */
>  	if (!(alloc_flags & ALLOC_CPUSET) || reserve_flags) {
> +		ac->nodemask = NULL;
>  		ac->preferred_zoneref = first_zones_zonelist(ac->zonelist,
>  					ac->high_zoneidx, ac->nodemask);
>  	}
> _
>

-- 
Michal Hocko
SUSE Labs

Re: [PATCH] mm, page_alloc: actually ignore mempolicies for high priority allocations
On Wed, Aug 15, 2018 at 03:16:52PM -0700, Andrew Morton wrote:
> From: Vlastimil Babka
> Subject: mm, page_alloc: actually ignore mempolicies for high priority
> allocations
>
> The __alloc_pages_slowpath() function has for a long time contained code
> to ignore node restrictions from memory policies for high priority
> allocations. The current code that resets the zonelist iterator however
> does effectively nothing after commit 7810e6781e0f ("mm, page_alloc: do
> not break __GFP_THISNODE by zonelist reset") removed a buggy zonelist
> reset. Even before that commit, mempolicy restrictions were still not
> ignored, as they are passed in ac->nodemask which is untouched by the
> code.
>
> We can either remove the code, or make it work as intended. Since
> ac->nodemask can be set from task's mempolicy via alloc_pages_current()
> and thus also alloc_pages(), it may indeed affect kernel allocations, and
> it makes sense to ignore it to allow progress for high priority
> allocations.
>
> Thus, this patch resets ac->nodemask to NULL in such cases. This assumes
> all callers can handle it (i.e. there are no guarantees as in the case of
> __GFP_THISNODE) which seems to be the case. The same assumption is
> already present in check_retry_cpuset() for some time.
>
> The expected effect is that high priority kernel allocations in the
> context of userspace tasks (e.g. OOM victims) restricted by mempolicies
> will have higher chance to succeed if they are restricted to nodes with
> depleted memory, while there are other nodes with free memory left.
>
>
> It's not a new intention, but for the first time the code will match the
> intention, AFAICS. It was intended by commit 183f6371aac2 ("mm: ignore
> mempolicies when using ALLOC_NO_WATERMARK") in v3.6 but I think it never
> really worked, as mempolicy restriction was already encoded in nodemask,
> not zonelist, at that time.
>
> So originally that was for ALLOC_NO_WATERMARK only. Then it was adjusted
> by e46e7b77c909 ("mm, page_alloc: recalculate the preferred zoneref if the
> context can ignore memory policies") and cd04ae1e2dc8 ("mm, oom: do not
> rely on TIF_MEMDIE for memory reserves access") to the current state. So
> even GFP_ATOMIC would now ignore mempolicies after the initial attempts
> fail - if the code worked as people thought it does.
>
> Link: http://lkml.kernel.org/r/20180612122624.8045-1-vba...@suse.cz
> Signed-off-by: Vlastimil Babka
> Cc: Mel Gorman
> Cc: Michal Hocko
> Cc: David Rientjes
> Cc: Joonsoo Kim
> Signed-off-by: Andrew Morton

FWIW, I thought I acked this already.

Acked-by: Mel Gorman

-- 
Mel Gorman
SUSE Labs

Re: [PATCH] mm, page_alloc: actually ignore mempolicies for high priority allocations
On Tue, 12 Jun 2018 14:26:24 +0200 Vlastimil Babka wrote:
> The __alloc_pages_slowpath() function has for a long time contained code to
> ignore node restrictions from memory policies for high priority allocations.
> The current code that resets the zonelist iterator however does effectively
> nothing after commit 7810e6781e0f ("mm, page_alloc: do not break
> __GFP_THISNODE by zonelist reset") removed a buggy zonelist reset. Even
> before that commit, mempolicy restrictions were still not ignored, as they
> are passed in ac->nodemask which is untouched by the code.
>
> We can either remove the code, or make it work as intended. Since
> ac->nodemask can be set from task's mempolicy via alloc_pages_current() and
> thus also alloc_pages(), it may indeed affect kernel allocations, and it makes
> sense to ignore it to allow progress for high priority allocations.
>
> Thus, this patch resets ac->nodemask to NULL in such cases. This assumes all
> callers can handle it (i.e. there are no guarantees as in the case of
> __GFP_THISNODE) which seems to be the case. The same assumption is already
> present in check_retry_cpuset() for some time.
>
> The expected effect is that high priority kernel allocations in the context of
> userspace tasks (e.g. OOM victims) restricted by mempolicies will have higher
> chance to succeed if they are restricted to nodes with depleted memory, while
> there are other nodes with free memory left.

We don't have any reviews or acks on this one, perhaps because linux-mm
wasn't cc'ed. Could people please take a look?


From: Vlastimil Babka
Subject: mm, page_alloc: actually ignore mempolicies for high priority allocations

The __alloc_pages_slowpath() function has for a long time contained code
to ignore node restrictions from memory policies for high priority
allocations. The current code that resets the zonelist iterator however
does effectively nothing after commit 7810e6781e0f ("mm, page_alloc: do
not break __GFP_THISNODE by zonelist reset") removed a buggy zonelist
reset. Even before that commit, mempolicy restrictions were still not
ignored, as they are passed in ac->nodemask which is untouched by the
code.

We can either remove the code, or make it work as intended. Since
ac->nodemask can be set from task's mempolicy via alloc_pages_current()
and thus also alloc_pages(), it may indeed affect kernel allocations, and
it makes sense to ignore it to allow progress for high priority
allocations.

Thus, this patch resets ac->nodemask to NULL in such cases. This assumes
all callers can handle it (i.e. there are no guarantees as in the case of
__GFP_THISNODE) which seems to be the case. The same assumption is
already present in check_retry_cpuset() for some time.

The expected effect is that high priority kernel allocations in the
context of userspace tasks (e.g. OOM victims) restricted by mempolicies
will have higher chance to succeed if they are restricted to nodes with
depleted memory, while there are other nodes with free memory left.


It's not a new intention, but for the first time the code will match the
intention, AFAICS. It was intended by commit 183f6371aac2 ("mm: ignore
mempolicies when using ALLOC_NO_WATERMARK") in v3.6 but I think it never
really worked, as mempolicy restriction was already encoded in nodemask,
not zonelist, at that time.

So originally that was for ALLOC_NO_WATERMARK only. Then it was adjusted
by e46e7b77c909 ("mm, page_alloc: recalculate the preferred zoneref if the
context can ignore memory policies") and cd04ae1e2dc8 ("mm, oom: do not
rely on TIF_MEMDIE for memory reserves access") to the current state. So
even GFP_ATOMIC would now ignore mempolicies after the initial attempts
fail - if the code worked as people thought it does.

Link: http://lkml.kernel.org/r/20180612122624.8045-1-vba...@suse.cz
Signed-off-by: Vlastimil Babka
Cc: Mel Gorman
Cc: Michal Hocko
Cc: David Rientjes
Cc: Joonsoo Kim
Signed-off-by: Andrew Morton
---

 mm/page_alloc.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

--- a/mm/page_alloc.c~mm-page_alloc-actually-ignore-mempolicies-for-high-priority-allocations
+++ a/mm/page_alloc.c
@@ -4165,11 +4165,12 @@ retry:
 		alloc_flags = reserve_flags;
 
 	/*
-	 * Reset the zonelist iterators if memory policies can be ignored.
-	 * These allocations are high priority and system rather than user
-	 * orientated.
+	 * Reset the nodemask and zonelist iterators if memory policies can be
+	 * ignored. These allocations are high priority and system rather than
+	 * user oriented.
 	 */
 	if (!(alloc_flags & ALLOC_CPUSET) || reserve_flags) {
+		ac->nodemask = NULL;
 		ac->preferred_zoneref = first_zones_zonelist(ac->zonelist,
 					ac->high_zoneidx, ac->nodemask);
 	}
_

Re: [PATCH] mm, page_alloc: actually ignore mempolicies for high priority allocations
On 06/13/2018 09:42 PM, David Rientjes wrote:
> On Tue, 12 Jun 2018, Vlastimil Babka wrote:
>
>> The __alloc_pages_slowpath() function has for a long time contained code to
>> ignore node restrictions from memory policies for high priority allocations.
>> The current code that resets the zonelist iterator however does effectively
>> nothing after commit 7810e6781e0f ("mm, page_alloc: do not break
>> __GFP_THISNODE by zonelist reset") removed a buggy zonelist reset. Even
>> before that commit, mempolicy restrictions were still not ignored, as they
>> are passed in ac->nodemask which is untouched by the code.
>>
>> We can either remove the code, or make it work as intended. Since
>> ac->nodemask can be set from task's mempolicy via alloc_pages_current() and
>> thus also alloc_pages(), it may indeed affect kernel allocations, and it
>> makes sense to ignore it to allow progress for high priority allocations.
>>
>> Thus, this patch resets ac->nodemask to NULL in such cases. This assumes all
>> callers can handle it (i.e. there are no guarantees as in the case of
>> __GFP_THISNODE) which seems to be the case. The same assumption is already
>> present in check_retry_cpuset() for some time.
>>
>> The expected effect is that high priority kernel allocations in the context
>> of userspace tasks (e.g. OOM victims) restricted by mempolicies will have
>> higher chance to succeed if they are restricted to nodes with depleted
>> memory, while there are other nodes with free memory left.
>>
>
> Hi Vlastimil,
>
> Is this expected as a change back to previous behavior that we have lost
> or is this new development for high priority allocations? I don't think
> we have ignored mempolicies for things like GFP_ATOMIC allocations in the
> past.

Well, it's not a new intention, but for the first time the code will
match the intention, AFAICS. It was intended by commit 183f6371aac2
("mm: ignore mempolicies when using ALLOC_NO_WATERMARK") in v3.6 but I
think it never really worked, as mempolicy restriction was already
encoded in nodemask, not zonelist, at that time.

So originally that was for ALLOC_NO_WATERMARK only. Then it was adjusted
by e46e7b77c909 ("mm, page_alloc: recalculate the preferred zoneref if
the context can ignore memory policies") and cd04ae1e2dc8 ("mm, oom: do
not rely on TIF_MEMDIE for memory reserves access") to the current
state. So yeah even GFP_ATOMIC would now ignore mempolicies after the
initial attempts fail... if the code worked as people thought it does.

>> Signed-off-by: Vlastimil Babka
>> Cc: Mel Gorman
>> Cc: Michal Hocko
>> Cc: David Rientjes
>> Cc: Joonsoo Kim
>> ---
>>  mm/page_alloc.c | 7 ++++---
>>  1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 07b3c23762ad..ec8c92ff8b3c 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -4164,11 +4164,12 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>>  		alloc_flags = reserve_flags;
>>  
>>  	/*
>> -	 * Reset the zonelist iterators if memory policies can be ignored.
>> -	 * These allocations are high priority and system rather than user
>> -	 * orientated.
>> +	 * Reset the nodemask and zonelist iterators if memory policies can be
>> +	 * ignored. These allocations are high priority and system rather than
>> +	 * user oriented.
>>  	 */
>>  	if (!(alloc_flags & ALLOC_CPUSET) || reserve_flags) {
>> +		ac->nodemask = NULL;
>>  		ac->preferred_zoneref = first_zones_zonelist(ac->zonelist,
>>  					ac->high_zoneidx, ac->nodemask);
>>  	}

Re: [PATCH] mm, page_alloc: actually ignore mempolicies for high priority allocations
On Tue, 12 Jun 2018, Vlastimil Babka wrote:

> The __alloc_pages_slowpath() function has for a long time contained code to
> ignore node restrictions from memory policies for high priority allocations.
> The current code that resets the zonelist iterator however does effectively
> nothing after commit 7810e6781e0f ("mm, page_alloc: do not break
> __GFP_THISNODE by zonelist reset") removed a buggy zonelist reset. Even
> before that commit, mempolicy restrictions were still not ignored, as they
> are passed in ac->nodemask which is untouched by the code.
>
> We can either remove the code, or make it work as intended. Since
> ac->nodemask can be set from task's mempolicy via alloc_pages_current() and
> thus also alloc_pages(), it may indeed affect kernel allocations, and it makes
> sense to ignore it to allow progress for high priority allocations.
>
> Thus, this patch resets ac->nodemask to NULL in such cases. This assumes all
> callers can handle it (i.e. there are no guarantees as in the case of
> __GFP_THISNODE) which seems to be the case. The same assumption is already
> present in check_retry_cpuset() for some time.
>
> The expected effect is that high priority kernel allocations in the context of
> userspace tasks (e.g. OOM victims) restricted by mempolicies will have higher
> chance to succeed if they are restricted to nodes with depleted memory, while
> there are other nodes with free memory left.
>

Hi Vlastimil,

Is this expected as a change back to previous behavior that we have lost
or is this new development for high priority allocations? I don't think
we have ignored mempolicies for things like GFP_ATOMIC allocations in the
past.

> Signed-off-by: Vlastimil Babka
> Cc: Mel Gorman
> Cc: Michal Hocko
> Cc: David Rientjes
> Cc: Joonsoo Kim
> ---
>  mm/page_alloc.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 07b3c23762ad..ec8c92ff8b3c 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4164,11 +4164,12 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  		alloc_flags = reserve_flags;
>  
>  	/*
> -	 * Reset the zonelist iterators if memory policies can be ignored.
> -	 * These allocations are high priority and system rather than user
> -	 * orientated.
> +	 * Reset the nodemask and zonelist iterators if memory policies can be
> +	 * ignored. These allocations are high priority and system rather than
> +	 * user oriented.
>  	 */
>  	if (!(alloc_flags & ALLOC_CPUSET) || reserve_flags) {
> +		ac->nodemask = NULL;
>  		ac->preferred_zoneref = first_zones_zonelist(ac->zonelist,
>  					ac->high_zoneidx, ac->nodemask);
>  	}

[PATCH] mm, page_alloc: actually ignore mempolicies for high priority allocations
The __alloc_pages_slowpath() function has for a long time contained code to
ignore node restrictions from memory policies for high priority allocations.
The current code that resets the zonelist iterator however does effectively
nothing after commit 7810e6781e0f ("mm, page_alloc: do not break __GFP_THISNODE
by zonelist reset") removed a buggy zonelist reset. Even before that commit,
mempolicy restrictions were still not ignored, as they are passed in
ac->nodemask which is untouched by the code.

We can either remove the code, or make it work as intended. Since
ac->nodemask can be set from task's mempolicy via alloc_pages_current() and
thus also alloc_pages(), it may indeed affect kernel allocations, and it makes
sense to ignore it to allow progress for high priority allocations.

Thus, this patch resets ac->nodemask to NULL in such cases. This assumes all
callers can handle it (i.e. there are no guarantees as in the case of
__GFP_THISNODE) which seems to be the case. The same assumption is already
present in check_retry_cpuset() for some time.

The expected effect is that high priority kernel allocations in the context of
userspace tasks (e.g. OOM victims) restricted by mempolicies will have higher
chance to succeed if they are restricted to nodes with depleted memory, while
there are other nodes with free memory left.

Signed-off-by: Vlastimil Babka
Cc: Mel Gorman
Cc: Michal Hocko
Cc: David Rientjes
Cc: Joonsoo Kim
---
 mm/page_alloc.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 07b3c23762ad..ec8c92ff8b3c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4164,11 +4164,12 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 		alloc_flags = reserve_flags;
 
 	/*
-	 * Reset the zonelist iterators if memory policies can be ignored.
-	 * These allocations are high priority and system rather than user
-	 * orientated.
+	 * Reset the nodemask and zonelist iterators if memory policies can be
+	 * ignored. These allocations are high priority and system rather than
+	 * user oriented.
 	 */
 	if (!(alloc_flags & ALLOC_CPUSET) || reserve_flags) {
+		ac->nodemask = NULL;
 		ac->preferred_zoneref = first_zones_zonelist(ac->zonelist,
 					ac->high_zoneidx, ac->nodemask);
 	}
-- 
2.17.1