Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
On Mon 19-11-18 14:05:34, David Rientjes wrote: > On Thu, 15 Nov 2018, Michal Hocko wrote: > > > > The userspace had a single way to determine if thp had been disabled for > > > a > > > specific vma and that was broken with your commit. We have since fixed > > > it. Modifying our software stack to start looking for some field > > > somewhere else will not help anybody else that this has affected or will > > > affect. I'm interested in not breaking userspace, not trying a wait and > > > see approach to see if anybody else complains once we start looking for > > > some other field. The risk outweighs the reward, it already broke us, > > > and > > > I'd prefer not to even open the possibility of breaking anybody else. > > > > I very much agree on "do not break userspace" part but this is kind of > > gray area. VMA flags are a deep internal implementation detail and > > nobody should really depend on it for anything important. The original > > motivation for introducing it was CRIU where it is kind of > > understandable. I would argue they should find a different way but it is > > just too late for them. > > > > For this particular case there was no other bug report except for yours > > and if it is possible to fix it on your end then I would really love to > > make the a sensible user interface to query the status. If we are going > > to change the semantic of the exported flag again then we risk yet > > another breakage. > > > > Therefore I am asking whether changing your particular usecase to a new > > interface is possible because that would allow to have a longerm > > sensible user interface rather than another kludge which still doesn't > > cover all the usecases (e.g. there is no way to reliably query the > > madvise status after your patch). > > > > Providing another interface is great, I have no objection other than > emitting another line for every vma on the system for smaps is probably > overkill for something as rare as PR_SET_THP_DISABLE. Let me think about a full patch and see how it looks like. > > That said, I think the current handling of the "nh" flag being emitted in > smaps is logical and ensures no further userspace breakage. I have already expressed a concern that there is no way to query for MADV_NOHUGEPAGE if we overload the flag. So this is not a riskfree option. > If that is to > be removed, I consider it an unnecessary risk. That would raised in code > review. -- Michal Hocko SUSE Labs
Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
On Mon 19-11-18 14:05:34, David Rientjes wrote: > On Thu, 15 Nov 2018, Michal Hocko wrote: > > > > The userspace had a single way to determine if thp had been disabled for > > > a > > > specific vma and that was broken with your commit. We have since fixed > > > it. Modifying our software stack to start looking for some field > > > somewhere else will not help anybody else that this has affected or will > > > affect. I'm interested in not breaking userspace, not trying a wait and > > > see approach to see if anybody else complains once we start looking for > > > some other field. The risk outweighs the reward, it already broke us, > > > and > > > I'd prefer not to even open the possibility of breaking anybody else. > > > > I very much agree on "do not break userspace" part but this is kind of > > gray area. VMA flags are a deep internal implementation detail and > > nobody should really depend on it for anything important. The original > > motivation for introducing it was CRIU where it is kind of > > understandable. I would argue they should find a different way but it is > > just too late for them. > > > > For this particular case there was no other bug report except for yours > > and if it is possible to fix it on your end then I would really love to > > make the a sensible user interface to query the status. If we are going > > to change the semantic of the exported flag again then we risk yet > > another breakage. > > > > Therefore I am asking whether changing your particular usecase to a new > > interface is possible because that would allow to have a longerm > > sensible user interface rather than another kludge which still doesn't > > cover all the usecases (e.g. there is no way to reliably query the > > madvise status after your patch). > > > > Providing another interface is great, I have no objection other than > emitting another line for every vma on the system for smaps is probably > overkill for something as rare as PR_SET_THP_DISABLE. Let me think about a full patch and see how it looks like. > > That said, I think the current handling of the "nh" flag being emitted in > smaps is logical and ensures no further userspace breakage. I have already expressed a concern that there is no way to query for MADV_NOHUGEPAGE if we overload the flag. So this is not a riskfree option. > If that is to > be removed, I consider it an unnecessary risk. That would raised in code > review. -- Michal Hocko SUSE Labs
Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
On Thu, 15 Nov 2018, Michal Hocko wrote: > > The userspace had a single way to determine if thp had been disabled for a > > specific vma and that was broken with your commit. We have since fixed > > it. Modifying our software stack to start looking for some field > > somewhere else will not help anybody else that this has affected or will > > affect. I'm interested in not breaking userspace, not trying a wait and > > see approach to see if anybody else complains once we start looking for > > some other field. The risk outweighs the reward, it already broke us, and > > I'd prefer not to even open the possibility of breaking anybody else. > > I very much agree on "do not break userspace" part but this is kind of > gray area. VMA flags are a deep internal implementation detail and > nobody should really depend on it for anything important. The original > motivation for introducing it was CRIU where it is kind of > understandable. I would argue they should find a different way but it is > just too late for them. > > For this particular case there was no other bug report except for yours > and if it is possible to fix it on your end then I would really love to > make the a sensible user interface to query the status. If we are going > to change the semantic of the exported flag again then we risk yet > another breakage. > > Therefore I am asking whether changing your particular usecase to a new > interface is possible because that would allow to have a longerm > sensible user interface rather than another kludge which still doesn't > cover all the usecases (e.g. there is no way to reliably query the > madvise status after your patch). > Providing another interface is great, I have no objection other than emitting another line for every vma on the system for smaps is probably overkill for something as rare as PR_SET_THP_DISABLE. That said, I think the current handling of the "nh" flag being emitted in smaps is logical and ensures no further userspace breakage. If that is to be removed, I consider it an unnecessary risk. That would raised in code review.
Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
On Thu, 15 Nov 2018, Michal Hocko wrote: > > The userspace had a single way to determine if thp had been disabled for a > > specific vma and that was broken with your commit. We have since fixed > > it. Modifying our software stack to start looking for some field > > somewhere else will not help anybody else that this has affected or will > > affect. I'm interested in not breaking userspace, not trying a wait and > > see approach to see if anybody else complains once we start looking for > > some other field. The risk outweighs the reward, it already broke us, and > > I'd prefer not to even open the possibility of breaking anybody else. > > I very much agree on "do not break userspace" part but this is kind of > gray area. VMA flags are a deep internal implementation detail and > nobody should really depend on it for anything important. The original > motivation for introducing it was CRIU where it is kind of > understandable. I would argue they should find a different way but it is > just too late for them. > > For this particular case there was no other bug report except for yours > and if it is possible to fix it on your end then I would really love to > make the a sensible user interface to query the status. If we are going > to change the semantic of the exported flag again then we risk yet > another breakage. > > Therefore I am asking whether changing your particular usecase to a new > interface is possible because that would allow to have a longerm > sensible user interface rather than another kludge which still doesn't > cover all the usecases (e.g. there is no way to reliably query the > madvise status after your patch). > Providing another interface is great, I have no objection other than emitting another line for every vma on the system for smaps is probably overkill for something as rare as PR_SET_THP_DISABLE. That said, I think the current handling of the "nh" flag being emitted in smaps is logical and ensures no further userspace breakage. If that is to be removed, I consider it an unnecessary risk. That would raised in code review.
Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
On Thu 15-11-18 10:02:42, Michal Hocko wrote: > On Wed 14-11-18 13:41:12, David Rientjes wrote: > > On Wed, 14 Nov 2018, Michal Hocko wrote: > > > > > > > > Do you know of any other userspace except your usecase? Is there > > > > > > anything fundamental that would prevent a proper API adoption for > > > > > > you? > > > > > > > > > > > > > > > > Yes, it would require us to go back in time and build patched > > > > > binaries. > > > > > > > > I read that as there is a fundamental problem to update existing > > > > binaries. If that is the case then there surely is no way around it > > > > and another sad page in the screwed up APIs book we provide. > > > > > > > > But I was under impression that the SW stack which actually does the > > > > monitoring is under your controll. Moreover I was under impression that > > > > you do not use the current vanilla kernel so there is no need for an > > > > immediate change on your end. It is trivial to come up with a backward > > > > compatible way to check for the new flag (if it is not present then > > > > fallback to vma flags). > > > > > > > > The userspace had a single way to determine if thp had been disabled for a > > specific vma and that was broken with your commit. We have since fixed > > it. Modifying our software stack to start looking for some field > > somewhere else will not help anybody else that this has affected or will > > affect. I'm interested in not breaking userspace, not trying a wait and > > see approach to see if anybody else complains once we start looking for > > some other field. The risk outweighs the reward, it already broke us, and > > I'd prefer not to even open the possibility of breaking anybody else. > > I very much agree on "do not break userspace" part but this is kind of > gray area. VMA flags are a deep internal implementation detail and > nobody should really depend on it for anything important. The original > motivation for introducing it was CRIU where it is kind of > understandable. I would argue they should find a different way but it is > just too late for them. > > For this particular case there was no other bug report except for yours > and if it is possible to fix it on your end then I would really love to > make the a sensible user interface to query the status. If we are going > to change the semantic of the exported flag again then we risk yet > another breakage. > > Therefore I am asking whether changing your particular usecase to a new > interface is possible because that would allow to have a longerm > sensible user interface rather than another kludge which still doesn't > cover all the usecases (e.g. there is no way to reliably query the > madvise status after your patch). Btw. this is essentially the same kind of problem as http://lkml.kernel.org/r/20181002100531.gc4...@quack2.suse.cz where the conclusion was to come up with a saner interface rather than mimic the previous one. -- Michal Hocko SUSE Labs
Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
On Thu 15-11-18 10:02:42, Michal Hocko wrote: > On Wed 14-11-18 13:41:12, David Rientjes wrote: > > On Wed, 14 Nov 2018, Michal Hocko wrote: > > > > > > > > Do you know of any other userspace except your usecase? Is there > > > > > > anything fundamental that would prevent a proper API adoption for > > > > > > you? > > > > > > > > > > > > > > > > Yes, it would require us to go back in time and build patched > > > > > binaries. > > > > > > > > I read that as there is a fundamental problem to update existing > > > > binaries. If that is the case then there surely is no way around it > > > > and another sad page in the screwed up APIs book we provide. > > > > > > > > But I was under impression that the SW stack which actually does the > > > > monitoring is under your controll. Moreover I was under impression that > > > > you do not use the current vanilla kernel so there is no need for an > > > > immediate change on your end. It is trivial to come up with a backward > > > > compatible way to check for the new flag (if it is not present then > > > > fallback to vma flags). > > > > > > > > The userspace had a single way to determine if thp had been disabled for a > > specific vma and that was broken with your commit. We have since fixed > > it. Modifying our software stack to start looking for some field > > somewhere else will not help anybody else that this has affected or will > > affect. I'm interested in not breaking userspace, not trying a wait and > > see approach to see if anybody else complains once we start looking for > > some other field. The risk outweighs the reward, it already broke us, and > > I'd prefer not to even open the possibility of breaking anybody else. > > I very much agree on "do not break userspace" part but this is kind of > gray area. VMA flags are a deep internal implementation detail and > nobody should really depend on it for anything important. The original > motivation for introducing it was CRIU where it is kind of > understandable. I would argue they should find a different way but it is > just too late for them. > > For this particular case there was no other bug report except for yours > and if it is possible to fix it on your end then I would really love to > make the a sensible user interface to query the status. If we are going > to change the semantic of the exported flag again then we risk yet > another breakage. > > Therefore I am asking whether changing your particular usecase to a new > interface is possible because that would allow to have a longerm > sensible user interface rather than another kludge which still doesn't > cover all the usecases (e.g. there is no way to reliably query the > madvise status after your patch). Btw. this is essentially the same kind of problem as http://lkml.kernel.org/r/20181002100531.gc4...@quack2.suse.cz where the conclusion was to come up with a saner interface rather than mimic the previous one. -- Michal Hocko SUSE Labs
Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
On Wed 14-11-18 13:41:12, David Rientjes wrote: > On Wed, 14 Nov 2018, Michal Hocko wrote: > > > > > > Do you know of any other userspace except your usecase? Is there > > > > > anything fundamental that would prevent a proper API adoption for you? > > > > > > > > > > > > > Yes, it would require us to go back in time and build patched binaries. > > > > > > I read that as there is a fundamental problem to update existing > > > binaries. If that is the case then there surely is no way around it > > > and another sad page in the screwed up APIs book we provide. > > > > > > But I was under impression that the SW stack which actually does the > > > monitoring is under your controll. Moreover I was under impression that > > > you do not use the current vanilla kernel so there is no need for an > > > immediate change on your end. It is trivial to come up with a backward > > > compatible way to check for the new flag (if it is not present then > > > fallback to vma flags). > > > > > The userspace had a single way to determine if thp had been disabled for a > specific vma and that was broken with your commit. We have since fixed > it. Modifying our software stack to start looking for some field > somewhere else will not help anybody else that this has affected or will > affect. I'm interested in not breaking userspace, not trying a wait and > see approach to see if anybody else complains once we start looking for > some other field. The risk outweighs the reward, it already broke us, and > I'd prefer not to even open the possibility of breaking anybody else. I very much agree on "do not break userspace" part but this is kind of gray area. VMA flags are a deep internal implementation detail and nobody should really depend on it for anything important. The original motivation for introducing it was CRIU where it is kind of understandable. I would argue they should find a different way but it is just too late for them. For this particular case there was no other bug report except for yours and if it is possible to fix it on your end then I would really love to make the a sensible user interface to query the status. If we are going to change the semantic of the exported flag again then we risk yet another breakage. Therefore I am asking whether changing your particular usecase to a new interface is possible because that would allow to have a longerm sensible user interface rather than another kludge which still doesn't cover all the usecases (e.g. there is no way to reliably query the madvise status after your patch). -- Michal Hocko SUSE Labs
Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
On Wed 14-11-18 13:41:12, David Rientjes wrote: > On Wed, 14 Nov 2018, Michal Hocko wrote: > > > > > > Do you know of any other userspace except your usecase? Is there > > > > > anything fundamental that would prevent a proper API adoption for you? > > > > > > > > > > > > > Yes, it would require us to go back in time and build patched binaries. > > > > > > I read that as there is a fundamental problem to update existing > > > binaries. If that is the case then there surely is no way around it > > > and another sad page in the screwed up APIs book we provide. > > > > > > But I was under impression that the SW stack which actually does the > > > monitoring is under your controll. Moreover I was under impression that > > > you do not use the current vanilla kernel so there is no need for an > > > immediate change on your end. It is trivial to come up with a backward > > > compatible way to check for the new flag (if it is not present then > > > fallback to vma flags). > > > > > The userspace had a single way to determine if thp had been disabled for a > specific vma and that was broken with your commit. We have since fixed > it. Modifying our software stack to start looking for some field > somewhere else will not help anybody else that this has affected or will > affect. I'm interested in not breaking userspace, not trying a wait and > see approach to see if anybody else complains once we start looking for > some other field. The risk outweighs the reward, it already broke us, and > I'd prefer not to even open the possibility of breaking anybody else. I very much agree on "do not break userspace" part but this is kind of gray area. VMA flags are a deep internal implementation detail and nobody should really depend on it for anything important. The original motivation for introducing it was CRIU where it is kind of understandable. I would argue they should find a different way but it is just too late for them. For this particular case there was no other bug report except for yours and if it is possible to fix it on your end then I would really love to make the a sensible user interface to query the status. If we are going to change the semantic of the exported flag again then we risk yet another breakage. Therefore I am asking whether changing your particular usecase to a new interface is possible because that would allow to have a longerm sensible user interface rather than another kludge which still doesn't cover all the usecases (e.g. there is no way to reliably query the madvise status after your patch). -- Michal Hocko SUSE Labs
Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
On Wed, 14 Nov 2018, Michal Hocko wrote: > > > > Do you know of any other userspace except your usecase? Is there > > > > anything fundamental that would prevent a proper API adoption for you? > > > > > > > > > > Yes, it would require us to go back in time and build patched binaries. > > > > I read that as there is a fundamental problem to update existing > > binaries. If that is the case then there surely is no way around it > > and another sad page in the screwed up APIs book we provide. > > > > But I was under impression that the SW stack which actually does the > > monitoring is under your controll. Moreover I was under impression that > > you do not use the current vanilla kernel so there is no need for an > > immediate change on your end. It is trivial to come up with a backward > > compatible way to check for the new flag (if it is not present then > > fallback to vma flags). > > The userspace had a single way to determine if thp had been disabled for a specific vma and that was broken with your commit. We have since fixed it. Modifying our software stack to start looking for some field somewhere else will not help anybody else that this has affected or will affect. I'm interested in not breaking userspace, not trying a wait and see approach to see if anybody else complains once we start looking for some other field. The risk outweighs the reward, it already broke us, and I'd prefer not to even open the possibility of breaking anybody else.
Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
On Wed, 14 Nov 2018, Michal Hocko wrote: > > > > Do you know of any other userspace except your usecase? Is there > > > > anything fundamental that would prevent a proper API adoption for you? > > > > > > > > > > Yes, it would require us to go back in time and build patched binaries. > > > > I read that as there is a fundamental problem to update existing > > binaries. If that is the case then there surely is no way around it > > and another sad page in the screwed up APIs book we provide. > > > > But I was under impression that the SW stack which actually does the > > monitoring is under your controll. Moreover I was under impression that > > you do not use the current vanilla kernel so there is no need for an > > immediate change on your end. It is trivial to come up with a backward > > compatible way to check for the new flag (if it is not present then > > fallback to vma flags). > > The userspace had a single way to determine if thp had been disabled for a specific vma and that was broken with your commit. We have since fixed it. Modifying our software stack to start looking for some field somewhere else will not help anybody else that this has affected or will affect. I'm interested in not breaking userspace, not trying a wait and see approach to see if anybody else complains once we start looking for some other field. The risk outweighs the reward, it already broke us, and I'd prefer not to even open the possibility of breaking anybody else.
Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
On Thu 18-10-18 09:00:31, Michal Hocko wrote: > On Wed 17-10-18 12:59:18, David Rientjes wrote: > > On Wed, 17 Oct 2018, Michal Hocko wrote: > > > > > Do you know of any other userspace except your usecase? Is there > > > anything fundamental that would prevent a proper API adoption for you? > > > > > > > Yes, it would require us to go back in time and build patched binaries. > > I read that as there is a fundamental problem to update existing > binaries. If that is the case then there surely is no way around it > and another sad page in the screwed up APIs book we provide. > > But I was under impression that the SW stack which actually does the > monitoring is under your controll. Moreover I was under impression that > you do not use the current vanilla kernel so there is no need for an > immediate change on your end. It is trivial to come up with a backward > compatible way to check for the new flag (if it is not present then > fallback to vma flags). > > I am sorry for pushing here but if this is just a matter of a _single_ > user which _can_ be fixed with a reasonable effort then I would love to > see the future api unscrewed. ping -- Michal Hocko SUSE Labs
Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
On Thu 18-10-18 09:00:31, Michal Hocko wrote: > On Wed 17-10-18 12:59:18, David Rientjes wrote: > > On Wed, 17 Oct 2018, Michal Hocko wrote: > > > > > Do you know of any other userspace except your usecase? Is there > > > anything fundamental that would prevent a proper API adoption for you? > > > > > > > Yes, it would require us to go back in time and build patched binaries. > > I read that as there is a fundamental problem to update existing > binaries. If that is the case then there surely is no way around it > and another sad page in the screwed up APIs book we provide. > > But I was under impression that the SW stack which actually does the > monitoring is under your controll. Moreover I was under impression that > you do not use the current vanilla kernel so there is no need for an > immediate change on your end. It is trivial to come up with a backward > compatible way to check for the new flag (if it is not present then > fallback to vma flags). > > I am sorry for pushing here but if this is just a matter of a _single_ > user which _can_ be fixed with a reasonable effort then I would love to > see the future api unscrewed. ping -- Michal Hocko SUSE Labs
Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
On Wed 17-10-18 12:59:18, David Rientjes wrote: > On Wed, 17 Oct 2018, Michal Hocko wrote: > > > Do you know of any other userspace except your usecase? Is there > > anything fundamental that would prevent a proper API adoption for you? > > > > Yes, it would require us to go back in time and build patched binaries. I read that as there is a fundamental problem to update existing binaries. If that is the case then there surely is no way around it and another sad page in the screwed up APIs book we provide. But I was under impression that the SW stack which actually does the monitoring is under your controll. Moreover I was under impression that you do not use the current vanilla kernel so there is no need for an immediate change on your end. It is trivial to come up with a backward compatible way to check for the new flag (if it is not present then fallback to vma flags). I am sorry for pushing here but if this is just a matter of a _single_ user which _can_ be fixed with a reasonable effort then I would love to see the future api unscrewed. -- Michal Hocko SUSE Labs
Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
On Wed 17-10-18 12:59:18, David Rientjes wrote: > On Wed, 17 Oct 2018, Michal Hocko wrote: > > > Do you know of any other userspace except your usecase? Is there > > anything fundamental that would prevent a proper API adoption for you? > > > > Yes, it would require us to go back in time and build patched binaries. I read that as there is a fundamental problem to update existing binaries. If that is the case then there surely is no way around it and another sad page in the screwed up APIs book we provide. But I was under impression that the SW stack which actually does the monitoring is under your controll. Moreover I was under impression that you do not use the current vanilla kernel so there is no need for an immediate change on your end. It is trivial to come up with a backward compatible way to check for the new flag (if it is not present then fallback to vma flags). I am sorry for pushing here but if this is just a matter of a _single_ user which _can_ be fixed with a reasonable effort then I would love to see the future api unscrewed. -- Michal Hocko SUSE Labs
Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
On Wed, 17 Oct 2018, Michal Hocko wrote: > Do you know of any other userspace except your usecase? Is there > anything fundamental that would prevent a proper API adoption for you? > Yes, it would require us to go back in time and build patched binaries.
Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
On Wed, 17 Oct 2018, Michal Hocko wrote: > Do you know of any other userspace except your usecase? Is there > anything fundamental that would prevent a proper API adoption for you? > Yes, it would require us to go back in time and build patched binaries.
Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
On Tue 16-10-18 14:24:19, David Rientjes wrote: > On Tue, 16 Oct 2018, Michal Hocko wrote: > > > > I don't understand the point of extending smaps with yet another line. > > > > Because abusing a vma flag part is just wrong. What are you going to do > > when a next bug report states that the flag is set even though no > > userspace has set it and that leads to some malfunctioning? Can you rule > > that out? Even your abuse of the flag is surprising so why others > > wouldn't be? > > > > The flag has taken on the meaning of "thp disabled for this vma", how it > is set is not the scope of the flag. If a thp is explicitly disabled from > being eligible for thp, whether by madvise, prctl, or any future > mechanism, it should use VM_NOHUGEPAGE or show_smap_vma_flags() needs to > be modified. No, this is not the meaning which is documented nh - no-huge page advise flag and as far as I know it is only you who has complained so far. > > As I've said there are two things. Exporting PR_SET_THP_DISABLE to > > userspace so that a 3rd party process can query it. I've already > > explained why that might be useful. If you really insist on having > > a per-vma field then let's do it properly now. Are you going to agree on > > that? If yes, I am willing to spend my time on that but I am not going > > to bother if this will lead to "I want my vma field abuse anyway". > > I think what you and I want is largely irrelevant :) What's important is > that there are userspace implementations that query this today so > continuing to support it as the way to determine if a vma has been thp > disabled doesn't seem problematic and guarantees that userspace doesn't > break. Do you know of any other userspace except your usecase? Is there anything fundamental that would prevent a proper API adoption for you? -- Michal Hocko SUSE Labs
Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
On Tue 16-10-18 14:24:19, David Rientjes wrote: > On Tue, 16 Oct 2018, Michal Hocko wrote: > > > > I don't understand the point of extending smaps with yet another line. > > > > Because abusing a vma flag part is just wrong. What are you going to do > > when a next bug report states that the flag is set even though no > > userspace has set it and that leads to some malfunctioning? Can you rule > > that out? Even your abuse of the flag is surprising so why others > > wouldn't be? > > > > The flag has taken on the meaning of "thp disabled for this vma", how it > is set is not the scope of the flag. If a thp is explicitly disabled from > being eligible for thp, whether by madvise, prctl, or any future > mechanism, it should use VM_NOHUGEPAGE or show_smap_vma_flags() needs to > be modified. No, this is not the meaning which is documented nh - no-huge page advise flag and as far as I know it is only you who has complained so far. > > As I've said there are two things. Exporting PR_SET_THP_DISABLE to > > userspace so that a 3rd party process can query it. I've already > > explained why that might be useful. If you really insist on having > > a per-vma field then let's do it properly now. Are you going to agree on > > that? If yes, I am willing to spend my time on that but I am not going > > to bother if this will lead to "I want my vma field abuse anyway". > > I think what you and I want is largely irrelevant :) What's important is > that there are userspace implementations that query this today so > continuing to support it as the way to determine if a vma has been thp > disabled doesn't seem problematic and guarantees that userspace doesn't > break. Do you know of any other userspace except your usecase? Is there anything fundamental that would prevent a proper API adoption for you? -- Michal Hocko SUSE Labs
Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
On Tue, 16 Oct 2018, Michal Hocko wrote: > > I don't understand the point of extending smaps with yet another line. > > Because abusing a vma flag part is just wrong. What are you going to do > when a next bug report states that the flag is set even though no > userspace has set it and that leads to some malfunctioning? Can you rule > that out? Even your abuse of the flag is surprising so why others > wouldn't be? > The flag has taken on the meaning of "thp disabled for this vma", how it is set is not the scope of the flag. If a thp is explicitly disabled from being eligible for thp, whether by madvise, prctl, or any future mechanism, it should use VM_NOHUGEPAGE or show_smap_vma_flags() needs to be modified. I agree with you that this could have been done better if an interface was defined earlier that userspace could have used. PR_SET_THP_DISABLE was merged long after thp had already been merged so this can be a reminder that defining clean, robust, and extensible APIs is important, but I'm afraid we can't go back in time and change how userspace queries information, especially in cases where there was only one way to do it. > > The only way for a different process to determine if a single vma from > > another process is thp disabled is by the "nh" flag, so it is reasonable > > that userspace reads this. My patch fixes that. If smaps is extended > > with another line per your patch, it doesn't change the fact that previous > > binaries are built to check for "nh" so it does not deprecate that. > > ("THP_Enabled" is also ambiguous since it only refers to prctl and not the > > default thp setting or madvise.) > > As I've said there are two things. Exporting PR_SET_THP_DISABLE to > userspace so that a 3rd party process can query it. I've already > explained why that might be useful. If you really insist on having > a per-vma field then let's do it properly now. Are you going to agree on > that? If yes, I am willing to spend my time on that but I am not going > to bother if this will lead to "I want my vma field abuse anyway". I think what you and I want is largely irrelevant :) What's important is that there are userspace implementations that query this today so continuing to support it as the way to determine if a vma has been thp disabled doesn't seem problematic and guarantees that userspace doesn't break.
Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
On Tue, 16 Oct 2018, Michal Hocko wrote: > > I don't understand the point of extending smaps with yet another line. > > Because abusing a vma flag part is just wrong. What are you going to do > when a next bug report states that the flag is set even though no > userspace has set it and that leads to some malfunctioning? Can you rule > that out? Even your abuse of the flag is surprising so why others > wouldn't be? > The flag has taken on the meaning of "thp disabled for this vma", how it is set is not the scope of the flag. If a thp is explicitly disabled from being eligible for thp, whether by madvise, prctl, or any future mechanism, it should use VM_NOHUGEPAGE or show_smap_vma_flags() needs to be modified. I agree with you that this could have been done better if an interface was defined earlier that userspace could have used. PR_SET_THP_DISABLE was merged long after thp had already been merged so this can be a reminder that defining clean, robust, and extensible APIs is important, but I'm afraid we can't go back in time and change how userspace queries information, especially in cases where there was only one way to do it. > > The only way for a different process to determine if a single vma from > > another process is thp disabled is by the "nh" flag, so it is reasonable > > that userspace reads this. My patch fixes that. If smaps is extended > > with another line per your patch, it doesn't change the fact that previous > > binaries are built to check for "nh" so it does not deprecate that. > > ("THP_Enabled" is also ambiguous since it only refers to prctl and not the > > default thp setting or madvise.) > > As I've said there are two things. Exporting PR_SET_THP_DISABLE to > userspace so that a 3rd party process can query it. I've already > explained why that might be useful. If you really insist on having > a per-vma field then let's do it properly now. Are you going to agree on > that? If yes, I am willing to spend my time on that but I am not going > to bother if this will lead to "I want my vma field abuse anyway". I think what you and I want is largely irrelevant :) What's important is that there are userspace implementations that query this today so continuing to support it as the way to determine if a vma has been thp disabled doesn't seem problematic and guarantees that userspace doesn't break.
Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
On Mon 15-10-18 15:25:14, David Rientjes wrote: > On Mon, 15 Oct 2018, Michal Hocko wrote: > > > > > No, because the offending commit actually changed the precedence > > > > itself: > > > > PR_SET_THP_DISABLE used to be honored for future mappings and the > > > > commit > > > > changed that for all current mappings. > > > > > > Which is the actual and the full point of the fix as described in the > > > changelog. The original implementation was poor and inconsistent. > > > > > > > So as a result of the commit > > > > itself we would have had to change the documentation and userspace > > > > can't > > > > be expected to keep up with yet a fourth variable: kernel version. It > > > > really needs to be simpler, just a per-mapping specifier. > > > > > > As I've said, if you really need a per-vma granularity then make it a > > > dedicated line in the output with a clear semantic. Do not make VMA > > > flags even more confusing. > > > > Can we settle with something please? > > I don't understand the point of extending smaps with yet another line. Because abusing a vma flag part is just wrong. What are you going to do when a next bug report states that the flag is set even though no userspace has set it and that leads to some malfunctioning? Can you rule that out? Even your abuse of the flag is surprising so why others wouldn't be? > The only way for a different process to determine if a single vma from > another process is thp disabled is by the "nh" flag, so it is reasonable > that userspace reads this. My patch fixes that. If smaps is extended > with another line per your patch, it doesn't change the fact that previous > binaries are built to check for "nh" so it does not deprecate that. > ("THP_Enabled" is also ambiguous since it only refers to prctl and not the > default thp setting or madvise.) As I've said there are two things. Exporting PR_SET_THP_DISABLE to userspace so that a 3rd party process can query it. I've already explained why that might be useful. If you really insist on having a per-vma field then let's do it properly now. Are you going to agree on that? If yes, I am willing to spend my time on that but I am not going to bother if this will lead to "I want my vma field abuse anyway". -- Michal Hocko SUSE Labs
Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
On Mon 15-10-18 15:25:14, David Rientjes wrote: > On Mon, 15 Oct 2018, Michal Hocko wrote: > > > > > No, because the offending commit actually changed the precedence > > > > itself: > > > > PR_SET_THP_DISABLE used to be honored for future mappings and the > > > > commit > > > > changed that for all current mappings. > > > > > > Which is the actual and the full point of the fix as described in the > > > changelog. The original implementation was poor and inconsistent. > > > > > > > So as a result of the commit > > > > itself we would have had to change the documentation and userspace > > > > can't > > > > be expected to keep up with yet a fourth variable: kernel version. It > > > > really needs to be simpler, just a per-mapping specifier. > > > > > > As I've said, if you really need a per-vma granularity then make it a > > > dedicated line in the output with a clear semantic. Do not make VMA > > > flags even more confusing. > > > > Can we settle with something please? > > I don't understand the point of extending smaps with yet another line. Because abusing a vma flag part is just wrong. What are you going to do when a next bug report states that the flag is set even though no userspace has set it and that leads to some malfunctioning? Can you rule that out? Even your abuse of the flag is surprising so why others wouldn't be? > The only way for a different process to determine if a single vma from > another process is thp disabled is by the "nh" flag, so it is reasonable > that userspace reads this. My patch fixes that. If smaps is extended > with another line per your patch, it doesn't change the fact that previous > binaries are built to check for "nh" so it does not deprecate that. > ("THP_Enabled" is also ambiguous since it only refers to prctl and not the > default thp setting or madvise.) As I've said there are two things. Exporting PR_SET_THP_DISABLE to userspace so that a 3rd party process can query it. I've already explained why that might be useful. If you really insist on having a per-vma field then let's do it properly now. Are you going to agree on that? If yes, I am willing to spend my time on that but I am not going to bother if this will lead to "I want my vma field abuse anyway". -- Michal Hocko SUSE Labs
Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
On Mon, 15 Oct 2018, Michal Hocko wrote: > > > No, because the offending commit actually changed the precedence itself: > > > PR_SET_THP_DISABLE used to be honored for future mappings and the commit > > > changed that for all current mappings. > > > > Which is the actual and the full point of the fix as described in the > > changelog. The original implementation was poor and inconsistent. > > > > > So as a result of the commit > > > itself we would have had to change the documentation and userspace can't > > > be expected to keep up with yet a fourth variable: kernel version. It > > > really needs to be simpler, just a per-mapping specifier. > > > > As I've said, if you really need a per-vma granularity then make it a > > dedicated line in the output with a clear semantic. Do not make VMA > > flags even more confusing. > > Can we settle with something please? I don't understand the point of extending smaps with yet another line. The only way for a different process to determine if a single vma from another process is thp disabled is by the "nh" flag, so it is reasonable that userspace reads this. My patch fixes that. If smaps is extended with another line per your patch, it doesn't change the fact that previous binaries are built to check for "nh" so it does not deprecate that. ("THP_Enabled" is also ambiguous since it only refers to prctl and not the default thp setting or madvise.)
Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
On Mon, 15 Oct 2018, Michal Hocko wrote: > > > No, because the offending commit actually changed the precedence itself: > > > PR_SET_THP_DISABLE used to be honored for future mappings and the commit > > > changed that for all current mappings. > > > > Which is the actual and the full point of the fix as described in the > > changelog. The original implementation was poor and inconsistent. > > > > > So as a result of the commit > > > itself we would have had to change the documentation and userspace can't > > > be expected to keep up with yet a fourth variable: kernel version. It > > > really needs to be simpler, just a per-mapping specifier. > > > > As I've said, if you really need a per-vma granularity then make it a > > dedicated line in the output with a clear semantic. Do not make VMA > > flags even more confusing. > > Can we settle with something please? I don't understand the point of extending smaps with yet another line. The only way for a different process to determine if a single vma from another process is thp disabled is by the "nh" flag, so it is reasonable that userspace reads this. My patch fixes that. If smaps is extended with another line per your patch, it doesn't change the fact that previous binaries are built to check for "nh" so it does not deprecate that. ("THP_Enabled" is also ambiguous since it only refers to prctl and not the default thp setting or madvise.)
Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
On Tue 09-10-18 10:33:26, Michal Hocko wrote: > On Thu 04-10-18 11:34:11, David Rientjes wrote: > > On Thu, 4 Oct 2018, Michal Hocko wrote: > > > > > > And prior to the offending commit, there were three ways to control thp > > > > but two ways to determine if a mapping was eligible for thp based on > > > > the > > > > implementation detail of one of those ways. > > > > > > Yes, it is really unfortunate that we have ever allowed to leak such an > > > internal stuff like VMA flags to userspace. > > > > > > > Right, I don't like userspace dependencies on VmFlags in smaps myself, but > > it's the only way we have available that shows whether a single mapping is > > eligible to be backed by thp :/ > > Which is not the case due to reasons mentioned earlier. It only speaks > about madvise status on the VMA. > > > > > If there are three ways to > > > > control thp, userspace is still in the dark wrt which takes precedence > > > > over the other: we have PR_SET_THP_DISABLE but globally sysfs has it > > > > set > > > > to "always", or we have MADV_HUGEPAGE set per smaps but > > > > PR_SET_THP_DISABLE > > > > shown in /proc/pid/status, etc. > > > > > > > > Which one is the ultimate authority? > > > > > > Isn't our documentation good enough? If not then we should document it > > > properly. > > > > > > > No, because the offending commit actually changed the precedence itself: > > PR_SET_THP_DISABLE used to be honored for future mappings and the commit > > changed that for all current mappings. > > Which is the actual and the full point of the fix as described in the > changelog. The original implementation was poor and inconsistent. > > > So as a result of the commit > > itself we would have had to change the documentation and userspace can't > > be expected to keep up with yet a fourth variable: kernel version. It > > really needs to be simpler, just a per-mapping specifier. > > As I've said, if you really need a per-vma granularity then make it a > dedicated line in the output with a clear semantic. Do not make VMA > flags even more confusing. Can we settle with something please? -- Michal Hocko SUSE Labs
Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
On Tue 09-10-18 10:33:26, Michal Hocko wrote: > On Thu 04-10-18 11:34:11, David Rientjes wrote: > > On Thu, 4 Oct 2018, Michal Hocko wrote: > > > > > > And prior to the offending commit, there were three ways to control thp > > > > but two ways to determine if a mapping was eligible for thp based on > > > > the > > > > implementation detail of one of those ways. > > > > > > Yes, it is really unfortunate that we have ever allowed to leak such an > > > internal stuff like VMA flags to userspace. > > > > > > > Right, I don't like userspace dependencies on VmFlags in smaps myself, but > > it's the only way we have available that shows whether a single mapping is > > eligible to be backed by thp :/ > > Which is not the case due to reasons mentioned earlier. It only speaks > about madvise status on the VMA. > > > > > If there are three ways to > > > > control thp, userspace is still in the dark wrt which takes precedence > > > > over the other: we have PR_SET_THP_DISABLE but globally sysfs has it > > > > set > > > > to "always", or we have MADV_HUGEPAGE set per smaps but > > > > PR_SET_THP_DISABLE > > > > shown in /proc/pid/status, etc. > > > > > > > > Which one is the ultimate authority? > > > > > > Isn't our documentation good enough? If not then we should document it > > > properly. > > > > > > > No, because the offending commit actually changed the precedence itself: > > PR_SET_THP_DISABLE used to be honored for future mappings and the commit > > changed that for all current mappings. > > Which is the actual and the full point of the fix as described in the > changelog. The original implementation was poor and inconsistent. > > > So as a result of the commit > > itself we would have had to change the documentation and userspace can't > > be expected to keep up with yet a fourth variable: kernel version. It > > really needs to be simpler, just a per-mapping specifier. > > As I've said, if you really need a per-vma granularity then make it a > dedicated line in the output with a clear semantic. Do not make VMA > flags even more confusing. Can we settle with something please? -- Michal Hocko SUSE Labs
Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
On Thu 04-10-18 11:34:11, David Rientjes wrote: > On Thu, 4 Oct 2018, Michal Hocko wrote: > > > > And prior to the offending commit, there were three ways to control thp > > > but two ways to determine if a mapping was eligible for thp based on the > > > implementation detail of one of those ways. > > > > Yes, it is really unfortunate that we have ever allowed to leak such an > > internal stuff like VMA flags to userspace. > > > > Right, I don't like userspace dependencies on VmFlags in smaps myself, but > it's the only way we have available that shows whether a single mapping is > eligible to be backed by thp :/ Which is not the case due to reasons mentioned earlier. It only speaks about madvise status on the VMA. > > > If there are three ways to > > > control thp, userspace is still in the dark wrt which takes precedence > > > over the other: we have PR_SET_THP_DISABLE but globally sysfs has it set > > > to "always", or we have MADV_HUGEPAGE set per smaps but > > > PR_SET_THP_DISABLE > > > shown in /proc/pid/status, etc. > > > > > > Which one is the ultimate authority? > > > > Isn't our documentation good enough? If not then we should document it > > properly. > > > > No, because the offending commit actually changed the precedence itself: > PR_SET_THP_DISABLE used to be honored for future mappings and the commit > changed that for all current mappings. Which is the actual and the full point of the fix as described in the changelog. The original implementation was poor and inconsistent. > So as a result of the commit > itself we would have had to change the documentation and userspace can't > be expected to keep up with yet a fourth variable: kernel version. It > really needs to be simpler, just a per-mapping specifier. As I've said, if you really need a per-vma granularity then make it a dedicated line in the output with a clear semantic. Do not make VMA flags even more confusing. -- Michal Hocko SUSE Labs
Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
On Thu 04-10-18 11:34:11, David Rientjes wrote: > On Thu, 4 Oct 2018, Michal Hocko wrote: > > > > And prior to the offending commit, there were three ways to control thp > > > but two ways to determine if a mapping was eligible for thp based on the > > > implementation detail of one of those ways. > > > > Yes, it is really unfortunate that we have ever allowed to leak such an > > internal stuff like VMA flags to userspace. > > > > Right, I don't like userspace dependencies on VmFlags in smaps myself, but > it's the only way we have available that shows whether a single mapping is > eligible to be backed by thp :/ Which is not the case due to reasons mentioned earlier. It only speaks about madvise status on the VMA. > > > If there are three ways to > > > control thp, userspace is still in the dark wrt which takes precedence > > > over the other: we have PR_SET_THP_DISABLE but globally sysfs has it set > > > to "always", or we have MADV_HUGEPAGE set per smaps but > > > PR_SET_THP_DISABLE > > > shown in /proc/pid/status, etc. > > > > > > Which one is the ultimate authority? > > > > Isn't our documentation good enough? If not then we should document it > > properly. > > > > No, because the offending commit actually changed the precedence itself: > PR_SET_THP_DISABLE used to be honored for future mappings and the commit > changed that for all current mappings. Which is the actual and the full point of the fix as described in the changelog. The original implementation was poor and inconsistent. > So as a result of the commit > itself we would have had to change the documentation and userspace can't > be expected to keep up with yet a fourth variable: kernel version. It > really needs to be simpler, just a per-mapping specifier. As I've said, if you really need a per-vma granularity then make it a dedicated line in the output with a clear semantic. Do not make VMA flags even more confusing. -- Michal Hocko SUSE Labs
Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
On Thu, 4 Oct 2018, Michal Hocko wrote: > > And prior to the offending commit, there were three ways to control thp > > but two ways to determine if a mapping was eligible for thp based on the > > implementation detail of one of those ways. > > Yes, it is really unfortunate that we have ever allowed to leak such an > internal stuff like VMA flags to userspace. > Right, I don't like userspace dependencies on VmFlags in smaps myself, but it's the only way we have available that shows whether a single mapping is eligible to be backed by thp :/ > > If there are three ways to > > control thp, userspace is still in the dark wrt which takes precedence > > over the other: we have PR_SET_THP_DISABLE but globally sysfs has it set > > to "always", or we have MADV_HUGEPAGE set per smaps but PR_SET_THP_DISABLE > > shown in /proc/pid/status, etc. > > > > Which one is the ultimate authority? > > Isn't our documentation good enough? If not then we should document it > properly. > No, because the offending commit actually changed the precedence itself: PR_SET_THP_DISABLE used to be honored for future mappings and the commit changed that for all current mappings. So as a result of the commit itself we would have had to change the documentation and userspace can't be expected to keep up with yet a fourth variable: kernel version. It really needs to be simpler, just a per-mapping specifier.
Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
On Thu, 4 Oct 2018, Michal Hocko wrote: > > And prior to the offending commit, there were three ways to control thp > > but two ways to determine if a mapping was eligible for thp based on the > > implementation detail of one of those ways. > > Yes, it is really unfortunate that we have ever allowed to leak such an > internal stuff like VMA flags to userspace. > Right, I don't like userspace dependencies on VmFlags in smaps myself, but it's the only way we have available that shows whether a single mapping is eligible to be backed by thp :/ > > If there are three ways to > > control thp, userspace is still in the dark wrt which takes precedence > > over the other: we have PR_SET_THP_DISABLE but globally sysfs has it set > > to "always", or we have MADV_HUGEPAGE set per smaps but PR_SET_THP_DISABLE > > shown in /proc/pid/status, etc. > > > > Which one is the ultimate authority? > > Isn't our documentation good enough? If not then we should document it > properly. > No, because the offending commit actually changed the precedence itself: PR_SET_THP_DISABLE used to be honored for future mappings and the commit changed that for all current mappings. So as a result of the commit itself we would have had to change the documentation and userspace can't be expected to keep up with yet a fourth variable: kernel version. It really needs to be simpler, just a per-mapping specifier.
Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
On Thu 04-10-18 02:15:38, David Rientjes wrote: > On Thu, 4 Oct 2018, Michal Hocko wrote: > > > > > > > So how about this? (not tested yet but it should be pretty > > > > > > straightforward) > > > > > > > > > > Umm, prctl(PR_GET_THP_DISABLE)? > > > > > > > > /me confused. I thought you want to query for the flag on a > > > > _different_ process. > > > > > > Why would we want to check three locations (system wide setting, prctl > > > setting, madvise setting) to determine if a heap can be backed by thp? > > > > Because we simply have 3 different ways to control THP? Is this a real > > problem? > > > > And prior to the offending commit, there were three ways to control thp > but two ways to determine if a mapping was eligible for thp based on the > implementation detail of one of those ways. Yes, it is really unfortunate that we have ever allowed to leak such an internal stuff like VMA flags to userspace. > If there are three ways to > control thp, userspace is still in the dark wrt which takes precedence > over the other: we have PR_SET_THP_DISABLE but globally sysfs has it set > to "always", or we have MADV_HUGEPAGE set per smaps but PR_SET_THP_DISABLE > shown in /proc/pid/status, etc. > > Which one is the ultimate authority? Isn't our documentation good enough? If not then we should document it properly. > There's one way to specify it: in a > single per-mapping location that reveals whether that mapping is eligible > for thp or not. So I think it would be a very sane extension so that > smaps reveals if a mapping can be backed by hugepages or not depending on > the helper function thp uses itself to determine if it can fault > hugepages. I don't think we should have three locations to check and then > try to resolve which one takes precedence over the other for each > userspace implementation (and perhaps how the kernel implementation > evolves). But we really have three different ways to disable thp. Which one has caused the end result might be interesting/important because different entities might be under control. You either have to contact your admin for the global one, or whomever has launched you for the prctl thing. So the distinction might be important. Checking 3 different places and the precedence rules is not really trivial but I do not see any reason why this couldn't be implemented in a library so the user doesn't really have to scratch head. If you really insist to have per-vma thing then all right but do not conflate vma flags and the higher level logic and make it its own line in the smaps output and make sure it reports only THP able VMAs. -- Michal Hocko SUSE Labs
Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
On Thu 04-10-18 02:15:38, David Rientjes wrote: > On Thu, 4 Oct 2018, Michal Hocko wrote: > > > > > > > So how about this? (not tested yet but it should be pretty > > > > > > straightforward) > > > > > > > > > > Umm, prctl(PR_GET_THP_DISABLE)? > > > > > > > > /me confused. I thought you want to query for the flag on a > > > > _different_ process. > > > > > > Why would we want to check three locations (system wide setting, prctl > > > setting, madvise setting) to determine if a heap can be backed by thp? > > > > Because we simply have 3 different ways to control THP? Is this a real > > problem? > > > > And prior to the offending commit, there were three ways to control thp > but two ways to determine if a mapping was eligible for thp based on the > implementation detail of one of those ways. Yes, it is really unfortunate that we have ever allowed to leak such an internal stuff like VMA flags to userspace. > If there are three ways to > control thp, userspace is still in the dark wrt which takes precedence > over the other: we have PR_SET_THP_DISABLE but globally sysfs has it set > to "always", or we have MADV_HUGEPAGE set per smaps but PR_SET_THP_DISABLE > shown in /proc/pid/status, etc. > > Which one is the ultimate authority? Isn't our documentation good enough? If not then we should document it properly. > There's one way to specify it: in a > single per-mapping location that reveals whether that mapping is eligible > for thp or not. So I think it would be a very sane extension so that > smaps reveals if a mapping can be backed by hugepages or not depending on > the helper function thp uses itself to determine if it can fault > hugepages. I don't think we should have three locations to check and then > try to resolve which one takes precedence over the other for each > userspace implementation (and perhaps how the kernel implementation > evolves). But we really have three different ways to disable thp. Which one has caused the end result might be interesting/important because different entities might be under control. You either have to contact your admin for the global one, or whomever has launched you for the prctl thing. So the distinction might be important. Checking 3 different places and the precedence rules is not really trivial but I do not see any reason why this couldn't be implemented in a library so the user doesn't really have to scratch head. If you really insist to have per-vma thing then all right but do not conflate vma flags and the higher level logic and make it its own line in the smaps output and make sure it reports only THP able VMAs. -- Michal Hocko SUSE Labs
Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
On Thu, 4 Oct 2018, Michal Hocko wrote: > > > > > So how about this? (not tested yet but it should be pretty > > > > > straightforward) > > > > > > > > Umm, prctl(PR_GET_THP_DISABLE)? > > > > > > /me confused. I thought you want to query for the flag on a > > > _different_ process. > > > > Why would we want to check three locations (system wide setting, prctl > > setting, madvise setting) to determine if a heap can be backed by thp? > > Because we simply have 3 different ways to control THP? Is this a real > problem? > And prior to the offending commit, there were three ways to control thp but two ways to determine if a mapping was eligible for thp based on the implementation detail of one of those ways. If there are three ways to control thp, userspace is still in the dark wrt which takes precedence over the other: we have PR_SET_THP_DISABLE but globally sysfs has it set to "always", or we have MADV_HUGEPAGE set per smaps but PR_SET_THP_DISABLE shown in /proc/pid/status, etc. Which one is the ultimate authority? There's one way to specify it: in a single per-mapping location that reveals whether that mapping is eligible for thp or not. So I think it would be a very sane extension so that smaps reveals if a mapping can be backed by hugepages or not depending on the helper function thp uses itself to determine if it can fault hugepages. I don't think we should have three locations to check and then try to resolve which one takes precedence over the other for each userspace implementation (and perhaps how the kernel implementation evolves).
Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
On Thu, 4 Oct 2018, Michal Hocko wrote: > > > > > So how about this? (not tested yet but it should be pretty > > > > > straightforward) > > > > > > > > Umm, prctl(PR_GET_THP_DISABLE)? > > > > > > /me confused. I thought you want to query for the flag on a > > > _different_ process. > > > > Why would we want to check three locations (system wide setting, prctl > > setting, madvise setting) to determine if a heap can be backed by thp? > > Because we simply have 3 different ways to control THP? Is this a real > problem? > And prior to the offending commit, there were three ways to control thp but two ways to determine if a mapping was eligible for thp based on the implementation detail of one of those ways. If there are three ways to control thp, userspace is still in the dark wrt which takes precedence over the other: we have PR_SET_THP_DISABLE but globally sysfs has it set to "always", or we have MADV_HUGEPAGE set per smaps but PR_SET_THP_DISABLE shown in /proc/pid/status, etc. Which one is the ultimate authority? There's one way to specify it: in a single per-mapping location that reveals whether that mapping is eligible for thp or not. So I think it would be a very sane extension so that smaps reveals if a mapping can be backed by hugepages or not depending on the helper function thp uses itself to determine if it can fault hugepages. I don't think we should have three locations to check and then try to resolve which one takes precedence over the other for each userspace implementation (and perhaps how the kernel implementation evolves).
Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
On Wed 03-10-18 15:51:05, David Rientjes wrote: > On Wed, 3 Oct 2018, Michal Hocko wrote: > > > > > So how about this? (not tested yet but it should be pretty > > > > straightforward) > > > > > > Umm, prctl(PR_GET_THP_DISABLE)? > > > > /me confused. I thought you want to query for the flag on a > > _different_ process. > > Why would we want to check three locations (system wide setting, prctl > setting, madvise setting) to determine if a heap can be backed by thp? Because we simply have 3 different ways to control THP? Is this a real problem? > If the nh flag being exported to VmFlag is to be extended beyond what my > patch did, I suggest (1) it does it for the system wide setting as well > and/or (2) calling a helper function to determine if the vma could be > backed by thp in the first place regardless of any setting to determine if > nh/hg is important. > > The last thing I suggest is done is adding a third place to check. But conflating the three ways into a single exported symbol (be it nh or something else) just makes the api more confusing longterm. I am pretty sure we have made that mistake in the past already. What if somebody really wants to check for PR_SET_THP_DISABLE? There is currently no way to do that on a remote process right now AFAICS. So it makes sense to export the state in general. Any exported API should be about consistency. If you want to combine all three checks then just do that in the userspace or in a library function. -- Michal Hocko SUSE Labs
Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
On Wed 03-10-18 15:51:05, David Rientjes wrote: > On Wed, 3 Oct 2018, Michal Hocko wrote: > > > > > So how about this? (not tested yet but it should be pretty > > > > straightforward) > > > > > > Umm, prctl(PR_GET_THP_DISABLE)? > > > > /me confused. I thought you want to query for the flag on a > > _different_ process. > > Why would we want to check three locations (system wide setting, prctl > setting, madvise setting) to determine if a heap can be backed by thp? Because we simply have 3 different ways to control THP? Is this a real problem? > If the nh flag being exported to VmFlag is to be extended beyond what my > patch did, I suggest (1) it does it for the system wide setting as well > and/or (2) calling a helper function to determine if the vma could be > backed by thp in the first place regardless of any setting to determine if > nh/hg is important. > > The last thing I suggest is done is adding a third place to check. But conflating the three ways into a single exported symbol (be it nh or something else) just makes the api more confusing longterm. I am pretty sure we have made that mistake in the past already. What if somebody really wants to check for PR_SET_THP_DISABLE? There is currently no way to do that on a remote process right now AFAICS. So it makes sense to export the state in general. Any exported API should be about consistency. If you want to combine all three checks then just do that in the userspace or in a library function. -- Michal Hocko SUSE Labs
Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
On Wed, 3 Oct 2018, Michal Hocko wrote: > > > So how about this? (not tested yet but it should be pretty > > > straightforward) > > > > Umm, prctl(PR_GET_THP_DISABLE)? > > /me confused. I thought you want to query for the flag on a > _different_ process. Why would we want to check three locations (system wide setting, prctl setting, madvise setting) to determine if a heap can be backed by thp? If the nh flag being exported to VmFlag is to be extended beyond what my patch did, I suggest (1) it does it for the system wide setting as well and/or (2) calling a helper function to determine if the vma could be backed by thp in the first place regardless of any setting to determine if nh/hg is important. The last thing I suggest is done is adding a third place to check.
Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
On Wed, 3 Oct 2018, Michal Hocko wrote: > > > So how about this? (not tested yet but it should be pretty > > > straightforward) > > > > Umm, prctl(PR_GET_THP_DISABLE)? > > /me confused. I thought you want to query for the flag on a > _different_ process. Why would we want to check three locations (system wide setting, prctl setting, madvise setting) to determine if a heap can be backed by thp? If the nh flag being exported to VmFlag is to be extended beyond what my patch did, I suggest (1) it does it for the system wide setting as well and/or (2) calling a helper function to determine if the vma could be backed by thp in the first place regardless of any setting to determine if nh/hg is important. The last thing I suggest is done is adding a third place to check.
Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
On Tue, Oct 02, 2018 at 01:29:42PM -0700, David Rientjes wrote: > On Tue, 2 Oct 2018, Michal Hocko wrote: > > > On Wed 26-09-18 08:06:24, Michal Hocko wrote: > > > On Tue 25-09-18 15:04:06, Andrew Morton wrote: > > > > On Tue, 25 Sep 2018 14:45:19 -0700 (PDT) David Rientjes > > > > wrote: > > > > > > > > > > > It is also used in > > > > > > > automated testing to ensure that vmas get disabled for thp > > > > > > > appropriately > > > > > > > and we used "nh" since that is how PR_SET_THP_DISABLE previously > > > > > > > enforced > > > > > > > this, and those tests now break. > > > > > > > > > > > > This sounds like a bit of an abuse to me. It shows how an internal > > > > > > implementation detail leaks out to the userspace which is something > > > > > > we > > > > > > should try to avoid. > > > > > > > > > > > > > > > > Well, it's already how this has worked for years before commit > > > > > 1860033237d4 broke it. Changing the implementation in the kernel is > > > > > fine > > > > > as long as you don't break userspace who relies on what is exported > > > > > to it > > > > > and is the only way to determine if MADV_NOHUGEPAGE is preventing it > > > > > from > > > > > being backed by hugepages. > > > > > > > > 1860033237d4 was over a year ago so perhaps we don't need to be > > > > too worried about restoring the old interface. In which case > > > > we have an opportunity to make improvements such as that suggested > > > > by Michal? > > > > > > Yeah, can we add a way to export PR_SET_THP_DISABLE to userspace > > > somehow? E.g. /proc//status. It is a process wide thing so > > > reporting it per VMA sounds strange at best. > > > > So how about this? (not tested yet but it should be pretty > > straightforward) > > Umm, prctl(PR_GET_THP_DISABLE)? > ~/git/linux$ git grep PR_GET_THP_DISABLE include/uapi/linux/prctl.h:#define PR_GET_THP_DISABLE 42 kernel/sys.c: case PR_GET_THP_DISABLE: tools/include/uapi/linux/prctl.h:#define PR_GET_THP_DISABLE 42 -- Sincerely yours, Mike.
Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
On Tue, Oct 02, 2018 at 01:29:42PM -0700, David Rientjes wrote: > On Tue, 2 Oct 2018, Michal Hocko wrote: > > > On Wed 26-09-18 08:06:24, Michal Hocko wrote: > > > On Tue 25-09-18 15:04:06, Andrew Morton wrote: > > > > On Tue, 25 Sep 2018 14:45:19 -0700 (PDT) David Rientjes > > > > wrote: > > > > > > > > > > > It is also used in > > > > > > > automated testing to ensure that vmas get disabled for thp > > > > > > > appropriately > > > > > > > and we used "nh" since that is how PR_SET_THP_DISABLE previously > > > > > > > enforced > > > > > > > this, and those tests now break. > > > > > > > > > > > > This sounds like a bit of an abuse to me. It shows how an internal > > > > > > implementation detail leaks out to the userspace which is something > > > > > > we > > > > > > should try to avoid. > > > > > > > > > > > > > > > > Well, it's already how this has worked for years before commit > > > > > 1860033237d4 broke it. Changing the implementation in the kernel is > > > > > fine > > > > > as long as you don't break userspace who relies on what is exported > > > > > to it > > > > > and is the only way to determine if MADV_NOHUGEPAGE is preventing it > > > > > from > > > > > being backed by hugepages. > > > > > > > > 1860033237d4 was over a year ago so perhaps we don't need to be > > > > too worried about restoring the old interface. In which case > > > > we have an opportunity to make improvements such as that suggested > > > > by Michal? > > > > > > Yeah, can we add a way to export PR_SET_THP_DISABLE to userspace > > > somehow? E.g. /proc//status. It is a process wide thing so > > > reporting it per VMA sounds strange at best. > > > > So how about this? (not tested yet but it should be pretty > > straightforward) > > Umm, prctl(PR_GET_THP_DISABLE)? > ~/git/linux$ git grep PR_GET_THP_DISABLE include/uapi/linux/prctl.h:#define PR_GET_THP_DISABLE 42 kernel/sys.c: case PR_GET_THP_DISABLE: tools/include/uapi/linux/prctl.h:#define PR_GET_THP_DISABLE 42 -- Sincerely yours, Mike.
Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
On Tue 02-10-18 13:29:42, David Rientjes wrote: > On Tue, 2 Oct 2018, Michal Hocko wrote: > > > On Wed 26-09-18 08:06:24, Michal Hocko wrote: > > > On Tue 25-09-18 15:04:06, Andrew Morton wrote: > > > > On Tue, 25 Sep 2018 14:45:19 -0700 (PDT) David Rientjes > > > > wrote: > > > > > > > > > > > It is also used in > > > > > > > automated testing to ensure that vmas get disabled for thp > > > > > > > appropriately > > > > > > > and we used "nh" since that is how PR_SET_THP_DISABLE previously > > > > > > > enforced > > > > > > > this, and those tests now break. > > > > > > > > > > > > This sounds like a bit of an abuse to me. It shows how an internal > > > > > > implementation detail leaks out to the userspace which is something > > > > > > we > > > > > > should try to avoid. > > > > > > > > > > > > > > > > Well, it's already how this has worked for years before commit > > > > > 1860033237d4 broke it. Changing the implementation in the kernel is > > > > > fine > > > > > as long as you don't break userspace who relies on what is exported > > > > > to it > > > > > and is the only way to determine if MADV_NOHUGEPAGE is preventing it > > > > > from > > > > > being backed by hugepages. > > > > > > > > 1860033237d4 was over a year ago so perhaps we don't need to be > > > > too worried about restoring the old interface. In which case > > > > we have an opportunity to make improvements such as that suggested > > > > by Michal? > > > > > > Yeah, can we add a way to export PR_SET_THP_DISABLE to userspace > > > somehow? E.g. /proc//status. It is a process wide thing so > > > reporting it per VMA sounds strange at best. > > > > So how about this? (not tested yet but it should be pretty > > straightforward) > > Umm, prctl(PR_GET_THP_DISABLE)? /me confused. I thought you want to query for the flag on a _different_ process. -- Michal Hocko SUSE Labs
Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
On Tue 02-10-18 13:29:42, David Rientjes wrote: > On Tue, 2 Oct 2018, Michal Hocko wrote: > > > On Wed 26-09-18 08:06:24, Michal Hocko wrote: > > > On Tue 25-09-18 15:04:06, Andrew Morton wrote: > > > > On Tue, 25 Sep 2018 14:45:19 -0700 (PDT) David Rientjes > > > > wrote: > > > > > > > > > > > It is also used in > > > > > > > automated testing to ensure that vmas get disabled for thp > > > > > > > appropriately > > > > > > > and we used "nh" since that is how PR_SET_THP_DISABLE previously > > > > > > > enforced > > > > > > > this, and those tests now break. > > > > > > > > > > > > This sounds like a bit of an abuse to me. It shows how an internal > > > > > > implementation detail leaks out to the userspace which is something > > > > > > we > > > > > > should try to avoid. > > > > > > > > > > > > > > > > Well, it's already how this has worked for years before commit > > > > > 1860033237d4 broke it. Changing the implementation in the kernel is > > > > > fine > > > > > as long as you don't break userspace who relies on what is exported > > > > > to it > > > > > and is the only way to determine if MADV_NOHUGEPAGE is preventing it > > > > > from > > > > > being backed by hugepages. > > > > > > > > 1860033237d4 was over a year ago so perhaps we don't need to be > > > > too worried about restoring the old interface. In which case > > > > we have an opportunity to make improvements such as that suggested > > > > by Michal? > > > > > > Yeah, can we add a way to export PR_SET_THP_DISABLE to userspace > > > somehow? E.g. /proc//status. It is a process wide thing so > > > reporting it per VMA sounds strange at best. > > > > So how about this? (not tested yet but it should be pretty > > straightforward) > > Umm, prctl(PR_GET_THP_DISABLE)? /me confused. I thought you want to query for the flag on a _different_ process. -- Michal Hocko SUSE Labs
Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
On Tue, 2 Oct 2018, Michal Hocko wrote: > On Wed 26-09-18 08:06:24, Michal Hocko wrote: > > On Tue 25-09-18 15:04:06, Andrew Morton wrote: > > > On Tue, 25 Sep 2018 14:45:19 -0700 (PDT) David Rientjes > > > wrote: > > > > > > > > > It is also used in > > > > > > automated testing to ensure that vmas get disabled for thp > > > > > > appropriately > > > > > > and we used "nh" since that is how PR_SET_THP_DISABLE previously > > > > > > enforced > > > > > > this, and those tests now break. > > > > > > > > > > This sounds like a bit of an abuse to me. It shows how an internal > > > > > implementation detail leaks out to the userspace which is something we > > > > > should try to avoid. > > > > > > > > > > > > > Well, it's already how this has worked for years before commit > > > > 1860033237d4 broke it. Changing the implementation in the kernel is > > > > fine > > > > as long as you don't break userspace who relies on what is exported to > > > > it > > > > and is the only way to determine if MADV_NOHUGEPAGE is preventing it > > > > from > > > > being backed by hugepages. > > > > > > 1860033237d4 was over a year ago so perhaps we don't need to be > > > too worried about restoring the old interface. In which case > > > we have an opportunity to make improvements such as that suggested > > > by Michal? > > > > Yeah, can we add a way to export PR_SET_THP_DISABLE to userspace > > somehow? E.g. /proc//status. It is a process wide thing so > > reporting it per VMA sounds strange at best. > > So how about this? (not tested yet but it should be pretty > straightforward) Umm, prctl(PR_GET_THP_DISABLE)?
Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
On Tue, 2 Oct 2018, Michal Hocko wrote: > On Wed 26-09-18 08:06:24, Michal Hocko wrote: > > On Tue 25-09-18 15:04:06, Andrew Morton wrote: > > > On Tue, 25 Sep 2018 14:45:19 -0700 (PDT) David Rientjes > > > wrote: > > > > > > > > > It is also used in > > > > > > automated testing to ensure that vmas get disabled for thp > > > > > > appropriately > > > > > > and we used "nh" since that is how PR_SET_THP_DISABLE previously > > > > > > enforced > > > > > > this, and those tests now break. > > > > > > > > > > This sounds like a bit of an abuse to me. It shows how an internal > > > > > implementation detail leaks out to the userspace which is something we > > > > > should try to avoid. > > > > > > > > > > > > > Well, it's already how this has worked for years before commit > > > > 1860033237d4 broke it. Changing the implementation in the kernel is > > > > fine > > > > as long as you don't break userspace who relies on what is exported to > > > > it > > > > and is the only way to determine if MADV_NOHUGEPAGE is preventing it > > > > from > > > > being backed by hugepages. > > > > > > 1860033237d4 was over a year ago so perhaps we don't need to be > > > too worried about restoring the old interface. In which case > > > we have an opportunity to make improvements such as that suggested > > > by Michal? > > > > Yeah, can we add a way to export PR_SET_THP_DISABLE to userspace > > somehow? E.g. /proc//status. It is a process wide thing so > > reporting it per VMA sounds strange at best. > > So how about this? (not tested yet but it should be pretty > straightforward) Umm, prctl(PR_GET_THP_DISABLE)?
[RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
On Wed 26-09-18 08:06:24, Michal Hocko wrote: > On Tue 25-09-18 15:04:06, Andrew Morton wrote: > > On Tue, 25 Sep 2018 14:45:19 -0700 (PDT) David Rientjes > > wrote: > > > > > > > It is also used in > > > > > automated testing to ensure that vmas get disabled for thp > > > > > appropriately > > > > > and we used "nh" since that is how PR_SET_THP_DISABLE previously > > > > > enforced > > > > > this, and those tests now break. > > > > > > > > This sounds like a bit of an abuse to me. It shows how an internal > > > > implementation detail leaks out to the userspace which is something we > > > > should try to avoid. > > > > > > > > > > Well, it's already how this has worked for years before commit > > > 1860033237d4 broke it. Changing the implementation in the kernel is fine > > > as long as you don't break userspace who relies on what is exported to it > > > and is the only way to determine if MADV_NOHUGEPAGE is preventing it from > > > being backed by hugepages. > > > > 1860033237d4 was over a year ago so perhaps we don't need to be > > too worried about restoring the old interface. In which case > > we have an opportunity to make improvements such as that suggested > > by Michal? > > Yeah, can we add a way to export PR_SET_THP_DISABLE to userspace > somehow? E.g. /proc//status. It is a process wide thing so > reporting it per VMA sounds strange at best. So how about this? (not tested yet but it should be pretty straightforward) --- >From 048b29102de326900b54cce78b614345cd77a230 Mon Sep 17 00:00:00 2001 From: Michal Hocko Date: Tue, 2 Oct 2018 10:53:48 +0200 Subject: [PATCH] mm, proc: report PR_SET_THP_DISABLE in proc David Rientjes has reported that 1860033237d4 ("mm: make PR_SET_THP_DISABLE immediately active") has changed the way how we report THPable VMAs to the userspace. Their monitoring tool is triggering false alarms on PR_SET_THP_DISABLE tasks because it considers an insufficient THP usage as a memory fragmentation resp. memory pressure issue. Before the said commit each newly created VMA inherited VM_NOHUGEPAGE flag and that got exposed to the userspace via /proc//smaps file. This implementation had its downsides as explained in the commit message but it is true that the userspace doesn't have any means to query for the process wide THP enabled/disabled status. PR_SET_THP_DISABLE is a process wide flag so it makes a lot of sense to export in the process wide context rather than per-vma. Introduce a new field to /proc//status which export this status. If PR_SET_THP_DISABLE is used the it reports false same as when the THP is not compiled in. It doesn't consider the global THP status because we already export that information via sysfs Fixes: 1860033237d4 ("mm: make PR_SET_THP_DISABLE immediately active") Signed-off-by: Michal Hocko --- Documentation/filesystems/proc.txt | 3 +++ fs/proc/array.c| 10 ++ 2 files changed, 13 insertions(+) diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index 22b4b00dee31..bafa5cb1685a 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -182,6 +182,7 @@ For example, to get the status information of a process, all you have to do is VmSwap:0 kB HugetlbPages: 0 kB CoreDumping:0 + THP_enabled: 1 Threads:1 SigQ: 0/28578 SigPnd: @@ -256,6 +257,8 @@ Table 1-2: Contents of the status files (as of 4.8) HugetlbPagessize of hugetlb memory portions CoreDumping process's memory is currently being dumped (killing the process may lead to a corrupted core) + THP_enabledprocess is allowed to use THP (returns 0 when +PR_SET_THP_DISABLE is set on the process Threads number of threads SigQnumber of signals queued/max. number for queue SigPnd bitmap of pending signals for the thread diff --git a/fs/proc/array.c b/fs/proc/array.c index 0ceb3b6b37e7..9d428d5a0ac8 100644 --- a/fs/proc/array.c +++ b/fs/proc/array.c @@ -392,6 +392,15 @@ static inline void task_core_dumping(struct seq_file *m, struct mm_struct *mm) seq_putc(m, '\n'); } +static inline void task_thp_status(struct seq_file *m, struct mm_struct *mm) +{ + bool thp_enabled = IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE); + + if (thp_enabled) + thp_enabled = !test_bit(MMF_DISABLE_THP, >flags); + seq_printf(m, "THP_enabled:\t%d\n", thp_enabled); +} + int proc_pid_status(struct seq_file *m, struct pid_namespace *ns, struct pid *pid, struct task_struct *task) { @@ -406,6 +415,7 @@ int proc_pid_status(struct seq_file *m, struct pid_namespace *ns, if (mm) { task_mem(m, mm); task_core_dumping(m, mm); + task_thp_status(m, mm);
[RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
On Wed 26-09-18 08:06:24, Michal Hocko wrote: > On Tue 25-09-18 15:04:06, Andrew Morton wrote: > > On Tue, 25 Sep 2018 14:45:19 -0700 (PDT) David Rientjes > > wrote: > > > > > > > It is also used in > > > > > automated testing to ensure that vmas get disabled for thp > > > > > appropriately > > > > > and we used "nh" since that is how PR_SET_THP_DISABLE previously > > > > > enforced > > > > > this, and those tests now break. > > > > > > > > This sounds like a bit of an abuse to me. It shows how an internal > > > > implementation detail leaks out to the userspace which is something we > > > > should try to avoid. > > > > > > > > > > Well, it's already how this has worked for years before commit > > > 1860033237d4 broke it. Changing the implementation in the kernel is fine > > > as long as you don't break userspace who relies on what is exported to it > > > and is the only way to determine if MADV_NOHUGEPAGE is preventing it from > > > being backed by hugepages. > > > > 1860033237d4 was over a year ago so perhaps we don't need to be > > too worried about restoring the old interface. In which case > > we have an opportunity to make improvements such as that suggested > > by Michal? > > Yeah, can we add a way to export PR_SET_THP_DISABLE to userspace > somehow? E.g. /proc//status. It is a process wide thing so > reporting it per VMA sounds strange at best. So how about this? (not tested yet but it should be pretty straightforward) --- >From 048b29102de326900b54cce78b614345cd77a230 Mon Sep 17 00:00:00 2001 From: Michal Hocko Date: Tue, 2 Oct 2018 10:53:48 +0200 Subject: [PATCH] mm, proc: report PR_SET_THP_DISABLE in proc David Rientjes has reported that 1860033237d4 ("mm: make PR_SET_THP_DISABLE immediately active") has changed the way how we report THPable VMAs to the userspace. Their monitoring tool is triggering false alarms on PR_SET_THP_DISABLE tasks because it considers an insufficient THP usage as a memory fragmentation resp. memory pressure issue. Before the said commit each newly created VMA inherited VM_NOHUGEPAGE flag and that got exposed to the userspace via /proc//smaps file. This implementation had its downsides as explained in the commit message but it is true that the userspace doesn't have any means to query for the process wide THP enabled/disabled status. PR_SET_THP_DISABLE is a process wide flag so it makes a lot of sense to export in the process wide context rather than per-vma. Introduce a new field to /proc//status which export this status. If PR_SET_THP_DISABLE is used the it reports false same as when the THP is not compiled in. It doesn't consider the global THP status because we already export that information via sysfs Fixes: 1860033237d4 ("mm: make PR_SET_THP_DISABLE immediately active") Signed-off-by: Michal Hocko --- Documentation/filesystems/proc.txt | 3 +++ fs/proc/array.c| 10 ++ 2 files changed, 13 insertions(+) diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index 22b4b00dee31..bafa5cb1685a 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -182,6 +182,7 @@ For example, to get the status information of a process, all you have to do is VmSwap:0 kB HugetlbPages: 0 kB CoreDumping:0 + THP_enabled: 1 Threads:1 SigQ: 0/28578 SigPnd: @@ -256,6 +257,8 @@ Table 1-2: Contents of the status files (as of 4.8) HugetlbPagessize of hugetlb memory portions CoreDumping process's memory is currently being dumped (killing the process may lead to a corrupted core) + THP_enabledprocess is allowed to use THP (returns 0 when +PR_SET_THP_DISABLE is set on the process Threads number of threads SigQnumber of signals queued/max. number for queue SigPnd bitmap of pending signals for the thread diff --git a/fs/proc/array.c b/fs/proc/array.c index 0ceb3b6b37e7..9d428d5a0ac8 100644 --- a/fs/proc/array.c +++ b/fs/proc/array.c @@ -392,6 +392,15 @@ static inline void task_core_dumping(struct seq_file *m, struct mm_struct *mm) seq_putc(m, '\n'); } +static inline void task_thp_status(struct seq_file *m, struct mm_struct *mm) +{ + bool thp_enabled = IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE); + + if (thp_enabled) + thp_enabled = !test_bit(MMF_DISABLE_THP, >flags); + seq_printf(m, "THP_enabled:\t%d\n", thp_enabled); +} + int proc_pid_status(struct seq_file *m, struct pid_namespace *ns, struct pid *pid, struct task_struct *task) { @@ -406,6 +415,7 @@ int proc_pid_status(struct seq_file *m, struct pid_namespace *ns, if (mm) { task_mem(m, mm); task_core_dumping(m, mm); + task_thp_status(m, mm);