Re: [PATCH] mm: make PR_SET_THP_DISABLE immediately active
On 06/02/2017 11:10 PM, Andrew Morton wrote: > On Fri, 2 Jun 2017 22:55:12 +0200 Vlastimil Babkawrote: > >> On 06/02/2017 10:40 PM, Andrew Morton wrote: >>> On Fri, 2 Jun 2017 22:31:47 +0200 Vlastimil Babka wrote: > Perhaps we should be adding new prctl modes to select this new > behaviour and leave the existing PR_SET_THP_DISABLE behaviour as-is? I think we can reasonably assume that most users of the prctl do just the fork() & exec() thing, so they will be unaffected. >>> >>> That sounds optimistic. Perhaps people are using the current behaviour >>> to set on particular mapping to MMF_DISABLE_THP, with >>> >>> prctl(PR_SET_THP_DISABLE) >>> mmap() >>> prctl(PR_CLR_THP_DISABLE) >>> >>> ? >>> >>> Seems a reasonable thing to do. >> >> Using madvise(MADV_NOHUGEPAGE) seems reasonabler to me, with the same >> effect. And it's older (2.6.38). >> >>> But who knows - people do all sorts of >>> inventive things. >> >> Yeah :( but we can hope they don't even know that the prctl currently >> behaves they way it does - man page doesn't suggest it would, and most >> of us in this thread found it surprising. > > Well. There might be such people and sometimes we do make people > unhappy. it partly depends on how traumatic it would be to leave the > current behaviour as-is. Have you evaluated such a patch? You mean introducing a new prctl instead of changing the existing one? I can evaluate that as being ugly :) Well, maybe we could use arg3, because currently we have: case PR_SET_THP_DISABLE: if (arg3 || arg4 || arg5) return -EINVAL; We could make non-zero arg3 (or specific value of arg3) set the new "immediate" behavior. This would also take care of the discovery of kernels that support the fixed/altered behavior, without having to check uname etc - just check if we got -EINVAL. I'm just not sure how to implement PR_GET_THP_DISABLE properly in such scenario. Or what happens when somebody calls SET with arg3==0 and then arg3==1 (or vice versa). But we would have to think about it even when we introduced a newly named option. Reminds me of the MLOCK_ONFAULT discussions... And as usual, if somebody does complain in the end, we revert and try the other way? >>> >>> But by then it's too late - the new behaviour will be out in the field. >> >> Revert in stable then? >> But I don't think this patch should go to stable. I understand right >> that CRIU will switch to the UFFDIO_COPY approach and doesn't need the >> prctl change/new madvise anymore? > > What I mean is that the new behaviour will go out in 4.12 and it may > be many months before we find out that we broke someone. By then, we > can't go back because others may be assuming the new behaviour. >
Re: [PATCH] mm: make PR_SET_THP_DISABLE immediately active
On 06/02/2017 11:10 PM, Andrew Morton wrote: > On Fri, 2 Jun 2017 22:55:12 +0200 Vlastimil Babka wrote: > >> On 06/02/2017 10:40 PM, Andrew Morton wrote: >>> On Fri, 2 Jun 2017 22:31:47 +0200 Vlastimil Babka wrote: > Perhaps we should be adding new prctl modes to select this new > behaviour and leave the existing PR_SET_THP_DISABLE behaviour as-is? I think we can reasonably assume that most users of the prctl do just the fork() & exec() thing, so they will be unaffected. >>> >>> That sounds optimistic. Perhaps people are using the current behaviour >>> to set on particular mapping to MMF_DISABLE_THP, with >>> >>> prctl(PR_SET_THP_DISABLE) >>> mmap() >>> prctl(PR_CLR_THP_DISABLE) >>> >>> ? >>> >>> Seems a reasonable thing to do. >> >> Using madvise(MADV_NOHUGEPAGE) seems reasonabler to me, with the same >> effect. And it's older (2.6.38). >> >>> But who knows - people do all sorts of >>> inventive things. >> >> Yeah :( but we can hope they don't even know that the prctl currently >> behaves they way it does - man page doesn't suggest it would, and most >> of us in this thread found it surprising. > > Well. There might be such people and sometimes we do make people > unhappy. it partly depends on how traumatic it would be to leave the > current behaviour as-is. Have you evaluated such a patch? You mean introducing a new prctl instead of changing the existing one? I can evaluate that as being ugly :) Well, maybe we could use arg3, because currently we have: case PR_SET_THP_DISABLE: if (arg3 || arg4 || arg5) return -EINVAL; We could make non-zero arg3 (or specific value of arg3) set the new "immediate" behavior. This would also take care of the discovery of kernels that support the fixed/altered behavior, without having to check uname etc - just check if we got -EINVAL. I'm just not sure how to implement PR_GET_THP_DISABLE properly in such scenario. Or what happens when somebody calls SET with arg3==0 and then arg3==1 (or vice versa). But we would have to think about it even when we introduced a newly named option. Reminds me of the MLOCK_ONFAULT discussions... And as usual, if somebody does complain in the end, we revert and try the other way? >>> >>> But by then it's too late - the new behaviour will be out in the field. >> >> Revert in stable then? >> But I don't think this patch should go to stable. I understand right >> that CRIU will switch to the UFFDIO_COPY approach and doesn't need the >> prctl change/new madvise anymore? > > What I mean is that the new behaviour will go out in 4.12 and it may > be many months before we find out that we broke someone. By then, we > can't go back because others may be assuming the new behaviour. >
Re: [PATCH] mm: make PR_SET_THP_DISABLE immediately active
On Sat, Jun 03, 2017 at 01:34:52PM +0300, Mike Rapoprt wrote: > > > On June 2, 2017 11:55:12 PM GMT+03:00, Vlastimil Babkawrote: > >On 06/02/2017 10:40 PM, Andrew Morton wrote: > >> On Fri, 2 Jun 2017 22:31:47 +0200 Vlastimil Babka > >wrote: > Perhaps we should be adding new prctl modes to select this new > behaviour and leave the existing PR_SET_THP_DISABLE behaviour > >as-is? > >>> > >>> I think we can reasonably assume that most users of the prctl do > >just > >>> the fork() & exec() thing, so they will be unaffected. > >> > >> That sounds optimistic. Perhaps people are using the current > >behaviour > >> to set on particular mapping to MMF_DISABLE_THP, with > >> > >>prctl(PR_SET_THP_DISABLE) > >>mmap() > >>prctl(PR_CLR_THP_DISABLE) > >> > >> ? > >> > >> Seems a reasonable thing to do. > > > >Using madvise(MADV_NOHUGEPAGE) seems reasonabler to me, with the same > >effect. And it's older (2.6.38). > > > >> But who knows - people do all sorts of > >> inventive things. > > > >Yeah :( but we can hope they don't even know that the prctl currently > >behaves they way it does - man page doesn't suggest it would, and most > >of us in this thread found it surprising. > > > >>> And as usual, if > >>> somebody does complain in the end, we revert and try the other way? > >> > >> But by then it's too late - the new behaviour will be out in the > >field. > > > >Revert in stable then? > >But I don't think this patch should go to stable. I understand right > >that CRIU will switch to the UFFDIO_COPY approach and doesn't need the > >prctl change/new madvise anymore? > > Yes, we are going to use UFFDIO_COPY. We still might want to have control > over THP in the future without changing per-VMA flags, though. Unfortunately, I was over optimistic about ability of CRIU to use UFFDIO_COPY for pre-copy part :( I was too concentrated on the simplified flow and overlooked some important details. After I've spent some time trying to actually implement usage of UFFDIO_COPY, I realized that registering memory with userfault at that point of the restore flow quite contradicts CRIU architecture :( That said, we would really want to have the interface this patch proposes. -- Sincerely yours, Mike.
Re: [PATCH] mm: make PR_SET_THP_DISABLE immediately active
On Sat, Jun 03, 2017 at 01:34:52PM +0300, Mike Rapoprt wrote: > > > On June 2, 2017 11:55:12 PM GMT+03:00, Vlastimil Babka wrote: > >On 06/02/2017 10:40 PM, Andrew Morton wrote: > >> On Fri, 2 Jun 2017 22:31:47 +0200 Vlastimil Babka > >wrote: > Perhaps we should be adding new prctl modes to select this new > behaviour and leave the existing PR_SET_THP_DISABLE behaviour > >as-is? > >>> > >>> I think we can reasonably assume that most users of the prctl do > >just > >>> the fork() & exec() thing, so they will be unaffected. > >> > >> That sounds optimistic. Perhaps people are using the current > >behaviour > >> to set on particular mapping to MMF_DISABLE_THP, with > >> > >>prctl(PR_SET_THP_DISABLE) > >>mmap() > >>prctl(PR_CLR_THP_DISABLE) > >> > >> ? > >> > >> Seems a reasonable thing to do. > > > >Using madvise(MADV_NOHUGEPAGE) seems reasonabler to me, with the same > >effect. And it's older (2.6.38). > > > >> But who knows - people do all sorts of > >> inventive things. > > > >Yeah :( but we can hope they don't even know that the prctl currently > >behaves they way it does - man page doesn't suggest it would, and most > >of us in this thread found it surprising. > > > >>> And as usual, if > >>> somebody does complain in the end, we revert and try the other way? > >> > >> But by then it's too late - the new behaviour will be out in the > >field. > > > >Revert in stable then? > >But I don't think this patch should go to stable. I understand right > >that CRIU will switch to the UFFDIO_COPY approach and doesn't need the > >prctl change/new madvise anymore? > > Yes, we are going to use UFFDIO_COPY. We still might want to have control > over THP in the future without changing per-VMA flags, though. Unfortunately, I was over optimistic about ability of CRIU to use UFFDIO_COPY for pre-copy part :( I was too concentrated on the simplified flow and overlooked some important details. After I've spent some time trying to actually implement usage of UFFDIO_COPY, I realized that registering memory with userfault at that point of the restore flow quite contradicts CRIU architecture :( That said, we would really want to have the interface this patch proposes. -- Sincerely yours, Mike.
Re: [PATCH] mm: make PR_SET_THP_DISABLE immediately active
On June 2, 2017 10:50:59 PM GMT+03:00, Andrew Mortonwrote: >On Fri, 2 Jun 2017 18:03:22 +0300 "Mike Rapoport" > wrote: > >> PR_SET_THP_DISABLE has a rather subtle semantic. It doesn't affect >any >> existing mapping because it only updated mm->def_flags which is a >template >> for new mappings. The mappings created after >prctl(PR_SET_THP_DISABLE) have >> VM_NOHUGEPAGE flag set. This can be quite surprising for all those >> applications which do not do prctl(); fork() & exec() and want to >control >> their own THP behavior. >> >> Another usecase when the immediate semantic of the prctl might be >useful is >> a combination of pre- and post-copy migration of containers with >CRIU. In >> this case CRIU populates a part of a memory region with data that was >saved >> during the pre-copy stage. Afterwards, the region is registered with >> userfaultfd and CRIU expects to get page faults for the parts of the >region >> that were not yet populated. However, khugepaged collapses the pages >and >> the expected page faults do not occur. >> >> In more general case, the prctl(PR_SET_THP_DISABLE) could be used as >a >> temporary mechanism for enabling/disabling THP process wide. >> >> Implementation wise, a new MMF_DISABLE_THP flag is added. This flag >is >> tested when decision whether to use huge pages is taken either during >page >> fault of at the time of THP collapse. >> >> It should be noted, that the new implementation makes >PR_SET_THP_DISABLE >> master override to any per-VMA setting, which was not the case >previously. >> >> Fixes: a0715cc22601 ("mm, thp: add VM_INIT_DEF_MASK and >PRCTL_THP_DISABLE") > >"Fixes" is a bit strong. I'd say "alters". And significantly altering >the runtime behaviour of a three-year-old interface is rather a worry, >no? Well, there are people that consider current behavior as bug :) One can argue we alter the implementation​details and users should not rely on that... >Perhaps we should be adding new prctl modes to select this new >behaviour and leave the existing PR_SET_THP_DISABLE behaviour as-is? -- Sincerely yours, Mike.
Re: [PATCH] mm: make PR_SET_THP_DISABLE immediately active
On June 2, 2017 10:50:59 PM GMT+03:00, Andrew Morton wrote: >On Fri, 2 Jun 2017 18:03:22 +0300 "Mike Rapoport" > wrote: > >> PR_SET_THP_DISABLE has a rather subtle semantic. It doesn't affect >any >> existing mapping because it only updated mm->def_flags which is a >template >> for new mappings. The mappings created after >prctl(PR_SET_THP_DISABLE) have >> VM_NOHUGEPAGE flag set. This can be quite surprising for all those >> applications which do not do prctl(); fork() & exec() and want to >control >> their own THP behavior. >> >> Another usecase when the immediate semantic of the prctl might be >useful is >> a combination of pre- and post-copy migration of containers with >CRIU. In >> this case CRIU populates a part of a memory region with data that was >saved >> during the pre-copy stage. Afterwards, the region is registered with >> userfaultfd and CRIU expects to get page faults for the parts of the >region >> that were not yet populated. However, khugepaged collapses the pages >and >> the expected page faults do not occur. >> >> In more general case, the prctl(PR_SET_THP_DISABLE) could be used as >a >> temporary mechanism for enabling/disabling THP process wide. >> >> Implementation wise, a new MMF_DISABLE_THP flag is added. This flag >is >> tested when decision whether to use huge pages is taken either during >page >> fault of at the time of THP collapse. >> >> It should be noted, that the new implementation makes >PR_SET_THP_DISABLE >> master override to any per-VMA setting, which was not the case >previously. >> >> Fixes: a0715cc22601 ("mm, thp: add VM_INIT_DEF_MASK and >PRCTL_THP_DISABLE") > >"Fixes" is a bit strong. I'd say "alters". And significantly altering >the runtime behaviour of a three-year-old interface is rather a worry, >no? Well, there are people that consider current behavior as bug :) One can argue we alter the implementation​details and users should not rely on that... >Perhaps we should be adding new prctl modes to select this new >behaviour and leave the existing PR_SET_THP_DISABLE behaviour as-is? -- Sincerely yours, Mike.
Re: [PATCH] mm: make PR_SET_THP_DISABLE immediately active
On June 2, 2017 11:55:12 PM GMT+03:00, Vlastimil Babkawrote: >On 06/02/2017 10:40 PM, Andrew Morton wrote: >> On Fri, 2 Jun 2017 22:31:47 +0200 Vlastimil Babka >wrote: Perhaps we should be adding new prctl modes to select this new behaviour and leave the existing PR_SET_THP_DISABLE behaviour >as-is? >>> >>> I think we can reasonably assume that most users of the prctl do >just >>> the fork() & exec() thing, so they will be unaffected. >> >> That sounds optimistic. Perhaps people are using the current >behaviour >> to set on particular mapping to MMF_DISABLE_THP, with >> >> prctl(PR_SET_THP_DISABLE) >> mmap() >> prctl(PR_CLR_THP_DISABLE) >> >> ? >> >> Seems a reasonable thing to do. > >Using madvise(MADV_NOHUGEPAGE) seems reasonabler to me, with the same >effect. And it's older (2.6.38). > >> But who knows - people do all sorts of >> inventive things. > >Yeah :( but we can hope they don't even know that the prctl currently >behaves they way it does - man page doesn't suggest it would, and most >of us in this thread found it surprising. > >>> And as usual, if >>> somebody does complain in the end, we revert and try the other way? >> >> But by then it's too late - the new behaviour will be out in the >field. > >Revert in stable then? >But I don't think this patch should go to stable. I understand right >that CRIU will switch to the UFFDIO_COPY approach and doesn't need the >prctl change/new madvise anymore? Yes, we are going to use UFFDIO_COPY. We still might want to have control over THP in the future without changing per-VMA flags, though. -- Sincerely yours, Mike.
Re: [PATCH] mm: make PR_SET_THP_DISABLE immediately active
On June 2, 2017 11:55:12 PM GMT+03:00, Vlastimil Babka wrote: >On 06/02/2017 10:40 PM, Andrew Morton wrote: >> On Fri, 2 Jun 2017 22:31:47 +0200 Vlastimil Babka >wrote: Perhaps we should be adding new prctl modes to select this new behaviour and leave the existing PR_SET_THP_DISABLE behaviour >as-is? >>> >>> I think we can reasonably assume that most users of the prctl do >just >>> the fork() & exec() thing, so they will be unaffected. >> >> That sounds optimistic. Perhaps people are using the current >behaviour >> to set on particular mapping to MMF_DISABLE_THP, with >> >> prctl(PR_SET_THP_DISABLE) >> mmap() >> prctl(PR_CLR_THP_DISABLE) >> >> ? >> >> Seems a reasonable thing to do. > >Using madvise(MADV_NOHUGEPAGE) seems reasonabler to me, with the same >effect. And it's older (2.6.38). > >> But who knows - people do all sorts of >> inventive things. > >Yeah :( but we can hope they don't even know that the prctl currently >behaves they way it does - man page doesn't suggest it would, and most >of us in this thread found it surprising. > >>> And as usual, if >>> somebody does complain in the end, we revert and try the other way? >> >> But by then it's too late - the new behaviour will be out in the >field. > >Revert in stable then? >But I don't think this patch should go to stable. I understand right >that CRIU will switch to the UFFDIO_COPY approach and doesn't need the >prctl change/new madvise anymore? Yes, we are going to use UFFDIO_COPY. We still might want to have control over THP in the future without changing per-VMA flags, though. -- Sincerely yours, Mike.
Re: [PATCH] mm: make PR_SET_THP_DISABLE immediately active
On Fri 02-06-17 13:40:38, Andrew Morton wrote: > On Fri, 2 Jun 2017 22:31:47 +0200 Vlastimil Babkawrote: > > > On 06/02/2017 09:50 PM, Andrew Morton wrote: > > > On Fri, 2 Jun 2017 18:03:22 +0300 "Mike Rapoport" > > > wrote: > > > > > >> PR_SET_THP_DISABLE has a rather subtle semantic. It doesn't affect any > > >> existing mapping because it only updated mm->def_flags which is a > > >> template > > >> for new mappings. The mappings created after prctl(PR_SET_THP_DISABLE) > > >> have > > >> VM_NOHUGEPAGE flag set. This can be quite surprising for all those > > >> applications which do not do prctl(); fork() & exec() and want to control > > >> their own THP behavior. > > >> > > >> Another usecase when the immediate semantic of the prctl might be useful > > >> is > > >> a combination of pre- and post-copy migration of containers with CRIU. > > >> In > > >> this case CRIU populates a part of a memory region with data that was > > >> saved > > >> during the pre-copy stage. Afterwards, the region is registered with > > >> userfaultfd and CRIU expects to get page faults for the parts of the > > >> region > > >> that were not yet populated. However, khugepaged collapses the pages and > > >> the expected page faults do not occur. > > >> > > >> In more general case, the prctl(PR_SET_THP_DISABLE) could be used as a > > >> temporary mechanism for enabling/disabling THP process wide. > > >> > > >> Implementation wise, a new MMF_DISABLE_THP flag is added. This flag is > > >> tested when decision whether to use huge pages is taken either during > > >> page > > >> fault of at the time of THP collapse. > > >> > > >> It should be noted, that the new implementation makes PR_SET_THP_DISABLE > > >> master override to any per-VMA setting, which was not the case > > >> previously. > > >> > > >> Fixes: a0715cc22601 ("mm, thp: add VM_INIT_DEF_MASK and > > >> PRCTL_THP_DISABLE") > > > > > > "Fixes" is a bit strong. I'd say "alters". And significantly altering > > > the runtime behaviour of a three-year-old interface is rather a worry, > > > no? > > > > > > Perhaps we should be adding new prctl modes to select this new > > > behaviour and leave the existing PR_SET_THP_DISABLE behaviour as-is? > > > > I think we can reasonably assume that most users of the prctl do just > > the fork() & exec() thing, so they will be unaffected. > > That sounds optimistic. Perhaps people are using the current behaviour > to set on particular mapping to MMF_DISABLE_THP, with > > prctl(PR_SET_THP_DISABLE) > mmap() > prctl(PR_CLR_THP_DISABLE) > > ? > > Seems a reasonable thing to do. Is it? The documentation is not very specific but it is clear about the scope being thread (I would argue process would be more approapriate but whatever) "Set the state of the "THP disable" flag for the calling thread." So the above seems like an incorrect usage to me. > But who knows - people do all sorts of inventive things. well yes. > > And as usual, if > > somebody does complain in the end, we revert and try the other way? > > But by then it's too late - the new behaviour will be out in the field. Well, the interface is currently broken for anything other than prctl & exec. And those will work properly even with the patch. So I am not really sure whether keeping the current status quo is reasonable. -- Michal Hocko SUSE Labs
Re: [PATCH] mm: make PR_SET_THP_DISABLE immediately active
On Fri 02-06-17 13:40:38, Andrew Morton wrote: > On Fri, 2 Jun 2017 22:31:47 +0200 Vlastimil Babka wrote: > > > On 06/02/2017 09:50 PM, Andrew Morton wrote: > > > On Fri, 2 Jun 2017 18:03:22 +0300 "Mike Rapoport" > > > wrote: > > > > > >> PR_SET_THP_DISABLE has a rather subtle semantic. It doesn't affect any > > >> existing mapping because it only updated mm->def_flags which is a > > >> template > > >> for new mappings. The mappings created after prctl(PR_SET_THP_DISABLE) > > >> have > > >> VM_NOHUGEPAGE flag set. This can be quite surprising for all those > > >> applications which do not do prctl(); fork() & exec() and want to control > > >> their own THP behavior. > > >> > > >> Another usecase when the immediate semantic of the prctl might be useful > > >> is > > >> a combination of pre- and post-copy migration of containers with CRIU. > > >> In > > >> this case CRIU populates a part of a memory region with data that was > > >> saved > > >> during the pre-copy stage. Afterwards, the region is registered with > > >> userfaultfd and CRIU expects to get page faults for the parts of the > > >> region > > >> that were not yet populated. However, khugepaged collapses the pages and > > >> the expected page faults do not occur. > > >> > > >> In more general case, the prctl(PR_SET_THP_DISABLE) could be used as a > > >> temporary mechanism for enabling/disabling THP process wide. > > >> > > >> Implementation wise, a new MMF_DISABLE_THP flag is added. This flag is > > >> tested when decision whether to use huge pages is taken either during > > >> page > > >> fault of at the time of THP collapse. > > >> > > >> It should be noted, that the new implementation makes PR_SET_THP_DISABLE > > >> master override to any per-VMA setting, which was not the case > > >> previously. > > >> > > >> Fixes: a0715cc22601 ("mm, thp: add VM_INIT_DEF_MASK and > > >> PRCTL_THP_DISABLE") > > > > > > "Fixes" is a bit strong. I'd say "alters". And significantly altering > > > the runtime behaviour of a three-year-old interface is rather a worry, > > > no? > > > > > > Perhaps we should be adding new prctl modes to select this new > > > behaviour and leave the existing PR_SET_THP_DISABLE behaviour as-is? > > > > I think we can reasonably assume that most users of the prctl do just > > the fork() & exec() thing, so they will be unaffected. > > That sounds optimistic. Perhaps people are using the current behaviour > to set on particular mapping to MMF_DISABLE_THP, with > > prctl(PR_SET_THP_DISABLE) > mmap() > prctl(PR_CLR_THP_DISABLE) > > ? > > Seems a reasonable thing to do. Is it? The documentation is not very specific but it is clear about the scope being thread (I would argue process would be more approapriate but whatever) "Set the state of the "THP disable" flag for the calling thread." So the above seems like an incorrect usage to me. > But who knows - people do all sorts of inventive things. well yes. > > And as usual, if > > somebody does complain in the end, we revert and try the other way? > > But by then it's too late - the new behaviour will be out in the field. Well, the interface is currently broken for anything other than prctl & exec. And those will work properly even with the patch. So I am not really sure whether keeping the current status quo is reasonable. -- Michal Hocko SUSE Labs
Re: [PATCH] mm: make PR_SET_THP_DISABLE immediately active
On Fri, 2 Jun 2017 22:55:12 +0200 Vlastimil Babkawrote: > On 06/02/2017 10:40 PM, Andrew Morton wrote: > > On Fri, 2 Jun 2017 22:31:47 +0200 Vlastimil Babka wrote: > >>> Perhaps we should be adding new prctl modes to select this new > >>> behaviour and leave the existing PR_SET_THP_DISABLE behaviour as-is? > >> > >> I think we can reasonably assume that most users of the prctl do just > >> the fork() & exec() thing, so they will be unaffected. > > > > That sounds optimistic. Perhaps people are using the current behaviour > > to set on particular mapping to MMF_DISABLE_THP, with > > > > prctl(PR_SET_THP_DISABLE) > > mmap() > > prctl(PR_CLR_THP_DISABLE) > > > > ? > > > > Seems a reasonable thing to do. > > Using madvise(MADV_NOHUGEPAGE) seems reasonabler to me, with the same > effect. And it's older (2.6.38). > > > But who knows - people do all sorts of > > inventive things. > > Yeah :( but we can hope they don't even know that the prctl currently > behaves they way it does - man page doesn't suggest it would, and most > of us in this thread found it surprising. Well. There might be such people and sometimes we do make people unhappy. it partly depends on how traumatic it would be to leave the current behaviour as-is. Have you evaluated such a patch? > >> And as usual, if > >> somebody does complain in the end, we revert and try the other way? > > > > But by then it's too late - the new behaviour will be out in the field. > > Revert in stable then? > But I don't think this patch should go to stable. I understand right > that CRIU will switch to the UFFDIO_COPY approach and doesn't need the > prctl change/new madvise anymore? What I mean is that the new behaviour will go out in 4.12 and it may be many months before we find out that we broke someone. By then, we can't go back because others may be assuming the new behaviour.
Re: [PATCH] mm: make PR_SET_THP_DISABLE immediately active
On Fri, 2 Jun 2017 22:55:12 +0200 Vlastimil Babka wrote: > On 06/02/2017 10:40 PM, Andrew Morton wrote: > > On Fri, 2 Jun 2017 22:31:47 +0200 Vlastimil Babka wrote: > >>> Perhaps we should be adding new prctl modes to select this new > >>> behaviour and leave the existing PR_SET_THP_DISABLE behaviour as-is? > >> > >> I think we can reasonably assume that most users of the prctl do just > >> the fork() & exec() thing, so they will be unaffected. > > > > That sounds optimistic. Perhaps people are using the current behaviour > > to set on particular mapping to MMF_DISABLE_THP, with > > > > prctl(PR_SET_THP_DISABLE) > > mmap() > > prctl(PR_CLR_THP_DISABLE) > > > > ? > > > > Seems a reasonable thing to do. > > Using madvise(MADV_NOHUGEPAGE) seems reasonabler to me, with the same > effect. And it's older (2.6.38). > > > But who knows - people do all sorts of > > inventive things. > > Yeah :( but we can hope they don't even know that the prctl currently > behaves they way it does - man page doesn't suggest it would, and most > of us in this thread found it surprising. Well. There might be such people and sometimes we do make people unhappy. it partly depends on how traumatic it would be to leave the current behaviour as-is. Have you evaluated such a patch? > >> And as usual, if > >> somebody does complain in the end, we revert and try the other way? > > > > But by then it's too late - the new behaviour will be out in the field. > > Revert in stable then? > But I don't think this patch should go to stable. I understand right > that CRIU will switch to the UFFDIO_COPY approach and doesn't need the > prctl change/new madvise anymore? What I mean is that the new behaviour will go out in 4.12 and it may be many months before we find out that we broke someone. By then, we can't go back because others may be assuming the new behaviour.
Re: [PATCH] mm: make PR_SET_THP_DISABLE immediately active
On 06/02/2017 10:40 PM, Andrew Morton wrote: > On Fri, 2 Jun 2017 22:31:47 +0200 Vlastimil Babkawrote: >>> Perhaps we should be adding new prctl modes to select this new >>> behaviour and leave the existing PR_SET_THP_DISABLE behaviour as-is? >> >> I think we can reasonably assume that most users of the prctl do just >> the fork() & exec() thing, so they will be unaffected. > > That sounds optimistic. Perhaps people are using the current behaviour > to set on particular mapping to MMF_DISABLE_THP, with > > prctl(PR_SET_THP_DISABLE) > mmap() > prctl(PR_CLR_THP_DISABLE) > > ? > > Seems a reasonable thing to do. Using madvise(MADV_NOHUGEPAGE) seems reasonabler to me, with the same effect. And it's older (2.6.38). > But who knows - people do all sorts of > inventive things. Yeah :( but we can hope they don't even know that the prctl currently behaves they way it does - man page doesn't suggest it would, and most of us in this thread found it surprising. >> And as usual, if >> somebody does complain in the end, we revert and try the other way? > > But by then it's too late - the new behaviour will be out in the field. Revert in stable then? But I don't think this patch should go to stable. I understand right that CRIU will switch to the UFFDIO_COPY approach and doesn't need the prctl change/new madvise anymore?
Re: [PATCH] mm: make PR_SET_THP_DISABLE immediately active
On 06/02/2017 10:40 PM, Andrew Morton wrote: > On Fri, 2 Jun 2017 22:31:47 +0200 Vlastimil Babka wrote: >>> Perhaps we should be adding new prctl modes to select this new >>> behaviour and leave the existing PR_SET_THP_DISABLE behaviour as-is? >> >> I think we can reasonably assume that most users of the prctl do just >> the fork() & exec() thing, so they will be unaffected. > > That sounds optimistic. Perhaps people are using the current behaviour > to set on particular mapping to MMF_DISABLE_THP, with > > prctl(PR_SET_THP_DISABLE) > mmap() > prctl(PR_CLR_THP_DISABLE) > > ? > > Seems a reasonable thing to do. Using madvise(MADV_NOHUGEPAGE) seems reasonabler to me, with the same effect. And it's older (2.6.38). > But who knows - people do all sorts of > inventive things. Yeah :( but we can hope they don't even know that the prctl currently behaves they way it does - man page doesn't suggest it would, and most of us in this thread found it surprising. >> And as usual, if >> somebody does complain in the end, we revert and try the other way? > > But by then it's too late - the new behaviour will be out in the field. Revert in stable then? But I don't think this patch should go to stable. I understand right that CRIU will switch to the UFFDIO_COPY approach and doesn't need the prctl change/new madvise anymore?
Re: [PATCH] mm: make PR_SET_THP_DISABLE immediately active
On Fri, 2 Jun 2017 22:31:47 +0200 Vlastimil Babkawrote: > On 06/02/2017 09:50 PM, Andrew Morton wrote: > > On Fri, 2 Jun 2017 18:03:22 +0300 "Mike Rapoport" > > wrote: > > > >> PR_SET_THP_DISABLE has a rather subtle semantic. It doesn't affect any > >> existing mapping because it only updated mm->def_flags which is a template > >> for new mappings. The mappings created after prctl(PR_SET_THP_DISABLE) have > >> VM_NOHUGEPAGE flag set. This can be quite surprising for all those > >> applications which do not do prctl(); fork() & exec() and want to control > >> their own THP behavior. > >> > >> Another usecase when the immediate semantic of the prctl might be useful is > >> a combination of pre- and post-copy migration of containers with CRIU. In > >> this case CRIU populates a part of a memory region with data that was saved > >> during the pre-copy stage. Afterwards, the region is registered with > >> userfaultfd and CRIU expects to get page faults for the parts of the region > >> that were not yet populated. However, khugepaged collapses the pages and > >> the expected page faults do not occur. > >> > >> In more general case, the prctl(PR_SET_THP_DISABLE) could be used as a > >> temporary mechanism for enabling/disabling THP process wide. > >> > >> Implementation wise, a new MMF_DISABLE_THP flag is added. This flag is > >> tested when decision whether to use huge pages is taken either during page > >> fault of at the time of THP collapse. > >> > >> It should be noted, that the new implementation makes PR_SET_THP_DISABLE > >> master override to any per-VMA setting, which was not the case previously. > >> > >> Fixes: a0715cc22601 ("mm, thp: add VM_INIT_DEF_MASK and PRCTL_THP_DISABLE") > > > > "Fixes" is a bit strong. I'd say "alters". And significantly altering > > the runtime behaviour of a three-year-old interface is rather a worry, > > no? > > > > Perhaps we should be adding new prctl modes to select this new > > behaviour and leave the existing PR_SET_THP_DISABLE behaviour as-is? > > I think we can reasonably assume that most users of the prctl do just > the fork() & exec() thing, so they will be unaffected. That sounds optimistic. Perhaps people are using the current behaviour to set on particular mapping to MMF_DISABLE_THP, with prctl(PR_SET_THP_DISABLE) mmap() prctl(PR_CLR_THP_DISABLE) ? Seems a reasonable thing to do. But who knows - people do all sorts of inventive things. > And as usual, if > somebody does complain in the end, we revert and try the other way? But by then it's too late - the new behaviour will be out in the field.
Re: [PATCH] mm: make PR_SET_THP_DISABLE immediately active
On Fri, 2 Jun 2017 22:31:47 +0200 Vlastimil Babka wrote: > On 06/02/2017 09:50 PM, Andrew Morton wrote: > > On Fri, 2 Jun 2017 18:03:22 +0300 "Mike Rapoport" > > wrote: > > > >> PR_SET_THP_DISABLE has a rather subtle semantic. It doesn't affect any > >> existing mapping because it only updated mm->def_flags which is a template > >> for new mappings. The mappings created after prctl(PR_SET_THP_DISABLE) have > >> VM_NOHUGEPAGE flag set. This can be quite surprising for all those > >> applications which do not do prctl(); fork() & exec() and want to control > >> their own THP behavior. > >> > >> Another usecase when the immediate semantic of the prctl might be useful is > >> a combination of pre- and post-copy migration of containers with CRIU. In > >> this case CRIU populates a part of a memory region with data that was saved > >> during the pre-copy stage. Afterwards, the region is registered with > >> userfaultfd and CRIU expects to get page faults for the parts of the region > >> that were not yet populated. However, khugepaged collapses the pages and > >> the expected page faults do not occur. > >> > >> In more general case, the prctl(PR_SET_THP_DISABLE) could be used as a > >> temporary mechanism for enabling/disabling THP process wide. > >> > >> Implementation wise, a new MMF_DISABLE_THP flag is added. This flag is > >> tested when decision whether to use huge pages is taken either during page > >> fault of at the time of THP collapse. > >> > >> It should be noted, that the new implementation makes PR_SET_THP_DISABLE > >> master override to any per-VMA setting, which was not the case previously. > >> > >> Fixes: a0715cc22601 ("mm, thp: add VM_INIT_DEF_MASK and PRCTL_THP_DISABLE") > > > > "Fixes" is a bit strong. I'd say "alters". And significantly altering > > the runtime behaviour of a three-year-old interface is rather a worry, > > no? > > > > Perhaps we should be adding new prctl modes to select this new > > behaviour and leave the existing PR_SET_THP_DISABLE behaviour as-is? > > I think we can reasonably assume that most users of the prctl do just > the fork() & exec() thing, so they will be unaffected. That sounds optimistic. Perhaps people are using the current behaviour to set on particular mapping to MMF_DISABLE_THP, with prctl(PR_SET_THP_DISABLE) mmap() prctl(PR_CLR_THP_DISABLE) ? Seems a reasonable thing to do. But who knows - people do all sorts of inventive things. > And as usual, if > somebody does complain in the end, we revert and try the other way? But by then it's too late - the new behaviour will be out in the field.
Re: [PATCH] mm: make PR_SET_THP_DISABLE immediately active
On 06/02/2017 09:50 PM, Andrew Morton wrote: > On Fri, 2 Jun 2017 18:03:22 +0300 "Mike Rapoport"> wrote: > >> PR_SET_THP_DISABLE has a rather subtle semantic. It doesn't affect any >> existing mapping because it only updated mm->def_flags which is a template >> for new mappings. The mappings created after prctl(PR_SET_THP_DISABLE) have >> VM_NOHUGEPAGE flag set. This can be quite surprising for all those >> applications which do not do prctl(); fork() & exec() and want to control >> their own THP behavior. >> >> Another usecase when the immediate semantic of the prctl might be useful is >> a combination of pre- and post-copy migration of containers with CRIU. In >> this case CRIU populates a part of a memory region with data that was saved >> during the pre-copy stage. Afterwards, the region is registered with >> userfaultfd and CRIU expects to get page faults for the parts of the region >> that were not yet populated. However, khugepaged collapses the pages and >> the expected page faults do not occur. >> >> In more general case, the prctl(PR_SET_THP_DISABLE) could be used as a >> temporary mechanism for enabling/disabling THP process wide. >> >> Implementation wise, a new MMF_DISABLE_THP flag is added. This flag is >> tested when decision whether to use huge pages is taken either during page >> fault of at the time of THP collapse. >> >> It should be noted, that the new implementation makes PR_SET_THP_DISABLE >> master override to any per-VMA setting, which was not the case previously. >> >> Fixes: a0715cc22601 ("mm, thp: add VM_INIT_DEF_MASK and PRCTL_THP_DISABLE") > > "Fixes" is a bit strong. I'd say "alters". And significantly altering > the runtime behaviour of a three-year-old interface is rather a worry, > no? > > Perhaps we should be adding new prctl modes to select this new > behaviour and leave the existing PR_SET_THP_DISABLE behaviour as-is? I think we can reasonably assume that most users of the prctl do just the fork() & exec() thing, so they will be unaffected. And as usual, if somebody does complain in the end, we revert and try the other way?
Re: [PATCH] mm: make PR_SET_THP_DISABLE immediately active
On 06/02/2017 09:50 PM, Andrew Morton wrote: > On Fri, 2 Jun 2017 18:03:22 +0300 "Mike Rapoport" > wrote: > >> PR_SET_THP_DISABLE has a rather subtle semantic. It doesn't affect any >> existing mapping because it only updated mm->def_flags which is a template >> for new mappings. The mappings created after prctl(PR_SET_THP_DISABLE) have >> VM_NOHUGEPAGE flag set. This can be quite surprising for all those >> applications which do not do prctl(); fork() & exec() and want to control >> their own THP behavior. >> >> Another usecase when the immediate semantic of the prctl might be useful is >> a combination of pre- and post-copy migration of containers with CRIU. In >> this case CRIU populates a part of a memory region with data that was saved >> during the pre-copy stage. Afterwards, the region is registered with >> userfaultfd and CRIU expects to get page faults for the parts of the region >> that were not yet populated. However, khugepaged collapses the pages and >> the expected page faults do not occur. >> >> In more general case, the prctl(PR_SET_THP_DISABLE) could be used as a >> temporary mechanism for enabling/disabling THP process wide. >> >> Implementation wise, a new MMF_DISABLE_THP flag is added. This flag is >> tested when decision whether to use huge pages is taken either during page >> fault of at the time of THP collapse. >> >> It should be noted, that the new implementation makes PR_SET_THP_DISABLE >> master override to any per-VMA setting, which was not the case previously. >> >> Fixes: a0715cc22601 ("mm, thp: add VM_INIT_DEF_MASK and PRCTL_THP_DISABLE") > > "Fixes" is a bit strong. I'd say "alters". And significantly altering > the runtime behaviour of a three-year-old interface is rather a worry, > no? > > Perhaps we should be adding new prctl modes to select this new > behaviour and leave the existing PR_SET_THP_DISABLE behaviour as-is? I think we can reasonably assume that most users of the prctl do just the fork() & exec() thing, so they will be unaffected. And as usual, if somebody does complain in the end, we revert and try the other way?
Re: [PATCH] mm: make PR_SET_THP_DISABLE immediately active
On Fri, 2 Jun 2017 18:03:22 +0300 "Mike Rapoport"wrote: > PR_SET_THP_DISABLE has a rather subtle semantic. It doesn't affect any > existing mapping because it only updated mm->def_flags which is a template > for new mappings. The mappings created after prctl(PR_SET_THP_DISABLE) have > VM_NOHUGEPAGE flag set. This can be quite surprising for all those > applications which do not do prctl(); fork() & exec() and want to control > their own THP behavior. > > Another usecase when the immediate semantic of the prctl might be useful is > a combination of pre- and post-copy migration of containers with CRIU. In > this case CRIU populates a part of a memory region with data that was saved > during the pre-copy stage. Afterwards, the region is registered with > userfaultfd and CRIU expects to get page faults for the parts of the region > that were not yet populated. However, khugepaged collapses the pages and > the expected page faults do not occur. > > In more general case, the prctl(PR_SET_THP_DISABLE) could be used as a > temporary mechanism for enabling/disabling THP process wide. > > Implementation wise, a new MMF_DISABLE_THP flag is added. This flag is > tested when decision whether to use huge pages is taken either during page > fault of at the time of THP collapse. > > It should be noted, that the new implementation makes PR_SET_THP_DISABLE > master override to any per-VMA setting, which was not the case previously. > > Fixes: a0715cc22601 ("mm, thp: add VM_INIT_DEF_MASK and PRCTL_THP_DISABLE") "Fixes" is a bit strong. I'd say "alters". And significantly altering the runtime behaviour of a three-year-old interface is rather a worry, no? Perhaps we should be adding new prctl modes to select this new behaviour and leave the existing PR_SET_THP_DISABLE behaviour as-is?
Re: [PATCH] mm: make PR_SET_THP_DISABLE immediately active
On Fri, 2 Jun 2017 18:03:22 +0300 "Mike Rapoport" wrote: > PR_SET_THP_DISABLE has a rather subtle semantic. It doesn't affect any > existing mapping because it only updated mm->def_flags which is a template > for new mappings. The mappings created after prctl(PR_SET_THP_DISABLE) have > VM_NOHUGEPAGE flag set. This can be quite surprising for all those > applications which do not do prctl(); fork() & exec() and want to control > their own THP behavior. > > Another usecase when the immediate semantic of the prctl might be useful is > a combination of pre- and post-copy migration of containers with CRIU. In > this case CRIU populates a part of a memory region with data that was saved > during the pre-copy stage. Afterwards, the region is registered with > userfaultfd and CRIU expects to get page faults for the parts of the region > that were not yet populated. However, khugepaged collapses the pages and > the expected page faults do not occur. > > In more general case, the prctl(PR_SET_THP_DISABLE) could be used as a > temporary mechanism for enabling/disabling THP process wide. > > Implementation wise, a new MMF_DISABLE_THP flag is added. This flag is > tested when decision whether to use huge pages is taken either during page > fault of at the time of THP collapse. > > It should be noted, that the new implementation makes PR_SET_THP_DISABLE > master override to any per-VMA setting, which was not the case previously. > > Fixes: a0715cc22601 ("mm, thp: add VM_INIT_DEF_MASK and PRCTL_THP_DISABLE") "Fixes" is a bit strong. I'd say "alters". And significantly altering the runtime behaviour of a three-year-old interface is rather a worry, no? Perhaps we should be adding new prctl modes to select this new behaviour and leave the existing PR_SET_THP_DISABLE behaviour as-is?