Re: [PATCH] mm: make PR_SET_THP_DISABLE immediately active

2017-06-05 Thread Vlastimil Babka
On 06/02/2017 11:10 PM, Andrew Morton wrote:
> On Fri, 2 Jun 2017 22:55:12 +0200 Vlastimil Babka  wrote:
> 
>> On 06/02/2017 10:40 PM, Andrew Morton wrote:
>>> On Fri, 2 Jun 2017 22:31:47 +0200 Vlastimil Babka  wrote:
> Perhaps we should be adding new prctl modes to select this new
> behaviour and leave the existing PR_SET_THP_DISABLE behaviour as-is?

 I think we can reasonably assume that most users of the prctl do just
 the fork() & exec() thing, so they will be unaffected.
>>>
>>> That sounds optimistic.  Perhaps people are using the current behaviour
>>> to set on particular mapping to MMF_DISABLE_THP, with
>>>
>>> prctl(PR_SET_THP_DISABLE)
>>> mmap()
>>> prctl(PR_CLR_THP_DISABLE)
>>>
>>> ?
>>>
>>> Seems a reasonable thing to do.
>>
>> Using madvise(MADV_NOHUGEPAGE) seems reasonabler to me, with the same
>> effect. And it's older (2.6.38).
>>
>>> But who knows - people do all sorts of
>>> inventive things.
>>
>> Yeah :( but we can hope they don't even know that the prctl currently
>> behaves they way it does - man page doesn't suggest it would, and most
>> of us in this thread found it surprising.
> 
> Well.  There might be such people and sometimes we do make people
> unhappy.  it partly depends on how traumatic it would be to leave the
> current behaviour as-is.  Have you evaluated such a patch?

You mean introducing a new prctl instead of changing the existing one? I
can evaluate that as being ugly :)
Well, maybe we could use arg3, because currently we have:
case PR_SET_THP_DISABLE:
if (arg3 || arg4 || arg5)
return -EINVAL;

We could make non-zero arg3 (or specific value of arg3) set the new
"immediate" behavior. This would also take care of the discovery of
kernels that support the fixed/altered behavior, without having to check
uname etc - just check if we got -EINVAL.

I'm just not sure how to implement PR_GET_THP_DISABLE properly in such
scenario. Or what happens when somebody calls SET with arg3==0 and then
arg3==1 (or vice versa). But we would have to think about it even when
we introduced a newly named option. Reminds me of the MLOCK_ONFAULT
discussions...

 And as usual, if
 somebody does complain in the end, we revert and try the other way?
>>>
>>> But by then it's too late - the new behaviour will be out in the field.
>>
>> Revert in stable then?
>> But I don't think this patch should go to stable. I understand right
>> that CRIU will switch to the UFFDIO_COPY approach and doesn't need the
>> prctl change/new madvise anymore?
> 
> What I mean is that the new behaviour will go out in 4.12 and it may
> be many months before we find out that we broke someone.  By then, we
> can't go back because others may be assuming the new behaviour.
> 



Re: [PATCH] mm: make PR_SET_THP_DISABLE immediately active

2017-06-05 Thread Vlastimil Babka
On 06/02/2017 11:10 PM, Andrew Morton wrote:
> On Fri, 2 Jun 2017 22:55:12 +0200 Vlastimil Babka  wrote:
> 
>> On 06/02/2017 10:40 PM, Andrew Morton wrote:
>>> On Fri, 2 Jun 2017 22:31:47 +0200 Vlastimil Babka  wrote:
> Perhaps we should be adding new prctl modes to select this new
> behaviour and leave the existing PR_SET_THP_DISABLE behaviour as-is?

 I think we can reasonably assume that most users of the prctl do just
 the fork() & exec() thing, so they will be unaffected.
>>>
>>> That sounds optimistic.  Perhaps people are using the current behaviour
>>> to set on particular mapping to MMF_DISABLE_THP, with
>>>
>>> prctl(PR_SET_THP_DISABLE)
>>> mmap()
>>> prctl(PR_CLR_THP_DISABLE)
>>>
>>> ?
>>>
>>> Seems a reasonable thing to do.
>>
>> Using madvise(MADV_NOHUGEPAGE) seems reasonabler to me, with the same
>> effect. And it's older (2.6.38).
>>
>>> But who knows - people do all sorts of
>>> inventive things.
>>
>> Yeah :( but we can hope they don't even know that the prctl currently
>> behaves they way it does - man page doesn't suggest it would, and most
>> of us in this thread found it surprising.
> 
> Well.  There might be such people and sometimes we do make people
> unhappy.  it partly depends on how traumatic it would be to leave the
> current behaviour as-is.  Have you evaluated such a patch?

You mean introducing a new prctl instead of changing the existing one? I
can evaluate that as being ugly :)
Well, maybe we could use arg3, because currently we have:
case PR_SET_THP_DISABLE:
if (arg3 || arg4 || arg5)
return -EINVAL;

We could make non-zero arg3 (or specific value of arg3) set the new
"immediate" behavior. This would also take care of the discovery of
kernels that support the fixed/altered behavior, without having to check
uname etc - just check if we got -EINVAL.

I'm just not sure how to implement PR_GET_THP_DISABLE properly in such
scenario. Or what happens when somebody calls SET with arg3==0 and then
arg3==1 (or vice versa). But we would have to think about it even when
we introduced a newly named option. Reminds me of the MLOCK_ONFAULT
discussions...

 And as usual, if
 somebody does complain in the end, we revert and try the other way?
>>>
>>> But by then it's too late - the new behaviour will be out in the field.
>>
>> Revert in stable then?
>> But I don't think this patch should go to stable. I understand right
>> that CRIU will switch to the UFFDIO_COPY approach and doesn't need the
>> prctl change/new madvise anymore?
> 
> What I mean is that the new behaviour will go out in 4.12 and it may
> be many months before we find out that we broke someone.  By then, we
> can't go back because others may be assuming the new behaviour.
> 



Re: [PATCH] mm: make PR_SET_THP_DISABLE immediately active

2017-06-05 Thread Mike Rapoport
On Sat, Jun 03, 2017 at 01:34:52PM +0300, Mike Rapoprt wrote:
> 
> 
> On June 2, 2017 11:55:12 PM GMT+03:00, Vlastimil Babka  wrote:
> >On 06/02/2017 10:40 PM, Andrew Morton wrote:
> >> On Fri, 2 Jun 2017 22:31:47 +0200 Vlastimil Babka 
> >wrote:
>  Perhaps we should be adding new prctl modes to select this new
>  behaviour and leave the existing PR_SET_THP_DISABLE behaviour
> >as-is?
> >>>
> >>> I think we can reasonably assume that most users of the prctl do
> >just
> >>> the fork() & exec() thing, so they will be unaffected.
> >> 
> >> That sounds optimistic.  Perhaps people are using the current
> >behaviour
> >> to set on particular mapping to MMF_DISABLE_THP, with
> >> 
> >>prctl(PR_SET_THP_DISABLE)
> >>mmap()
> >>prctl(PR_CLR_THP_DISABLE)
> >> 
> >> ?
> >> 
> >> Seems a reasonable thing to do.
> >
> >Using madvise(MADV_NOHUGEPAGE) seems reasonabler to me, with the same
> >effect. And it's older (2.6.38).
> >
> >> But who knows - people do all sorts of
> >> inventive things.
> >
> >Yeah :( but we can hope they don't even know that the prctl currently
> >behaves they way it does - man page doesn't suggest it would, and most
> >of us in this thread found it surprising.
> >
> >>> And as usual, if
> >>> somebody does complain in the end, we revert and try the other way?
> >> 
> >> But by then it's too late - the new behaviour will be out in the
> >field.
> >
> >Revert in stable then?
> >But I don't think this patch should go to stable. I understand right
> >that CRIU will switch to the UFFDIO_COPY approach and doesn't need the
> >prctl change/new madvise anymore?
> 
> Yes, we are going to use UFFDIO_COPY. We still might want to have control
> over THP in the future without changing per-VMA flags, though.

Unfortunately, I was over optimistic about ability of CRIU to use
UFFDIO_COPY for pre-copy part :(
I was too concentrated on the simplified flow and overlooked some important
details. After I've spent some time trying to actually implement usage of
UFFDIO_COPY, I realized that registering memory with userfault at that
point of the restore flow quite contradicts CRIU architecture :(

That said, we would really want to have the interface this patch proposes.
 
-- 
Sincerely yours,
Mike.



Re: [PATCH] mm: make PR_SET_THP_DISABLE immediately active

2017-06-05 Thread Mike Rapoport
On Sat, Jun 03, 2017 at 01:34:52PM +0300, Mike Rapoprt wrote:
> 
> 
> On June 2, 2017 11:55:12 PM GMT+03:00, Vlastimil Babka  wrote:
> >On 06/02/2017 10:40 PM, Andrew Morton wrote:
> >> On Fri, 2 Jun 2017 22:31:47 +0200 Vlastimil Babka 
> >wrote:
>  Perhaps we should be adding new prctl modes to select this new
>  behaviour and leave the existing PR_SET_THP_DISABLE behaviour
> >as-is?
> >>>
> >>> I think we can reasonably assume that most users of the prctl do
> >just
> >>> the fork() & exec() thing, so they will be unaffected.
> >> 
> >> That sounds optimistic.  Perhaps people are using the current
> >behaviour
> >> to set on particular mapping to MMF_DISABLE_THP, with
> >> 
> >>prctl(PR_SET_THP_DISABLE)
> >>mmap()
> >>prctl(PR_CLR_THP_DISABLE)
> >> 
> >> ?
> >> 
> >> Seems a reasonable thing to do.
> >
> >Using madvise(MADV_NOHUGEPAGE) seems reasonabler to me, with the same
> >effect. And it's older (2.6.38).
> >
> >> But who knows - people do all sorts of
> >> inventive things.
> >
> >Yeah :( but we can hope they don't even know that the prctl currently
> >behaves they way it does - man page doesn't suggest it would, and most
> >of us in this thread found it surprising.
> >
> >>> And as usual, if
> >>> somebody does complain in the end, we revert and try the other way?
> >> 
> >> But by then it's too late - the new behaviour will be out in the
> >field.
> >
> >Revert in stable then?
> >But I don't think this patch should go to stable. I understand right
> >that CRIU will switch to the UFFDIO_COPY approach and doesn't need the
> >prctl change/new madvise anymore?
> 
> Yes, we are going to use UFFDIO_COPY. We still might want to have control
> over THP in the future without changing per-VMA flags, though.

Unfortunately, I was over optimistic about ability of CRIU to use
UFFDIO_COPY for pre-copy part :(
I was too concentrated on the simplified flow and overlooked some important
details. After I've spent some time trying to actually implement usage of
UFFDIO_COPY, I realized that registering memory with userfault at that
point of the restore flow quite contradicts CRIU architecture :(

That said, we would really want to have the interface this patch proposes.
 
-- 
Sincerely yours,
Mike.



Re: [PATCH] mm: make PR_SET_THP_DISABLE immediately active

2017-06-03 Thread Mike Rapoprt


On June 2, 2017 10:50:59 PM GMT+03:00, Andrew Morton 
 wrote:
>On Fri,  2 Jun 2017 18:03:22 +0300 "Mike Rapoport"
> wrote:
>
>> PR_SET_THP_DISABLE has a rather subtle semantic. It doesn't affect
>any
>> existing mapping because it only updated mm->def_flags which is a
>template
>> for new mappings. The mappings created after
>prctl(PR_SET_THP_DISABLE) have
>> VM_NOHUGEPAGE flag set.  This can be quite surprising for all those
>> applications which do not do prctl(); fork() & exec() and want to
>control
>> their own THP behavior.
>> 
>> Another usecase when the immediate semantic of the prctl might be
>useful is
>> a combination of pre- and post-copy migration of containers with
>CRIU.  In
>> this case CRIU populates a part of a memory region with data that was
>saved
>> during the pre-copy stage. Afterwards, the region is registered with
>> userfaultfd and CRIU expects to get page faults for the parts of the
>region
>> that were not yet populated. However, khugepaged collapses the pages
>and
>> the expected page faults do not occur.
>> 
>> In more general case, the prctl(PR_SET_THP_DISABLE) could be used as
>a
>> temporary mechanism for enabling/disabling THP process wide.
>> 
>> Implementation wise, a new MMF_DISABLE_THP flag is added. This flag
>is
>> tested when decision whether to use huge pages is taken either during
>page
>> fault of at the time of THP collapse.
>> 
>> It should be noted, that the new implementation makes
>PR_SET_THP_DISABLE
>> master override to any per-VMA setting, which was not the case
>previously.
>>
>> Fixes: a0715cc22601 ("mm, thp: add VM_INIT_DEF_MASK and
>PRCTL_THP_DISABLE")
>
>"Fixes" is a bit strong.  I'd say "alters".  And significantly altering
>the runtime behaviour of a three-year-old interface is rather a worry,
>no?

Well, there are people that consider current behavior as bug :)
One can argue we alter the implementation​details and users should not rely on 
that...

>Perhaps we should be adding new prctl modes to select this new
>behaviour and leave the existing PR_SET_THP_DISABLE behaviour as-is?



-- 
Sincerely yours,
Mike.



Re: [PATCH] mm: make PR_SET_THP_DISABLE immediately active

2017-06-03 Thread Mike Rapoprt


On June 2, 2017 10:50:59 PM GMT+03:00, Andrew Morton 
 wrote:
>On Fri,  2 Jun 2017 18:03:22 +0300 "Mike Rapoport"
> wrote:
>
>> PR_SET_THP_DISABLE has a rather subtle semantic. It doesn't affect
>any
>> existing mapping because it only updated mm->def_flags which is a
>template
>> for new mappings. The mappings created after
>prctl(PR_SET_THP_DISABLE) have
>> VM_NOHUGEPAGE flag set.  This can be quite surprising for all those
>> applications which do not do prctl(); fork() & exec() and want to
>control
>> their own THP behavior.
>> 
>> Another usecase when the immediate semantic of the prctl might be
>useful is
>> a combination of pre- and post-copy migration of containers with
>CRIU.  In
>> this case CRIU populates a part of a memory region with data that was
>saved
>> during the pre-copy stage. Afterwards, the region is registered with
>> userfaultfd and CRIU expects to get page faults for the parts of the
>region
>> that were not yet populated. However, khugepaged collapses the pages
>and
>> the expected page faults do not occur.
>> 
>> In more general case, the prctl(PR_SET_THP_DISABLE) could be used as
>a
>> temporary mechanism for enabling/disabling THP process wide.
>> 
>> Implementation wise, a new MMF_DISABLE_THP flag is added. This flag
>is
>> tested when decision whether to use huge pages is taken either during
>page
>> fault of at the time of THP collapse.
>> 
>> It should be noted, that the new implementation makes
>PR_SET_THP_DISABLE
>> master override to any per-VMA setting, which was not the case
>previously.
>>
>> Fixes: a0715cc22601 ("mm, thp: add VM_INIT_DEF_MASK and
>PRCTL_THP_DISABLE")
>
>"Fixes" is a bit strong.  I'd say "alters".  And significantly altering
>the runtime behaviour of a three-year-old interface is rather a worry,
>no?

Well, there are people that consider current behavior as bug :)
One can argue we alter the implementation​details and users should not rely on 
that...

>Perhaps we should be adding new prctl modes to select this new
>behaviour and leave the existing PR_SET_THP_DISABLE behaviour as-is?



-- 
Sincerely yours,
Mike.



Re: [PATCH] mm: make PR_SET_THP_DISABLE immediately active

2017-06-03 Thread Mike Rapoprt


On June 2, 2017 11:55:12 PM GMT+03:00, Vlastimil Babka  wrote:
>On 06/02/2017 10:40 PM, Andrew Morton wrote:
>> On Fri, 2 Jun 2017 22:31:47 +0200 Vlastimil Babka 
>wrote:
 Perhaps we should be adding new prctl modes to select this new
 behaviour and leave the existing PR_SET_THP_DISABLE behaviour
>as-is?
>>>
>>> I think we can reasonably assume that most users of the prctl do
>just
>>> the fork() & exec() thing, so they will be unaffected.
>> 
>> That sounds optimistic.  Perhaps people are using the current
>behaviour
>> to set on particular mapping to MMF_DISABLE_THP, with
>> 
>>  prctl(PR_SET_THP_DISABLE)
>>  mmap()
>>  prctl(PR_CLR_THP_DISABLE)
>> 
>> ?
>> 
>> Seems a reasonable thing to do.
>
>Using madvise(MADV_NOHUGEPAGE) seems reasonabler to me, with the same
>effect. And it's older (2.6.38).
>
>> But who knows - people do all sorts of
>> inventive things.
>
>Yeah :( but we can hope they don't even know that the prctl currently
>behaves they way it does - man page doesn't suggest it would, and most
>of us in this thread found it surprising.
>
>>> And as usual, if
>>> somebody does complain in the end, we revert and try the other way?
>> 
>> But by then it's too late - the new behaviour will be out in the
>field.
>
>Revert in stable then?
>But I don't think this patch should go to stable. I understand right
>that CRIU will switch to the UFFDIO_COPY approach and doesn't need the
>prctl change/new madvise anymore?

Yes, we are going to use UFFDIO_COPY. We still might want to have control over 
THP in the future without changing per-VMA flags, though.

-- 
Sincerely yours,
Mike.



Re: [PATCH] mm: make PR_SET_THP_DISABLE immediately active

2017-06-03 Thread Mike Rapoprt


On June 2, 2017 11:55:12 PM GMT+03:00, Vlastimil Babka  wrote:
>On 06/02/2017 10:40 PM, Andrew Morton wrote:
>> On Fri, 2 Jun 2017 22:31:47 +0200 Vlastimil Babka 
>wrote:
 Perhaps we should be adding new prctl modes to select this new
 behaviour and leave the existing PR_SET_THP_DISABLE behaviour
>as-is?
>>>
>>> I think we can reasonably assume that most users of the prctl do
>just
>>> the fork() & exec() thing, so they will be unaffected.
>> 
>> That sounds optimistic.  Perhaps people are using the current
>behaviour
>> to set on particular mapping to MMF_DISABLE_THP, with
>> 
>>  prctl(PR_SET_THP_DISABLE)
>>  mmap()
>>  prctl(PR_CLR_THP_DISABLE)
>> 
>> ?
>> 
>> Seems a reasonable thing to do.
>
>Using madvise(MADV_NOHUGEPAGE) seems reasonabler to me, with the same
>effect. And it's older (2.6.38).
>
>> But who knows - people do all sorts of
>> inventive things.
>
>Yeah :( but we can hope they don't even know that the prctl currently
>behaves they way it does - man page doesn't suggest it would, and most
>of us in this thread found it surprising.
>
>>> And as usual, if
>>> somebody does complain in the end, we revert and try the other way?
>> 
>> But by then it's too late - the new behaviour will be out in the
>field.
>
>Revert in stable then?
>But I don't think this patch should go to stable. I understand right
>that CRIU will switch to the UFFDIO_COPY approach and doesn't need the
>prctl change/new madvise anymore?

Yes, we are going to use UFFDIO_COPY. We still might want to have control over 
THP in the future without changing per-VMA flags, though.

-- 
Sincerely yours,
Mike.



Re: [PATCH] mm: make PR_SET_THP_DISABLE immediately active

2017-06-03 Thread Michal Hocko
On Fri 02-06-17 13:40:38, Andrew Morton wrote:
> On Fri, 2 Jun 2017 22:31:47 +0200 Vlastimil Babka  wrote:
> 
> > On 06/02/2017 09:50 PM, Andrew Morton wrote:
> > > On Fri,  2 Jun 2017 18:03:22 +0300 "Mike Rapoport" 
> > >  wrote:
> > > 
> > >> PR_SET_THP_DISABLE has a rather subtle semantic. It doesn't affect any
> > >> existing mapping because it only updated mm->def_flags which is a 
> > >> template
> > >> for new mappings. The mappings created after prctl(PR_SET_THP_DISABLE) 
> > >> have
> > >> VM_NOHUGEPAGE flag set.  This can be quite surprising for all those
> > >> applications which do not do prctl(); fork() & exec() and want to control
> > >> their own THP behavior.
> > >>
> > >> Another usecase when the immediate semantic of the prctl might be useful 
> > >> is
> > >> a combination of pre- and post-copy migration of containers with CRIU.  
> > >> In
> > >> this case CRIU populates a part of a memory region with data that was 
> > >> saved
> > >> during the pre-copy stage. Afterwards, the region is registered with
> > >> userfaultfd and CRIU expects to get page faults for the parts of the 
> > >> region
> > >> that were not yet populated. However, khugepaged collapses the pages and
> > >> the expected page faults do not occur.
> > >>
> > >> In more general case, the prctl(PR_SET_THP_DISABLE) could be used as a
> > >> temporary mechanism for enabling/disabling THP process wide.
> > >>
> > >> Implementation wise, a new MMF_DISABLE_THP flag is added. This flag is
> > >> tested when decision whether to use huge pages is taken either during 
> > >> page
> > >> fault of at the time of THP collapse.
> > >>
> > >> It should be noted, that the new implementation makes PR_SET_THP_DISABLE
> > >> master override to any per-VMA setting, which was not the case 
> > >> previously.
> > >>
> > >> Fixes: a0715cc22601 ("mm, thp: add VM_INIT_DEF_MASK and 
> > >> PRCTL_THP_DISABLE")
> > > 
> > > "Fixes" is a bit strong.  I'd say "alters".  And significantly altering
> > > the runtime behaviour of a three-year-old interface is rather a worry,
> > > no?
> > > 
> > > Perhaps we should be adding new prctl modes to select this new
> > > behaviour and leave the existing PR_SET_THP_DISABLE behaviour as-is?
> > 
> > I think we can reasonably assume that most users of the prctl do just
> > the fork() & exec() thing, so they will be unaffected.
> 
> That sounds optimistic.  Perhaps people are using the current behaviour
> to set on particular mapping to MMF_DISABLE_THP, with
> 
>   prctl(PR_SET_THP_DISABLE)
>   mmap()
>   prctl(PR_CLR_THP_DISABLE)
> 
> ?
> 
> Seems a reasonable thing to do.

Is it? The documentation is not very specific but it is clear about the
scope being thread (I would argue process would be more approapriate
but whatever) "Set the state of the "THP disable" flag for the calling
thread." So the above seems like an incorrect usage to me.

> But who knows - people do all sorts of inventive things.

well yes.

> > And as usual, if
> > somebody does complain in the end, we revert and try the other way?
> 
> But by then it's too late - the new behaviour will be out in the field.

Well, the interface is currently broken for anything other than prctl
& exec. And those will work properly even with the patch. So I am not
really sure whether keeping the current status quo is reasonable.

-- 
Michal Hocko
SUSE Labs


Re: [PATCH] mm: make PR_SET_THP_DISABLE immediately active

2017-06-03 Thread Michal Hocko
On Fri 02-06-17 13:40:38, Andrew Morton wrote:
> On Fri, 2 Jun 2017 22:31:47 +0200 Vlastimil Babka  wrote:
> 
> > On 06/02/2017 09:50 PM, Andrew Morton wrote:
> > > On Fri,  2 Jun 2017 18:03:22 +0300 "Mike Rapoport" 
> > >  wrote:
> > > 
> > >> PR_SET_THP_DISABLE has a rather subtle semantic. It doesn't affect any
> > >> existing mapping because it only updated mm->def_flags which is a 
> > >> template
> > >> for new mappings. The mappings created after prctl(PR_SET_THP_DISABLE) 
> > >> have
> > >> VM_NOHUGEPAGE flag set.  This can be quite surprising for all those
> > >> applications which do not do prctl(); fork() & exec() and want to control
> > >> their own THP behavior.
> > >>
> > >> Another usecase when the immediate semantic of the prctl might be useful 
> > >> is
> > >> a combination of pre- and post-copy migration of containers with CRIU.  
> > >> In
> > >> this case CRIU populates a part of a memory region with data that was 
> > >> saved
> > >> during the pre-copy stage. Afterwards, the region is registered with
> > >> userfaultfd and CRIU expects to get page faults for the parts of the 
> > >> region
> > >> that were not yet populated. However, khugepaged collapses the pages and
> > >> the expected page faults do not occur.
> > >>
> > >> In more general case, the prctl(PR_SET_THP_DISABLE) could be used as a
> > >> temporary mechanism for enabling/disabling THP process wide.
> > >>
> > >> Implementation wise, a new MMF_DISABLE_THP flag is added. This flag is
> > >> tested when decision whether to use huge pages is taken either during 
> > >> page
> > >> fault of at the time of THP collapse.
> > >>
> > >> It should be noted, that the new implementation makes PR_SET_THP_DISABLE
> > >> master override to any per-VMA setting, which was not the case 
> > >> previously.
> > >>
> > >> Fixes: a0715cc22601 ("mm, thp: add VM_INIT_DEF_MASK and 
> > >> PRCTL_THP_DISABLE")
> > > 
> > > "Fixes" is a bit strong.  I'd say "alters".  And significantly altering
> > > the runtime behaviour of a three-year-old interface is rather a worry,
> > > no?
> > > 
> > > Perhaps we should be adding new prctl modes to select this new
> > > behaviour and leave the existing PR_SET_THP_DISABLE behaviour as-is?
> > 
> > I think we can reasonably assume that most users of the prctl do just
> > the fork() & exec() thing, so they will be unaffected.
> 
> That sounds optimistic.  Perhaps people are using the current behaviour
> to set on particular mapping to MMF_DISABLE_THP, with
> 
>   prctl(PR_SET_THP_DISABLE)
>   mmap()
>   prctl(PR_CLR_THP_DISABLE)
> 
> ?
> 
> Seems a reasonable thing to do.

Is it? The documentation is not very specific but it is clear about the
scope being thread (I would argue process would be more approapriate
but whatever) "Set the state of the "THP disable" flag for the calling
thread." So the above seems like an incorrect usage to me.

> But who knows - people do all sorts of inventive things.

well yes.

> > And as usual, if
> > somebody does complain in the end, we revert and try the other way?
> 
> But by then it's too late - the new behaviour will be out in the field.

Well, the interface is currently broken for anything other than prctl
& exec. And those will work properly even with the patch. So I am not
really sure whether keeping the current status quo is reasonable.

-- 
Michal Hocko
SUSE Labs


Re: [PATCH] mm: make PR_SET_THP_DISABLE immediately active

2017-06-02 Thread Andrew Morton
On Fri, 2 Jun 2017 22:55:12 +0200 Vlastimil Babka  wrote:

> On 06/02/2017 10:40 PM, Andrew Morton wrote:
> > On Fri, 2 Jun 2017 22:31:47 +0200 Vlastimil Babka  wrote:
> >>> Perhaps we should be adding new prctl modes to select this new
> >>> behaviour and leave the existing PR_SET_THP_DISABLE behaviour as-is?
> >>
> >> I think we can reasonably assume that most users of the prctl do just
> >> the fork() & exec() thing, so they will be unaffected.
> > 
> > That sounds optimistic.  Perhaps people are using the current behaviour
> > to set on particular mapping to MMF_DISABLE_THP, with
> > 
> > prctl(PR_SET_THP_DISABLE)
> > mmap()
> > prctl(PR_CLR_THP_DISABLE)
> > 
> > ?
> > 
> > Seems a reasonable thing to do.
> 
> Using madvise(MADV_NOHUGEPAGE) seems reasonabler to me, with the same
> effect. And it's older (2.6.38).
> 
> > But who knows - people do all sorts of
> > inventive things.
> 
> Yeah :( but we can hope they don't even know that the prctl currently
> behaves they way it does - man page doesn't suggest it would, and most
> of us in this thread found it surprising.

Well.  There might be such people and sometimes we do make people
unhappy.  it partly depends on how traumatic it would be to leave the
current behaviour as-is.  Have you evaluated such a patch?


> >> And as usual, if
> >> somebody does complain in the end, we revert and try the other way?
> > 
> > But by then it's too late - the new behaviour will be out in the field.
> 
> Revert in stable then?
> But I don't think this patch should go to stable. I understand right
> that CRIU will switch to the UFFDIO_COPY approach and doesn't need the
> prctl change/new madvise anymore?

What I mean is that the new behaviour will go out in 4.12 and it may
be many months before we find out that we broke someone.  By then, we
can't go back because others may be assuming the new behaviour.



Re: [PATCH] mm: make PR_SET_THP_DISABLE immediately active

2017-06-02 Thread Andrew Morton
On Fri, 2 Jun 2017 22:55:12 +0200 Vlastimil Babka  wrote:

> On 06/02/2017 10:40 PM, Andrew Morton wrote:
> > On Fri, 2 Jun 2017 22:31:47 +0200 Vlastimil Babka  wrote:
> >>> Perhaps we should be adding new prctl modes to select this new
> >>> behaviour and leave the existing PR_SET_THP_DISABLE behaviour as-is?
> >>
> >> I think we can reasonably assume that most users of the prctl do just
> >> the fork() & exec() thing, so they will be unaffected.
> > 
> > That sounds optimistic.  Perhaps people are using the current behaviour
> > to set on particular mapping to MMF_DISABLE_THP, with
> > 
> > prctl(PR_SET_THP_DISABLE)
> > mmap()
> > prctl(PR_CLR_THP_DISABLE)
> > 
> > ?
> > 
> > Seems a reasonable thing to do.
> 
> Using madvise(MADV_NOHUGEPAGE) seems reasonabler to me, with the same
> effect. And it's older (2.6.38).
> 
> > But who knows - people do all sorts of
> > inventive things.
> 
> Yeah :( but we can hope they don't even know that the prctl currently
> behaves they way it does - man page doesn't suggest it would, and most
> of us in this thread found it surprising.

Well.  There might be such people and sometimes we do make people
unhappy.  it partly depends on how traumatic it would be to leave the
current behaviour as-is.  Have you evaluated such a patch?


> >> And as usual, if
> >> somebody does complain in the end, we revert and try the other way?
> > 
> > But by then it's too late - the new behaviour will be out in the field.
> 
> Revert in stable then?
> But I don't think this patch should go to stable. I understand right
> that CRIU will switch to the UFFDIO_COPY approach and doesn't need the
> prctl change/new madvise anymore?

What I mean is that the new behaviour will go out in 4.12 and it may
be many months before we find out that we broke someone.  By then, we
can't go back because others may be assuming the new behaviour.



Re: [PATCH] mm: make PR_SET_THP_DISABLE immediately active

2017-06-02 Thread Vlastimil Babka
On 06/02/2017 10:40 PM, Andrew Morton wrote:
> On Fri, 2 Jun 2017 22:31:47 +0200 Vlastimil Babka  wrote:
>>> Perhaps we should be adding new prctl modes to select this new
>>> behaviour and leave the existing PR_SET_THP_DISABLE behaviour as-is?
>>
>> I think we can reasonably assume that most users of the prctl do just
>> the fork() & exec() thing, so they will be unaffected.
> 
> That sounds optimistic.  Perhaps people are using the current behaviour
> to set on particular mapping to MMF_DISABLE_THP, with
> 
>   prctl(PR_SET_THP_DISABLE)
>   mmap()
>   prctl(PR_CLR_THP_DISABLE)
> 
> ?
> 
> Seems a reasonable thing to do.

Using madvise(MADV_NOHUGEPAGE) seems reasonabler to me, with the same
effect. And it's older (2.6.38).

> But who knows - people do all sorts of
> inventive things.

Yeah :( but we can hope they don't even know that the prctl currently
behaves they way it does - man page doesn't suggest it would, and most
of us in this thread found it surprising.

>> And as usual, if
>> somebody does complain in the end, we revert and try the other way?
> 
> But by then it's too late - the new behaviour will be out in the field.

Revert in stable then?
But I don't think this patch should go to stable. I understand right
that CRIU will switch to the UFFDIO_COPY approach and doesn't need the
prctl change/new madvise anymore?


Re: [PATCH] mm: make PR_SET_THP_DISABLE immediately active

2017-06-02 Thread Vlastimil Babka
On 06/02/2017 10:40 PM, Andrew Morton wrote:
> On Fri, 2 Jun 2017 22:31:47 +0200 Vlastimil Babka  wrote:
>>> Perhaps we should be adding new prctl modes to select this new
>>> behaviour and leave the existing PR_SET_THP_DISABLE behaviour as-is?
>>
>> I think we can reasonably assume that most users of the prctl do just
>> the fork() & exec() thing, so they will be unaffected.
> 
> That sounds optimistic.  Perhaps people are using the current behaviour
> to set on particular mapping to MMF_DISABLE_THP, with
> 
>   prctl(PR_SET_THP_DISABLE)
>   mmap()
>   prctl(PR_CLR_THP_DISABLE)
> 
> ?
> 
> Seems a reasonable thing to do.

Using madvise(MADV_NOHUGEPAGE) seems reasonabler to me, with the same
effect. And it's older (2.6.38).

> But who knows - people do all sorts of
> inventive things.

Yeah :( but we can hope they don't even know that the prctl currently
behaves they way it does - man page doesn't suggest it would, and most
of us in this thread found it surprising.

>> And as usual, if
>> somebody does complain in the end, we revert and try the other way?
> 
> But by then it's too late - the new behaviour will be out in the field.

Revert in stable then?
But I don't think this patch should go to stable. I understand right
that CRIU will switch to the UFFDIO_COPY approach and doesn't need the
prctl change/new madvise anymore?


Re: [PATCH] mm: make PR_SET_THP_DISABLE immediately active

2017-06-02 Thread Andrew Morton
On Fri, 2 Jun 2017 22:31:47 +0200 Vlastimil Babka  wrote:

> On 06/02/2017 09:50 PM, Andrew Morton wrote:
> > On Fri,  2 Jun 2017 18:03:22 +0300 "Mike Rapoport" 
> >  wrote:
> > 
> >> PR_SET_THP_DISABLE has a rather subtle semantic. It doesn't affect any
> >> existing mapping because it only updated mm->def_flags which is a template
> >> for new mappings. The mappings created after prctl(PR_SET_THP_DISABLE) have
> >> VM_NOHUGEPAGE flag set.  This can be quite surprising for all those
> >> applications which do not do prctl(); fork() & exec() and want to control
> >> their own THP behavior.
> >>
> >> Another usecase when the immediate semantic of the prctl might be useful is
> >> a combination of pre- and post-copy migration of containers with CRIU.  In
> >> this case CRIU populates a part of a memory region with data that was saved
> >> during the pre-copy stage. Afterwards, the region is registered with
> >> userfaultfd and CRIU expects to get page faults for the parts of the region
> >> that were not yet populated. However, khugepaged collapses the pages and
> >> the expected page faults do not occur.
> >>
> >> In more general case, the prctl(PR_SET_THP_DISABLE) could be used as a
> >> temporary mechanism for enabling/disabling THP process wide.
> >>
> >> Implementation wise, a new MMF_DISABLE_THP flag is added. This flag is
> >> tested when decision whether to use huge pages is taken either during page
> >> fault of at the time of THP collapse.
> >>
> >> It should be noted, that the new implementation makes PR_SET_THP_DISABLE
> >> master override to any per-VMA setting, which was not the case previously.
> >>
> >> Fixes: a0715cc22601 ("mm, thp: add VM_INIT_DEF_MASK and PRCTL_THP_DISABLE")
> > 
> > "Fixes" is a bit strong.  I'd say "alters".  And significantly altering
> > the runtime behaviour of a three-year-old interface is rather a worry,
> > no?
> > 
> > Perhaps we should be adding new prctl modes to select this new
> > behaviour and leave the existing PR_SET_THP_DISABLE behaviour as-is?
> 
> I think we can reasonably assume that most users of the prctl do just
> the fork() & exec() thing, so they will be unaffected.

That sounds optimistic.  Perhaps people are using the current behaviour
to set on particular mapping to MMF_DISABLE_THP, with

prctl(PR_SET_THP_DISABLE)
mmap()
prctl(PR_CLR_THP_DISABLE)

?

Seems a reasonable thing to do.  But who knows - people do all sorts of
inventive things.

> And as usual, if
> somebody does complain in the end, we revert and try the other way?

But by then it's too late - the new behaviour will be out in the field.



Re: [PATCH] mm: make PR_SET_THP_DISABLE immediately active

2017-06-02 Thread Andrew Morton
On Fri, 2 Jun 2017 22:31:47 +0200 Vlastimil Babka  wrote:

> On 06/02/2017 09:50 PM, Andrew Morton wrote:
> > On Fri,  2 Jun 2017 18:03:22 +0300 "Mike Rapoport" 
> >  wrote:
> > 
> >> PR_SET_THP_DISABLE has a rather subtle semantic. It doesn't affect any
> >> existing mapping because it only updated mm->def_flags which is a template
> >> for new mappings. The mappings created after prctl(PR_SET_THP_DISABLE) have
> >> VM_NOHUGEPAGE flag set.  This can be quite surprising for all those
> >> applications which do not do prctl(); fork() & exec() and want to control
> >> their own THP behavior.
> >>
> >> Another usecase when the immediate semantic of the prctl might be useful is
> >> a combination of pre- and post-copy migration of containers with CRIU.  In
> >> this case CRIU populates a part of a memory region with data that was saved
> >> during the pre-copy stage. Afterwards, the region is registered with
> >> userfaultfd and CRIU expects to get page faults for the parts of the region
> >> that were not yet populated. However, khugepaged collapses the pages and
> >> the expected page faults do not occur.
> >>
> >> In more general case, the prctl(PR_SET_THP_DISABLE) could be used as a
> >> temporary mechanism for enabling/disabling THP process wide.
> >>
> >> Implementation wise, a new MMF_DISABLE_THP flag is added. This flag is
> >> tested when decision whether to use huge pages is taken either during page
> >> fault of at the time of THP collapse.
> >>
> >> It should be noted, that the new implementation makes PR_SET_THP_DISABLE
> >> master override to any per-VMA setting, which was not the case previously.
> >>
> >> Fixes: a0715cc22601 ("mm, thp: add VM_INIT_DEF_MASK and PRCTL_THP_DISABLE")
> > 
> > "Fixes" is a bit strong.  I'd say "alters".  And significantly altering
> > the runtime behaviour of a three-year-old interface is rather a worry,
> > no?
> > 
> > Perhaps we should be adding new prctl modes to select this new
> > behaviour and leave the existing PR_SET_THP_DISABLE behaviour as-is?
> 
> I think we can reasonably assume that most users of the prctl do just
> the fork() & exec() thing, so they will be unaffected.

That sounds optimistic.  Perhaps people are using the current behaviour
to set on particular mapping to MMF_DISABLE_THP, with

prctl(PR_SET_THP_DISABLE)
mmap()
prctl(PR_CLR_THP_DISABLE)

?

Seems a reasonable thing to do.  But who knows - people do all sorts of
inventive things.

> And as usual, if
> somebody does complain in the end, we revert and try the other way?

But by then it's too late - the new behaviour will be out in the field.



Re: [PATCH] mm: make PR_SET_THP_DISABLE immediately active

2017-06-02 Thread Vlastimil Babka
On 06/02/2017 09:50 PM, Andrew Morton wrote:
> On Fri,  2 Jun 2017 18:03:22 +0300 "Mike Rapoport"  
> wrote:
> 
>> PR_SET_THP_DISABLE has a rather subtle semantic. It doesn't affect any
>> existing mapping because it only updated mm->def_flags which is a template
>> for new mappings. The mappings created after prctl(PR_SET_THP_DISABLE) have
>> VM_NOHUGEPAGE flag set.  This can be quite surprising for all those
>> applications which do not do prctl(); fork() & exec() and want to control
>> their own THP behavior.
>>
>> Another usecase when the immediate semantic of the prctl might be useful is
>> a combination of pre- and post-copy migration of containers with CRIU.  In
>> this case CRIU populates a part of a memory region with data that was saved
>> during the pre-copy stage. Afterwards, the region is registered with
>> userfaultfd and CRIU expects to get page faults for the parts of the region
>> that were not yet populated. However, khugepaged collapses the pages and
>> the expected page faults do not occur.
>>
>> In more general case, the prctl(PR_SET_THP_DISABLE) could be used as a
>> temporary mechanism for enabling/disabling THP process wide.
>>
>> Implementation wise, a new MMF_DISABLE_THP flag is added. This flag is
>> tested when decision whether to use huge pages is taken either during page
>> fault of at the time of THP collapse.
>>
>> It should be noted, that the new implementation makes PR_SET_THP_DISABLE
>> master override to any per-VMA setting, which was not the case previously.
>>
>> Fixes: a0715cc22601 ("mm, thp: add VM_INIT_DEF_MASK and PRCTL_THP_DISABLE")
> 
> "Fixes" is a bit strong.  I'd say "alters".  And significantly altering
> the runtime behaviour of a three-year-old interface is rather a worry,
> no?
> 
> Perhaps we should be adding new prctl modes to select this new
> behaviour and leave the existing PR_SET_THP_DISABLE behaviour as-is?

I think we can reasonably assume that most users of the prctl do just
the fork() & exec() thing, so they will be unaffected. And as usual, if
somebody does complain in the end, we revert and try the other way?


Re: [PATCH] mm: make PR_SET_THP_DISABLE immediately active

2017-06-02 Thread Vlastimil Babka
On 06/02/2017 09:50 PM, Andrew Morton wrote:
> On Fri,  2 Jun 2017 18:03:22 +0300 "Mike Rapoport"  
> wrote:
> 
>> PR_SET_THP_DISABLE has a rather subtle semantic. It doesn't affect any
>> existing mapping because it only updated mm->def_flags which is a template
>> for new mappings. The mappings created after prctl(PR_SET_THP_DISABLE) have
>> VM_NOHUGEPAGE flag set.  This can be quite surprising for all those
>> applications which do not do prctl(); fork() & exec() and want to control
>> their own THP behavior.
>>
>> Another usecase when the immediate semantic of the prctl might be useful is
>> a combination of pre- and post-copy migration of containers with CRIU.  In
>> this case CRIU populates a part of a memory region with data that was saved
>> during the pre-copy stage. Afterwards, the region is registered with
>> userfaultfd and CRIU expects to get page faults for the parts of the region
>> that were not yet populated. However, khugepaged collapses the pages and
>> the expected page faults do not occur.
>>
>> In more general case, the prctl(PR_SET_THP_DISABLE) could be used as a
>> temporary mechanism for enabling/disabling THP process wide.
>>
>> Implementation wise, a new MMF_DISABLE_THP flag is added. This flag is
>> tested when decision whether to use huge pages is taken either during page
>> fault of at the time of THP collapse.
>>
>> It should be noted, that the new implementation makes PR_SET_THP_DISABLE
>> master override to any per-VMA setting, which was not the case previously.
>>
>> Fixes: a0715cc22601 ("mm, thp: add VM_INIT_DEF_MASK and PRCTL_THP_DISABLE")
> 
> "Fixes" is a bit strong.  I'd say "alters".  And significantly altering
> the runtime behaviour of a three-year-old interface is rather a worry,
> no?
> 
> Perhaps we should be adding new prctl modes to select this new
> behaviour and leave the existing PR_SET_THP_DISABLE behaviour as-is?

I think we can reasonably assume that most users of the prctl do just
the fork() & exec() thing, so they will be unaffected. And as usual, if
somebody does complain in the end, we revert and try the other way?


Re: [PATCH] mm: make PR_SET_THP_DISABLE immediately active

2017-06-02 Thread Andrew Morton
On Fri,  2 Jun 2017 18:03:22 +0300 "Mike Rapoport"  
wrote:

> PR_SET_THP_DISABLE has a rather subtle semantic. It doesn't affect any
> existing mapping because it only updated mm->def_flags which is a template
> for new mappings. The mappings created after prctl(PR_SET_THP_DISABLE) have
> VM_NOHUGEPAGE flag set.  This can be quite surprising for all those
> applications which do not do prctl(); fork() & exec() and want to control
> their own THP behavior.
> 
> Another usecase when the immediate semantic of the prctl might be useful is
> a combination of pre- and post-copy migration of containers with CRIU.  In
> this case CRIU populates a part of a memory region with data that was saved
> during the pre-copy stage. Afterwards, the region is registered with
> userfaultfd and CRIU expects to get page faults for the parts of the region
> that were not yet populated. However, khugepaged collapses the pages and
> the expected page faults do not occur.
> 
> In more general case, the prctl(PR_SET_THP_DISABLE) could be used as a
> temporary mechanism for enabling/disabling THP process wide.
> 
> Implementation wise, a new MMF_DISABLE_THP flag is added. This flag is
> tested when decision whether to use huge pages is taken either during page
> fault of at the time of THP collapse.
> 
> It should be noted, that the new implementation makes PR_SET_THP_DISABLE
> master override to any per-VMA setting, which was not the case previously.
>
> Fixes: a0715cc22601 ("mm, thp: add VM_INIT_DEF_MASK and PRCTL_THP_DISABLE")

"Fixes" is a bit strong.  I'd say "alters".  And significantly altering
the runtime behaviour of a three-year-old interface is rather a worry,
no?

Perhaps we should be adding new prctl modes to select this new
behaviour and leave the existing PR_SET_THP_DISABLE behaviour as-is?



Re: [PATCH] mm: make PR_SET_THP_DISABLE immediately active

2017-06-02 Thread Andrew Morton
On Fri,  2 Jun 2017 18:03:22 +0300 "Mike Rapoport"  
wrote:

> PR_SET_THP_DISABLE has a rather subtle semantic. It doesn't affect any
> existing mapping because it only updated mm->def_flags which is a template
> for new mappings. The mappings created after prctl(PR_SET_THP_DISABLE) have
> VM_NOHUGEPAGE flag set.  This can be quite surprising for all those
> applications which do not do prctl(); fork() & exec() and want to control
> their own THP behavior.
> 
> Another usecase when the immediate semantic of the prctl might be useful is
> a combination of pre- and post-copy migration of containers with CRIU.  In
> this case CRIU populates a part of a memory region with data that was saved
> during the pre-copy stage. Afterwards, the region is registered with
> userfaultfd and CRIU expects to get page faults for the parts of the region
> that were not yet populated. However, khugepaged collapses the pages and
> the expected page faults do not occur.
> 
> In more general case, the prctl(PR_SET_THP_DISABLE) could be used as a
> temporary mechanism for enabling/disabling THP process wide.
> 
> Implementation wise, a new MMF_DISABLE_THP flag is added. This flag is
> tested when decision whether to use huge pages is taken either during page
> fault of at the time of THP collapse.
> 
> It should be noted, that the new implementation makes PR_SET_THP_DISABLE
> master override to any per-VMA setting, which was not the case previously.
>
> Fixes: a0715cc22601 ("mm, thp: add VM_INIT_DEF_MASK and PRCTL_THP_DISABLE")

"Fixes" is a bit strong.  I'd say "alters".  And significantly altering
the runtime behaviour of a three-year-old interface is rather a worry,
no?

Perhaps we should be adding new prctl modes to select this new
behaviour and leave the existing PR_SET_THP_DISABLE behaviour as-is?